Parallel Fourier ptychographic microscopy reconstruction method based on FPGA

Hongyang Zhao; Wangwei Hui; Qing Ye; Kaicheng Huang; Qiushuai Shi; Jianguo Tian; Wenyuan Zhou

doi:10.1364/OE.478193

1. Introduction

The spatial bandwidth product (SBP) of an optical imaging system is proportional to the product of the field-of-view and the resolution of the optical system [1]. In microscopy imaging, a large field-of-view and high resolution are both highly desired in biomedical applications such as pathology, hematology, and anatomy [2]. Traditional microscopes, which are limited by the SBP of the optical imaging system, cannot obtain images with a large field-of-view and high resolution at the same time. Fourier ptychographic microscopy (FPM) [2–7] is a computational imaging technique that does not involve mechanical scanning and effectively overcomes the SBP limitation of imaging systems. After a sequence of low-resolution images is acquired, a complex sample image with a large field-of-view and high resolution is computationally synthesized in the Fourier domain using the FPM reconstruction algorithm. Unlike holographic imaging methods [8,9], FPM iteratively reconstructs the phase information of the sample through a reconstruction algorithm without relying on interference with a known reference beam. Since FPM requires neither mechanical scanning nor a reference beam, it is well suited for deployment in standard optical imaging systems. Over the past few years, FPM has shown great potential in various applications, such as 3D imaging [10,11], digital pathology [12,13] and biomedicine [14–18].

However, FPM increases the SBP of the optical system at the cost of increased imaging time, which limits the application of FPM technology. Recently, many studies have been devoted to improving the speed of FPM. Multiplexed coded illumination methods [19–21] and parallel multi-camera methods [22,23] have been proposed to improve the throughput of FPM data acquisition and reduce the acquisition time. The FPM reconstruction process is as time-consuming as the acquisition process, so it also needs to be accelerated to improve the imaging speed of FPM. Some researchers have focused on designing better initializations and algorithms to reduce the iterative convergence time of FPM [24–27]. Other studies take advantage of the redundancy of the Fourier spectrum and use only a subset of it to reconstruct the high-resolution FPM image [28,29]. These studies speed up the reconstruction process to a certain extent but cannot meet the demand for further acceleration of FPM. Two main constraints limit further speedup of FPM reconstruction. The first is that the FPM algorithm is executed sequentially: an aperture overlap of more than 35% in the Fourier domain is required to accurately reconstruct intensity and phase information [30], which creates data dependencies between sub-regions and leads to sequential computation. The second is the use of a general-purpose processor as the main computing unit for the FPM reconstruction process. General-purpose processors are inefficient in FPM reconstruction operations because they lack a dedicated computing architecture. To further improve the reconstruction efficiency of FPM, it is necessary to propose a non-sequential FPM algorithm and design a dedicated computational architecture that breaks through these two constraints.

Field programmable gate arrays (FPGAs) have shown great potential in computational optics. Compared with general-purpose processors, FPGAs offer rich resources, flexible programmability, and a high degree of parallelism. These characteristics allow FPGAs to be used to deploy specialized computing architectures for specific algorithms. FPGAs have been successfully applied in multiple computational optics fields, such as adaptive optics [31], 3D reconstruction [32], optical coherence microscopy [33] and hologram generation [34]. Our previous work also shows the potential of FPGAs in FPM reconstruction computations [35]. However, since the previous work did not break through the sequential running limit of the FPM reconstruction process, only a small fraction of the resources inside the FPGA were utilized, and it was difficult to further speed up the reconstruction process of FPM. Therefore, it is necessary to propose a parallel FPM reconstruction method and design an FPGA-based dedicated computational architecture.

In this paper, we propose a parallel FPM reconstruction computing method based on FPGA. Using this method, we design a dedicated FPGA-based FPM reconstruction computing architecture that breaks through the computation constraints of the FPM reconstruction algorithm. We introduce a preprocessing algorithm for reconstruction, called the nearest non-overlapping sub-region search algorithm. By grouping sub-regions using this algorithm, the architecture can perform parallel computations on different sub-regions in the Fourier domain. We also customize a unified high-performance dedicated computing structure for this architecture, in which one accelerator core computes one sub-region. The accelerator core is designed as a full pipeline. Because scalability is important for such an architecture, the control module and accelerator core are packaged as intellectual property (IP) cores. According to the selected degree of parallelism, the number of accelerator cores can be chosen flexibly, so that the architecture can be deployed in FPGAs with different resource scales to meet different application requirements. When the architecture requires more accelerator cores, we only need to mount more of them on the advanced extensible interface (AXI) bus. When performance is chosen as the main metric, our computing architecture performs FPM reconstruction with a parallelism of 8 (FPGA at 300 MHz), which is 74 times faster than the same computation on a CPU (Intel i7 8700, 4.6 GHz). In FPM large-field reconstruction, the large-field image is reconstructed by dividing the original image into small pieces, so multiple architectures can be used to reconstruct the pieces in parallel. We show that 4 FPM reconstruction computing architectures with a parallelism of 4, deployed in one FPGA, are nearly 180 times faster than the traditional method. The paper is organized as follows: in Section 2, we introduce the implementation of our architecture and its specific deployment in FPGA. In Section 3, we demonstrate the iterative convergence speed and accuracy of our architecture for the FPM recovery process at different degrees of parallelism. Finally, we summarize and discuss our work in Section 4.

2. Detail of the architecture

2.1 Parallel algorithm flow

Figure 1 shows the parallel computing flow of the architecture. To show the flow more clearly, we describe it for a degree of parallelism of 4. In different applications, the degree of parallelism of the computing flow can be flexibly selected according to the resources in the FPGA. As shown in Fig. 1, the parallel computing flow can be divided into four main steps. In the first step, we use the nearest non-overlapping sub-region search algorithm to generate a sequence of sub-region groups. Each group contains at most 4 non-adjacent sub-regions. Since the sub-regions of each group do not overlap, the architecture can compute multiple sub-regions in parallel in a single run. The second step loads the data into the accelerator cores according to the pre-grouped sequence of groups. As shown in Fig. 1, the data of 4 different sub-regions are loaded into 4 different accelerator cores. The third step is the same as in the traditional FPM reconstruction algorithm, and its strategy is similar to the Gerchberg-Saxton method: each accelerator core performs the IFFT, amplitude replacement and FFT sequentially based on the location of its sub-region. The fourth step repeats steps 2-3 until all groups are completed, which finishes one iteration. Repeating this full pass for several iterations completes the FPM reconstruction process.

Fig. 1. The parallel computing flow of the architecture. Step 1: generation of the parallel iterative sub-region sequence. Step 2: load data into the accelerator cores based on the parallel iterative sequence. Step 3: the accelerator cores complete the amplitude replacement. Step 4: repeat steps 2-3 until all groups are completed, then repeat for several iterations.
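The work done by one accelerator core in step 3 can be illustrated with a minimal NumPy sketch. It is not the FPGA implementation: the function name `update_subregion`, the binary pupil mask, the centered (fftshift-ed) spectrum layout, and the omission of any intensity scaling between the high- and low-resolution grids are all assumptions made for illustration.

```python
import numpy as np

def update_subregion(spectrum, center, pupil, measured_intensity):
    """One Gerchberg-Saxton-style update of a single sub-region (illustrative).

    spectrum           : complex 2D array, centered high-resolution spectrum (modified in place)
    center             : (row, col) of the sub-region center in the spectrum
    pupil              : real 2D mask (m x m), assumed binary here
    measured_intensity : real 2D array (m x m), low-resolution intensity image
    """
    m = pupil.shape[0]
    y0, x0 = center[0] - m // 2, center[1] - m // 2
    window = (slice(y0, y0 + m), slice(x0, x0 + m))

    # Pupil constraint, then IFFT to the low-resolution image plane.
    sub = spectrum[window] * pupil
    low_res = np.fft.ifft2(np.fft.ifftshift(sub))

    # Amplitude replacement: keep the estimated phase, impose the measured amplitude.
    replaced = np.sqrt(measured_intensity) * np.exp(1j * np.angle(low_res))

    # FFT back and write the result into the spectrum inside the pupil only.
    new_sub = np.fft.fftshift(np.fft.fft2(replaced))
    spectrum[window] = spectrum[window] * (1 - pupil) + new_sub * pupil
    return spectrum
```

Because each call only reads and writes its own window, calls for sub-regions whose windows do not overlap are independent of each other, which is the property the grouping in step 1 relies on.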


An important part of the parallel computing flow is the search for non-overlapping sub-regions, which makes parallel computing possible. Step 1 in the flow finds non-overlapping sub-regions according to the chosen degree of parallelism. Since the location of each sub-region in the Fourier domain is determined by the FPM imaging system, step 1 only needs to be executed once to generate the sequence of groups for a given FPM imaging system. Because the degree of parallelism is configurable, the architecture can be deployed in FPGAs of different sizes to meet various application scenarios. It is worth noting that when the degree of parallelism is too large, some sub-region groups cannot be filled with enough non-overlapping sub-regions, which wastes resources. Therefore, the degree of parallelism needs to be selected reasonably to improve the utilization of resources. We discuss this issue in Section 2.2.

2.2 Implementation of parallel computation

Since the traditional FPM reconstruction algorithm requires data overlap in each iteration, adjacent sub-regions can only be calculated sequentially, which limits the speed of FPM reconstruction. Our architecture overcomes this limitation by using the nearest non-overlapping sub-region search algorithm. The key of this algorithm is searching for the nearest non-overlapping sub-regions. Figure 2 shows the flowchart of the algorithm. First, we initialize the sub-region center coordinate queue $S(k)$ in the high-resolution spectrum, ordered according to the spiral-out running order, and an empty parallel algorithm coordinate queue $P(j,i)$. The coordinates represent the locations of the sub-regions in the Fourier domain. The index $k$ denotes the $k^{\mathrm{th}}$ sub-region in the sequential computation. The index $j$ denotes the $j^{\mathrm{th}}$ sub-region group and the index $i$ denotes the $i^{\mathrm{th}}$ sub-region in the group. The second step is to find the nearest non-adjacent sub-regions of $S(0)$ based on the minimum distance $D$ and the parallelism $N$. The next step is to take the sub-regions found in the previous step (including $S(0)$) out of the queue $S$ and add them to the queue $P$ as one group. The algorithm repeats these search steps until the queue $S(k)$ is empty.

Fig. 2. Flowchart of the nearest non-overlapping sub-region search algorithm.
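The search described above can be sketched in a few lines of Python. This is one plausible reading of the flowchart, not the authors' code: the function name `group_subregions`, the use of Euclidean distance between centers, the rule that every pair inside a group must be at least `min_distance` apart, and the tie-breaking by queue order are assumptions. The input list is expected to already be in spiral-out order.

```python
import math

def group_subregions(centers, min_distance, parallelism):
    """Greedy grouping of sub-region centers into non-overlapping groups.

    centers      : list of (x, y) centers in running order, i.e. the queue S(k)
    min_distance : minimum center distance D for two sub-regions not to overlap
    parallelism  : maximum group size N
    Returns the list of groups, i.e. the queue P(j, i).
    """
    queue = list(centers)
    groups = []
    while queue:
        seed = queue.pop(0)              # S(0) always starts the next group
        group = [seed]
        # Visit the remaining centers from nearest to farthest from the seed.
        # sorted() is stable, so equally distant centers keep their queue order.
        for cand in sorted(queue, key=lambda c: math.dist(c, seed)):
            if len(group) == parallelism:
                break
            # Accept only candidates that overlap with no current group member.
            if all(math.dist(cand, member) >= min_distance for member in group):
                group.append(cand)
        for member in group[1:]:
            queue.remove(member)         # move the chosen centers from S to P
        groups.append(group)
    return groups
```

For example, `group_subregions(centers, 2.0, 4)` on a small grid of integer centers returns groups of at most 4 centers whose pairwise distances are all at least 2. The number of returned groups equals the number of runs needed per iteration.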


The groups produced by the algorithm are not uniformly filled: the number of sub-regions differs from group to group. It is therefore pointless to increase the degree of parallelism blindly. For example, with a fixed number of sub-regions (225 in our experiments), the number of sub-region groups is 32 for both N = 8 and N = 10. Since the number of sub-region groups equals the number of runs in one iteration, the running time for N = 8 and N = 10 is the same. However, with N = 10 the architecture requires more accelerator cores, which wastes resources. Therefore, it is necessary to choose an appropriate N to balance resources and performance.
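The trade-off can be made concrete with a short calculation based only on the numbers quoted above (225 sub-regions, 32 groups for both N = 8 and N = 10). The helper name `core_utilization` and the definition of utilization as occupied core slots divided by available core slots per iteration are assumptions introduced for illustration.

```python
import math

def core_utilization(num_subregions, num_groups, parallelism):
    """Fraction of accelerator-core slots that hold a sub-region in one iteration."""
    return num_subregions / (num_groups * parallelism)

for n in (8, 10):
    ideal_groups = math.ceil(225 / n)   # lower bound if every group were full
    print(n, ideal_groups, round(core_utilization(225, 32, n), 3))
# prints:
# 8 29 0.879
# 10 23 0.703
```

Under these assumptions, going from 8 to 10 cores leaves the run count at 32 while average core occupancy drops from about 88% to about 70%.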

2.3 Deployment of the architecture

We deploy the architecture on the Xilinx KCU1500, a data center board with a Kintex UltraScale XCKU115-2FLVB2104E FPGA [36]. It communicates with the host computer through Peripheral Component Interconnect Express (PCIe). Figure 3 shows the block diagram of the architecture with a degree of parallelism of 4, consistent with Section 2.1. As shown in Fig. 3(a), there are two main kinds of components in the architecture: data transmission units (e.g., the PCIe logic module and the memory control module) and data calculation units (e.g., the FPM control module and the accelerator cores). The PCIe logic module handles the external data transmission of the FPGA. The 4 GB DDR4 SDRAM on the KCU1500 board is used to cache the initial iteration image data and the high-resolution spectrum during the computation. The memory control module conducts the data transfer between the SDRAM and the FPGA. The FPM control module controls the accelerator cores through the FPM running control bus. The accelerator core in Fig. 3(a) is the main computing part of the architecture, and Fig. 3(b) shows its internal components.

Fig. 3. The block diagram of the architecture. (a) the internal components of architecture, and (b) the main computing part of it


The FFT/IFFT module performs the Fourier and inverse Fourier transforms. The speed of the FFT/IFFT module greatly affects the iteration time of the FPM reconstruction process. As shown in Fig. 3(b), we design a full-pipeline structure based on the radix-$2^2$ SDF architecture for the FFT/IFFT module [37,38]. The intensity replace module replaces the amplitude of the sub-regions with the low-resolution intensity images, as required by the FPM algorithm. The pupil constraint module applies the pupil restriction to the calculation data. All data calculation sub-modules in the accelerator core are optimized for data flow operations and are designed to be fully pipelined and parallel. Two block RAMs are used to buffer data in the accelerator core. The data transfer module transmits data between the block RAMs and the other modules. To improve the scalability of the architecture, the data transfer module uses the AXI bus for data communication. Ping-pong buffering is also used in the data transfer module to reduce the delay and increase the efficiency of data transmission. The operation of the entire accelerator core is controlled by a finite state machine (FSM module). When the accelerator core completes the data operation, it sends a completion signal to the FPM control module.

3. Results

3.1 Simulation

To evaluate the performance and accuracy of the method, we use the same simulation data to perform the FPM reconstruction on computing architectures with different degrees of parallelism and compare the reconstruction results with those of the CPU. We use two images (512 × 512 pixels) as the magnitude and phase of the simulated object, respectively. We simulate an FPM imaging system with a 15 × 15 LED array. The LED matrix is placed 75 mm below the specimen and adjacent LEDs are 4 mm apart. The LED incident wavelength is 633 nm. The imaging pixel size of the simulation is 3.96 µm and the simulated NA is 0.039 (magnification 0.638×). A set of 225 low-resolution intensity images with 64 × 64 pixels is generated based on the given parameters. We use this set of images as simulated data for FPM reconstruction.
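The illumination geometry implied by these parameters can be worked out with a short NumPy sketch. It is an illustration under stated assumptions, not part of the simulation code used in the paper: the function name `led_illumination` is hypothetical, the array is assumed to be centered on the optical axis, and the sign of the spatial-frequency shift depends on convention and is not fixed here.

```python
import numpy as np

def led_illumination(n_side=15, pitch_mm=4.0, height_mm=75.0, wavelength_um=0.633):
    """Illumination NA and spatial-frequency shift of every LED in a square array."""
    # LED coordinates in mm, with the central LED on the optical axis.
    idx = np.arange(n_side) - (n_side - 1) / 2
    x, y = np.meshgrid(idx * pitch_mm, idx * pitch_mm)
    r = np.sqrt(x**2 + y**2 + height_mm**2)   # LED-to-sample distance

    # Direction sines divided by the wavelength give the shift in cycles per micrometer.
    fx = x / r / wavelength_um
    fy = y / r / wavelength_um

    # Illumination NA is the sine of the incidence angle.
    na_illum = np.sqrt(x**2 + y**2) / r
    return fx, fy, na_illum

fx, fy, na_illum = led_illumination()
print(na_illum.shape, float(na_illum.max()))
```

Each of the 15 × 15 = 225 LED positions maps to one sub-region center in the Fourier domain. With the common estimate that the synthetic NA is the objective NA plus the largest illumination NA, the corner LEDs (illumination NA of about 0.47) would give a synthetic NA of roughly 0.51 for the 0.039 NA used here.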

Figure 4 shows FPM reconstruction results with different methods. Figure 4(a1) and Fig. 4(a2) show the original high-resolution intensity image and phase image. Figure 4(b1) and Fig. 4(b2) show the results reconstructed by the CPU. Figure 4(c1) and Fig. 4(c2) show the results reconstructed by the architecture with a parallelism of 2. Figure 4(d1) and Fig. 4(d2) show the results reconstructed by the architecture with a parallelism of 4. Figure 4(e1) and Fig. 4(e2) show the results reconstructed by the architecture with a parallelism of 8. We use the normalized root mean squared error (NRMSE) curve to measure the accuracy of the individual results. The NRMSE is defined as:

$$\textrm{NRMSE} = \sqrt{\frac{\sum_r \left(\widetilde{\phi_n}(r) - \varphi_o(r)\right)^2}{\sum_r \left(\varphi_o(r)\right)^2}} \tag{1}$$

where $\widetilde{\phi_n}$ represents the reconstructed data of the $n$th iteration, $\varphi_o$ represents the target data, and $r$ indexes the pixels of the image. Figure 4(f) shows the NRMSE curves of each iteration using different methods.

Fig. 4. FPM reconstruction results of different methods. (a1) and (a2) high-resolution intensity image and phase image, (b1) and (b2) the results reconstructed by the CPU, (c1) and (c2) the results reconstructed by the architecture with parallelism of 2, (d1) and (d2) the results reconstructed by the architecture with parallelism of 4, (e1) and (e2) the results reconstructed by the architecture with parallelism of 8, (f) NRMSE curves for each iteration with different methods.
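Equation (1) translates directly into NumPy. The sketch below assumes real-valued inputs (an intensity or phase image) of equal shape; the function name `nrmse` is chosen for illustration.

```python
import numpy as np

def nrmse(reconstructed, target):
    """Normalized root mean squared error of Eq. (1) for real-valued images."""
    reconstructed = np.asarray(reconstructed, dtype=float)
    target = np.asarray(target, dtype=float)
    # Sum of squared differences over all pixels, normalized by the target energy.
    return np.sqrt(np.sum((reconstructed - target) ** 2) / np.sum(target ** 2))
```

A perfect reconstruction gives 0, and an all-zero reconstruction gives 1, so values well below 1 indicate that the estimate is close to the target relative to its energy.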


The CPU takes 2.893 s to complete 10 iterations. The architectures with a parallelism of 2, 4 and 8 take 0.112 s, 0.055 s and 0.039 s, respectively, to complete 10 iterations of FPM reconstruction. The corresponding speedups over the CPU are ×26, ×51, and ×74. These results show that doubling the parallelism of the architecture does not double the speedup. There are two reasons for this. The first is that the number of groups produced by the nearest non-overlapping sub-region search algorithm does not decrease in proportion as the degree of parallelism grows. The second is the limited bandwidth of the off-chip memory, which limits the speed of data transfer and increases the iteration time.

3.2 Experiments

To demonstrate the reconstruction results of the architecture experimentally, we built an FPM imaging system with the following parameters. A 17 × 17 LED array with a central wavelength of 633 nm is used as the light source. The interval between two adjacent LEDs is 4 mm. The object plane of the imaging system is located 111 mm above the array. The imaging system uses a bi-telecentric lens (DTCM110-26, magnification 0.638×, NA = 0.04). A CMOS industrial camera (GS3-U3-123S6M-C, FLIR) with a pixel size of 3.45 µm records images while the LEDs are lit in sequence. After all the image data are collected by the camera, the computer sends the data to the FPGA (KCU1500 data center board) over PCIe, and the FPGA runs the FPM iterative reconstruction algorithm to reconstruct the high-resolution image.

When deploying the architecture, performance and FPGA resources need to be considered together. Our FPGA board features four independent on-board memories. To ensure high-speed data transmission, we deploy four architectures in the FPGA, each connected to one of these four independent on-board memories. Considering the resource scale of the FPGA, the parallelism of each of these four architectures is set to 4. Using the above experimental instruments, we collect images of the USAF resolution target and use the FPGA and the computer to reconstruct a high-resolution intensity image. As shown in Fig. 5 (converting 1,500 × 1,500 raw pixels to 12,000 × 12,000 pixels), the traditional method takes 617 s for the iterative process of the FPM reconstruction algorithm, while the FPGA takes 3.427 s, which is nearly 180 times faster than the traditional method without losing accuracy. Using the same experimental instruments, we collect images of thyroid neoplasm tissue. Figure 6 shows the reconstruction results (converting 1,500 × 1,500 raw pixels to 12,000 × 12,000 pixels). The traditional method takes 616 s for the iterative process of the FPM reconstruction algorithm, while the FPGA takes 3.429 s.

Fig. 5. Experimental results of the resolution target. (a) low-resolution intensity image under illumination of the central LED, (b1) intensity image reconstructed via traditional FPM by CPU, (b2) the line profile from the selected position of the resolution target in Fig. 5(b1), (c1) intensity image reconstructed via our method by FPGA, and (c2) the line profile from the selected position of the resolution target in Fig. 5(c1).


Fig. 6. Experimental results of thyroid neoplasm tissue. (a) low-resolution intensity image under illumination of the central LED, (b) the result reconstructed via traditional FPM by CPU, and (c) the result reconstructed via our method by FPGA.


4. Discussion

In this paper, we propose an FPGA-based parallel FPM reconstruction method that computes FPM reconstructions in parallel by pre-grouping sub-regions with the nearest non-overlapping sub-region search algorithm. Different sub-regions can be computed in parallel, which increases the speed of FPM reconstruction. We also propose the corresponding computing architecture based on the method. Unlike a general-purpose processor, the FPGA on which our computing architecture is deployed acts as a dedicated FPM reconstruction computing unit, which not only has higher data transfer efficiency but also avoids operations such as instruction fetching and decoding. These advantages make the architecture run faster and keep its power consumption low. Our architecture is a system on chip (SoC) architecture. The control module and accelerator core of the architecture are packaged as IP cores. This design makes it easier to expand both the computing scale and the functions of the architecture in actual deployment. For example, when an image acquisition function needs to be added, it is only necessary to mount the relevant IP core on the AXI bus of the architecture to complete the function expansion. We believe that our work will provide a new perspective for FPM research and that our proposed computing architecture will facilitate the implementation of FPM techniques.

Funding

National Natural Science Foundation of China (31527801).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. A. W. Lohmann, R. G. Dorsch, D. Mendlovic, C. Ferreira, and Z. Zalevsky, “Space–bandwidth product of optical signals and systems,” J. Opt. Soc. Am. A 13(3), 470 (1996).

2. P. C. Konda, L. Loetgering, K. C. Zhou, S. Xu, A. R. Harvey, and R. Horstmeyer, “Fourier ptychography: current applications and future promises,” Opt. Express 28(7), 9603 (2020).

3. X. Ou, R. Horstmeyer, C. Yang, and G. Zheng, “Quantitative phase imaging via Fourier ptychographic microscopy,” Opt. Lett. 38(22), 4845–4848 (2013).

4. X. Ou, G. Zheng, and C. Yang, “Embedded pupil function recovery for Fourier ptychographic microscopy,” Opt. Express 22(5), 4960–4972 (2014).

5. G. Zheng, C. Shen, S. Jiang, P. Song, and C. Yang, “Concept, implementations and applications of Fourier ptychography,” Nat. Rev. Phys. 3(3), 207–223 (2021).

6. A. Pan, Y. Zhang, T. Zhao, Z. Wang, D. Dan, M. Lei, and B. Yao, “System calibration method for Fourier ptychographic microscopy,” J. Biomed. Opt. 22(9), 096005 (2017).

7. A. Pan, Y. Zhang, K. Wen, M. Zhou, J. Min, M. Lei, and B. Yao, “Subwavelength resolution Fourier ptychography with hemispherical digital condensers,” Opt. Express 26(18), 23119–23131 (2018).

8. B. Kemper and G. von Bally, “Digital holographic microscopy for live cell applications and technical inspection,” Appl. Opt. 47(4), A52–61 (2008).

9. B. Rappaz, B. Breton, E. Shaffer, and G. Turcatti, “Digital holographic microscopy: a quantitative label-free microscopy technique for phenotypic screening,” Comb Chem High Throughput Screen 17(1), 80–88 (2014).

10. S. Dong, R. Horstmeyer, R. Shiradkar, K. Guo, X. Ou, Z. Bian, H. Xin, and G. Zheng, “Aperture-scanning Fourier ptychography for 3D refocusing and super-resolution macroscopic imaging,” Opt. Express 22(11), 13586–13599 (2014).

11. C. Zuo, J. Sun, J. Li, A. Asundi, and Q. Chen, “Wide-field high-resolution 3D microscopy with Fourier ptychographic diffraction tomography,” Opt. Lasers Eng. 128, 106003 (2020).

12. R. Horstmeyer, X. Ou, G. Zheng, P. Willems, and C. Yang, “Digital pathology with Fourier ptychography,” Comput. Med. Imaging Graph. 42, 38–43 (2015).

13. J. Chen, A. Wang, A. Pan, G. Zheng, C. Ma, and B. Yao, “Rapid full-color Fourier ptychographic microscopy via spatially filtered color transfer,” Photonics Res. 10(10), 2410–2421 (2022).

14. A. J. Williams, J. Chung, X. Ou, G. Zheng, S. Rawal, Z. Ao, R. Datar, C. Yang, and R. Corte, “Fourier ptychographic microscopy for filtration-based circulating tumor cell enumeration and analysis,” J. Biomed. Opt. 19(6), 066007 (2014).

15. J. Chung, X. Ou, R. P. Kulkarni, and C. Yang, “Counting White Blood Cells from a Blood Smear Using Fourier Ptychographic Microscopy,” PLoS One 10(7), e0133489 (2015).

16. J. Kim, B. M. Henley, C. H. Kim, H. A. Lester, and C. Yang, “Incubator embedded cell culture imaging system (EmSight) based on Fourier ptychographic microscopy,” Biomed. Opt. Express 7(8), 3097–3110 (2016).

17. X. Wang, T. Xu, J. Zhang, S. Chen, and Y. Zhang, “SO-YOLO Based WBC Detection With Fourier Ptychographic Microscopy,” IEEE Access 6, 51566–51576 (2018).

18. A. Pan, C. Zuo, and B. Yao, “High-resolution and large field-of-view Fourier ptychographic microscopy and its applications in biomedicine,” Rep. Prog. Phys. 83(9), 096101 (2020).

19. K. Huang, W. Hui, Q. Ye, S. Jin, H. Zhao, Q. Shi, J. Tian, and W. Zhou, “Compressed-sampling-based Fourier ptychographic microscopy,” Opt. Commun. 452, 18–24 (2019).

20. L. Tian, X. Li, K. Ramchandran, and L. Waller, “Multiplexed coded illumination for Fourier Ptychography with an LED array microscope,” Biomed. Opt. Express 5(7), 2376–2389 (2014).

21. L. Tian, Z. Liu, L.-H. Yeh, M. Chen, J. Zhong, and L. Waller, “Computational illumination for high-speed in vitro Fourier ptychographic microscopy,” Optica 2(10), 904–911 (2015).

22. A. C. S. Chan, J. Kim, A. Pan, H. Xu, D. Nojima, C. Hale, S. Wang, and C. Yang, “Parallel Fourier ptychographic microscopy for high-throughput screening with 96 cameras (96 Eyes),” Sci. Rep. 9(1), 11114 (2019).

23. P. C. Konda, J. M. Taylor, and A. R. Harvey, “Multi-aperture Fourier ptychographic microscopy, theory and validation,” Opt. Lasers Eng. 138, 106410 (2021).

24. L. Valzania, J. Dong, and S. Gigan, “Accelerating ptychographic reconstructions using spectral initializations,” Opt. Lett. 46(6), 1357 (2021).

25. C. Zuo, J. Sun, and Q. Chen, “Adaptive step-size strategy for noise-robust Fourier ptychographic microscopy,” Opt. Express 24(18), 20724–20744 (2016).

26. J. Zhang, T. Xu, X. Wang, S. Chen, and G. Ni, “Fast gradational reconstruction for Fourier ptychographic microscopy,” Chin. Opt. Lett. 15(11), 111702 (2017).

27. J. Liu, Y. Li, W. Wang, J. Tan, and C. Liu, “Accelerated and high-quality Fourier ptychographic method using a double truncated Wirtinger criteria,” Opt. Express 26(20), 26556–26565 (2018).

28. Y. Zhang, W. Jiang, L. Tian, L. Waller, and Q. Dai, “Self-learning based Fourier ptychographic microscopy,” Opt. Express 23(14), 18471 (2015).

29. H. Mao, X. Wu, J. Zhao, G. Cui, and J. Hu, “An efficient Fourier ptychographic microscopy method based on optimized pattern of LED angle illumination,” Micron 138, 102920 (2020).

30. J. Sun, Q. Chen, Y. Zhang, and C. Zuo, “Sampling criteria for Fourier ptychographic microscopy in object space and frequency space,” Opt. Express 24(14), 15765–15781 (2016).

31. Y.-C. Wu, J.-C. Chang, and C.-Y. Chang, “Adaptive optics for dynamic aberration compensation using parallel model-based controllers based on a field programmable gate array,” Opt. Express 29(14), 21129–21142 (2021).

32. G. Zhan, H. Tang, K. Zhong, Z. Li, Y. Shi, and C. Wang, “High-speed FPGA-based phase measuring profilometry architecture,” Opt. Express 25(9), 10553 (2017).

33. P. Meemon, Y. Lenaphet, and J. Widjaja, “Spectral fusing Gabor domain optical coherence microscopy based on FPGA processing,” Appl. Opt. 60(7), 2069–2076 (2021).

34. D. Dong, Y. Wang, A. Kadis, and T. D. Wilkinson, “Cost-optimized heterogeneous FPGA architecture for non-iterative hologram generation,” Appl. Opt. 59(25), 7540–7546 (2020).

35. H. Zhao, W. Hui, Q. Ye, K. Huang, Q. Shi, J. Tian, and W. Zhou, “High-performance heterogeneous FPGA data-flow architecture for Fourier ptychographic microscopy,” Appl. Opt. 61(6), 1420 (2022).

36. “Xilinx Kintex UltraScale FPGA KCU1500 Acceleration Development Kit,” https://www.xilinx.com/products/boards-and-kits/dk-u1-kcu1500-g.html.

37. S. He and M. Torkelson, “A new approach to pipeline FFT processor,” in Proceedings of International Conference on Parallel Processing (1996), pp. 766–770.

38. E. H. Wold and A. M. Despain, “Pipeline and Parallel-Pipeline FFT Processors for VLSI Implementations,” IEEE Trans. Comput. C-33(5), 414–426 (1984).


Optics Express