
GPU accelerated real-time confocal fluorescence lifetime imaging microscopy (FLIM) based on the analog mean-delay (AMD) method

Open Access

Abstract

We demonstrated GPU-accelerated real-time confocal fluorescence lifetime imaging microscopy (FLIM) based on the analog mean-delay (AMD) method. Our algorithm was verified for various fluorescence lifetimes and photon numbers. The GPU processing time was shorter than the physical scanning time for images of up to 800 × 800 pixels, and the GPU processing was more than 149 times faster than single-core CPU processing. The frame rate of our system was 13 fps for a 200 × 200 pixel image when observing maize vascular tissue. This system can be utilized for observing dynamic biological reactions, medical diagnosis, and real-time industrial inspection.

© 2016 Optical Society of America

1. Introduction

Fluorescence lifetime imaging microscopy (FLIM) is a powerful functional imaging technique that has been applied in many biomedical research fields to visualize localized environmental conditions, such as pH, ion concentration, refractive index, and the occurrence of fluorescence resonance energy transfer (FRET) [1–4]. It can effectively monitor biochemical information in regions of interest within a cell or tissue because the fluorescence lifetime is insensitive to fluorescence intensity and depends only on the energy transfer between the fluorophores and their environment. Various fluorescent dyes have been used for in vivo and ex vivo studies of local biochemical conditions to identify abnormal cells or tissues. Auto-fluorescence naturally emitted by endogenous fluorophores such as nicotinamide adenine dinucleotide, flavin, and collagen in living tissues can also be applied to the detection of abnormal tissues [5]. These cancer-related auto-fluorescent molecules may have altered fluorescence lifetimes owing to local perturbations in oxygen concentration and pH. The fluorescence lifetime imaging technique therefore has great potential for investigating unknown but significant biomechanisms through functional analysis, and it could be further developed into an important medical diagnostic tool for cancer detection.

Time-correlated single-photon counting (TCSPC) is a widely used fluorescence lifetime measurement technique because it measures the fluorescence lifetime accurately and can be applied to various biological applications such as fluorescence resonance energy transfer (FRET), fluorescence correlation spectroscopy (FCS), and single-molecule detection [6–8]. However, the slow photon detection rate of the TCSPC method is a critical disadvantage for realizing real-time confocal FLIM based on a high-speed 2-axis point-scanning setup. Only a photon-counting rate of about 1 MHz is obtainable even when a pulsed laser with an 80 MHz repetition rate is employed, owing to the single-photon arrival condition required by the TCSPC method [6]. This means that, ideally, more than 1 s is needed to acquire more than 100 photons per pixel for a 100 × 100 pixel image. Even though multi-channel TCSPC methods have been developed to reach higher photon detection rates of up to a few tens of MHz, it is still difficult to apply the technique to observing dynamic phenomena within 1 s.

We have previously proposed the analog mean-delay (AMD) method for high-speed confocal fluorescence lifetime imaging with a fast measurement speed and high lifetime accuracy [9–12]. In theory, the fluorescence lifetime is obtained exactly by subtracting the mean delay of the instrument response function (IRF), measured without a sample, from the mean delay of the fluorescence signal. This relation is expressed as

$$\tau \;=\; \langle T_e \rangle - \langle T_{e0} \rangle \;=\; \frac{\int t\, i_e(t)\,dt}{\int i_e(t)\,dt} \;-\; \frac{\int t\, i_{\mathrm{irf}}(t)\,dt}{\int i_{\mathrm{irf}}(t)\,dt}$$
where ie(t) is the measured fluorescence signal and iirf(t) is the instrument response function (IRF) of the system. ⟨Te⟩ and ⟨Te0⟩ are defined as the mean delay of the fluorescence signal and the mean delay of the IRF, respectively. The fluorescence lifetime can thus be extracted from the analog fluorescence pulse signal. Here, we assume that the fluorescence intensity follows an exponentially decaying form i(t) = A·e^(−t/τ), where A is a constant. The major advantage of this method is that the mean-delay contribution of a slow measurement system can be completely removed by measuring ⟨Te0⟩ from the IRF of the measurement system. The measurement speed can be very fast compared to the conventional TCSPC method because the AMD method can detect multiple photons simultaneously for a single excitation pulse. More photons yield a more accurate fluorescence lifetime, so an accurate fluorescence lifetime image can be acquired quickly by the AMD method. However, additional technical requirements still had to be met before real-time confocal fluorescence lifetime imaging could be used as a real-time observation tool for dynamic biological reactions and medical diagnosis.
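One way to see why this relation holds for a single-exponential decay is that the area-normalized first moment of a convolution is the sum of the first moments of its factors:

```latex
% Mean delay of a convolution = sum of the mean delays of its factors
% (all first moments normalized by area). With i_e = i_irf * f and
% f(t) = A e^{-t/\tau} for t >= 0:
\[
\langle T_e \rangle
  = \frac{\int t\,(i_{\mathrm{irf}} * f)(t)\,dt}{\int (i_{\mathrm{irf}} * f)(t)\,dt}
  = \frac{\int t\, i_{\mathrm{irf}}(t)\,dt}{\int i_{\mathrm{irf}}(t)\,dt}
  + \frac{\int t\, f(t)\,dt}{\int f(t)\,dt}
  = \langle T_{e0} \rangle + \tau .
\]
```

The last step uses the fact that the normalized first moment of a single-exponential decay is exactly τ, so subtracting the separately measured ⟨Te0⟩ recovers the lifetime regardless of the IRF shape.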

Recently, parallel data processing architectures such as general-purpose computing on graphics processing units (GPGPU) and parallel computing with multiple CPU cores (CPU-OpenMP) have been developed for fast data processing. In particular, GPGPU has been applied effectively to real-time optical imaging schemes that require heavy computational power, such as optical frequency domain imaging (OFDI) [13–15]. In the AMD method, the extraction of the fluorescence lifetime at each pixel is independent of all other pixels, which maximizes parallel processing efficiency. Therefore, GPU acceleration is well suited to realizing real-time confocal fluorescence lifetime imaging by the AMD method. In GPGPU applications, data transmission (cudaMemcpy) from host memory to GPU memory is required before parallel data processing, and this can become a serious bottleneck for real-time confocal fluorescence lifetime imaging when the amount of data to be processed is large. In the AMD method, the short fluorescence and IRF pulses are spread in the time domain by a Gaussian low-pass filter (GLPF), which reduces the data flow rate down to 100 MB/s, because the fluorescence lifetime can be extracted exactly regardless of the IRF shape [9–12]. In our study, the data size of an AMD-based confocal fluorescence lifetime image with 200 × 200 pixels was only 6 MB. We verified that 6 MB of data was transferred from CPU memory to GPU memory within 1 ms. After transmitting the data to GPU memory, heavy data processing such as averaging, sub-pixel interpolation, and mean-time extraction was handled in a short time by many parallel processors.
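To make the transfer step concrete, the sketch below shows one way the ~6 MB frame could be staged in pinned (page-locked) host memory and copied to the GPU with cudaMemcpyAsync before kernel processing. It is a minimal, hypothetical illustration, not the authors' code; the buffer size, stream usage, and names are assumptions.

```cpp
// Hypothetical sketch: staging one digitized AMD-FLIM frame (~6 MB, per the text)
// in pinned host memory and copying it to GPU memory before parallel processing.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t frameBytes = 6u * 1024u * 1024u;  // ~6 MB for a 200 x 200 pixel frame

    // Pinned (page-locked) host memory enables full-speed DMA transfers over PCIe.
    short* hostFrame = nullptr;
    cudaHostAlloc(&hostFrame, frameBytes, cudaHostAllocDefault);

    short* devFrame = nullptr;
    cudaMalloc(&devFrame, frameBytes);

    // ... fill hostFrame from the digitizer's rolling buffer (omitted) ...

    // Asynchronous copy on a stream so the transfer can overlap the next acquisition.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemcpyAsync(devFrame, hostFrame, frameBytes, cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);  // frame is now resident in GPU memory

    printf("transferred %zu bytes to the GPU\n", frameBytes);

    cudaFree(devFrame);
    cudaFreeHost(hostFrame);
    cudaStreamDestroy(stream);
    return 0;
}
```

Pinned host memory and asynchronous copies are what make sub-millisecond transfers of this size practical on a modern PCIe link.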

In this paper, we demonstrate GPU-accelerated real-time confocal FLIM based on the AMD method. First, the hardware setup and software flow of our GPU-accelerated real-time AMD-FLIM system are described. Then, the processing time and lifetime measurement accuracy are analyzed using ideal fluorescence and IRF data sets created by Monte Carlo simulation (MCS). The ideal data set in host memory was transmitted to GPU memory, and data averaging, sub-pixel interpolation, and mean-time extraction were performed by parallel GPU processing. The GPU processing times for the extraction of fluorescence lifetime images were 17.5 ms and 319.3 ms for images of 200 × 200 pixels and 1,000 × 1,000 pixels, respectively. Finally, the total frame rate of our system was verified by observing maize vascular tissue.

2. Methods

2.1 Hardware configuration of GPU accelerated real-time confocal AMD-FLIM

The schematic diagram of the GPU-accelerated real-time confocal AMD-FLIM system is shown in Fig. 1. The optical measurement part of our system is almost identical to that used in our previous research [10]. In this study, a diode pulse laser with a center wavelength of 479 nm was used, with a pulse repetition rate, FWHM, and average power of 8 MHz, 30 ps, and 0.8 mW, respectively. The collimated pulsed laser beam was transmitted to a dichroic mirror (MD499, Thorlabs) after passing through an optical short-pass filter (SPF, FF01-498/SP-25, Semrock). The beam then passed through an x-y scanner (Cambridge Technology), consisting of a 4 kHz resonant x-scanner and a 500 Hz non-resonant y-scanner, a scan lens, a tube lens, a mirror, and an objective lens (40x, UPlanFL, Olympus) to illuminate the sample. The actual power delivered to the sample was about 0.2 mW. The fluorescence signal from the sample was transmitted through the dichroic mirror and then further filtered by an optical long-pass filter (LPF, FFLH0500, Thorlabs) with a cut-on wavelength of 500 nm. A multimode fiber (MMF) with a core diameter of 50 µm was used as the pinhole of the confocal system and was connected to a photomultiplier tube (PMT, Hamamatsu, H10720-01). The electric pulse signal from the PMT was acquired by a digitizer with an analog bandwidth of 125 MHz (National Instruments, PCI-5114) after passing through a lab-developed 10th-order Gaussian low-pass filter (GLPF) and an electric amplifier (Mini-Circuits, TB-409). The cutoff frequency of the GLPF was about 50 MHz at −60 dB. The sampling rate and sampling interval of the digitizer were 120 MS/s and 8.33 ns, respectively. The data processing system consisted of the digitizer, a data acquisition (DAQ) board (National Instruments, PCIE-6353), and a GPU (NVIDIA GeForce GTX TITAN Black) installed in an HP Z420 workstation. The software was developed in the Microsoft Visual Studio 2013 environment with MFC, the NI-DAQ API, the NI-SCOPE API, and OpenCV. The GPU was programmed using NVIDIA's CUDA (Compute Unified Device Architecture) technology [16].

Fig. 1 Schematic diagram of the GPU-accelerated real-time confocal AMD-FLIM.

In this experiment, the fluorescence signal was measured separately, after measurement of the IRF signal. To compensate for the zero-position offset between the two signals in the time domain, we applied the electrical referencing technique proposed in previous research [11]. A trigger signal from the laser source, after passing through another GLPF, was measured simultaneously by an extra channel (ch2) of the digitizer, while the fluorescence or IRF signal was measured by the other channel (ch1) of the same digitizer. First, the IRF signal and its electrical sync signal (the trigger signal from the laser source) were obtained by using a mirror as the sample. Then, GPU-accelerated real-time fluorescence lifetime imaging was performed by measuring the fluorescence and electrical sync signals. Here, the previously obtained mean times of the IRF signal and its electrical sync signal were used to calculate the fluorescence lifetime.

To facilitate clear imaging, perfect synchronization of the x-y scanner, pulse laser, and digitizer was implemented using a DAQ board with a digital trigger input and analog output ports, and a sync block composed of a D-FF (D flip-flop) digital circuit. First, the signal to operate the x-scan mirror was generated by the DAQ board; then, the sync signal of the x-scan driver and the trigger signal of the laser pulses were fed into the sync block to generate a trigger signal, as illustrated in Fig. 1. The sync block trigger was generated by ANDing the high level of the x-scan sync signal with the rising edge of the first subsequent laser pulse trigger. The trigger signal generated by the sync block was used to trigger data acquisition by the digitizer.

2.2 System sequence for GPU accelerated real-time confocal AMD-FLIM

The system sequence flowchart for data processing is presented in Fig. 2. Three major jobs are executed on three threads: the y-scan thread, the data sampling thread, and the CUDA processing thread. The sync block generated the 4 kHz trigger signal as soon as the 4 kHz resonant x-scan mirror was operated. The trigger signal was used both for data acquisition by the digitizer and to control the y-scan mirror. The data sampling thread acquired fluorescence and electrical sync signal data for every half period of the x-scan from the two channels of the digitizer. These data were then copied from the on-board memory of the digitizer to the rolling buffer in the host memory of the PC via the PCI interface. Here, the fluorescence data were copied to odd-numbered rolling buffers and the electrical sync data were copied to even-numbered rolling buffers. The data sampling thread also triggers the CUDA processing thread, which is then allocated the nth and (n + 1)th buffers for parallel algorithm processing; a sketch of this hand-off appears after Fig. 2.

Fig. 2 System sequence flowchart for the GPU-accelerated real-time confocal AMD-FLIM.
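The following is a hypothetical sketch of the producer/consumer hand-off between the data sampling thread and the CUDA processing thread. The buffer depth, sample counts, slot parity, and names are illustrative assumptions, not the authors' implementation; the digitizer fetch and the CUDA copy/launch are indicated only as comments.

```cpp
// Hypothetical sketch: a rolling buffer in host memory shared by the data sampling
// thread (producer) and the CUDA processing thread (consumer). Backpressure against
// slot overwrite is omitted for brevity; the slot parity used here is illustrative.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

constexpr int    kNumBuffers     = 8;      // rolling buffer depth (assumed)
constexpr int    kFramesToDemo   = 4;
constexpr size_t kSamplesPerLine = 15000;  // 1,000 pulses x 15 samples per x-scan half period

struct RollingBuffer {
    std::vector<std::vector<short>> slots{kNumBuffers, std::vector<short>(kSamplesPerLine)};
    int published = 0;                     // number of buffer pairs written so far
    std::mutex m;
    std::condition_variable cv;
};

// Data sampling thread: fetches one x-scan half period from both digitizer channels
// and writes the pair (fluorescence + electrical sync) into consecutive slots.
void samplingThread(RollingBuffer& rb) {
    for (int frame = 0; frame < kFramesToDemo; ++frame) {
        int fluoSlot = (2 * frame) % kNumBuffers;   // fluorescence data slot
        int syncSlot = fluoSlot + 1;                // paired electrical-sync data slot
        (void)syncSlot;
        // ... digitizer fetch into rb.slots[fluoSlot] and rb.slots[syncSlot] ...
        {
            std::lock_guard<std::mutex> lk(rb.m);
            ++rb.published;                         // publish the new pair
        }
        rb.cv.notify_one();                         // wake the CUDA processing thread
    }
}

// CUDA processing thread: waits for a published pair, then copies it to GPU memory
// and launches the AMD processing kernels (omitted here).
void cudaProcessingThread(RollingBuffer& rb) {
    for (int consumed = 0; consumed < kFramesToDemo; ++consumed) {
        std::unique_lock<std::mutex> lk(rb.m);
        rb.cv.wait(lk, [&] { return rb.published > consumed; });
        lk.unlock();
        int fluoSlot = (2 * consumed) % kNumBuffers;
        // cudaMemcpyAsync of rb.slots[fluoSlot] and rb.slots[fluoSlot + 1], kernel launch ...
        printf("processing buffer pair (%d, %d)\n", fluoSlot, fluoSlot + 1);
    }
}

int main() {
    RollingBuffer rb;
    std::thread producer(samplingThread, std::ref(rb));
    std::thread consumer(cudaProcessingThread, std::ref(rb));
    producer.join();
    consumer.join();
    return 0;
}
```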

2.3 Data processing flow in GPU core for real-time confocal FLIM

Figure 3 shows the processing flow in a GPU core. After the data copy is finished, each GPU core is in charge of a single pixel, because the processing of individual pixels is independent of one another. Therefore, the fluorescence lifetimes of all pixels can be extracted simultaneously.

Fig. 3 Process flow in GPU cores.

Since the scanning frequency was 4 kHz and the repetition rate of the laser pulses was 8 MHz, there were 1,000 fluorescence pulses in a single horizontal line of the image. Adjacent pulse signals could be averaged horizontally and vertically in order to increase the signal-to-noise ratio (SNR). Horizontal pulse averaging reduces the number of horizontal pixels, and vertical pulse averaging reduces the frame rate. After data averaging, sub-pixel interpolation was performed by spline interpolation to fill in 200 new data points between every two neighboring data points, so that the sampling interval of the data became 0.04 ns. Then, the mean times of the fluorescence pulses and electrical sync pulses for all pixels were calculated simultaneously with an 80 ns integration window and 4 iterations. An integration window of 80 ns was the optimum for obtaining more than 99.9% lifetime accuracy for lifetimes below 7 ns, and 4 iterations were found to be the optimum number for minimum lifetime error [12]. Finally, the fluorescence lifetime image was obtained by subtraction among the mean times of the fluorescence signal for each pixel, the IRF signal, and their electrical sync signals [11].
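As an illustration of this per-pixel parallelism, the hypothetical kernel sketch below assigns one thread to one pixel and estimates the mean delay by centroiding the interpolated trace inside an integration window that is re-centered over a few iterations. It is a simplified stand-in for the actual AMD processing chain (averaging and spline interpolation are assumed to have been done already), and all names, sizes, and launch parameters are assumptions.

```cpp
// Hypothetical per-pixel kernel sketch (assumed form, not the authors' implementation):
// one thread estimates the mean delay of one pixel's interpolated pulse trace by
// centroiding within an integration window and re-centering the window a few times.
#include <cuda_runtime.h>

__global__ void meanDelayKernel(const float* data,     // [numPixels x samplesPerPixel]
                                float* meanTime,       // [numPixels] mean delay output (ns)
                                int numPixels,
                                int samplesPerPixel,
                                float dt,              // sample spacing after interpolation (~0.04 ns)
                                float windowNs,        // integration window (80 ns in the text)
                                int iterations)        // re-centering iterations (4 in the text)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= numPixels) return;

    const float* trace = data + (size_t)p * samplesPerPixel;
    int   halfWin = (int)(0.5f * windowNs / dt);
    float center  = 0.5f * samplesPerPixel * dt;       // initial guess: middle of the trace

    for (int it = 0; it < iterations; ++it) {
        int   c   = (int)(center / dt);
        int   lo  = max(0, c - halfWin);
        int   hi  = min(samplesPerPixel - 1, c + halfWin);
        float sum = 0.0f, tSum = 0.0f;
        for (int i = lo; i <= hi; ++i) {               // centroid inside the window
            sum  += trace[i];
            tSum += trace[i] * (i * dt);
        }
        if (sum > 0.0f) center = tSum / sum;           // re-center the window on the centroid
    }
    // Downstream, the lifetime is <Te>(fluorescence) - <Te0>(IRF), per the AMD relation.
    meanTime[p] = center;
}

int main() {
    const int numPixels = 200 * 200;                   // one 200 x 200 image
    const int samples   = 3000;                        // ~15 raw points x 200-fold interpolation
    float *d_data, *d_mean;
    cudaMalloc(&d_data, (size_t)numPixels * samples * sizeof(float));
    cudaMalloc(&d_mean, numPixels * sizeof(float));
    cudaMemset(d_data, 0, (size_t)numPixels * samples * sizeof(float));  // dummy input

    int threads = 256;
    int blocks  = (numPixels + threads - 1) / threads;
    meanDelayKernel<<<blocks, threads>>>(d_data, d_mean, numPixels, samples, 0.04f, 80.0f, 4);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    cudaFree(d_mean);
    return 0;
}
```

A 200 × 200 image maps to 40,000 independent threads, which the GPU scheduler distributes across its multiprocessors, so no pixel has to wait for another.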

3. Results and discussion

3.1 Monte Carlo Simulation (MCS) to evaluate the algorithm and GPU processing time

In order to verify the GPU processing part of our system, ideal fluorescence and IRF data sets were generated by Monte Carlo simulation (MCS). In this simulation, the IRF was assumed to be an ideal Gaussian function with a FWHM of about 25 ns. In the first part of the MCS, fluorescence photons were generated via an ideal exponentially decaying probability function: exponentially decaying fluorescence signals with a specified average number of fluorescence photons were generated with a random number generator. Here, the initial data interval of a generated fluorescence signal was set to 1 ps. Second, convolution between the IRF and a generated fluorescence signal with discrete photons was performed to mimic the electrical waveform converted from an optical fluorescence signal by a detector and the Gaussian low-pass filter. Third, a discrete number of data points with a constant time interval of 8.33 ns were selected from the electrical waveform to simulate the data set sampled by a digitizer with a sampling frequency of 120 MHz. Each single fluorescence or IRF pulse signal was composed of 15 discrete data points (a simple sketch of this pulse-generation procedure is given after Table 1). In order to match the experimental conditions, a data set containing 6,000,000 data points was generated, split into 3,000,000 data points for the fluorescence signals acquired from channel 1 of the digitizer and the remainder for the IRF signals acquired from channel 2 of the digitizer. The 3,000,000 data points for the fluorescence signals comprised 15 discrete data points per fluorescence pulse, 1,000 horizontal fluorescence pulses, and 200 vertical lines. For the IRF data set, 200,000 identical ideal IRF pulses were used. From the 1D data set generated by the MCS in PC host memory, the fluorescence intensity and lifetime images were extracted by the GPU data processing flow described in Section 2.3. In this simulation, the electrical referencing technique was not considered, since the zero positions of the fluorescence and IRF signals in the time domain matched each other perfectly. Figure 4 shows the 200 × 200 pixel fluorescence intensity and lifetime images; here, five horizontal pulses were averaged. The MCS data set was generated with various fluorescence intensities and lifetimes for the different evaluations. Fluorescence lifetimes of 1, 1.5, 2, 2.5, and 3 ns were used in the exponentially decaying probability function to generate discrete fluorescence photon signals in groups of 40 lines. Within each fluorescence-lifetime area, the random number generator was set to distribute average fluorescence photon numbers of about 200, 400, 600, 800, and 1,000 per pixel every 8 lines. The fluorescence lifetime image was extracted by the GPU from the data set generated by the MCS. The algorithm was verified by calculating a figure of merit defined by F = √Navg · στ / τavg, where στ is the standard deviation of repeated fluorescence lifetime measurements, Navg is the average number of photons used, and τavg is the average fluorescence lifetime [9,10,12]. In all lifetime measurements, F ≥ 1; the closer the value is to unity, the better the performance of a FLIM method. The extracted F-values were approximately 1 for every region of interest from ①-1 to ⑤-5. Since only shot noise, and no system noise, was considered in this simulation, this confirms that our algorithm extracts fluorescence lifetimes accurately by the AMD method. Table 1 presents the simulation results for lifetimes of 1 ns and 3 ns.

Fig. 4 Fluorescence intensity and lifetime images extracted from the ideally generated fluorescence and IRF data set.

Table 1. Simulation results for lifetimes of 1 ns and 3 ns.
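For reference, the sketch below shows one way a single ideal pulse of such an MCS data set could be generated: exponential photon arrival times are histogrammed on a 1 ps grid, convolved with a Gaussian IRF, and then sampled at the 8.33 ns digitizer interval. It is a hypothetical illustration under the stated assumptions (lifetime, photon number, IRF placement), not the simulator used in this work.

```cpp
// Hypothetical host-side sketch (not the simulator used in this work): generate one
// ideal fluorescence pulse -- exponential photon arrival times on a 1 ps grid,
// convolved with a Gaussian IRF, then sampled at the 8.33 ns digitizer interval.
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const double tau      = 2.0;       // simulated fluorescence lifetime (ns), assumed
    const double fwhm     = 25.0;      // Gaussian IRF FWHM (ns), as in the text
    const double sigma    = fwhm / 2.3548;
    const double dtFine   = 0.001;     // 1 ps internal grid, as in the text
    const double dtSample = 8.333;     // 120 MS/s digitizer interval (ns)
    const int    nFine    = 125000;    // 125 ns pulse period on the 1 ps grid
    const int    nPhotons = 400;       // photons in this pixel (one of the tested values)
    const double irfPeak  = 40.0;      // place the IRF peak 40 ns into the record (assumed)

    std::mt19937 rng(42);
    std::exponential_distribution<double> decay(1.0 / tau);

    // Histogram of discrete photon arrival times on the fine grid.
    std::vector<double> photons(nFine, 0.0);
    for (int k = 0; k < nPhotons; ++k) {
        int bin = (int)(decay(rng) / dtFine);
        if (bin < nFine) photons[bin] += 1.0;
    }

    // Convolve with the Gaussian IRF (direct sum for clarity; slow but simple).
    std::vector<double> waveform(nFine, 0.0);
    for (int i = 0; i < nFine; ++i) {
        if (photons[i] == 0.0) continue;
        for (int j = 0; j < nFine; ++j) {
            double t = (j - i) * dtFine - irfPeak;
            waveform[j] += photons[i] * std::exp(-0.5 * t * t / (sigma * sigma));
        }
    }

    // Pick 15 points at the digitizer interval, as described for the MCS data set.
    for (int s = 0; s < 15; ++s)
        printf("sample %2d: %f\n", s, waveform[(size_t)(s * dtSample / dtFine)]);
    return 0;
}
```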

3.2 Analysis of GPU processing time

The NVIDIA Visual Profiler was used to evaluate the GPU processing time, as shown in Fig. 5. For AMD-based confocal fluorescence lifetime imaging with 200 × 200 pixels, copying the fluorescence and IRF data from CPU memory to GPU memory took 0.98 ms. The parallel data processing to extract a lifetime image took 15.109 ms. The data extracted by the GPU were moved back to CPU memory within 0.026 ms. In total, the GPU processing time was 17.519 ms, so that more than 50 frames per second (fps) could be achieved for confocal AMD-FLIM images of 200 × 200 pixels.

Fig. 5 NVIDIA Visual Profiler evaluation of GPU processing time (results for 200 × 200 pixel fluorescence lifetime imaging with the AMD method).
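The same three stages can also be timed directly with CUDA events, without the profiler. The sketch below uses a placeholder kernel and dummy buffers purely to show the measurement pattern; it is not the actual AMD processing code.

```cpp
// Hypothetical sketch: timing the host-to-device copy, the processing kernel, and the
// device-to-host copy with CUDA events, mirroring the three stages reported above.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void processFrame(const short* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = (float)in[i];   // stand-in for the real AMD processing
}

int main() {
    const int n = 3000000;                          // one simulated frame of samples
    short *h_in, *d_in; float *h_out, *d_out;
    cudaHostAlloc(&h_in,  n * sizeof(short), cudaHostAllocDefault);
    cudaHostAlloc(&h_out, n * sizeof(float), cudaHostAllocDefault);
    cudaMalloc(&d_in,  n * sizeof(short));
    cudaMalloc(&d_out, n * sizeof(float));

    cudaEvent_t e0, e1, e2, e3;
    cudaEventCreate(&e0); cudaEventCreate(&e1); cudaEventCreate(&e2); cudaEventCreate(&e3);

    cudaEventRecord(e0);
    cudaMemcpy(d_in, h_in, n * sizeof(short), cudaMemcpyHostToDevice);   // H2D copy
    cudaEventRecord(e1);
    processFrame<<<(n + 255) / 256, 256>>>(d_in, d_out, n);              // kernel stage
    cudaEventRecord(e2);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost); // D2H copy
    cudaEventRecord(e3);
    cudaEventSynchronize(e3);

    float tH2D, tKernel, tD2H;
    cudaEventElapsedTime(&tH2D, e0, e1);
    cudaEventElapsedTime(&tKernel, e1, e2);
    cudaEventElapsedTime(&tD2H, e2, e3);
    printf("H2D %.3f ms, kernel %.3f ms, D2H %.3f ms\n", tH2D, tKernel, tD2H);

    cudaFree(d_in); cudaFree(d_out); cudaFreeHost(h_in); cudaFreeHost(h_out);
    return 0;
}
```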

To compare against the GPU-accelerated processing, the AMD-based confocal fluorescence lifetime image was also extracted from the ideally generated data set by a single CPU core (Xeon, 3.6 GHz) using the same algorithm. Figure 6 shows the Visual Studio profiler evaluation of the CPU processing time. Under identical conditions, a CPU processing time of 3,350 ms was needed to process a 200 × 200 image. Thus, the GPU acceleration scheme was 191 times faster than single-core CPU processing.

Fig. 6 Visual Studio profiler evaluation of CPU processing time (results for 200 × 200 pixel fluorescence lifetime imaging with the AMD method).

We further evaluated the GPU and CPU processing times for image sizes of 100 × 100, 200 × 200, 500 × 500, 800 × 800, and 1,000 × 1,000 pixels, as summarized in Table 2. The CPU and GPU processing times shown in Table 2 were obtained from 3 test runs. In all cases, the GPU-accelerated data processing was considerably faster than single-core CPU processing, and the gap widened as the number of pixels increased: the GPU data processing was 149 and 262 times faster than that of the CPU for 100 × 100 pixels and 1,000 × 1,000 pixels, respectively. For 100, 200, 500, 800, and 1,000 vertical lines, the physical scanning times for a single frame were 25 ms, 50 ms, 125 ms, 200 ms, and 250 ms, respectively, when the 4 kHz resonant scanner was used for the horizontal scan. Even though the processing time for 1,000 × 1,000 pixels was slightly longer than the physical scanning time, it could be improved sufficiently by adding extra GPU processing units, so that GPU-accelerated data processing would not be the limiting factor for realizing real-time imaging.

Table 2. Processing time of CPU (single core) and GPU.

3.3 GPU accelerated real-time confocal fluorescence lifetime imaging of maize vascular tissue

In order to demonstrate our technique, we measured vascular bundles of maize (Zea mays) vascular tissue, as shown in Fig. 7. Figure 7(a) shows the fluorescence intensity image extracted from the intensity summation of the analog fluorescence pulse signals, and Fig. 7(b) shows the AMD-based fluorescence lifetime image (see Visualization 1). Here, the image size was 200 × 200 pixels, with 5 horizontal fluorescence pulses averaged and 200 sub-pixel interpolations between adjacent data points performed via spline interpolation. The integration window for the calculation of the mean time was 80 ns and 4 iterations were performed. Before measurement of the fluorescence signal and its electrical sync data, the instrument response function (IRF) and its electrical sync data were measured, and their mean times were used to calculate the fluorescence lifetime. The field of view (FOV) was 176 µm × 132 µm and an excitation power of 0.2 mW was delivered to the sample. In this experiment, we achieved a frame rate of about 13 fps for real-time confocal FLIM with 200 × 200 pixels. Here, the data fetching time was the limiting factor, because the data in the digitizer memory were fetched to PC memory via a low-speed PCI interface with 133 MB/s bandwidth. However, this could easily be resolved by replacing the digitizer with one compatible with the PCI Express interface, whose bandwidth is more than 500 MB/s. Photobleaching and intensity variation occurred in the fluorescence intensity image, while the fluorescence lifetime information was uniformly distributed, as shown in Fig. 7. Comparison with the fluorescence intensity image demonstrates that our technique can clearly distinguish the xylem vessels, xylem tracheids, and phloem sieve tubes in the maize vascular tissue [17]. Using the same experimental conditions as for the maize vascular tissue, we measured Alexa Fluor 488, whose fluorescence lifetime has been reported to be 4 ns, as shown in Fig. 8 (see Visualization 2). The sample was a drop of Alexa Fluor 488 diluted in phosphate-buffered saline (PBS) on a glass slide. By comparison with the fluorescence intensity image, the fluorescence lifetime was found to be uniformly distributed, with an average value of 4 ns.

Fig. 7 GPU-accelerated real-time fluorescence lifetime imaging of maize vascular tissue (200 × 200 pixels): (a) fluorescence intensity image; (b) fluorescence lifetime image (see Visualization 1).

Fig. 8 GPU-accelerated real-time fluorescence lifetime imaging of Alexa Fluor 488 (200 × 200 pixels): (a) fluorescence intensity image; (b) fluorescence lifetime image (see Visualization 2).

4. Conclusion

In this study, we have demonstrated GPU-accelerated real-time confocal FLIM based on the AMD method. Clear imaging was achieved through perfect synchronization of the x-y scanner, pulse laser, and digitizer using a DAQ board and a D-FF digital circuit sync block. Data from the digitizer were copied to a rolling buffer in host memory so that real-time data processing was possible without any data loss. The GPU processing algorithm was verified with ideal fluorescence and IRF data sets covering various fluorescence lifetimes and photon numbers. We also compared GPU and CPU processing times on the ideal data set. For confocal AMD-FLIM with a 4 kHz resonant scanner, the GPU processing time was shorter than the physical scanning time for image sizes up to 800 × 800 pixels, and was more than 149 times shorter than the single-core CPU processing time. The addition of extra GPU units would reduce the processing time further, so that GPU-accelerated data processing is not a limiting factor for realizing real-time confocal AMD-FLIM at higher resolutions. We demonstrated the total frame rate of our system by observing maize vascular tissue, achieving about 13 fps for an image of 200 × 200 pixels. The frame rate was limited by the slow data fetching from the digitizer to host memory, but this can easily be resolved by using a digitizer compatible with the PCI Express interface. This system can be utilized for observing dynamic biological reactions, medical diagnosis, and real-time industrial inspection.

Funding

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2015R1C1A1A02036475).

Acknowledgment

This research was also supported by the Ministry of Trade, Industry & Energy (MOTIE, Korea) under the Industrial Technology Innovation Program (No. 10063408).

References and links

1. H. C. Gerritsen, A. Draaijer, D. J. van den Heuvel, and A. V. Agronskaia, "Fluorescence lifetime imaging in scanning microscopy," in Handbook of Biological Confocal Microscopy, 3rd ed., J. B. Pawley, ed. (Springer, 2006).

2. J. Neefjes and N. P. Dantuma, "Fluorescent probes for proteolysis: tools for drug discovery," Nat. Rev. Drug Discov. 3(1), 58–69 (2004).

3. A. Esposito, T. Tiffert, J. M. A. Mauritz, S. Schlachter, L. H. Bannister, C. F. Kaminski, and V. L. Lew, "FRET imaging of hemoglobin concentration in Plasmodium falciparum-infected red cells," PLoS One 3(11), e3780 (2008).

4. H. Wallrabe and A. Periasamy, "Imaging protein molecules using FRET and FLIM microscopy," Curr. Opin. Biotechnol. 16(1), 19–27 (2005).

5. J. McGinty, N. P. Galletly, C. Dunsby, I. Munro, D. S. Elson, J. Requejo-Isidro, P. Cohen, R. Ahmad, A. Forsyth, A. V. Thillainayagam, M. A. Neil, P. M. French, and G. W. Stamp, "Wide-field fluorescence lifetime imaging of cancer," Biomed. Opt. Express 1(2), 627–640 (2010).

6. W. Becker, A. Bergmann, M. A. Hink, K. König, K. Benndorf, and C. Biskup, "Fluorescence lifetime imaging by time-correlated single-photon counting," Microsc. Res. Tech. 63(1), 58–66 (2004).

7. T. H. Chia, A. Williamson, D. D. Spencer, and M. J. Levene, "Multiphoton fluorescence lifetime imaging of intrinsic fluorescence in human and rat brain tissue reveals spatially distinct NADH binding," Opt. Express 16(6), 4237–4249 (2008).

8. W. Becker, A. Bergmann, H. Wabnitz, D. Grosenick, and A. Liebert, "High count rate multichannel TCSPC for optical tomography," Proc. SPIE 4431, 249–254 (2001).

9. S. Moon, Y. Won, and D. Y. Kim, "Analog mean-delay method for high-speed fluorescence lifetime measurement," Opt. Express 17(4), 2834–2849 (2009).

10. Y. Won, S. Moon, W. Yang, D. Kim, W. T. Han, and D. Y. Kim, "High-speed confocal fluorescence lifetime imaging microscopy (FLIM) with the analog mean delay (AMD) method," Opt. Express 19(4), 3396–3405 (2011).

11. Y. J. Won, S. Moon, W. T. Han, and D. Y. Kim, "Referencing techniques for the analog mean-delay method in fluorescence lifetime imaging," J. Opt. Soc. Am. A 27(11), 2402–2410 (2010).

12. Y. J. Won, W. T. Han, and D. Y. Kim, "Precision and accuracy of the analog mean-delay method for high-speed fluorescence lifetime measurement," J. Opt. Soc. Am. A 28(10), 2026–2032 (2011).

13. K. Zhang and J. U. Kang, "Graphics processing unit-based ultrahigh speed real-time Fourier domain optical coherence tomography," IEEE J. Sel. Top. Quantum Electron. 18(4), 1270–1279 (2012).

14. K. K. C. Lee, A. Mariampillai, J. X. Z. Yu, D. W. Cadotte, B. C. Wilson, B. A. Standish, and V. X. D. Yang, "Real-time speckle variance swept-source optical coherence tomography using a graphics processing unit," Biomed. Opt. Express 3(7), 1557–1564 (2012).

15. J. Xu, K. Wong, Y. Jian, and M. V. Sarunic, "Real-time acquisition and display of flow contrast using speckle variance optical coherence tomography in a graphics processing unit," J. Biomed. Opt. 19(2), 026001 (2014).

16. NVIDIA, "NVIDIA CUDA C Programming Guide, Version 7.0" (2015).

17. J. D. Mauseth, Plant Anatomy (The Blackburn Press, 2008).

Supplementary Material (2)

Visualization 1: MP4 (9532 KB). AMD-based real-time fluorescence lifetime imaging of maize vascular tissue.
Visualization 2: MP4 (9786 KB). AMD-based real-time fluorescence lifetime imaging of Alexa Fluor 488.
