Optica Publishing Group

Computational ghost imaging using a field-programmable gate array

Open Access

Abstract

Computational ghost imaging is a promising technique for single-pixel imaging because it is robust to disturbance and can be operated over broad wavelength bands, unlike common cameras. However, one disadvantage of this method is its long calculation time for image reconstruction. In this paper, we designed a dedicated calculation circuit that accelerates computational ghost imaging. We implemented this circuit on a field-programmable gate array, which reduced the calculation time compared with a CPU. The dedicated circuit reconstructs images at a frame rate of 300 Hz.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Ghost imaging (GI) is an imaging method that has been intensively studied in recent years [1–4]. Unlike conventional imaging with charge-coupled devices, GI uses a single-pixel device as the light-receiving element. In GI, the object is illuminated with light having spatially random patterns; the light that passes through the object (or is reflected by it), called the object light, is then collected by a lens. The intensity of the object light is detected by a single-pixel element. Finally, the object image is reconstructed by calculating the correlation between the obtained object light intensities and the random illumination patterns used to obtain them. Researchers have proposed GI-based methods that calculate the light-intensity distribution of the random illumination patterns on a computer [5,6]; this is called computational GI.

Computational GI is advantageous for measurements over broad wavelength bands; it is robust to disturbance and simplifies the optical system. These characteristics are expected to find applications in a wide range of fields such as bioimaging [7], remote sensing [8], and encryption [9]; computational GI is also helpful for three-dimensional measurements [10]. However, the method also has disadvantages: the quality of the reconstructed image is poor, the measurement time is long, and the reconstruction calculation is time consuming. Research has been conducted on improving the image quality by using modified correlation calculations [11–13], compressive sensing [14], and deep learning [15,16], and on shortening the measurement time [7,17,18].

Field-programmable gate array (FPGA)-based approaches have succeeded in accelerating various calculations [19,20]. To accelerate the reconstruction calculation, we designed a calculation circuit for computational GI that computes the pixels of the reconstructed image in parallel, and we implemented it in an FPGA. The object light intensities obtained from the optical system were input to the FPGA, and a reconstructed image was obtained by calculating the correlation. The reconstruction time in the FPGA for images with 32 × 32 pixels was 3 ms, which implies that the circuit can reconstruct images at a frame rate of 300 Hz or more.

In Section 2, we describe the principle of computational GI and its hardware implementation. In Section 3, we describe the design of the calculation circuit. In Section 4, we present the results obtained by implementing the proposed circuit in the FPGA, compare the calculation speeds, and evaluate the quality of the reconstructed images. In Section 5, we summarize this research.

2. Hardware implementation of computational ghost imaging

Figure 1 presents a schematic of the computational GI system used in this research. A digital micromirror device (DMD) projector illuminated the object with random illumination patterns, and the time-series data of the object light intensities were obtained with a photodetector and an analog-to-digital (AD) converter. The time-series data were sent to the memory of the FPGA, where parallel processing of the reconstruction calculation improved the speed. A universal serial bus (USB) interface was used for communication between a personal computer and the FPGA board. Although ideally the output of the AD converter would be input directly to the FPGA, to simplify the circuit design the current system first sends the AD-converter output to the computer via USB, after which the data are transferred to the FPGA.

Fig. 1. Optical system with the FPGA for the computational ghost imaging.

After the random illumination pattern passed through the object, the object light was collected by the lens and detected by the photodetector. The detected object light intensity ${S_i}$ is given as

$${S_i} = \int\!\!\!\int {{I_i}(x,y)T(x,y)dxdy} ,$$
where ${I_i}({x,y} )$ is the distribution of the random illumination pattern, and $T({x,y} )$ is the transmittance of the object. The intensity of the random illumination pattern ${R_i}$ is given as
$${R_i} = \int\!\!\!\int {{I_i}(x,y)dxdy}.$$
The following formula, called differential GI (DGI) [11–13], was used for the reconstruction:
$$\langle{{O_i}(x,y)} \rangle = \langle{{S_i}{I_i}(x,y)} \rangle - \frac{{\langle{{S_i}} \rangle }}{{\langle{{R_i}} \rangle }}\langle{{R_i}{I_i}(x,y)} \rangle ,$$
where $\langle{O_i}({x,y} )\rangle$ represents the reconstructed image, and $\langle\cdots\rangle$ represents the ensemble average. In Eq. (3), $\langle{S_i}{I_i}({x,y})\rangle$ and $\langle{R_i}{I_i}({x,y})\rangle$ require $n \times x \times y$ multiply-accumulate operations, where n is the number of random illumination patterns, and $x \times y$ is the number of pixels. These operations are the most time consuming in DGI. $\langle{R_i}\rangle$ and $\langle{R_i}{I_i}({x,y})\rangle$ do not depend on the object; therefore, they can be calculated in advance. Instead of using central processing units (CPUs), we designed a dedicated circuit to accelerate the computation of Eq. (3).
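The DGI reconstruction of Eqs. (1)–(3) can be sketched in a few lines of NumPy; the object, image size, pattern count, and seed below are illustrative assumptions, not the parameters of our experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, w = 4096, 32, 32                 # illustrative pattern count and image size

# Hypothetical transmittance object T(x, y): a bright square on a dark field.
T = np.zeros((h, w))
T[8:24, 8:24] = 1.0

I = rng.integers(0, 2, size=(n, h, w)).astype(float)  # binary patterns I_i
S = (I * T).sum(axis=(1, 2))                          # bucket signals, Eq. (1)
R = I.sum(axis=(1, 2))                                # pattern intensities, Eq. (2)

# Differential GI, Eq. (3): <S_i I_i> - (<S_i>/<R_i>) <R_i I_i>
O = (S[:, None, None] * I).mean(axis=0) \
    - (S.mean() / R.mean()) * (R[:, None, None] * I).mean(axis=0)
```

With a few thousand patterns, the pixels inside the square come out brighter than the background, which is the correlation behavior that the dedicated circuit computes in hardware.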

We compared the image quality obtained by using the original computational GI [5] and the DGI under the same conditions. The reconstructed images are shown in Fig. 2: Fig. 2(a) is the image reconstructed by the computational GI, and Fig. 2(b) is the image reconstructed by the DGI. We adopted the DGI for the hardware implementation because its image quality was clearly better than that of the original computational GI.

Fig. 2. Comparison of the images obtained by using (a) computational GI and (b) DGI. The number of pixels is $64\times 64$ and the number of random illumination patterns is 16,384.

In Eq. (3), the division by $\langle{R_i}\rangle$ is a bottleneck in the hardware implementation. To simplify the hardware implementation, we reformulate Eq. (3) as follows:

$$\langle{{R_i}} \rangle \langle{{O_i}(x,y)} \rangle = \langle{{R_i}} \rangle \cdot \langle{{S_i}{I_i}(x,y)} \rangle - \langle{{S_i}} \rangle \cdot \langle{{R_i}{I_i}(x,y)} \rangle .$$
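Because $\langle{R_i}\rangle$ is a positive constant, Eq. (4) is simply Eq. (3) scaled by it, so both yield the same image up to a constant factor while avoiding a hardware divider. A self-contained NumPy check with hypothetical random data illustrates the equivalence:

```python
import numpy as np

rng = np.random.default_rng(1)
n, h, w = 512, 8, 8                                   # illustrative sizes
I = rng.integers(0, 2, size=(n, h, w)).astype(float)  # binary patterns I_i
S = rng.random(n)                                     # bucket signals (arbitrary here)
R = I.sum(axis=(1, 2))                                # pattern intensities R_i

SI = (S[:, None, None] * I).mean(axis=0)              # <S_i I_i(x, y)>
RI = (R[:, None, None] * I).mean(axis=0)              # <R_i I_i(x, y)>

O_div = SI - (S.mean() / R.mean()) * RI               # Eq. (3), needs a divider
O_mul = R.mean() * SI - S.mean() * RI                 # Eq. (4), multipliers only

# Eq. (4) is exactly Eq. (3) scaled by the constant <R_i>.
assert np.allclose(O_mul, R.mean() * O_div)
```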
The dedicated circuit generates random illumination patterns identical to the patterns displayed on the DMD projector. We examined three pseudo-random number generators for producing the patterns: linear congruential generators (LCGs), the Mersenne Twister (MT), and the maximum-length sequence (hereinafter "M-sequence"). The reconstructed images generated by each method are shown in Fig. 3: Figs. 3(a), 3(b), and 3(c) are the reconstructed images obtained by using LCGs, MT, and the M-sequence, respectively. There were almost no differences in image quality. In terms of the hardware implementation, we selected the M-sequence as the pseudo-random number generator.

Fig. 3. Comparison of the reconstructed images using LCGs, MT, and the M-sequence. The number of pixels is $64\times 64$ and the number of random illumination patterns is 16,384.

3. Designing the calculation circuit

The dedicated circuit consists of three parts: a receiver unit, a calculation unit, and a transmitter unit. The receiver unit and the transmitter unit are USB transmission circuits between the host computer and the FPGA: the receiver unit receives the time-series data of the AD converter, and the transmitter unit sends the reconstructed images to the computer. The calculation unit reconstructs images with $32 \times 32$ pixels. We used a Xilinx Artix-7 XC7A100T-2 as the FPGA, and the dedicated circuit operated at 100 MHz. The input data were the object light intensities obtained by the AD converter; the output data were the reconstructed image.

The schematic of the calculation unit is shown in Fig. 4. All arithmetic operations in the calculation circuit use fixed-point numbers. Figures 4 to 6 contain several sets of three numbers in parentheses; the first, second, and third numbers represent the sign bit and the bit widths of the integer and fractional parts of the fixed-point number, respectively.
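The (sign, integer, fraction) triplets correspond to an ordinary signed fixed-point format. The quantizer sketch below illustrates this encoding; the saturation-on-overflow behavior is our assumption for the illustration, since the circuit's exact overflow handling is not specified here.

```python
def to_fixed(x: float, int_bits: int, frac_bits: int) -> int:
    """Quantize x to a signed fixed-point integer with the given
    integer/fraction bit widths, saturating on overflow (assumed)."""
    scale = 1 << frac_bits
    lo = -(1 << (int_bits + frac_bits))          # most negative code
    hi = (1 << (int_bits + frac_bits)) - 1       # most positive code
    return max(lo, min(hi, round(x * scale)))

def from_fixed(q: int, frac_bits: int) -> float:
    """Convert a fixed-point code back to a real value."""
    return q / (1 << frac_bits)
```

For example, with 3 integer bits and 2 fraction bits, 1.5 is stored as the code 6, and values beyond the representable range saturate.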

Fig. 4. Schematic of the calculation unit.

When all the object light intensities had been received, the calculation unit started by calculating the average $\langle{S_i}\rangle$. Then, the average $\langle{S_i}{I_i}({x,y})\rangle$ was calculated by the parallel calculator from the object light intensities ${S_i}$ saved in memory and the random illumination pattern ${I_i}({x,y})$ generated by the random number generator. The calculated $\langle{S_i}{I_i}({x,y})\rangle$ was saved in a random access memory (RAM). Subsequently, $\langle{R_i}\rangle\langle{O_i}({x,y})\rangle$ was calculated from $\langle{S_i}{I_i}({x,y})\rangle$, $\langle{S_i}\rangle$, $\langle{R_i}{I_i}({x,y})\rangle$, and $\langle{R_i}\rangle$. $\langle{R_i}{I_i}({x,y})\rangle$ and $\langle{R_i}\rangle$ were pre-calculated on the host computer and stored in a table and registers, respectively. Finally, the calculation unit sent $\langle{R_i}\rangle\langle{O_i}({x,y})\rangle$ to the transmitter unit. Note that the $\langle{R_i}\rangle$ factor on the left side of Eq. (4) can be omitted because it is a constant scale factor.

The details of the parallel calculator unit are shown in Fig. 5. This unit can simultaneously calculate 64 pixels in the reconstructed image because 64 calculation modules were operated in parallel. Figure 6 shows the details of the calculation module. The AND gates can be considered as 1-bit multipliers from the truth table. The 64 calculation modules process two lines of the reconstructed image (32 × 32 pixels); subsequently, they process the next two lines. All the calculated values were saved in the RAM shown in Fig. 4 via the multiplexer of Fig. 5.
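The 1-bit multiplication performed by the AND gates is easy to mimic in software: for a binary pattern bit, the product of the bucket value and the bit is the bucket value where the bit is 1 and zero elsewhere. A sketch of one accumulation step across 64 hypothetical parallel modules:

```python
import numpy as np

def accumulate_step(acc: np.ndarray, s_i: float, bits: np.ndarray) -> np.ndarray:
    """One clock step of a 64-module parallel accumulator: each module ANDs
    its pattern bit with the bucket value s_i (a 1-bit multiply) and adds it."""
    return acc + np.where(bits == 1, s_i, 0.0)

acc = np.zeros(64)                 # 64 pixel accumulators
bits = np.arange(64) % 2           # illustrative pattern bits 0, 1, 0, 1, ...
acc = accumulate_step(acc, 5.0, bits)
```

After this step, only the accumulators whose pattern bit was 1 hold the bucket value, exactly as the AND-gate truth table dictates.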

Fig. 5. Schematic of parallel calculator unit.

Fig. 6. Schematic of the calculation module.

The random pattern generator using the M-sequence is shown in Fig. 7. The boxes (called taps) with the notation M(s) were implemented by flip-flops; s indicates the index of the flip-flops. In this research, we generated a binary random number sequence by using a linear feedback shift register (LFSR). The feedback positions of the LFSR were determined by a maximal-length polynomial [21]. The generator needed to produce 64-bit random numbers in parallel; therefore, the register shifts its contents by 64 bits per clock cycle.
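For illustration, a maximal-length LFSR can be written in a few lines of Python. The sketch below uses a 16-bit Galois-form register with tap mask 0xB400 (a known maximal-length polynomial) rather than the 64-bit Fibonacci-form register of Fig. 7, so it shows the principle, not our exact generator:

```python
def lfsr16_step(state: int) -> int:
    """One step of a 16-bit Galois LFSR with taps 16, 14, 13, 11
    (mask 0xB400), a maximal-length polynomial: period 2**16 - 1."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state

def random_bits(seed: int, count: int) -> list[int]:
    """Generate `count` pseudo-random bits for binary illumination patterns."""
    state, bits = seed, []
    for _ in range(count):
        state = lfsr16_step(state)
        bits.append(state & 1)
    return bits
```

A maximal-length register visits every nonzero state exactly once before repeating, which is why the all-zero state must be avoided as a seed.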

Fig. 7. M-sequence with a 64-bit linear feedback shift register.

4. Results

In this study, the calculation time for image reconstruction using a CPU was compared with that using the FPGA. The number of random illumination patterns was 16,384. The calculation times of the FPGA were compared for 16 and 64 calculation modules; the transmission time between the FPGA and the host computer was not included. For the computing environment, we used an Intel Core i5 4690 (clock frequency 3.50 GHz) as the CPU, 8.0 GB of memory, Microsoft Windows 10 Education as the operating system, and Microsoft Visual Studio C++ 2015 as the compiler. The calculation times for the various devices are given in Table 1.

From Table 1, it is clear that the calculation using the FPGA was faster than that using the CPU. In addition, as the number of parallel modules was increased, the calculation speed improved. With 64 calculation modules, the dedicated circuit could calculate the reconstructed image in 3 ms; in other words, the circuit reconstructed images at a frame rate of over 300 Hz. Thus, the parallelization was effective. In Table 2, we show the FPGA resource utilization. In the table, "LUT" denotes the look-up tables for implementing logic circuits, "LUTRAM" denotes LUTs used as RAM, "FF" denotes flip-flops, "BLOCKRAM" denotes the dedicated RAMs in the FPGA chip, and "DSP" denotes the dedicated multipliers.

Table 1. Calculation times

Table 2. Resource utilization

A few main advantages of using FPGAs are as follows:

  • (a) The object light intensity of the AD converter can be directly received by the FPGA without going through any CPU or operating systems;
  • (b) The reconstruction calculation can be performed without CPUs; and
  • (c) The power consumption is low.
In particular, the first advantage becomes very important in applications that require precise timing control, such as cytometry [7]. In such applications, it is necessary to accurately control the timing (latency) from the reception of the input signals to the image reconstruction. It is very difficult for CPUs and GPUs to control this latency because their calculation paths are complex and are managed by operating systems. In contrast, FPGAs can control the latency accurately and easily.

We evaluated the image quality of the reconstructed images obtained by the CPU and the FPGA in numerical simulations. The calculations in the FPGA used fixed-point numbers; the calculations in the CPU used floating-point numbers. The reconstructed images are shown in Fig. 8: Fig. 8(a) is the original image, Fig. 8(b) is the reconstructed image obtained by the FPGA, and Fig. 8(c) is the reconstructed image obtained by the CPU. The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) were used for evaluating the image quality, as shown in Table 3. The qualitative and quantitative evaluations show that there is almost no difference between the image qualities.
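For reference, the PSNR values in Table 3 follow the standard definition sketched below; the peak value of 255 is an assumption for 8-bit images.

```python
import numpy as np

def psnr(ref: np.ndarray, img: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

Higher PSNR means a smaller mean-squared error relative to the peak intensity; identical images give an infinite PSNR.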

Fig. 8. Reconstructed images obtained by the CPU and FPGA.

Table 3. Numerical evaluation of image quality

Reconstructed images of the three objects using an actual optical system are shown in Fig. 9. We confirmed that the reconstructed images could be obtained by the FPGA.

Fig. 9. Original objects and reconstructed images using an actual optical system.

Figure 10 shows reconstructed images for 100, 500, and 1000 random illumination patterns; Fig. 10(a) shows the original images (rectangle and cameraman). The calculation times using the FPGA were 16 µs, 80 µs, and 160 µs for 100, 500, and 1000 random illumination patterns, respectively.

Fig. 10. Reconstructed images for (b) 100, (c) 500, and (d) 1000 random illumination patterns. (a) shows the original images.

In Table 4, we evaluate the FPGA resource utilization when increasing the number of pixels. With a larger FPGA (Xilinx Virtex-7 XC7VX485T-2, FFG1761 package), we can implement a circuit that reconstructs images with 256 × 256 pixels. Figure 11 shows reconstructed images from numerical simulation for 32 × 32 to 256 × 256 pixels; the number of random illumination patterns is 16,384, and the simulation used the same numerical precision as the circuit. Note that the number of calculation modules was 16 owing to the design of the circuit. The calculation time using the FPGA can be estimated from the number of pixels, the clock frequency, the number of measurements, and the number of calculation modules. For 256 × 256 pixels in Fig. 11(e), the calculation time is 671 ms (= 256 × 256 pixels × 10 ns (100 MHz clock) × 16,384 random illumination patterns / 16 calculation modules).
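This timing model can be checked directly; the same formula also reproduces the roughly 3 ms figure reported for 32 × 32 pixels with 64 modules:

```python
def reconstruction_time(pixels: int, patterns: int, modules: int,
                        clock_hz: float = 100e6) -> float:
    """Estimated FPGA reconstruction time in seconds:
    pixels * patterns * clock period / parallel modules."""
    return pixels * patterns / (clock_hz * modules)

t_large = reconstruction_time(256 * 256, 16_384, 16)   # about 0.671 s
t_small = reconstruction_time(32 * 32, 16_384, 64)     # under 3 ms
```

The model assumes one pixel-update per module per clock cycle and ignores pipeline fill and transfer overheads, which is why the measured 32 × 32 figure is slightly larger than the estimate.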

Table 4. Resource utilization

Fig. 11. Original objects and reconstructed images for each number of the pixels.

In the current system (Fig. 1), the FPGA does not directly output the random patterns to the DMD and does not directly receive the time-series data from the AD converter; both transfers are performed via the computer. Therefore, the total processing time is long. The display time of one random pattern using the OpenCV library is 100 ms, and the acquisition time of one measurement by the AD converter is 7 ms; therefore, the total display and acquisition time for 16,384 measurements is about 1,700 s. After all the measurements, the time-series data of the AD converter are transferred to the FPGA, which then calculates the reconstructed image in 3 ms.

To improve the total processing time, we will develop the system shown in Fig. 12 in our next work. All random patterns are preset in the DMD board, and the preset patterns can be switched by an external signal from the FPGA; the FPGA directly receives the output of the AD converter. This system will dramatically reduce the display and acquisition times of the current system.

Fig. 12. Our next system to improve the total processing time.

5. Conclusion

In this research, we designed a dedicated circuit to reduce the time taken for image reconstruction in computational GI. The dedicated circuit could reconstruct images at a frame rate of over 300 Hz. The image quality of the reconstructed images obtained by the FPGA was almost the same as that obtained by the CPU. We also confirmed that the FPGA could obtain reconstructed images in an actual optical system. The FPGA used in this research was small in circuit scale; larger reconstructed images could be obtained at higher speeds by using large-scale FPGAs. In this research, we used random illumination patterns. The image quality is expected to improve if the Fourier, Hadamard, or wavelet bases are used for illumination [17,18,22]. In the future, we plan to improve our dedicated circuit by using these bases.

References

1. T. B. Pittman, Y. H. Shih, D. V. Strekalov, and A. V. Sergienko, “Optical imaging by means of two-photon quantum entanglement,” Phys. Rev. A 52(5), R3429–R3432 (1995).

2. A. Gatti, E. Brambilla, M. Bache, and L. A. Lugiato, “Ghost imaging with thermal light: Comparing entanglement and classical correlation,” Phys. Rev. Lett. 93(9), 093602 (2004).

3. A. Gatti, E. Brambilla, M. Bache, and L. A. Lugiato, “Correlated imaging, quantum and classical,” Phys. Rev. A 70(1), 013802 (2004).

4. F. Ferri, D. Magatti, A. Gatti, M. Bache, E. Brambilla, and L. A. Lugiato, “High-resolution ghost image and ghost diffraction experiments with thermal light,” Phys. Rev. Lett. 94(18), 183602 (2005).

5. J. H. Shapiro, “Computational ghost imaging,” Phys. Rev. A 78(6), 061802 (2008).

6. Y. Bromberg, O. Katz, and Y. Silberberg, “Ghost imaging with a single detector,” Phys. Rev. A 79(5), 053840 (2009).

7. S. Ota, R. Horisaki, Y. Kawamura, M. Ugawa, I. Sato, K. Hashimoto, and K. Waki, “Ghost cytometry,” Science 360(6394), 1246–1251 (2018).

8. B. I. Erkmen, “Computational ghost imaging for remote sensing,” J. Opt. Soc. Am. A 29(5), 782–789 (2012).

9. P. Clemente, V. Durán, V. Torres-Company, E. Tajahuerce, and J. Lancis, “Optical encryption based on computational ghost imaging,” Opt. Lett. 35(14), 2391–2393 (2010).

10. B. Sun, M. P. Edgar, R. Bowman, L. E. Vittert, S. Welsh, A. Bowman, and M. J. Padgett, “3D computational imaging with single-pixel detectors,” Science 340(6134), 844–847 (2013).

11. W. Gong and S. Han, “A method to improve the visibility of ghost images obtained by thermal light,” Phys. Lett. A 374(8), 1005–1008 (2010).

12. F. Ferri, D. Magatti, L. A. Lugiato, and A. Gatti, “Differential ghost imaging,” Phys. Rev. Lett. 104(25), 253603 (2010).

13. B. Sun, S. S. Welsh, M. P. Edgar, J. H. Shapiro, and M. J. Padgett, “Normalized ghost imaging,” Opt. Express 20(15), 16892 (2012).

14. O. Katz, Y. Bromberg, and Y. Silberberg, “Compressive ghost imaging,” Appl. Phys. Lett. 95(13), 131110 (2009).

15. T. Shimobaba, Y. Endo, T. Nishitsuji, T. Takahashi, Y. Nagahama, S. Hasegawa, and T. Ito, “Computational ghost imaging using deep learning,” Opt. Commun. 413, 147–151 (2018).

16. M. Lyu, W. Wang, H. Wang, H. Wang, G. Li, N. Chen, and G. Situ, “Deep-learning-based ghost imaging,” Sci. Rep. 7(1), 17865 (2017).

17. Z. Xu, W. Chen, J. Penuelas, M. Padgett, and M. Sun, “1000 fps computational ghost imaging using LED-based structured illumination,” Opt. Express 26(3), 2427–2434 (2018).

18. Z. Zhang, X. Wang, G. Zheng, and J. Zhong, “Hadamard single-pixel imaging versus Fourier single-pixel imaging,” Opt. Express 25(16), 19619–19639 (2017).

19. J. L. V. M. Stanislaus and T. Mohsenin, “Low-complexity FPGA implementation of compressive sensing reconstruction,” 2013 International Conference on Computing, Networking and Communications (ICNC), 671–675 (2013).

20. M. Birk, M. Zapf, M. Balzer, N. Ruiter, and J. Becker, “A comprehensive comparison of GPU- and FPGA-based acceleration of reflection image reconstruction for 3D ultrasound computer tomography,” J. Real Time Image Process. 9(1), 159–170 (2014).

21. P. Alfke, “Efficient shift registers, LFSR counters, and long pseudo-random sequence generators,” http://www.xilinx.com/bvdocs/appnotes/xapp052.pdf (1996).

22. K. M. Czajkowski, A. Pastuszczak, and R. Kotyński, “Single-pixel imaging with Morlet wavelet correlated random patterns,” Sci. Rep. 8(1), 466 (2018).
