
High-resolution iterative reconstruction at extremely low sampling rate for Fourier single-pixel imaging via diffusion model

Open Access

Abstract

Fourier single-pixel imaging (FSPI) has always faced a trade-off between imaging efficiency and imaging quality. Achieving high-resolution imaging requires a larger number of measurements, which reduces imaging efficiency. Here, a novel high-quality reconstruction method for FSPI via a diffusion model is proposed. A score-based diffusion model is designed to learn prior information of the data distribution. The real-sampled low-frequency Fourier spectrum of the target is employed as a consistency term that, in conjunction with the learned prior information, iteratively constrains the model, achieving high-resolution reconstruction at extremely low sampling rates. The performance of the proposed method is evaluated by simulations and experiments. The results show that the proposed method achieves superior quality compared with the traditional FSPI method and the U-Net method. In particular, at the extremely low sampling rate (e.g., 1%), an approximately 241% improvement in the edge intensity-based score was achieved by the proposed method for the coin experiment, compared with the traditional FSPI method. The method has the potential to achieve high-resolution imaging without compromising imaging speed, which will further expand the application scope of FSPI in practical scenarios.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Fourier single-pixel imaging (FSPI) [1–3] is a computational imaging technology based on Fourier analysis theory [4,5]. Fourier basis patterns [6–8] are employed as structured light illumination to modulate the target, and a single-pixel photodetector is then used to measure the reflected light intensity from the target. The Fourier spectrum is calculated from the acquired light intensities, and the image is finally reconstructed through the inverse Fourier transform. Nowadays, Fourier single-pixel imaging has demonstrated significant application prospects in terahertz imaging, infrared imaging, phase imaging, real-time imaging [9–13] and many other fields. However, to reconstruct a high-resolution image using FSPI, a large number of Fourier coefficients needs to be acquired, resulting in a significant volume of data and slower imaging speed [14–17]. To address this difficulty, researchers have proposed numerous solutions. Zhang et al. [2] noted that the frequency domain of an image is sparse, with the majority of the image energy concentrated in the low-frequency region. Consequently, approximate image contours can be reconstructed via the inverse Fourier transform after acquiring only the low-frequency Fourier coefficients, thereby enhancing imaging efficiency. Nevertheless, the method sacrifices the resolution of the reconstructed image to reduce the amount of data. In particular, under low sampling rates (the ratio of the number of measurements to the total number of pixels of the reconstruction result), the reconstruction result may exhibit blurriness. Bian et al. [14] noted that the Fourier spectrum of natural images exhibits conjugate symmetry. After acquiring half of the Fourier spectrum, the remaining half can be derived through this conjugate property. The method reduces the data volume to some extent, yet a substantial amount of data remains necessary for high-resolution reconstruction, thereby diminishing imaging efficiency. In addition, algorithms such as compressive sensing have been widely utilized in FSPI. Wenwen et al. [18] utilized a variable-density random sampling matrix to achieve random sampling, followed by a compressed sensing algorithm to process the sparse Fourier spectrum, thereby recovering the high-frequency information. Unfortunately, the method requires a large number of iterations, which increases the computational complexity and demands substantial computing resources to address the ill-posed problem. Xiao et al. [19] proposed a new Fourier single-pixel imaging method that estimates the uncollected high-frequency coefficients from a small number of acquired low-frequency components, thereby reducing the number of measurements. On the other hand, the method requires an appropriate initial guess for cyclic inference, which is challenging to obtain in practical FSPI.

In recent years, with the development of deep learning, the immense potential of deep learning in the field of Fourier single-pixel imaging has been demonstrated. Rizvi et al. [20] proposed the deep convolutional autoencoder network (DCAN). A small amount of low-frequency information sampled from the target is sparsely reconstructed, and the reconstruction result is input into the pre-trained DCAN network for optimization, thereby reducing the blurring caused by the loss of high-frequency information. However, the end-to-end network used in this method has a significant requirement for training data, usually necessitating a large amount of training data to achieve satisfactory performance. Additionally, this kind of network is sensitive to perturbations and noise in the input data, resulting in suboptimal optimization for real-sampled data. With the development of generative models, they have been widely utilized in image generation, image denoising, image inpainting and other fields [21–23]. Among generative models, likelihood-based models [24–27] and generative adversarial networks (GANs) [28] have achieved tremendous success. Karim et al. [29] proposed a single-pixel imaging reconstruction technique based on a GAN. The method replaces the earlier end-to-end network with a better generative network as the optimization network. By sampling the learned latent space, the reconstruction results become more realistic, thus improving both the quality and generalization of the reconstruction compared with early end-to-end networks. However, likelihood-based models either necessitate specialized architectures to construct a normalized probabilistic model or are trained with a surrogate loss [30]. While GANs circumvent certain constraints associated with likelihood-based models, the adversarial training process can introduce instability [30], potentially resulting in non-convergence or overfitting.

Song et al. [31] proposed a score-based generative model (diffusion model), which adopts a more efficient sampling method to learn the probability distribution of given samples, thereby obtaining a probabilistic model. Subsequently, the model is used to fit the samples in order to generate the target image. Motivated by this model, an innovative high-resolution reconstruction method for FSPI via a diffusion model [32] is proposed in this paper. First, the Fourier basis patterns are sequentially projected onto the target and the low-frequency Fourier spectrum is calculated by the four-step phase-shift algorithm [33–36]. Then, both the real-sampled spectrum and Gaussian random noise are input into the trained diffusion model, and the real-sampled data is utilized to constrain the generation direction of the model. Finally, the lost high-frequency detail information is continuously recovered and optimized by the model, resulting in a step-by-step reconstruction of a high-resolution image. Both simulations and experiments demonstrate that, at the extremely low sampling rate (e.g., 1%), higher imaging quality can be achieved by the proposed method compared with the U-Net method and the traditional FSPI method. For the coin simulation data, improvements of 12% and 7% are observed in SSIM and PSNR, respectively, compared with the U-Net method. For the experimental coin data, increases of approximately 147% and 123% are achieved in the edge intensity-based score for the obverse and reverse sides, respectively, by the proposed method. In conclusion, the proposed method allows for high-quality imaging under extremely low sampling rates.

2. Principles and methods

2.1 System design

The system can be categorized into two main components: data acquisition and reconstruction. The data acquisition section shown in the upper portion of Fig. 1(a) involves the emission of a light beam by a helium-neon laser (JDSU-1137, Power Technology, wavelength: 632.8 nm, beam diameter: 0.84 mm). The beam is expanded by a beam-expanding system with a magnification factor of 8.3, constituted by planoconvex lenses L1 (f = 18 mm) and L2 (f = 150 mm). Subsequently, the expanded beam is reflected by mirror M1 onto a digital micromirror device (DMD) (V-7001VIS, Aunion Tech, resolution: 1024 × 768, refresh rate: 22727 Hz). The computer transmits the binarized Fourier basis patterns (as indicated by tag 2 in Fig. 1(c)) to the DMD to modulate the light beam, and the modulated Fourier basis patterns are subsequently magnified by a planoconvex lens L3 (f = 150 mm). The first-order diffraction light is then directed onto the target through an iris. A planoconvex lens L4 collects the diffusely reflected light signal from the target and focuses it uniformly onto a sheet of A4 paper. The A4 paper acts as a reflecting diffuser [2,37]: most of the light is diffusely reflected, and only a small amount passes through the paper. The diffuse light from the target is further scattered by the A4 paper, so that the detector response is the inner product between the target image and the Fourier basis pattern [6,7]. The detected light signal is converted into a voltage signal by a single-pixel photodetector (DET025A/M, Thorlabs, wavelength range: 400–1700 nm). Subsequently, the voltage signal is synchronously transferred to a computer through a data acquisition card (DAQ) (PicoScope 3405D, Pico Technology, bandwidth: 100 MHz, sampling rate: 1 GS/s). Finally, the computer performs reconstruction on the undersampled data. The corresponding photographs of the practical acquisition system are shown in Fig. 1(b).


Fig. 1. Scheme of the system and photographs of the practical system. (a) Scheme of the system. L1–L4, planoconvex lenses; M1, mirror; DMD, digital micromirror device; DAQ, data acquisition card. (b) Photographs of the practical system. (c) Control program based on LabVIEW; the blue dashed rectangle is the Fourier basis pattern loading module, and the red dashed rectangle is the data card acquisition module. 1, the total number of Fourier basis patterns; 2, the Fourier basis pattern currently undergoing projection; 3, the path to save the Fourier basis pattern; 4, the path to save the data; 5, the detected signal corresponding to the projected Fourier basis pattern.


As illustrated in the lower part of Fig. 1(a), the acquired low-frequency Fourier spectrum of the target is input into a pre-trained model, serving as a consistency constraint for the iterative generation process of the model to achieve consistent updates. This mechanism ensures the accuracy of the generation results by guaranteeing the correctness of the generation direction. To facilitate data acquisition, a visual synchronous control program based on the LabVIEW platform was developed, as shown in Fig. 1(c). The program comprises a Fourier basis pattern loading module (indicated by the blue dashed rectangle) and a data card acquisition module (indicated by the red dashed rectangle).

2.2 Fourier single-pixel imaging principle

The FSPI technique is based on the principles of the 2D Fourier transform, which posits that any image can be synthesized by weighting and superimposing a series of Fourier basis patterns with varying spatial frequencies and initial phases [6,36]. The Fourier basis patterns are essentially a series of images characterized by different spatial frequencies and initial phases, with intensities distributed in a cosine manner, which can be represented by Eq. (1) [2]:

$${P_\varphi }(x,y,{f_x},{f_y}) = a + b \cdot \cos (2\pi {f_x}x + 2\pi {f_y}y + \varphi ),$$
where a is the average intensity, b is the contrast, $(x,y)$ represents the Cartesian coordinates in the image domain, $\varphi$ is the initial phase, ${f_x}$ and ${f_y}$ are the spatial frequencies corresponding to the x and y directions, respectively. Given the surface reflectance distribution function $R(x,y)$ of an object, when the base pattern ${P_\varphi }(x,y,{f_x},{f_y})$ is projected onto the target, the response of a single-pixel photodetector can be represented by Eq. (2):
$${D_\varphi }({f_x},{f_y}) = {d_n} + \beta \iint_S {R(x,y)\,{P_\varphi }(x,y,{f_x},{f_y})\,dx\,dy} ,$$
where ${d_n}$ is the response (noise) caused by environmental illumination on the detector, and $\beta$ depends on the size and location of the detector. By applying the four-step phase-shifting algorithm [2,36] to ${D_0}$, ${D_{\pi /2}}$, ${D_\pi }$ and ${D_{3\pi /2}}$, the relationship between the single-pixel photodetector responses and the Fourier coefficients of the target is obtained, as shown in Eq. (3):
$$2b\beta \cdot \tilde{I}({f_x},{f_y}) = [{{D_0}({f_x},{f_y}) - {D_\pi }({f_x},{f_y})} ]+ j[{{D_{\pi /2}}({f_x},{f_y}) - {D_{3\pi /2}}({f_x},{f_y})} ],$$
where $\tilde{I}({f_x},{f_y})$ is the Fourier coefficient of the target corresponding to the Fourier basis pattern with frequency $({f_x},{f_y})$. By rearranging the Fourier coefficients computed using Eq. (3), the Fourier spectrum matrix $A$ of the target is obtained. Consequently, the reconstructed image $R$ is given by Eq. (4):
$$R = {{\cal F}^{ - 1}}[A],$$
where ${{\cal F}^{ - 1}}[ \cdot ]$ denotes the inverse Fourier transform. Based on the aforementioned principles, a series of Fourier basis patterns with varying spatial frequencies is sequentially projected onto the target. Using Eq. (3), the Fourier coefficients corresponding to the Fourier basis patterns are acquired and subsequently reorganized into the Fourier coefficient matrix $A$. Finally, the reconstruction result is obtained by applying Eq. (4).
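To make the acquisition and reconstruction pipeline concrete, the following minimal NumPy sketch simulates Eqs. (1)–(4) for a set of low-frequency coefficient positions, with an idealized detector ($\beta = 1$, ${d_n} = 0$); the function names are illustrative and not taken from the paper's code.

```python
import numpy as np

def fourier_basis(shape, fx, fy, phi, a=0.5, b=0.5):
    """Fourier basis pattern P_phi(x, y, fx, fy) of Eq. (1)."""
    h, w = shape
    y, x = np.mgrid[0:h, 0:w]
    return a + b * np.cos(2 * np.pi * (fx * x / w + fy * y / h) + phi)

def fspi_reconstruct(target, freqs):
    """Simulated four-step phase-shifting FSPI (Eqs. (2)-(4)) for a list of (fx, fy) pairs."""
    h, w = target.shape
    spectrum = np.zeros((h, w), dtype=complex)
    for fx, fy in freqs:
        # Detector responses D_phi: inner products of the target with the four shifted patterns
        D = [np.sum(target * fourier_basis((h, w), fx, fy, phi))
             for phi in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)]
        coeff = (D[0] - D[2]) + 1j * (D[1] - D[3])        # Eq. (3), up to the factor 2*b*beta
        spectrum[fy % h, fx % w] = coeff
        spectrum[(-fy) % h, (-fx) % w] = np.conj(coeff)   # conjugate symmetry of a real image
    # Eq. (4): the result is proportional to a low-pass filtered version of the target
    return np.real(np.fft.ifft2(spectrum))
```

Passing only the (fx, fy) pairs inside a small centered low-frequency region reproduces, up to scale, the blurred low-sampling-rate reconstructions discussed in Section 3.1.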

2.3 High-resolution iterative reconstruction based on diffusion model

2.3.1 Diffusion model

In the diffusion model, the training samples consist of independently and identically distributed random samples drawn from a probability distribution. The probability distribution is typically characterized using a score function, i.e., the gradient of the logarithmic probability density function. As shown in Fig. 2, the diffusion model can be divided into two components [31]: forward diffusion (stochastic differential equation, SDE) and reverse SDE. In forward diffusion, Gaussian noise is continuously added to the training set in order to perturb the data distribution, thereby obtaining a prior distribution. In the reverse SDE, continuous sampling is performed from the learned data distribution, progressively eliminating noise in order to transform the prior distribution back into the target data. More specifically, the aforementioned process establishes a diffusion process $\{ {x_t}\} _{t = 0}^T$ indexed by a continuous time variable $t \in [0,T]$, starting from the independently and identically distributed data ${x_0}\sim {p_0}$ and ending at ${x_T}\sim {p_T}$, where ${p_0}$ represents the data distribution and ${p_T}$ is the prior distribution. The diffusion process can be modeled as the solution of the SDE shown in Eq. (5):

$$dx = f(x,t)dt + g(t)dw,$$
where $w$ represents the standard Wiener process (Brownian motion), and $f(x,t)$ and $g(t)$ are the drift and diffusion coefficients of ${x_t}$, respectively. Equation (5) is commonly referred to as the forward SDE, corresponding to the degradation process of the image. The corresponding reverse process is referred to as the reverse SDE, which describes a diffusion process running backward in time and can be expressed as Eq. (6):
$$dx = [{f(x,t) - {g^2}(t){\nabla_x}\log {p_t}(x)} ]dt + g(t)d\bar{w},$$
where $\bar{w}$ represents a standard Wiener process flowing backward in time from $T$ to $0$, and ${\nabla _x}\log {p_t}(x)$ is the score function. Since $f(x,t)$ and $g(t)$ are known, the only unknown term is ${\nabla _x}\log {p_t}(x)$. Therefore, denoising score matching [21] can be employed to train a model that estimates the score function ${\nabla _x}\log {p_t}(x)$ at all times $t$. Subsequently, Eq. (6) can be utilized to generate samples from ${p_0}$.
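As a concrete illustration of Eq. (5), the forward SDE can be simulated with a simple Euler-Maruyama discretization; the sketch below is generic (the drift $f$ and diffusion $g$ are passed in as callables), and the specific VE-SDE choice used in this work is given in Section 2.3.2.

```python
import torch

def forward_sde(x0, f, g, n_steps=1000, T=1.0):
    """Euler-Maruyama simulation of dx = f(x, t) dt + g(t) dw (Eq. (5)).
    Starting from data x0 ~ p_0, the output is an approximate sample from the prior p_T."""
    dt = T / n_steps
    x = x0.clone()
    for k in range(n_steps):
        t = k * dt
        x = x + f(x, t) * dt + g(t) * (dt ** 0.5) * torch.randn_like(x)
    return x
```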


Fig. 2. Forward and reverse processes of the diffusion model.


2.3.2 High-resolution reconstruction based on diffusion model

Prior learning: As shown in the upper part of Fig. 3, a richer set of samples $x$ is obtained by applying data augmentation to a large number of high-resolution images, and these samples are subsequently fed into the model. Gaussian noise is continuously added to the samples by the model in order to perturb the data distribution, thereby learning the internal statistical distribution of the training samples. Because the variance-exploding SDE (VE-SDE) [38] often yields high-quality samples, this work selects $f(x,t) = 0$ and $g(t) = \sqrt {{{d[{{\sigma^2}(t)} ]} / {dt}}}$ in Eq. (5), resulting in the forward VE-SDE shown in Eq. (7):

$$dx = \sqrt {{{d[{{\sigma^2}(t)} ]} / {dt}}} dw,$$
where $\sigma (t)$ is the Gaussian noise scale function over continuous time $t \in [0,1]$, which can be discretized as $\{ {\sigma _i}\} _{i = 1}^N$. Unfortunately, solving Eq. (6) requires knowledge of the score function ${\nabla _x}\log {p_t}(x)$ at all time steps. While obtaining ${\nabla _x}\log {p_t}(x)$ directly for any time $t$ is challenging, it can be estimated by a time-conditional score network ${S_\theta }({{x_t},t} )$ through denoising score matching training [39]. To achieve this, ${S_\theta }({{x_t},t} )$ with parameters $\theta$ is optimized according to Eq. (8):
$${\theta ^\ast } = \mathop {\arg \min }\limits_\theta {\mathbb E_t}\{{\lambda (t){\mathbb E_{{x_0}}}{\mathbb E_{{x_t}|{x_0}}}[{\|{{S_\theta }({x_t},t) - {\nabla_{{x_t}}}\log {p_t}({x_t}|{x_0})} \|_2^2} ]} \},$$
where $\lambda :[0,T] \to {{\mathbb R}_{ > 0}}$ is a positive weighting function and ${p_t}({x_t}|{x_0})$ is a Gaussian perturbation kernel centered at ${x_0}$. Once the score network ${S_\theta }({x_t},t)$ is trained with Eq. (8), it approximates the score function, i.e., ${S_\theta }({x_t},t) \simeq {\nabla _x}\log {p_t}({x_t})$. Therefore, Eq. (6) can be rewritten as Eq. (9):
$$dx = - d[{{\sigma^2}(t)} ]{S_\theta }({x_t},t) + \sqrt {{{d[{{\sigma^2}(t)} ]} / {dt}}} \,d\bar{w}.$$
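A minimal sketch of the denoising score matching objective of Eq. (8) for the VE-SDE is given below; it assumes the Gaussian perturbation kernel ${p_t}({x_t}|{x_0}) = {\cal N}({x_0},{\sigma ^2}(t)I)$, a geometric noise schedule, the weighting $\lambda (t) = {\sigma ^2}(t)$, and a score network that takes the noisy image and its noise level as inputs (all names are illustrative).

```python
import torch

def dsm_loss(score_net, x0, sigma_min=0.01, sigma_max=300.0):
    """Denoising score matching loss (Eq. (8)) for the VE-SDE perturbation kernel
    p_t(x_t | x_0) = N(x_0, sigma(t)^2 I), with weighting lambda(t) = sigma(t)^2."""
    # Draw one noise level per image from a geometric schedule between sigma_min and sigma_max
    u = torch.rand(x0.shape[0], device=x0.device)
    sigma = (sigma_min * (sigma_max / sigma_min) ** u).view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    xt = x0 + sigma * noise                        # sample from the perturbation kernel
    target = -noise / sigma                        # score of N(x0, sigma^2 I) evaluated at xt
    pred = score_net(xt, sigma.flatten())          # noise-conditional score estimate S_theta
    # sigma^2 weighting balances the loss across noise levels
    return torch.mean(torch.sum(sigma ** 2 * (pred - target) ** 2, dim=(1, 2, 3)))
```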


Fig. 3. Flow chart of high-resolution iterative reconstruction based on diffusion model. Input 1 is the Gaussian noise, Input 2 is the acquired low-frequency Fourier spectrum of the target. FFT, fast Fourier transform; IFFT, inverse fast Fourier transform; P, predictor; C, corrector; A, the acquired low-frequency Fourier spectrum.


Iterative reconstruction: As shown in the lower part of Fig. 3, a predictor-corrector (PC) sampler is employed as a numerical solver for the reverse SDE, coupled with data consistency (DC) operations, for conditional generation. Specifically, a high-quality image can be progressively generated from the learned prior distribution by reversing the aforementioned forward SDE process, wherein random noise is used as the raw input while time $t$ runs backward from $T$ to $0$. Throughout the iterative reconstruction process, the real-sampled low-frequency spectrum is used as the data consistency term, constraining the generation direction of the model. During each iteration, the low-frequency component of the input image of the PC sampler is replaced by the real-sampled low-frequency spectrum. The reconstruction process can thus be decoupled into two subproblems: PC and DC.

PC: The PC sampler is introduced to compute the reverse SDE and correct errors in the evolution of the reverse SDE process. Specifically, the predictor is the numerical solver for the reverse SDE; it produces an initial prediction at each time step by solving the reverse SDE. The prediction algorithm is given by Eq. (10):

$${\tilde{x}_i} = {x_i} + (\sigma _{i + 1}^2 - \sigma _i^2){S_\theta }({x_i},{\sigma _{i + 1}}) + \sqrt {\sigma _{i + 1}^2 - \sigma _i^2} {z_i},$$
where ${\tilde{x}_i}$ represents an initial prediction based on the prior distribution, $i = N - 1, \cdots ,1,0$ indexes the steps of the reverse-time SDE, and ${z_i} \sim {\mathbb N}(0,1)$ is standard Gaussian noise. The corrector is defined as an iterative process of Langevin dynamics, which corrects the gradient ascent direction of the predicted results through the Langevin dynamics Markov chain Monte Carlo method [40–42]. The main role of the corrector is to correct the results generated by the predictor. The correction algorithm is given by Eq. (11):
$${x_{i - 1}} = {\tilde{x}_i} + {\varepsilon _i}{S_\theta }({\tilde{x}_i},{\sigma _{i + 1}}) + \sqrt {2{\varepsilon _i}} {z_i},$$
where ${\varepsilon _i}$ represents the noise step size for the $i$-th iteration.
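As a concrete (hedged) illustration, the predictor and corrector updates of Eqs. (10) and (11) can be written as follows; the noise levels are assumed to be stored as 0-dim tensors from the discretized schedule $\{ {\sigma _i}\}$, and the score-network interface follows the training sketch above (all names are illustrative).

```python
import torch

def predictor_step(x, score_net, sigma_i, sigma_ip1):
    """Reverse-diffusion predictor for the VE-SDE (Eq. (10))."""
    diff = sigma_ip1 ** 2 - sigma_i ** 2
    z = torch.randn_like(x)
    return x + diff * score_net(x, sigma_ip1.reshape(1)) + torch.sqrt(diff) * z

def corrector_step(x, score_net, sigma_ip1, eps):
    """One annealed Langevin correction step (Eq. (11)) with step size eps."""
    z = torch.randn_like(x)
    return x + eps * score_net(x, sigma_ip1.reshape(1)) + (2 * eps) ** 0.5 * z
```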

DC: During the reconstruction iteration, data consistency is repeatedly enforced at each iterative reconstruction step to ensure the output is consistent with the real-sampled spectral information. The data consistency process is given by Eq. (12):

$${\tilde{x}_i} = \left\{ \begin{array}{l} {{\cal F}^{ - 1}}\left[ {\frac{{\lambda A + {\cal F}({{\tilde{x}}_i})}}{{1 + \lambda }}} \right],\textrm{ }{x^f} \in \varOmega \\ {{\cal F}^{ - 1}}[{{\cal F}({{\tilde{x}}_i})} ],\textrm{ }{x^f}\;\; \notin \varOmega \end{array} \right.$$
where ${\cal F}$ represents the Fourier transform, $\varOmega$ represents the low-frequency region, and ${x^f}$ represents the Fourier spectrum of ${\tilde{x}_i}$. During each iteration, the high-frequency component of the image is kept unchanged, while the low-frequency component is combined with $A$. In the noiseless setting (i.e., $\lambda \to \infty$), the low-frequency component of the predicted spectrum is replaced by $A$. During the reconstruction stage, the random noise input to the model undergoes an iterative prediction-correction scheme under the joint constraints of the learned prior knowledge and $A$, achieving high-quality reconstruction of FSPI under low sampling rates.
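A minimal sketch of the data consistency step of Eq. (12) is given below; it assumes that the measured spectrum $A$ and a boolean mask marking the sampled region $\varOmega$ are laid out on the same (unshifted) grid as the output of torch.fft.fft2.

```python
import torch

def data_consistency(x, A, low_freq_mask, lam=None):
    """Data consistency of Eq. (12): inside Omega, combine the spectrum of x with the
    measured A, or replace it entirely when lam is None (noiseless, lambda -> infinity)."""
    X = torch.fft.fft2(x)
    m = low_freq_mask.float()                     # 1 inside Omega, 0 outside (broadcasts over batch)
    if lam is None:
        X = m * A + (1 - m) * X                   # replace low frequencies with the measurement
    else:
        X = m * (lam * A + X) / (1 + lam) + (1 - m) * X
    return torch.fft.ifft2(X).real
```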

The pseudocode for the reconstruction process is shown in Process 1. The process contains two loops: 1) in the outer loop, the acquired low-frequency Fourier spectrum is embedded into the model to achieve prediction of the data distribution, and the number of iterations N in the outer loop is determined by the number of discrete steps of the reverse SDE; 2) the inner loop is corrected through annealed Langevin iterations, with the number of corrections set to M.

Process 1. Pseudocode of the iterative reconstruction.
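Because the Process 1 pseudocode is only available as an image in the published version, the following sketch outlines the two nested loops described above; it reuses the illustrative predictor_step, corrector_step and data_consistency helpers from the earlier sketches, and the geometric noise schedule and Langevin step size are assumptions.

```python
import math
import torch

def reconstruct(score_net, A, low_freq_mask, n_steps=2000, M=1, eps=1e-5,
                sigma_min=0.01, sigma_max=300.0, shape=(1, 1, 256, 256)):
    """Process 1 (sketch): outer reverse-SDE loop (predictor + data consistency),
    inner loop of M annealed Langevin corrections, each followed by data consistency."""
    # Geometric noise schedule sigma_1 < ... < sigma_N
    sigmas = torch.exp(torch.linspace(math.log(sigma_min), math.log(sigma_max), n_steps))
    x = sigma_max * torch.randn(shape)              # Input 1: Gaussian noise
    for i in reversed(range(n_steps - 1)):          # outer loop: N reverse-SDE steps
        x = predictor_step(x, score_net, sigmas[i], sigmas[i + 1])
        x = data_consistency(x, A, low_freq_mask)   # Input 2: measured low-frequency spectrum
        for _ in range(M):                          # inner loop: M corrector steps
            x = corrector_step(x, score_net, sigmas[i + 1], eps)
            x = data_consistency(x, A, low_freq_mask)
    return x
```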

2.4 Dataset acquisition and network parameter settings

The dataset in this study comprises two subsets: the animal dataset and the coin dataset. (1) Animal dataset: the animal dataset was sourced from the public dataset ‘30 k Cats and Dogs 150 × 150 Greyscale’ obtained from the Kaggle website (https://www.kaggle.com). The dataset contains 15000 cat images and 15000 dog images. Details of the dataset can be found at https://www.kaggle.com/datasets/unmoved/30k-cats-and-dogs-150x150-greyscale/data. To better meet the training requirements, 12500 uncorrupted images were first selected from the dataset. The selected images were then converted to grayscale and resized to 128 × 128 pixels. The first 12000 of these images were utilized for training the diffusion model, while the remaining 500 were retained as a test set for evaluating the trained diffusion model. (2) Coin dataset: the coin dataset is a homemade dataset. High-resolution images of coins were captured using a CCD camera (MER2-503-23GM, Daheng Imaging, frame rate: 23.5 fps, resolution: 2448 × 2048) equipped with a camera lens (HN-1616-5M-C2/3X, Daheng Imaging, aperture: 25.5 mm, F number: F1.6-F16, focal length: 16 mm). 550 images were acquired and cropped to 256 × 256 in size. Data augmentation was performed on 500 of the images to obtain 2000 augmented images for training the diffusion model, while the remaining 50 images were used as the test set. The same coin dataset was used for model training in the subsequent coin simulations and experiments.
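A possible preprocessing pipeline for the animal dataset is sketched below using torchvision transforms; the exact augmentation operations are not specified in the text, so the horizontal flip is only an illustrative placeholder.

```python
import torchvision.transforms as T

# Illustrative preprocessing: grayscale, resize to 128 x 128, an example augmentation,
# and conversion to a tensor with pixel values scaled to [0, 1]
train_transform = T.Compose([
    T.Grayscale(num_output_channels=1),
    T.Resize((128, 128)),
    T.RandomHorizontalFlip(),   # placeholder augmentation; the actual choices may differ
    T.ToTensor(),
])
```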

Since training the end-to-end network requires low-resolution images sampled at various rates, an FSPI simulation platform [43] was built for this purpose. Using the platform, low-resolution images at different sampling rates were obtained from the aforementioned animal and coin datasets for training the end-to-end network. The sizes of the Fourier basis patterns used for the animal and coin datasets were consistent with the image sizes in the training sets, i.e., 128 × 128 and 256 × 256, respectively. Figure 4 shows a selection of training samples from the animal and coin datasets and their corresponding Fourier spectra.
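The effect of undersampling can be approximated by low-pass filtering in the Fourier domain, as in the sketch below; the circular shape of the retained low-frequency region and the way the sampling rate is converted into a number of coefficients are assumptions made for illustration.

```python
import numpy as np

def simulate_low_rate_fspi(image, sampling_rate):
    """Simulate an undersampled FSPI acquisition by keeping only a centered low-frequency
    region whose number of coefficients matches the requested sampling rate."""
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    n_coeffs = sampling_rate * h * w
    radius = np.sqrt(n_coeffs / np.pi)                       # circle containing ~n_coeffs samples
    yy, xx = np.mgrid[0:h, 0:w]
    mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= radius ** 2
    low_res = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))
    return low_res, mask
```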


Fig. 4. Partial samples and the corresponding Fourier spectra of the animal dataset and the coin dataset. GT, ground truth.


The diffusion model was trained using the adaptive moment estimation (Adam) method [44,45], with a learning rate of $2 \times {10^{ - 4}}$. During reconstruction, the number of iterations was set to 2000. When training on the animal and coin datasets, the input image sizes were set to 128 × 128 and 256 × 256, respectively. The output sizes were consistent with the inputs, and pixel values were normalized before network processing. Gaussian noise with noise scales in the range of 0.01–300 was used to perturb the data distribution. The proposed method was implemented using the PyTorch deep learning framework on a graphics processing unit (GPU; GeForce RTX 3060Ti) for accelerated computation.
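The reported training settings can be assembled into a skeleton such as the one below; the tiny stand-in network, the dummy data loader and the dsm_loss helper (sketched in Section 2.3.2) are illustrative placeholders rather than the architecture and pipeline actually used in this work.

```python
import torch
import torch.nn as nn

class TinyScoreNet(nn.Module):
    """Minimal stand-in for the noise-conditional score network (not the paper's architecture)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(2, 32, 3, padding=1), nn.SiLU(),
                                 nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, x, sigma):
        s = sigma.view(-1, 1, 1, 1).expand_as(x)     # broadcast the noise level as a channel
        return self.net(torch.cat([x, s], dim=1))

score_net = TinyScoreNet()
optimizer = torch.optim.Adam(score_net.parameters(), lr=2e-4)   # learning rate from the text

# Dummy loader for illustration; replace with the preprocessed animal/coin datasets
train_loader = torch.utils.data.DataLoader(torch.randn(16, 1, 128, 128), batch_size=4)

for x0 in train_loader:
    optimizer.zero_grad()
    loss = dsm_loss(score_net, x0, sigma_min=0.01, sigma_max=300.0)   # Eq. (8); noise range 0.01-300
    loss.backward()
    optimizer.step()
```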

3. Results

3.1 Single-pixel imaging reconstruction

FSPI simulation was conducted on a 256 × 256 pixel gorilla image under various sampling rates. Figure 5(a) shows the Fourier spectrum of the ground truth (GT). Figures 5(b)–5(f) show the Fourier spectra obtained under different sampling rates, where m represents the number of measurements. As the sampling rate decreases, the high-frequency information is gradually reduced, which means the acquired details are progressively reduced. Figure 5(g) shows the ground truth. Figures 5(h)–5(l) show the reconstruction results corresponding to Figs. 5(b)–5(f), where the structural similarity (SSIM) is labeled in yellow text and the peak signal-to-noise ratio (PSNR) is labeled in blue text. As the sampling rate decreases from 25% to 10%, the SSIM of the reconstruction results obtained by the traditional FSPI method gradually decreases from 0.89 to 0.75, and the PSNR gradually decreases from 26.09 dB to 22.95 dB. Under a sampling rate of 5%, the overall gorilla image becomes blurred. With the sampling rate reduced to 3%, the blurriness of the gorilla image is further intensified, resulting in distortion of fine details. As the sampling rate is further reduced to 1%, the SSIM of the reconstruction results obtained by the traditional FSPI method decreases from 0.75 to 0.47, while the PSNR decreases from 22.95 dB to 19.86 dB. The fine details (such as the eye region) become indiscernible, with only the contours remaining distinguishable. Collectively, when the sampling rate is 10% or above, the reconstruction results obtained by the traditional FSPI method maintain a relatively high level of clarity. However, as the sampling rate falls below 10%, the degradation in the clarity of the reconstruction results becomes conspicuous. To better observe the degree of degradation of the reconstruction results under different sampling rates, the eye area of the gorilla is locally enlarged in the red rectangle. In summary, the reduction in sampling rate leads to the loss of high-frequency information, resulting in blurriness in the reconstruction results. Particularly, under the extremely low sampling rate (e.g., 1%), the blurriness is notably evident.


Fig. 5. The reconstruction results obtained by the traditional FSPI method under various sampling rates. (a) is the Fourier spectrum of the ground truth. (b)-(f) are the Fourier spectra of the reconstruction results obtained by the traditional FSPI method under sampling rates of 25%, 10%, 5%, 3% and 1%, respectively; m is the number of measurements. (g) is the ground truth. (h)-(l) are the reconstruction results corresponding to (b)-(f). The yellow text is the SSIM of the corresponding reconstruction result, and the blue text is the PSNR of the corresponding reconstruction result. NP, normalized power spectral density; NI, normalized intensity; GT, ground truth.


3.2 Numerical simulations

The proposed method is compared with the traditional FSPI method, the compressed sensing method [46] and the U-Net method [47] to demonstrate its superiority. Figure 6 shows the reconstruction results obtained by the traditional FSPI method, the compressed sensing (CS) method, the U-Net method and the proposed method (Ours), respectively, as well as the corresponding ground truth and Fourier spectra. Figures 6(a1)–6(j3) show the Fourier spectra of the reconstruction results obtained by the different reconstruction methods under sampling rates of 5%, 3% and 1%, respectively, as well as the Fourier spectrum of the ground truth. While the compressed sensing method shows improvement over the traditional FSPI method, the recovered high-frequency information is still significantly less than that of the U-Net method. More abundant high-frequency information can be recovered by the proposed method. As the sampling rate continues to decrease, the available spectrum gradually diminishes, resulting in a corresponding decrease in the high-frequency information that can be recovered by the U-Net method, although the recovered high-frequency information is still greater than that of the compressed sensing method and the traditional FSPI method. In contrast, the proposed method is capable of recovering abundant high-frequency information even under the extremely low sampling rate (e.g., 1%). Figures 6(a4)–6(j6) show the reconstruction results obtained by the different methods under sampling rates of 5%, 3% and 1%, respectively, as well as the ground truth. The SSIM is labeled in green text, and the PSNR is labeled in blue text. It can be observed that the results obtained by the traditional FSPI method exhibit blurriness with significant loss of fine details; only the outline information is preserved. The quality of the reconstruction results obtained by the compressed sensing method and the U-Net method is improved to some extent. The quality of the reconstruction results obtained by the proposed method is further improved, with the results appearing clearer and preserving fine details more effectively. Especially as the sampling rate is reduced to 1%, the details of the animal and the coin can no longer be clearly distinguished in the reconstruction results obtained by the traditional FSPI method, the compressed sensing method and the U-Net method, while clearer details can still be observed with the proposed method (more information about the iteration process of the animal can be found in Visualization 1). Figures 6(k) and 6(l) are the close-up images indicated by the yellow dashed rectangles 1 and 2, respectively. The reconstruction results obtained by the proposed method are noticeably clearer than those of the traditional FSPI method, the compressed sensing method and the U-Net method (as indicated by the white arrows). Quantitatively, compared with the U-Net method, the SSIM of the reconstruction results for the animal and coin obtained by the proposed method increases by 0.03 and 0.05, while the PSNR increases by 3.47 dB and 2.87 dB under the sampling rate of 5%. Under the sampling rate of 3%, the SSIM increases by 0.03 and 0.06, while the PSNR increases by 2.21 dB and 2.03 dB, respectively. The proposed method is thus shown to have more advantages than the U-Net method under low sampling rates. As the sampling rate is further reduced to 1%, the SSIM improves by 0.06 and 0.09, while the PSNR remains at a high level. These results further confirm the advantage of high-quality reconstruction by the proposed method under the extremely low sampling rate (e.g., 1%).


Fig. 6. The reconstruction results obtained by different methods for animal and coin under various sampling rates, as well as the corresponding ground truth and Fourier spectra. (a1)-(d1) are the Fourier spectra corresponding to the reconstruction results of animal obtained by the traditional FSPI method, the compressed sensing method, the U-Net method and the proposed method under the sampling rate of 5%, respectively. (e1) is the ground truth. (f1)-(i1) are the Fourier spectra of the reconstruction results of coin obtained by the traditional FSPI method, the compressed sensing method, the U-Net method and the proposed method under the sampling rate of 5%, respectively. (j1) is the ground truth. (a2)-(d2) are the Fourier spectra corresponding to the reconstruction results of animal obtained by the traditional FSPI method, the compressed sensing method, the U-Net method and the proposed method under the sampling rate of 3%, respectively. (e2) is the ground truth. (f2)-(i2) are the Fourier spectra of the reconstruction results of coin obtained by the traditional FSPI method, the compressed sensing method, the U-Net method and the proposed method under the sampling rate of 3%, respectively. (j2) is the ground truth. (a3)-(d3) are the Fourier spectra corresponding to the reconstruction results of animal obtained by the traditional FSPI method, the compressed sensing method, the U-Net method and the proposed method under the sampling rate of 1%, respectively. (e3) is the ground truth. (f3)-(i3) are the Fourier spectra of the reconstruction results of coin obtained by the traditional FSPI method, the compressed sensing method, the U-Net method and the proposed method under the sampling rate of 1%, respectively. (j3) is the ground truth. (a4)-(j6) are the reconstruction results of the inverse Fourier transform corresponding to (a1)-(j3). (k) is the close-up images of the reconstruction results at the position of yellow box 1. (l) is the close-up images of the reconstruction results at the position of yellow box 2. CS, compressed sensing; NP, normalized power spectral density; NI, normalized intensity.


Figures 7(a)–7(h) show the error maps of the reconstruction results obtained under the sampling rate of 5%. Figures 7(i)–7(p) show the error maps of the reconstruction results obtained under the sampling rate of 3%. Figures 7(q)–7(x) show the error maps of the reconstruction results obtained under the sampling rate of 1%. Under different sampling rates, smaller errors and a closer resemblance to the ground truth are achieved by the proposed method compared with the traditional FSPI method, the compressed sensing method and the U-Net method. Particularly under the extremely low sampling rate (e.g., 1%), the proposed method demonstrates superior reconstruction capability with smaller errors in the reconstruction results. Figures 7(a1) and 7(b1) show the variation of SSIM and PSNR for the animal with iterations. Figures 7(c1) and 7(d1) show the variation of SSIM and PSNR for the coin with iterations. In general, the SSIM and PSNR exhibit rapid improvement within the first 1000 iterations and tend to stabilize around the 1300th iteration. Under sampling rates of 1%, 3% and 5%, the SSIM of the animal images stabilized around 0.81, 0.91 and 0.94, respectively, while the PSNR stabilized around 20.94 dB, 23.77 dB and 25.70 dB, respectively. The SSIM of the coin images stabilized around 0.73, 0.84 and 0.88, respectively, while the PSNR stabilized around 21.19 dB, 25.59 dB and 27.61 dB, respectively. In conclusion, the results show that the proposed method outperforms the traditional FSPI method, the compressed sensing method and the U-Net method under low sampling rates (e.g., 5% or 3%) and exhibits even more remarkable performance under the extremely low sampling rate (e.g., 1%).


Fig. 7. Error maps and iteration curves of the reconstruction results for the animal and coin. (a)-(x) are error maps of the reconstruction results for the animal and coin obtained by the traditional FSPI method, the compressed sensing method, the U-Net method and the proposed method under sampling rates of 5%, 3% and 1%, respectively. (a1) and (b1) are the variation of SSIM and PSNR for the animal with iterations under different sampling rates. (c1) and (d1) are the variation of SSIM and PSNR for the coin with iterations under different sampling rates. CS, compressed sensing.


3.3 Experiment results

To validate the effectiveness of the proposed method experimentally, the acquisition system shown in Fig. 1(b) was employed to obtain the low-frequency Fourier spectrum of the target (used as the consistency term for the model reconstruction). The acquired low-frequency Fourier spectrum and random noise were fed into the pre-trained model for iterative reconstruction. A total of 21 × 21 Fourier coefficients were collected, and the size of the reconstruction result was 256 × 256, resulting in an approximate sampling rate of 1%. Figures 8(a)–8(h) and 8(i)–8(p) show the iterative process for the obverse and reverse sides of the coin, respectively. The reconstruction begins with random noise, and the approximate outline of the coin becomes visible after the 400th iteration. As the number of iterations increases, the noise gradually diminishes, and the coin is essentially fully reconstructed at the 700th iteration. Figures 8(a1)–8(p1) are the Fourier spectra corresponding to Figs. 8(a)–8(p). The number of iterations is labeled in yellow text in the lower right corner. It can be observed that the high-frequency components of the reconstruction results obtained by the proposed method are gradually recovered from the initial random noise (more information about the iteration process can be found in Visualization 2). The effectiveness of the proposed method is thus demonstrated by the experimental results: even at an extremely low sampling rate (e.g., 1%), high-quality images can be reconstructed from real-sampled data.


Fig. 8. The iterative process for the real-sampled data of the coin. (a)-(h) are the iterative reconstruction process for the obverse side of the coin. (i)-(p) are the iterative reconstruction process for the reverse side of the coin. (a1)-(p1) are the Fourier spectra corresponding to (a)-(p). The yellow text in the bottom right corner is the number of iterations. NP, normalized power spectral density; NI, normalized intensity.


Since the ground truth is difficult to acquire in experiments, the obverse and reverse sides of the coin were photographed by the aforementioned CCD camera and cropped to serve as the reference image (RE; note that this is not the ground truth). Figures 9(a)–9(c) show the Fourier spectra of the reconstruction results for the obverse side obtained by the traditional FSPI method, the U-Net method and the proposed method, respectively. Figures 9(d)–9(f) are the reconstruction results corresponding to Figs. 9(a)–9(c). Figure 9(g) is the reference image. Figures 9(h)–9(j) show the Fourier spectra of the reconstruction results for the reverse side obtained by the traditional FSPI method, the U-Net method and the proposed method, respectively, Figs. 9(k)–9(m) are the reconstruction results corresponding to Figs. 9(h)–9(j), and Fig. 9(n) is the reference image. In the experiment, the loss of high-frequency information leads to relatively poor reconstruction results for the traditional FSPI method, while the U-Net method also shows certain limitations in recovering high-frequency information. In contrast, the proposed method can recover more abundant high-frequency information. Compared with the traditional FSPI method and the U-Net method, the reconstruction results obtained by the proposed method exhibit higher clarity. The SSIM and PSNR of the reconstruction results could not be calculated due to the absence of the ground truth. On the other hand, the edges of an image are often associated with the shape, boundary and texture features of the target. Therefore, the edge intensity-based score [48–52] is adopted to evaluate the image quality of the reconstruction results. The score is defined as the average edge intensity plus 0.5 times the maximum edge intensity. The edge intensity-based score is labeled in blue text in Fig. 9. Quantitatively, for the obverse and reverse sides of the coin, the edge intensity-based scores obtained by the proposed method are 1.80 and 1.81, respectively, which are 1.21 and 1.33 higher than those obtained by the traditional FSPI method, and 1.07 and 1.00 higher than those obtained by the U-Net method. The performance of the reconstruction results is significantly superior to those obtained by the traditional FSPI method and the U-Net method. These results further substantiate the outstanding reconstruction performance of the proposed method in the experiment.
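The edge intensity-based score described above can be computed as in the following sketch; the Sobel operator is an assumed choice of edge detector, and any normalization of the input image will affect the absolute score values.

```python
import numpy as np
from scipy import ndimage

def edge_intensity_score(image):
    """Edge intensity-based score: mean edge intensity plus 0.5 times the maximum
    edge intensity, with the edge intensity map taken as the Sobel gradient magnitude."""
    img = image.astype(float)
    gx = ndimage.sobel(img, axis=1)
    gy = ndimage.sobel(img, axis=0)
    edge = np.hypot(gx, gy)
    return edge.mean() + 0.5 * edge.max()
```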


Fig. 9. Practical reconstruction results of the coin. (a)-(c) are Fourier spectra corresponding to the practical reconstruction results of the obverse side of the coin obtained by the traditional FSPI method, the U-Net method and the proposed method. (d)-(f) are the practical reconstruction results of the obverse side of the coin obtained by the traditional FSPI method, the U-Net method and the proposed method. (g) is the reference image of the obverse side of the coin. (h)-(j) are the Fourier spectra corresponding to the practical reconstruction results of the reverse side of the coin obtained by the traditional FSPI method, the U-Net method and the proposed method. (k)-(m) are the practical reconstruction results of the reverse side of the coin obtained by the traditional FSPI method, the U-Net method and the proposed method. (n) is the reference image of the reverse side of the coin. The blue value in the upper right corner of the reconstruction results is the edge intensity-based score. (o)-(r) correspond to close-up images indicated by the yellow dashed boxes 1, 2, 3 and 4, respectively. (s)-(v) represent the signal distribution indicated by the white dashed lines in (o), (p), (q) and (r), respectively. Re, reference image; NP, normalized power spectral density; NI, normalized intensity.


Figures 9(o)–9(r) are the close-up images indicated by the yellow dashed rectangles 1–4 in Figs. 9(g) and 9(n), respectively. Compared with the traditional FSPI method and the U-Net method, the reconstruction results obtained by the proposed method better recover the contour and pattern of the coin. Figures 9(s)–9(v) show the signal distributions indicated by the white dashed lines in Figs. 9(o)–9(r), respectively. The signal distribution obtained by the proposed method more effectively depicts the texture of the coin (indicated by the black arrows), exhibiting sharper characteristics at the texture edges (reflecting the improvement in resolution). In contrast, the signal distributions obtained by the traditional FSPI method and the U-Net method appear smoother, rendering texture recognition more challenging. The results verify the significant advantages of the proposed method for FSPI under the extremely low sampling rate.

4. Discussion and conclusion

In conclusion, a novel reconstruction method for FSPI via a diffusion model was proposed to address the issue of low reconstruction quality in FSPI under low sampling rates. Throughout the training phase, Gaussian noise is continuously added to the training data in order to perturb the data distribution, thereby learning prior information. During the reconstruction phase, the low-frequency Fourier spectrum of the target is obtained and integrated as the consistency term into the iterative process of the model. During each iteration, the low-frequency component of the model-predicted result is replaced with the acquired low-frequency Fourier spectrum, and high-quality reconstruction is achieved after multiple iterations. Simulations and experiments were carried out to evaluate the performance of the proposed method, and comparisons were made with the traditional FSPI method and the U-Net method. For the coin simulation data, under the sampling rate of 3%, the SSIM and PSNR of the proposed method are 0.91 and 27.88 dB, which represent an improvement of 0.06 (∼ 7%) and 2.03 dB (∼ 8%) compared with the U-Net method, and a notable increase of 0.13 (∼ 17%) and 4.75 dB (∼ 21%) compared with the traditional FSPI method. Furthermore, superior performance is exhibited by the proposed method under the extremely low sampling rate (e.g., 1%). The SSIM and PSNR of the proposed method are 0.83 and 24.44 dB, which represent an improvement of 0.09 (∼ 12%) and 1.52 dB (∼ 7%) compared with the U-Net method, and a notable increase of 0.24 (∼ 41%) and 6.98 dB (∼ 40%) compared with the traditional FSPI method. Regarding the experimental data for the obverse and reverse sides of the coin, under the extremely low sampling rate (e.g., 1%), the edge intensity-based scores obtained by the proposed method are 1.80 and 1.81, respectively, which are 1.21 (∼ 205%) and 1.33 (∼ 277%) higher than those obtained by the traditional FSPI method, and 1.07 (∼ 147%) and 1.00 (∼ 123%) higher than those obtained by the U-Net method. The simulation and experimental results demonstrate that the proposed method exhibits a significant advantage in reconstruction under low sampling rates compared with the U-Net method and the traditional FSPI method. The advantage becomes even more pronounced at extremely low sampling rates (e.g., 1%).

The proposed method has certain limitations. On the one hand, since the proposed method is a model-based iterative reconstruction strategy, there is a trade-off between reconstruction speed and reconstruction quality. The reconstruction time is related not only to the computational capability of the computing unit but also to the number of iterations. The computation was performed on a graphics processing unit (GPU; GeForce RTX 3060Ti) in the experiment. During the training phase, one checkpoint was saved for every 10000 epochs completed. Thirty checkpoints were obtained in the experiment and the best training model was selected. For the animal dataset, the training time is ∼ 18.6 h, with an average of 0.62 h per checkpoint. For the coin dataset, the training time is ∼ 24.6 h, with an average of 0.82 h per checkpoint. Reconstruction is an iterative process, and the reconstruction time is related to the number of iterations. In general, the SSIM and PSNR exhibit rapid improvement within the first 1000 iterations and tend to stabilize around the 1300th iteration, as shown in Figs. 7(a1)–7(d1). For the animal and coin datasets presented in Fig. 6, the training times for the proposed method are ∼ 18.6 h and ∼ 24.6 h, respectively, while the reconstruction times are ∼ 61.1 s and ∼ 101.4 s (∼ 1300 iterations). Since the traditional FSPI method does not require pre-training, reconstruction of the target can be achieved simply by zero-padding the collected Fourier spectrum and applying the inverse Fourier transform. The compressed sensing method does not require pre-training either, but its reconstruction process involves iterative optimization; the reconstruction of large images results in significant memory demands and noticeably increases the reconstruction time. The U-Net method is a data-driven end-to-end network, which can directly use the pre-trained model to improve image clarity without iteration. Nevertheless, in terms of reconstruction quality, the proposed method has obvious advantages over the other three methods, as shown in Fig. 6. The proposed method can achieve higher-quality reconstruction even under the extremely low sampling rate (e.g., 1%) compared with the traditional FSPI method, the compressed sensing method and the U-Net method. On the other hand, deep learning methods have generalization problems due to limited datasets. It is advisable to use the same type of data for both training and reconstruction when higher-quality images need to be reconstructed. Future improvements can be made in terms of reconstruction speed and model generalization. To further enhance the reconstruction speed, consideration can be given to faster models, such as IR-SDE [53]. Compared with the score-based diffusion model used in the proposed method, this approach adjusts the forward process so that high-quality images degrade into low-quality images (instead of pure noise), while high-quality images can be recovered through the corresponding reverse-time SDE. The main goal of this strategy is to effectively increase the reconstruction speed without sacrificing too much reconstruction quality. To enhance the generalization of the model, the dataset can be expanded (e.g., by increasing the type and quantity of the data) or data augmentation can be applied to the existing data.

In practical imaging, the response of the single-pixel photodetector differs from the ideal value. It can be seen from Eq. (3) that the Fourier coefficients of the target have a linear relationship with the detector response. Under ideal conditions, the coefficient $\beta$ in the detector response ${D_\varphi }({{f_x},{f_y}} )$ (given by Eq. (2)) would be equal to 1, and the detector response would be the inner product between the target image and the Fourier basis pattern. In practical imaging, owing to the influence of the coefficient $\beta$, the obtained detector response ${D_\varphi }({{f_x},{f_y}} )$ differs from the ideal value while maintaining a linear relationship (characterized by the coefficient $\beta$). Therefore, the range of the detector response in practical imaging differs from the ideal range, which in theory does not affect the final imaging result.

The score-based diffusion model in the proposed method uses Gaussian white noise with zero mean. For noise with a non-Gaussian distribution, alternative diffusion models [54,55], such as Cold Diffusion introduced in Ref. [55], can be employed to handle various forms of noise (including Gaussian noise with non-zero mean). Such methods have the potential to significantly broaden the practical application scope of FSPI. Overall, only a small amount of spectral data needs to be acquired by the proposed method to reconstruct a high-quality image, which will further enhance the imaging efficiency of FSPI and expand its application scope.

Funding

National Natural Science Foundation of China (62265011, 62122033); Jiangxi Provincial Natural Science Foundation (20224BAB212006, 20232BAB202038); National Key Research and Development Program of China (2023YFF1204302).

Acknowledgments

The authors thank Sihang Li from Ji luan Academy, Nanchang University for helpful discussions.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in Ref. [56].

References

1. G. M. Gibson, S. D. Johnson, and M. J. Padgett, “Single-pixel imaging 12 years on: a review,” Opt. Express 28(19), 28190–28208 (2020). [CrossRef]  

2. Z. Zhang, X. Ma, and J. Zhong, “Single-pixel imaging by means of Fourier spectrum acquisition,” Nat. Commun. 6(1), 6225 (2015). [CrossRef]  

3. D. Zhou, J. Cao, H. Cui, et al., “Complementary Fourier single-pixel imaging,” Sensors 21(19), 6544 (2021). [CrossRef]  

4. B. Sun, M. P. Edgar, R. Bowman, et al., “3D computational imaging with single-pixel detectors,” Science 340(6134), 844–847 (2013). [CrossRef]  

5. S. S. Welsh, M. P. Edgar, R. Bowman, et al., “Fast full-color computational imaging with single-pixel detectors,” Opt. Express 21(20), 23068–23074 (2013). [CrossRef]  

6. Z. Zhang, X. Wang, G. Zheng, et al., “Fast Fourier single-pixel imaging via binary illumination,” Sci. Rep. 7(1), 12029 (2017). [CrossRef]  

7. Z. Zhang, X. Wang, G. Zheng, et al., “Hadamard single-pixel imaging versus Fourier single-pixel imaging,” Opt. Express 25(16), 19619–19639 (2017). [CrossRef]  

8. J. Huang, D. Shi, K. Yuan, et al., “Computational-weighted Fourier single-pixel imaging via binary illumination,” Opt. Express 26(13), 16547–16559 (2018). [CrossRef]  

9. A. Pastuszczak, R. Stojek, P. Wróbel, et al., “Differential real-time single-pixel imaging with Fourier domain regularization: applications to VIS-IR imaging and polarization imaging,” Opt. Express 29(17), 26685–26700 (2021). [CrossRef]  

10. C. J. Hirschmugl and K. M. Gough, “Fourier transform infrared spectrochemical imaging: review of design and applications with a focal plane array and multiple beam synchrotron radiation source,” Appl. Spectrosc. 66(5), 475–491 (2012). [CrossRef]  

11. X. Hu, H. Zhang, Q. Zhao, et al., “Single-pixel phase imaging by Fourier spectrum sampling,” Appl. Phys. Lett. 114(5), 051102 (2019). [CrossRef]  

12. R. She, W. Liu, Y. Lu, et al., “Fourier single-pixel imaging in the terahertz regime,” Appl. Phys. Lett. 115(2), 021101 (2019). [CrossRef]  

13. K. M. Czajkowski, A. Pastuszczak, and R. Kotyński, “Real-time single-pixel video imaging with Fourier domain regularization,” Opt. Express 26(16), 20009–20022 (2018). [CrossRef]  

14. L. Bian, J. Suo, X. Hu, et al., “Efficient single pixel imaging in Fourier space,” J. Opt. 18(8), 085704 (2016). [CrossRef]  

15. H. Deng, X. Gao, M. Ma, et al., “Fourier single-pixel imaging using fewer illumination patterns,” Appl. Phys. Lett. 114(22), 221906 (2019). [CrossRef]  

16. S. Rizvi, J. Cao, K. Zhang, et al., “Deringing and denoising in extremely under-sampled Fourier single pixel imaging,” Opt. Express 28(5), 7360–7374 (2020). [CrossRef]  

17. H. Peng, S. Qi, P. Qi, et al., “Ringing-free fast Fourier single-pixel imaging,” Opt. Lett. 47(5), 1017–1020 (2022). [CrossRef]  

18. M. Wenwen, S. Dongfeng, H. Jian, et al., “Sparse Fourier single-pixel imaging,” Opt. Express 27(22), 31490–31503 (2019). [CrossRef]  

19. Y. Xiao, L. Zhou, and W. Chen, “Fourier spectrum retrieval in single-pixel imaging,” IEEE Photonics J. 11(2), 1–11 (2019). [CrossRef]  

20. S. Rizvi, J. Cao, K. Zhang, et al., “Improving imaging quality of real-time Fourier single-pixel imaging via deep learning,” Sensors 19(19), 4190 (2019). [CrossRef]  

21. J. Choi, S. Kim, Y. Jeong, et al., “Ilvr: Conditioning method for denoising diffusion probabilistic models,” arXiv, arXiv:2108.02938 (2021). [CrossRef]  

22. A. Lugmayr, M. Danelljan, A. Romero, et al., “Repaint: Inpainting using denoising diffusion probabilistic models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), 11461–11471.

23. Y. Lu, S. Wu, Y. W. Tai, et al., “Image generation from sketch constraint using contextual gan,” in Proceedings of the European conference on computer vision (ECCV) (2018), 205–220.

24. L. Dinh, D. Krueger, and Y. Bengio, “Nice: Non-linear independent components estimation,” arXiv, arXiv:1410.8516 (2014). [CrossRef]  

25. A. Graves, “Generating sequences with recurrent neural networks,” arXiv, arXiv:1308.0850 (2013). [CrossRef]  

26. R. Lopez, J. Regier, M. I. Jordan, et al., “Information constraints on auto-encoding variational bayes,” in Advances in neural information processing systems (NeurIPS) (2018), 6114–6125.

27. A. Van Den Oord, N. Kalchbrenner, and K. Kavukcuoglu, “Pixel recurrent neural networks,” in International conference on machine learning (ICML) (2016), 1747–1756.

28. I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., “Generative adversarial nets,” in Advances in Neural Information Processing Systems (NeurIPS) (2014), 2672–2680.

29. N. Karim and N. Rahnavard, “Spi-gan: Towards single-pixel imaging through generative adversarial network,” arXiv, arXiv:2107.01330 (2021). [CrossRef]  

30. Y. Song and S. Ermon, “Generative modeling by estimating gradients of the data distribution,” in Advances in neural information processing systems (NeurIPS) (2019), 11918–11930.

31. Y. Song, J. Sohl-Dickstein, D. P. Kingma, et al., “Score-based generative modeling through stochastic differential equations,” arXiv, arXiv:2011.13456 (2020). [CrossRef]  

32. R. W. Floyd, “An adaptive algorithm for spatial gray-scale,” Proc. Soc. Inf. Disp. 17, 75–77 (1976).

33. H. Chen, J. Shi, X. Liu, et al., “Single-pixel non-imaging object recognition by means of Fourier spectrum acquisition,” Opt. Commun. 413, 269–275 (2018). [CrossRef]  

34. Z. D. Liu, Z. G. Li, Y. N. Zhao, et al., “Fourier Single-Pixel Imaging Via Arbitrary Illumination Patterns,” Phys. Rev. Applied 19(4), 044025 (2023). [CrossRef]  

35. Y. Ma, Y. Yin, S. Jiang, et al., “Single pixel 3D imaging with phase-shifting fringe projection,” Opt. Lasers Eng. 140, 106532 (2021). [CrossRef]

36. Z. Y. Liang, Z. D. Cheng, Y. Y. Liu, et al., “Fast Fourier single-pixel imaging based on Sierra–Lite dithering algorithm,” Chinese Phys. B 28(6), 064202 (2019). [CrossRef]  

37. Z. Zhang and J. Zhong, “Single-pixel Broadcast Imaging,” in Computational Optical Sensing and Imaging (COSI) (2015), CT4F–3.

38. Y. Zhang, “Line diffusion: a parallel error diffusion algorithm for digital halftoning,” Vis. Comput. 12(1), 40–46 (1996). [CrossRef]  

39. P. Vincent, “A connection between score matching and denoising autoencoders,” Neural Comput. 23(7), 1661–1674 (2011). [CrossRef]  

40. G. Parisi, “Correlation functions and computer simulations,” Nucl. Phys. B 180(3), 378–384 (1981). [CrossRef]

41. C. Matthews and J. Weare, “Langevin Markov Chain Monte Carlo with stochastic gradients,” arXiv, arXiv:1805.08863 (2018). [CrossRef]  

42. M. Izzatullah, T. Van Leeuwen, and D. Peter, “Bayesian seismic inversion: a fast sampling Langevin dynamics Markov chain Monte Carlo method,” Geophys. J. Int. 227(3), 1523–1553 (2021). [CrossRef]

43. X. Liu, Z. Li, J. Dong, et al., “Fast high-resolution imaging combining deep learning and single-pixel imaging,” in Third Optics Frontier Conference (OFS) (2023), 205–209.

44. R. N. Singarimbun, E. B. Nababan, and O. S. Sitompul, “Adaptive moment estimation to minimize square error in backpropagation algorithm,” in International Conference of Computer Science and Information Technology (ICoSNIKOM) (2019), 1–7.

45. W. K. Newey, “Adaptive estimation of regression models via moment restrictions,” J. Econom. 38(3), 301–339 (1988). [CrossRef]  

46. L. Bian, J. Suo, Q. Dai, et al., “Experimental comparison of single-pixel imaging algorithms,” J. Opt. Soc. Am. A 35(1), 78–87 (2018). [CrossRef]  

47. S. Guan, A. A. Khan, S. Sikdar, et al., “Fully Dense UNet for 2D Sparse Photoacoustic Tomography Artifact Removal,” IEEE J. Biomed. Health Inform. 24(2), 568–576 (2020). [CrossRef]  

48. X. Zhang, X. Feng, W. Wang, et al., “Edge strength similarity for image quality assessment,” IEEE Signal Process. Lett. 20(4), 319–322 (2013). [CrossRef]  

49. M. G. Martini, C. T. Hewage, and B. Villarini, “Image quality assessment based on edge preservation,” Signal Process. Image Commun. 27(8), 875–882 (2012). [CrossRef]  

50. C. Feichtenhofer, H. Fassold, and P. Schallauer, “A perceptual image sharpness metric based on local edge gradient analysis,” IEEE Signal Process. Lett. 20(4), 379–382 (2013). [CrossRef]  

51. Y. Lang and D. Zheng, “An improved Sobel edge detection operator,” in International Conference on Mechatronics, Computer and Education Informationization (MCEI) (2016), 590–593.

52. G. Zhai, W. Zhang, X. Yang, et al., “Image quality assessment metrics based on multi-scale edge presentation,” in IEEE Workshop on Signal Processing Systems Design and Implementation (SiPS) (2005), 331–336.

53. Z. Luo, K. F. Gustafsson, Z. Zhao, et al., “Image Restoration with Mean-Reverting Stochastic Differential Equations,” arXiv, arXiv:2301.11699 (2023). [CrossRef]  

54. E. Hoogeboom and T. Salimans, “Blurring diffusion models,” arXiv, arXiv:2209.05557 (2022). [CrossRef]  

55. A. Bansal, E. Borgnia, H. M. Chu, et al., “Cold diffusion: Inverting arbitrary image transforms without noise,” arXiv, arXiv:2208.09392 (2022). [CrossRef]  

56. X. Song, X. Liu, Z. Luo, et al., “High-resolution iterative reconstruction at extremely low sampling rate for Fourier single-pixel imaging via diffusion model,” GitHub (2024), https://github.com/yqx7150/FSPI-DM.

Supplementary Material (2)

Visualization 1: The iterative process for a dog under sampling rates of 5%, 3% and 1%, respectively.
Visualization 2: The iterative process for the real-sampled data of a coin under the sampling rate of 1%.

Data availability

Data underlying the results presented in this paper are available in Ref. [56].




Figures (9)

Fig. 1. Scheme of the system and photographs of the practical system. (a) Scheme of the system. L1–L4, plano-convex lenses; M1, mirror; DMD, digital micromirror device; DAQ, data acquisition card. (b) Photographs of the practical system. (c) Control program based on LabVIEW; the blue dashed rectangle is the Fourier basis pattern loading module, and the red dashed rectangle is the data acquisition card module. 1, the total number of Fourier basis patterns; 2, the Fourier basis pattern currently being projected; 3, the path for saving the Fourier basis patterns; 4, the path for saving the data; 5, the detected signal corresponding to the projected Fourier basis pattern.
Fig. 2. Forward and reverse processes of the diffusion model.
Fig. 3. Flow chart of high-resolution iterative reconstruction based on the diffusion model. Input 1 is Gaussian noise; Input 2 is the acquired low-frequency Fourier spectrum of the target. FFT, fast Fourier transform; IFFT, inverse fast Fourier transform; P, predictor; C, corrector; A, the acquired low-frequency Fourier spectrum.
Fig. 4. Partial samples and corresponding Fourier spectra of the animal dataset and the coin dataset. GT, ground truth.
Fig. 5. Reconstruction results obtained by the traditional FSPI method under various sampling rates. (a) is the Fourier spectrum of the ground truth. (b)-(f) are the Fourier spectra of the reconstruction results obtained by the traditional FSPI method under sampling rates of 25%, 10%, 5%, 3% and 1%, respectively; m is the number of measurements. (g) is the ground truth. (h)-(l) are the reconstruction results corresponding to (b)-(f). The yellow text is the SSIM and the blue text is the PSNR of the corresponding reconstruction results. NP, normalized power spectral density; NI, normalized intensity; GT, ground truth.
Fig. 6. Reconstruction results obtained by different methods for the animal and the coin under various sampling rates, together with the corresponding ground truth and Fourier spectra. (a1)-(d1), (a2)-(d2) and (a3)-(d3) are the Fourier spectra of the animal reconstructions obtained by the traditional FSPI method, the compressed sensing method, the U-Net method and the proposed method under sampling rates of 5%, 3% and 1%, respectively; (e1)-(e3) are the corresponding ground truth. (f1)-(i1), (f2)-(i2) and (f3)-(i3) are the Fourier spectra of the coin reconstructions obtained by the same four methods under sampling rates of 5%, 3% and 1%, respectively; (j1)-(j3) are the corresponding ground truth. (a4)-(j6) are the inverse-Fourier-transform reconstruction results corresponding to (a1)-(j3). (k) and (l) are close-up images of the reconstruction results at the positions of yellow boxes 1 and 2, respectively. CS, compressed sensing; NP, normalized power spectral density; NI, normalized intensity.
Fig. 7. Error maps and iteration curves of the reconstruction results for the animal and the coin. (a)-(x) are error maps of the reconstruction results for the animal and the coin obtained by the traditional FSPI method, the compressed sensing method, the U-Net method and the proposed method under sampling rates of 5%, 3% and 1%, respectively. (a1) and (b1) show the variation of SSIM and PSNR with iterations for the animal under different sampling rates. (c1) and (d1) show the variation of SSIM and PSNR with iterations for the coin under different sampling rates. CS, compressed sensing.
Fig. 8. The iterative process for the real-sampled data of the coin. (a)-(h) show the iterative reconstruction process for the obverse side of the coin. (i)-(p) show the iterative reconstruction process for the reverse side of the coin. (a1)-(p1) are the Fourier spectra corresponding to (a)-(p). The yellow text in the bottom right corner is the number of iterations. NP, normalized power spectral density; NI, normalized intensity.
Fig. 9. Practical reconstruction results of the coin. (a)-(c) are Fourier spectra corresponding to the practical reconstruction results of the obverse side of the coin obtained by the traditional FSPI method, the U-Net method and the proposed method. (d)-(f) are the practical reconstruction results of the obverse side of the coin obtained by the traditional FSPI method, the U-Net method and the proposed method. (g) is the reference image of the obverse side of the coin. (h)-(j) are the Fourier spectra corresponding to the practical reconstruction results of the reverse side of the coin obtained by the traditional FSPI method, the U-Net method and the proposed method. (k)-(m) are the practical reconstruction results of the reverse side of the coin obtained by the traditional FSPI method, the U-Net method and the proposed method. (n) is the reference image of the reverse side of the coin. The blue value in the upper right corner of the reconstruction results is the edge intensity-based score. (o)-(r) correspond to close-up images indicated by the yellow dashed boxes 1, 2, 3 and 4, respectively. (s)-(v) represent the signal distribution indicated by the white dashed lines in (o), (p), (q) and (r), respectively. Re, reference image; NP, normalized power spectral density; NI, normalized intensity.
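The edge intensity-based score reported in Fig. 9 follows edge-gradient image quality metrics of the kind discussed in Refs. [48–52]. As a hedged illustration only, a mean Sobel gradient magnitude can serve as a simple proxy for such a score; the sketch below is an assumption for orientation, not necessarily the exact metric used in the paper.

```python
import numpy as np
from scipy.ndimage import sobel

def edge_intensity_score(img):
    # Mean Sobel gradient magnitude: a simple stand-in for an edge-intensity metric.
    img = np.asarray(img, dtype=float)
    gx = sobel(img, axis=1)   # horizontal gradient
    gy = sobel(img, axis=0)   # vertical gradient
    return float(np.mean(np.hypot(gx, gy)))
```

Sharper reconstructions produce larger gradient magnitudes around edges, so a higher score indicates better-preserved high-frequency detail.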

Equations (12)


$$P_{\varphi}(x, y, f_x, f_y) = a + b\cos(2\pi f_x x + 2\pi f_y y + \varphi),$$
$$D_{\varphi}(f_x, f_y) = d_n + \beta \iint_{S} R(x, y)\, P_{\varphi}(x, y, f_x, f_y)\, dx\, dy,$$
$$2b\beta \tilde{I}(f_x, f_y) = \left[ D_0(f_x, f_y) - D_{\pi}(f_x, f_y) \right] + j\left[ D_{\pi/2}(f_x, f_y) - D_{3\pi/2}(f_x, f_y) \right],$$
$$R = \mathcal{F}^{-1}[A].$$
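To make the acquisition model concrete, the following is a minimal NumPy sketch of the four-step phase-shifting scheme and the inverse-transform reconstruction summarized by the first four equations: four single-pixel measurements per spatial frequency are combined into one complex Fourier coefficient, a block of low-frequency coefficients is assembled, and an inverse FFT yields the image. The frequency convention (cycles per image), the parameter defaults and the function names are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def fourier_basis_pattern(h, w, fx, fy, phi, a=0.5, b=0.5):
    # P_phi(x, y, fx, fy) = a + b*cos(2*pi*fx*x/W + 2*pi*fy*y/H + phi),
    # with fx, fy in cycles per image (an assumed convention).
    y, x = np.mgrid[0:h, 0:w]
    return a + b * np.cos(2 * np.pi * fx * x / w + 2 * np.pi * fy * y / h + phi)

def acquire_coefficient(scene, fx, fy, beta=1.0, d_n=0.0):
    # Four single-pixel measurements D_phi at phi = 0, pi/2, pi, 3*pi/2,
    # combined into one complex Fourier coefficient by the differential scheme.
    h, w = scene.shape
    d = [d_n + beta * np.sum(scene * fourier_basis_pattern(h, w, fx, fy, phi))
         for phi in (0, np.pi / 2, np.pi, 3 * np.pi / 2)]
    return (d[0] - d[2]) + 1j * (d[1] - d[3])   # equals 2*b*beta times the DFT coefficient

# Acquire only a small square of low-frequency coefficients, then reconstruct by inverse FFT.
scene = np.random.rand(64, 64)                  # stand-in for the target reflectivity R(x, y)
A = np.zeros_like(scene, dtype=complex)
for fx in range(-3, 4):
    for fy in range(-3, 4):
        A[fy % 64, fx % 64] = acquire_coefficient(scene, fx, fy)
recon = np.real(np.fft.ifft2(A))                # low-pass estimate of R (up to the 2*b*beta scale)
```

Because each sampled frequency pair is accompanied by its conjugate, the inverse FFT returns a real-valued low-pass estimate of the target, which reflects the blurring behaviour of traditional FSPI at low sampling rates.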
$$dx = f(x, t)\,dt + g(t)\,dw,$$
$$dx = \left[ f(x, t) - g^{2}(t)\,\nabla_{x}\log p_{t}(x) \right] dt + g(t)\,d\bar{w},$$
$$dx = \sqrt{\frac{d[\sigma^{2}(t)]}{dt}}\,dw,$$
$$\theta^{*} = \arg\min_{\theta}\, \mathbb{E}_{t}\left\{ \lambda(t)\, \mathbb{E}_{x_{0}}\, \mathbb{E}_{x_{t}|x_{0}} \left[ \left\| S_{\theta}(x_{t}, t) - \nabla_{x_{t}}\log p_{t}(x_{t}|x_{0}) \right\|_{2}^{2} \right] \right\},$$
$$dx = -\,d[\sigma^{2}(t)]\, S_{\theta}(x_{t}, t) + \sqrt{\frac{d[\sigma^{2}(t)]}{dt}}\,d\bar{w},$$
$$\tilde{x}_{i} = x_{i} + (\sigma_{i+1}^{2} - \sigma_{i}^{2})\, S_{\theta}(x_{i}, \sigma_{i+1}) + \sqrt{\sigma_{i+1}^{2} - \sigma_{i}^{2}}\, z_{i},$$
$$x_{i-1} = \tilde{x}_{i} + \varepsilon_{i}\, S_{\theta}(\tilde{x}_{i}, \sigma_{i+1}) + \sqrt{2\varepsilon_{i}}\, z_{i},$$
$$\tilde{x}_{i} = \begin{cases} \mathcal{F}^{-1}\!\left[ \dfrac{\lambda A + \mathcal{F}(\tilde{x}_{i})}{1 + \lambda} \right], & x_{f} \in \Omega \\[4pt] \mathcal{F}^{-1}\left[ \mathcal{F}(\tilde{x}_{i}) \right], & x_{f} \notin \Omega \end{cases}$$
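The last three equations describe one reconstruction iteration: a predictor step between adjacent noise levels, a Langevin corrector at the current noise level, and a Fourier-domain consistency projection that re-imposes the measured low-frequency spectrum A on the sampled region Ω. Below is a minimal sketch of how these steps can be chained, assuming the trained score network S_θ is available as score_fn, Ω is supplied as a boolean mask on the unshifted FFT grid, and the noise schedule, step-size heuristic and λ default are illustrative choices rather than the authors' released implementation (see Ref. [56]).

```python
import numpy as np

def data_consistency(x, A, mask, lam=1.0):
    # Blend the current estimate with the measured spectrum A inside Omega (mask == True);
    # leave the unsampled frequencies untouched.
    X = np.fft.fft2(x)
    X[mask] = (lam * A[mask] + X[mask]) / (1.0 + lam)
    return np.real(np.fft.ifft2(X))

def pc_reconstruct(score_fn, A, mask, sigmas, snr=0.16, n_corrector=1, seed=0):
    # Predictor-corrector sampling constrained by the measured low-frequency spectrum.
    # sigmas is a decreasing sequence of noise levels; score_fn(x, sigma) stands in
    # for the trained score network S_theta.
    rng = np.random.default_rng(seed)
    h, w = A.shape
    x = sigmas[0] * rng.standard_normal((h, w))          # start from Gaussian noise
    for i in range(len(sigmas) - 1):
        s_hi, s_lo = sigmas[i], sigmas[i + 1]
        # Predictor: reverse-diffusion update between adjacent noise levels.
        z = rng.standard_normal((h, w))
        x = x + (s_hi**2 - s_lo**2) * score_fn(x, s_hi) + np.sqrt(s_hi**2 - s_lo**2) * z
        x = data_consistency(x, A, mask)
        # Corrector: annealed Langevin dynamics at the current noise level.
        for _ in range(n_corrector):
            eps = 2.0 * (snr * s_lo) ** 2                # step-size heuristic (assumption)
            z = rng.standard_normal((h, w))
            x = x + eps * score_fn(x, s_lo) + np.sqrt(2.0 * eps) * z
            x = data_consistency(x, A, mask)
    return x
```

Applying the consistency projection after every predictor and corrector update keeps the iterate anchored to the real-sampled low-frequency data while the learned prior fills in the missing high frequencies.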