
Dual-domain mean-reverting diffusion model-enhanced temporal compressive coherent diffraction imaging

Open Access

Abstract

Temporal compressive coherent diffraction imaging is a lensless imaging technique with the capability to capture fast-moving small objects. However, the accuracy of imaging reconstruction is often hindered by the loss of frequency domain information, a critical factor limiting the quality of the reconstructed images. To improve the quality of these reconstructed images, a method termed dual-domain mean-reverting diffusion model-enhanced temporal compressive coherent diffraction imaging (DMDTC) is introduced. DMDTC leverages the mean-reverting diffusion model to acquire prior information in both the frequency and spatial domains through sample learning. The frequency domain mean-reverting diffusion model is employed to recover missing information, while the hybrid input-output algorithm is carried out to reconstruct the spatial domain image. The spatial domain mean-reverting diffusion model is utilized for denoising and image restoration. DMDTC demonstrates a significant enhancement in the quality of the reconstructed images. The results indicate that the structural similarity and peak signal-to-noise ratio of images reconstructed by DMDTC surpass those obtained through conventional methods. DMDTC enables high temporal frame rates and high spatial resolution in coherent diffraction imaging.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Coherent diffraction imaging (CDI) [1] is a lensless imaging technique with the potential to achieve higher spatial resolution than direct imaging by generating high-resolution images containing both intensity and phase information from far-field coherent diffraction patterns. It leverages the coherence of light for imaging, yielding high-resolution and high-sensitivity results. This technology finds widespread application in fields such as materials science, biology, and nanotechnology [2–5]. However, the inverse problem of CDI is ill-conditioned, and the quality of traditional phase recovery algorithms [6] is constrained by support constraints and the high detection noise [7] that results from single frame imaging. Several improvement methods have been proposed to mitigate the excessive noise and low resolution associated with single frame imaging, including single-shot phase imaging with randomized light [8] and coherent modulation imaging [9], which address key challenges in CDI. However, the temporal resolution of CDI is limited by the frame rate of the camera, which makes it unable to capture fast-moving small targets. Enhancing the imaging frame rate of CDI is therefore a key issue that needs to be addressed.

With the intention of enhancing data collection efficiency, Donoho proposed compressed sensing (CS) [10]. It enables the capture of multiple time frames within a single frame, resulting in the acquisition of multiple images in a single compressed frame, thereby reducing data processing time and cost. This approach significantly enhances the quality of microscope imaging, lensless imaging, video snapshots, and holographic imaging [11–13]. Building upon temporal CS theory, an algorithm known as temporal compressive coherent diffraction imaging (TC-CDI) was introduced by Chen et al. [14]. The TC-CDI system integrates CS technology in the temporal domain to capture multi-frame images at a rapid pace and with high spatial resolution. These images are modulated by a digital micromirror device (DMD) and compressed into a single frame in the temporal domain. The TC-CDI system enables the reconstruction of up to 8 frames from a single snapshot measurement, thereby enhancing the ability of the camera to detect and track fast-moving small targets with heightened sensitivity. While TC-CDI can achieve multi-frame dynamic target recovery, its spatial image recovery accuracy is limited, necessitating the incorporation of prior information from both the frequency domain and the spatial domain to enhance the reconstruction quality.

In recent years, deep learning has experienced rapid advancement and has been successfully employed in CS, demonstrating its significant potential in the field of optical imaging and introducing innovative concepts to optical imaging research [15–17]. Recently, a diffusion model with powerful generative capabilities based on stochastic differential equations (SDE) was proposed [18], enabling prior information extraction and high-resolution sample generation. This development has notably enhanced the quality of image editing, medical imaging, and lensless imaging [19–22]. However, the traditional diffusion model necessitates multi-step iteration to achieve high-quality sample generation, resulting in substantial computational overhead and prolonged processing time. To expedite the conventional diffusion model, Luo et al. introduced image restoration with mean-reverting stochastic differential equations (IR-SDE) [23]. It only diffuses the image to a lower-quality state without introducing complete noise, effectively creating an intermediate state composed of a low-quality (LQ) image and Gaussian noise.

To improve the image quality of CDI under high temporal frame rates with compression ratios, an algorithm termed as dual-domain mean-reverting diffusion model-enhanced temporal compressive coherent diffraction imaging (DMDTC) has been introduced. DMDTC acquires prior information of the spatial domain and frequency domain through sample learning. During the reconstruction process, it utilizes frequency prior information to recover the missing data in the frequency domain, thereby improving the accuracy of the subsequent phase retrieval. Additionally, DMDTC incorporates learned spatial prior information to further denoise and restore the image in the spatial domain, demonstrating significant potential for high-speed frame acquisition and high-quality image reconstruction.

The remainder of this research is organized as follows. Section 2 will provide an overview of the foundational principles of coherent diffraction and the fundamentals of IR-SDE. Section 3 will introduce the implementation of DMDTC and explain the training and reconstruction procedures in detail. Section 4 will demonstrate the reconstruction performance of this approach through simulative and experimental validation. Section 5 will discuss the results under different compressive ratios and present an ablation experiment. Section 6 will conclude this work and outline the proposed approach.

2. Preliminary

2.1 Temporal compressive coherent diffraction imaging

The fundamental principle of TC-CDI is illustrated in Fig. 1. After the dynamic sample $O(x^{\prime},t)$ at time t is illuminated by the monochromatic surface light source ${U_0}(x^{\prime})$, the target in the field of view (FOV) is transformed into the frequency domain by the Fourier lens, where $x^{\prime}$ is the object plane coordinate. Subsequently, the light waves are modulated by the DMD, whose modulation function can be expressed as $M(x,t)$, to achieve compressed sampling of multi-frame temporal information. The intensity information $I(x)$ is captured by the camera as a single snapshot, where x is the frequency domain coordinate.


Fig. 1. Time compression CDI schematic diagram. ${U_0}(x^{\prime})$ represents the incident light. $O(x^{\prime},t)$ represents the moving sample. $P(x^{\prime})$ represents the FOV. $M(x,t)$ is the dynamic random modulation function. $I(x)$ represents the intensity on the pixelated detector.


According to the principle of Fraunhofer diffraction, $I(x)$ can be modeled as:

$$I(x) = \int_{\varDelta t} {{{||{U_t^{(F )}(x,t)M(x,t)} ||}^2}dt}$$
where $\varDelta t$ represents the exposure time of the camera. The frequency domain light field in the DMD plane can be represented as:
$$U_t^{(F )}(x) \propto F{\{{O(x^{\prime},t){U_0}(x^{\prime})P(x^{\prime})} \}_{\frac{{2\pi }}{{\lambda z}}x^{\prime}}}$$
where $F\{{\cdot} \}$ represents the Fourier transform and $P(x^{\prime})$ is the FOV. In practical scenarios, Eq. (1) is discretized: $M(x,t)$ becomes $M(:,:,t)$, whose entries take values in $\{{0,1} \}$. The measurement of the camera $Y \in {{\mathbb R}^{W \times H}}$ can be expressed as:
$$Y = \sum\limits_{t = 1}^T {{{||{U_t^{(F )}(:,:,t)} ||}^2} \odot M(:,:,t) + G} $$
where G is noise, t represents the index of frames and ${\odot} $ represents the Hadamard product. The vectorized form of Eq. (3) can be expressed as:
$${\mathbf y} = {\Phi \mathbf u} + {\mathbf g}$$
where ${\mathbf \Phi } \in {{\mathbb R}^{WH \times WHT}}$ is the sensing matrix, which is composed of multiple diagonal matrices concatenated in series. The diagonal elements of each diagonal matrix ${{\mathbf \Phi }_t} = Diag({vec({M({:,:,t} )} )} )$ consist of $vec({M({:,:,t} )} )$. Based on CS, Eq. (4) can be solved as follows:
$$\widehat {\mathbf u} = \arg \min ||{{\mathbf y - \varPhi \mathbf u}} ||_2^2 + \tau R({\mathbf u})$$
where $R({\mathbf u})$ represents regularization and $\tau $ is the balance parameter. The solution to Eq. (5) can be achieved through iterative reconstruction or a physics-driven deep unfolding network [11]. While this approach enables the acquisition of multi-frame frequency domain images, it presents challenges in fully restoring the missing frequency domain information. Therefore, it is necessary to utilize prior information in the frequency domain to enhance the quality of the spectrogram recovery.
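To make the discretized forward model concrete, the following NumPy sketch forms a single snapshot according to Eq. (3). The array shapes, random codes, and noise level are illustrative assumptions, not the parameters of the actual system.

```python
# Minimal sketch of the TC-CDI forward model in Eq. (3); all values are illustrative.
import numpy as np

W, H, T = 512, 512, 20                       # detector size and number of frames
rng = np.random.default_rng(0)

frames = rng.random((W, H, T))               # stand-in for the spatial frames O(x', t)
masks = rng.integers(0, 2, (W, H, T))        # binary DMD codes M(:, :, t) in {0, 1}

# Frequency-domain intensities |U_t^{(F)}|^2 for each frame (Fraunhofer regime)
intensities = np.abs(np.fft.fftshift(np.fft.fft2(frames, axes=(0, 1)), axes=(0, 1))) ** 2

# Single-snapshot measurement: Hadamard product with the codes, summed over time, plus noise G
Y = (intensities * masks).sum(axis=2)
Y = Y + rng.normal(0.0, 0.01 * Y.std(), Y.shape)
```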

2.2 Mean-reverting SDE

Brownian motion is a stochastic process that introduces continuous, random noise into a system over time. The diffusion model simulates Brownian motion: it gradually adds Gaussian noise to the image and then learns to invert this process. This characteristic of random noise makes the diffusion model a valuable tool for describing various natural phenomena, such as particle movement and fluctuations in financial markets. The transformation of images from high quality to Gaussian noise can be viewed as a continuous addition of noise. This process can be characterized as a stochastic process, where the high-resolution image is considered a deterministic signal and the noise is treated as a stochastic process. To better replicate this process, it is modeled as the solution of an SDE:

$$dx = f(x,t)dt + g(t)dw$$
where $dw$ is the increment of a Wiener process (Gaussian noise), and the drift coefficient $f(x,t)$ describes the interaction between the image signal and the noise, determining the direction and intensity of the influence of the noise on the image signal. The diffusion coefficient $g(t)$ describes the diffusion speed of the noise and determines the degree of its impact on the image signal.

As noted above, the traditional diffusion model necessitates multi-step iteration to achieve high-quality sample generation, resulting in substantial computational overhead and prolonged processing time. Luo et al. [23] proposed image restoration with mean-reverting stochastic differential equations (IR-SDE), which provides a new approach to accelerating the diffusion model.

IR-SDE adds a parameter $\mu $ to the original model, indicating that the diffusion process runs from the high-quality image ${x_0}$ to the low-quality target $\mu $; Eq. (6) can then be converted to:

$$dx = {\theta _t}(\mu - x)dt + {\sigma _t}dw$$

After a certain number of steps, the entire SDE converges to $\mu $ with stationary Gaussian noise $\varepsilon $. Let ${{\sigma _t^2} / {{\theta _t}}} = 2{\lambda ^2}$; then, given any starting state $x(s)$ at time $s < t$, Eq. (7) can be solved as:

$$x(t) = \mu + (x(s) - \mu ){e^{ - {{\overline \theta }_{s:t}}}} + \int_s^t {{\sigma _z}} {e^{ - {{\overline \theta }_{z:t}}}}dw(z)$$
where ${\bar{\theta }_{s:t}} = \int_s^t {{\theta _z}} dz$ is known and the transition kernel $p({x_t}|{x_0}) = N({x_t}|{m_t}(x),{v_t})$ is Gaussian with mean ${m_t}(x)$ and variance ${v_t}$, which can be obtained from the following equation:
$$\left\{ \begin{array}{l} {m_t}(x )= \mu + ({{x_0} - \mu } ){e^{ - {{\overline \theta }_t}}}\\ {v_t} = \int_0^t {\sigma_z^2{e^{ - 2{{\overline \theta }_{z:t}}}}dz} = {\lambda^2}(1 - {e^{ - 2{{\overline \theta }_t}}}) \end{array} \right.$$
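For illustration, the closed-form marginal in Eq. (9) can be sampled directly, without stepping through the SDE. The sketch below assumes a discretized $\theta$ schedule (the `thetas` array); the schedule and noise scale are assumptions, not the values used in the paper.

```python
# Sample x_t ~ N(m_t, v_t) for the mean-reverting SDE of Eq. (7), using Eq. (9).
import numpy as np

def forward_marginal(x0, mu, t, thetas, lam=25.0, rng=np.random.default_rng(0)):
    """Closed-form forward state at step t, with sigma_t^2 / theta_t = 2 * lam^2."""
    theta_bar = thetas[:t].sum()                       # cumulative \bar{theta}_t
    m_t = mu + (x0 - mu) * np.exp(-theta_bar)          # mean drifts from x0 toward mu
    v_t = lam ** 2 * (1.0 - np.exp(-2.0 * theta_bar))  # variance saturates at lam^2
    return m_t + np.sqrt(v_t) * rng.standard_normal(x0.shape)
```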

An overview is presented in Fig. 2, and the diffusion process can be inverted from $\mu $ to high-quality images by using numerical techniques:

$$dx = [{{\theta_t}(\mu - x) - \sigma_t^2{\nabla_x}\log {p_t}(x)} ]dt + {\sigma _t}d\widehat w$$
where the only uncertain component is the score ${\nabla _x}\log {p_t}(x)$ of the temporal data distribution, which can be obtained during the training process. As long as the network obtains the noise level ${\varepsilon _t}$ at each time node from ${x_0}$ to $\mu $, it can determine the score function of the data distribution at a specific moment. To achieve this goal, the instantaneous noise is estimated by exploiting the conditional time correlation network ${\widetilde \varepsilon _\phi }({x_i},\mu ,t)$.


Fig. 2. An overview of mean-reverting stochastic differential equations.


However, employing neural networks directly to learn instantaneous noise is a challenging task that can potentially result in unstable training. To solve this problem, the maximum likelihood estimation method is used to calculate the optimal path for image restoration. This method not only facilitates the model in estimating the optimal direction for image restoration but also produces an output that closely resembles the real image. The training process can be expressed as follows:

$$x_{i - 1}^\ast{=} \arg \mathop {\min }\limits_{{x_{i - 1}}} [{ - \log p({{x_{i - 1}}|{x_i},{x_0}})} ]$$
where $x_{i - 1}^\ast $ represents the ideal state reversed from ${x_i}$. According to the Bayes rule, the optimum reversing solution for ${x_i} \to {x_{i - 1}}$ can be expressed as:
$$x_{i - 1}^\ast{=} \frac{{1 - {e^{ - 2{{\overline \theta }_{i - 1}}}}}}{{1 - {e^{ - 2{{\overline \theta }_i}}}}}{e^{ - \theta _i^{\prime}}}({x_i} - \mu ) + \frac{{1 - {e^{ - 2\theta _i^{\prime}}}}}{{1 - {e^{ - 2{{\overline \theta }_i}}}}}{e^{ - {{\overline \theta }_{i - 1}}}}({x_0} - \mu ) + \mu $$

where $\theta _i^{\prime} = {\overline \theta _i} - {\overline \theta _{i - 1}}$. Subsequently, the noise network ${\widetilde \varepsilon _\phi }({x_i},\mu ,t)$ is optimized so that the mean-reverting SDE is aligned with the optimal trajectory as follows:

$${L_\gamma }(\phi ) = \sum\limits_{i = 1}^T {{\gamma _i}} E[{||{{x_i} - {{(d{x_i})}_{{{\widetilde \varepsilon }_\phi }}} - x_{i - 1}^\ast } ||} ]$$
where ${\gamma _1},\ldots ,{\gamma _T}$ are positive weights, ${(d{x_i})_{{{\tilde{\varepsilon }}_\phi }}}$ represents the reverse-time SDE in Eq. (10) whose score is calculated by the network ${\widetilde \varepsilon _\phi }$, and E stands for mathematical expectation. Replacing the instantaneous noise loss function with this simple loss function results in a more stable noise network, leading to a continuous improvement in image denoising performance. According to the loss function, the state of the low-resolution image at the previous moment can be found, and the high-quality image at $t = 0$ can then be deduced.
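The training step can be summarized in a short sketch: the analytically optimal state $x_{i-1}^\ast$ of Eq. (12) is computed from the cumulative $\bar{\theta}$ schedule, and the network's one-step reversal is regressed onto it as in Eq. (13). The interface `net(x_i, mu, i)` returning the reverse increment is a simplifying assumption, and `theta_bar` is assumed to be a 1-D tensor of cumulative thetas with $i \ge 1$.

```python
# Sketch of the maximum-likelihood training target of Eqs. (12)-(13).
import torch

def optimal_reverse_state(x_i, x0, mu, theta_bar, i):
    """Eq. (12): optimum x*_{i-1} given x_i, x_0, and cumulative thetas theta_bar."""
    dtheta = theta_bar[i] - theta_bar[i - 1]          # theta'_i
    a = (1 - torch.exp(-2 * theta_bar[i - 1])) / (1 - torch.exp(-2 * theta_bar[i]))
    b = (1 - torch.exp(-2 * dtheta)) / (1 - torch.exp(-2 * theta_bar[i]))
    return (a * torch.exp(-dtheta) * (x_i - mu)
            + b * torch.exp(-theta_bar[i - 1]) * (x0 - mu) + mu)

def loss_step(net, x_i, x0, mu, theta_bar, i, gamma_i=1.0):
    """Eq. (13): penalize the gap between the network's reversal and x*_{i-1}."""
    x_star = optimal_reverse_state(x_i, x0, mu, theta_bar, i)
    x_prev_pred = x_i - net(x_i, mu, i)               # reversal from the learned score
    return gamma_i * torch.mean(torch.abs(x_prev_pred - x_star))
```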

3. Method

To achieve high-quality reconstruction of TC-CDI with a high compression ratio, this paper proposes a method termed dual-domain mean-reverting diffusion model-enhanced temporal compressive coherent diffraction imaging (DMDTC), the main procedure of which is presented in Fig. 3.


Fig. 3. The main procedure of DMDTC. During the training phase, the fully sampled and under-sampled images are input in pairs into the frequency domain and spatial domain scoring networks to obtain the dual-domain prior information. In the reconstruction stage, the initial step involves the temporal unfolding of the frequency domain image, followed by the utilization of frequency domain prior information to facilitate the recovery of the missing frequency domain information. Subsequently, the resulting image is fed into the hybrid input-output algorithm for spatial image reconstruction. Finally, the spatial domain image undergoes denoising based on spatial domain prior information.


With the aim of obtaining prior information on the frequency domain and spatial domain image distributions, two score networks are set up for training. In the frequency domain scoring network, the fully sampled spectrogram $U_0^{(F)}$ and the sparse spectrogram ${\mu ^{(F)}}$ sampled by random masks are input into the scoring network in pairs. In the neural network, Gaussian noise is gradually introduced into $U_0^{(F)}$, making it approach ${\mu ^{(F)}}$. This process can be expressed as:

$$d{U^{(F)}} = {\theta _t}({\mu ^{(F)}} - U_0^{(F)})dt + {\sigma _t}dw$$

In the spatial domain scoring network, the fully sampled spatial domain image $U_0^{(S)}$ and the sparse spatial image ${\mu ^{(S)}}$, under-sampled by random masks and artifact blocks, are put into the scoring network in pairs. Eq. (7) can then be remodeled as:

$$d{U^{(S)}} = {\theta _t}({\mu ^{(S)}} - U_0^{(S)})dt + {\sigma _t}dw$$

By solving Eq. (14) and Eq. (15), the instantaneous noise distribution of the process can be estimated as:

$$\begin{cases}S_{\theta} ^{(F)} (U^{(F)}, t)\simeq \nabla _{U^{(F)}}\log p_t(U^{(F)}) \hfill \cr S_\theta ^{(S)} (U^{(S)}, t)\simeq \nabla _{U^{(S)}}\log p_t(U^{(S)})\end{cases}.$$

In the reconstruction stage, the time-domain compressed spectrogram is restored to $U_r^{(F)}$ using the method described in Section 2.1, where $r = 1,2,3, \cdots ,R$ is the index of frequency domain frames. These reconstructed frequency-domain frames $U_r^{(F)}$, masked via the Hadamard product, are then individually fed into the trained frequency domain scoring network. Using the score distribution $S_\theta ^{(F)}$ obtained during the training phase, the SDE is iteratively inverted. The process can be remodeled as:

$$d{\hat{U}_r}^{(F)} = [{{\theta_t}(U_r^{(F)} - {{\hat{U}}_r}^{(F)}) - \sigma_t^2 S_\theta^{(F)}} ]dt + {\sigma _t}d\widehat w$$
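A single reverse step of Eq. (17) can be sketched as an Euler-Maruyama update. Here `score_net` stands in for the trained frequency domain scoring network, and the schedules $\theta_t$, $\sigma_t$ are assumed to be supplied externally.

```python
# One Euler-Maruyama step of the reverse-time SDE in Eq. (17); dt < 0 (backward in time).
import torch

def reverse_step(x, mu, score_net, t, theta_t, sigma_t, dt):
    score = score_net(x, mu, t)                        # learned approx. of grad_x log p_t(x)
    drift = theta_t * (mu - x) - sigma_t ** 2 * score  # reverse drift of Eq. (17)
    noise = sigma_t * (abs(dt) ** 0.5) * torch.randn_like(x)
    return x + drift * dt + noise
```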

Since the measured spectrogram contains only intensity information, the phase information of the image is retrieved and the spatial image is recovered using a hybrid input-output algorithm:

$$f_i^{\prime}[n] = {F^{ - 1}}\left\{ {|{\hat{U}_r^{(F)}} |\frac{{{F_i}[k]}}{{|{{F_i}[k]} |}}} \right\}$$

In the iteration, Eq. (18) is used to maintain fidelity, where ${F_i}$ and ${f_i}$ respectively represent the frequency domain signal and the spatial domain image at the $i$-th iteration, and F and ${F^{ - 1}}$ represent the Fourier transform and inverse Fourier transform.

$${f_{i + 1}}[n] = \left\{ \begin{array}{l} f_i^{\prime}[n],\quad \quad \quad \qquad \textrm{ }if\textrm{ n} \notin S\\ {f_i}[n] - \beta f_i^{\prime}[n],\quad\textrm{ }if\textrm{ n} \in S \end{array} \right.$$
where n and k are the one-to-one corresponding coordinates in the spatial and frequency domains, and $\beta $ is a constant feedback parameter. S represents the set of points violating the support constraint, on which the feedback correction is imposed. During each iteration, Eq. (18) is utilized, where the reconstructed frequency domain frames $\hat{U}_r^{(F)}$ serve as fidelity terms, enabling the imposition of amplitude constraints in the Fourier domain. Subsequently, the inverse Fourier transform is conducted to convert the target into the spatial domain. The constraint is carried out through Eq. (19), and then the DNN network is used for denoising. After numerous iterations, the spatial image $U_r^{(S)}$, $r = 1,2,3,\ldots ,R$, can be successfully reconstructed.
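The following NumPy sketch combines Eqs. (18) and (19) into a minimal HIO loop, using the restored magnitude $|\hat{U}_r^{(F)}|$ as the fidelity term. The support mask, feedback parameter $\beta$, and the non-negativity check are illustrative choices, and the DNN denoising step is omitted.

```python
# Minimal hybrid input-output loop, Eqs. (18)-(19); `support` is True inside the FOV.
import numpy as np

def hio(magnitude, support, n_iters=60, beta=0.9, rng=np.random.default_rng(0)):
    f = rng.random(magnitude.shape)                    # random initial spatial estimate
    for _ in range(n_iters):
        F = np.fft.fft2(f)
        F = magnitude * F / (np.abs(F) + 1e-12)        # Eq. (18): impose measured magnitude
        f_prime = np.real(np.fft.ifft2(F))
        violated = (~support) | (f_prime < 0)          # points breaking object constraints
        f = np.where(violated, f - beta * f_prime, f_prime)  # Eq. (19) feedback update
    return f
```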

To further improve the reconstruction quality, spatial prior information is incorporated for constraints. The reconstructed spatial domain image $U_r^{(S)}$ is used as the mean input for the IR-SDE network. The process can be remodeled as:

$$d\hat{U}_r^{(S)} = [{{\theta_t}(U_r^{(S)} - \hat{U}_r^{(S)}) - \sigma_t^2 S_\theta^{(S)}} ]dt + {\sigma _t}d\widehat w$$

After iteratively applying the reverse SDE, the high-quality spatial images $\hat{U}_r^{(S)}$ can be generated by the network. Algorithm 1 states the pseudocode of DMDTC.
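As a condensed view of Algorithm 1, the pipeline can be sketched as follows. The callables `unfold_temporal`, `reverse_sde`, and `hio_fn` are placeholders standing in for the steps described above, not the authors' released API.

```python
# High-level sketch of the DMDTC reconstruction pipeline (Algorithm 1).
import numpy as np

def dmdtc(snapshot, masks, unfold_temporal, reverse_sde, hio_fn,
          freq_score_net, spat_score_net, support, R=20, L=60):
    U_F = unfold_temporal(snapshot, masks)                 # Sec. 2.1: R frequency frames
    outputs = []
    for r in range(R):
        U_hat_F = reverse_sde(U_F[r], freq_score_net)      # Eq. (17): fill missing spectrum
        U_S = hio_fn(np.abs(U_hat_F), support, n_iters=L)  # Eqs. (18)-(19): phase retrieval
        outputs.append(reverse_sde(U_S, spat_score_net))   # Eq. (20): spatial restoration
    return outputs
```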

4. Experiments

4.1 Data specification and parameter selection

When training DMDTC, the MNIST dataset is utilized, which contains a diverse range of handwritten numerical styles to construct the training data. A total of 9,000 binarized images from the MNIST dataset are randomly selected without repetition. Each 28 × 28 image from the MNIST dataset is upscaled to 40 × 40 and then embedded in the center of a 512 × 512 black background for spatial domain training. This dataset is utilized as the GT input for the spatial domain training network, following which the GT is under-sampled using random artifact blocks and random masks to serve as the LQ input. For training the frequency domain network, the dataset is converted to the frequency domain by Fourier transform and modeled to simulate the detector. This dataset is utilized as the GT input for the frequency domain training network, following which the GT is under-sampled using random masks to serve as the LQ input. The number of discretization steps of the mean-reverting SDE, T, is set to 100, the noise scale ${\sigma _t}$ is set to 25, the number of frames R is set to 20, and the number of DNN-HIO steps L is set to 60.
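The data preparation just described can be sketched as follows. The binarization threshold, nearest-neighbor resampling, and centering coordinates are assumptions consistent with the stated 40 × 40 and 512 × 512 sizes.

```python
# Sketch of spatial / frequency GT preparation for training; parameters are illustrative.
import numpy as np
from PIL import Image

def make_spatial_gt(digit_28x28):
    img = Image.fromarray((digit_28x28 > 0.5).astype(np.uint8) * 255)  # binarize
    img = img.resize((40, 40), Image.NEAREST)                          # upscale to 40 x 40
    canvas = np.zeros((512, 512), dtype=np.uint8)
    canvas[236:276, 236:276] = np.array(img)                           # embed at the center
    return canvas

def make_freq_gt(spatial_gt):
    # Fourier transform of the spatial GT serves as the frequency-domain GT
    return np.fft.fftshift(np.fft.fft2(spatial_gt.astype(np.float32)))
```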

The mask values used for sampling in the frequency and spatial domain training are {0, 0.7}. Random artifact blocks with edge sizes ranging from 0 to 3 and gray values uniformly distributed from 0 to 200 are applied to the under-sampled images in spatial domain training. The two networks are trained using the Adam optimizer with a learning rate of 0.0001. During the image reconstruction phase, 100 iterations are performed for the frequency and spatial domain networks. This research is implemented on an NVIDIA GeForce RTX 3090 Ti graphics processor with 24 GB memory, and all experiments were conducted in PyTorch.

4.2 Simulative validation

Simulative validation is conducted to evaluate the effectiveness of the proposed DMDTC in coherent diffraction imaging reconstruction.

Simulative experiments are conducted using an additional 20 images selected from the MNIST dataset, apart from the training dataset, as imaging targets. The Fourier transform is applied to these 20 images, followed by a Hadamard product with masks to under-sample them. The frequency domain data is modeled to simulate the detector. The 20 images are then compressed into a single frame measurement for reconstruction. To demonstrate the superiority of DMDTC, an additional TC-CDI experiment is also conducted for comparative purposes. The ground truth (GT) is compared with the reconstructed images obtained through the TC-CDI and DMDTC methods, as illustrated in Fig. 4. To facilitate a clearer observation of the experimental results, the images sized 512 × 512 are cropped to 128 × 128, centered on the region of interest.


Fig. 4. Comparison experiments on the MNIST dataset. (a) is the ground truth. (b) is the result of TC-CDI. (c) is the result of DMDTC. (d) is the profile along the red line of the images.


The proposed DMDTC reconstruction closely resembles the GT, displaying sharp edges and minimal noise. In contrast, the reconstructed results of TC-CDI exhibit low brightness, blurred outlines, and noticeable artifacts. As depicted in Fig. 4(d), the DMDTC profiles exhibit higher similarity to the GT profiles and less noise.

To quantitatively assess the experiments in this study, peak signal-to-noise ratio (PSNR) and structural similarity index measurement (SSIM) were employed as evaluation metrics.

The PSNR is a metric utilized to gauge the relationship between signal and noise. The calculation formula for PSNR is as follows:

$$ \operatorname{PSNR}\left(U^{(S)}, \hat{U}^{(S)}\right)=10 \lg \frac{255^2}{\operatorname{MSE}\left(U^{(S)}, \hat{U}^{(S)}\right)} $$
where $U^{(S)}$ and $\hat{U}^{(S)}$ represent the target image and reconstructed image, respectively. MSE stands for mean square error, which represents the average of the squared differences of each pixel value between $U^{(S)}$ and $\hat{U}^{(S)}$. The larger the PSNR value, the better the quality of the reconstructed image.
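For 8-bit images, the PSNR formula above reduces to a few lines; the small epsilon guarding the zero-MSE case is an implementation convenience.

```python
# PSNR for 8-bit images, as defined above.
import numpy as np

def psnr(gt, rec):
    mse = np.mean((gt.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / (mse + 1e-12))
```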

The SSIM is an index that measures the structural similarity of two images.

$$SSIM({U^{(S)}},{\hat{U}^{(S)}}) = \frac{{(2{\mu _{{U^{(S)}}}}{\mu _{{{\hat{U}}^{(S)}}}} + {c_1})(2{\sigma _{{U^{(S)}}{{\hat{U}}^{(S)}}}} + {c_2})}}{{(\mu _{{U^{(S)}}}^2 + \mu _{{{\hat{U}}^{(S)}}}^2 + {c_1})(\sigma _{{U^{(S)}}}^2 + \sigma _{{{\hat{U}}^{(S)}}}^2 + {c_2})}}$$
where ${\mu _{{U^{(S)}}}}$ and ${\mu _{{{\hat{U}}^{(S)}}}}$ represent the mean values of the images, $\sigma _{{U^{(S)}}}^2$ and $\sigma _{{{\hat{U}}^{(S)}}}^2$ represent the variances of the images, and ${\sigma _{{U^{(S)}}{{\hat{U}}^{(S)}}}}$ represents their covariance. ${c_1}$ and ${c_2}$ are constants used for maintaining stability. The value range of SSIM is [0,1]; the greater the value, the lesser the image distortion.
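A single-window (global) form of the SSIM formula above is sketched below for quick checks; in practice the metric is typically computed over local windows, e.g. with skimage.metrics.structural_similarity, and the constants $c_1$, $c_2$ here follow the common 8-bit convention rather than values stated in the paper.

```python
# Global SSIM over the whole image, following the formula above.
import numpy as np

def ssim_global(u, v, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    u, v = u.astype(np.float64), v.astype(np.float64)
    mu_u, mu_v = u.mean(), v.mean()
    cov = ((u - mu_u) * (v - mu_v)).mean()
    return ((2 * mu_u * mu_v + c1) * (2 * cov + c2)) / (
        (mu_u ** 2 + mu_v ** 2 + c1) * (u.var() + v.var() + c2))
```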

Quantitative analysis results of simulative validation are presented in Table 1, with the values obtained by averaging over the 20 reconstructed images. In comparison, the average PSNR of DMDTC is 3.12 dB higher than that of TC-CDI, and the SSIM value is as high as 0.9870, which demonstrates the effectiveness of the proposed DMDTC in coherent diffraction imaging reconstruction.


Table 1. Average PSNR and SSIM of simulative validation

4.3 Generalization validation

To validate the efficacy of the proposed DMDTC across diverse datasets and its capability in simulating the process of actual camera sampling, two sets of generalization experiments are conducted.

The first set of experiments involved more intricate images sized 28 × 28, sourced from the Quick, Draw! dataset. These images were directly superimposed onto a black background sized 512 × 512. Subsequently, the images underwent a Hadamard product with masks for under-sampling, followed by transformation into frequency domain representations using the Fourier transform. After this transformation, the 20 frequency domain images were compressed into a single measurement for reconstruction. The reconstruction results are depicted in Fig. 5. To improve the visibility and interpretability of the experimental results, the original 512 × 512 images were cropped to 128 × 128, centered on the region of interest.


Fig. 5. The reconstructed results of complex 28 × 28 patterns.


In the case of the more intricate pattern with a size of 28 × 28, the proposed DMDTC method demonstrates superior reconstruction performance. The reconstruction outcomes of DMDTC show enhanced object contours, increased level of details, more uniform pixel distribution, and reduced noise and artifacts. Conversely, the reconstruction results of the traditional TC-CDI method exhibit unclear contours, higher levels of noise and artifacts, and uneven pixel distribution.

In terms of quantitative analysis, the PSNR and SSIM of the reconstruction results obtained by the proposed DMDTC surpass those obtained using the conventional TC-CDI method. The highest PSNR and SSIM values achieved are 32.89 dB and 0.9947, respectively. Moreover, the SSIM values of the reconstruction results exceed 0.99, indicating a notable enhancement in image quality and fidelity compared to the traditional TC-CDI approach.

The second experiment involved reconstructing a dynamic target scan to simulate real camera functionality. A binary representation of the word “Nanchang” was created using the Times New Roman font for reconstruction, as illustrated in Fig. 6.


Fig. 6. An image of “Nanchang” for dynamic measurement. The red box is the field of view.


The red box sized 40 × 50 in Fig. 6 symbolizes the FOV $P(x^{\prime})$; for each frame, it moved 11 pixels to the right across the target object. In total, 20 frames were generated. Subsequently, the Fourier transform is applied to the 20 images, and the under-sampled data is obtained through the Hadamard product with masks. These 20 under-sampled frequency domain images were then combined into a single snapshot for reconstruction. The reconstruction results and normalized residuals obtained through TC-CDI and DMDTC are depicted in Fig. 7.
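The sliding-window measurement just described can be sketched as a simple cropping loop; the starting coordinates are assumptions, while the 40 × 50 window, 11-pixel step, and 20 frames follow the text.

```python
# Generate the 20 FOV crops of the "Nanchang" target; start position is illustrative.
import numpy as np

def scan_frames(target, fov_h=40, fov_w=50, step=11, n_frames=20, y0=0, x0=0):
    frames = []
    for r in range(n_frames):
        x = x0 + r * step                              # window moves right each frame
        frames.append(target[y0:y0 + fov_h, x:x + fov_w])
    return np.stack(frames)                            # (20, 40, 50) stack of crops
```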


Fig. 7. Generalization validation results. (a) depicts the ground truth. (b1) shows the result of TC-CDI. (b2) displays the residual between (a) and (b1). (c1) illustrates the result of DMDTC. (c2) exhibits the residual between (a) and (c1).


As depicted in Fig. 7(b1) and (c1), the artifacts present in the reconstructed images generated by DMDTC are minimal, and the pixel intensity closely approximates that of the ground truth (GT). Conversely, the artifacts in the reconstructed images produced by TC-CDI are conspicuous, with residuals distributed extensively throughout the reconstructed results, leading to an uneven pixel distribution. The comparison of residuals reveals a stark contrast between TC-CDI and GT, as depicted in the first row of Fig. 7(b2): residuals are evident across nearly the entire target, with some areas exhibiting residuals reaching a value of 1. Conversely, as illustrated in Fig. 7(c2), the residuals between DMDTC and GT approach 0, signifying a substantial enhancement in the reconstruction quality achieved by DMDTC.

Quantitative analysis results of the generalization validation are presented in Table 2, with the values obtained by averaging over the 20 reconstructed images. The proposed DMDTC has a positive effect on the reconstruction of dynamic frame objects in a different dataset, as evidenced by the increase in the average values of PSNR and SSIM, demonstrating the strong generalization of the proposed DMDTC.


Table 2. Average PSNR and SSIM of Generalization validation

4.4 Experimental validation

Experimental validation is conducted to verify the performance of the proposed DMDTC in practical scenarios. This experiment utilizes real experimental data provided by Chen et al. [14]. A laser with a central wavelength of 780 nm and a spectral linewidth of 50 kHz is utilized as the light source. The measurements encoded through the DMD (1024 × 768 pixels, pixel pitch of 13.68 µm) are captured by a camera (1024 × 1280 pixels, pixel pitch of 4.8 µm). The imaging target is the “westlake” characters etched on a steel sheet, and the measurement data is a frequency domain image compressed from 20 frames through sparse sampling. The reconstruction results are illustrated in Fig. 8. To facilitate a more precise assessment of the reconstruction quality, the images sized 512 × 512 are cropped to 128 × 128, centered on the region of interest.


Fig. 8. Experimental validation comparison. The red boxes in the images highlight the significant differences between the two methods.


As depicted in the red box within Fig. 8, it is evident that the TC-CDI reconstruction image suffers from significant loss of detail, leading to fractures in the contiguous regions and an uneven distribution of pixel intensities throughout the image. Notably, in column 1, the letter “w” experiences substantial information loss under TC-CDI reconstruction, resulting in an uneven distribution of pixel values in the reconstructed image. In contrast, images reconstructed using DMDTC demonstrate heightened sharpness, a more uniform distribution of pixel intensities, and a reduced presence of artifacts. Specifically, in the fourth column, the reconstructed image under TC-CDI exhibits conspicuous fractures. The proposed DMDTC leverages prior information from both the frequency domain and spatial domain to rectify these deficiencies and enhance the quality of reconstruction.

5. Discussion

5.1 Effect of compression ratio on reconstruction quality

To discuss the impact of the number of compressed frames R on the reconstruction results, this work examines the effectiveness and robustness of image reconstruction using DMDTC under different compressive ratios C = 1/R, for C = 1/10, 1/12, 1/14, 1/16, 1/18, and 1/20. In this experiment, 20 pictures are randomly selected from the MNIST dataset. The 20 images are transformed into frequency domain images, which are then under-sampled by random masks. For a compression ratio of 1/R, the first R of the 20 images are compressed in order and input into the model for reconstruction. The results for the different compressive ratios are presented in Fig. 9. To improve the visibility and interpretability of the experimental results, the original 512 × 512 images were cropped to 128 × 128, centered on the region of interest.
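The compression-ratio study can be set up with a one-line compressor that sums the first R under-sampled frequency frames into a snapshot, looping over the ratios listed above; `intensities` and `masks` are assumed to be (512, 512, 20) arrays prepared as in Section 4.1.

```python
# Build one snapshot per compressive ratio C = 1/R from the same 20-frame stack.
import numpy as np

def snapshot_at_ratio(intensities, masks, R):
    """Compress the first R under-sampled frequency frames into a single measurement."""
    return (intensities[..., :R] * masks[..., :R]).sum(axis=2)

# Usage: for R in (10, 12, 14, 16, 18, 20), feed snapshot_at_ratio(...) to the pipeline.
```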


Fig. 9. Reconstructed images under different compressive ratios.


For the same image, it is evident that the image reconstructed by TC-CDI exhibits good reconstruction quality at a compression ratio of C = 1/10. However, when the compression ratio decreases to C = 1/20, the image reconstructed by TC-CDI is markedly distorted in target shape. This disparity in quality can be ascribed to the heightened loss of pertinent data when an excessive amount of information is compressed into a single snapshot, leading to a deterioration in the quality of the reconstructed image. Nonetheless, across varying compression ratios, the reconstructed image of the proposed DMDTC demonstrates minimal deviation from the GT, signifying the stability and robustness of the proposed DMDTC.

A quantitative analysis of PSNR and SSIM, as depicted in Fig. 9, shows that the SSIM value of the image reconstructed using the proposed DMDTC approach closely approximates 0.99, while the PSNR is enhanced and consistently maintained at a high level, which proves the effectiveness of DMDTC under different compressive ratios.

5.2 Ablation experiment

Ablation experiments are conducted to discuss the effects of frequency domain prior information, spatial domain prior information, and dual-domain prior information on the reconstruction results. The “Nanchang” target used in Section 4.3 is employed for reconstruction, and two additional sets of experiments are conducted.

The reconstruction results are depicted in Fig. 10. The first column showcases the reconstructed outputs obtained through TC-CDI. The second column demonstrates the reconstruction outcomes of denoising and image restoration achieved through the integration of TC-CDI with the spatial domain mean-reverting diffusion model. The third column presents the reconstruction results of information filling on frequency domain images accomplished through the fusion of TC-CDI with the frequency domain mean-reverting diffusion model. Lastly, the fourth column exhibits the reconstruction results utilizing DMDTC.


Fig. 10. Ablation experiment. (a) depicts the ground truth. (b) shows the result of TC-CDI. (c) exhibits the result of the spatial domain mean-reverting diffusion model. (d) displays the result of the frequency domain mean-reverting diffusion model. (e) showcases the result of DMDTC.


The results above indicate that utilizing the mean-reverting diffusion model to assimilate prior information in either the frequency domain or the spatial domain significantly enhances the image reconstruction quality. Solely employing the spatial domain mean-reverting diffusion model facilitates the removal of certain artifacts, but it fails to rectify losses in the frequency domain, exemplified by the redundant connection in the letter “n” in Fig. 10(c) and the absent connection in the letter “g” in Fig. 10(c). Exclusively employing the frequency domain mean-reverting diffusion model effectively mitigates image loss within the frequency domain such as improved shape of the reconstructed images, but it does not entirely alleviate artifact and intensity issues. As illustrated in Fig. 10(d), the pixel intensity of the letters “n” and “g” markedly deviates from that of the GT, and the artifact in the letter “g” is conspicuous. As depicted in Fig. 10(e), the combined utilization of the dual-domain mean-reverting diffusion model exhibits minimal artifacts and pixel intensity that closely aligns with that of the GT.

In the quantitative analysis of PSNR and SSIM presented in Fig. 10, it is evident that the exclusive utilization of either the frequency domain or spatial mean-reverting diffusion model may yield only marginal improvements in the quality of reconstructed images, and in certain instances, may even result in a deterioration. Conversely, the adoption of the proposed DMDTC markedly elevates both the PSNR and SSIM of the reconstructed images. Notably, the dual-domain mean-reverting diffusion model exhibits superior PSNR and SSIM values compared to the single-domain diffusion model.

6. Conclusion

This study proposes temporal compressive coherent diffraction imaging combined with a dual-domain mean-reverting diffusion model, termed DMDTC. Through the utilization of extracted prior information concerning the data distribution in both the frequency and spatial domains of the images, the proposed DMDTC significantly enhances the quality of image reconstruction in comparison to traditional TC-CDI. The results indicate that the images reconstructed using DMDTC exhibit fewer artifacts, and their shapes and pixel intensities are more similar to the GT. In simulative validation, both PSNR and SSIM metrics show significant improvement; the highest PSNR and SSIM are 34.41 dB and 0.9964, respectively. The proposed DMDTC also performs excellently in generalization validation, which proves that it can achieve effective recovery of targets in different datasets. The reconstruction results on actual measurement data indicate that the images reconstructed using DMDTC exhibit superior shape and uniform pixel distribution. Compared to the single-domain mean-reverting modeling method, the dual-domain mean-reverting diffusion model exhibits good modeling performance and accurately captures the data distribution, and is anticipated to be widely utilized in optical and medical imaging fields. Expanding the proposed DMDTC to other areas, such as static machined non-biological samples, can be achieved by training the model on datasets specific to those fields. This allows for the customization of image reconstruction solutions tailored to the characteristics of the data in those particular domains. In future work, investigating the imaging of colored objects using temporal compressive coherent diffraction imaging would be a highly significant topic. Additionally, due to the high computational costs, utilizing diffusion models for image reconstruction still presents challenges. It would be worthwhile to incorporate acceleration techniques, such as two-step distillation, in future research to reduce the sampling steps of diffusion models during image reconstruction.

Funding

National Natural Science Foundation of China (62105138, 62122033); Nanchang University 2023 College Students Innovation and Entrepreneurship Training (2023CX203).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in Ref. [24].

References

1. J. Miao, P. Charalambous, J. Kirz, et al., “Extending the methodology of X-ray crystallography to allow imaging of micrometre-sized non-crystalline specimens,” Nature 400(6742), 342–344 (1999).

2. M. A. Pfeifer, G. J. Williams, I. A. Vartanyants, et al., “Three-dimensional mapping of a deformation field inside a nanocrystal,” Nature 442(7098), 63–66 (2006).

3. D. Shapiro, P. Thibault, T. Beetz, et al., “Biological imaging by soft x-ray diffraction microscopy,” Proc. Natl. Acad. Sci. U.S.A. 102(43), 15343–15346 (2005).

4. G. Popescu, T. Ikeda, R. R. Dasari, et al., “Diffraction phase microscopy for quantifying cell structure and dynamics,” Opt. Lett. 31(6), 775–777 (2006).

5. P. Marquet, B. Rappaz, P. J. Magistretti, et al., “Digital holographic microscopy: a noninvasive contrast imaging technique allowing quantitative visualization of living cells with subwavelength axial accuracy,” Opt. Lett. 30(5), 468–470 (2005).

6. Ç. Işıl, F. S. Oktem, and A. Koç, “Deep iterative reconstruction for phase retrieval,” Appl. Opt. 58(20), 5422–5431 (2019).

7. X. Huang, J. Nelson, J. Steinbrener, et al., “Incorrect support and missing center tolerances of phasing algorithms,” Opt. Express 18(25), 26441–26449 (2010).

8. R. Horisaki, R. Egami, and J. Tanida, “Single-shot phase imaging with randomized light (SPIRaL),” Opt. Express 24(4), 3765–3773 (2016).

9. F. Zhang, B. Chen, G. R. Morrison, et al., “Phase retrieval by coherent modulation imaging,” Nat. Commun. 7(1), 13367 (2016).

10. D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006).

11. Y. Dou, M. Cao, X. Wang, et al., “Coded aperture temporal compressive digital holographic microscopy,” Opt. Lett. 48(20), 5427–5430 (2023).

12. L. Wang, M. Cao, Y. Zhong, et al., “Spatial-temporal transformer for video snapshot compressive imaging,” IEEE Trans. Pattern Anal. Mach. Intell. 45(7), 1–18 (2022).

13. Y. He, Y. Yao, D. Qi, et al., “Temporal compressive super-resolution microscopy at a frame rate of 1200 frames per second and spatial resolution of 100 nm,” Adv. Photon. 5(02), 026003 (2023).

14. Z. Chen, S. Zheng, Z. Tong, et al., “Physics-driven deep learning enables temporal compressive coherent diffraction imaging,” Optica 9(6), 677–680 (2022).

15. G. Montavon, W. Samek, and K. R. Müller, “Methods for interpreting and understanding deep neural networks,” Digit. Signal Process. 73, 1–15 (2018).

16. G. Barbastathis, A. Ozcan, and G. Situ, “On the use of deep learning for computational imaging,” Optica 6(8), 921–943 (2019).

17. X. Yuan, D. J. Brady, and A. K. Katsaggelos, “Snapshot compressive imaging: theory, algorithms, and applications,” IEEE Signal Process. Mag. 38(2), 65–88 (2021).

18. Y. Song and S. Ermon, “Improved techniques for training score-based generative models,” Advances in Neural Information Processing Systems 33, 12438–12448 (2020).

19. H. Peng, C. Jiang, J. Cheng, et al., “One-shot generative prior in Hankel-k-space for parallel imaging reconstruction,” IEEE Trans. Med. Imaging 42(11), 3420–3435 (2023).

20. M. Daniels, T. Maunu, and P. Hand, “Score-based generative neural networks for large-scale optimal transport,” Advances in Neural Information Processing Systems 34, 12955–12965 (2021).

21. W. Wan, H. Ma, Z. Mei, et al., “Multi-phase FZA lensless imaging via diffusion model,” Opt. Express 31(12), 20595–20615 (2023).

22. X. Song, G. Wang, W. Zhong, et al., “Sparse-view reconstruction for photoacoustic tomography combining diffusion model with model-based iteration,” Photoacoustics 33, 100558 (2023).

23. Z. Luo, F. K. Gustafsson, Z. Zhao, et al., “Image restoration with mean-reverting stochastic differential equations,” arXiv:2301.11699 (2023).

24. H. Li, “Dual-domain mean-reverting diffusion model-enhanced temporal compressive coherent diffraction imaging,” GitHub (2024) [accessed 9 Apr. 2024], https://github.com/yqx7150/DMDTC.
