
Adaptive coded phase mask design and high-quality image reconstruction for interference-less coded aperture correlation holography

Open Access

Abstract

Interference-less coded aperture correlation holography is a non-scanning, motionless, incoherent technique for imaging three-dimensional objects without two-wave interference. A challenge, however, is that the coded phase mask encodes the system noise, while traditional reconstruction algorithms often introduce unwanted background components during reconstruction. A deep learning-based method is proposed to mitigate system noise and background components simultaneously. Specifically, the method involves two sub-networks: a coded phase mask design sub-network and an image reconstruction sub-network. The former leverages the object’s frequency distribution to generate an adaptive coded phase mask that encodes the object wave-front precisely without being affected by superfluous system noise. The latter establishes a mapping between the autocorrelations of the hologram and the object, effectively suppresses the background components by embedding prior physical knowledge, and improves the neural network’s adaptability and interpretability. Experimental results demonstrate the effectiveness of the proposed method in suppressing system noise and background components, thereby significantly improving the signal-to-noise ratio of the reconstructed images.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Coded aperture imaging technology [1–3] encodes the object wave-front using a mask, permitting a significant number of high-frequency components originating from the target to reach the detector. Reconstruction is then conducted to decode the images, yielding a high-resolution representation of the target. Nowadays, the fusion of coded aperture imaging and digital holography has sparked considerable interest. This merger of two distinct imaging modalities combines their unique benefits and presents exciting new opportunities. A new imaging concept, coded aperture correlation holography (COACH) [4,5], was subsequently proposed. In COACH, optical techniques are still employed for splitting waves to form a two-dimensional (2D) hologram containing the observed scene’s three-dimensional (3D) information. Additionally, COACH can achieve lateral and axial resolutions comparable to conventional imaging methods. Further investigation has found that, unlike related systems such as FINCH [6], COACH uses a pseudo-random coded phase mask (CPM) instead of a quadratic phase mask. The pseudo-random CPM can modulate both phase and amplitude. As a result, COACH is capable of imaging a 3D scene without two-wave interference. Such an improved version of COACH is termed the interference-less coded aperture correlation holography (I-COACH) [7,8]. Owing to its interference-less property, I-COACH has an optical configuration as simple as that of a lens-based direct imaging system. In the system, the incoherent light emitted from an object is modulated by a CPM and then recorded by a digital camera as an object hologram HOBJ. The 3D image is conventionally reconstructed by cross-correlating HOBJ with the point spread holograms HPSF associated with a series of axial positions. I-COACH has a resolution comparable to COACH, and it has been widely used in applications such as partial and synthetic aperture systems [9,10], endoscopic systems [11], scattering imaging [12], and field-of-view extension [13].

However, the presence of noise results in a low signal-to-noise ratio (SNR) in the I-COACH system [14], for two main reasons. On one hand, the reconstructed image suffers from a serious background. According to the convolution theorem, the reconstructed point object is the autocorrelation of HPSF. If the autocorrelation of HPSF is sufficiently sharp, the object can be reconstructed reliably. Nevertheless, the autocorrelation of HPSF normally has sidelobes, hence a background artefact arises. Several optimization algorithms have been developed based on cross-correlation theory to suppress the background component, including the phase-only filter (POF) [15], the nonlinear filter (NLF) [16], and the modified nonlinear filter (MNLF) [17]. However, they may introduce other shortcomings, such as reduced imaging resolution and computational efficiency. Moreover, 3D imaging relies on a pre-prepared library of HPSF, which is laborious to build. Fortunately, deep learning [18,19] has proven effective in addressing these challenges; the potential of integrating deep learning with COACH-based systems has been demonstrated by our research group [20]. On the other hand, the design of the CPM also affects the SNR of the system. In the original design, the spectrum of the CPM was required to be as uniform as possible, referred to as a non-sparse CPM [4]. In theory, modulating the object wave-front with a non-sparse CPM can retrieve high-frequency components that would otherwise be discarded due to the limited numerical aperture of the imaging system. In practice, however, I-COACH suffers a relatively low SNR because a non-sparse CPM amplifies system noise during reconstruction; the system noise is primarily introduced by hardware, such as photon shot noise and dark-current noise from the image sensor. An improved version, the dot-sparse CPM, was then developed [21–23], whose spectrum is a cluster of sparsely distributed dots; in reconstruction, it provides a higher SNR than the non-sparse CPM. A more recent advancement incorporates an annular sparse variant of the CPM [17], in which the spectrum of the CPM is replaced with predefined sparse annuluses. However, once an appropriate CPM is obtained by the Gerchberg-Saxton (GS) algorithm, the wave-fronts of disparate objects in the I-COACH system are uniformly modulated by this fixed CPM. Because different objects contain different features, it is infeasible to properly modulate all objects with the same CPM.

As the SNR of the I-COACH system is affected by both the design of the CPM and the reconstruction algorithm, an imaging scheme consisting of two sub-networks is proposed in this paper. The first sub-network generates adaptive CPMs in accordance with the frequency distribution of the object, and the second sub-network mitigates the background component based on physical knowledge. The performance of the network is evaluated and enhanced using a combined loss function. The established model allows for single-shot direct reconstruction, eliminating the prerequisite of an HPSF library. The principles and methodology are presented in Sections 2 and 3, experimental demonstrations are provided in Section 4, and conclusions are presented in Section 5.

2. Imaging principle and image analysis of I-COACH

2.1 Imaging scheme of I-COACH

The basic configuration of the I-COACH system is shown in Fig. 1. An object is critically illuminated by an incoherent light source and a lens L0. The light from the object is collected and collimated by a lens L1 located at a distance f0 from the object. The light polarized by a polarizer P is incident on the spatial light modulator (SLM) located at a distance zs from the object, such that most of the incident light is modulated by the CPM displayed on the SLM. The modulated light is recorded by an image sensor located at a distance zh from the SLM. To build the HPSF library, a pinhole used as a point object is moved to different axial locations and the corresponding intensity patterns are recorded. The point spread hologram HPSF of a point object located at the axial position zs is expressed as,

$$\begin{aligned} {H_{\textrm{PSF}}}(\overline {{r_h}} ,\overline {{r_s}} ,{z_s}) &= {\left|{\sqrt {{A_s}(\overline {{r_s}} ,{z_s})} {C_1}L(\frac{{\overline {{r_s}} }}{{{z_s}}})Q(\frac{1}{{{z_s}}})Q(\frac{1}{{ - {f_1}}})\exp [i\phi (\overline r )] \ast Q(\frac{1}{{{z_h}}})} \right|^2}\\ &= {\left|{\sqrt {{A_s}(\overline {{r_s}} ,{z_s})} {C_1}L(\frac{{\overline {{r_s}} }}{{{z_s}}})Q(\frac{1}{{{z_1}}})\exp [i\phi (\overline r )] \ast Q(\frac{1}{{{z_h}}})} \right|^2}\\ &= {H_{\textrm{PSF}}}(\overline {{r_h}} - \frac{{{z_h}}}{{{z_s}}}\overline {{r_s}} ,0,{z_s}), \end{aligned}$$
where * denotes a 2D convolution, ${z_1} = {{{z_s}{f_1}} / {({f_1} - {z_s})}}$, $\sqrt {{A_s}(\overline {{r_s}} ,{z_s})}$ is the amplitude of the point object at $({{x_s},{y_s},{z_s}} )$, $\overline {{r_s}} = ({x_s},{y_s})$ denotes the transverse location vector on the object plane, and $\overline {{r_h}} = ({x_h},{y_h})$ represents the transverse location vector on the Sensor plane. C1 is a complex constant, and Q and L represent quadratic and linear phase functions, defined as $Q(a) = \exp [i\pi a{\lambda ^{ - 1}}({x^2} + {y^2})]$ and $L({{\overline {{r_s}} } / z}) = \exp [i2\pi {(\lambda z)^{ - 1}}({b_x}x + {b_y}y)]$ (bx and by denote the coefficients in the x and y directions), respectively. In addition, $\phi (\bar{r})$ represents the phase distribution of the CPM and λ is the wavelength. The last equality in Eq. (1) indicates that the intensity on the Sensor plane is a shifted version of the intensity response for a point object located on the optical axis $\overline {{r_s}} = (0,0)$, where the shift is $\overline {{r_s}} {z_h}/{z_s}$.

Fig. 1. Optical configuration of I-COACH. L0, L1: lenses; P: polarizer; blue arrows indicate polarization orientations.

Once the HPSF library is saved in a computer, the system is ready to record object holograms and to reconstruct 3D objects. A 2D object slice at zs can be considered as a collection of N uncorrelated point objects $O(\overline {{r_s}} ) = \sum\limits_j^N {{a_j}\delta } (\overline {{r_s}} - \overline {{r_j}} )$, where aj is the intensity of the j-th object point at $\overline {{r_j}}$. The object is illuminated by incoherent quasi-monochromatic light, so that the overall intensity distribution on the Sensor plane is the sum of the point responses,

$$\begin{aligned} {H_{\textrm{OBJ}}}(\overline {{r_h}} ,{z_s}) &= O({{\bar{r}}_s}) \ast {H_{\textrm{PSF}}}(\overline {{r_h}} ,{z_s}) = \sum\limits_j^N {{a_j}\delta ({{\bar{r}}_s} - {{\bar{r}}_j})} \ast {H_{\textrm{PSF}}}(\overline {{r_h}} - \frac{{{z_h}}}{{{z_s}}}\overline {{r_s}} ,0,{z_s})\\ &= \sum\limits_j^N {{a_j}} {H_{\textrm{PSF}}}(\overline {{r_h}} - \frac{{{z_h}}}{{{z_s}}}{{\bar{r}}_{s.j}},0,{z_s}), \end{aligned}$$

The reconstruction of the object is conventionally realized by the cross-correlation between ${H_{\textrm{OBJ}}}(\overline {{r_h}} ,{z_s})$ and ${H_{\textrm{PSF}}}(\overline {{r_h}} ,{z_s})$,

$$\begin{aligned} {I_{\textrm{IMG}}}({{\bar{r}}_s}) &= {H_{\textrm{OBJ}}}(\overline {{r_h}} ,{z_s}) \otimes {H_{\textrm{PSF}}}(\overline {{r_h}} ,{z_s}) = \sum\limits_j^N {{a_j}} {H_{\textrm{PSF}}}(\overline {{r_h}} - \frac{{{z_h}}}{{{z_s}}}{{\bar{r}}_{s,j}},0,{z_s}) \otimes {H_{\textrm{PSF}}}(\overline {{r_h}} - \frac{{{z_h}}}{{{z_s}}}\overline {{r_s}} ,0,{z_s})\\ &= {\Im ^{ - 1}}\{ \Im \{ \sum\limits_j^N {{a_j}\delta ({{\bar{r}}_s} - {{\bar{r}}_j})} \} {h^2}\exp (i\varphi )\exp ( - i\varphi )\} \\ &= \sum\limits_j^N {{a_j}\delta ({{\bar{r}}_s} - {{\bar{r}}_j})} \ast {\Im ^{ - 1}}\{ {h^2}\} \approx {O^{\prime}}({{\bar{r}}_s}), \end{aligned}$$
where the sign ⊗ denotes correlation and $\Im \{{\cdot} \}$ stands for the 2D Fourier transform. We then have $\Im \{ {H_{\textrm{PSF}}}(\overline {{r_h}} - \frac{{{z_h}}}{{{z_s}}}\overline {{r_s}} ,0,{z_s})\} = h\exp (i\varphi )$, where h and φ denote the amplitude and phase of the Fourier transform of HPSF, respectively. The approximation in Eq. (3) holds provided that ${\Im ^{ - 1}}\{ {h^2}\}$ is a sharply peaked function.
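To make the correlation-based reconstruction of Eq. (3) concrete, the following minimal NumPy sketch (not part of the original work) performs the cross-correlation in the Fourier domain; both the plain matched filter and the phase-only filter (POF) [15] are shown. The arrays `h_obj` and `h_psf` are hypothetical stand-ins for the recorded holograms.

```python
# Minimal sketch of Eq. (3): correlation-based reconstruction via FFT.
import numpy as np

def reconstruct(h_obj, h_psf, method="pof"):
    """Cross-correlate H_OBJ with H_PSF in the Fourier domain."""
    F_obj = np.fft.fft2(h_obj)
    F_psf = np.fft.fft2(h_psf)
    if method == "matched":            # plain cross-correlation
        filt = np.conj(F_psf)
    elif method == "pof":              # phase-only filter [15]
        filt = np.exp(-1j * np.angle(F_psf))
    else:
        raise ValueError(method)
    img = np.fft.ifft2(F_obj * filt)
    return np.abs(np.fft.fftshift(img))  # fftshift centers the image (convention)

# usage with synthetic placeholder data
h_psf = np.random.rand(512, 512)
h_obj = np.random.rand(512, 512)
image = reconstruct(h_obj, h_psf, method="pof")
```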

2.2 Impact of CPM design on imaging resolution

Figure 2 illustrates the synthesis of a CPM using the GS algorithm [24]. The iterative process begins with an initial random phase Φp and a uniform amplitude Ap on the SLM plane. The complex matrix Ap exp(iΦp) is Fourier transformed onto the Sensor plane, where the amplitude of the transformed function is replaced by a predefined distribution At while the phase values Φt are retained for the next iteration. The calculation is repeated until the difference between two consecutive CPMs is negligible. Subsequently, the angular spectrum theory is employed to investigate the CPM’s impact on the imaging quality.
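A compact sketch of this iteration is given below, assuming a uniform SLM amplitude (Ap = 1) and a fixed iteration count in place of the convergence test; these simplifications are ours, not the authors'.

```python
# Sketch of the Gerchberg-Saxton iteration of Fig. 2.
import numpy as np

def gs_cpm(a_t, n_iter=200, seed=0):
    """Return a phase-only CPM whose Fourier amplitude approximates a_t."""
    rng = np.random.default_rng(seed)
    phi_p = rng.uniform(0, 2 * np.pi, a_t.shape)        # initial random phase
    for _ in range(n_iter):
        spectrum = np.fft.fft2(np.exp(1j * phi_p))      # SLM -> Sensor plane
        phi_t = np.angle(spectrum)                      # keep the spectral phase
        field = np.fft.ifft2(a_t * np.exp(1j * phi_t))  # impose A_t, transform back
        phi_p = np.angle(field)                         # keep phase, A_p = 1
    return phi_p
```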

Fig. 2. CPM synthesis using a modified Gerchberg-Saxton algorithm.

Specifically, a point object is taken as the input, and its Fourier transform is denoted as Ω(fx, fy). The transfer function associated with diffractive propagation over a distance z is given by,

$$h({f_x},{f_y},z) = \exp [i\frac{{2\pi }}{\lambda }z\sqrt {1 - {{(\lambda {f_x})}^2} - {{(\lambda {f_y})}^2}} ],$$

Consequently, the complex amplitude on the SLM plane at the distance zs can be expressed as,

$$\mathrm{\Omega }({f_x},{f_y}) \cdot h({f_x},{f_y},{z_s}),$$

Assume that the spatial coordinates of the SLM plane are (u, v), and the corresponding spatial frequencies are (fu, fv). Denoting the magnitude of the CPM in the spectral domain by At, the spectrum of the CPM can be expressed as,

$$\psi ({f_x},{f_y}) = {A_t}({f_u},{f_v})\exp [i \cdot {{\varPhi }_t}({f_u},{f_v})],$$

Then, the corresponding modulating function on the SLM plane is obtained as,

$$\Psi (u,v) = {\Im ^{ - 1}}\{ \psi ({f_x},{f_y})\} ,$$

According to the optical configuration of the imaging system, the Sensor plane is located at the Fourier plane of the SLM, so the coordinates (u, v) and (fx, fy) are equivalent. Therefore, the spectral distribution of the image on the Sensor plane associated with a point object is,

$$\textrm{CTF}({f_x},{f_y}) = {\varOmega }({f_x},{f_y}) \cdot h({f_x},{f_y},{z_s}) \cdot \Psi ({f_x},{f_y}) \cdot h({f_x},{f_y},{z_h}),$$

This is the coherent transfer function (CTF) of the system, where h(fx, fy, z) is a phase-type function with a normalized amplitude. For an ideal point object, the complex amplitude at the object plane is Ω = 1. Therefore, the optical transfer function (OTF) of the I-COACH system can be written as,

$$\begin{aligned} \textrm{OTF}({f_x},{f_y}) &= {C_1} \cdot \textrm{CTF}({f_x},{f_y}) \otimes \textrm{CTF}({f_x},{f_y})\\ &= {C_1} \cdot \Psi ({f_x},{f_y}) \otimes \Psi ({f_x},{f_y}), \end{aligned}$$
where C1 is a constant. It can be seen that the OTF is mainly determined by the CPM function. Moreover, Eqs. (6) and (7) show that different CPM functions can be obtained by modifying the spectral amplitude At. A proper design of At is therefore investigated to improve the imaging quality; the sketch below illustrates these relations numerically.
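The following NumPy sketch evaluates the transfer function of Eq. (4) and the OTF of Eq. (9). The grid size, pixel pitch, and wavelength are assumed values for illustration, and the square root is clamped at zero to suppress evanescent components, a standard numerical choice not stated in the text.

```python
# Sketch of Eqs. (4)-(9): angular-spectrum transfer function and OTF.
import numpy as np

N, dx, lam = 512, 8e-6, 620e-9          # assumed grid and wavelength
fx = np.fft.fftfreq(N, d=dx)
FX, FY = np.meshgrid(fx, fx)

def transfer(z):
    """Angular-spectrum transfer function h(fx, fy, z) of Eq. (4)."""
    arg = 1 - (lam * FX) ** 2 - (lam * FY) ** 2
    return np.exp(1j * 2 * np.pi / lam * z * np.sqrt(np.maximum(arg, 0)))

def otf(psi):
    """Eq. (9): OTF proportional to the autocorrelation of the CPM function Psi."""
    P = np.fft.fft2(psi)
    return np.fft.ifft2(P * np.conj(P))  # autocorrelation via the FFT
```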

The scattering degree σ is adopted to measure the modulating capability of a CPM in the spectral domain. It is defined as the ratio of the spectral bandwidth B to the maximum bandwidth Bmax,

$$\sigma = {B / {{B_{\max }}}},$$

The non-zero dots in At are confined to the band specified by B. As the scattering degree rises, the constrained bandwidth B increases, broadening the distribution range of the non-zero dots. A sketch of constructing such a sparse At is given below.
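As an illustration of Eq. (10), the sketch below builds the spectral amplitude At of a hypothetical dot-sparse CPM whose non-zero dots are confined to a disk of radius σ·rmax; the dot count and uniform-in-disk sampling are assumptions for demonstration.

```python
# Sketch: spectral amplitude A_t of a dot-sparse CPM with scattering degree sigma.
import numpy as np

def dot_sparse_at(n=512, sigma=0.5, n_dots=403, seed=0):
    rng = np.random.default_rng(seed)
    a_t = np.zeros((n, n))
    r_max = n // 2                                       # radius at B = B_max
    r = r_max * sigma * np.sqrt(rng.uniform(0, 1, n_dots))  # uniform inside disk
    theta = rng.uniform(0, 2 * np.pi, n_dots)
    ix = np.clip((n // 2 + r * np.cos(theta)).astype(int), 0, n - 1)
    iy = np.clip((n // 2 + r * np.sin(theta)).astype(int), 0, n - 1)
    a_t[iy, ix] = 1.0
    return a_t   # can be fed to the GS routine sketched in Section 2.2
```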

The initial version of I-COACH employs a non-sparse CPM and requires At to be as uniform as possible at the maximum scattering degree. However, the resulting SNR is relatively low because the non-sparse CPM encodes considerable system noise during the reconstruction process. Specifically, when the system noise is taken into account, Eq. (3) is modified as follows:

$$\begin{aligned} {I_{\textrm{IMG}}} &= ({H_{\textrm{OBJ}}} + {n_{\textrm{OBJ}}}) \otimes ({H_{\textrm{PSF}}} + {n_{\textrm{PSF}}})\\ &= {H_{\textrm{OBJ}}} \otimes {H_{\textrm{PSF}}} + {H_{\textrm{OBJ}}} \otimes {n_{\textrm{PSF}}} + {H_{\textrm{PSF}}} \otimes {n_{\textrm{OBJ}}} + {n_{\textrm{OBJ}}} \otimes {n_{\textrm{PSF}}}\\ &\approx {H_{\textrm{OBJ}}} \otimes {H_{\textrm{PSF}}} + {H_{\textrm{OBJ}}} \otimes {n_{\textrm{PSF}}} + {H_{\textrm{PSF}}} \otimes {n_{\textrm{OBJ}}}, \end{aligned}$$
where nOBJ and nPSF denote the noise terms in the recorded HOBJ and HPSF, respectively. The terms HOBJ ⊗ nPSF and HPSF ⊗ nOBJ degrade the imaging SNR, so the CPM needs to be designed to suppress them. Recently, modified CPMs with sparsely distributed dots [21] and sparsely distributed annuluses [17] have been proposed.

To investigate the modulation and reconstruction performance of different forms of CPM, Gaussian noise (mean = 0, variance = 0.01) is added to an object. The annular sparse CPM has an annulus width of two pixels and contains the same number of non-zero points as the dot-sparse CPM. In addition, the object is reconstructed using the conventional POF method for comparison. The results are shown in Fig. 3. When the scattering degree is low, the SNR of the non-sparse CPM is higher than those of the other CPMs, because the energy of an object is mainly concentrated in the low-frequency region, so the CPM spectrum needs more non-zero dots condensed in the central low-frequency region. In this regime, the non-sparse CPM behaves as a low-pass filter that encodes mainly low-frequency components and is therefore less susceptible to system noise. In fact, the spectral distribution of the CPM should be consistent with that of the object, which explains why a non-sparse CPM is advantageous for retrieving more of the object’s energy at a low scattering degree.

Fig. 3. Experimental results with different modulated CPMs. (SNR in dB)

As the scattering degree increases, the non-zero dots spread over a broader range, and the CPM tends to encode more high-frequency components of the object, leading to enhanced reconstruction of detailed features. Unfortunately, when the scattering degree becomes high, more noise is encoded into the hologram, and the SNR associated with a non-sparse CPM gradually decreases. Interestingly, the dot-sparse CPM has the best modulation effect: it reduces the probability of noise encoding to some extent but cannot completely eliminate noise. Therefore, an enumeration method is frequently used to list all possible combinations of scattering degree and number of dots; the most appropriate dot-sparse CPM is then identified in terms of image metrics such as the visibility [14] and SNR. However, the obtained CPM typically exhibits a small scattering degree, which contradicts the aim of achieving high spatial frequencies via CPM modulation.

The spectral distribution of an annular sparse CPM is engineered into a series of annuluses over which the non-zero dots are scattered. Even with the same scattering degree and the same number of non-zero dots, the encoding performance of the annular sparse CPM differs from that of the dot-sparse CPM, suggesting that the dot distribution also affects the reconstruction cutoff frequency. Notably, its SNR first increases and then decreases: it is higher than that of a dot-sparse CPM for scattering degrees from 0.2 to 0.5, but lower for scattering degrees from 0.75 to 1. Therefore, the annular sparse CPM is only suitable for a medium scattering degree.

3. Proposed method

Based on the afore-mentioned analysis, the number and the distribution of non-zero dots in the spectral amplitude of the CPM are the main factors in CPM design. Only at a high scattering degree does the system have a strong likelihood of acquiring high-frequency information about the object. However, a high scattering degree may lead to information loss or the encoding of additional noise. Consequently, a method is proposed to design adaptive dot-sparse CPMs at the maximum scattering degree for different objects. The frequency distribution of each object serves as a guide, and a suitable spectral distribution of the CPM is generated using convolutional neural networks. The structure of the proposed model is illustrated in Fig. 4. The first sub-network comprises a feature extraction module and a mask generation module, whereas the second sub-network comprises a reconstruction module. Each module is described in detail below.

Fig. 4. Network architecture of the proposed method. (a) Feature extraction module; (b) Mask generation module; (c) Reconstruction module.

3.1 Feature extraction module

Valuable information from various object scales can be extracted using convolutional operators in the feature extraction module, and the effectiveness of feature extraction is directly affected by the size of the convolution kernel. Atrous spatial pyramid pooling (ASPP) [25] extracts multi-scale features using four dilated convolutional kernels with dilation rates of 6, 12, 18, and 24, which improves the discriminability and robustness of the features. However, such high dilation rates are mainly suitable for acquiring large-scale features: a higher dilation rate causes a loss of local information, so the method may be ineffective on small-scale features. Inspired by RFBNet [26], the block’s capability of simultaneously extracting large-scale and small-scale features is improved, achieving a balance between the two. Specifically, the original dilation rates of 6, 12, 18 and 24 are modified to 3, 5, 7 and 9, respectively. Among them, the dilated kernels with rates of 3 and 5 extract small-scale features, and those with rates of 7 and 9 extract large-scale features.

As illustrated in Fig. 4(a), the object is first sent to a 3 × 3 convolutional block with 64 filters and stride 1, which yields 64 feature maps. Each feature map is processed by batch normalization (BN) and a rectified linear unit (ReLU) to accelerate model convergence. The 64 feature maps are then divided into four groups. Each group contains an ordinary convolutional kernel and a dilated convolutional kernel, where the size of the ordinary kernel is made equal to the dilation rate of the dilated kernel to mimic the relationship between receptive field size and eccentricity in the human visual system. From the perspective of the receptive field, two consecutive 3 × 3 ordinary kernels are equivalent to one 5 × 5 kernel, while the former has a lower computational cost. Therefore, to further reduce the number of parameters while keeping the receptive field unchanged, two stacked 3 × 3 kernels replace a 5 × 5 kernel; similarly, three 3 × 3 kernels replace a 7 × 7 kernel, and four replace a 9 × 9 kernel. Finally, the feature maps obtained from the different dilated convolution branches are merged, so the output of the feature extraction module contains the primary features of the object. A sketch of this block is given below.
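The following PyTorch sketch reproduces the described branch layout (channel split into four groups, stacked 3 × 3 convolutions emulating 5 × 5/7 × 7/9 × 9 kernels, and 3 × 3 dilated convolutions with rates 3, 5, 7 and 9). The per-group channel count, normalization placement, and output head are assumptions, not the authors' exact settings.

```python
# Sketch of the feature extraction module of Fig. 4(a).
import torch
import torch.nn as nn

def stacked3x3(ch, n):
    """n consecutive 3x3 convs, the receptive-field equivalent of a (2n+1)x(2n+1) kernel."""
    layers = []
    for _ in range(n):
        layers += [nn.Conv2d(ch, ch, 3, padding=1),
                   nn.BatchNorm2d(ch), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class FeatureExtraction(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, 4 * ch, 3, padding=1),
                                  nn.BatchNorm2d(4 * ch), nn.ReLU(inplace=True))
        self.branches = nn.ModuleList()
        for n_stack, rate in zip((1, 2, 3, 4), (3, 5, 7, 9)):
            self.branches.append(nn.Sequential(
                stacked3x3(ch, n_stack),                              # ordinary kernels
                nn.Conv2d(ch, ch, 3, padding=rate, dilation=rate)))   # dilated kernel
        self.fuse = nn.Conv2d(4 * ch, 1, 1)   # merge the four groups (assumed head)

    def forward(self, x):
        feats = self.stem(x)                   # 64 feature maps
        groups = torch.chunk(feats, 4, dim=1)  # four groups of 16 maps
        out = torch.cat([b(g) for b, g in zip(self.branches, groups)], dim=1)
        return self.fuse(out)
```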

3.2 Mask generation module

CPMs are then generated adaptively in accordance with the frequency distributions of different objects. As shown in Fig. 4(b), the GS algorithm first produces a non-sparse CPM at the maximum scattering degree, and a Fourier transform of this CPM yields its spectral phase distribution Φt. Next, the feature image obtained by the feature extraction module is propagated to the frequency domain, and its modulus is combined with the extracted phases Φt to form a new complex matrix. This matrix is inverse Fourier transformed to the SLM plane to obtain a new CPM. This CPM is used as an optimizable matrix, continuously tuned by feed-forward and backward error propagation throughout the network. The mask generation module can be formulated as

$$\textrm{CP}{\textrm{M}_{\textrm{learn}}} = {\Im ^{ - 1}}\{ |\Im \{ {O_{\textrm{feature}}}\} |\exp (i{\Phi _t})\} ,$$
where Ofeature denotes the output of the feature extraction module. A minimal sketch of this operation is given below.
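A torch.fft sketch of Eq. (12): the modulus of the feature image's spectrum is combined with the GS phase Φt and transformed back to the SLM plane. Returning only the phase is our assumption, motivated by the phase-only SLM.

```python
# Sketch of the mask generation module, Eq. (12).
import torch

def mask_generation(o_feature: torch.Tensor, phi_t: torch.Tensor) -> torch.Tensor:
    """CPM_learn = IFFT{ |FFT{O_feature}| * exp(i * Phi_t) }."""
    amp = torch.abs(torch.fft.fft2(o_feature))           # spectral modulus
    cpm = torch.fft.ifft2(amp * torch.exp(1j * phi_t))   # back to the SLM plane
    return torch.angle(cpm)   # phase displayed on the phase-only SLM (assumed)
```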

3.3 Reconstruction module

When the object wavefront is modulated by a CPM, the resulting hologram HOBJ is captured on the Sensor plane. In purely data-driven deep learning, a mapping is directly established between HOBJ and the object to reconstruct the image from HOBJ [27]. However, such an approach suffers from wasted resources and the curse of dimensionality. To overcome these shortcomings, physical priors are used as constraints. Using the convolution theorem, the autocorrelation of HOBJ can be written as,

$$\begin{array}{l} {H_{\textrm{OBJ}}}(\overline {{r_h}} ,{z_s}) \otimes {H_{\textrm{OBJ}}}(\overline {{r_h}} ,{z_s})\\ = \sum\limits_j^N {{a_j}} {H_{\textrm{PSF}}}(\overline {{r_h}} - \frac{{{z_h}}}{{{z_s}}}{{\bar{r}}_{s,j}},0,{z_s}) \otimes \sum\limits_j^N {{a_j}} {H_{\textrm{PSF}}}(\overline {{r_h}} - \frac{{{z_h}}}{{{z_s}}}{{\bar{r}}_{s,j}},0,{z_s})\\ = [O({{\bar{r}}_s}) \otimes O({{\bar{r}}_s})] \ast [{H_{\textrm{PSF}}}(\overline {{r_h}} - \frac{{{z_h}}}{{{z_s}}}{{\bar{r}}_{s,j}},0,{z_s}) \otimes {H_{\textrm{PSF}}}(\overline {{r_h}} - \frac{{{z_h}}}{{{z_s}}}{{\bar{r}}_{s,j}},0,{z_s})], \end{array}$$
where ${H_{\textrm{PSF}}} \otimes {H_{\textrm{PSF}}}$ is the autocorrelation of HPSF, which is a sharply peaked function [28]. The autocorrelation of HOBJ equals the autocorrelation of the object plus an additional background factor C [29]. Thus, Eq. (13) can be further simplified as,
$${H_{\textrm{OBJ}}}(\overline {{r_h}} ,{z_s}) \otimes {H_{\textrm{OBJ}}}(\overline {{r_h}} ,{z_s}) = [O({\bar{r}_s}) \otimes O({\bar{r}_s})] + C, $$

The background factor C results in a poor SNR of the reconstructed images. Therefore, in the image reconstruction module, the autocorrelation of HOBJ is used as prior physical knowledge to eliminate the background component: instead of establishing a direct mapping between HOBJ and the object, a mapping is established between their autocorrelations. Consequently, the constant C is the only variable to be solved, whose dimensionality is much lower than that of the image under reconstruction. According to the Wiener-Khinchin theorem [30], the autocorrelation of HOBJ equals the inverse Fourier transform of its power spectral density. Once $O \otimes O$ is obtained from the reconstruction module, the Fourier amplitude of the object can be solved as,

$$\textrm{|}\Im {\{ }{O^{\prime}}({\bar{r}_s}){\} |} = \sqrt {|\Im \{ O({{\bar{r}}_s}) \otimes O({{\bar{r}}_s})\} |} ,$$

Finally, the image can be reconstructed [29] using an iterative phase-retrieval algorithm [24], as sketched below.
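A NumPy sketch of this final step: Eq. (15) recovers the Fourier amplitude from the autocorrelation, after which a basic error-reduction iteration in the spirit of [24] recovers the image. The non-negativity constraint and iteration count are assumptions; the authors' exact retrieval variant may differ.

```python
# Sketch: Fourier amplitude from the autocorrelation (Eq. 15) + error reduction.
import numpy as np

def retrieve(autocorr, n_iter=500, seed=0):
    # Eq. (15): |F{O}| = sqrt(|F{O (x) O}|); autocorr assumed centered.
    f_amp = np.sqrt(np.abs(np.fft.fft2(np.fft.ifftshift(autocorr))))
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0, 2 * np.pi, f_amp.shape)       # random initial phase
    for _ in range(n_iter):
        field = np.fft.ifft2(f_amp * np.exp(1j * phase))  # to the object plane
        obj = np.clip(field.real, 0, None)                # object-domain constraint
        phase = np.angle(np.fft.fft2(obj))                # Fourier-domain constraint
    return obj
```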

3.4 Loss function

The loss function plays a vital role in model training. A combined loss function is defined as $Loss = \alpha {L_{\textrm{pixel}}} + \beta {L_{\textrm{kl}}}$, where α and β are hyperparameters that weight Lpixel and Lkl. First, a pixel-wise loss function guides the numerical optimization and facilitates model convergence. It is the squared difference between the predicted result and the ground truth,

$${L_{\textrm{pixel}}} = \frac{1}{N}\sum\limits_{i = 1}^N {|O_i^{\prime} - {O_i}{|^2}} ,$$
where O′ and O are the reconstructed result and the ground truth, respectively. Second, the similarity between the autocorrelations of the reconstructed result and the object is evaluated using the Kullback-Leibler divergence (KLD) [31],
$${L_{\textrm{kl}}} = \sum\limits_{i = 1}^N {(O_i^{\prime} \otimes O_i^{\prime})} \log [{{(O_i^{\prime} \otimes O_i^{\prime})} / {({O_i} \otimes {O_i})}}],$$

Typically, a small Lkl indicates that the generated CPM modulates the object wave-front well. A sketch of the combined loss is given below.
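The PyTorch sketch below assembles Eqs. (16) and (17). The autocorrelations are computed via the Wiener-Khinchin relation; the epsilon terms, the normalization of the autocorrelations to distributions, and the example values of α and β are our assumptions needed to keep the KLD term well defined.

```python
# Sketch of the combined loss Loss = alpha * L_pixel + beta * L_kl.
import torch
import torch.nn.functional as F

def autocorr(x, eps=1e-8):
    """Normalized autocorrelation of a batch of images via the FFT."""
    a = torch.fft.ifft2(torch.abs(torch.fft.fft2(x)) ** 2).real
    a = a - a.amin(dim=(-2, -1), keepdim=True)            # shift to non-negative
    return a / (a.sum(dim=(-2, -1), keepdim=True) + eps)  # treat as distribution

def combined_loss(o_pred, o_true, alpha=1.0, beta=0.1, eps=1e-8):
    l_pixel = F.mse_loss(o_pred, o_true)                  # Eq. (16)
    p, q = autocorr(o_pred), autocorr(o_true)
    l_kl = (p * torch.log((p + eps) / (q + eps))).sum(dim=(-2, -1)).mean()  # Eq. (17)
    return alpha * l_pixel + beta * l_kl
```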

4. Experimental demonstrations and discussions

4.1 Implementation details

The initial dataset comprises 187 images from BSD68, CBSD68, Kodak24, and RN15 [32], which are converted into 512 × 512 grayscale images. The dataset is split into two parts. Virtual training data are generated by adding Gaussian, salt-and-pepper, and speckle noise of different amplitudes to the images. In addition, actual objects are generated by displaying the images on a digital micromirror device; the incoherent light emitted by an LED (JCOPTIX LEM-620 K, 330 mW, central wavelength λ = 620 nm) passes through a piece of ground glass and illuminates the device, and the resulting holograms are recorded as actual training data. In total, 1122 training samples are obtained using data augmentation techniques such as rotating, flipping, and resizing. Furthermore, the proposed model is tested on the COCO dataset [33] and other objects. Notably, the model has not ‘seen’ any of the test data during training. A sketch of the noise augmentation is given below.
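A small NumPy sketch of the three noise types used for virtual training; the amplitude shown is an example, not the authors' exact values.

```python
# Sketch of the noise augmentation for virtual training data.
import numpy as np

def add_noise(img, kind, rng, amount=0.01):
    if kind == "gaussian":                      # additive Gaussian noise
        return img + rng.normal(0, np.sqrt(amount), img.shape)
    if kind == "salt_pepper":                   # impulse noise on random pixels
        out = img.copy()
        mask = rng.uniform(size=img.shape)
        out[mask < amount / 2] = 0.0
        out[mask > 1 - amount / 2] = 1.0
        return out
    if kind == "speckle":                       # multiplicative noise
        return img * (1 + rng.normal(0, np.sqrt(amount), img.shape))
    raise ValueError(kind)
```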

During the training phase, the model is trained for 80 epochs with the learning rate decreasing from 10−3 to 10−5: the first 40 epochs use a learning rate of 10−3, the next 30 epochs 10−4, and the final 10 epochs 10−5, as sketched below. An Adam optimizer is employed to update the network weights. The model is trained and tested using PyTorch 1.10 and Python 3.6 on a computer with an Nvidia A30 graphics processing unit and CUDA 11.2.
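The stated schedule maps directly onto a stepwise learning-rate decay; in the sketch below, the model is a placeholder and the training loop body is omitted.

```python
# Sketch of the stated Adam training schedule: 1e-3 / 1e-4 / 1e-5
# over epochs 0-39 / 40-69 / 70-79.
import torch

model = torch.nn.Conv2d(1, 1, 3, padding=1)   # placeholder for the full network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[40, 70], gamma=0.1)  # decay by 10x at epochs 40 and 70

for epoch in range(80):
    # ... one pass over the training data, optimizer.step() per batch ...
    scheduler.step()
```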

4.2 Analyzing the effectiveness of CPM design sub-network

The following experiments demonstrate the ability of the first sub-network to generate CPMs. First, the model receives five objects with Gaussian noise (mean = 0, variance = 0.01). Then, the network predicts the CPM and HOBJ is recorded. The spectral distributions of the predicted CPMs are shown in Fig. 5, with the yellow dashed line indicating the non-zero dots in the spectrum. The spectra of these CPMs differ because of the different information contained in the objects; however, they all have more non-zero dots in the center and fewer toward the edges.

Fig. 5. Spectral distributions of the CPMs generated by different objects.

Next, the image quality of the object modulated by different forms of CPM is verified. In Fig. 6(a), the spectral modulus of the adaptive CPM has 403 non-zero dots; hence, the numbers of non-zero dots of the dot-sparse CPM and the annular sparse CPM are also set to 403 for comparison. At this point, HOBJ has not yet entered the second sub-network for image prediction. Instead, the images modulated by the adaptive, non-sparse, dot-sparse, and annular sparse CPMs are reconstructed using the POF method. The results show that the reconstruction of the non-sparse CPM contains significant noise, while the annular sparse CPM is only suitable for cases with a small scattering degree. In addition, the modulation transfer function (MTF) is applied to assess the different CPMs, as shown in Fig. 7. The adaptive CPM (blue curve) produces the highest MTF, and the area enclosed by its MTF curve and the axes is the largest. Therefore, the adaptive CPMs are more effective than the other CPMs.

Fig. 6. Comparison of different CPMs reconstructed by POF. (a) Adaptive CPM; (b) non-sparse CPM; (c) dot-sparse CPM; (d) annular sparse CPM. (SNR in dB)

Fig. 7. MTFs of HPSF associated with different CPMs.

4.3 Analyzing the effectiveness of image reconstruction sub-network

The following experiments verify the feasibility of the second sub-network for image reconstruction. In Ref. [20], an encoder-decoder network establishes a mapping between the autocorrelation of HOBJ and that of the object, thus reducing the background component generated by the cross-correlation algorithm. Here, the encoder-decoder is improved by building an appropriate sub-network. First, the CPM obtained via the first sub-network is used to modulate the object wavefront. Then, UNet, UNet + ResNet [34,35], BM3D [36], FFDNet [37], DnCNN [38], and WNNM [39] are employed to reconstruct HOBJ. In addition, the Gaussian noise level added to the object is varied. The reconstruction results are presented in Table 1. All methods suffer from reconstruction errors as the noise level increases; however, UNet + ResNet consistently outperforms the other denoisers.

Table 1. Average SNR for different methods on COCO with various noise levels

Next, the reconstructed image quality of UNet + ResNet is compared with those of the POF, NLF (parameters o = 0.7 and p = 0.3), and MNLF (parameter ξ = 47.33) methods, as shown in Fig. 8. The efficacy of the different reconstruction schemes is assessed with quantitative measures of SNR and the structural similarity index measure (SSIM). The results show that the proposed scheme successfully suppresses system noise and background components while restoring visible features, whereas the traditional reconstruction algorithms suffer from noise issues. Specifically, the object reconstructed by the second sub-network attains the highest SNR of 40.4377 dB, indicating the validity of reducing the training dimensionality through prior knowledge.

Fig. 8. Test on element 18 of the NBS resolution target. (a) Reconstructed results of different algorithms using the same CPM; (b) cross-sections along the red lines. (SNR in dB)

Furthermore, an experimental assessment on the COCO dataset is conducted. Figure 9(a) depicts the ground truth objects, and Fig. 9(d) displays the images containing Gaussian noise (mean = 0, variance = 0.25), which serve as inputs to the model. Figure 9(b) illustrates the spectral distributions obtained using the proposed method. Figure 9 also displays the reconstruction results of the different methods, indicating that the proposed method preserves more features than the others, particularly in the first, second, and sixth rows.

Fig. 9. Test on the COCO dataset. (a) Ground truth; (b) spectral distributions of the CPMs generated using the first sub-network; (c) enlarged partial views of (a); (d) noisy objects. (SNR in dB)

4.4 Imaging of 3D transmitted/reflective objects

In the third experiment, the 3D imaging capability of the proposed method is compared with the POF method by reconstructing objects from HOBJ captured at different axial positions. Specifically, the 10 lp/mm element of a USAF 1951 resolution target is used as the object and illuminated with an LED (JCOPTIX LEM-620 K, 330 mW, central wavelength λ = 620 nm). The focal length of the lens L1 is 150 mm. The SLM (HOLOEYE, 1080 × 1920 pixels, 8 µm pixel pitch, phase-only modulation) is located at a distance of 55 mm from L1. The light modulated by the SLM is collected by a camera (Manta G-419, 2048 × 2048 pixels, 5 µm pixel pitch) located at a distance of zh = 150 mm from the SLM. The front focal plane of the lens L1 is specified as Δz = 0 mm. Initially, the object is placed at the front focal plane of L1; it is then moved away along the axial direction in steps of 2 mm, and HOBJ is recorded at each axial distance. In addition, to realize 3D imaging with the POF method, a pinhole with a diameter of 30 µm is used as a point object to record HPSF at the corresponding axial positions.

Figure 10 shows the reconstruction results obtained using the proposed and POF methods. The proposed method reliably recovers the object at different axial positions. In contrast, the images reconstructed by the POF method have lower quality and contrast, because the POF method cannot effectively suppress the background components; its reconstruction becomes almost unrecognizable when the object is far from the front focal plane.

Fig. 10. Experimental results when the object moves away from the front focal plane of the lens. (a) Reconstruction results; (b) cross-sections.

In the fourth experiment, the transmissive experimental setup is converted into a reflective configuration, as depicted in Fig. 11. Two neighboring screws are chosen as the object, with an axial distance of 5 mm between their top planes. The direct imaging at the two axial planes is shown in Figs. 12(a) and (b), and the reconstructed results of the proposed method are shown in Figs. 12(c) and (d). In direct imaging, when recording at one of the axial positions, only one screw is in focus while the other is out of focus. Although the reconstructed results appear similar to those of direct imaging, the contrast and detail recovery are improved. Specifically, as depicted in the enlarged views in Fig. 12, object 2 is out of focus (yellow solid box), but more detailed features are reconstructed (blue dashed box). In the second row, object 1 is out of focus, while the contrast of the reconstructed image is improved significantly.

Fig. 11. Experimental setup for imaging reflective objects.

Fig. 12. Reflective imaging of two screws. (a) and (b) Direct imaging at two positions; (c) and (d) imaging using the proposed method at two positions.

5. Conclusions

Since I-COACH was proposed in 2017, improving its reconstruction quality has been a popular topic. Generally, the SNR of the I-COACH system is governed mainly by the system noise and the background components, and these two problems have conventionally been addressed separately. The scheme proposed in this paper attempts, for the first time, to address both issues simultaneously. It consists of two sub-networks. The first takes the frequency distribution of the object as guidance and adaptively generates coded phase masks to eliminate the coupling between the object and the noise. The second sub-network establishes a mapping between the autocorrelations of the hologram and the object to reduce the training dimensionality, greatly improving the reconstruction quality. More importantly, the adaptively generated coded phase masks increase the generalization capability of the I-COACH system, and embedding prior physical knowledge improves the interpretability of the network, which is essential for guaranteeing the fidelity of holographic imaging.

Funding

National Natural Science Foundation of China (51875107); Jiangsu Provincial Key Research and Development Program (BE2021035); Dreams Foundation of Jianghuai Advance Technology Center (2023-ZM01C008).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. W. L. Chi and N. George, “Optical imaging with phase-coded aperture,” Opt. Express 19(5), 4294–4300 (2011). [CrossRef]  

2. A. A. Faust, R. E. Rothschild, P. Leblanc, et al., “Development of a coded aperture x-ray backscatter imager for explosive device detection,” IEEE Trans. Nucl. Sci. 56(1), 299–307 (2009). [CrossRef]  

3. C. Slinger, H. Bennett, G. Dyer, et al., “Adaptive coded-aperture imaging with subpixel superresolution,” Opt. Lett. 37(5), 854–856 (2012). [CrossRef]  

4. A. Vijayakumar, Y. Kashter, R. Kelner, et al., “Coded aperture correlation holography-a new type of incoherent digital holograms,” Opt. Express 24(11), 12430–12441 (2016). [CrossRef]

5. J. Rosen, A. Vijayakumar, M. Kumar, et al., “Recent advances in self-interference incoherent digital holography,” Adv. Opt. Photonics 11(1), 1–66 (2019). [CrossRef]  

6. J. Rosen and G. Brooker, “Digital spatially incoherent Fresnel holography,” Opt. Lett. 32(8), 912–914 (2007). [CrossRef]  

7. A. Vijayakumar and J. Rosen, “Interferenceless coded aperture correlation holography-a new technique for recording incoherent digital holograms without two-wave interference,” Opt. Express 25(12), 13883–13896 (2017). [CrossRef]  

8. M. Kumar, A. Vijayakumar, and J. Rosen, “Incoherent digital holograms acquired by interferenceless coded aperture correlation holography system without refractive lenses,” Sci. Rep. 7(1), 11555 (2017). [CrossRef]  

9. A. Bulbul and J. Rosen, “Super-resolution imaging by optics incoherent synthetic aperture with one channel at a time,” Photonics Res. 9(7), 1172–1181 (2021). [CrossRef]  

10. A. Bulbul, A. Vijayakumar, and J. Rosen, “Partial aperture imaging by systems with annular phase coded masks,” Opt. Express 25(26), 33315–33329 (2017). [CrossRef]  

11. N. Dubey, J. Rosen, and I. Gannot, “High-resolution imaging system with an annular aperture of coded phase masks for endoscopic applications,” Opt. Express 28(10), 15122–15137 (2020). [CrossRef]  

12. S. Mukherjee, A. Vijayakumar, M. Kumar, et al., “3D imaging through scatterers with interferenceless optical system,” Sci. Rep. 8(1), 1134 (2018). [CrossRef]  

13. M. R. Rai, A. Vijayakumar, and J. Rosen, “Extending the field of view by a scattering window in an I-COACH system,” Opt. Lett. 43(5), 1043–1046 (2018). [CrossRef]  

14. N. Dubey and J. Rosen, “Interferenceless coded aperture correlation holography with point spread holograms of isolated chaotic islands for 3D imaging,” Sci. Rep. 12(1), 4544 (2022). [CrossRef]  

15. M. R. Rai and J. Rosen, “Resolution-enhanced imaging using interferenceless coded aperture correlation holography with sparse point response,” Sci. Rep. 10(1), 5033 (2020). [CrossRef]  

16. M. R. Rai, A. Vijayakumar, and J. Rosen, “Non-linear adaptive three-dimensional imaging with interferenceless coded aperture correlation holography (I-COACH),” Opt. Express 26(14), 18143–18154 (2018). [CrossRef]  

17. Y. H. Wan, C. Liu, T. Ma, et al., “Incoherent coded aperture correlation holographic imaging with fast adaptive and noise-suppressed reconstruction,” Opt. Express 29(6), 8064–8075 (2021). [CrossRef]  

18. Y. C. Wu, Y. Rivenson, Y. B. Zhang, et al., “Extended depth-of-field in holographic imaging using deep-learning-based autofocusing and phase recovery,” Optica 5(6), 704–710 (2018). [CrossRef]  

19. F. Wang, Y. M. Bian, H. C. Wang, et al., “Phase imaging with an untrained neural network,” Light: Sci. Appl. 9(1), 77 (2020). [CrossRef]  

20. R. Xiong, X. C. Zhang, X. Y. Ma, et al., “Enhancement of imaging quality of interferenceless coded aperture correlation holography based on physics-informed deep learning,” Photonics 9(12), 1 (2022). [CrossRef]  

21. M. R. Rai and J. Rosen, “Noise suppression by controlling the sparsity of the point spread function in interferenceless coded aperture correlation holography (I-COACH),” Opt. Express 27(17), 24311–24323 (2019). [CrossRef]  

22. M. Kumar, A. Vijayakumar, J. Rosen, et al., “Interferenceless coded aperture correlation holography with synthetic point spread holograms,” Appl. Opt. 59(24), 7321–7329 (2020). [CrossRef]  

23. M. Kumar, V. Anand, and J. Rosen, “Interferenceless incoherent digital holography with binary coded apertures optimized using direct binary search,” Opt. Lasers Eng. 160(3), 107306 (2023). [CrossRef]  

24. J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. 21(15), 2758–2769 (1982). [CrossRef]

25. L. C. Chen, G. Papandreou, I. Kokkinos, et al., “DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2018). [CrossRef]

26. S. T. Liu, D. Huang, and Y. H. Wang, “Receptive field block net for accurate and fast object detection,” in Computer Vision – ECCV 2018, Lect. Notes Comput. Sci. 11215, 404–419 (2018). [CrossRef]

27. M. H. Zhang, Y. H. Wan, T. L. Man, et al., “Interferenceless coded aperture correlation holography based on Deep-learning reconstruction of Single-shot object hologram,” Opt. Laser Technol. 163(1), 109343 (2023). [CrossRef]  

28. N. Hai and J. Rosen, “Doubling the acquisition rate by spatial multiplexing of holograms in coherent sparse coded aperture correlation holography,” Opt. Lett. 45(13), 3439–3442 (2020). [CrossRef]  

29. O. Katz, P. Heidmann, M. Fink, et al., “Non-invasive single-shot imaging through scattering layers and around corners via speckle correlations,” Nat. Photonics 8(10), 784–790 (2014). [CrossRef]  

30. L. Cohen, “Generalization of the Wiener-Khinchin theorem,” IEEE Signal Process. Lett. 5(11), 292–294 (1998). [CrossRef]  

31. T. van Erven and P. Harremoës, “Renyi divergence and Kullback-Leibler divergence,” IEEE Trans. Inf. Theory 60(7), 3797–3820 (2014). [CrossRef]  

32. S. Roth and M. J. Black, “Fields of experts,” Int. J. Comput. Vis. 82(2), 205–229 (2009). [CrossRef]  

33. X. Chen, H. Fang, T. Y. Lin, et al., “Microsoft COCO captions: data collection and evaluation server,” arXiv:1504.00325 (2015).

34. R. Kuschmierz, E. Scharf, D. F. Ortegón-González, et al., “Ultra-thin 3D lensless fiber endoscopy using diffractive optical elements and deep neural networks,” Light: Advanced Manufacturing 2(4), 415–424 (2021). [CrossRef]  

35. X. L. Li, R. G. Li, Y. Q. Zhao, et al., “An improved model training method for residual convolutional neural networks in deep learning,” Multimed. Tools Appl. 80(5), 6811–6821 (2021). [CrossRef]  

36. K. Dabov, A. Foi, V. Katkovnik, et al., “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. on Image Process. 16(8), 2080–2095 (2007). [CrossRef]  

37. K. Zhang, W. M. Zuo, and L. Zhang, “FFDNet: toward a fast and flexible solution for CNN-based image denoising,” IEEE Trans. on Image Process. 27(9), 4608–4622 (2018). [CrossRef]  

38. K. Zhang, W. M. Zuo, Y. J. Chen, et al., “Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising,” IEEE Trans. on Image Process. 26(7), 3142–3155 (2017). [CrossRef]  

39. S. H. Gu, L. Zhang, W. M. Zuo, et al., “Weighted nuclear norm minimization with application to image denoising,” in IEEE Conference on Computer Vision and Pattern Recognition (2014), pp. 2862–2869.


