
Phase dual-resolution networks for a computer-generated hologram

Open Access

Abstract

The computer-generated hologram (CGH) is a method for calculating arbitrary optical field interference patterns. Iterative algorithms for CGHs require a built-in trade-off between computation speed and accuracy of the hologram, which restricts the performance of applications. Although the non-iterative algorithm for CGHs is quicker, the hologram accuracy does not meet expectations. We propose a phase dual-resolution network (PDRNet) based on deep learning for generating phase-only holograms with fixed computational complexity. There are no ground-truth holograms employed in the training; instead, the differentiability of the angular spectrum method is used to realize unsupervised training of the convolutional neural network. In the PDRNet algorithm, we optimized the dual-resolution network as the prototype of the hologram generator to enhance the mapping capability. The combination of multi-scale structural similarity (MS-SSIM) and mean square error (MSE) is used as the loss function to generate a high-fidelity hologram. The simulation indicates that the proposed PDRNet can generate high-fidelity 1080P resolution holograms in 57 ms. Experiments in the holographic display show fewer speckles in the reconstructed image.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The holographic display is considered a promising future display technology because it accurately records the light waves of a real object as a hologram. Compared with traditional 3D imaging technology, the holographic display is progressively becoming a key instrument in 3D scene reconstruction and augmented reality [1,2].

In the holographic display, the intensity and depth information of the object is recorded in the hologram. Currently, most existing spatial light modulators (SLMs) cannot modulate both the amplitude and phase of the optical wave at the same time. Phase-only holograms have become the major encoding method for CGHs due to the high optical efficiency of phase modulation and the absence of interference from conjugate images during reconstruction [3,4]. However, generating high-quality CGHs in real time remains a difficult problem in computational physics. The core task of CGH is ill-posed: the best possible wave modulation must be identified by solving a nonlinear, non-convex inverse problem.

In the past few decades, various CGH algorithms have been developed. A common iterative optimization, the Gerchberg-Saxton (GS) algorithm [5], iteratively projects between two optical planes until the reconstructed image meets a preset error. Several optimization algorithms build on GS. For example, the light field distribution can be dispersed by adding random phases to the target amplitude image to avoid excessive light field concentration on the hologram [6,7]. However, the random phase also causes speckle noise in the optical reconstruction. The bidirectional error diffusion algorithm [8] removes the amplitude of the complex wave field; by bidirectionally scanning the odd and even lines of each pixel, the complex wave field is diffused in proportion to the unscanned pixels, which suppresses some speckle noise. Recently, some researchers have computed holograms by solving a non-convex optimization problem with an explicit loss function, such as non-convex optimization with gradient descent [9] or Wirtinger derivatives [10]. Non-convex optimization is a better solution than GS, but iterative optimization takes a long time to compute and cannot meet the real-time computation of CGH for large-scale images. Non-iterative algorithms include double-phase amplitude coding (DPAC) [11] and one-step phase extraction [12], which are faster than iterative algorithms. However, in the DPAC algorithm the spatially misaligned phase components in each complex-amplitude modulation cell cannot be superimposed completely and coherently, so the reconstructed image contains noise that cannot be eliminated. The one-step phase extraction method directly removes the amplitude part of the complex amplitude field to obtain a phase-only hologram, so the lines and edges of the reconstructed image are uneven.

The iterative optimization algorithm and the non-iterative optimization algorithm always make a trade-off between the accuracy of the hologram and the computation time. In recent years, deep learning has been gradually introduced into optics [13] and has produced many gratifying results. In [14–16], CNNs are used for phase-only CGH, and the reconstruction quality is better than that of iterative optimization algorithms. In [17], a large number of holograms are generated as a dataset by the Fresnel method to train a generative adversarial network (GAN). In [18], multiple images of different planes are used as the input of a CNN, and holograms generated by the iterative method are used as the output to train the CNN; the authors realized image reconstruction on multiple planes and proved the feasibility of generating multi-plane holograms with deep learning. Shi et al. [19] introduced a large-scale Fresnel holographic dataset (MIT-CGH-4K) to train a CNN model for generating realistic 3D holograms. Peng et al. [20] build on the stochastic gradient descent (SGD) algorithm and use camera-in-the-loop (CITL) training to generate high-quality holograms; in each cycle, the optical reconstruction is captured and fed back to the algorithm. In addition, Peng et al. also used this method to train an interpretable model of optical field propagation for end-to-end high-resolution 2D holographic display.

The neural network can be used as a generalized functional approximation, which learns the mapping between input and output. However, the performance of CNNs depends on the quality of the training set. At present, CGH lacks a training set containing target amplitude images and accurate holograms. This has also become a key challenge for applying deep learning to CGH. More importantly, the convolutional layer operates on the spatial dimension of the input, so the CNN is most suitable for modeling and calculating the mapping of the spatial relationship between input and output. With direct phase inference, the CNN performs a cross-domain mapping of the illumination pattern, defined in the object plane, to the SLM phase mask, defined in the Fourier domain. The spatial correspondence is not preserved and the CNN capabilities are underutilized [15].

In this paper, the PDRNet algorithm is proposed to explore the CNN's capability for generating holograms. The CNN learns mappings within the same domain instead of a cross-domain mapping. Our algorithm comprises two stages. In the object plane, a CNN learns the required phase to form a complex-valued wave field. In the hologram plane, a second CNN estimates the hologram from the complex-valued wave field. The required recording distance between the object plane and the hologram plane is realized with the angular spectrum method. Furthermore, the algorithm can complete unsupervised learning of the mapping between the target amplitude and the hologram with natural image datasets; no ground-truth hologram masks are needed. In the PDRNet algorithm, the CNN is an optimized dual-resolution network, and the loss function is a combination of MS-SSIM and MSE; both are crucial for generating high-quality holograms.

In Section 2, we introduce the differentiability of the angular spectrum method and the detailed network structure. In Section 3, to verify the proposed method, we set up comparative experiments, analyze the model complexity, and reconstruct holograms numerically and optically.

2. Methods

2.1 Differentiable properties of the angular spectrum

In a holographic display, the complex-valued wave field usrc generated by a coherent source such as a plane wave or a spherical wave is incident on the SLM. The SLM delays the phase of the wave field by ϕ(x, y) on a per-pixel basis, and the modulated field then propagates to the target plane in free space. There are three main methods for propagating the wave field between the SLM and the target plane in free space: the Fresnel method based on the fast Fourier transform (FT-FR), the Fresnel method based on convolution (CV-FR), and the angular spectrum method. FT-FR is simple, but its sampling window and interval are proportional to the propagation distance. For the other two methods, the sampling window and interval do not depend on the propagation distance; they are the same as those of the source field. In addition, CV-FR is only applicable to paraxial wave fields, whereas the angular spectrum method applies to both non-paraxial and paraxial wave fields. The formula for the angular spectrum method follows directly from the Kirchhoff or Rayleigh-Sommerfeld diffraction theory and is expressed as follows:

$${u_z}({x_1},{y_1}) = {f_{\textrm{AS}}}\{ \phi (x,y)\} = IFFT\{ FFT\{ {u_{\textrm{src}}}(x,y)\, \cdot \,{e^{i\phi (x,y)}}\} \cdot H({f_x},{f_y})\} ,$$
where usrc denotes the complex-valued wave field generated by the coherent source; uz denotes the complex-valued wave field at diffraction distance z; ϕ (x, y) denotes the phase-only hologram loaded into the SLM; FFT and IFFT are the Fourier and inverse Fourier transform operators respectively; H(fx, fy) is the angular spectrum transform function. The expression of H(fx, fy) is expressed by:
$$H({f_x},{f_y}) = \left\{ \begin{array}{ll} {e^{i\frac{{2\pi }}{\lambda }z\sqrt {1 - {{(\lambda {f_x})}^2} - {{(\lambda {f_y})}^2}} }}& if\sqrt {{f_x}^2 + {f_y}^2} < \frac{1}{\lambda },\\0 &otherwise, \end{array} \right.$$
where fx and fy are the spatial frequencies; λ is the light wavelength, z is the diffraction distance, and H(fx, fy) is the transfer function of the angular spectrum method.
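As an illustrative sketch only (assuming a PyTorch implementation with torch.fft; the function and variable names are placeholders, not the authors' released code), Eqs. (1) and (2) can be written as a single propagation operator:

import numpy as np
import torch

def asm_propagate(amp, phase, z, wavelength, pitch):
    # Eq. (1): u_z = IFFT{ FFT{ amp * exp(i*phase) } * H(fx, fy) }
    field = torch.polar(amp, phase)                        # complex wave field on the source plane
    ny, nx = phase.shape[-2:]
    fx = torch.fft.fftfreq(nx, d=pitch, device=phase.device)
    fy = torch.fft.fftfreq(ny, d=pitch, device=phase.device)
    fyy, fxx = torch.meshgrid(fy, fx)                      # spatial-frequency grid
    arg = 1.0 - (wavelength * fxx) ** 2 - (wavelength * fyy) ** 2
    band = arg > 0                                         # Eq. (2): H = 0 outside fx^2 + fy^2 < 1/lambda^2
    kz = (2.0 * np.pi / wavelength) * z * torch.sqrt(torch.clamp(arg, min=0.0))
    H = torch.polar(band.to(kz.dtype), kz)                 # exp(i*(2*pi/lambda)*z*sqrt(1 - (lambda*fx)^2 - (lambda*fy)^2))
    return torch.fft.ifft2(torch.fft.fft2(field) * H)

Every operation above has an autograd implementation in PyTorch, which is the property exploited in the following paragraphs.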

To achieve unsupervised training of the neural network without hologram masks, the angular spectrum method is used to reconstruct the phase-only hologram predicted by the network into an intensity image. From Eq. (1), usrc(x, y) is a content-independent coherent source field, so the differentiability of the angular spectrum method depends only on H(fx, fy). From Eq. (2), the transfer function of the angular spectrum method is determined by the wavelength, propagation distance, pixel pitch, and spatial resolution. When the optical system is fixed, these parameters are fixed, so the transfer function can be regarded as a constant. Therefore, the angular spectrum calculation is differentiable and supports gradient propagation. According to the universal approximation theorem of neural networks, a phase-only hologram is generated by the forward propagation of the CNN, and the angular spectrum method is used to compute the numerical reconstruction of the hologram. This process can be represented by:

$$\widehat I = {f_{\textrm{AS}}}({\phi _{holo}}) = {f_{\textrm{AS}}}({f_{net}}(I)),$$
where I denotes the target amplitude, fnet denotes the approximation function of the CNN, ϕholo denotes the phase-only hologram, and Î denotes the reconstructed image. Because the angular spectrum method is differentiable, it supports backpropagation through the neural network; the gradients computed from the loss function through the angular spectrum method can be expressed by:
$$\left\{ \begin{array}{l} \frac{{\partial \ell (\widehat I,I)}}{{\partial {w_k}}} = \frac{{\partial \ell (\widehat I,I)}}{{\partial {f_{\textrm{AS}}}}} \cdot \frac{{\partial {f_{\textrm{AS}}}}}{{\partial {\phi_{holo}}}} \cdot \frac{{\partial {\phi_{holo}}}}{{\partial {w_k}}},\\ \frac{{\partial \ell (\widehat I,I)}}{{\partial {b_k}}} = \frac{{\partial \ell (\widehat I,I)}}{{\partial {f_{\textrm{AS}}}}} \cdot \frac{{\partial {f_{\textrm{AS}}}}}{{\partial {\phi_{holo}}}} \cdot \frac{{\partial {\phi_{holo}}}}{{\partial {b_k}}}, \end{array} \right.$$
where wk and bk denote the weights and biases of the k-th layer of the CNN, and ϕholo represents the hologram.
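A minimal, hypothetical training step illustrating Eqs. (3) and (4) (net, criterion, optimizer, and the asm_propagate sketch above are placeholder names):

phi_holo = net(target_amp)                                   # Eq. (3): the CNN predicts the phase-only hologram
recon = asm_propagate(torch.ones_like(phi_holo), phi_holo, z, wavelength, pitch).abs()
loss = criterion(recon, target_amp)                          # compare |u_z| with the target amplitude
optimizer.zero_grad()
loss.backward()                                              # Eq. (4): autograd differentiates through f_AS
optimizer.step()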

2.2 PDRNet algorithm

The CNN has been successfully applied not only in the image domain but also in natural language processing [21]. Both images and natural language share the feature of local spatial correlation. With direct phase inference, the CNN performs a cross-domain mapping from the illumination pattern, defined in the object plane, to the SLM phase mask, defined in the Fourier domain. Such cross-domain mappings exhibit weak local correlation and spatial invariance, so the capabilities of the CNN cannot be fully exploited. We therefore want the CNN to learn mappings within the same domain.

We adopt the idea of iterative algorithms, propagating between the object plane and the hologram plane, but we use a CNN to generate the required phase on the source or target plane. In iterative algorithms, random phases can change the light field distribution and thereby improve the hologram quality; however, a random phase easily leads to speckle noise in the reconstructed image, so we use a CNN to learn the required phase instead. In the hologram plane, the GS method directly discards the amplitude information in the complex-valued wave field to obtain the hologram, which easily loses information. Therefore, we use a CNN to learn how to obtain the hologram from the complex-valued wave field. Between the two optical planes, the angular spectrum method realizes the recording distance and the reconstruction distance.

Our algorithm is shown in Fig. 1. In the object plane, the phase estimated by the CNN from the target amplitude forms a complex-valued wave field together with the target amplitude. The complex-valued wave field is propagated to the hologram plane by the angular spectrum method. At the hologram plane, the CNN maps the complex-valued wave field to the hologram. Finally, unsupervised training is completed by reconstructing the hologram with the angular spectrum method and comparing the reconstruction with the target amplitude. Equation (3) can be further expressed by:

$$\widehat I = {f_{\textrm{AS}}}({\phi _{holo}}) = {f_{\textrm{AS}}}({f_{net2}}({f_{\textrm{AS}}}(I{e^{i{\kern 1pt} {f_{net1}}(I)}}))),$$
where Î denotes the reconstructed image, ϕholo denotes the phase-only hologram, fAS represents the angular spectrum method, I denotes the target amplitude, fnet1 denotes the approximation function of the CNN in the object plane, and fnet2 denotes the approximation function of the CNN in the hologram plane.
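One possible organization of the forward pass in Eq. (5) and Fig. 1 is sketched below (assuming batched tensors of shape B x 1 x H x W; net1, net2, and the sign convention of the propagation distance are illustrative, not the authors' code):

def pdrnet_forward(target_amp, net1, net2, z, wavelength, pitch):
    phase_obj = net1(target_amp)                              # phase learned on the object plane
    u_holo = asm_propagate(target_amp, phase_obj, z, wavelength, pitch)   # propagate to the hologram plane
    feat = torch.cat([u_holo.abs(), u_holo.angle()], dim=1)   # two-channel amplitude/phase tensor (Fig. 1)
    phi_holo = net2(feat)                                     # phase-only hologram
    # the sign of the distance encodes the propagation direction; the convention here is illustrative
    recon = asm_propagate(torch.ones_like(phi_holo), phi_holo, -z, wavelength, pitch).abs()
    return phi_holo, recon                                    # hologram and its numerical reconstruction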


Fig. 1. Structure of PDRNet. A trained CNN estimates the phase from the target amplitude. The estimated phase and the target amplitude form a complex-valued wave field, Uz, which is then propagated to the hologram plane, U0. To avoid feeding complex values directly to the CNN, the complex-valued wave field is decomposed into amplitude and phase, which are concatenated into a two-channel tensor as the input to the CNN. The trained CNN yields the hologram at the hologram plane. The consistency loss between the numerical reconstruction of the hologram and the target amplitude is calculated to realize unsupervised training of the network.


2.3 CNN structure

Figure 2 shows the main components of the CNN structure for CGH. A dual-resolution network [22] is used to yield holograms instead of the U-Net [23]. The high-resolution branch retains more high-frequency features, while the low-resolution branch extracts high-level features of the target amplitude. In the U-Net, the encoder-decoder structure and shortcuts are well suited to semantic segmentation tasks. However, for generating high-resolution holograms, the progressive downsampling of the U-Net lowers the spatial resolution and loses fine image details and features. Although shortcuts compensate for some of this information when recovering the output, maintaining a certain spatial resolution with dilated convolution [24,25] seems to be a more fundamental solution. We followed the design principles of hybrid dilated convolution [25] and use dilated residual blocks in the dual-resolution network, preventing arbitrarily chosen dilation rates from causing "gridding". The maximum distance between two non-zero values in the convolution is expressed by:

$${M_i} = \max [{M_{i + 1}} - 2{r_i},{M_{i + 1}} - 2({M_{i + 1}} - {r_i}),\;{r_i}],\quad {M_n} = {r_n},$$
where i denotes the i-th convolutional layer, Mi denotes the maximum distance between non-zero values in the output of the i-th layer, and ri denotes the dilation rate of the i-th layer. The design goal is M2 ≤ K, where K is the convolution kernel size. Please refer to [25] for more details.
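The recursion in Eq. (6) can be checked programmatically. The short helper below (illustrative only) reproduces the two examples given in [25] for a 3 x 3 kernel:

def hdc_m2(rates):
    # Eq. (6): M_n = r_n; M_i = max(M_{i+1} - 2*r_i, M_{i+1} - 2*(M_{i+1} - r_i), r_i)
    M = rates[-1]
    for r in reversed(rates[1:-1]):        # evaluate the recursion down to M_2 (r_1 is not needed)
        M = max(M - 2 * r, M - 2 * (M - r), r)
    return M

print(hdc_m2([1, 2, 5]))   # 2 <= K = 3, a valid dilation pattern
print(hdc_m2([1, 2, 9]))   # 5 >  K = 3, this pattern causes "gridding"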


Fig. 2. CNN structure for PDRNet. CBP denotes a convolutional block with convolution, batch normalization, and a PReLU activation. RB denotes the residual block and RBB denotes the residual bottleneck block. DAPPM represents the deep aggregation pyramid pooling module, which is important for extracting multi-scale context. The dual-resolution network takes the target amplitude or the complex-valued wave field as input and outputs the phase or the hologram. Two CBP blocks, four residual blocks, and one residual bottleneck encode the features. The auxiliary branch, containing two residual blocks, one residual bottleneck, and a DAPPM, encodes the auxiliary features. A long shortcut concatenates the 1/4-resolution encoder features with the 1/4-resolution decoder features; PS is sub-pixel convolution.


Generating high-resolution holograms consumes considerable GPU memory, and the complexity of the network model increases with the hologram resolution. We use grouped convolution to reduce the model complexity, which also allows PDRNet to be trained with a larger batch size; a larger batch size is beneficial for training a better model. For upsampling in the decoder, bilinear interpolation loses information, and deconvolution easily produces "uneven overlap," putting more of the metaphorical paint in some places than others [26]. We therefore chose sub-pixel convolution [27] as the upsampling method, obtaining high-resolution feature maps by periodic shuffling. Because dense long shortcuts increase the number of parameters, we introduce only one long shortcut in the main branch of the dual-resolution network to enhance its feature decoding capability, i.e., the 1/4-resolution features of the encoder and the 1/4-resolution features of the decoder are concatenated.
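A sketch of such an upsampling block is given below; the class name, channel counts, and group number are illustrative and do not reproduce the exact PDRNet configuration:

import torch.nn as nn

class SubPixelUp(nn.Module):
    # Grouped 3x3 convolution followed by sub-pixel convolution (PixelShuffle [27]).
    def __init__(self, in_ch=64, out_ch=32, groups=4, scale=2):
        super().__init__()
        # grouped convolution cuts parameters and FLOPs roughly by the number of groups
        self.conv = nn.Conv2d(in_ch, out_ch * scale ** 2, kernel_size=3,
                              padding=1, groups=groups)
        self.shuffle = nn.PixelShuffle(scale)   # periodic shuffling: (C*r^2, H, W) -> (C, r*H, r*W)
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(self.shuffle(self.conv(x)))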

2.4 Loss function

When the phase-only hologram is generated by deep learning, the neural network converges to different optima depending on the loss function. Researchers have explored this area and obtained positive results: [15,19] use MSE as the loss function, [20] uses perceptual loss combined with MSE, and [16] uses perceptual loss combined with the negative Pearson correlation coefficient (NPCC).

After the hologram is reconstructed in the PDRNet algorithm, the task resembles the super-resolution task in deep learning. To train PDRNet to generate holograms whose reconstructed images conform to the human visual system, we use a combination of multi-scale structural similarity (MS-SSIM) and MSE as the loss function. The mixed loss function is expressed by:

$$\begin{aligned} &{\ell ^{\textrm{MS-SSIM}}}(I,\widehat I) = 1 - \textrm{MS-SSIM}(I,\widehat I),\\ &{\ell ^{\textrm{MSE}}}(I,\widehat I) = \frac{1}{N}\sum\limits_{p \in P} {{{(I(p) - \widehat I(p))}^2}} ,\\ &\ell (I,\widehat I) = \alpha {\ell ^{\textrm{MS-SSIM}}}(I,\widehat I) + (1 - \alpha ){\ell ^{\textrm{MSE}}}(I,\widehat I), \end{aligned}$$
where I denotes the target amplitude, Î denotes the reconstructed amplitude, MS-SSIM(I, Î) computes the multi-scale structural similarity between I and Î, p denotes a pixel, ${\ell ^{\textrm{MSE}}}$ computes the error pixel by pixel, and α is empirically set to 0.84.
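Equation (7) maps directly onto a few lines of code. The sketch below assumes the third-party pytorch_msssim package and amplitudes normalized to [0, 1]:

import torch.nn.functional as F
from pytorch_msssim import ms_ssim       # assumed third-party dependency

def mixed_loss(recon, target, alpha=0.84):
    # Eq. (7): alpha * (1 - MS-SSIM) + (1 - alpha) * MSE, with alpha set empirically to 0.84
    l_ms_ssim = 1.0 - ms_ssim(recon, target, data_range=1.0)
    l_mse = F.mse_loss(recon, target)
    return alpha * l_ms_ssim + (1.0 - alpha) * l_mse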

The sensitivity of the human visual system (HVS) to noise depends on local variations in luminance, contrast, and structure. MS-SSIM as a loss function meets this noise-sensitivity requirement: it uses Gaussian kernels at multiple scales, avoiding the speckle noise that small kernels produce in flat areas of the reconstruction and the ringing artifacts that large kernels produce at edges. However, MS-SSIM is not sensitive enough to uniform deviations in brightness, and the reconstructed image is dark overall. The MSE term computes the error between the target amplitude and the reconstructed image pixel by pixel, which compensates the reconstructed image.

3. Experiment

The PDRNet algorithm is implemented with Python 3.8 and PyTorch 1.8.1. The super-resolution dataset DIV2K [28] was used to train PDRNet, with 800 images as the training set and 100 images as the validation set. The initial learning rate during training was 0.001, the optimizer is RMSprop, and ReduceLROnPlateau provided by PyTorch is used as the learning rate scheduler with a reduction factor of 0.2. The training is set to 65 epochs, but the loss reaches a stable state at around 35 epochs. The pixel pitch of the SLM is 8 μm. The laser wavelengths are 638 nm, 532 nm, and 450 nm, and the propagation distance is set to 40 cm. All algorithms shown in this paper run on an i9-10900K CPU, 64 GB RAM, and an NVIDIA RTX 3090 GPU with 24 GB of memory.
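The training configuration described above corresponds roughly to the following setup (a hypothetical sketch; evaluate, train_loader, and val_loader are placeholder names, and pdrnet_forward and mixed_loss refer to the sketches in Section 2):

params = list(net1.parameters()) + list(net2.parameters())
optimizer = torch.optim.RMSprop(params, lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.2)

for epoch in range(65):
    for target_amp in train_loader:                        # DIV2K amplitudes, normalized to [0, 1]
        phi_holo, recon = pdrnet_forward(target_amp.cuda(), net1, net2, z, wavelength, pitch)
        loss = mixed_loss(recon, target_amp.cuda())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step(evaluate(net1, net2, val_loader))       # reduce the learning rate on a plateau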

3.1 Simulation results

We compared several representative methods, as shown in Fig. 3. The Wirtinger algorithm [10] is run for 100 iterations to obtain a phase-only hologram, which is reconstructed using the angular spectrum method. Because of the small number of iterations, the reconstructed image contains more speckle noise, resulting in lower PSNR and SSIM. We also ran experiments with 200 and 300 iterations: as the number of iterations increases, the noise decreases, but the computation time also increases, so the iterative method cannot meet the needs of future real-time holographic displays. References [16,20] used the U-Net as an encoder-decoder, with the difference that [16] used the Fresnel propagation method and [20] used the angular spectrum method. The U-Net generally uses deconvolution for upsampling, and improperly setting the kernel size and stride of the deconvolution readily produces uneven overlap. The resulting holograms are therefore prone to ringing artifacts after reconstruction.


Fig. 3. Comparison of numerical reconstruction of green-channel holograms. a, image reproduced from www.bigbuckbunny.org (© 2008, Blender Foundation) under the Creative Commons Attribution 3.0 license (https://creativecommons.org/licenses/by/3.0/); b, the image comes from DIV2K [28].


3.2 Validity of the loss function

The effect of the loss function on the generated hologram can be observed indirectly through the reconstructed image. To illustrate the effectiveness of combining MS-SSIM and MSE, several loss functions are compared with all other experimental conditions fixed.

Numerical reconstructions of the holograms are shown in Fig. 4; a portion of the 1080P reconstruction is cropped for illustration. Among them, PDRNet trained with MSE alone generates the worst hologram, and the details of the numerical reconstruction are blurred, as shown in Fig. 4(a). The MSE loss calculates the error pixel by pixel, penalizing large errors strongly and small errors weakly; it ignores the image content and does not account for the human visual system's sensitivity to changes in brightness and color in texture-free regions. Figure 4(b) shows the result of training with perceptual loss plus MSE, as commonly used in deep-learning super-resolution tasks. It is much improved compared with Fig. 4(a), but the sky region is still not clean enough and the animal fur is not sharp enough; the perceptual loss also consumes the most GPU memory in our experiments. Figure 4(c) is the numerical reconstruction of the hologram generated by PDRNet trained with MS-SSIM and MSE, and it is the closest to the ground truth. Finally, we performed a constraint experiment similar to [20], in which the MSE loss between the amplitude in the hologram plane and the target amplitude in the object plane is calculated. However, as shown in Fig. 4(d), this constraint does not benefit our algorithm, possibly because the additional MSE constraint interferes with the consistency objective of PDRNet.


Fig. 4. Comparison of PDRNet reconstructions obtained with different loss functions.


3.3 Complexity and performance

We compared the PDRNet algorithm with several representative algorithms, as shown in Fig. 5. For the iterative methods, GS holography and Wirtinger holography, the reconstruction quality increases as the computation time increases. Wirtinger holography achieves notably high-quality reconstructions but has the longest computation time. The DPAC algorithm has the fastest computation speed of all algorithms but relatively poor reconstruction quality. Using a U-Net to directly learn the mapping between the target amplitude and the hologram offers a good trade-off between computation speed and reconstruction quality. The PDRNet algorithm achieves higher reconstruction quality with computation time comparable to the U-Net. For 1080P holograms, PDRNet takes only 57 ms and achieves 31.17 dB PSNR and 0.93 SSIM.


Fig. 5. Runtime and image quality.


In addition, Table 1 shows the complexity and performance of the PDRNet algorithm before and after optimization on the same test image. Dilated convolution increases the receptive field of the convolutional network without increasing the trainable parameters or FLOPs, where FLOPs denotes the theoretical number of floating-point operations. Grouped convolution reduces the trainable parameters and FLOPs at the cost of slightly higher GPU memory usage during training. During upsampling, the feature-map resolution is doubled while sub-pixel convolution reduces the number of channels by a factor of 4; with fewer channels, the trainable parameters and the number of multiply-add operations are reduced as well.


Table 1. Complexity before and after optimization

This saves considerable computation time when generating large holograms. Compared with the original dual-resolution network, our optimized dual-resolution network not only improves the computation speed but also significantly improves the quality of the generated hologram.
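The parameter savings from grouped convolution can be verified with a toy comparison (illustrative layer sizes, not the actual PDRNet layers):

import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

standard = nn.Conv2d(64, 64, kernel_size=3, padding=1)
grouped = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=4)
print(n_params(standard))   # 36928 = 64*64*3*3 + 64
print(n_params(grouped))    # 9280  = 64*(64/4)*3*3 + 64, roughly a 4x reduction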

3.4 Generalization capability

To evaluate the generalization capability of the PDRNet algorithm, test images are randomly picked from the DIV2K validation set and the Big Buck Bunny video; images from the video never appear in the training or validation sets. The PDRNet algorithm is first tested with single-channel images. The phase-only hologram and simulation results are shown in Fig. 6 (top). The detail views of the two images are magnified 3× and 2.5×, respectively. The contrast and brightness of the reconstruction are close to the original target amplitude, and detailed parts are well reconstructed, such as the car and the abdomen of the Big Buck Bunny character.


Fig. 6. Hologram and reconstruction. a, the image comes from DIV2K [28]; b, image reproduced from www.bigbuckbunny.org (© 2008, Blender Foundation) under the Creative Commons Attribution 3.0 license (https://creativecommons.org/licenses/by/3.0/).


In practice, the hologram is illuminated by white light in a full-color holographic display, so holograms must be generated for three channels. As described in Section 2.1, the wavelength is a hyperparameter when recording a hologram in the PDRNet algorithm, and the three color channels of an image are homogeneous in structure. In simulation, the PDRNet algorithm generalizes well to the different color channels and generates high-quality holograms for each. PDRNet generates three holograms, which are then merged into a full-color hologram, as shown in Fig. 6 (bottom). The reconstruction details can be seen more clearly in the RGB image.
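In sketch form (reusing the hypothetical pdrnet_forward above; rgb_target is a placeholder for a batched RGB amplitude tensor), full-color generation amounts to running the pipeline once per channel with that channel's wavelength:

wavelengths = {"r": 638e-9, "g": 532e-9, "b": 450e-9}      # wavelengths used in this paper, in meters
holograms = {}
for i, (ch, wl) in enumerate(wavelengths.items()):
    amp = rgb_target[:, i:i + 1]                           # single-channel target amplitude
    holograms[ch], _ = pdrnet_forward(amp, net1, net2, z, wl, pitch)
# the three phase-only holograms are then merged for full-color reconstruction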

Finally, because the numerical reconstruction is a discrete sampling of the reconstructed intensity and does not represent the actual display, we used an optical setup to reconstruct the hologram generated by the PDRNet algorithm. The experimental setup is illustrated in Fig. 7(a). The SLM is from HOLOEYE Photonics AG with a pixel pitch of 8 μm, the laser wavelength is 532 nm, and the reconstruction distance is 0.4 m. Figure 7(b) compares the reconstructions: our algorithm achieves less speckle noise than the Wirtinger method with 300 iterations.


Fig. 7. Experimental results. (a) The experimental setup. (b) Comparison of grayscale image reconstruction quality.


4. Conclusion

In this study, we present PDRNet for fast and accurate CGH. Compared with existing CNN-based CGH algorithms, the key to generating high-quality holograms with PDRNet is that the CNN learns the mapping on the same optical plane rather than across optical planes, fully utilizing the CNN's capacity for problems with local correlation and spatial invariance. The dual-resolution CNN optimized with grouped convolution, dilated convolution, and sub-pixel convolution generates holograms more quickly and accurately. The trained PDRNet generalizes well, and the computational complexity for a given hologram resolution is fixed: it takes an average of 57 ms to generate a 1080P hologram. Furthermore, the reconstructed image has higher SSIM and PSNR and is more in line with the human visual system, owing to the use of MS-SSIM and MSE as the loss function to indirectly improve the hologram. Finally, PDRNet achieves unsupervised training by numerically reconstructing holograms with the angular spectrum method, so no ground-truth holograms are required, and users can select the training set that best matches their application.

Funding

National Natural Science Foundation of China (51874300, 52074305); National Natural Science Foundation of China-Shanxi Joint Fund for Coal-Based Low-Carbon Technology (U1510115); Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences (20190902, 20190913).

Acknowledgments

This work was supported by the NSFC under Grants 51874300 and 52074305, the NSFC and Shanxi Provincial People's Government Jointly Funded Project of China for Coal Base and Low Carbon under Grant U1510115, and the Open Research Fund of Key Laboratory of Wireless Sensor Network and Communication, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, under Grants 20190902 and 20190913. We wish to thank the Beijing Engineering Research Center for Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, for providing optical devices.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Z. Zhang, J. Liu, X. Duan, and Y. Wang, “Enlarging field of view by a two-step method in a near-eye 3D holographic display,” Opt. Express 28(22), 32709–32720 (2020). [CrossRef]  

2. D. Wang, D. Xiao, N. N. Li, C. Liu, and Q. H. Wang, “Holographic display system based on effective area expansion of SLM,” IEEE Photonics J. 11(6), 1–12 (2019). [CrossRef]  

3. Y. Z. Liu, J. W. Dong, Y. Y. Pu, B. C. Chen, H. X. He, and H. Z. Wang, “High-speed full analytical holographic computations for true-life scenes,” Opt. Express 18(4), 3345–3351 (2010). [CrossRef]  

4. H. Dammann and K. Görtler, “High-efficiency in-line multiple imaging by means of multiple phase holograms,” Opt. Commun. 3(5), 312–315 (1971). [CrossRef]  

5. R. W. Gerchberg and W. O. Saxton, “A practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik 35(2), 237–246 (1972).

6. J. S. Liu and M. R. Taghizadeh, “Iterative algorithm for the design of diffractive phase elements for laser beam shaping,” Opt. Lett. 27(16), 1463–1465 (2002). [CrossRef]  

7. P. Zhou, Y. Li, S. Liu, and Y. Su, “Dynamic compensatory Gerchberg-Saxton algorithm for multiple-plane reconstruction in holographic displays,” Opt. Express 27(6), 8958–8967 (2019). [CrossRef]  

8. P. W. M. Tsang and T. C. Poon, “Novel method for converting digital Fresnel hologram to phase-only hologram based on bidirectional error diffusion,” Opt. Express 21(20), 23680–23686 (2013). [CrossRef]  

9. J. Zhang, N. Pégard, J. Zhong, H. Adesnik, and L. Waller, “3D computer-generated holography by non-convex optimization,” Optica 4(10), 1306–1313 (2017). [CrossRef]  

10. P. Chakravarthula, Y. Peng, J. Kollin, H. Fuchs, and F. Heide, “Wirtinger holography for near-eye displays,” ACM Trans. Graph. 38(6), 1–13 (2019). [CrossRef]  

11. X. Sui, Z. He, G. Jin, D. Chu, and L. Cao, “Band-limited double-phase method for enhancing image sharpness in complex modulated computer-generated holograms,” Opt. Express 29(2), 2597–2612 (2021). [CrossRef]  

12. P. W. M. Tsang, Y. T. Chow, and T. C. Poon, “Generation of edge-preserved noise-added phase-only hologram,” Chin. Opt. Lett. 14(10), 100901 (2016). [CrossRef]  

13. L. Salmela, N. Tsipinakis, A. Foi, C. Billet, J. M. Dudley, and G. Genty, “Predicting ultrafast nonlinear dynamics in fibre optics with a recurrent neural network,” Nat. Mach. Intell. 3(4), 344–354 (2021). [CrossRef]  

14. R. Horisaki, R. Takagi, and J. Tanida, “Deep-learning-generated holography,” Appl. Opt. 57(14), 3859–3863 (2018). [CrossRef]  

15. M. H. Eybposh, N. W. Caira, M. Atisa, P. Chakravarthula, and N. C. Pégard, “DeepCGH: 3D computer-generated holography using deep learning,” Opt. Express 28(18), 26636–26650 (2020). [CrossRef]  

16. J. Wu, K. Liu, X. Sui, and L. Cao, “High-speed computer-generated holography using an autoencoder-based deep neural network,” Opt. Lett. 46(12), 2908–2911 (2021). [CrossRef]  

17. A. Khan, Z. Zhijiang, Y. Yu, M. A. Khan, K. Yan, and K. Aziz, “GAN-Holo: Generative adversarial networks-based generated holography using deep learning,” Complexity 2021, 1–7 (2021). [CrossRef]  

18. J. Lee, J. Jeong, J. Cho, D. Yoo, B. Lee, and B. Lee, “Deep neural network for multi-depth hologram generation and its training strategy,” Opt. Express 28(18), 27137–27154 (2020). [CrossRef]  

19. L. Shi, B. Li, C. Kim, P. Kellnhofer, and W. Matusik, “Towards real-time photorealistic 3D holography with deep neural networks,” Nature 591(7849), 234–239 (2021). [CrossRef]  

20. Y. Peng, S. Choi, N. Padmanaban, and G. Wetzstein, “Neural holography with camera-in-the-loop training,” ACM Trans. Graph. 39(6), 1–14 (2020). [CrossRef]  

21. G. Liu and J. Guo, “Bidirectional LSTM with attention mechanism and convolutional layer for text classification,” Neurocomputing 337, 325–338 (2019). [CrossRef]  

22. Y. Hong, H. Pan, W. Sun, and Y. Jia, “Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes,” https://arxiv.org/abs/2101.06085.

23. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proceedings of International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), pp. 234–241.

24. F. Yu, V. Koltun, and T. Funkhouser, “Dilated residual networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (IEEE, 2017), pp. 472–480.

25. P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, and G. Cottrell, “Understanding convolution for semantic segmentation,” in Proceedings of 2018 IEEE Winter Conference on Applications of Computer Vision, (2018), pp. 1451–1460.

26. J. Gauthier, “Conditional generative adversarial nets for convolutional face generation,” Class Project for Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter semester, 2014(5), 2.

27. W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (IEEE, 2016), pp. 1874–1883.

28. E. Agustsson and R. Timofte, “Ntire 2017 challenge on single image super-resolution: Dataset and study,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (IEEE, 2017), pp. 126–135.
