
Vision transformer empowered physics-driven deep learning for omnidirectional three-dimensional holography

Open Access

Abstract

The inter-plane crosstalk and limited axial resolution are two key factors that hinder the performance of three-dimensional (3D) holograms. State-of-the-art methods rely on increasing the orthogonality of the cross-sections of a 3D object at different depths to lower the impact of inter-plane crosstalk. Such strategies either produce unidirectional 3D holograms or induce speckle noise. Recently, learning-based methods have provided a new way to solve this problem. However, most related works rely on convolutional neural networks, and the reconstructed 3D holograms have limited axial resolution and display quality. In this work, we propose a vision transformer (ViT) empowered physics-driven deep neural network that can realize the generation of omnidirectional 3D holograms. Owing to the global attention mechanism of ViT, our 3D CGH has small inter-plane crosstalk and high axial resolution. We believe our work not only promotes high-quality 3D holographic display, but also opens a new avenue for complex inverse design in photonics.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Holography is one of the most promising approaches to realize three-dimensional (3D) display because its reconstructed image contains all the cues that the human visual system needs to perceive the 3D world [1–4]. By calculating the holographic field of a target 3D scene and storing the computed field information in a holographic medium, one can reconstruct the 3D object at the pre-designed position upon appropriate light illumination. So far, the holographic medium has evolved from traditional photographic emulsions and polymers [5] to digitally reconfigurable spatial light modulators (SLMs) [6] and digital micromirror devices (DMDs) [7], and, in recent decades, the rapidly developing ultra-thin subwavelength metasurfaces [8–11]. The development of holographic media brings more possibilities to the holographic world, such as more flexible reconfigurability, higher resolution, larger field of view (FOV), and broader working bandwidth. On the other hand, the calculation of the holographic field that reconstructs the far-field 3D image plays a vital role in the final 3D display performance.

Different from two-dimensional (2D) holograms and multiplane holograms, a 3D hologram needs to convey continuous depth information to form a realistic 3D scene. Therefore, how to overcome the crosstalk between tightly positioned 2D layers becomes a critical issue in the calculation of the 3D holographic field. So far, much effort has been devoted to dealing with this problem. A majority of currently reported 3D holograms apply 3D objects composed of orthogonal 2D layers (e.g., RGB-D images [1,8,12,13], orthogonally sliced 2D layers or patterns [14,15], and so on) as initial targets to reduce the difficulty of eliminating the inter-plane crosstalk. However, such holograms are unidirectional 3D holograms, as shown in the upper section of Fig. 1. To obtain a 4π solid-angle view of the 3D object, one must regenerate the 3D patterns from other perspectives and re-calculate the 3D holographic field at each perspective, which is very time-consuming and computationally inefficient. Recently, some reported works apply 3D objects composed of quasi-orthogonal 2D layers [16] as initial targets to decrease inter-plane crosstalk. According to the central limit theorem and the law of large numbers, by adding random noise [16] or engineered noise [17] to the original 2D sliced images, the correlation between the sliced 2D layers approaches zero when the number of pixels of the hologram is sufficiently large. In this case, the engineered sliced 2D layers constituting the 3D object become quasi-orthogonal, and the crosstalk between different 2D layers is thus suppressed. However, this intrinsically introduces speckle noise to the 3D hologram, sacrificing the signal-to-noise ratio of the 3D display. Moreover, this method has a stringent requirement of large pixel numbers to form quasi-orthogonal random vectors (RV). An extension of this method that exploits a scattering medium instead of pseudo-random phase masks alone to improve the axial resolution of the 3D hologram has also been reported recently [18]. In parallel to these efforts, iterative algorithms such as the global Gerchberg–Saxton (GS) algorithm [14], the iterative Fourier transform algorithm (IFTA) [17,19], non-convex optimization [20], and gradient-descent optimization [21,22] have also been investigated to solve this ill-posed inverse design problem. To lower the impact of inter-plane crosstalk in the final reconstructed 3D display, the distances between adjacent layers are usually intentionally lengthened, distorting the reconstructed 3D object away from its original scale. Therefore, how to realize a low-crosstalk omnidirectional 3D hologram while maintaining both high axial resolution and high signal-to-noise ratio in 3D display remains elusive.


Fig. 1. Comparison between a unidirectional 3D hologram (the upper section) and an omnidirectional 3D hologram (the lower section). A unidirectional 3D hologram can only present the complete view of a 3D object in a pre-designed direction (front view in this case), while in other views the reconstructed 3D object appears incomplete. An omnidirectional 3D hologram can reconstruct a complete 3D object from all views.


Recently, with the flourishing development of computer science, learning-based algorithms have nurtured the development of computer-generated holography (CGH) [1,13,23–28]. So far, most reported deep-learning-based CGHs rely on the convolutional neural network (CNN) architecture [29]. Due to the limited receptive field of CNNs, the reported 3D CGHs have limited axial resolution [23] and display quality [30,31]. Recently, the Vision Transformer (ViT) has attracted researchers' attention as it outperforms the most advanced convolutional networks in many computer vision tasks [32]. In this work, by embedding a ViT into our physics-driven deep learning model, we break the current limitation in 3D hologram generation and realize phase-only omnidirectional 3D hologram generation with small crosstalk and no speckle noise while maintaining high axial resolution. Owing to the global attention mechanism of the ViT, the fitting ability of the network is strongly enhanced, and the ill-posed inverse design problem is elegantly solved. We believe our work not only promotes the development of 3D display, but also provides a promising candidate for inverse design problems in photonics.

2. Method and results

Figure 2 shows the scheme of the proposed ViT empowered physics-driven deep learning neural network (VPDLNN) for omnidirectional 3D hologram generation. An auto-encoder architecture [26] is applied here to directly implant the physical diffraction propagation model into our network. The encoder is composed of a ViT and a linear regression module, while the decoder is implemented as a Fresnel diffraction propagator. To realize the omnidirectional 3D hologram, the target 3D object is first appropriately sampled. The sampled 3D target is then directly sliced into sufficiently dense layers to preserve the detail of the original 3D object and form a 3D matrix. In this way, the 3D matrix contains a set of parallel 2D original images $O_i(x,y)\ (i = 1,2,3,\ldots)$ holding the cross-sectional contents at different depths $z_i\ (i = 1,2,3,\ldots)$ of the original 3D object. The 2D images (i.e., the 3D matrix) are then split into patches, and these patches with position embedding are fed into the ViT module. The ViT module applied here differs slightly from the original ViT [32]. To keep the input and output image shapes the same, we remove the classification head and add an inverse linear projection head to the output of each patch embedding. After the customized ViT module, a series of phase maps $\varphi_i(x,y)\ (i = 1,2,3,\ldots)$ corresponding to the sliced 2D images $O_i(x,y)$ is obtained. It is worth noting that, in the implementation of the ViT, the 3D object can be treated as a 2D image with multiple channels, each channel denoting a distinct depth $z_i$, so the process of Fig. 2 can be accomplished by a single forward propagation. These phase maps then go through a linear regression module and are fused into a new phase map denoted as $\Psi(x,y)$, which is the phase-only hologram for the omnidirectional 3D hologram (see more detail about the vision transformer module and the linear regression module in Supplement 1 Part 1). This new phase distribution $\Psi(x,y)$ serves as the input of the decoder. The decoder then generates a set of diffracted images $T_i(x,y)\ (i = 1,2,3,\ldots)$ at different depths $z_i$ from the phase-only hologram $\Psi(x,y)$ according to Fresnel diffraction propagation. Here, the diffraction propagation operator can be expressed as follows:

$$T_i(x,y) = F^{-1}\{ F\{ \exp[ i\Psi(x,y) ] \}\, F\{ h_i(x,y) \} \},$$
where $F\{\cdot\}$ denotes the Fourier transform operator and $F^{-1}\{\cdot\}$ denotes the inverse Fourier transform operator. Here, $h_i(x,y)$ denotes the impulse response at propagation distance $z_i$ and is given by:
$$h_i(x,y) = \frac{e^{jkz_i}}{j\lambda z_i}\exp\left[ \frac{jk}{2z_i}( x^2 + y^2 ) \right].$$
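In practice, the decoder can be implemented as a fixed, differentiable Fresnel propagation layer so that gradients flow from the reconstructed slices back to the encoder. Below is a minimal sketch of the diffraction operator and impulse response above using PyTorch FFTs; the framework choice and all function names are our own assumptions for illustration, not the authors' released code.

```python
import numpy as np
import torch

def fresnel_impulse_response(nx, ny, pitch, wavelength, z, device="cpu"):
    """Sampled Fresnel impulse response h_i(x, y) for propagation distance z."""
    k = 2 * np.pi / wavelength
    x = (torch.arange(nx, device=device) - nx / 2) * pitch
    y = (torch.arange(ny, device=device) - ny / 2) * pitch
    yy, xx = torch.meshgrid(y, x, indexing="ij")
    quadratic = torch.exp(1j * (k / (2 * z)) * (xx**2 + yy**2))
    prefactor = complex(np.exp(1j * k * z) / (1j * wavelength * z))   # e^{jkz_i} / (j*lambda*z_i)
    return prefactor * quadratic

def fresnel_propagate(psi, h):
    """T_i = F^{-1}{ F{exp[i*Psi]} * F{h_i} }, evaluated as an FFT-based convolution."""
    field = torch.exp(1j * psi)                  # phase-only hologram exp[i*Psi(x, y)]
    T = torch.fft.ifft2(torch.fft.fft2(field) * torch.fft.fft2(h))
    return torch.abs(T)                          # amplitude of the diffracted field at depth z_i
```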

Once the reconstructed parallel 2D images $T_i(x,y)\ (i = 1,2,3,\ldots)$ are obtained, they are compared with the original 2D images $O_i(x,y)\ (i = 1,2,3,\ldots)$ through the loss function. The computed loss then propagates backward to the encoder, where the parameters of both the ViT module and the linear regression module are updated during the training process. In VPDLNN, the negative of the mean correlation coefficient (MCC) is applied as the loss function. The MCC loss function can be written as:

$$Loss(T,O) = (-1)\times \mathrm{mean}\left( \frac{\sum_{i=1}^{n} (T_i-\bar{T})(O_i-\bar{O})}{\sqrt{\sum_{i=1}^{n} (T_i-\bar{T})^2 \sum_{i=1}^{n} (O_i-\bar{O})^2}} \right),$$
where $\bar{T}$ and $\bar{O}$ refer to the average values of the reconstructed parallel 2D images $T_i(x,y)\ (i = 1,2,3,\ldots)$ and the original sliced 2D images $O_i(x,y)\ (i = 1,2,3,\ldots)$, respectively, and $n$ is the total number of sliced 2D images. After several epochs of training, the loss function converges to a relatively steady value, and the finalized $\Psi(x,y)$ is exported as the phase-only hologram for the omnidirectional 3D display. In our model, the ViT module promotes the fitting ability of the network, and with further data fusion through the linear regression module, a low-crosstalk 3D phase-only hologram can be achieved.
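As a reference, the MCC loss and a single training step could look like the sketch below, reading the loss as the per-slice Pearson correlation coefficient averaged over the $n$ slices. The optimizer and all identifiers follow the propagator sketch above and are our assumptions rather than the authors' code.

```python
import torch

def mcc_loss(T, O, eps=1e-8):
    """Negative mean correlation coefficient between reconstructed and original slices.
    T, O: stacks of shape (n, H, W), one slice per depth z_i."""
    T = T.flatten(1)                                 # one row per depth slice
    O = O.flatten(1)
    Tc = T - T.mean(dim=1, keepdim=True)             # T_i - T_bar
    Oc = O - O.mean(dim=1, keepdim=True)             # O_i - O_bar
    cc = (Tc * Oc).sum(dim=1) / (torch.sqrt((Tc**2).sum(dim=1) * (Oc**2).sum(dim=1)) + eps)
    return -cc.mean()                                # (-1) x mean(...)

def train_step(encoder, optimizer, O, hs):
    """One optimization step: only the encoder is trainable; the Fresnel decoder
    (fresnel_propagate) is a fixed, differentiable physical model."""
    optimizer.zero_grad()
    psi = encoder(O.unsqueeze(0))[0, 0]                          # fused phase-only hologram Psi(x, y)
    T = torch.stack([fresnel_propagate(psi, h) for h in hs])     # reconstructed slices T_i
    loss = mcc_loss(T, O)
    loss.backward()                                              # gradients flow through the physics model
    optimizer.step()
    return loss.item()
```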


Fig. 2. Scheme of the ViT empowered physics-driven deep learning neural network (VPDLNN) for omnidirectional 3D hologram generation. A 3D object is first sampled and sliced along the propagation direction to form a 3D matrix. The 3D matrix is then fed into the encoder, which outputs a phase map. The phase goes through the Fresnel diffraction propagation operator and generates a set of 2D images at different depths. The reconstructed image set is compared with the original 2D image set through the loss function. The loss value is propagated backward, and the parameters in the encoder are thus updated.
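For concreteness, the encoder of Fig. 2 could be assembled as sketched below: a patch embedding, a standard transformer encoder providing global attention, an inverse linear projection that maps each token back to patch pixels (replacing the classification head), and a 1×1 convolution acting as the linear regression module that fuses the per-slice phase maps into a single phase $\Psi(x,y)$. The actual architecture is detailed in Supplement 1; every module name and dimension below is an illustrative assumption.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViTPhaseEncoder(nn.Module):
    """Illustrative VPDLNN encoder: stacked slices (as channels) -> phase-only hologram Psi.

    Pipeline sketched from the text, not the authors' released model:
    patch embedding -> transformer encoder (global attention) ->
    inverse linear projection per patch (replaces the classification head) ->
    per-slice phase maps phi_i -> "linear regression" fusion into one phase map.
    """

    def __init__(self, n_slices=32, img_size=1024, patch=16, dim=768, depth=6, heads=8):
        super().__init__()
        self.patch, self.img_size, self.n_slices = patch, img_size, n_slices
        n_patches = (img_size // patch) ** 2
        # Patch embedding: each patch of the stacked slices becomes one token.
        self.embed = nn.Conv2d(n_slices, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))        # learned position embedding
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        # Inverse linear projection: token -> patch pixels for every slice (no class head).
        self.inverse_proj = nn.Linear(dim, n_slices * patch * patch)
        # Linear regression module: fuse the n per-slice phase maps into a single phase.
        self.fuse = nn.Conv2d(n_slices, 1, kernel_size=1)

    def forward(self, x):                                    # x: (B, n_slices, H, W)
        tokens = self.embed(x).flatten(2).transpose(1, 2)    # (B, n_patches, dim)
        tokens = self.transformer(tokens + self.pos)
        patches = self.inverse_proj(tokens).transpose(1, 2)  # (B, n_slices*patch*patch, n_patches)
        # Fold the tokens back into per-slice phase maps phi_i(x, y).
        phi = F.fold(patches, output_size=(self.img_size, self.img_size),
                     kernel_size=self.patch, stride=self.patch)
        psi = self.fuse(phi)                                  # fused hologram, (B, 1, H, W)
        return math.pi * torch.tanh(psi)                      # assumed bounding of the phase to (-pi, pi)
```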


To demonstrate the validity of our method, we choose 3D objects from the point-cloud dataset ModelNet40 as our initial target 3D objects. We set the pixel size of the hologram to 9.2 µm in both numerical simulation and experiment to match the parameters of the SLM, and the pixel number used is $1024 \times 1024$. The SLM used in our measurement is from Meadowlark Optics, with a 9.2 µm pixel pitch and a total pixel count of $1920 \times 1152$. The wavelength of the incident light is 532 nm. All simulations were run on a notebook workstation with an Intel Core i9-11950H CPU (2.60 GHz), an NVIDIA RTX A5000 GPU, and 128 GB of RAM.

Figure 3(a) shows the simulation results of a VPDLNN-generated omnidirectional 3D airplane. The VPDLNN-generated hologram is provided in Supplement 1, Fig. S2(a). The original 3D airplane is cut along the optical path's direction into 32 slices, and the original 32 cross-sections of the airplane are presented in Supplement 1, Fig. S3 as a reference. The projection of the 3D airplane starts at around 16.3 cm after the SLM and spans 20 mm with a uniform depth interval of 0.625 mm. The simulation results clearly show the variation of the cross-sections from the airplane's head to its tail. The first slice shows the cross-section of the airplane's head. As the distance increases, the diameter of the head grows until it reaches the body of the airplane. The nose landing gear is clearly visible under the head of the airplane in slices 3 and 4. The wings begin to emerge from slice 13, and their length generally grows as the distance increases. Toward the tail of the airplane, the turrets appear, and finally the stabilizer and elevator of the airplane emerge. Our simulation results show almost every detail of the airplane with small inter-plane crosstalk. To further demonstrate the true three-dimensionality of our VPDLNN-generated hologram, we reconstructed the 3D airplane using the volshow function in MATLAB. The volshow function offers a volumetric display of the reconstructed 3D scene, so that any perspective of the reconstructed 3D scene can be viewed by rotating the 3D volume. Here, we select different perspectives of the airplane (side view, top view, front view, and back view), as shown in Fig. 3(b). Every view presents a complete view of the airplane with all components clearly observed.
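For reference, these parameters map onto the earlier sketches roughly as follows. This is only a usage illustration under the same assumptions (the helper names come from the sketches above, not from the authors' code), with a random tensor standing in for the sliced airplane.

```python
import torch

pitch = 9.2e-6                 # SLM pixel pitch, 9.2 um
wavelength = 532e-9            # incident wavelength, 532 nm
nx = ny = 1024                 # hologram resolution used in this work
n_slices = 32
z0, dz = 0.163, 0.625e-3       # first slice ~16.3 cm after the SLM, 0.625 mm spacing

depths = [z0 + i * dz for i in range(n_slices)]
hs = [fresnel_impulse_response(nx, ny, pitch, wavelength, z) for z in depths]

encoder = ViTPhaseEncoder(n_slices=n_slices, img_size=nx)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)   # assumed optimizer and learning rate

O = torch.rand(n_slices, ny, nx)        # placeholder for the sliced 3D target (airplane)
for _ in range(200):                    # train until the loss settles
    loss = train_step(encoder, optimizer, O, hs)
```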


Fig. 3. Generation of the omnidirectional 3D airplane. (a) The simulated 32 cross-sections of the reconstructed 3D omnidirectional airplane. Scale bar, 1.57 mm. (b) The simulated reconstructed 3D omnidirectional airplane under different perspectives.


To validate our simulation, experiments were carried out. The measured cross-sections of the 3D airplane at different depths are provided in Visualization 1. Figure 4(a) shows the measured cross-sections at different depths, which are frames captured from Visualization 1. It can be seen that our measured results agree well with our simulations. The important components of the airplane, such as the nose landing gear, wings, turrets, and stabilizer, can all be clearly observed in the measured results. The 4-f optical system depicted in Fig. 4(b) is applied in our measurement. The camera is set on an optical rail so that it can be moved along the light's propagation direction to obtain the reconstructed images at different depths.


Fig. 4. Optical characterization of VPDLNN generated omnidirectional 3D airplane. (a) Measured cross-sections of the reconstructed omnidirectional 3D airplane at different depths. (b) The scheme of the optical characterization system. The camera is set on an optical rail so that it can be moved along the light’s propagation direction to capture the reconstructed images at different depths.


To further demonstrate the axial resolution of our proposed VPDLNN 3D hologram, we choose to reconstruct a 3D chair with a total depth of only 5 mm. Again, the original 3D chair is cut into 32 slices from the chair's front to back, and the original 32 slices are provided in Supplement 1, Fig. S4 as a reference. The projection of the 3D chair starts at around 16.3 cm after the SLM with a uniform depth interval of around 0.1563 mm. To prove the true three-dimensionality of our generated hologram, we directly reconstructed the 3D chair from the front view, back view, top view, and side view, respectively, as shown in Fig. 5(a). According to the 3D simulation results, under each perspective, the chair's details, including its front and back legs, arms, cushion, and the pattern on the back of the chair, are all well reconstructed with small crosstalk. The simulated 32 cross-sections of the 3D chair are provided in Supplement 1, Fig. S5. To demonstrate the validity of our method, we measured the reconstructed 3D chair with the 4-f system depicted in Fig. 4(b). The full captures of the cross-sections of the 3D chair at different depths are provided in Visualization 2. Figure 5(b) shows the measured 3D chair from the front view, back view, top view, and side view, respectively. Due to the limited viewing angle of the SLM, the data of the measured 3D chair is obtained by stitching together the captured slices at different depths from Visualization 2 (via post-processing in 3D Slicer). The images from different perspectives are obtained by rotating the reconstructed 3D image to different perspectives in 3D Slicer. Our measured results are in excellent agreement with the simulated ones, further confirming the validity of our VPDLNN.


Fig. 5. Generation of an omnidirectional 3D chair. (a) Simulation results of the reconstructed omnidirectional 3D chair from different perspectives. (b) Measured 3D chair under different perspectives. The 3D image data is obtained through stitching the captured slices at different depths from Visualization 2 together via post-processing in 3D Slicer. The images from different perspectives are obtained through rotating the reconstructed 3D image in 3D Slicer to different perspectives.


3. Discussion

Finally, to further demonstrate the superiority of our proposed VPDLNN-enabled 3D CGH, we compared the overall performance of our reconstructed 3D holograms with 3D holograms reconstructed by the global GS (GG) algorithm and by IFTA based on RV (see exemplified simulation results using these two methods in Supplement 1 Part 7) from the following four aspects: correlation coefficient (CC), root mean square error (RMSE), structural similarity (SSIM), and peak signal-to-noise ratio (PSNR). For each metric, we calculated the average value and the standard deviation over all reconstructed 2D layers to evaluate the comprehensive quality of the 3D volumetric display. The results are shown in Fig. 6. In this comparison, we used VPDLNN, GG (300 iterations), and IFTA (300 iterations) based on RV (indicated as RV in Fig. 6) to generate the phase-only holograms of the 3D airplane and the 3D chair, respectively. Figures 6(a) and 6(b) show the performance comparison of the reconstructed 3D airplane using the three methods, and Figs. 6(c) and 6(d) show the performance comparison of the reconstructed 3D chair. To guarantee the validity of the data, all data provided in the diagrams are average values from 100 simulations.
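For reference, the layer-wise statistics behind Fig. 6 could be computed along the following lines. This sketch uses scikit-image for SSIM and PSNR; the data range and all identifiers are our assumptions, since the authors' evaluation script is not provided.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def layerwise_metrics(T, O, data_range=1.0):
    """Per-layer CC, RMSE, SSIM and PSNR between reconstructed (T) and target (O)
    slice stacks of shape (n, H, W); returns the mean and standard deviation of each."""
    cc, rmse, ssim, psnr = [], [], [], []
    for t, o in zip(T, O):
        cc.append(np.corrcoef(t.ravel(), o.ravel())[0, 1])
        rmse.append(np.sqrt(np.mean((t - o) ** 2)))
        ssim.append(structural_similarity(o, t, data_range=data_range))
        psnr.append(peak_signal_noise_ratio(o, t, data_range=data_range))
    return {name: (float(np.mean(v)), float(np.std(v)))
            for name, v in (("CC", cc), ("RMSE", rmse), ("SSIM", ssim), ("PSNR", psnr))}
```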


Fig. 6. Performance comparison between VPDLNN, RV, and Global GS. (a) The mean CC values (0.4578, 0.2802 and 0.2485 respectively) and mean RMSE values (0.0207, 0.0224 and 0.0513 respectively) are evaluated for the 1024 × 1024 resolution 3D airplane using VPDLNN, RV + IFTA, and Global GS respectively. (b) The mean SSIM values (0.9305, 0.4735, and 0.2629 respectively) and mean PSNR values (33.8477, 33.3908 and 25.891 dB respectively) are evaluated for the 1024 × 1024 resolution 3D airplane using VPDLNN, RV + IFTA, and Global GS respectively. (c) The mean CC values (0.4806, 0.3691 and 0.2076 respectively) and mean RMSE values (0.0139, 0.016 and 0.0294 respectively) are evaluated for the 1024 × 1024 resolution 3D chair using VPDLNN, RV + IFTA, and Global GS respectively. (d) The mean SSIM values (0.9389, 0.7964, and 0.411 respectively) and mean PSNR values (37.4681, 36.6868 and 30.9591 dB respectively) are evaluated for the 1024 × 1024 resolution 3D chair using VPDLNN, RV + IFTA, and Global GS respectively.


According to Fig. 6, it is obvious that our proposed VPDLNN shows better performance than the other two methods in all four aspects. The reconstruction of the 3D airplane from VPDLNN has a much higher mean CC and a lower mean RMSE than the other two. The mean SSIM of the 3D airplane generated by VPDLNN is 0.9305, which is almost twice that generated by RV and more than three times that generated by GG. Meanwhile, our mean PSNR is also the highest. For the reconstruction of the 3D chair, which spans only 5 mm and poses a bigger challenge to the axial resolution of the 3D hologram, the performance of VPDLNN is also superior to the other two methods in all four aspects. It should be noted that the 3D images reconstructed from VPDLNN not only have higher CC, SSIM, and PSNR and lower RMSE than the other two methods, but also have lower standard deviations of the above metrics in most cases. This indicates that the VPDLNN-reconstructed 3D hologram has a steadier performance across different layers. Therefore, we can conclude that our proposed VPDLNN exceeds the other two methods in all four aspects.

4. Conclusion

In this work, we propose a ViT empowered physics-driven deep learning neural network to realize omnidirectional 3D CGH with small crosstalk and high axial resolution. The strong fitting ability of our network enables the generation of phase-only 3D holograms with excellent performance compared with previous methods. Due to the limited viewing angle of our SLM, the current measured 3D reconstruction results are obtained through post-processing of the measured data as explained above. By combining our method with subwavelength metasurfaces or other wide-viewing-angle systems, complete 3D volumetric display from any perspective within the viewing-angle range can be realized. Our current demonstration uses a commercial phase-only SLM; with minor modifications of the network, our method may also be applied to amplitude-controlled DMDs or other platforms. We believe our work not only promotes high-quality omnidirectional 3D display, but also provides a promising candidate for solving complex inverse design problems in both 2D and 3D light manipulation. By embedding appropriate physical models into the network, our proposed model may solve other complex inverse problems in the future.

Funding

National Natural Science Foundation of China (12004362, 12204446, 12274386, 12304434).

Acknowledgements

The authors would like to thank High-Flyer AI (Hangzhou High-Flyer AI Fundamental Research Co., Ltd.) for providing the AI training platform supporting this work.

Disclosures

The authors declare no conflicts of interest.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. L. Shi, Beichen Li, Changil Kim, et al., “Towards real-time photorealistic 3D holography with deep neural networks,” Nature 591(7849), 234–239 (2021). [CrossRef]  

2. P.-A. Blanche, “Holography, and the future of 3D display,” Light: Advanced Manufacturing 2(4), 1 (2021). [CrossRef]  

3. A. H. Dorrah, Priyanuj Bordoloi, Vinicius S. de Angelis, et al., “Light sheets for continuous-depth holography and three-dimensional volumetric displays,” Nat. Photonics 17(5), 427–434 (2023). [CrossRef]  

4. Y. Zhao, Liangcai Cao, Hao Zhang, et al., “Accurate calculation of computer-generated holograms using angular-spectrum layer-oriented method,” Opt. Express 23(20), 25440–25449 (2015). [CrossRef]  

5. P. A. Blanche, A. Bablumian, R. Voorakaranam, et al., “Holographic three-dimensional telepresence using large-area photorefractive polymer,” Nature 468(7320), 80–83 (2010). [CrossRef]  

6. S. Fukushima, Takashi Kurokawa, Masayoshi Ohno, et al., “Real-time hologram construction and reconstruction using a high-resolution spatial light modulator,” Appl. Phys. Lett. 58(8), 787–789 (1991). [CrossRef]  

7. A. Drémeau, Antoine Liutkus, David Martina, et al., “Reference-less measurement of the transmission matrix of a highly scattering material using a DMD and phase retrieval techniques,” Opt. Express 23(9), 11898–11911 (2015). [CrossRef]  

8. L. Huang, Xianzhong Chen, Holger Mühlenbernd, et al., “Three-dimensional optical holography using a plasmonic metasurface,” Nat. Commun. 4(1), 2808 (2013). [CrossRef]  

9. Z. Jin, Shengtao Mei, Shuqing Chen, et al., “Complex inverse design of meta-optics by segmented hierarchical evolutionary algorithm,” ACS Nano 13(1), 821–829 (2019). [CrossRef]  

10. X. Li, Lianwei Chen, Yang Li, et al., “Multicolor 3D meta-holography by broadband plasmonic modulation,” Sci. Adv. 2(11), e1601102 (2016). [CrossRef]  

11. H. Ren, Xinyuan Fang, Jaehyuck Jang, et al., “Complex-amplitude metasurface-based orbital angular momentum holography in momentum space,” Nat. Nanotechnol. 15(11), 948–955 (2020). [CrossRef]  

12. X. Shui, Huadong Zheng, Xinxing Xia, et al., “Diffraction model-informed neural network for unsupervised layer-based computer-generated holography,” Opt. Express 30(25), 44814–44826 (2022). [CrossRef]  

13. D. Yang, Wontaek Seo, Hyeonseung Yu, et al., “Diffraction-engineered holography: Beyond the depth representation limit of holographic displays,” Nat. Commun. 13(1), 6012 (2022). [CrossRef]  

14. A. Velez-Zea, John Fredy Barrera-Ramírez, Roberto Torroba, et al., “Improved phase hologram generation of multiple 3D objects,” Appl. Opt. 61(11), 3230–3239 (2022). [CrossRef]  

15. H. Gao, Yuxi Wang, Xuhao Fan, et al., “Dynamic 3D meta-holography in visible range with large frame number and high frame rate,” Sci. Adv. 6(28), eaba8595 (2020). [CrossRef]  

16. G. Makey, Özgün Yavuz, Denizhan K. Kesim, et al., “Breaking crosstalk limits to dynamic holography using orthogonality of high-dimensional random vectors,” Nat. Photonics 13(4), 251–256 (2019). [CrossRef]  

17. B. Xiong, Yu Liu, Yihao Xu, et al., “Breaking the limitation of polarization multiplexing in optical metasurfaces with engineered noise,” Science 379(6629), 294–299 (2023). [CrossRef]  

18. P. Yu, Yifan Liu, Ziqiang Wang, et al., “Ultrahigh-density 3D holographic projection by scattering-assisted dynamic holography,” Optica 10(4), 481–490 (2023). [CrossRef]  

19. D. Pi, Juan Liu, Jie Wang, et al., “Optimized computer-generated hologram for enhancing depth cue based on complex amplitude modulation,” Opt. Lett. 47(24), 6377–6380 (2022). [CrossRef]  

20. J. Zhang, Nicolas Pégard, Jingshan Zhong, et al., “3D computer-generated holography by non-convex optimization,” Optica 4(10), 1306–1313 (2017). [CrossRef]  

21. S. So, Joohoon Kim, Trevon Badloe, et al., “Multicolor and 3D Holography Generated by Inverse-Designed Single-Cell Metasurfaces,” Adv. Mater. 35, 2208520 (2023). [CrossRef]  

22. D. Pi, Juan Liu, Yongtian Wang, et al., “Review of computer-generated hologram algorithms for color dynamic holographic three-dimensional display,” Light: Sci. Appl. 11(1), 231 (2022). [CrossRef]  

23. L. Shi, Beichen Li, Wojciech Matusik, et al., “End-to-end learning of 3d phase-only holograms for holographic display,” Light: Sci. Appl. 11(1), 247 (2022). [CrossRef]  

24. R. Horisaki, Ryosuke Takagi, Jun Tanida, et al., “Deep-learning-generated holography,” Appl. Opt. 57(14), 3859–3863 (2018). [CrossRef]  

25. T. Zeng, Yanmin Zhu, Edmund Y. Lam, et al., “Deep learning for digital holography: a review,” Opt. Express 29(24), 40572–40593 (2021). [CrossRef]  

26. J. Wu, Kexuan Liu, Xiaomeng Sui, et al., “High-speed computer-generated holography using an autoencoder-based deep neural network,” Opt. Lett. 46(12), 2908–2911 (2021). [CrossRef]  

27. R. Zhu, Jiafu Wang, Xinmin Fu, et al., “Deep-Learning-Empowered Holographic Metasurface with Simultaneously Customized Phase and Amplitude,” ACS Appl. Mater. Interfaces 14(42), 48303–48310 (2022). [CrossRef]  

28. W. Meng, Baoli Li, Haitao Luan, et al., “Orbital Angular Momentum Neural Communications for 1-to-40 Multicasting with 16-Ary Shift Keying,” ACS Photonics 10(8), 2799–2807 (2023). [CrossRef]  

29. P. Chakravarthula, Ethan Tseng, Tarun Srivastava, et al., “Learned hardware-in-the-loop phase retrieval for holographic near-eye displays,” ACM Trans. Graph. 39(6), 1–18 (2020). [CrossRef]  

30. M. H. Eybposh, “DeepCGH: 3D computer-generated holography using deep learning,” Opt. Express 28(18), 26636–26650 (2020). [CrossRef]  

31. T. Yu, Shijie Zhang, Wei Chen, et al., “Phase dual-resolution networks for a computer-generated hologram,” Opt. Express 30(2), 2378–2389 (2022). [CrossRef]  

32. A. Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, et al., “An image is worth 16 × 16 words: Transformers for image recognition at scale,” arXiv, arXiv:2010.11929 (2020). [CrossRef]  

Supplementary Material (3)

Supplement 1: Supporting content
Visualization 1: Measured cross-sections of the three-dimensional airplane at different depths
Visualization 2: Measured cross-sections of the three-dimensional chair at different depths

