Terahertz deep learning fusion computed tomography

Yi-Chun Hung; Yi-Chun Hung; Weng-Tai Su; Ta-Hsuan Chao; Chia-Wen Lin; Shang-Hua Yang; Shang-Hua Yang; Shang-Hua Yang

doi:10.1364/OE.518997

1. Introduction

Terahertz (THz) imaging has been gaining significant attention recently due to its diverse and unique applications, ranging from material exploration, biomedical imaging to cultural heritage inspection [1–3]. The remarkable progress in THz devices and systems over the past decades has led to considerable enhancements in both speed and functionalities of commercially available THz imaging systems, rendering them well-suited for on-site inspection applications in sectors such as semiconductor, automotive, and pharmaceutical industries [4–6]. By adopting imaging modalities from neighboring electromagnetic wave bands, a great variety of THz imaging systems has emerged, encompassing near-field, time-of-flight, phase, synthetic, holographic, compressed sensing, and hybrid THz imaging modalities [7–12]. These advancements have significantly expanded the scope of visualizing internal information of objects, allowing for comprehensive insights into their temporal, spatial, material, and ultrafast dynamic properties.

Among all THz imaging modalities, THz time-domain spectroscopy (THz-TDS) system has been widely used mainly due to the unique capability of extracting multifunctional 3D object information into its time-resolved THz electric field signals [8,13–15]. Many works focused on the temporal profile changes – attenuation, distortion, time-of-flight – of the time-domain THz signals [8,16–18] for non-destructive image reconstruction. When the targeted spatial resolution comes to the sub-wavelength scale, the diffractive and scattering phenomena inevitably spread THz signals to nearby image voxels, resulting in severe image distortions and blurs. Recently, many model-based THz computational imaging methods with the combination of spatio-spectro-temporal THz datasets, physical models (e.g., Fresnel diffraction [10], Drude model [19]), and signal processing methods (e.g., deconvolution [20], denoising [21]) have demonstrated well-restored images with sub-wavelength-scale spatial resolutions. However, the application scope of model-based methods is constrained by the required prior knowledge of tested objects and the complexity of multi-physics models. Furthermore, prerequisite information (such as material properties and object placement) cannot be retrieved in many real-world scenarios, thereby limiting their practical feasibility severely. Other than the model-driven approach, post-processing THz images through data-driven approaches has recently been developed to address corrupted THz imaging issues [22–24]. By incorporating THz point-spread-function (PSF) physical priors with data-driven models for the data augmentation, 2D THz images with super-resolution features have been further achieved [25]. Hung et al. have established the THz deep learning computed tomography (THz DL-CT) framework demonstrating with mm-scale resolution 3D reconstructed images without prerequisite object information [26]. However, the inefficient use of the THz multi-dimensional dataset and without considering the spectral properties of the THz-TDS imaging system significantly undermined the capability of the THz DL-CT framework.

To address this issue, we propose a THz multi-dimensional tomographic framework, which can preprocess the measured THz signals into the THz multi-dimensional data and fuse the THz multi-dimensional data for the following tomographic reconstruction. Furthermore, in the demonstration of the proposed framework, a multi-scale spatio-spectral Unet (MS3-Unet) is presented for fusing the THz multi-dimensional data based on the mutual interaction properties among the 3D objects under observation, broadband THz radiation, and the ambient environment. The MS3-Unet specifically provides a fusion feature to incorporate THz signals in space, time, and frequency domains, extracting detailed object profile changes from the aspects of temporal-spatial energy loss, spatial-spectral amplitude difference, and spatial-spectral beam profile. In contrast to model-based THz tomographic imaging systems, the proposed THz multi-dimensional tomographic framework provides an easy-to-use platform for processing the THz multi-dimensional data from a great variety of THz time-domain and frequency-domain systems. The capability of this framework – including material property extraction, chemical profiling, and ultrafast phenomenon visualization – can also be achieved by incorporating existing physics-informed models. To this extent, this scalable and flexible fusion THz CT approach is well-suitable for non-contact functional 3D imaging in many real-world scenarios.

2. Experimental setup and data preparation

The proposed THz multi-dimensional tomographic framework aims to fuse the multifaceted information in the THz signal without directly processing the large dimension of the input signal (approximately $10^7$). To this end, the THz multi-dimensional tomographic framework comprises four modules for different functions: data measurement, data transform, data fusion, and tomographic reconstruction. More specifically, as shown in Fig. 1, the THz multi-dimensional tomographic framework is composed of (i) one or multiple THz imaging systems designed to extract THz multi-dimensional data from the object under measurement, (ii) a processing unit with the capability to transform the acquired THz multi-dimensional signals into THz multi-dimensional images, (iii) a THz multi-dimensional fusion model, enabling the effective integration of diverse information present in the THz multi-dimensional images, (iv) THz tomographic reconstruction for transforming the restored images from different views to the 3D reconstruction.

Fig. 1. Illustration of the THz multi-dimensional tomographic framework, which is composed by 4 modules – THz imaging systems measuring multi-dimensional signal, preprocessing of THz multi-dimensional signal into images, THz multi-dimensional fusion model, and tomographic reconstruction.

Download Full Size | PDF

To evaluate the THz multi-dimensional tomographic framework, we adopt the THz time-resolved dataset proposed in [26] as shown in Fig. 2. Note that the data of $\texttt{Insidehole} $ object is excluded since the low spatial variation of its cross sections can introduce bias in the dataset and further cause inaccurate evaluation. This dataset is retrieved based on a THz-TDS system, which provides time-resolved THz signal. Additionally, test objects are placed on a motorized stage composed of a two dimensional linear stage (horizontal and vertical directions) and a rotational stage. To process the THz dataset more effectively, the fast Fourier transform is employed for the THz time-resolved dataset. It converts the spatio-spatio-temporal-rotational 4D THz data, along with a temporal axis, into spatio-spatio-spectral-rotational 4D THz data. Furthermore, we perform an additional step of extracting the maximum value along the temporal axis from the spatio-spatio-temporal-rotational data, generating the Time-max images. This approach is chosen due to the high signal-to-noise ratio property of Time-max images and their capability to preserve crucial object details (e.g., Deer antler in Fig. 1).

Fig. 2. Illustration of ground truth of projected side views for the seven 3D-printed HIPS objects in our experiments. The gray scale colors represent the thickness of the object.

Download Full Size | PDF

Considering the distinct characteristics along the spectral dimension of the 4D THz data, we propose a multi-scale spatio-spectral fusion network (MS3-Unet) to efficiently fuse the information in different frequency bands. More specifically, the Time-max images $\boldsymbol {I_t}$ and the multi-spectral images $\boldsymbol {I_f}$ selected from 12 frequencies (i.e., 0.38, 0.448, 0.557, 0.621, 0.916, 0.970, 0.988, 1.097, 1.113, 1.163, 1.208, 1.229 THz) are utilized to estimate sinograms $P(\omega, x, z)$ that delivers the representative value for each projection angle $\omega$, horizontal and vertical location $(x, z)$:

(1)$$\begin{aligned} P(\omega, x, z) = f_{\boldsymbol{\theta}}(\boldsymbol{I_t}(\omega), \boldsymbol{I_f}(\omega)), \end{aligned}$$

where $f_{\boldsymbol {\theta }}$ denotes our MS3-Unet with the learning parameters $\boldsymbol {\theta }$. Here, the 12 frequencies are selected to exploit the high material contrast between ambient water vapour [27] and high-impact polystyrene (HIPS) objects, thereby enhancing the visualization capabilities of object contours. At the selected 12 frequencies, phase and amplitude multi-spectral images can provide complementary information. For example, amplitude images excel in object contour details, while phase images offer superior information about the local curvature and thickness of objects. Followed by the model-based inverse Radon transform [28,29], the 3D tomographic images are then reconstructed from the estimated sinograms. Moreover, opting for the model-based inverse Radon transform in the reconstruction process allows for circumventing the need for learning an extensive set of parameters when converting sinograms into tomographic images. The MS3-Unet architecture and the corresponding fusion approaches, including spatial fusion by hierarchical branches, and spatial-spectral fusion by the filter adaptive convolutional (FAC) layer, will be introduced and discussed in Section 3. When training the MS3-Unet, we adopt the data augmentation on THz multi-dimensional data and ground truth images based on the symmetric properties of projections; for example, a projected image from 0 degree equals the horizontal flip of a projected image from 180 degrees. Additionally, the typical data augmentations of random transformations of vertical and horizontal flipping and cropping to $120 \times 120$ patches are applied to those input images. The model initialization method follows the work in [30]. The Adam optimizer with $\beta _1 = 0.9$ and $\beta _2 = 0.999$ are adopted. The learning rate is initiated as $10^{-4}$ and decayed by $0.1$ in every $300$ epochs.

3. THz deep learning fusion framework

3.1 THz data fusion

As the object information pieces are well-distributed in different THz signal domains, extracting and integrating essential information from the THz multi-dimensional data for image reconstruction is vital. To this end, the specialized fusion deep learning architecture, a multi-scale spatio-spectral fusion network (MS3-Unet), is designed. Based on the Unet [31] backbone, MS3-Unet follows the multi-scale encoder-decoder architecture with skip connections that can help incorporate the THz multi-dimensional data, fuse the distinct features and allocate the multi-scale features in different scale branches. With those model properties, MS3-Unet is utilized to fuse the two distinct THz data, the Time-max and multi-spectral data, which correspond to different-oriented object information. the THz Time-max data directly present the energy loss map through the object, bridging the thickness information. THz multi-spectral data distinctively records light-matter interaction behaviors, which reveals the object’s regional details depending on the diffraction-limited beam sizes and SNR levels at each selected frequency band. MS3-Unet takes the THz Time-max image extracted by the maximum amplitude of the time-resolved signals in each pixel as the finest scale input since the THz Time-max image contains object thickness information with the superior SNR. In the following multi-scale branches, the THz multi-spectral images of the selected frequencies are sequentially feed-forwarded as complementary information since the multi-spectral images can offer better contour contrast and regional details of the object. It should be noted that system SNR and chromatic aberration levels vary among the entire THz spectrum, which causes certain issues of multi-spectral image fusion. More specifically, considering the frequency-dependent interaction between the THz beam and object (e.g., beam profiles, diffraction, scattering), the blurring artifacts vary at different spatial locations and frequency bands. If the fusion of the THz multi-spectral images is only conducted by several commonly-used convolutional blocks, the spatial-frequency dependent blurring artifacts cannot be well handled due to the weight-sharing nature of the convolution operation. To address these issues, the THz multi-spectral images are first processed with two $3 \times 3$ convolution layers to reduce the noise caused by THz power fluctuation, such as salt-and-pepper noise. Followed by the filter adaptive convolutional (FAC) layer, the spatially changing kernel can mitigate the space-variant blurring effect and improve the efficacy of fusion between branches. After the information fusion, the commonly-used multi-scale decoding approach with skip connection [32] is adopted for the THz image restoration.

3.2 Network architecture

MS3-Unet, the encoder-decoder architecture, hierarchically utilizes five scale branches to fuse the THz Time-max and multi-spectral images as shown in Fig. 3. The four coarser scale branches consisting of convolution layers and the FAC layer aim to extract the complementary multi-scale spatio-spectral features to fuse with the THz Time-max image. The shallow branches focus on finer scale texture and/or contrast features, and the deep branches provide high-level context features (e.g., $\texttt{Deer} $ body).

Fig. 3. Network architecture of the MS3-Unet demonstrated in the proposed THz multi-dimensional tomographic framework. The THz multi-spectral images from 12 frequencies is grouped into 4 piles from low to high frequency (3 images each pile) and forwarded to input before the FAC layers; the input in the middle row of the MS3-Unet illustration is composed of the Time-max image.

Download Full Size | PDF

In the encoder, by hierarchically merging the finer scale features into the coarser scale features (i.e., blue arrows), the extracted information can be efficiently shared between the two branches [31]. However, the pixel-level characteristics of the shared features and information from the upper scale branch can be heterogeneous to the deeper scale branch, thus leading to inferior learning efficacy in the deeper branches. To address this limitation, two convolution layers and the FAC (orange blocks) are implemented before the feature merging to mitigate the heterogeneity between the multi-scaled features. The two convolution layers are designed as simple filters to reduce the additive noise from the THz detector. The following FAC is designed to extract the distinct features within different spatial regions using the spatially changing learnable kernels. Worth noting that FAC can also mitigate different levels of the blurring effect originating from the diffractive/scattering-related interaction between the frequency-dependent THz beam and the tested objects.

In the decoder, the coarser features are transformed by two $3 \times 3$ convolutional layers. Followed by the $1 \times 1$ convolution layer (red arrows), the transformed features are further decoded into the finer scale features. Additionally, to reduce the information loss during the coarse-to-fine feature transformation, the multi-scale features are concatenated to the decoded features through skip connections (gray arrows). By following the rule of decode-fuse-decode, MS3-Unet can well utilize the learned coarse/fine features for the image restoration task.

3.3 Spatio-spectral fusion by FAC layer

The filter adaptive convolutional (FAC) layer is designed to address feature fusion and spatially-variant blurring issues: (i) the pixel-level heterogeneity between the finer features from the upper branch and the coarser features from the deeper branch. (ii) variant blurring effects caused by the non-collimated, diffractive THz Gaussian beam and the rotational scanning approach in THz CT. Both issues are critical for the THz image restoration task since inefficient feature fusion and learning can lead to inferior restoration performance. By applying the spatially changing learnable kernels instead of the space-invariant kernel (e.g., the commonly-used convolutional layer), the FAC layer uses corresponding kernel sizes to extract the features reducing the unfavorable impacts from the heterogeneity and the blurring issues. More specifically, the learnable filter map $G$ ($H \times W \times ck^2$) is maintained, and each pixel ($1 \times 1 \times ck^2)$ represents the reshaped kernel for one spatial location in down-sampling feature maps $F$. By this approach, we can formulate the extracted adaptive feature maps $\hat {F}$ as

(2)$$\begin{aligned} \hat{G}(x, y, c_i) &= G_{(x, y, c_i)} \ast F_{(x, y, c_i)}\\ &=\sum_{n={-}r}^{r}\sum_{m={-}r}^{r}G(x, y, c_i k^2+kn+m) \times F(x-n, y-m,c_i), \end{aligned}$$

where $r=(k-1)/2$, $k$ is the pre-configured hyperparameter of the kernel size, and $\ast$ denotes the convolution operator.

4. Experiments and analysis

4.1 Evaluation of data fusion

To evaluate the efficacy of the fusion approach in MS3-Unet, several deep learning models with similar architecture but different fusion approaches are selected: (1) baseline Unet (Base-Unet); (2) MS3-Unet$_4$; and (3) MS3-Unet$_{12\textit {rand}}$. Base-Unet, using the THz Time-max data without any THz multi-spectral images, represents the approach of not fusing any spatio-spectral information. MS3-Unet$_4$ is designed for taking a single THz spectral image for each scale (i.e., respective 0.38, 0.448, 0.557, 0.621 THz for each branch). MS3-Unet$_4$ can be considered as fusing Time-max data with limited spatio-spectral information. Compared with the frequency-ordering fusion approach in MS3-Unet, MS3-Unet$_{12\textit {rand}}$ stochastically selects (without replacement) 3 bands for each scale branch. Unet$_{12\textit {rand}}$ is designed to evaluate the utility of the frequency-order fusion approach in MS3-Unet. Additionally, all the compared methods and MS3-Unet are trained from scratch with the Adam optimizer and adopt the mean-square-error function as loss (Supplement 1).

For the qualitative and quantitative assessment, Deer and DNA objects are selected due to their different geometric characteristics. DNA object is selected due to the double-helix shape, which can help to evaluate the model performance on the rotational entanglement structure. The square-shaped opening within the double helix can provide the assessment of the models on highly transparent objects. Deer object is utilized to evaluate the model performance of the image restoration on the spatially variant objects. For example, the antler and body of the Deer represent the high and low spatial frequency, respectively. Additionally, to assess the restored THz images, two widely-used metrics, including peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), are used to evaluate the pixel-level and structure-level similarities, and their mathematical formula is defined and discussed in Supplement 1.

As shown in Fig. 4(a), the double helix shape of the DNA object is recognized in all the four variant models, indicating that the deep learning model structure of Unet can well address the entangling structure (e.g., double helix) of the object under test. Compared with MS3-Unet${_4}$ and MS3-Unet$_{12\textit {rand}}$, MS3-Unet shows the superior reconstruction results in the vacant regions between two double helix and delivers a better contrast between the crossing of double helix and a single helix. Furthermore, comparing with MS3-Unet${_4}$, MS3-Unet demonstrates that the measured data from different frequencies can be complementary to each other and help to reconstruct more details; comparing with MS3-Unet$_{12\textit {rand}}$, MS3-Unet shows that the frequency-ordering fusion approach is critical for reconstruction quality. Similarly, in the comparison of the cross sections as shown in Fig. 4(b), MS3-Unet exhibits better-reconstructed image quality on both edge and helical regimes, demonstrating that the frequency-ordering fusion of measured data from different frequencies can also improve the 3D reconstruction. In the comparison of the projected signal as shown in Fig. 4(c), MS3-Unet reconstructs sharper edges of the object, which can be observed in rising and falling edges and the calculated root mean square error (RMSE). Additionally, by comparing the region from 5 $mm$ to 11 $mm$ in space, Base-Unet shows an inferior contrast to the other variant models. This is because the FAC layer adopted in the other models can address the blurring issues and improve the image contrast and resolution. A similar experimental result has been presented in the comparison of the Deer object, as included in Supplemental 1. The quantitative results shown in Table 1 also indicate that MS3-Unet reconstructs superior tomographic images than other variant models in terms of both PSNR and SSIM.

Fig. 4. The quantitative comparison of DNA object reconstruction from Base-Unet, MS3-Unet$_4$, MS3-Unet$_{12\textit {rand}}$ and MS3-Unet models. (a) Projected side view comparison. (b) Cross-section comparison of the red sliced region in (a). (c) Projected signal comparison of the red sliced region in (a). The root mean square error (RMSE) is computed between the ground truth and the indicated model.

Download Full Size | PDF

Table 1. Quantitative comparison (PSNR and SSIM) of THz image restoration performances for Deer and DNA with different settings. $\uparrow$: higher is better; $\downarrow$: lower is better

View Table | View all tables in this article

4.2 Comparison with existing deep learning models

To evaluate the performance of MS3-Unet, we select two representative deep learning image restoration models for comparison, including DnCNN [33] and RED [34]. Two variants of MS3-Unet models are also included (i.e., Base-Unet, and MS-Unet) as the baseline models. Additionally, THz DLCT [26] and THz Dense [35] are also selected due to their specialized model architecture designs for THz imaging applications. DnCNN and RED are designed based on a fully convolutional framework; THz DLCT and THz Dense are designed based on the VGG16 and the dense residual network, respectively. DnCNN utilizes 20 convolutional blocks to learn the noise information as the residual image to restore the corrupted images; RED is proposed based on the convolutional encoder-decoder architecture with a few symmetric skip connections. THz DLCT used multiple layers of $1 \times 3$ convolutional kernels to extract features from THz time-resolved signals. THz Dense used the residual connection to address the training stability. MS-Unet adopts the same model architecture of MS3-Unet, but only incorporates the THz multi-spectral and Time-max images (i.e., 12 + 1 channels) within the finest scale branch; Base-Unet remains the same configuration as described in Sec. 4.1. Here, the DNA and Deer objects are also chosen for the qualitative comparison due to their geometric characteristics.

The quantitative results in Table 2 demonstrate that MS3-Unet delivers the superior THz image restoration efficacy in all seven objects. It implies that the three design features of MS3-Unet can improve the THz restoration efficacy. First, the THz multi-spectral images featured with more accurate object contour can complement the conventional THz time-max images for the THz restoration task. Second, the multi-scale branches of MS3-Unet can more effectively extract and fuse the thickness and contour information in the THz multi-spectral images according to the diffraction limit. Third, the spatio-spectral fusion by the FAC layer can address (i) the blurring effects caused by the non-collimated and diffractive THz beam and (ii) the pixel-level heterogeneity while fusing features from different scales of branches. Adopting these three designs, MS3-Unet outperforms the conventional method of time-max by at least 8.86dB and 0.63 in PSNR and SSIM, respectively. The visualization of tomographic reconstruction is also provided in Supplement 1.

Table 2. Quantitative comparison (PSNR and SSIM) of THz image restoration performances for Deer, DNA, Box, Eevee, Polarbear, Robot, and Skull with different restoration methods. $\uparrow$: higher is better; $\downarrow$: lower is better

View Table | View all tables in this article

In the qualitative comparison, we selected the models with higher SSIM values as shown in Fig. 5. The comparison among MS3-Unet and those models delivering lower SSIM values is shown in Supplement 1. Compared with DnCNN and RED, three Unet-variant models provide clearer contours in both high and low spatial frequency regions (e.g., antler and body) since the multi-scale structure of Unet can more effectively incorporate the features from different scales. Among Unet-variant models, MS-Unet and MS3-Unet demonstrate more accurate spatial resolution, such as the vacant regions in the DNA object. This is because MS-Unet and MS3-Unet can more effectively extract the object thickness information by incorporating the spatio-spectral information (i.e., FAC layers) in THz multi-frequency images. Additionally, compared to MS-Unet, MS3-Unet performs superior image contrast in the high absorption region (i.e., the crossing of the double helix) by reusing the rich thickness information of the THz Time-max image within every scale of the branch.

Fig. 5. Qualitative comparison of THz projected side views for Deer and DNA: (a) Time-max, (b) DnCNN [33], (c) RED [34], (d) Base-Unet [31], (e) MS-Unet, (f) MS3-Unet, and (g) the ground-truth.

Download Full Size | PDF

5. Conclusions

We propose a THz multi-dimensional tomographic framework, which is capable of fusing and collaborating the distinct features from various signal domains, such as the temporal and frequency domains. To evaluate the potential of the proposed framework, a MS3-Unet is presented as a fusion network in the framework. Considering the spatio-spectral correlation in the THz multi-dimensional data, the MS3-Unet implements two fusion modules: (i) the hierarchical fusion of the THz spectral images, and (ii) the spatio-spectral fusion by the FAC layers.

The hierarchical fusion module merges the finer scale features extracted from low-frequency THz spectral images into the coarser scale features. This merging approach can mitigate the learning inefficiency in directly processing the distinct features in different dimensions of THz multi-dimensional data (comparing Base-Unet and MS3-Unet in Table 1). On the other hand, the FAC layers are composed of the spatially changing learnable kernels to mitigate the heterogeneity and the blurring issue caused by the frequency-dependent THz noise level and THz beam profile, respectively. Compared to the other models without any modules specialized in fusing THz signals (Table 2), the MS3-Unet demonstrates the high potential of the proposed framework with the capability of fusing the distinct features from the different THz signal domains. In addition to the potential of fusion, the high scalability and adjustability in the proposed framework enable the users to select the processing signal domains and/or to plug and play their own fusion network. For example, users can implement their transformation of THz raw signals and concatenate the designed THz signal domain into the THz multi-dimensional data for further fusion. Users can also modify the state-of-the-art deep learning network from the image processing field as the backbone model to improve the fusion capability. Additionally, the prior knowledge of THz-material-object interaction can collaborate with learning-based networks [36]. For example, if the absorption coefficients of the test objects are given, the detected THz power in pixels can be converted to depth information and guides the learning-based network to learn more efficiently.

Funding

National Science and Technology Council (NSTC 112-2221-E-007-089-MY3).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. H. Guerboukha, K. Nallappan, and M. Skorobogatiy, “Toward real-time terahertz imaging,” Adv. Opt. Photonics 10(4), 843–938 (2018). [CrossRef]

2. D. M. Mittleman, “Twenty years of terahertz imaging,” Opt. Express 26(8), 9417–9431 (2018). [CrossRef]

3. K. Fukunaga, Thz technology applied to cultural heritage in practice (Springer, 2016).

4. K. Kawase, “Terahertz imaging for drug detection and large-scale integrated circuit inspection,” Opt. Photonics News 15(10), 34–39 (2004). [CrossRef]

5. Y.-C. Shen and P. F. Taday, “Development and application of terahertz pulsed imaging for nondestructive inspection of pharmaceutical tablet,” IEEE J. Sel. Top. Quantum Electron. 14(2), 407–415 (2008). [CrossRef]

6. F. Ellrich, M. Bauer, N. Schreiner, et al., “Terahertz quality inspection for automotive and aviation industries,” J. Infrared, Millimeter, Terahertz Waves 41(4), 470–489 (2020). [CrossRef]

7. K. Serita, S. Mizuno, H. Murakami, et al., “Scanning laser terahertz near-field imaging system,” Opt. Express 20(12), 12959–12965 (2012). [CrossRef]

8. H. Zhong, J. Xu, X. Xie, et al., “Nondestructive defect identification with terahertz time-of-flight tomography,” IEEE Sens. J. 5(2), 203–208 (2005). [CrossRef]

9. N. V. Petrov, M. S. Kulya, A. N. Tsypkin, et al., “Application of terahertz pulse time-domain holography for phase imaging,” IEEE Trans. Terahertz Sci. Technol. 6(3), 464–472 (2016). [CrossRef]

10. M. S. Heimbeck and H. O. Everitt, “Terahertz digital holographic imaging,” Adv. Opt. Photonics 12(1), 1–59 (2020). [CrossRef]

11. K. McClatchey, M. T. Reiten, and R. A. Cheville, “Time resolved synthetic aperture terahertz impulse imaging,” Appl. Phys. Lett. 79(27), 4485–4487 (2001). [CrossRef]

12. R. Stantchev, X. Yu, T. Blu, et al., “Real-time terahertz imaging with a single-pixel detector,” Nat. Commun. 11(1), 2535 (2020). [CrossRef]

13. Y.-C. Shen, P. F. Taday, D. A. Newnham, et al., “3D chemical mapping using terahertz pulsed imaging,” in Terahertz and gigahertz electronics and photonics IV, vol. 5727 (International Society for Optics and Photonics, 2005), pp. 24–31.

14. R. Ulbricht, E. Hendry, J. Shan, et al., “Carrier dynamics in semiconductors studied with time-resolved terahertz spectroscopy,” Rev. Mod. Phys. 83(2), 543–586 (2011). [CrossRef]

15. W. J. Padilla, A. J. Taylor, C. Highstrete, et al., “Dynamical electric and magnetic metamaterial response at terahertz frequencies,” Phys. Rev. Lett. 96(10), 107401 (2006). [CrossRef]

16. L. Duvillaret, F. Garet, and J.-L. Coutaz, “Highly precise determination of optical constants and sample thickness in terahertz time-domain spectroscopy,” Appl. Opt. 38(2), 409–415 (1999). [CrossRef]

17. J. Takayanagi, H. Jinno, S. Ichino, et al., “High-resolution time-of-flight terahertz tomography using a femtosecond fiber laser,” Opt. Express 17(9), 7533–7539 (2009). [CrossRef]

18. B.-Y. Wu and S.-H. Yang, “Sub-millimeter spatial resolution terahertz computed tomography system based on differential pulse delay method,” in 2019 44th International Conference on Infrared, Millimeter, and Terahertz Waves (IRMMW-THz), (IEEE, 2019), pp. 1–2.

19. S. Prabhu, S. Ralph, M. Melloch, et al., “Carrier dynamics of low-temperature-grown gaas observed via thz spectroscopy,” Appl. Phys. Lett. 70(18), 2419–2421 (1997). [CrossRef]

20. G. C. Walker, B. J. W. J. Labaune, J.-B. Jackson, et al., “Terahertz deconvolution,” Opt. Express 20(25), 27230–27241 (2012). [CrossRef]

21. X. Shen, C. R. Dietlein, E. Grossman, et al., “Detection and segmentation of concealed objects in terahertz images,” IEEE Trans. on Image Process. 17(12), 2465–2475 (2008). [CrossRef]

22. M. Ljubenovic, S. Bazrafkan, J. D. Beenhouwer, et al., “Cnn-based deblurring of terahertz images,” (2020), pp. 323–330.

23. B. Dutta, K. Root, I. Ullmann, et al., “Deep learning for terahertz image denoising in nondestructive historical document analysis,” Sci. Rep. 12(1), 22554 (2022). [CrossRef]

24. X. Yang, D. Zhang, Z. Wang, et al., “Super-resolution reconstruction of terahertz images based on a deep-learning network with a residual channel attention mechanism,” Appl. Opt. 61(12), 3363–3370 (2022). [CrossRef]

25. Y. Li, W. Hu, X. Zhang, et al., “Adaptive terahertz image super-resolution with adjustable convolutional neural network,” Opt. Express 28(15), 22200–22217 (2020). [CrossRef]

26. Y.-C. Hung, T.-H. Chao, P. Yu, et al., “Terahertz spatio-temporal deep learning computed tomography,” Opt. Express 30(13), 22523–22537 (2022). [CrossRef]

27. D. M. Slocum, E. J. Slingerland, R. H. Giles, et al., “Atmospheric absorption of terahertz radiation and water vapor continuum effects,” J. Quant. Spectrosc. Radiat. Transf. 127, 49–63 (2013). [CrossRef]

28. A. C. Kak, “Algorithms for reconstruction with nondiffracting sources,” Principles of computerized tomographic imaging pp. 49–112 (2001).

29. B. Recur, A. Younus, S. Salort, et al., “Investigation on reconstruction methods applied to 3d terahertz computed tomography,” Opt. Express 19(6), 5105–5117 (2011). [CrossRef]

30. K. He, X. Zhang, S. Ren, et al., “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., (2015), pp. 1026–1034.

31. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Medical Image Comput. Computer-Assisted Intervention, (2015), pp. 234–241.

32. K. He, X. Zhang, S. Ren, et al., “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2016), pp. 770–778.

33. K. Zhang, W. Zuo, Y. Chen, et al., “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. on Image Process. 26(7), 3142–3155 (2017). [CrossRef]

34. X. Mao, C. Shen, and Y.-B. Yang, “Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections,” in Proc. Adv. Neural Inf. Process. Syst., (2016), p. 2802–2810.

35. Z. Hou, X. Cha, H. An, et al., “Super-resolution reconstruction of terahertz images based on residual generative adversarial network with enhanced attention,” Entropy 25(3), 440 (2023). [CrossRef]

36. S. Cuomo, V. S. Di Cola, F. Giampaolo, et al., “Scientific machine learning through physics–informed neural networks: Where we are and what’s next,” J. Sci. Comput. 92(3), 88 (2022). [CrossRef]

Method	PSNR $↑$		SSIM $↑$
Method	Deer	DNA	Deer	DNA
Base-Unet [31]	19.84	24.15	0.55	0.78
MS3-Unet $_{4}$	21.13	25.53	0.82	0.85
MS3-Unet $_{12 rand}$	21.83	25.76	0.83	0.82
MS3-Unet	22.02	26.32	0.83	0.85

Method	PSNR $↑$
Method	Deer	DNA	Box	Eevee	Polarbear	Robot	Skull
Time-max	12.42	12.07	11.97	11.20	11.21	11.37	10.69
THz DLCT [26]	18.02	17.58	21.60	18.53	15.02	18.85	18.92
THz Dense [35]	19.99	23.01	19.52	18.50	19.59	21.08	19.42
DnCNN [33]	19.94	23.95	19.13	19.69	19.44	19.72	17.33
RED [34]	19.30	24.17	20.18	19.97	19.17	19.76	16.28
Base-Unet [31]	19.84	24.15	19.77	19.95	19.09	18.80	17.49
MS-Unet	21.37	24.72	20.93	20.74	20.32	19.73	19.43
MS3-Unet (Ours)	22.02	26.32	21.06	20.89	20.45	21.16	19.55
Method	SSIM $↑$
Method	Deer	DNA	Box	Eevee	Polarbear	Robot	Skull
Time-max	0.05	0.05	0.14	0.14	0.12	0.08	0.09
THz DLCT [26]	0.24	0.30	0.49	0.41	0.35	0.30	0.36
THz Dense [35]	0.69	0.72	0.73	0.54	0.71	0.65	0.45
DnCNN [33]	0.73	0.77	0.73	0.72	0.63	0.77	0.36
RED [34]	0.81	0.83	0.74	0.77	0.75	0.80	0.74
Base-Unet [31]	0.55	0.78	0.77	0.76	0.56	0.76	0.51
MS-Unet	0.71	0.80	0.79	0.65	0.59	0.72	0.63
MS3-Unet (Ours)	0.83	0.85	0.80	0.77	0.77	0.82	0.75

Method	PSNR $↑$		SSIM $↑$
Method	Deer	DNA	Deer	DNA
Base-Unet [31]	19.84	24.15	0.55	0.78
MS3-Unet $_{4}$	21.13	25.53	0.82	0.85
MS3-Unet $_{12 rand}$	21.83	25.76	0.83	0.82
MS3-Unet	22.02	26.32	0.83	0.85

Method	PSNR $↑$
Method	Deer	DNA	Box	Eevee	Polarbear	Robot	Skull
Time-max	12.42	12.07	11.97	11.20	11.21	11.37	10.69
THz DLCT [26]	18.02	17.58	21.60	18.53	15.02	18.85	18.92
THz Dense [35]	19.99	23.01	19.52	18.50	19.59	21.08	19.42
DnCNN [33]	19.94	23.95	19.13	19.69	19.44	19.72	17.33
RED [34]	19.30	24.17	20.18	19.97	19.17	19.76	16.28
Base-Unet [31]	19.84	24.15	19.77	19.95	19.09	18.80	17.49
MS-Unet	21.37	24.72	20.93	20.74	20.32	19.73	19.43
MS3-Unet (Ours)	22.02	26.32	21.06	20.89	20.45	21.16	19.55
Method	SSIM $↑$
Method	Deer	DNA	Box	Eevee	Polarbear	Robot	Skull
Time-max	0.05	0.05	0.14	0.14	0.12	0.08	0.09
THz DLCT [26]	0.24	0.30	0.49	0.41	0.35	0.30	0.36
THz Dense [35]	0.69	0.72	0.73	0.54	0.71	0.65	0.45
DnCNN [33]	0.73	0.77	0.73	0.72	0.63	0.77	0.36
RED [34]	0.81	0.83	0.74	0.77	0.75	0.80	0.74
Base-Unet [31]	0.55	0.78	0.77	0.76	0.56	0.76	0.51
MS-Unet	0.71	0.80	0.79	0.65	0.59	0.72	0.63
MS3-Unet (Ours)	0.83	0.85	0.80	0.77	0.77	0.82	0.75

Terahertz deep learning fusion computed tomography

Abstract

1. Introduction

2. Experimental setup and data preparation

3. THz deep learning fusion framework

3.1 THz data fusion

3.2 Network architecture

3.3 Spatio-spectral fusion by FAC layer

4. Experiments and analysis

4.1 Evaluation of data fusion

4.2 Comparison with existing deep learning models

5. Conclusions

Funding

Disclosures

Data availability

Supplemental document

References

Supplementary Material (1)

Data availability

Cited By

Figures (5)

Tables (2)

Equations (2)

Optics Express