Comparison of denoising tools for the reconstruction of nonlinear multimodal images

Abstract

Biophotonic multimodal imaging techniques provide deep insights into biological samples such as cells or tissues. However, the measurement time increases dramatically when high-resolution multimodal (MM) images are required. To address this challenge, mathematical methods can be used to shorten the acquisition time for such high-quality images. In this research, we compared standard methods, e.g., the median filter method and the phase retrieval method via the Gerchberg-Saxton algorithm, with artificial intelligence (AI) based methods using MM images of head and neck tissues. The AI methods include two approaches: the first is a transfer learning-based technique that uses the pre-trained network DnCNN; the second is the training of networks using augmented head and neck MM images. In this manner, we compared the Noise2Noise network, the MIRNet network, and our own deep learning network, incSRCNN, which is derived from the super-resolution convolutional neural network and inspired by the inception network. These methods reconstruct improved images from measured low-quality (LQ) images, which were acquired in approximately 2 seconds. The evaluation was performed on artificial LQ images generated by degrading high-quality (HQ) images, measured in 8 seconds, with Poisson noise. The results showed the potential of using deep learning on these multimodal images to improve the data quality and reduce the acquisition time. Our proposed network has the advantage of a simple architecture compared with the similarly performing but highly parametrized DnCNN, MIRNet, and Noise2Noise networks.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Medical imaging is an important and active area of research with the potential to significantly improve disease diagnosis and patient treatment. For decades, medical imaging modalities, e.g., X-ray [1], ultrasound imaging [2–4], and computerized tomography (CT) [1,5], have served as important tools to assist physicians in making their diagnostic decisions. Although several new, especially optical, imaging technologies have been developed in the last decades, their adoption in healthcare systems is still minimal. Nonlinear optical techniques, e.g., coherent anti-Stokes Raman scattering (CARS) [6], two-photon excited fluorescence (TPEF) [7], and second-harmonic generation (SHG) [8], and linear optical techniques, e.g., fluorescence lifetime imaging (FLIM) [9], are capable of measuring detailed information about the chemical composition and morphology of tissue sections with high spatial resolution and in a non-altering manner. In particular, the simultaneous combination of two or more of these optical spectroscopic methods, called multimodal imaging (MM), maximizes the chemical and morphological information obtained from the measured tissues [10–15]. For instance, Vogler et al. [8] presented a microscopic experiment that combines three nonlinear optical techniques (CARS, TPEF, and SHG) and showed how different kinds of molecules and different contrast mechanisms can be captured in one image measurement. In detail, CARS measurements explore the distribution of molecules such as proteins and lipids, SHG measurements highlight the collagen fiber distribution in the sample, and TPEF measurements identify specific molecules like keratin and NADP(H). The combination of these three modalities is considered a label-free and non-destructive approach that is very useful for in vivo studies [16]. The multimodal imaging approach provides high-quality (HQ) images, but their acquisition requires a relatively long acquisition process compared with low-quality images, because photon shot noise is the dominant noise source in nonlinear imaging techniques. Mechanical methods such as using a faster motor [17] or time-stretching techniques [18] have shown great promise in improving the speed and performance of multimodal imaging systems; however, these methods have certain limitations. For instance, although using a faster motor can reduce scan times, it may generate more heat, which can potentially degrade the image quality. Time-stretching techniques, on the other hand, can increase the time resolution of imaging systems, but they may also introduce noise and distortions to the image. Additionally, the imaging system in this study uses laser scanning and only shifts the sample when jumping from tile to tile, so the measurement time is limited by the detector, not the scanning speed. Hence, the faster MM imaging required for real-time monitoring leads to an increase in the noise level of the images, which degrades their quality and affects the identification of tissues or their associated diseases or abnormalities.

In addition to experimentally acquiring HQ images by increasing the acquisition time, image denoising is a fundamental preprocessing technique that can remove noise from images but may result in the loss of relevant information [19–21]. Consequently, the trade-off between fast imaging and a suitable denoising method needs to be balanced and optimized for an effective diagnostic imaging tool. Denoising algorithms vary from basic digital image filters to iterative reconstruction techniques. Choosing a suitable denoising method is therefore not simple, and the restored images should maintain the following properties [20]. First, the details and edges that are critical for detecting malignant tissue should be preserved; the denoising algorithm should not produce artifacts, and the recovered images should be similar to the original image. In addition, the algorithm should be computationally efficient and of low complexity, which is a prerequisite in medical applications that require immediate results. Finally, the denoising algorithm should not depend on vast amounts of data, which are neither practical nor readily accessible in medical imaging.

Apart from the standard image denoising methods, deep learning features a high potential for denoising and has shown outstanding performance, especially in the processing of natural images and various medical imaging techniques, e.g., ultrasound imaging [2–4], CT scans [1,5], fluorescence microscopy [22], and CARS endoscopy [6]. Therefore, we evaluated deep learning methods on multimodal images that comprise the CARS, TPEF, and SHG modalities and compared them with the following standard techniques: the median filter (MF) method and the phase retrieval method via Gerchberg-Saxton (GS) [23–27]. An example of an MM image is visualized in Fig. 1, where the CARS, TPEF, and SHG modalities are represented as the red, green, and blue channels, respectively. In this manuscript, we used two deep learning approaches. The first approach is a transfer learning-based method [28] in which we used the pre-trained network DnCNN [29] directly to reconstruct the improved images. The second approach is to train a network using augmented head and neck tissue MM images. In this context, we used two well-known architectures, the Noise2Noise (N2N) [30] and the MIRNet [31,32] architectures, in addition to our own deep learning network, which we refer to as incSRCNN. The incSRCNN network consists of a simple architecture derived from the super-resolution convolutional neural network (SRCNN) [33,34] with a small trick in the first layer that was inspired by the inception network [35]. In this manuscript, we briefly explain all the methods at the beginning and then describe the data and workflow. We then discuss the reconstruction of synthetic and experimental low-quality images using the GS algorithm, the MF method, and the DnCNN, N2N, MIRNet, and incSRCNN networks. Afterward, a generalizability section is presented with two different analyses. Finally, we summarize our results in the conclusion section.

Fig. 1. An example of a multimodal image consisting of the three modalities CARS (of the CH2 stretching vibration at 2850 cm⁻¹), TPEF, and SHG as the red, green, and blue channels, respectively.

2. Method

2.1 Direct methods: median filter, GS algorithm, and pre-trained DnCNN network

This section briefly explains the implemented methods, grouped into classical methods and deep learning methods. First, the median filter with a 3 × 3 kernel size is used by computing the median value of the input image under the kernel window [36]. Then, the phase retrieval problem is implemented, since it is applied to many phase-based denoising problems [37–40]. Several well-known phase retrieval algorithms exist, e.g., hybrid input-output (HIO) and Gerchberg-Saxton (GS). We focused on applying GS [23,25] to the MM images, since most of the other error-reduction-based techniques represent derived versions of the GS algorithm. Briefly, GS recovers the phase using the measured image and the source object. It is considered an error-reduction algorithm that iteratively reduces the error until it converges. The GS algorithm is shown in Fig. 2 and is applied independently on each channel, where the phase and the modified amplitudes are determined iteratively, enabling image reconstruction. Its input consists of the amplitudes of the sampled image $\sqrt x $ and a Gaussian estimation of the diffraction plane intensity $X$. First, an initial phase ${\varphi _0}$ in the object plane is generated by drawing uniformly distributed random numbers between −π and π. At iteration $k$, the initial field in the object plane is calculated using Eq. (1).

$${z_k} = \sqrt x \exp ({i{\varphi_{k - 1}}} )$$

The phase distribution in the target plane ${\phi _k}$ is then calculated via the fast Fourier transform (FFT), as shown in Eq. (2).

$${\phi _k} = \arg ({FFT({{z_k}} )} )$$

Equation (3) combines the phase distribution in the target plane with the target amplitude $\sqrt X $, and finally, the phase in the object plane ${\varphi _k}$ is recovered via the inverse Fourier transform, as shown in Eq. (4).

$${A_k} = \sqrt X \exp ({i{\phi_k}} )$$
$${\varphi _k} = \arg ({FFT^{ - 1}({{A_k}} )} ).$$
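For illustration, a minimal numpy sketch of this iteration loop is given below. The scaling of the inputs and the choice of returning the squared magnitude of the final object-plane field as the reconstructed channel are our assumptions for the sketch, not details specified above.

```python
import numpy as np

rng = np.random.default_rng(0)

def gerchberg_saxton(x, X, n_iter=50000):
    """One-channel Gerchberg-Saxton loop following Eqs. (1)-(4).

    x: measured LQ channel (object-plane intensity)
    X: Gaussian estimate of the diffraction-plane intensity
    """
    amp = np.sqrt(x)                           # measured amplitudes sqrt(x)
    phi = rng.uniform(-np.pi, np.pi, x.shape)  # random initial phase phi_0
    for _ in range(n_iter):
        z = amp * np.exp(1j * phi)             # Eq. (1): object-plane field
        Phi = np.angle(np.fft.fft2(z))         # Eq. (2): target-plane phase
        A = np.sqrt(X) * np.exp(1j * Phi)      # Eq. (3): constrained field
        obj = np.fft.ifft2(A)                  # propagate back to the object plane
        phi = np.angle(obj)                    # Eq. (4): recovered phase
    return np.abs(obj) ** 2                    # assumed reconstruction
```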

Fig. 2. The workflow of the GS algorithm. First, the LQ image with a random phase is fed to the algorithm, and after $k$ iterations, the high-quality image is constructed. The GS algorithm depends on an estimation of the source object, which is unknown; therefore, a Gaussian estimation was used.

Apart from the classical methods used in image denoising, artificial intelligence (AI) based methods have been widely used for restoring images, especially in computer vision and medical imaging, e.g., X-ray, CT imaging, and ultrasound scans. In artificial intelligence, high-quality images, i.e., images improved in terms of the signal-to-noise ratio (SNR), can be obtained via transfer learning or by directly training deep learning models. Transfer learning [28] consists of using knowledge obtained from one task and transferring it to another related task. The direct deep learning method, on the other hand, trains a neural network with a specific architecture using the available data, optimizing the parameters during training. In this manuscript, we evaluated both approaches on the MM images.

First, the pre-trained denoising convolutional neural network (DnCNN) was used as a transfer learning tool. DnCNN was trained on natural images to correct noise and artifacts in corrupted images [29]. Briefly, DnCNN outputs the residual image, i.e., the difference between the noisy observation and the latent clean image, instead of predicting the denoised image directly. The architecture of this network is an adapted version of the VGG network [41] suited to the image denoising task. Formally, the averaged mean squared error between the desired residual images and the ones estimated from the noisy input, calculated in Eq. (5),

$$l(\theta )= \frac{1}{{2N}}\sum\limits_{i = 1}^N {\|{\Re ({{y_i};\theta } )- ({{y_i} - {x_i}} )} \|_F^2} ,$$
can be adopted as the loss function to learn the trainable parameters in DnCNN. $\Re (y )$ represents the residual mapping and $\{{({{y_i},{x_i}} )} \}_{i = 1}^N$ represents $N$ noisy-clean training image patch pairs. In a nutshell, the DnCNN model has two main features: the residual learning formulation is adopted to learn $\Re (y )$, and batch normalization is incorporated to speed up training and boost the denoising performance.
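As an illustration of this residual-learning formulation, the following numpy sketch evaluates Eq. (5); the array names and shapes are illustrative and not part of the actual DnCNN implementation.

```python
import numpy as np

def dncnn_loss(residual_pred, noisy, clean):
    """Averaged MSE of Eq. (5) between predicted residuals R(y; theta)
    and desired residuals y - x; arrays have shape (N, H, W)."""
    target = noisy - clean                                       # y_i - x_i
    sq_err = np.sum((residual_pred - target) ** 2, axis=(1, 2))  # squared Frobenius norms
    return sq_err.sum() / (2 * noisy.shape[0])                   # 1/(2N) averaging

# The denoised estimate is the noisy input minus the predicted residual:
# denoised = noisy - residual_pred
```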

2.2 Trained networks: incSRCNN, N2N, and MIRNet

We then constructed and trained a simple network, a modified version of the super-resolution convolutional neural network (SRCNN) [33,34]. Since its architecture is inspired by both the inception network and the SRCNN, we call it incSRCNN. The architecture of this network is shown in Fig. 3. Like the SRCNN, the proposed network consists of three layers; however, it is implemented as a denoising task whose output has the same size as its input. The input image is convolved in the first layer with three different kernel sizes (3, 5, and 9) into 192 feature maps. The second layer then applies a 1 × 1 kernel to condense these to 64 feature maps. Finally, the third layer uses a 3 × 3 kernel to construct the output image. All layers use the ReLU activation function. We used the mean absolute error between the original HQ image and the network output as the loss function, and the weights in the network layers were updated using the Adam optimizer with a learning rate of 3e-4.
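A minimal Keras sketch of this architecture is given below. The text above does not state the per-branch filter count; assuming 64 filters in each of the three parallel branches (192 feature maps in total) reproduces the parameter count reported in the results section, and the patch size is illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_incsrcnn(h=128, w=128):
    inp = layers.Input(shape=(h, w, 1))  # single-channel (CARS) patch
    # Inception-style first layer: parallel convolutions with kernel
    # sizes 3, 5, and 9, concatenated into 192 feature maps.
    b3 = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
    b5 = layers.Conv2D(64, 5, padding="same", activation="relu")(inp)
    b9 = layers.Conv2D(64, 9, padding="same", activation="relu")(inp)
    x = layers.Concatenate()([b3, b5, b9])
    # Second layer: 1 x 1 kernel condensing to 64 feature maps.
    x = layers.Conv2D(64, 1, padding="same", activation="relu")(x)
    # Third layer: 3 x 3 kernel constructing the output image.
    out = layers.Conv2D(1, 3, padding="same", activation="relu")(x)
    model = Model(inp, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(3e-4), loss="mae")
    return model
```

Under these assumptions, model.summary() reports 20,481 trainable parameters, matching the count given in Section 4.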

Fig. 3. The transfer learning-based approach via DnCNN and the trained deep learning networks: Noise2Noise, MIRNet, and our proposed deep learning network (incSRCNN). At the top of the figure, the pre-trained network DnCNN is used to predict MM images with higher quality. The Noise2Noise and MIRNet networks are trained using augmented head and neck tissue images. The architecture of our proposed network, incSRCNN, is shown at the bottom. This network represents a modified version of the SRCNN and is inspired by the inception network. The first layer convolves the input image with different kernel sizes into 192 feature maps. The second layer then applies a 1 × 1 kernel to condense these to 64 feature maps. Finally, the third layer uses a 3 × 3 kernel to construct the output image.

Afterward, we aimed to compare our simple architecture with more complex ones. Therefore, we chose two well-known networks, the Noise2Noise (N2N) and MIRNet architectures, which are commonly used for denoising tasks. Briefly, the N2N and MIRNet architectures consist of deep convolutional neural network (CNN) layers. The N2N network learns to remove noise from a noisy image by training on pairs of noisy images, effectively learning to denoise without ever seeing a clean image. The MIRNet network, on the other hand, uses a multi-scale architecture to capture both local and global image features and incorporates a feature fusion module to combine information from different scales (for more details on N2N and MIRNet, we refer readers to Ref. [30] and Refs. [31,32], respectively).
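To make the N2N training setup concrete, the sketch below builds a pair of independently Poisson-corrupted versions of the same image; the peak parameter, which scales [0, 1] intensities to photon counts, is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_pair(hq, peak=30.0):
    """Two independent Poisson realizations of the same HQ channel;
    N2N trains a network to map one realization onto the other."""
    lam = hq * peak
    a = rng.poisson(lam) / peak
    b = rng.poisson(lam) / peak
    return a.astype(np.float32), b.astype(np.float32)

# model.fit(noisy_a, noisy_b, ...)  # no clean targets are required
```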

3. Data acquisition, description, and workflow

3.1 Data acquisition and description

The data used for developing the denoising methods were acquired using a laser scanning microscope (LSM510, Zeiss, Germany) equipped with a ps-laser system for coherent anti-Stokes Raman scattering (CARS), second-harmonic generation (SHG), and two-photon excited fluorescence (TPEF) microscopy, as described in detail previously [42]. Briefly, the sample is illuminated with two spatially and temporally synchronized laser pulse trains of ps pulse duration. The difference frequency of the two lasers matches the symmetric CH2 stretching vibration at 2850 cm⁻¹. The pump laser operates at 672.5 nm and the Stokes laser at 832 nm. The specimen is illuminated through a 20x plan-apochromatic objective (Zeiss, Germany, NA = 0.8) using 50 mW of pump and 70 mW of Stokes power. The CARS and SHG signals are collected and detected in the forward direction by PMT detectors and are split by a 514 nm dichroic longpass mirror. The CARS signal is detected using a 550 nm bandpass filter and the SHG signal using a 415 nm bandpass filter. The TPEF signal is collected in the epi-direction through the illumination objective and reflected by a 600 nm longpass dichroic mirror to the PMT detector. In front of the PMT, the TPEF signal is filtered using a 650 nm shortpass filter and a 458/64 nm bandpass filter (both Semrock, USA). All analyzed images were acquired using a 1.6 µs pixel dwell time, a field of view of 450 µm, and an edge length of 512 pixels. For HQ images, 16 frames were averaged; for LQ images, four frames were averaged.

The data represent the head and neck tissue of a mouse, with ten positions measured using the nonlinear multimodal imaging technique. Nonlinear multimodal imaging combines three modalities that are simultaneously excited using a 672.5 nm pump and an 832 nm Stokes beam and detected at 550 nm (CARS), 458 nm (TPEF), and 415 nm (SHG). In this manuscript, we utilized high-quality (HQ) and (experimental) low-quality (LQ) images acquired within 8 s and 2 s, respectively. The HQ and LQ images were obtained by averaging 16 and 4 frames, respectively, and each has a spatial resolution of 512 × 512 pixels for a 450 × 450 µm² tile scan, which corresponds to approximately 0.88 µm/pixel.

3.2 Workflow

As mentioned before, we compared various denoising methods: the phase retrieval via GS, the median filter method, the pre-trained deep network DnCNN, the N2N network, the MIRNet network, and our incSRCNN network. In the GS algorithm, each modality of the nonlinear multimodal imaging is processed independently. Since this algorithm depends greatly on knowing the source object, a Gaussian estimation is incorporated into the algorithm. Similarly, the MF method is applied directly to each channel of the MM images with a 3 × 3 kernel size. For the DnCNN network, the pre-trained network was loaded and employed separately on each of the modalities of the nonlinear multimodal images to predict high-quality images. For the N2N, the MIRNet, and our proposed network, data augmentation was applied. Before data augmentation, one image was left aside for testing, and the remaining nine were split into 7 images for training and 2 for validation. Various techniques can be considered for data augmentation; we used rotation, blurring, and Poisson noise for our medical images. In the analysis, we first created artificial LQ images by generating Poisson noise from the HQ images. The experimental and artificial LQ images were rotated by 90°, 180°, and 270°, and the experimental LQ images were additionally blurred using a Gaussian filter. The total number of images equals 63 for the training part and 18 for the validation; data augmentation was applied simultaneously to the HQ and LQ images. In addition, each image was split into 16 patches. Consequently, the training and validation sets contain 1008 and 288 patch images per channel, respectively. Since each modality of the nonlinear multimodal imaging technique measures specific molecular contributions (for instance, the CARS modality explores the molecular distribution of proteins and lipids, the SHG modality highlights the collagen distribution in the sample, and the TPEF modality identifies specific molecules like keratin and NADP(H)), we considered these channels as independent images. Accordingly, the total number of patch images equals 3024 and 864 for the training and validation sets, respectively. However, only the CARS channel was used to train the networks, with 1008 and 288 patch images for the training and validation sets, respectively, since the CARS channel includes more structures, while the background is more prominent in both the TPEF and SHG modalities.
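A sketch of this augmentation and patching pipeline, assuming illustrative values for the Poisson noise peak and the Gaussian blur width, is given below.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, median_filter

rng = np.random.default_rng(2)

def artificial_lq(hq, peak=30.0):
    """Artificial LQ channel: Poisson noise generated from the HQ channel."""
    return rng.poisson(hq * peak) / peak

def augment(img):
    """Rotations by 90, 180, and 270 degrees plus a Gaussian-blurred copy."""
    return [np.rot90(img, k) for k in (1, 2, 3)] + [gaussian_filter(img, sigma=1.0)]

def patches(img, n=4):
    """Split a 512 x 512 channel into n*n = 16 patches of 128 x 128."""
    h, w = img.shape
    return [img[i * h // n:(i + 1) * h // n, j * w // n:(j + 1) * w // n]
            for i in range(n) for j in range(n)]

# The MF baseline is applied per channel with a 3 x 3 kernel:
# denoised = median_filter(lq_channel, size=3)
```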

4. Results

Our analysis was split into two parts. First, we created artificial low-quality (LQ) images by generating Poisson noise from the high-quality (HQ) images. These artificial LQ images were intentionally created with a lower quality than our experimental LQ images, as they were subsequently used to train the N2N, MIRNet, and incSRCNN deep learning networks; in this way, the trained networks can generalize to other measurements with a different setup and lower quality. We then evaluated the GS algorithm, the MF method, the pre-trained DnCNN, and the trained N2N, MIRNet, and incSRCNN networks on these artificial LQ images. Generating Poisson noise from the HQ images reduces the average PSNR from 19.7 to 16.4. Finally, we tested all these methods on the experimental LQ image and compared their performances. However, evaluating image reconstructions is a tricky task, particularly for medical images, and as far as the authors know, no single image metric is (always) recommended. Therefore, we used a panel of image metrics: the peak signal-to-noise ratio (PSNR), the structural similarity index measure (SSIM), the image correlation coefficient (ICC) [43], and the mean absolute error (MAE). Briefly, the PSNR is a widely used metric for measuring the quality of an image; it compares the original image $x$ to the reconstructed one $\hat{x}$ by calculating the ratio of the peak signal to the noise, and it can be formulated mathematically as follows

$$PSNR(x,\hat{x}) = 10 \times {\log _{10}}\left( \frac{Ma{x_x}}{\sqrt{\frac{1}{IJ}\sum\limits_{i = 1}^I \sum\limits_{j = 1}^J \|x(i,j) - \hat{x}(i,j)\|^2}} \right),$$
where $I$ represents the number of rows of pixels of the image and $J$ is the number of columns of pixels of the image. SSIM, on the other hand, is used to quantify the similarity of a reconstructed image to its original one. It compares the luminance, contrast, and structure of the two images, and it is given by the following equation
$$SSIM({x,\hat{x}} )= \frac{{({2\bar{x}\bar{\hat{x}} + {C_1}} )({2{\sigma_{x\hat{x}}} + {C_2}} )}}{{({{\bar{x}}^2 + {{\bar{\hat{x}}}^2} + {C_1}} )({\sigma_x^2 + \sigma_{\hat{x}}^2 + {C_2}} )}},$$
where $\bar{x}$ is the mean of the original image, $\bar{\hat{x}}$ is the mean of the reconstructed image, ${\sigma _x}$ is the standard deviation of the original image, ${\sigma _{\hat{x}}}$ is the standard deviation of the reconstructed image, ${C_1}$ and ${C_2}$ are two constants, and ${\sigma _{x\hat{x}}}$ represents the covariance of the original image $x$ and the reconstructed one $\hat{x}$. The ICC measures how well two images are correlated, and it is calculated as follows
$$ICC(x,\hat{x}) = \frac{{{\sigma _{x\hat{x}}}}}{{{\sigma _x}{\sigma _{\hat{x}}}}},$$
and finally, the MAE measures the average magnitude of the errors between the reconstructed values and the original ones, and it is given by the following formula
$$MAE({x,\hat{x}} )= \frac{1}{{IJ}}\sum\limits_{i = 1}^I {\sum\limits_{j = 1}^J {|{x({i,j} )- \hat{x}({i,j} )} |} } .$$
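For reference, this panel of metrics can be computed per channel as in the sketch below, which follows the formulas above for images scaled to [0, 1]; the SSIM is taken from scikit-image rather than implemented by hand.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(x, xhat):
    """PSNR as in the formula above: 10*log10 of the peak over the RMSE."""
    rmse = np.sqrt(np.mean((x - xhat) ** 2))
    return 10 * np.log10(x.max() / rmse)

def icc(x, xhat):
    """Image correlation coefficient: covariance over the product of stds."""
    cov = np.mean((x - x.mean()) * (xhat - xhat.mean()))
    return cov / (x.std() * xhat.std())

def mae(x, xhat):
    """Mean absolute error."""
    return np.mean(np.abs(x - xhat))

# SSIM via scikit-image for images with a data range of [0, 1]:
# ssim = structural_similarity(x, xhat, data_range=1.0)
```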

In addition, we visualized the residual images and the histograms of the residual images. Moreover, the time for reconstructing one channel with each of these methods is listed in Table S1 in Supplement 1.

First, the GS algorithm was applied independently to the three channels that form the MM LQ images. The GS algorithm requires an approximation of the source beam and the LQ image as input; the source beam is therefore represented by a Gaussian approximation (shown as X in Fig. 2). A detailed explanation of the GS algorithm is given in the method section. The algorithm was run for 50,000 iterations, and the code was implemented in Matlab 2020b (The MathWorks, Natick, MA).

The GS reconstruction of the artificial LQ image, displayed in Fig. 4(c-1), generally preserves the structure but includes dark regions resulting from the Gaussian estimation. Furthermore, the overall similarity between the HQ and reconstructed images remained low. Since the CARS channel has a more complex structure than the TPEF and SHG channels, where the background is more prominent, its reconstruction differs markedly from the other two: the SSIM increased only from 0.27 to 0.39 for the CARS channel, compared with increases from 0.17 and 0.28 to 0.53 and 0.56 for the TPEF and SHG channels, respectively. In addition, although the noise level decreased for all three channels, only a slight improvement in the PSNR, from 14.8 to 14.9, is observed for the CARS channel, whereas the PSNR increased from 16.1 and 18.4 to 21.4 and 21.7 for the TPEF and SHG channels, respectively (refer to Table 1 for more details). Furthermore, the average ICC and MAE of the reconstructed image are 0.86 and 0.09, respectively, while their values for the artificial LQ image were 0.68 and 0.12. For more insight into the GS reconstructions, the residual images between the HQ image and the reconstruction are visualized in Fig. S1 in Supplement 1 and compared with the residual images between the HQ and the artificial LQ images. Although the GS algorithm was able to reconstruct some parts of the image compared with the artificial LQ case, various regions are still not well reconstructed. In addition, Fig. S2 in Supplement 1 shows the histogram of the residuals of the reconstructed image compared with the residuals of the artificial LQ image. Although the PSNR, SSIM, ICC, and MAE showed improved values and the histogram of the residual images showed less variation, the reconstructions were poor and revealed darker regions.

Fig. 4. The artificial LQ image with corresponding reconstructions using the direct methods (the GS algorithm, the MF method, and the DnCNN) and the trained networks (the Noise2Noise (N2N), the MIRNet, and the incSRCNN networks). The experimental HQ and artificial LQ images are displayed in (a) and (b), respectively. The reconstructions of the artificial LQ image using the GS algorithm, the MF method, the DnCNN network, the N2N network, the MIRNet network, and the incSRCNN network are shown in panels (c-1) through (c-6), respectively. At first glance, the DnCNN network represents the HQ image best. On the other hand, the trained N2N and MIRNet networks show deficiencies in some regions due to the lack of data. Moreover, the proposed incSRCNN network preserves detailed structures compared with the smooth regions produced by the DnCNN network. Still, some artifacts were produced, resulting from the small data size used to train the network. All the MM images represent the CARS, TPEF, and SHG modalities as the red, green, and blue channels, respectively.

Table 1. The PSNR, the SSIM, the ICC, and the MAE between the HQ image and the artificial LQ image and between the HQ image and the reconstructed images using the GS algorithm, the MF method, the DnCNN, N2N, MIRNet, and incSRCNN networks

Next, we applied the median filter as a second standard method. The MF reconstruction of the artificial LQ image, displayed in Fig. 4(c-2), preserves the overall structure of the image. Although the PSNR, SSIM, ICC, and MAE metrics showed improved values for the MF reconstruction (refer to Table 1 for more details), the MF method could not completely remove the noise.

Afterward, we applied the pre-trained DnCNN network to predict the reconstruction of the artificial MM images. The DnCNN was implemented in Matlab 2020b (The MathWorks, Natick, MA). Similar to the GS algorithm, the DnCNN network was applied independently to each of the three modalities. The reconstruction of the artificial LQ image is shown in Fig. 4(c-3). The spatial structures in the image are preserved, and the noise level is reduced. Furthermore, the PSNR increased to 23.03, 26.2, and 25.4 for the CARS, TPEF, and SHG channels from 14.8, 16.1, and 18.4, respectively. In addition, the SSIM increased significantly from 0.27, 0.17, and 0.28 to 0.59, 0.64, and 0.77 for the CARS, TPEF, and SHG channels, respectively. Consequently, the overall PSNR and SSIM improved from 16.4 and 0.24 to 24.9 and 0.67, respectively. Furthermore, the average ICC and MAE of the reconstructed image are 0.92 and 0.04, while their values for the artificial LQ image were 0.68 and 0.12. Figure 5 shows three regions of interest (ROIs) for all reconstruction algorithms. The colors in the DnCNN reconstruction are well conserved, and the noise level in the reconstruction was reduced significantly. However, smoothed structures are displayed in these ROIs, which is critical for biomedical applications because some important information may be compromised and lost, affecting the diagnosis of tissue abnormalities and diseases. In addition, the residual images per channel between the HQ image and the reconstructed one are shown in Fig. S1 in Supplement 1. In this figure, most of the values across the image are zero, which means that the DnCNN was able to closely recover the values of the high-quality image.

Moreover, the histogram of the residual images between the HQ image and the DnCNN reconstruction in Fig. S2 in Supplement 1 shows significantly reduced values compared with the artificial LQ case. Therefore, the DnCNN successfully reconstructed a good representation of the HQ image.

Finally, we evaluated our proposed network (incSRCNN) on the same artificial LQ images and compared it with the two deep learning networks, Noise2Noise and MIRNet, which were trained on the same augmented MM images. The detailed architecture is described in the method section. The training of the network was performed by minimizing the mean absolute error (MAE) loss between the HQ images and the output of the incSRCNN network. The Adam algorithm was used for the optimization with a learning rate of 3e-4. A total of 1008 and 288 coupled HQ and LQ patch images were used for the training and the validation, respectively; refer to Table S2 in Supplement 1 for more details. All computations were done using Google Colab. The total number of parameters to be trained is 20,481 (refer to Fig. S7 and Fig. S8 for more details about the architecture and parameters of the incSRCNN network). The training and prediction times for our architecture are around 10 minutes and 7 seconds, respectively, whereas the N2N and MIRNet networks require around 48 minutes and 3 hours for the training phase and 33 and 68 seconds for the reconstruction of the image, respectively (refer to Table S1 for more details). We assessed different cases for training the network: three independent incSRCNNs, one per modality; one incSRCNN treating all channels as separate data; and one incSRCNN trained only on the CARS channel. We found that training with only the CARS channel produces better results. The training time of this network is approximately 10 minutes, compared with 1 hour in the first and second cases.

The incSRCNN reconstruction of the artificial LQ image is shown in Fig. 4(c-6), while the N2N and MIRNet reconstructions are shown in Fig. 4(c-4) and Fig. 4(c-5), respectively. The spatial structures and the colors in the images are preserved, and the noise level is reduced. Furthermore, the PSNR increased to 20.5, 22.4, and 21.9 for the CARS, TPEF, and SHG channels from 14.8, 16.1, and 18.4, respectively. In addition, the SSIM increased significantly from 0.27, 0.17, and 0.28 to 0.54, 0.55, and 0.57 for the CARS, TPEF, and SHG channels, respectively. Consequently, the overall PSNR and SSIM improved from 16.4 and 0.24 to 21.6 and 0.56, respectively. Furthermore, the average ICC and MAE of the reconstructed image are 0.91 and 0.06, while their values for the artificial LQ image were 0.68 and 0.12. Figure 5 shows three regions of interest (ROIs) of all reconstructions. The colors in the incSRCNN reconstruction are well conserved, and the noise level in the reconstruction was reduced significantly. In addition, the residual images per channel between the HQ image and the reconstructed one are shown in Fig. S1 in Supplement 1. This figure shows a significant reduction of the STD values in the residual images, from 0.2, 0.2, and 0.1 for the CARS, TPEF, and SHG channels of the artificial LQ case, respectively, to 0.08, 0.06, and 0.06 for the incSRCNN case.

Moreover, the histogram of the residual images between the HQ image and the incSRCNN reconstruction in Fig. S2 in Supplement 1 shows significantly reduced values compared with the artificial LQ case. Therefore, the incSRCNN successfully reconstructed a good representation of the HQ image. As with the more complex deep learning networks N2N and MIRNet, all three trained networks produced black spots, which result from the lack of data in the training procedure.

The next step is to assess the six methods on the experimental LQ MM image. First, we evaluated the GS algorithm on the experimental LQ image with the same source estimation used for the artificial LQ image.

In the reconstruction of the experimental LQ image using the GS algorithm, the PSNR for the TPEF channel increased from 20.0 to 20.1. However, the PSNR decreased from 19.0 and 20.1 to 14.9 and 20.1 in the CARS and SHG channels, respectively. In addition, the SSIM improved only for the TPEF channel but worsened for the CARS and SHG channels. All values are given in Table 2. Furthermore, the average ICC improved from 0.78 to 0.80, but the average MAE increased. It is worth noting that the worsened values are mainly related to the CARS channel reconstruction, since the other channels presented acceptable results. Moreover, the GS reconstruction of the experimental LQ image does not differ from the artificial LQ reconstruction, as can be deduced from the residual images and the residual histograms in Fig. S3 and Fig. S4 in Supplement 1, respectively. The reason is that the algorithm converged to a local minimum and could not improve further.

Table 2. The PSNR, the SSIM, the ICC, and the MAE between the HQ and the experimental LQ images and between the HQ and the reconstructed images using the GS algorithm, the MF method, the DnCNN, N2N, MIRNet, and incSRCNN networks

Then, in the reconstruction of the experimental LQ image using the MF method, the PSNR for the CARS reconstruction decreased from 19.0 to 18.8. In addition, the SSIM of the CARS reconstruction dropped from 0.56 to 0.54. We conclude that the experimental LQ image is already of sufficiently high quality that even a filter-based method could not improve the CARS channel.

Afterward, we tested the performance of the DnCNN on the experimental LQ image. Figure 6(c-3) shows the reconstruction using the DnCNN network, where the spatial structure in the image is preserved and the noise level is slightly reduced. In Table 2, we compare the PSNR, SSIM, ICC, and MAE between the DnCNN reconstruction and the HQ image with those between the experimental LQ image and the HQ image. Compared with the experimental LQ image, we deduce a slight improvement in the PSNR, SSIM, ICC, and MAE values, both per channel and overall, when using the DnCNN network. In Fig. 7, three ROIs show a reduction in the noise level. As in the artificial case, smoothed regions were produced, which might remove important features that are highly sensitive for the diagnosis of diseases and abnormalities. In addition, we compared the residual images per channel of the DnCNN reconstruction with the residual images of the experimental LQ image in Fig. S3 in Supplement 1; residual values almost similar to the experimental LQ case can be detected there. Furthermore, we visualized the histogram of the residual images of the DnCNN reconstruction and the experimental LQ image in Fig. S4 in Supplement 1.

Fig. 5. Regions of interest (ROIs) of the HQ image, the artificial LQ image, and its reconstructions using the GS algorithm, the MF method, and the DnCNN, N2N, MIRNet, and incSRCNN networks. The GS algorithm produces blurry images with dark spots/regions. Moreover, the MF method is not able to completely remove the noise. In the DnCNN reconstruction, some fine structures were lost, while these structures were preserved by the incSRCNN network. However, some black dots were produced as artifacts by the trained N2N, MIRNet, and incSRCNN networks, resulting from the small data size used to train the networks. All the MM images represent the CARS, TPEF, and SHG modalities as the red, green, and blue channels, respectively.

Fig. 6. The experimental LQ image with corresponding reconstructions using the direct methods (the GS algorithm, the MF method, and the DnCNN) and the trained networks (the Noise2Noise (N2N), the MIRNet, and the incSRCNN networks). The experimental HQ and LQ images are displayed in (a) and (b), respectively. The reconstructions of the experimental LQ image using the GS algorithm, the MF method, the DnCNN network, the N2N network, the MIRNet network, and the incSRCNN network are shown in panels (c-1) through (c-6), respectively. At first glance, the DnCNN network best represents the HQ image. However, the N2N, MIRNet, and incSRCNN reconstructions preserve detailed structures, while the DnCNN reconstruction displays smoothed structures. All the MM images represent the CARS, TPEF, and SHG modalities as the red, green, and blue channels, respectively.

Fig. 7. Regions of interest (ROIs) of the HQ image, the experimental LQ image, and its reconstructions using the GS algorithm, the MF method, and the DnCNN, N2N, MIRNet, and incSRCNN networks. The GS algorithm produces a dark region due to the Gaussian estimation. Moreover, the MF reconstruction could not completely remove the noise. In the DnCNN reconstruction, some fine structures are lost, while they are preserved by the incSRCNN network. However, some dark dots were produced, resulting from the lack of data used to train the three networks (N2N, MIRNet, and incSRCNN). All the MM images represent the CARS, TPEF, and SHG modalities as the red, green, and blue channels, respectively.

Finally, we tested the performance of the incSRCNN on the experimental LQ image and compared it with the N2N and MIRNet networks. Figure 6(c-4, 5, 6) shows the reconstructions using the N2N, the MIRNet, and our proposed network, where the spatial structures and the colors in the image are preserved; however, the noise level is only slightly reduced. In addition, we compare in Table 2 the PSNR, SSIM, ICC, and MAE between the three trained network reconstructions and the HQ image with those between the experimental LQ image and the HQ image. Compared with the experimental LQ image, although the average ICC improved, the per-channel and overall PSNR and SSIM values decreased. In Fig. 7, our proposed network preserves the colors and spatial structures. The decrease mentioned above might result from the small data size: the network failed to estimate the values in some areas, indicated by the arrows in the figure. We assessed this matter further by checking the intensity values across an arbitrary region and by evaluating the incSRCNN network on other noisy experimental LQ data, which were derived by generating Poisson noise from the experimental LQ images. The results are illustrated in Fig. S5 and Fig. S6 in Supplement 1. Figure S5 shows the intensity values for the HQ, LQ, and reconstructed images across the region marked in the image at the top left of the figure. The intensity values of the GS reconstruction differ completely from those of the LQ image, while the deep learning methods maintained a similar trend. The smoothed nature of the DnCNN reconstruction is reflected in fewer details than in the incSRCNN reconstruction. In addition, the incSRCNN reconstructions for both noisy experimental LQ images, illustrated in Fig. S6, showed improvements in terms of the PSNR, SSIM, ICC, and MAE values. We also compared the residual images per channel of the incSRCNN reconstructions with the residual images of the experimental LQ image in Fig. S3 in Supplement 1; residual values almost similar to the experimental LQ case can be detected there. Furthermore, we visualized the histogram of the residual images of the incSRCNN reconstruction and the experimental LQ image in Fig. S4 in Supplement 1.

To summarize the performance of each method on the artificial and experimental LQ images: the GS reconstruction shows similar but poor performance for both cases, includes dark regions, and the algorithm showed limited abilities even in noiseless settings; its optimization appears to converge to a local minimum, causing poor reconstructions. The DnCNN and incSRCNN reconstructions, in contrast, preserved the colors and detailed structures. Both networks performed well in the artificial LQ case, but the DnCNN produced smoothed regions, which is critical for medical applications. Our proposed network consists of a simple architecture that is trained only on the CARS channel and is then applied to predict the other two channels as well. As in the artificial case, the DnCNN and incSRCNN networks performed best on the experimental LQ image: both preserve the colors and spatial structures of the image, but the DnCNN produced smoothed regions, a drawback compared with our proposed network, which in turn shows only a slight noise reduction because the limited training data prevented the network from learning some regions well. We additionally trained two other networks with deeper architectures, the N2N and MIRNet networks, and showed that the lack of data also affects the reconstructions of these two networks.

5. Generalizability

The results explained previously involve only one image position, where all methods were either applied directly to this image or trained using the remaining 9 images, with the trained networks then used to reconstruct this particular image. Therefore, in this section, we evaluate the PSNR, SSIM, ICC, and MAE values in two different ways: a patch-wise analysis and a cross-validation analysis. The two analyses investigate the variability of the reconstructions within an image (patch-wise analysis) and between images (cross-validation). In the patch-wise analysis, one single testing image of size 512 × 512 was used for validation. Here, the PSNR, SSIM, ICC, and MAE are not evaluated on the whole reconstruction from the direct methods and the trained networks; instead, the metrics are calculated for 16 patches of size 128 × 128. This yields 16 values per metric, and the average and standard deviation of these values are computed and visualized in Table 3 and Table 4. The second analysis is a leave-one-out cross-validation in which one MM image was left out in each cross-validation loop. In this study, a total of 10 MM images of size 512 × 512 were predicted, and each of these images was reconstructed using the direct methods: the GS algorithm, the MF method, and the DnCNN network. Then, the PSNR, SSIM, ICC, and MAE per reconstruction were calculated, and finally, the average and standard deviation of these metrics were computed. For the N2N, MIRNet, and incSRCNN networks, 10 networks of each type were trained, each leaving one image out for testing and using the remaining 9 images for training. In the end, the aforementioned metrics were calculated on each testing image, and the average and standard deviation of these metrics were computed and are visualized in Table 5 and Table 6.
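The patch-wise evaluation can be sketched as follows, reusing metric functions such as those shown in the results section; the layout (16 non-overlapping 128 × 128 patches of a 512 × 512 image) follows the description above.

```python
import numpy as np

def patchwise(hq, rec, metric, n=4):
    """Evaluate a metric on the n*n non-overlapping patches of an image
    pair and return the mean and standard deviation over the patches."""
    h, w = hq.shape
    vals = [metric(hq[i * h // n:(i + 1) * h // n, j * w // n:(j + 1) * w // n],
                   rec[i * h // n:(i + 1) * h // n, j * w // n:(j + 1) * w // n])
            for i in range(n) for j in range(n)]
    return float(np.mean(vals)), float(np.std(vals))

# e.g., patchwise(hq_cars, rec_cars, mae) with mae defined as above
```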

Table 3. The average PSNR, SSIM, ICC, and MAE between the HQ and the artificial LQ images and between the HQ and the reconstructed images using the GS algorithm, the MF method, the DnCNN, N2N, MIRNet, and incSRCNN networks for the patch-wise analysis

Table 4. The average PSNR, SSIM, ICC, and MAE between the HQ and the experimental LQ images and between the HQ and the reconstructed images using the GS algorithm, the MF method, the DnCNN, N2N, MIRNet, and incSRCNN networks for the patch-wise analysis

Table 5. The average PSNR, SSIM, ICC, and MAE between the HQ and the artificial LQ images and between the HQ and the reconstructed images using the GS algorithm, the MF method, the DnCNN, N2N, MIRNet, and incSRCNN networks for the cross-validation analysis

Table 6. The average PSNR, SSIM, ICC, and MAE between the HQ and the experimental LQ images and between the HQ and the reconstructed images using the GS algorithm, the MF method, the DnCNN, N2N, MIRNet, and incSRCNN networks for the cross-validation analysis

First, the results of the patch-wise analysis for both the artificial and experimental LQ images are summarized in Table 3 and Table 4, respectively. In terms of the metrics, the DnCNN showed the largest improvement for both the artificial and experimental reconstructions, while the incSRCNN showed less variation in the experimental reconstruction. These results are consistent with the whole-image calculations for both the artificial and experimental LQ reconstructions.

Second, the results of the cross-validation analysis for both the artificial and experimental LQ images are summarized in Table 5 and Table 6, respectively. In this part, we used leave-one-MM-image-out cross-validation. For the direct methods, the reconstruction is applied directly to each channel of the 10 MM images, and the average of the evaluation metrics is calculated. For the trained networks, the analysis consists of leaving one image out for testing and using the remaining 9 images to train the N2N, MIRNet, and incSRCNN networks; the trained networks are then used to reconstruct the testing MM image. Finally, the evaluation metrics were computed, and the averages over the 10 cases are summarized in Table 5 and Table 6 for the artificial and the experimental LQ images, respectively. In terms of the metrics, the DnCNN showed the largest improvement for both the artificial and experimental reconstructions, while the incSRCNN network showed less variation in both the artificial and experimental reconstructions.

6. Conclusion

The multimodal imaging (MM) approach, which combines the CARS, TPEF, and SHG modalities, provides information on the structure of the measured tissue and its components. However, the MM approach offers high-quality images only with long measurement times; faster MM image measurements result in MM images distorted by noise and other artifacts. Therefore, image denoising techniques are helpful when fast measurements are needed or carried out. However, image denoising techniques feature the drawback that a suitable method needs to be chosen for each setting, which varies between application scenarios. In this context, we compared two classical methods, the median filter (MF) and the phase retrieval method via Gerchberg-Saxton (GS), with two deep learning approaches. The first approach uses transfer learning via the pre-trained DnCNN network, and the second approach uses augmented MM images to train a deep learning network. In this context, we trained three networks: the N2N network, the MIRNet network, and our own network, the incSRCNN. The data consist of MM images of the head and neck tissue of a mouse. First, we evaluated the GS algorithm, the MF method, and the DnCNN, N2N, MIRNet, and incSRCNN networks on artificial LQ images. Afterward, we tested all these methods on an experimental LQ image.

The artificial LQ image was constructed by generating Poisson noise from the HQ image. The GS reconstruction of the artificial LQ image was poor, with dark regions produced by the Gaussian estimation used to describe the input beam. In addition, the MF reconstruction could not completely remove the noise. In contrast, the DnCNN and incSRCNN reconstructions preserve the colors and spatial structures in the image and improve the PSNR, SSIM, ICC, MAE, and STD compared with the artificial LQ image. However, the DnCNN produced smoothed regions that might compromise the diagnosis of diseases and abnormalities. When comparing our incSRCNN network with the trained N2N and MIRNet networks, we concluded that the incSRCNN reconstruction is better, since more black spots are produced by the MIRNet.

Afterward, we compared the performance of the six methods on the experimental LQ image. As in the artificial case, the GS algorithm showed poor performance, the MF showed a good reconstruction, and the DnCNN network preserved the colors and spatial structures in the images but produced smoothed regions. The incSRCNN network maintained the colors and spatial structures in the image and did not produce smoothed areas; however, our proposed network showed a slight decrease in the PSNR, which resulted from the lack of data. In conclusion, a priori knowledge of the beam source is vital for the GS reconstruction, and the algorithm has limited recovery abilities even in a noiseless setting.

In summary, the deep learning networks produced very promising results. The DnCNN network preserved the colors and spatial structures of the image but produced smoothed regions, resulting in the loss of relevant information. Our proposed network, the incSRCNN, consists of a simple architecture, reconstructs the complex structures of the testing image, and shows a better PSNR than the other standard methods. Nevertheless, the incSRCNN network produced some artifacts, indicated by arrows in the zoomed figures, resulting from the lack of data used to train the network. It is worth mentioning that in all implemented methods the SSIM was around 0.6, which is quite low; this calls for a more in-depth analysis of the best evaluation metrics for denoising MM images, which we plan to investigate as part of our future research. On the other hand, the shortest time to reconstruct an HQ image on a machine with a limited CPU is achieved by the incSRCNN network; only 0.08 seconds are needed to predict one patch of the image.

Funding

Freistaat Thüringen (2019 FGR 0083 (Morphotox), 5575/10-9 (Digleben)); Horizon 2020 Framework Programme (101016923 (CRIMSON)); Bundesministerium für Bildung und Forschung (13GW0370E (TheraOptik), 13N15706 (LPI-BT2-FSU), 13N15710 (LPI-BT3-FSU), 13N15464 (LPI-BT1-Leibniz-IPHT)); Thueringer Universitaets- und Landesbibliothek Jena Open Access Publication Fund; German Research Foundation (512648189).

Acknowledgments

We acknowledge support by the German Research Foundation (Projekt-Nr. 512648189) and the Open Access Publication Fund of the Thueringer Universitaets- und Landesbibliothek Jena.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. A. Bhandary, G. A. Prabhu, V. Rajinikanth, K. P. Thanaraj, S. C. Satapathy, D. E. Robbins, C. Shasky, Y.-D. Zhang, J. M. R. S. Tavares, and N. S. M. Raja, “Deep-learning framework to detect lung abnormality – A study with chest X-Ray and lung CT scan images,” Pattern Recognit. Lett. 129, 271–278 (2020). [CrossRef]  

2. R. J. G. van Sloun, R. Cohen, and Y. C. Eldar, “Deep Learning in Ultrasound Imaging,” Proc. IEEE 108(1), 11–29 (2020). [CrossRef]  

3. S. Vedula, O. Senouf, A. M. Bronstein, O. V. Michailovich, and M. Zibulevsky, “Towards CT-quality Ultrasound Imaging using Deep Learning,” arXiv, ArXiv171006304 Phys. (2017). [CrossRef]  

4. Y. H. Yoon and J. C. Ye, “Deep Learning for Accelerated Ultrasound Imaging,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2018), pp. 6673–6676.

5. M. Grewal, M. M. Srivastava, P. Kumar, and S. Varadarajan, “RADnet: Radiologist level accuracy using deep learning for hemorrhage detection in CT scans,” in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018) (2018), pp. 281–284.

6. N. Yamato, H. Niioka, J. Miyake, and M. Hashimoto, “Improvement of nerve imaging speed with coherent anti-Stokes Raman scattering rigid endoscope using deep-learning noise reduction,” Sci. Rep. 10(1), 15212 (2020). [CrossRef]  

7. S. Wang, B. Lin, G. Lin, R. Lin, F. Huang, W. Liu, X. Wang, X. Liu, Y. Zhang, and F. Wang, “Automated label-free detection of injured neuron with deep learning by two-photon microscopy,” J. Biophotonics 13, e201960062 (2020). [CrossRef]  

8. N. Vogler, A. Medyukhina, I. Latka, S. Kemper, M. Böhm, B. Dietzek, and J. Popp, “Towards multimodal nonlinear optical tomography – experimental methodology,” Laser Phys. Lett. 8(8), 617–624 (2011). [CrossRef]  

9. W. Becker, “Fluorescence lifetime imaging – techniques and applications,” J. Microsc. 247(2), 119–136 (2012). [CrossRef]  

10. V. B. Pelegati, J. Adur, A. A. De Thomaz, D. B. Almeida, M. O. Baratti, L. A. L. A. Andrade, F. Bottcher-luiz, and C. L. Cesar, “Harmonic optical microscopy and fluorescence lifetime imaging platform for multimodal imaging,” Microsc. Res. Tech. 75(10), 1383–1394 (2012). [CrossRef]  

11. C. A. Patil, N. Bosschaart, M. D. Keller, T. G. van Leeuwen, and A. Mahadevan-Jansen, “Combined Raman spectroscopy and optical coherence tomography device for tissue characterization,” Opt. Lett. 33(10), 1135–1137 (2008). [CrossRef]  

12. P. C. Ashok, B. B. Praveen, N. Bellini, A. Riches, K. Dholakia, and C. S. Herrington, “Multi-modal approach using Raman spectroscopy and optical coherence tomography for the discrimination of colonic adenocarcinoma from normal colon,” Biomed. Opt. Express 4(10), 2179–2186 (2013). [CrossRef]  

13. A. T. Yeh, B. Kao, W. G. Jung, Z. Chen, J. Stuart Nelson, and B. J. Tromberg, “Imaging wound healing using optical coherence tomography and multiphoton microscopy in an in vitro skin-equivalent tissue model,” J. Biomed. Opt. 9(2), 248–253 (2004). [CrossRef]  

14. N. Iftimia, R. D. Ferguson, M. Mujat, A. H. Patel, E. Z. Zhang, W. Fox, and M. Rajadhyaksha, “Combined reflectance confocal microscopy/optical coherence tomography imaging for skin burn assessment,” Biomed. Opt. Express 4(5), 680–695 (2013). [CrossRef]  

15. K. Kong, C. J. Rowlands, S. Varma, W. Perkins, I. H. Leach, A. A. Koloydenko, H. C. Williams, and I. Notingher, “Diagnosis of tumors during tissue-conserving surgery with integrated autofluorescence and Raman scattering microscopy,” Proc. Natl. Acad. Sci. 110(38), 15189–15194 (2013). [CrossRef]  

16. N. Vogler, S. Heuke, T. W. Bocklitz, M. Schmitt, and J. Popp, “Multimodal Imaging Spectroscopy of Tissue,” Annu. Rev. Anal. Chem. 8(1), 359–387 (2015). [CrossRef]  

17. M. Beeres, J. L. Wichmann, J. Paul, E. Mbalisike, M. Elsabaie, T. J. Vogl, and N.-E. A. Nour-Eldin, “CT chest and gantry rotation time: does the rotation time influence image quality?” Acta Radiol. 56(8), 950–954 (2015). [CrossRef]  

18. J. Huang, Y. Cao, J. Wang, A. Liu, Q. Wu, Z. Chang, Z. Li, Y. Luo, L. Gao, and G. Yin, “Time-stretch-based multidimensional line-scan microscopy,” Opt. Lasers Eng. 160, 107197 (2023). [CrossRef]  

19. N. Goel, A. Yadav, and B. M. Singh, “Medical image processing: A review,” in 2016 Second International Innovative Applications of Computational Intelligence on Power, Energy and Controls with Their Impact on Humanity (CIPECH) (2016), pp. 57–62.

20. S. V. M. Sagheer and S. N. George, “A review on medical image denoising algorithms,” Biomed. Signal Process. Control 61, 102036 (2020). [CrossRef]  

21. JNTUH University, Telangana, India and also Dept of ECE, S R Engineering College (Autonomous), Warangal, India, S. Kollem, K. R. L. Reddy, D. S. Rao, “A Review of Image Denoising and Segmentation Methods Based on Medical Images,” Int. J. Mach. Learn. Comput. 9(3), 288–295 (2019). [CrossRef]  

22. V. Mannam, Y. Zhang, Y. Zhu, E. Nichols, Q. Wang, V. Sundaresan, S. Zhang, C. Smith, P. W. Bohn, and S. S. Howard, “Real-time image denoising of mixed Poisson–Gaussian noise in fluorescence microscopy images using ImageJ,” Optica 9(4), 335–345 (2022). [CrossRef]  

23. R. W. Gerchberg and W. O. Saxton, “Comment on ‘A method for the solution of the phase problem in electron microscopy’,” J. Phys. D: Appl. Phys. 6(5), L31 (1973). [CrossRef]  

24. G. Yang, B. Dong, B. Gu, J. Zhuang, and O. K. Ersoy, “Gerchberg–Saxton and Yang–Gu algorithms for phase retrieval in a nonunitary transform system: a comparison,” Appl. Opt. 33(2), 209–218 (1994). [CrossRef]  

25. J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. 21(15), 2758–2769 (1982). [CrossRef]  

26. F. Fogel, I. Waldspurger, and A. d’Aspremont, “Phase retrieval for imaging problems,” Math. Program. Comput. 8(3), 311–335 (2016). [CrossRef]  

27. G. Whyte and J. Courtial, “Experimental demonstration of holographic three-dimensional light shaping using a Gerchberg–Saxton algorithm,” New J. Phys. 7, 117 (2005). [CrossRef]  

28. K. Weiss, T. M. Khoshgoftaar, and D. Wang, “A survey of transfer learning,” J. Big Data 3(1), 9 (2016). [CrossRef]  

29. K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising,” IEEE Trans. Image Process. 26(7), 3142–3155 (2017). [CrossRef]  

30. J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila, “Noise2Noise: Learning Image Restoration without Clean Data,” arXiv:1803.04189 (2018). [CrossRef]  

31. S. W. Zamir, A. Arora, S. H. Khan, H. Munawar, F. S. Khan, M.-H. Yang, and L. Shao, “Learning Enriched Features for Fast Image Restoration and Enhancement,” IEEE Trans. Pattern Anal. Mach. Intell. 45(2), 1934–1948 (2023). [CrossRef]  

32. S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, M.-H. Yang, and L. Shao, “Learning Enriched Features for Real Image Restoration and Enhancement,” in Computer Vision – ECCV 2020, A. Vedaldi, H. Bischof, T. Brox, and J.-M. Frahm, eds., Lecture Notes in Computer Science (Springer International Publishing, 2020), Vol. 12370, pp. 492–511.

33. C. Dong, C. C. Loy, K. He, and X. Tang, “Image Super-Resolution Using Deep Convolutional Networks,” arXiv:1501.00092 (2015). [CrossRef]  

34. C. Dong, C. C. Loy, K. He, and X. Tang, “Image Super-Resolution Using Deep Convolutional Networks,” IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2016). [CrossRef]  

35. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2015), pp. 1–9.

36. L. Tan and J. Jiang, “Chapter 13 - Image Processing Basics,” in Digital Signal Processing (Third Edition), L. Tan and J. Jiang, eds. (Academic Press, 2019), pp. 649–726.

37. Y. Gao and L. Cao, “A Complex Constrained Total Variation Image Denoising Algorithm with Application to Phase Retrieval,” (n.d.).

38. H. Chang, Y. Lou, Y. Duan, and S. Marchesini, “Total Variation–Based Phase Retrieval for Poisson Noise Removal,” SIAM J. Imaging Sci. 11(1), 24–55 (2018). [CrossRef]  

39. O. Oh, Y. Kim, D. Kim, D. S. Hussey, and S. W. Lee, “Phase retrieval based on deep learning in grating interferometer,” Sci. Rep. 12(1), 6739 (2022). [CrossRef]  

40. Ç. Işıl, F. S. Oktem, and A. Koç, “Deep iterative reconstruction for phase retrieval,” Appl. Opt. 58(20), 5422–5431 (2019). [CrossRef]  

41. K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv:1409.1556 (2015). [CrossRef]  

42. S. Heuke, N. Vogler, T. Meyer, D. Akimov, F. Kluschke, H. J. Rowert-Huber, J. Lademann, B. Dietzek, and J. Popp, “Detection and Discrimination of Non-Melanoma Skin Cancer by Multimodal Imaging,” Healthcare 1(1), 64–83 (2013). [CrossRef]  

43. J. Xiao, Z. Liu, P. Zhao, Y. Li, and J. Huo, “Deep Learning Image Reconstruction Simulation for Electromagnetic Tomography,” IEEE Sens. J. 18(8), 3290–3298 (2018). [CrossRef]  

Supplementary Material (1)

Supplement 1: SI document

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.



Figures (7)

Fig. 1. An example of a multimodal image consisting of the three modalities: CARS of the CH2 stretching vibration at 2850 cm⁻¹, TPEF, and SHG, displayed as the red, green, and blue channels, respectively.
Fig. 2. The workflow of the GS algorithm. First, the LQ image with a random phase is fed to the algorithm; after k iterations, the high-quality image is constructed. The GS algorithm depends on an estimate of the source object, which is unknown, so a Gaussian estimate was used.
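To make the workflow in Fig. 2 concrete, the following is a minimal sketch of the GS iteration (Eqs. (1)–(4)), assuming NumPy, a single-channel image processed per modality, and a Gaussian estimate of the unknown source amplitude. The names lq_image, n_iter, and sigma are illustrative assumptions, not values from the paper.

```python
import numpy as np

def gerchberg_saxton(lq_image, n_iter=100, sigma=50.0):
    """Sketch of the GS iteration with a Gaussian source-amplitude estimate."""
    h, w = lq_image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Gaussian estimation of the unknown source amplitude X (shifted so DC is at the corner)
    gauss = np.exp(-((yy - h / 2) ** 2 + (xx - w / 2) ** 2) / (2 * sigma ** 2))
    X = np.fft.fftshift(gauss)
    phi = 2 * np.pi * np.random.rand(h, w)       # random initial phase
    for _ in range(n_iter):
        z = lq_image * np.exp(1j * phi)          # Eq. (1): impose measured amplitude
        Phi = np.angle(np.fft.fft2(z))           # Eq. (2): phase in the Fourier domain
        A = X * np.exp(1j * Phi)                 # Eq. (3): impose estimated source amplitude
        phi = np.angle(np.fft.ifft2(A))          # Eq. (4): updated object-domain phase
    return np.abs(np.fft.ifft2(A))               # reconstructed image after k iterations
```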
Fig. 3. The transfer-learning-based approach via DnCNN and the trained deep learning networks Noise2Noise, MIRNet, and our proposed network (incSRCNN). At the top of the figure, the pre-trained DnCNN network is used to predict MM images with higher quality. The Noise2Noise and MIRNet networks are trained using augmented head and neck tissue images. The architecture of our proposed network, incSRCNN, is shown at the bottom. It is a modified version of SRCNN inspired by the inception network: the first layer convolves the input image with different kernel sizes into 192 feature maps, the second layer applies a 1 × 1 kernel to condense them to 64 feature maps, and the third layer uses a 3 × 3 kernel to construct the output image.
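The three-layer structure in Fig. 3 can be sketched in Keras as below. The 192 and 64 feature-map counts and the 1 × 1 and 3 × 3 kernels follow the caption; the parallel branch kernel sizes (3, 5, 9), the ReLU activations, and the MSE loss are assumptions in the spirit of SRCNN and inception, not specifications from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_incsrcnn(channels=3):
    inp = layers.Input(shape=(None, None, channels))
    # Layer 1: inception-style parallel convolutions with different kernel sizes,
    # concatenated into 192 feature maps (3 branches x 64 maps)
    branches = [layers.Conv2D(64, k, padding="same", activation="relu")(inp)
                for k in (3, 5, 9)]
    x = layers.Concatenate()(branches)
    # Layer 2: 1x1 convolution condenses the 192 maps to 64
    x = layers.Conv2D(64, 1, padding="same", activation="relu")(x)
    # Layer 3: 3x3 convolution reconstructs the output image
    out = layers.Conv2D(channels, 3, padding="same")(x)
    return Model(inp, out)

model = build_incsrcnn()
model.compile(optimizer="adam", loss="mse")  # trained on LQ -> HQ image pairs
```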
Fig. 4. The artificial LQ image with the corresponding reconstructions using the direct methods (the GS algorithm, the MF method, and the DnCNN network) and the trained networks (the Noise2Noise (N2N), MIRNet, and incSRCNN networks). The experimental HQ and artificial LQ images are displayed in a) and b), respectively. The reconstructions of the artificial LQ image using the GS algorithm, the MF method, the DnCNN network, the N2N network, the MIRNet network, and the incSRCNN network are shown in part c, subpanels 1–6, respectively. At first glance, the DnCNN network represents the HQ image best. In contrast, the trained N2N and MIRNet networks show inefficiencies in some regions due to the lack of data. The proposed incSRCNN network preserves detailed structures compared with the smooth regions produced by the DnCNN network, although some artifacts remain, resulting from the small data size used to train the network. All MM images represent the CARS, TPEF, and SHG modalities as the red, green, and blue channels, respectively.
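The artificial LQ input in panel b) was generated by degrading the HQ image with Poisson noise. A minimal sketch of such a degradation is given below, assuming images scaled to [0, 1]; the peak photon count of 30 is an illustrative assumption, not a value from the paper.

```python
import numpy as np

def degrade_poisson(hq_image, peak_counts=30.0, rng=None):
    """Simulate shot-noise-limited acquisition of a [0, 1]-scaled HQ image."""
    rng = np.random.default_rng() if rng is None else rng
    counts = rng.poisson(hq_image * peak_counts)  # photon counts per pixel
    return np.clip(counts / peak_counts, 0.0, 1.0)  # rescaled artificial LQ image
```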
Fig. 5. Regions of interest (ROIs) of the HQ image, the artificial LQ image, and its reconstructions using the GS algorithm, the MF method, and the DnCNN, N2N, MIRNet, and incSRCNN networks. The GS algorithm produces blurry images with dark spots/regions, and the MF method does not completely remove the noise. In the DnCNN reconstruction, some fine structures are lost, whereas these structures are preserved by the incSRCNN network. However, some black dots are produced as artifacts by the trained N2N, MIRNet, and incSRCNN networks, resulting from the small data size used to train them. All MM images represent the CARS, TPEF, and SHG modalities as the red, green, and blue channels, respectively.
Fig. 6. The experimental LQ image with the corresponding reconstructions using the direct methods (the GS algorithm, the MF method, and the DnCNN network) and the trained networks (the Noise2Noise (N2N), MIRNet, and incSRCNN networks). The experimental HQ and LQ images are displayed in a) and b), respectively. The reconstructions of the experimental LQ image using the GS algorithm, the MF method, the DnCNN network, the N2N network, the MIRNet network, and the incSRCNN network are shown in part c, subpanels 1–6, respectively. At first glance, the DnCNN network best represents the HQ image. However, the N2N, MIRNet, and incSRCNN reconstructions preserve detailed structures, while the DnCNN reconstruction displays smoothed structures. All MM images represent the CARS, TPEF, and SHG modalities as the red, green, and blue channels, respectively.
Fig. 7. Regions of interest (ROIs) of the HQ image, the experimental LQ image, and its reconstructions using the GS algorithm, the MF method, and the DnCNN, N2N, MIRNet, and incSRCNN networks. The GS algorithm produces a dark region due to the Gaussian estimation, and the MF reconstruction does not remove the noise completely. In the DnCNN reconstruction, some fine structures are lost, whereas they are preserved by the incSRCNN network. However, some dark dots are produced, resulting from the limited data used to train the three networks (N2N, MIRNet, and incSRCNN). All MM images represent the CARS, TPEF, and SHG modalities as the red, green, and blue channels, respectively.

Tables (6)

Table 1. The PSNR, the SSIM, the ICC, and the MAE between the HQ image and the artificial LQ image and between the HQ image and the reconstructed images using the GS algorithm, the MF method, the DnCNN, N2N, MIRNet, and incSRCNN networks

Table 2. The PSNR, the SSIM, the ICC, and the MAE between the HQ and the experimental LQ images and between the HQ and the reconstructed images using the GS algorithm, the MF method, the DnCNN, N2N, MIRNet, and incSRCNN networks

Table 3. The average PSNR, SSIM, ICC, and MAE between the HQ and the artificial LQ images and between the HQ and the reconstructed images using the GS algorithm, the MF method, the DnCNN, N2N, MIRNet, and incSRCNN networks for the patch-wise analysis

Table 4. The average PSNR, SSIM, ICC, and MAE between the HQ and the experimental LQ images and between the HQ and the reconstructed images using the GS algorithm, the MF method, the DnCNN, N2N, MIRNet, and incSRCNN networks for the patch-wise analysis

Table 5. The average PSNR, SSIM, ICC, and MAE between the HQ and the artificial LQ images and between the HQ and the reconstructed images using the GS algorithm, the MF method, the DnCNN, N2N, MIRNet, and incSRCNN networks for the cross-validation analysis

Table 6. The average PSNR, SSIM, ICC, and MAE between the HQ and the experimental LQ images and between the HQ and the reconstructed images using the GS algorithm, the MF method, the DnCNN, N2N, MIRNet, and incSRCNN networks for the cross-validation analysis

Equations (9)


\[ z_k = x \exp(i\varphi_{k-1}) \tag{1} \]
\[ \phi_k = \arg\!\left(\mathrm{FFT}(z_k)\right) \tag{2} \]
\[ A_k = X \exp(i\phi_k) \tag{3} \]
\[ \varphi_k = \arg\!\left(\mathrm{FFT}^{-1}(A_k)\right) \tag{4} \]
\[ l(\theta) = \frac{1}{2N}\sum_{i=1}^{N} \left\lVert \mathcal{R}(y_i;\theta) - (y_i - x_i) \right\rVert_F^2 \tag{5} \]
\[ \mathrm{PSNR}(x,\hat{x}) = 10 \times \log_{10}\!\left( \frac{\mathrm{Max}_x^2}{\frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}\left(x(i,j)-\hat{x}(i,j)\right)^2} \right) \tag{6} \]
\[ \mathrm{SSIM}(x,\hat{x}) = \frac{\left(2\bar{x}\,\bar{\hat{x}} + C_1\right)\left(2\sigma_{x\hat{x}} + C_2\right)}{\left(\bar{x}^2 + \bar{\hat{x}}^2 + C_1\right)\left(\sigma_x^2 + \sigma_{\hat{x}}^2 + C_2\right)} \tag{7} \]
\[ \mathrm{ICC}(x,\hat{x}) = \frac{\sigma_{x\hat{x}}}{\sigma_x \sigma_{\hat{x}}} \tag{8} \]
\[ \mathrm{MAE}(x,\hat{x}) = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}\left| x(i,j) - \hat{x}(i,j) \right| \tag{9} \]
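For reference, below is a minimal NumPy sketch of the image-quality metrics in Eqs. (6)–(9). SSIM is computed in the global (single-window) form given in Eq. (7); the constants C1 and C2 and the choice Max_x = x.max() are conventional assumptions, not values stated in the paper.

```python
import numpy as np

def psnr(x, xh):
    mse = np.mean((x - xh) ** 2)                      # denominator of Eq. (6)
    return 10 * np.log10(x.max() ** 2 / mse)

def ssim_global(x, xh, c1=1e-4, c2=9e-4):
    mx, mxh = x.mean(), xh.mean()                     # means in Eq. (7)
    vx, vxh = x.var(), xh.var()                       # variances
    cov = np.mean((x - mx) * (xh - mxh))              # covariance sigma_{x xhat}
    return ((2 * mx * mxh + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + mxh ** 2 + c1) * (vx + vxh + c2))

def icc(x, xh):
    cov = np.mean((x - x.mean()) * (xh - xh.mean()))  # Eq. (8)
    return cov / (x.std() * xh.std())

def mae(x, xh):
    return np.mean(np.abs(x - xh))                    # Eq. (9)
```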