Hybrid-structure network and network comparative study for deep-learning-based speckle-modulating optical coherence tomography

Guangming Ni; Renxiong Wu; Junming Zhong; Ying Chen; Ling Wan; Yao Xie; Jie Mei; Yong Liu

doi:10.1364/OE.454504

1. Introduction

Optical coherence tomography (OCT) [1,2] as a noninvasive biomedical imaging technology in yielding 3D imaging of biological tissues with micrometer resolution [3], has been widely utilized in medical diagnosis and treatment of ophthalmology [4,5], dermatology [6], cardiology [7] and neurology [8]. Based on interferometry, OCT relies on the coherence of backscattered light to resolve tissue morphology [9]. However, this coherence will inevitably generate speckle noise, which gives rise to degrading the spatial resolution and causing the subtle but significant microstructure of the tissue to be obliterated. Moreover, speckle noise negatively affects the accuracy of clinical diagnosis examinations [10], and also brings trouble to OCT image processing and analyzing, such as OCT image segmentation [11].

Following OCT development, various methods have been proposed to denoise OCT images and improve OCT resolving power via hardware-based and software-based approaches. Conventional hardware-based approaches modify the acquisition system to produce uncorrelated speckle patterns within or between B-scans, including angular [12], spatial [13,14], and frequency compounding [15]. These hardware-based methods can partly remove speckle noise while they increase system complexity. In addition, some hardware-based methods also require repeated scanning that reduces the imaging temporal resolution, and sample motion during the repeated scanning also brings motion artifacts that lead to poor image quality [16]. Another newly proposed hardware-based approach in practice is speckle-modulating OCT [17], which utilizes a moving optical diffuser to implement local, random time-varying phase shifts to acquire an unlimited number of uncorrelated speckle patterns and effectively remove speckle patterns. However, speckle-modulating OCT seriously reduces the imaging sensitivity and temporal resolution because it requires moving the diffuser in the sample arm and repeating scanning. More software-based approaches have been proposed. Some straightforward software-based approaches are earlier utilized for OCT image denoise, such as image averaging which averages multiple B-scans obtained at the same location [18,19], digital filters including mean filters [12], low-pass filters [20], etc. More complex software-based approaches were further proposed, including wavelet-based methods [21,22], BM3D [23] and MSBTD [24]. Recently, image sparsity-based methods such as dictionary learning and sparse representation are also introduced into OCT image denoising [25]. However, these conventional software-based methods cannot remove speckle well, and always degrade the spatial resolution of OCT images, tending to show over-smoothening or losing meaningful subtle features.

Deep learning is playing a dominant role in the field of image processing over the past few years, such as image recognition [26], image segmentation [27], super-resolution reconstruction [28], etc. Deep learning has also demonstrated the great power to address OCT image processing tasks, such as DeSpecNet [29], a convolutional neural network (CNN) based method, has shown its delighted performance in retinal OCT denoising. Zhou et al. [30] proposed a conditional generative adversarial network (cGAN) which performs well in retinal OCT image speckle-noise suppression and contrast enhancement. Moreover, DN-GAN [31] applied context encoding block in GAN to get a balanced performance in noise reducing and detail preservation for retinal OCT image. Cheong et al. [32] proposed OCT-GAN to remove noise and retinal shadow within a single step. More works have also been performed for OCT speckle reducing, such as Qiu et al. [33] and Mehdizadeh et al. [34] employed new perceptual loss functions, respectively, which has shown a strong ability to retain detail structure and high spatial resolution. These aforementioned deep-learning-based works have achieved better performances than conventional methods in the context of speckle suppression, demonstrated on retinal OCT images. However, those works using average retinal OCT images as training dataset still have shortcomings because of the following factors: First, although image averaging operation can remove electrical noise well, it can not work well for speckle, as our following result shows in section 3.2.1; Second, a single retinal OCT image dataset only contains limited speckle patterns while speckle patterns of general OCT images are much more massive, for speckle patterns also depend on scanning volume sizes and sample structures. Meanwhile, repeated scanning of eye boxes may degrade the spatial resolution of averaged images because of eyeball motion. Therefore, only using original and averaged retinal OCT images as the training dataset for deep learning has limited the purpose of removing speckle noise for many cases. Some works have been aware of this limitation mentioned above, and further proposed unsupervised and semi-supervised learning which show a popular trend, such as Wang et al. [35] proposed Caps-cGAN to construct the semi-supervised system, and Huang et al. [36] employed the ideas of disentangled representation and GAN to achieve a semi-supervised system. Rico-Jimenez et al. [37] introduced a self-fusion network to reduce speckle by using similarity between from adjacent frames without the need for repeated image acquisition. Huang et al. [38] compared networks and loss functions based on ground truth free and Qiu et al. [39] conducted a more detailed comparative study about the Noise2Noise methods in speckle reduction. However, these works achieved reducing the requirement of well-registration image pairs, but can not remove speckle noise completely and improve the microstructure-resolving ability because semi-supervised learning still has to use more or less averaged retinal OCT images, and the Noise2Noise approaches have to require two preconditions satisfied to guarantee effectiveness while OCT speckle can not meet these requirements in many cases because speckle is not same as electronic noise, and also depends on sample structures having the correlation feature [40].

Based on speckle-modulating OCT and deep learning, our previously proposed deep-learning-based speckle-modulating OCT has preliminarily shown its conspicuous advantages on OCT speckle removing and resolving ability improvement [41]. In this work, we proposed a deep-learning-based speckle-modulating OCT based on a hybrid-structure network RDBU-Net GAN and further conducted a comprehensively comparative study to explore multi-type deep-learning architectures’ abilities to extract speckle pattern characteristics and remove speckle to optimally perform deep-learning-based speckle-modulating OCT. More specifically, seven representative networks are divided into three types of deep learning network architectures to study separately, namely Line-shaped network, U-shaped network, and GAN-based network. The effectiveness of each structure was investigated by using a customized large speckle-modulating OCT dataset containing massive more-general speckle patterns, which is the first time that network comparative study has been performed on such kind of OCT dataset, but not retinal OCT datasets with limited speckle patterns. With the outstanding performances of the proposed RDBU-Net GAN, we have further paid more attention to compare more GAN-based networks, by using the U-shaped network, modified ResNet, and multi-connection DenseNet as the generator of GAN in turn for comparison, and further performed speckle removing on pork meat and scotch tape OCT images to demonstrate the performances of the networks trained using the customized large speckle-modulating OCT dataset.

2. Methods

2.1 Speckle-modulating OCT algorithms

Speckle-modulating OCT (SM-OCT) [17] is an innovative hardware-based denoise approach that removes speckle noise effectively without degrading the spatial resolution of the image but reducing imaging sensitivity and temporal resolution. The non-correlated speckle patterns are yielded by moving a diffuser in the sample arm and performing repeated scanning. Many images further are averaged together to create an image without speckle noise subsequently. Here it assumes that M is the number of scatters inside a voxel and ${\varphi _{m,n}}$ is the time-varying local phase shift implemented by the moving diffuser, and this method can be described as

(1)$$S = \frac{1}{N}\sum\limits_{n = 1}^N {\left|{\sum\limits_{m = 1}^M {{a_m}{e^{i{\lambda_m}}}{e^{i{\varphi_{m.n}}}}} } \right|} ,$$

where ${a_m}$ represents the scattering amplitude and ${\lambda _m}$ indicates the phase delay caused by the axial location of the $mth$ scatter. Moreover, S denotes the pixel value which is obtained by averaging N images at different times with different local phase shifts.

2.2 Deep-learning-based speckle-modulating OCT principle

Deep-learning-based speckle-modulating OCT can perform speckle-free OCT imaging with maintaining imaging temporal resolution and sensitivity well, compared with speckle-modulating OCT based on hardware [41]. Deep-learning-based speckle-modulating OCT is achieved by integrating a conventional OCT setup and a trained-well deep learning network, as Fig. 1 shows. The key part of deep-learning-based speckle-modulating OCT is to extract speckle pattern characteristics and further remove speckle using a deep learning network trained with a customized large speckle-modulating OCT dataset containing massive speckle patterns. The customized large speckle-modulating OCT dataset was obtained by provisionally rebuilding the conventional OCT setup into a speckle-modulating OCT, more detailed information can be found in our previous work [41]. Here we proposed a deep-learning-based speckle-modulating OCT based on a hybrid-structure network RDBU-Net GAN. More detailed information about the proposed RDBU-Net GAN can be found in the following Section 2.3.3. To achieve the deep-learning-based speckle-modulating OCT, three main phases were conducted, including dataset preparing, model training, and integrating for speckle removing. More schematic details can be found in Fig. 1.

Fig. 1. Schematic diagram of deep-learning-based speckle-modulating OCT. (a) Dataset preparing, (b) model training, and (c) speckle removing.

Download Full Size | PDF

As Fig. 1(a) shows, each speckle-free ground truth image was generated by averaging 100-repeated-scanning B-scans frames obtained from a speckle-modulating OCT setup in the dataset preparing phase. 30 images were randomly selected from those 100-repeated-scanning B-scans as the training input, paired with corresponding ground truth. During the training phase, seven networks divided into three types were constructed to compare the performances for extracting speckle pattern characteristics and denoising, as Fig. 1(b) shows. The training data was fed to the seven networks respectively to start forward propagation and output the predicted image. By calculating the loss between the predicted image $\hat{y}$ and the ground truth image y, back-propagation passed the loss value and updates the model parameters in the direction of gradient decline. Meanwhile, we made a validation set using the same method with making a training set to monitor the training process and test the network performance using the peak signal-to-noise ratio (PSNR) as the evaluation metric. The PSNR is defined as Eq. (2):

(2)$$PSNR = 10 \times lo{g_{10}}\left( {\frac{{max{{({\hat{y}} )}^2}}}{{\frac{1}{{H \times W}}\sum\limits_{H,W} {{{({y - \hat{y}} )}^2}} }}} \right),$$

where $max ({\hat{y}} )$ represents the maximum value of the pixel in the image $\hat{y}$, and H, W are the height and width of the B-scan, respectively. After the training phase, the well-trained network model was integrated with the conventional OCT setup which was provisionally rebuilt into a speckle-modulating OCT to generate the training dataset for the networks, to finally obtain the deep-learning-based speckle-modulating OCT, as Fig. 1(c) shows.

2.3 Network architectures

To comprehensively analyze the performances of different deep learning networks to extract the characteristics of general speckle patterns and further remove speckle, here we investigated DnCNN, U-Net, RDBU-Net, U-Net GAN, ResGAN, RDBGAN, and the proposed RDBU-Net GAN, as deep learning networks for deep-learning-based speckle-modulating OCT. During the training process, we adjusted the hyperparameters, such as learning rate and iterations, to optimize the performances of those models. Meanwhile, different loss functions were also applied to achieve better results.

2.3.1 Line-shaped network

DnCNN [42] is a representative Line-shaped architecture, which is widely used in image denoising and super-resolution reconstruction owing to its simple and intuitive structure. In this paper, a deep CNN modified from the DnCNN was investigated in OCT image speckle reduction, as Fig. 2 shows. The input of the model was a noisy OCT image and the output was the denoised image. 18 convolutional layers with three different types were concluded of the network. For the input layer, a convolutional layer with 64 kernels of size $3 \times 3 \times 1$ and a rectified linear unit (ReLU, max(0; ·)) were utilized. From layer 2 to layer 17, each layer had a convolution kernel with a size of $3 \times 3 \times 64$ followed by a batch normalization (BN, momentum = 0.99, epsilon = 0.001) layer for improving convergence performance, and a ReLU layer as the activation function. The last layer consisted of a single convolution kernel of size $3 \times 3 \times 64$ to yield residual noisy images n. Denoised images $\hat{y}$ were generated through the formulation: $\hat{y} = x - n$, where x was the input of the network.

Fig. 2. Structure of the DnCNN for speckle patterns extraction, Conv: convolutional layer; BN: BN layer; ReLU: ReLU layer.

Download Full Size | PDF

2.3.2 U-shaped network

U-shaped network [43,44] is composed of a down-sampling encoder and an up-sampling deconvolution decoder, which is named after its symmetrical structure like U-shape. U-Net network has made great achievements in the field of medical image segmentation, and it has also been applied to image denoising recently. Lan et al. [45] used U-Net network to remove speckle patterns in ultrasonic images, and Dong et al. [46] explored the performance of U-Net network for dehazing problem. In this work, a U-Net network architecture shown in Fig. 3 was used to conduct OCT speckle removal. Firstly, a noisy OCT image was fed into the encoder of the U-Net. The number of feature maps of our proposed model was as follows: 1→32→64→128→256→512→256→128→64→32→1. In the encoder stage, subsequent convolutions downsample the image five times to a latent variable. Each down-sampling stage contained two $3 \times 3$ 2-D convolutions, followed by a Leaky ReLU ($\mathrm{\alpha } = 0.3$) nonlinear function, and a $2 \times 2$ max-pooling. In the decoder stage, up-sampling gradually restored the latent variable back to the original input size. Each up-sampling stage contained a $3 \times 3$ deconvolution with a stride of 2 and two $3 \times 3$ 2D convolutions, followed by a Leaky ReLU nonlinear function. After each deconvolution, the image was concatenated with the output feature maps of the corresponding symmetric layer on the encoder side through skip connections. A final $1 \times 1$ convolution layer was then used to reconstruct the desired denoise image.

Fig. 3. Structure of the U-Net for speckle patterns extraction, Conv: convolutional layer; LR: Leaky ReLU layer; MaxPooling: max-pooling layer; DeConv: deconvolution layer.

Download Full Size | PDF

Since conventional U-Net networks utilize skip connections between encoder and decoder to obtain outstanding representation capability, more variants of U-Net structure were constantly being studied. Here a residual dense block U-Net (RDBU-Net) was also investigated to study whether residual learning and dense connect can make U-Net achieve better results in speckle pattern characteristic extraction of OCT images. As shown in Fig. 4 (a), the convolution layers were replaced by RDB, and the feature map used for skip connection between encoder and decoder passed through RDB once more than the feature map of the input max-pooling layer. The $1 \times 1$ convolution layer after max-pooling and deconvolution was used to expand or compress the number of feature channels. As Fig. 4 (b) shows, every RDB contained two $3 \times 3$ convolution layers which were followed by a Leaky ReLU ($\mathrm{\alpha } = 0.3$) layer and a convolution as the output layer at the end.

Fig. 4. Structure of the RDBU-Net for speckle patterns extraction, RDB: residual dense block; Conv: convolutional layer; Leaky ReLU: Leaky ReLU layer; MaxPooling: max-pooling layer; DeConv: deconvolution layer.

Download Full Size | PDF

2.3.3 GAN-based network

As an important type of deep learning network, GAN [47] has been widely used in the field of image processing. GAN contains a generator and a discriminator, which are trained at the same time and confront each other. The predicted OCT image is reconstructed by a generator, whose goal is generating a realistic (“fake”) image close enough to ground truth (“real”) to make the discriminator cannot distinguish. Meanwhile, the discriminator is trained to identify the difference as much as possible between “fake” and “real” images. With iterative adversarial training, the generator produces high-quality images that the discriminator cannot distinguish from the ground truth. Pix2Pix GAN [48] is an application of conditional GAN (cGAN), which performs image translation tasks with paired images and is widely used in various image processing.

In this work, multi-type Pix2Pix GAN networks were investigated and compared to extract speckle pattern characteristics and further remove speckle, including our proposed RDBU-Net GAN. For more details, four different network structures containing U-Net network, RDBU-Net network, Residual network, and Res-Dense network were employed as the generator in Pix2Pix GAN to study their performance for speckle removing of OCT images, respectively, as Fig. 5 shows. Figure 5 (a) shows the network structure of GAN. For discriminator networks, except the first layer, convolution layers with $3 \times 3$ kernels followed by a Leakey ReLU ($\mathrm{\alpha } = 0.3$) layer. Two dense layers followed by Leaky ReLU and sigmoid activation functions were utilized to output classification probability. For generator networks, the same U-Net and RDBU-Net networks in Section 2.3.2 were employed as the generator, as Fig. 5(b) and Fig. 5(c) show, respectively. The network structure of the Residual network adapted from Ref. [41] was demonstrated in Fig. 5 (d). Owing to the advantage of parameters reduction brought by removing batch normalization layers, a deeper residual network with 24 residual blocks was applied to achieve better performance. Every residual block contained two $3 \times 3$ convolution layers followed by a Leaky ReLU ($\mathrm{\alpha } = 0.3$) layer. An elementwise sum layer at the end of the residual block added input and output feature maps as the residual block output. DenseNet possessed a good performance in preserving details through densely skip connection and reusing feature maps. Therefore, the Res-Dense network consisting of six residual dense blocks adopted from Ref. [49] was proposed as the generator of GAN in Fig. 5 (e). Each RDB contained five $3 \times 3$ convolution layers followed by a Leaky ReLU ($\mathrm{\alpha } = 0.3$) layer except the last one. After the last convolution layer, an elementwise sum layer added the input and output feature maps.

Fig. 5. Structures of GAN-based deep learning networks used here. (a) Pipeline of GAN-based architecture including generator and discriminator; (b) U-Net network introduced in Section 2.3.2 as the generator of U-Net GAN; (c) RDBU-Net network introduced in Section 2.3.2 as the generator of RDBU-Net GAN; (d) Residual network as the generator of ResGAN; (d) Res-Dense network as the generator of RDBGAN; Conv: convolutional layer; LR: Leaky ReLU layer; RB: residual block; RDB: residual dense block.

Download Full Size | PDF

2.4 Objective functions

Pixel-wise losses ${L_2}$ is a commonly used loss function for model optimizing in image process tasks, defined as the difference square sum of pixels of predicted image $G({{x_i}} )$ and the ground truth ${y_i}$, as Eq. (3) expressed:

(3)$${L_2} = \frac{1}{n}\sum\limits_{i = 1}^n {{{||{G({{x_i}} )- {y_i}} ||}^2}} .$$

However, using ${L_2}$ loss function alone always causes the predicted image to be excessively smooth, which is not conducive to preserving image details. Here a combined loss function called content loss ${L_{content}}$ was utilized to remove speckle and resolve microstructures, as Eq. (4) expressed:

(4)$${L_{content}} = \alpha \times {L_2} + \beta \times {L_{VGG}},$$

where $\alpha $ and $\beta $ are the weighting coefficients, respectively. ${L_{VGG}}$ is the Euclidean distance between the feature representations of the predicted image and the ground truth extracted by the VGG network, as Eq. (5) expressed:

(5)$${L_{VGG}} = \frac{1}{n}\frac{1}{{whd}}\sum\limits_{i = 1}^n {{{||{VG{G_{19}}({G({{x_i}} )} )- VG{G_{19}}({{y_i}} )} ||}^2}} ,$$

where $VG{G_{19}}$ represents the VGG-19 network [50] pre-trained on ImageNet [51], and w, h, d are the width, height, and depth of the feature maps, respectively.

Our experiments showed the content loss in the Line-shaped network had no obvious improvement, so loss function ${L_2}$ was adopted to optimize the Line-shaped network. For U-shaped networks, the objective was to minimize content loss ${L_{content}}$ and $\mathrm{\alpha }$, $\mathrm{\beta }$ were set to 0.1 and 1, respectively to achieve the optimized performance, and this setting was also applicable to the GAN-based networks with the U-shaped network as the generator. The $\mathrm{\alpha }$ and $\mathrm{\beta }$ parameters in the content loss of ResGAN and RDBGAN were set to 0.001 and 1, respectively. For GAN-based networks, the content loss was combined with adversarial loss as a perceptual loss, which is expressed as Eq. (6):

(6)$${L_{perceptual}} = {L_{content}} + {10^{ - 3}} \times {L_{adversarial}},$$

where ${L_{\textrm{adversarial}}}$ is adversarial loss, as Eq. (7) expressed. $D[{G({{x_i}} )} ]$ represents the probability that the predicted image is a free-speckle image.

(7)$${L_{adversarial}} ={-} \frac{1}{n}\sum\limits_{i = 1}^n {logD[{G({{x_i}} )} ]} .$$

The generator tries to minimize perceptual loss while the discriminator tries to optimize towards the opposite objective. Discriminator loss function can be written as Eq. (8):

(8)$${L_{discriminator}} = \frac{1}{n}\sum\limits_{i = 1}^n {\{{logD({{y_i}} )- log[{1 - D({G({{x_i}} )} )} ]} \}} .$$

2.5 Evaluation metrics

Here we used several evaluation metrics to evaluate the performance of those deep learning networks. Six evaluation metrics were calculated: PSNR, structural similarity index (SSIM), signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), equivalent number of looks (ENL), and edge preservation index (EPI). SSIM was a metric to quantify the structural similarity between the predicted image and the ground truth based on human visual perception, as Eq. (9) expressed, where ${\mu _y}$, ${\mu _{\hat{y}}}$, ${\sigma _y}$, ${\sigma _{\hat{y}}}$ and ${\sigma _{y\hat{y}}}$ are the mean value, the standard deviation, and cross-covariance for ground truth y and predicted image $\hat{y}$, respectively.

(9)$$SSIM({y,\hat{y}} )= \frac{{({2{\mu_y}{\mu_{\hat{y}}} + {C_1}} )\times ({{\sigma_{y\hat{y}}} + {C_2}} )}}{{({\mu_y^2 + \mu_{\hat{y}}^2 + {C_1}} )\times ({\sigma_y^2 + \sigma_{\hat{y}}^2 + {C_2}} )}}.$$

Metric SNR was to measure the ratio of signal to noise in the OCT image, as Eq. (10) expressed, where ${\sigma _b}$ represents the standard deviation (SD) of the background ROI in the image, ${I_r}$ denotes the value of ROI. Moreover, m and n are the height and width of ROI, respectively.

(10)$$SNR = 10 \times lo{g_{10}}\left( {\frac{{\sum\limits_{i = 1}^m {\sum\limits_{j = 1}^n {I_r^2({i,j} )} } }}{{\sigma_b^2}}} \right).$$

CNR indicated the contrast between the signal region and noisy background region, as Eq. (11) expressed, where ${\mu _r}$ represents the mean of the ROI, ${\mu _b}$ and ${\sigma _b}$ are the mean and SD of the background ROI.

(11)$$CNR = 10 \times lo{g_{10}}\left( {\frac{{|{{\mu_r} - {\mu_b}} |}}{{{\sigma_b}}}} \right).$$

ENL evaluation metric represented an index applied to measure the smoothness of the homogeneous region in OCT image, and the ENL of each ROI can be calculated by Eq. (12). Here we further used the average value of three ROI’s ENL in the predicted image as the final ENL to effectively reflect the ability of the network to remove noise.

(12)$$EN{L_r} = \frac{{\mu _r^2}}{{\sigma _r^2}}.$$

EPI evaluation metric represented the ability of image edge preservation after image process, as Eq. (13) expressed, where ${I_d}$ and ${I_g}$ denote the predicted image and the ground truth respectively, $\textrm{H}$ and $\textrm{W}$ are the height and width of the image.

(13)$$EPI = \frac{{\sum\limits_{i = 0}^{H - 1} {\sum\limits_{j = 0}^{W - 1} {|{{I_d}({i + 1,j} )- {I_d}({i,j} )} |} } }}{{\sum\limits_{i = 0}^{H - 1} {\sum\limits_{j = 0}^{W - 1} {|{{I_g}({i + 1,j} )- {I_g}({i,j} )} |} } }}.$$

3. Experiment and results

3.1 Dataset and training details

Here we collected a customized speckle-modulating OCT dataset obtained by provisionally rebuilding the conventional OCT setup into a speckle-modulating OCT and conducting speckle-modulating imaging using multiple scanning patterns. The customized speckle-modulating OCT dataset contained mass more-general speckle patterns to train those networks. To obtain massive speckle patterns of the OCT setup for the deep learning networks, we imaged 30 different fresh biological tissue samples and Scotch tape, from various meats, fruits, and vegetables. For each sample, we had imaged different parts of the sample using the above rebuilt speckle-modulating OCT setup with different scan patterns. Multiple scan patterns were used to make the speckle-pattern dataset more general. For each sample, we used different objective lenses to acquire images at several different locations with different scanning ranges, exposure time, and A-line numbers. For each imaging, we repeatedly scanned the same position 100 times. More detailed information about the dataset can be found in our previous work [41]. In total, 3,600 speckle-modulating OCT sub-datasets, concluding various meat, fruit, vegetables were collected. The corresponding ground truth images were obtained by averaging 100 repeated speckle-modulating OCT cross-sectional images, which significantly reduced speckle noise. Here 30 frames were randomly chosen from the 100 repeated-scanning speckle OCT images as the training data. No images from the same dataset were used as both training and validation data. After augmentation operations, such as images image flip, randomly crop, and so on, a total of 864,000 OCT images was used for training and 216,000 OCT images were used for validation.

With the pre-processed customized speckle-modulating dataset as the training dataset, all networks were implemented with Python (v3.8.8) based on Tensorflow (v2.4.0), and further trained on NVIDIA Geforce RTX 3080 GPU. The details of the network structures, including DnCNN, U-Net, RDB U-Net, ResGAN, RDBGAN, U-Net GAN, and RDBU-Net GAN, are listed in Table 1. Learning rate, iterations, and batch size of models were selected experimentally. Note that the learning rate of generator and discriminator in all GAN-based networks is the same.

Table 1. Learning rate, iterations, batch size of all training models and comparison of model training time and model size

View Table | View all tables in this article

3.2 Results

3.2.1 Effectiveness of averaging operation and speckle-modulating OCT

As previously mentioned, performing image averaging can remove electronic noise well, while it can not remove the speckle noise well in many cases because speckle also depends on the scanning volume sizes and sample structure. Speckle-modulating OCT can achieve speckle-free OCT images although it seriously reduces the imaging sensitivity and temporal resolution. Figure 6 shows the difference of ground truth images obtained by averaging conventional OCT images and speckle-modulating OCT method, respectively. Performing image averaging on original noisy images obtained by conventional OCT can remove electrical noise, as shown in Fig. 6 (a), while most speckle patterns still existed in the processed image, as shown in Fig. 6 (b). Figure 6 (d) demonstrates the ground truth image obtained from the speckle-modulating OCT method. A large number of speckle patterns in Fig. 6 (c) was removed and the spatial resolution of the image was maintained well after denoising as shown in Fig. 6(d). Therefore, our customized speckle-modulating OCT dataset containing mass more-general speckle patterns has significant advantages over the conventional image-averaged dataset when used to analyze deep learning networks’ abilities to extract speckle pattern characteristics and remove speckle noise, because of more general speckle patterns and speckle-free ground truth.

Fig. 6. Comparison of ground truth OCT images obtained by image averaging and speckle-modulating OCT algorithms. (a) and (b) are original noisy OCT images of Scotch tape obtained by conventional OCT setup and corresponding ground truth acquired by image averaging, respectively. (c) and (d) are the original noisy OCT image of Scotch tape obtained by speckle-modulating OCT setup and speckle-free ground truth acquired from SM-OCT algorithms, respectively. Scale bar: $100{\;\ \mathrm{\mu} \mathrm{m}}$

Download Full Size | PDF

3.2.2 Performance comparison between networks on test dataset

To compare the ability of deep learning networks proposed in Section 2.3 to extract speckle pattern characteristics in the general-speckle dataset, we tested those trained deep learning networks on OCT images of pork meat and Scotch tape which have not been used for training those deep learning networks. Figure 7 contains the original noisy OCT image and the ground truth of Scotch tape and denoising results, and Fig. 8 shows the original noisy OCT image and ground truth of pork meat image and denoising results. We further quantitatively measured the performances of networks using the above-mentioned evaluation metrics, as shown in Table 2.

Fig. 7. Noise reduction result comparison of deep learning networks on the test dataset. (a) Original noisy OCT image of Scotch tape. (b) DnCNN [42] result, (c) U-Net [45] result, (d) U-Net GAN [32] result, (e) ResGAN [41] result, (f) RDBGAN [49] result, (g) RDBU-Net result, (h) RDBU-Net GAN result, (i) Ground truth image. The zoom-in images of the patch covered by the orange box and green box are shown at the bottom of the corresponding image. The red boxes show the objective regions and the blue boxes show the background regions, which are used to calculate quantization metrics. Scale bar: $100{\;\ \mathrm{\mu} \mathrm{m}}$

Download Full Size | PDF

Fig. 8. Noise reduction result comparison of deep learning networks on the test dataset. (a) Original noisy OCT image of Scotch tape. (b) DnCNN [42] result, (c) U-Net [45] result, (d) U-Net GAN [32] result, (e) ResGAN [41] result, (f) RDBGAN [49] result, (g) RDBU-Net result, (h) RDBU-Net GAN result, (i) Ground truth image. Scale bar: $100{\;\ \mathrm{\mu} \mathrm{m}}$

Download Full Size | PDF

Table 2. Calculated PSNR, SSIM, SNR, CNR, ENL, and EPI of pork meat and Scotch tape OCT images shown in Fig. 7 and Fig. 8. SNR, CNR, and ENL values are the mean of selected ROI regions. The best results are bold-highlighted and the second-best are in italics.

View Table | View all tables in this article

Figure 7(a) is the original noisy OCT image of Scotch tape selected from the test dataset, and Fig. 7(i) is the corresponding ground truth. Figure 7 (b)–(h) are corresponding images predicted by DnCNN, U-Net, U-Net GAN, ResGAN, RDBGAN, RDBU-Net, RDBU-Net GAN networks, respectively. From Fig. 7, we can observe that predicted results of DnCNN network and U-Net network have poor image quality, such as airball in Scotch tape can’t be resolved well, as Fig. 7(b) and Fig. 7(c) show, while RDBU-Net using dense connections performs a strong ability to remove speckle and resolve airball, as Fig. 7 (g) shows. Deep learning networks integrating U-Net and RDBU-Net into GAN-based networks achieve better performances than corresponding separate networks, as shown in Fig. 7 (d) and Fig. 7 (h). ResGAN and RDBGAN have similar performances on Scotch tape OCT images, and both can remove speckle noise well and recover the detail texture almost consistent with ground truth, as Fig. 7(e) and Fig. 7(f) show. However, ResGAN and RDBGAN can not resolve microstructure as clearly as RDBU-Net GAN.

Figure 8 shows the original noisy OCT image and corresponding ground truth of pork meat, and denoised results of seven deep learning networks. Figure 8(a) and Fig. 8(i) are the original noisy OCT image of pork meat and its corresponding ground truth. Figure 8 (b)–(h) are images predicted by DnCNN, U-Net, U-Net GAN, ResGAN, RDBGAN, RDBU-Net, RDBU-Net GAN networks, respectively. From Fig. 8(a), the detailed microstructure cannot be observed because of the speckle noise, while we can observe those microstructures after denoised by deep-learning networks, as shown in Fig. 8 (b)–(h). Figure 8 (b)–(h) have demonstrated that all deep learning networks can effectively remove the speckle noise, and resolve the microstructures at some level. Compared to the ground truth, RDBU-Net GAN has demonstrated the best ability to resolve microstructures and maintain the spatial resolution, and we can clearly observe the fringe details obliterated by speckled in the original noisy OCT image, as Fig. 8(h) shows, which are quite similar to the ground truth. For ResGAN and RDBGAN, the image quality they predicted is similar to ground truth, but the fringe information is not completely consistent with ground truth.

Table 2 demonstrates the denoising performance of the seven networks quantitatively. We calculated metrics by averaging metric values of pork meat and Scotch tape, as Fig. 7 and Fig. 8 show. From the evaluation metrics, all deep learning networks show the ability to remove speckle compared with the original noise image. The results indicate that RDBU-Net GAN has the best performance in PSNR, SSIM, and SNR. The RDBGAN and ResGAN preserve edge texture better than all other networks from the best and second-best EPI value. The large CNR and ENL of U-Net and U-Net GAN indicate that the speckle noise is removed but the denoised image is over-smoothing, while the RDBU-Net GAN avoids the over-smoothing and performance well in resolving microstructure. Meanwhile, From Table 2, it can be quantitatively seen that GAN-based networks, including ResGAN, RDBGAN, U-Net GAN, RDBU-Net GAN, perform better than U-shaped networks and Line-shaped networks in general.

Fig. 9. Imaging results of deep-learning-based speckle-modulating OCT consisting using different deep learning networks. (a) Original noisy image of pork meat obtained from conventional OCT, (b) DnCNN [42] result, (c) U-Net [45] result, (d) U-Net GAN [32] result, (e) ResGAN [41] result, (f) RDBGAN [49] result, (g) RDBU-Net result, (h) RDBU-Net GAN result. The zoom-in image of the patch covered by the orange box is shown at the bottom of the corresponding image. Zoom-in images show that the pork membrane structure has three stripes with very small gaps. RDBU-Net GAN has a powerful capacity of removing speckle and resolving microstructure. Scale bar: $100{\;\ \mathrm{\mu} \mathrm{m}}$

Download Full Size | PDF

3.2.3 Performance comparison between deep-learning-based speckle-modulating OCTs consisting of difference networks

To demonstrate and compare the imaging performances of deep-learning-based speckle-modulating OCT consisting of those deep learning networks respectively, we have also imaged pork meat and Scotch tape using those deep-learning-based speckle-modulating OCT, respectively, as Fig. 9 and Fig. 10 show. Figure 9 and Fig. 10 show the original noisy images of pork meat and scotch tape containing speckle-noise obtained from conventional OCT and their corresponding images from different deep-learning-based speckle-modulating OCT. From Fig. 9 and Fig. 10, we can see that all those deep-learning-based speckle-modulating OCT based on different deep learning networks can remove speckle noise, and RDBU-Net GAN has shown the best performances of speckle removing, spatial-resolution maintaining, and microstructure resolving, which Table 3 also quantitatively demonstrated. From Fig. 9 and Fig. 10 we can see that microstructure information can be clearly observed in RDBU-Net GAN’s image, such as the membrane gap of pork meat in Fig. 9(h) and the blemish gap Scotch tape shown in Fig. 10(h). Table 3 shows their corresponding quantitative performances, which contains time-costing information indicating the processing speed of each network. To be specific, RDBU-Net GAN shows the best results in terms of SNR and CNR value, and the second-best result in ENL, while RDBU-Net achieves the second-best SNR and CNR. The time-costing indicates the averaged time consumed by each model to process a B-scan frame. DnCNN has the fastest processing speed due to its simple structure. RDBU-Net GAN consumes more time than others, but compared with its excellent speckle removal and microstructure resolving ability, it was acceptable.

Fig. 10. Imaging results of deep-learning-based speckle-modulating OCT consisting using different deep learning networks. (a) Original noisy image of pork meat obtained from conventional OCT, (b) DnCNN [42] result, (c) U-Net [45] result, (d) U-Net GAN [32] result, (e) ResGAN [41] result, (f) RDBGAN [49] result, (g) RDBU-Net result, (h) RDBU-Net GAN result. There is a blemish gap Scotch tape in the zoom-in image, and only RDBU-Net GAN can clearly resolve this structure from the image containing mass speckle noise. Scale bar: $100{\;\ \mathrm{\mu} \mathrm{m}}$

Download Full Size | PDF

Table 3. SNR, CNR, ENL of pork meat and Scotch tape images shown in Fig. 9 and Fig. 10 and computing time (CT) of one frame B-scan processed. Deep-learning-based speckle-modulating OCT is working in the actual state, so there is no ground truth, and PSNR and SSIM are not calculated in the table. SNR, CNR, and ENL values are the mean of selected ROI regions. The best results are bold-highlighted and the second-best are in italics.

View Table | View all tables in this article

4. Discussion and conclusion

In this study, a hybrid structure network, RDBU-Net GAN, has been proposed for deep-learning-based speckle modulating OCT, and a comparative study on deep learning networks to extract speckle pattern characteristics and remove speckle noise and improve resolving ability was further conducted. It was the first time that the abilities of deep learning networks to extract speckle pattern characteristics and remove speckle noise and improve resolving ability were studied on an OCT dataset containing mass more-general speckle patterns, rather than limited on the retinal OCT image dataset. Based on more general speckle patterns, this study has analyzed and revealed the factual performance of different deep learning networks to extract speckle pattern characteristics and remove speckle noise, maintain spatial resolution, and resolve microstructures.

Here three types of architectures including seven different networks, i.e., Line-shaped network, U-shaped network, and GAN-based network, have been studied. Results demonstrated our proposed RDBU-Net GAN has the best capability of extracting speckle pattern characteristics and removing speckle, maintaining spatial resolution, and resolving microstructures. In general, the overall speckle removing and microstructure resolving performance of the seven networks can be ranked as RDBU-Net GAN > RDBU-Net > RDBGAN > ResGAN > U-Net GAN > U-Net > DnCNN. For DnCNN’s simple structure and objective function, it is difficult for DnCNN to achieve high performance of speckle removal and resolving in the case of mass speckle noise. For U-Net, since a skip-connection is utilized between the encoding path and the decoding path, it has a better feature extraction ability than DnCNN. Combining U-Net and dense connection which adds more skip-connection and reuses the output features of all previous layers, RDBU-Net can effectively promote the microstructure resolving. The result that RDBU-Net GAN is better than RDBU-Net indicates that deep learning networks integrating RDBU-Net into GAN-based networks achieve better performances than corresponding separate networks. One possible reason is that GAN can extract the high-frequency features of images and obtain the potential data distribution through the contest between the generator and the discriminator to generate images that can fool the discriminator. Moreover, the residual blocks in ResGAN connect the input and output to maintain the information, which is conducive to training deeper networks and extracting features with more semantic information. Because of the added dense connections in RDBGAN that alleviate the vanishing gradient problem, it makes GAN easier to train and lets the microstructure resolved better than ResGAN.

In conclusion, here we proposed a hybrid-structure network based on RDBU-Net and GAN to achieve deep-learning-based speckle-modulating OCT, and conducted a comprehensive comparison of different deep learning networks’ abilities to extract speckle pattern characteristics and remove speckle, maintain spatial resolution, and resolve microstructures. It was the first time that different deep learning networks’ abilities have been fully studied on a dataset containing mass more-general speckle patterns. Results demonstrated that deep-learning-based speckle-modulating OCT using our proposed RDBU-Net GAN has achieved a more excellent performance. This work will be useful for future studies on OCT speckle removing and deep-learning-based speckle-modulating OCT, and improving OCT resolving ability and application values.

Funding

National Natural Science Foundation of China (61905036); China Postdoctoral Science Foundation (2021T140090, 2019M663465); Fundamental Research Funds for the Central Universities(University of Electronic Science and Technology of China) (ZYGX2021J012, ZYGX2021YGCX019).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. G. Ni, X. Ge, L. Liu, J. Zhang, X. Wang, J. Liu, L. Liu, and Y. Liu, “Towards Indicating Human Skin State In Vivo Using Geometry-Dependent Spectroscopic Contrast Imaging,” IEEE Photonics Technol. Lett. 32(12), 697–700 (2020). [CrossRef]

2. G. Ni, J. Zhang, L. Liu, X. Wang, X. Du, J. Liu, and Y. Liu, “Detection and compensation of dispersion mismatch for frequency-domain optical coherence tomography based on A-scan’s spectrogram,” Opt. Express 28(13), 19229–19241 (2020). [CrossRef]

3. X. Ge, S. Chen, S. Chen, and L. Liu, “High Resolution Optical Coherence Tomography,” J. Lightwave Technol. 39(12), 3824–3835 (2021). [CrossRef]

4. W. Drexler and J. G. Fujimoto, “State-of-the-art retinal optical coherence tomography,” Prog. Retinal Eye Res. 27(1), 45–88 (2008). [CrossRef]

5. T. Klein, W. Wieser, L. Reznicek, A. Neubauer, A. Kampik, and R. Huber, “Multi-MHz retinal OCT,” Biomed. Opt. Express 4(10), 1890–1908 (2013). [CrossRef]

6. T. Gambichler, G. Moussa, M. Sand, D. Sand, P. Altmeyer, and K. Hoffmann, “Applications of optical coherence tomography in dermatology,” J. Dermatol. Sci. 40(2), 85–94 (2005). [CrossRef]

7. M. Paulo, J. Sandoval, V. Lennie, J. Dutary, M. Medina, N. Gonzalo, P. Jimenez-Quevedo, J. Escaned, C. Banuelos, R. Hernandez, C. Macaya, and F. Alfonso, “Combined use of OCT and IVUS in spontaneous coronary artery dissection,” J. Am. Coll. Cardiol. Img. 6(7), 830–832 (2013). [CrossRef]

8. C. Lamirel, N. Newman, and V. Biousse, “The use of optical coherence tomography in neurology,” Rev. Neurol. Dis. 6, E105–120 (2009).

9. J. M. Schmitt, S. H. Xiang, and K. M. Yung, “Speckle in optical coherence tomography,” J. Biomed. Opt. 4(1), 95–105 (1999). [CrossRef]

10. V. J. Srinivasan, M. Wojtkowski, A. J. Witkin, J. S. Duker, T. H. Ko, M. Carvalho, J. S. Schuman, A. Kowalczyk, and J. G. Fujimoto, “High-definition and 3-dimensional imaging of macular pathologies with high-speed ultrahigh-resolution optical coherence tomography,” Ophthalmology 113(11), 2054–2065.e3 (2006). [CrossRef]

11. Q. Yan, B. Chen, Y. Hu, J. Cheng, Y. Gong, J. Yang, J. Liu, and Y. Zhao, “Speckle reduction of OCT via super resolution reconstruction and its application on retinal layer segmentation,” Artif. Intell. Med. 106, 101871 (2020). [CrossRef]

12. A. Ozcan, A. Bilenca, A. E. Desjardins, B. E. Bouma, and G. J. Tearney, “Speckle reduction in optical coherence tomography images using digital filtering,” J. Opt. Soc. Am. A 24(7), 1901–1910 (2007). [CrossRef]

13. M. R. N. Avanaki, R. Cernat, P. J. Tadrous, T. Tatla, A. G. Podoleanu, and S. A. Hojjatoleslami, “Spatial Compounding Algorithm for Speckle Reduction of Dynamic Focus OCT Images,” IEEE Photonics Technol. Lett. 25(15), 1439–1442 (2013). [CrossRef]

14. B. F. Kennedy, T. R. Hillman, A. Curatolo, and D. D. Sampson, “Speckle reduction in optical coherence tomography by strain compounding,” Opt. Lett. 35(14), 2445–2447 (2010). [CrossRef]

15. M. Pircher, E. Götzinger, R. Leitgeb, A. F. Fercher, and C. K. Hitzenberger, “Speckle reduction in optical coherence tomography by frequency compounding,” J. Biomed. Opt. 8(3), 565 (2003). [CrossRef]

16. S. Song, Z. Huang, and R. K. Wang, “Tracking mechanical wave propagation within tissue using phase-sensitive optical coherence tomography: motion artifact and its compensation,” J. Biomed. Opt. 18(12), 121505 (2013). [CrossRef]

17. O. Liba, M. D. Lew, E. D. SoRelle, R. Dutta, D. Sen, D. M. Moshfeghi, S. Chu, and A. de la Zerda, “Speckle-modulating optical coherence tomography in living mice and humans,” Nat. Commun. 8(1), 15845 (2017). [CrossRef]

18. G. J. Ughi, T. Adriaenssens, M. Larsson, C. Dubois, P. R. Sinnaeve, M. Coosemans, W. Desmet, and J. D’Hooge, “Automatic three-dimensional registration of intravascular optical coherence tomography images,” J. Biomed. Opt. 17(2), 026005 (2012). [CrossRef]

19. A. Sakamoto, M. Hangai, and N. Yoshimura, “Spectral-Domain Optical Coherence Tomography with Multiple B-Scan Averaging for Enhanced Imaging of Retinal Diseases,” Ophthalmology 115(6), 1071–1078.e7 (2008). [CrossRef]

20. M. R. Hee, J. A. Izatt, E. A. Swanson, D. Huang, J. S. Schuman, C. P. Lin, C. A. Puliafito, and J. G. Fujimoto, “Optical Coherence Tomography of the Human Retina,” Arch. Ophthalmol. 113(3), 325–332 (1995). [CrossRef]

21. S. Chitchian, M. A. Fiddy, and N. M. Fried, “Denoising during optical coherence tomography of the prostate nerves via wavelet shrinkage using dual-tree complex wavelet transform,” J. Biomed. Opt. 14(1), 014031 (2009). [CrossRef]

22. F. Zaki, Y. Wang, H. Su, X. Yuan, and X. Liu, “Noise adaptive wavelet thresholding for speckle noise removal in optical coherence tomography,” Biomed. Opt. Express 8(5), 2720–2731 (2017). [CrossRef]

23. B. Chong and Y.-K. Zhu, “Speckle reduction in optical coherence tomography images of human finger skin by wavelet modified BM3D filter,” Opt. Commun. 291, 461–469 (2013). [CrossRef]

24. L. Fang, S. Li, Q. Nie, J. A. Izatt, C. A. Toth, and S. Farsiu, “Sparsity based denoising of spectral domain optical coherence tomography images,” Biomed. Opt. Express 3(5), 927–942 (2012). [CrossRef]

25. X. Zhang, Z. Li, N. Nan, and X. Wang, “Denoising algorithm of OCT images via sparse representation based on noise estimation and global dictionary,” Opt. Express 30(4), 5788–5802 (2022). [CrossRef]

26. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2016), pp. 770–778.

27. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2015), pp. 3431–3440.

28. C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, and Z. Wang, “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 4681–4690.

29. F. Shi, N. Cai, Y. Gu, D. Hu, Y. Ma, Y. Chen, and X. Chen, “DeSpecNet: a CNN-based method for speckle reduction in retinal optical coherence tomography images,” Phys. Med. Biol. 64(17), 175010 (2019). [CrossRef]

30. Y. Zhou, K. Yu, M. Wang, Y. Ma, Y. Peng, Z. Chen, W. Zhu, F. Shi, and X. Chen, “Speckle Noise Reduction for OCT Images based on Image Style Transfer and Conditional GAN,” IEEE J. Biomed. Health Inform. 26(1), 139–150 (2022). [CrossRef]

31. Z. Chen, Z. Zeng, H. Shen, X. Zheng, P. Dai, and P. Ouyang, “DN-GAN: Denoising generative adversarial networks for speckle noise reduction in optical coherence tomography images,” Biomed. Signal Process. Control 55, 101632 (2020). [CrossRef]

32. H. Cheong, S. Krishna Devalla, T. Chuangsuwanich, T. A. Tun, X. Wang, T. Aung, L. Schmetterer, M. L. Buist, C. Boote, A. H. Thiery, and M. J. A. Girard, “OCT-GAN: single step shadow and noise removal from optical coherence tomography images of the human optic nerve head,” Biomed. Opt. Express 12(3), 1482–1498 (2021). [CrossRef]

33. B. Qiu, Z. Huang, X. Liu, X. Meng, Y. You, G. Liu, K. Yang, A. Maier, Q. Ren, and Y. Lu, “Noise reduction in optical coherence tomography images using a deep neural network with perceptually-sensitive loss function,” Biomed. Opt. Express 11(2), 817–830 (2020). [CrossRef]

34. M. Mehdizadeh, C. MacNish, D. Xiao, D. Alonso-Caneiro, J. Kugelman, and M. Bennamoun, “Deep feature loss to denoise OCT images using deep neural networks,” J. Biomed. Opt., 26 (2021).

35. M. Wang, W. Zhu, K. Yu, Z. Chen, F. Shi, Y. Zhou, Y. Ma, Y. Peng, D. Bao, S. Feng, L. Ye, D. Xiang, and X. Chen, “Semi-Supervised Capsule cGAN for Speckle Noise Reduction in Retinal OCT Images,” IEEE Trans. Med. Imaging. 40(4), 1168–1183 (2021). [CrossRef]

36. Y. Huang, W. Xia, Z. Lu, Y. Liu, H. Chen, J. Zhou, L. Fang, and Y. Zhang, “Noise-Powered Disentangled Representation for Unsupervised Speckle Reduction of Optical Coherence Tomography Images,” IEEE Trans. Med. Imaging 40(10), 2600–2614 (2021). [CrossRef]

37. J. J. Rico-Jimenez, D. Hu, E. M. Tang, I. Oguz, and Y. K. Tao, “Real-time OCT image denoising using a self-fusion neural network,” Biomed. Opt. Express 13(3), 1398–1409 (2022). [CrossRef]

38. Y. Huang, N. Zhang, and Q. Hao, “Real-time noise reduction based on ground truth free deep learning for optical coherence tomography,” Biomed. Opt. Express 12(4), 2027–2040 (2021). [CrossRef]

39. B. Qiu, S. Zeng, X. Meng, Z. Jiang, Y. You, M. Geng, Z. Li, Y. Hu, Z. Huang, C. Zhou, Q. Ren, and Y. Lu, “Comparative study of deep neural networks with unsupervised Noise2Noise strategy for noise reduction of optical coherence tomography images,” J. Biophotonics. 14(11), e202100151 (2021). [CrossRef]

40. Y. Wang, Y. Wang, A. Akansu, K. D. Belfield, B. Hubbi, and X. Liu, “Robust motion tracking based on adaptive speckle decorrelation analysis of OCT signal,” Biomed. Opt. Express 6(11), 4302–4316 (2015). [CrossRef]

41. G. Ni, Y. Chen, R. Wu, X. Wang, M. Zeng, and Y. Liu, “Sm-Net OCT: a deep-learning-based speckle-modulating optical coherence tomography,” Opt. Express 29(16), 25511–25523 (2021). [CrossRef]

42. C. M. Ward, J. Harguess, B. Crabb, and S. Parameswaran, “Image quality assessment for determining efficacy and limitations of Super-Resolution Convolutional Neural Network (SRCNN),” in Proc. SPIE (2017).

43. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2015), pp. 234–241.

44. V. Badrinarayanan, A. Kendall, and R. Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017). [CrossRef]

45. Y. Lan and X. Zhang, “Real-Time Ultrasound Image Despeckling Using Mixed-Attention Mechanism Based Residual UNet,” IEEE Access 8, 195327–195340 (2020). [CrossRef]

46. H. Dong, J. Pan, L. Xiang, Z. Hu, X. Zhang, F. Wang, and M.-H. Yang, “Multi-scale boosted dehazing network with dense feature fusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2020), pp. 2157–2167.

47. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proceedings of the 27th International Conference on Neural Information Processing Systems (MIT Press, 2014), pp. 2672–2680.

48. P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 1125–1134.

49. Y. Huang, Z. Lu, Z. Shao, M. Ran, J. Zhou, L. Fang, and Y. Zhang, “Simultaneous denoising and super-resolution of optical coherence tomography images based on generative adversarial network,” Opt. Express 27(9), 12289–12307 (2019). [CrossRef]

50. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations (2014).

51. J. Deng, W. Dong, R. Socher, L. Li, L. Kai, and F.-F. Li, “ImageNet: A large-scale hierarchical image database,” in Proveedings on IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2009), pp. 248–255.

	Learning rate	Iterations	Batch size	Training time (h)	Model size (MB)
DnCNN	$5 \times 10^{- 5}$	350,000	1	22.3	2.3
U-Net	$5 \times 10^{- 5}$	350,000	1	30.2	87.2
U-Net GAN	$5 \times 10^{- 5}$	350,000	1	33.5	152.3
ResGAN	$5 \times 10^{- 5}$	350,000	1	36.2	67.1
RDBGAN	$5 \times 10^{- 5}$	350,000	1	38.5	68.5
RDBU-Net	$5 \times 10^{- 5}$	350,000	1	45.2	725.1
RDBU-Net GAN	$5 \times 10^{- 5}$	350,000	1	48.9	790.2

	PSNR	SSIM	SNR	CNR	ENL	EPI
Noisy image	13.0220	0.1394	20.9451	1.7474	3.1761	-
DnCNN	25.8262	0.6952	36.3411	6.4663	40.6707	0.7205
U-Net	25.9436	0.7094	41.5324	*6.6730*	*57.8671*	0.5430
U-Net GAN	25.6897	0.6951	40.0593	6.9073	62.8276	0.5166
ResGAN	25.3527	0.6314	35.8292	5.8303	27.6485	*0.8970*
RDBGAN	25.0446	0.4901	36.6175	6.0198	28.0320	0.9354
RDBU-Net	*30.4928*	*0.7812*	*42.0475*	5.5795	23.6602	0.7066
RDBU-Net GAN	30.6889	0.7978	42.8343	5.6000	24.1721	0.6978
Ground truth	-	1.0000	37.0575	5.3470	21.5524	1.0000

	SNR	CNR	ENL	CT
Noisy image	22.1317	2.6929	4.7051	-
DnCNN	38.3871	7.8360	81.8282	0.0689
U-Net	43.3606	8.3182	128.3841	0.0911
U-Net GAN	43.6703	8.1137	123.7207	*0.0763*
ResGAN	38.7926	7.4670	79.9525	0.0876
RDBGAN	37.6640	7.4225	70.5996	0.0945
RDBU-Net	*43.9735*	*8.3935*	235.3022	0.1445
RDBU-Net GAN	45.1563	8.7484	*222.5154*	0.1447

	Learning rate	Iterations	Batch size	Training time (h)	Model size (MB)
DnCNN	$5 \times 10^{- 5}$	350,000	1	22.3	2.3
U-Net	$5 \times 10^{- 5}$	350,000	1	30.2	87.2
U-Net GAN	$5 \times 10^{- 5}$	350,000	1	33.5	152.3
ResGAN	$5 \times 10^{- 5}$	350,000	1	36.2	67.1
RDBGAN	$5 \times 10^{- 5}$	350,000	1	38.5	68.5
RDBU-Net	$5 \times 10^{- 5}$	350,000	1	45.2	725.1
RDBU-Net GAN	$5 \times 10^{- 5}$	350,000	1	48.9	790.2

	PSNR	SSIM	SNR	CNR	ENL	EPI
Noisy image	13.0220	0.1394	20.9451	1.7474	3.1761	-
DnCNN	25.8262	0.6952	36.3411	6.4663	40.6707	0.7205
U-Net	25.9436	0.7094	41.5324	*6.6730*	*57.8671*	0.5430
U-Net GAN	25.6897	0.6951	40.0593	6.9073	62.8276	0.5166
ResGAN	25.3527	0.6314	35.8292	5.8303	27.6485	*0.8970*
RDBGAN	25.0446	0.4901	36.6175	6.0198	28.0320	0.9354
RDBU-Net	*30.4928*	*0.7812*	*42.0475*	5.5795	23.6602	0.7066
RDBU-Net GAN	30.6889	0.7978	42.8343	5.6000	24.1721	0.6978
Ground truth	-	1.0000	37.0575	5.3470	21.5524	1.0000

Hybrid-structure network and network comparative study for deep-learning-based speckle-modulating optical coherence tomography

Abstract

1. Introduction

2. Methods

2.1 Speckle-modulating OCT algorithms

2.2 Deep-learning-based speckle-modulating OCT principle

2.3 Network architectures

2.3.1 Line-shaped network

2.3.2 U-shaped network

2.3.3 GAN-based network

2.4 Objective functions

2.5 Evaluation metrics

3. Experiment and results

3.1 Dataset and training details

3.2 Results

3.2.1 Effectiveness of averaging operation and speckle-modulating OCT

3.2.2 Performance comparison between networks on test dataset

3.2.3 Performance comparison between deep-learning-based speckle-modulating OCTs consisting of difference networks

4. Discussion and conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (10)

Tables (3)

Equations (13)

Optics Express