
MAS-Net OCT: a deep-learning-based speckle-free multiple aperture synthetic optical coherence tomography

Open Access

Abstract

High-resolution spectral domain optical coherence tomography (SD-OCT) is a vital clinical technique that suffers from the inherent compromise between transverse resolution and depth of focus (DOF). Meanwhile, speckle noise degrades the resolving power of OCT imaging and restricts potential resolution-enhancement techniques. Multiple aperture synthetic (MAS) OCT extends the DOF by transmitting light signals and recording sample echoes along a synthetic aperture, acquired through time encoding or optical path length encoding. In this work, a deep-learning-based multiple aperture synthetic OCT termed MAS-Net OCT, which integrates a speckle-free model based on self-supervised learning, was proposed. MAS-Net was trained on datasets generated by the MAS OCT system. Here we performed experiments on homemade microparticle samples and various biological tissues. Results demonstrated that the proposed MAS-Net OCT could effectively improve the transverse resolution over a large imaging depth as well as remove most speckle noise.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Optical coherence tomography (OCT) is a non-invasive cross-sectional and three-dimensional (3D) imaging technique that is widely used in ophthalmology [1], cardiology [2], tissue development [3], etc. In OCT imaging, the axial resolution and transverse resolution are determined independently: the former is governed by the optical bandwidth of the laser source, while the latter is governed by the numerical aperture (NA) of the objective and the wavelength [4,5]. Specifically, the transverse resolution is inversely proportional to the NA of the objective lens, and the depth of focus (DOF) is proportional to the square of the transverse resolution. The transverse resolution of out-of-focus regions decreases significantly when the objective NA is high. Consequently, there is a trade-off between transverse resolution and the DOF in conventional OCT.
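For a Gaussian beam, this trade-off can be made explicit. The relations below are the standard Gaussian-optics expressions commonly quoted in the OCT literature (they are not taken from this paper), where $f$ is the focal length of the objective and $d$ is the $1/e^2$ beam diameter on its pupil:
$$\Delta x = \frac{4\lambda}{\pi}\frac{f}{d}, \qquad \mathrm{DOF} = \frac{\pi \Delta x^2}{2\lambda}.$$
Halving $\Delta x$ therefore cuts the DOF by a factor of four, which is precisely the compromise that MAS OCT and MAS-Net OCT aim to break.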

Conventional hardware-based approaches such as Bessel beams generated with axicon optics [6,7] or binary-phase spatial filters [8] decrease the imaging sensitivity and introduce sidelobe artifacts due to a major loss of spatial frequency components. In addition, adaptive optics uses pupil segmentation or active phase modulation of the pupil to extend the DOF by introducing spherical aberrations [9,10]. However, this approach needs a phase-stabilizing setup and a long imaging time. M. N. Romodina et al. [11] proposed exploiting the chromatic dispersion of a zinc selenide lens to overcome the DOF limitation, because different spectral components are focused at different positions along the beam axis, but this reduces the axial resolution. Computational algorithms such as interferometric synthetic aperture microscopy (ISAM), a digital refocusing approach, have been used to significantly extend the DOF [12]. However, such computational algorithms are very time-consuming and require great phase stability.

Recently, several novel techniques have made significant progress in improving transverse resolution and DOF simultaneously. One is multiple aperture synthetic (MAS) OCT [13,14], which extends the DOF by positioning a microcylindrical lens at a distance from the tip of the sample fiber and applying a multiple aperture synthesis algorithm. Deep learning methods have shown inspiring image-processing capabilities, such as image enhancement and reconstruction, and have attracted widespread attention. Generative adversarial networks (GANs) have been proposed to enhance axial and transverse resolution and have verified the feasibility of deep learning methods, but they still struggle to reproduce realistic speckle noise when restoring OCT images, especially non-ophthalmic OCT images [15,16]. Yuan et al. [17] proposed a deep-learning-based digital refocusing approach to extend the DOF, but only for en-face OCT images. Huang et al. [18] and Qiu et al. [19] proposed a GAN and Noise2Noise, respectively, for simultaneous denoising and super-resolution. However, unlike the transverse resolution degradation in OCT imaging, they modeled resolution loss by downsampling the original images to obtain low-resolution inputs. In addition, speckle noise, the primary factor that reduces the quality of OCT images and restricts potential resolution-enhancement techniques, has been studied for a long time [20–22]. Current deep learning denoising methods need noise-clean image pairs or noisy image pairs with consistent backgrounds, which makes them difficult to use in practice [23–25].

In this paper, we proposed a speckle-free, deep-learning-based multiple aperture synthetic OCT, termed MAS-Net OCT, which can reduce speckle noise and extend the DOF. A MAS OCT system [13,26,27] was built to collect low-resolution images and, via the multiple aperture synthesis algorithm, high-resolution images. Furthermore, we collected B-scans of microparticle, esophagus, urothelium, stomach, and lemon samples to generate a large customized dataset. To avoid the adverse effect of speckle noise on transverse resolution improvement, the high-resolution images were despeckled using a self-supervised learning method. The MAS-Net, trained with pairs of low-resolution images and despeckled high-resolution images, was integrated with a conventional OCT setup to obtain the proposed MAS-Net OCT. Experimental results on microparticle and biological tissue samples demonstrated that MAS-Net OCT has the potential to significantly extend the DOF and improve the transverse resolution without modifying the conventional OCT setup.

2. Related works

2.1 Multiple aperture synthetic optical coherence tomography

Multiple aperture synthetic OCT improves transverse resolution in a way analogous to synthetic aperture radar (SAR). As shown in Fig. 1, we divided the illumination beam in the sample arm by amplitude and displaced the resulting beamlets evenly along the radial direction to synthesize the aperture of the objective lens. In the pupil plane, each beamlet creates a sub-aperture with a distinct center spatial frequency.

Fig. 1. Schematic of multiple aperture synthetic OCT. SLD: super-luminescent diode; OC1-2: fiber optic coupler; L1-L6: lens; NDF: neutral density filter; RM: reference mirror; BS: beam splitter; GS: galvo scanner; PC: polarization controller; BD: beam displacer; OPE: optical pathlength encoder; G: transmission diffraction grating; IMAQ: image acquisition; C: computer.

In the theory of MAS OCT, the original cross-correlation term for the $n$-th round-trip path is defined as:

$$I_n(k) = \sqrt{I_r(k)\,I_s(k)}\,[\exp(2ikz_n) + \mathrm{c.c.}]$$
where $\mathrm{c.c.}$ is the abbreviation for the complex conjugate, which will be omitted in the following derivation; $I_r(k)$ and $I_s(k)$ are the intensities reflected from the reference arm and from the sample arm at depth $z_n$, respectively. The MAS algorithm digitally corrects the aberrations in the optical path difference (OPD) domain.

Correction of aberrations in the OPD domain can be achieved simply by eliminating the phase delay ${\alpha _n}$ of ${I_n}$ with respect to ${I_1}$. The resultant aberration-corrected A-line is given by

$$i\left( z \right) = \sum\limits_{n = 1}^m {F\left[ {{I_n}\left( k \right) \cdot \exp \left( {i\alpha _n^{coh}} \right)} \right]} $$
where $F[\cdot]$ denotes the Fourier transform and $m$ represents the total path number. The phase delay ${\alpha _n}$ takes its optimal correction value $\alpha _n^{coh}$ when $\left|{\sum\nolimits_{n = 1}^m F[{I_n}(k) \cdot \exp(i\alpha_n^{coh})]}\right|$ reaches its maximum. While the interferometric signals from the local scatterer are coherently summed, those from out-of-focus scatterers are suppressed. Alternatively, summing the magnitudes of the phase-corrected sub-aperture signals yields the incoherent, digitally refocused A-line:
$${i_{re}}(z )= \sum\limits_{n = 1}^m {|{F[{{I_n}(k )\cdot \exp ({i\alpha_n^{incoh}} )} ]} |}$$
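As a minimal NumPy sketch of this synthesis step, assume the cross-correlation spectra $I_n(k)$ of the $m$ paths are stacked in a complex array; the greedy per-path phase search below is a simplification for illustration, not the paper's exact optimization:

```python
import numpy as np

def mas_synthesize(sub_spectra, n_phase_steps=64, coherent=True):
    """sub_spectra: (m, K) complex array of the spectra I_n(k), one row
    per sub-aperture path (hypothetical input layout)."""
    a_lines = np.fft.fft(sub_spectra, axis=1)   # F[I_n(k)] for each path
    m = a_lines.shape[0]
    if not coherent:
        # Eq. (3): incoherent synthesis sums the magnitudes of the paths.
        return np.abs(a_lines).sum(axis=0)
    # Eq. (2): pick each alpha_n so the magnitude of the running coherent
    # sum is maximized (a constant phase commutes with the Fourier
    # transform, so it can be applied after the FFT).
    candidates = np.linspace(0.0, 2.0 * np.pi, n_phase_steps, endpoint=False)
    acc = a_lines[0].copy()
    for n in range(1, m):
        scores = [np.sum(np.abs(acc + a_lines[n] * np.exp(1j * a)))
                  for a in candidates]
        acc += a_lines[n] * np.exp(1j * candidates[int(np.argmax(scores))])
    return np.abs(acc)
```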

2.2 Neighbor2Neighbor and Noisier2Noise

The unsupervised Noise2Noise (N2N) strategy [28,29], which is trained only with noisy OCT images captured at the same position, has been proposed. Neighbor2Neighbor (NBR) [22,30,31], a self-supervised training method, treats the noisy pair generated from a single noisy image $y$ by sub-sampling as two noisy observations. The function to be minimized is defined as:

$${{\mathbb{E}}_{x,y}}{||{{f_\theta }({{g_1}(y )} )- {g_2}(y )} ||^2}, $$
where ${g_1}(y )$ and ${g_2}(y )$ are the noisy pair generated by a sampler $G = ({{g_1},{g_2}} )$. When the sampler generates two images that are close to each other, the training result is similar to that obtained with clean-image supervision.
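A sketch of such a sampler follows; it is a simplified variant that draws two distinct pixels from every non-overlapping $2 \times 2$ block (the published sampler restricts the second pixel to a neighbor of the first):

```python
import numpy as np

def neighbor_subsample(y, rng=None):
    """Generate the noisy sub-image pair (g1(y), g2(y)) from one noisy
    image y by picking two different pixels in each 2x2 block."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = y.shape[0] // 2 * 2, y.shape[1] // 2 * 2
    blocks = (y[:h, :w].reshape(h // 2, 2, w // 2, 2)
              .transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4))
    i = rng.integers(0, 4, size=(h // 2, w // 2))    # pixel index for g1
    j = (i + rng.integers(1, 4, size=i.shape)) % 4   # a different pixel for g2
    rows, cols = np.ogrid[:h // 2, :w // 2]
    return blocks[rows, cols, i], blocks[rows, cols, j]
```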

Noisier2Noise [32] is another denoising method derived from N2N: it adds noise from the same distribution to the noisy image to synthesize additional, noisier samples, and uses only noisy images to train the denoising model. Assume a clean image $X$ and a noisy image $Y = X + N$, where $N \sim \mathrm{\Re }$ and $\mathrm{\Re }$ is the known noise distribution. We create a noisier image $Z = Y + M = X + N + M$, where $M \sim \mathrm{\Re }$. The model is trained by minimizing:

$${{\mathbb{E}}_Z}||{{f_\theta }(Z )- Y} ||_2^2. $$
Note that $M$ and $N$ are i.i.d., so ${\mathbb{E}}[{N|Z}] = {\mathbb{E}}[{M|Z}]$ and we have
$$\begin{aligned} 2{\mathbb{E}}[{Y|Z}] & = {\mathbb{E}}[{X|Z}] + {\mathbb{E}}[{N|Z}] + {\mathbb{E}}[{X|Z}] + {\mathbb{E}}[{M|Z}]\\ & = {\mathbb{E}}[{X|Z}] + {\mathbb{E}}[{X + N + M|Z}]\\ & = {\mathbb{E}}[{X|Z}] + Z. \end{aligned}$$
Therefore, an estimate of the clean image can be recovered as ${\mathbb{E}}[{X|Z}] = 2{\mathbb{E}}[{Y|Z}] - Z$, i.e., by doubling the network’s output and subtracting its input.
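In code, the training-pair construction and the inference-time correction reduce to a few lines; the Gaussian sampler in the usage sketch is only a stand-in for the assumed known noise distribution:

```python
import numpy as np

def make_noisier_pair(y, noise_sampler):
    """Training pair for Noisier2Noise: input Z = Y + M, target Y.
    noise_sampler is a hypothetical callable drawing M from the same
    distribution as the noise already present in y."""
    return y + noise_sampler(y.shape), y

def noisier2noise_estimate(model, z):
    """Clean estimate E[X|Z] = 2*E[Y|Z] - Z: double output, subtract input."""
    return 2.0 * model(z) - z

# Usage sketch with additive Gaussian noise as the stand-in distribution:
# z, y = make_noisier_pair(noisy_img, lambda s: np.random.normal(0, 0.05, s))
# x_hat = noisier2noise_estimate(trained_model, z)
```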

3. Methods

3.1 Overview

In this work, to address the fundamental problem of limited DOF and transverse resolution, we proposed a system named MAS-Net OCT, as shown in Fig. 2. Here, the high-resolution (HR) images were synthesized by the MAS algorithm introduced in Section 2.1 from B-scans of different apertures captured by the multiple aperture synthetic OCT setup shown in Fig. 1. We obtained the denoised high-resolution images (SHR) with the denoising module and set them as the ground truth for MAS-Net. Then, pairs of low-resolution images selected from five distinct apertures and denoised high-resolution images were fed into the proposed MAS-Net for training. Finally, the well-trained network model was integrated with the conventional OCT setup to obtain the MAS-Net OCT. In the denoising module, we proposed a process of extracting the speckle pattern with the Neighbor2Neighbor strategy, synthesizing the noisier image, and then denoising with the Noisier2Noise method. A generative adversarial network (GAN), which produces better results through adversarial training of a generator and a discriminator, is applied as the structure of MAS-Net.

Fig. 2. Overview of the proposed MAS-Net OCT.

3.2 Deep learning networks

As shown in Fig. 3, we propose the deep learning framework of the denoiser module. The noisy image pair $[{g_1}(y), {g_2}(y)]$ is generated by the sampler from the noisy high-resolution image $y$ and then fed into model (I) to train the N2N model. Our sampler is a random neighbor sub-sampler [31] which generates two sub-images by randomly selecting two neighboring pixels in each $2 \times 2$ block. Model (I) takes ${g_1}(y )$ and ${g_2}(y )$ as input and target, respectively. We utilized a regularization term computed from ${g_1}({{f_\theta }(y )} )$ and ${g_2}({{f_\theta }(y )} )$, obtained by applying the sampler to the output ${f_\theta }(y )$. In the second step, with the speckle pattern extracted by model (I), the noisier training input to model (II) is synthesized as ${g_1}(y )+ [{{g_2}(y )- {g_2}({{f_\theta }(y )} )} ]$, and the noisy image ${g_1}(y )$ is the training target, as shown in Fig. 3(b). During the inference phase, extra speckle noise should in theory be added to the noisy image to be predicted, but this artificially degrades the input. Here, we fed in the noisy image without adding extra noise. The model still works when all pixels of the input image carry a smaller-than-normal noise magnitude; its output is then not a clean image, but lies halfway between the clean target and the input. A correction step is therefore performed to obtain an estimate of a clean image, i.e., doubling the model’s output and subtracting its input, as sketched below.
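The following sketch wires these two steps together, assuming the sampler reuses one fixed sampling pattern for both the noisy image and the denoised image (the pattern sharing is our assumption; Fig. 3 does not specify it):

```python
def denoiser_step2_pair(y, sampler, model_one):
    """Build the (input, target) pair for model (II) from one noisy HR
    image y, the sub-sampler, and the trained N2N model (I)."""
    g1_y, g2_y = sampler(y)
    _, g2_f = sampler(model_one(y))   # same sampling pattern assumed
    speckle = g2_y - g2_f             # extracted speckle pattern
    return g1_y + speckle, g1_y       # noisier input, noisy target

def denoiser_infer(y, model_two):
    """Inference without extra noise, then the correction step."""
    return 2.0 * model_two(y) - y
```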

Fig. 3. Frameworks of the proposed denoiser module. (a) is the Neighbor2Neighbor strategy for extracting the speckle pattern; (b) is the training phase of Noisier2Noise; (c) is the inference phase using the trained model (II).

The proposed MAS-Net is a GAN with a multi-scale discriminator, as shown in Fig. 4. The generator of the GAN is model (III), and its goal is to generate images ${F_\theta }(x )$ similar to the ground truth ${f_\theta }(y )$ from the input image x so as to fool the discriminator network [33]. To reconstruct detailed texture close to the ground truth, we introduced a content loss [34], which together with the adversarial loss composes the generator loss.

Fig. 4. Frameworks of the MAS-Net.

The detailed architecture of models (I), (II), (III) and the discriminator is shown in Figs. 5(a) and 5(b). A variant of U-Net proposed in this work, named RDBU-Net, was applied to both the denoising module and MAS-Net. The convolutional layers in the traditional U-Net were replaced with RDB blocks, which are composed of residual-in-residual dense connections [35]. The RDB block was used in the encoding stage, the decoding stage, and the skip connections to improve the feature extraction and representation capabilities. A $1 \times 1$ convolutional layer was used to expand or compress the number of feature channels. We used three discriminators to enhance the discriminative capability of the GAN. The input size of the first discriminator was the same as the output size of the generator, and the inputs of the second and third discriminators were down-sampled by max-pooling layers to 1/2 and 1/4 of the original size. Each discriminator was a convolutional neural network composed of eight blocks and two dense layers followed by Leaky ReLU and sigmoid functions; each block included a $3 \times 3$ convolutional layer followed by a Leaky ReLU.
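A sketch of one RDB block in Keras is given below; the growth rate, layer count, and residual scaling factor are illustrative choices, not values reported in the paper:

```python
import tensorflow as tf
from tensorflow.keras import layers

def rdb_block(x, growth=32, n_layers=4, scale=0.2):
    """Residual dense block after the residual-in-residual dense
    connections of [35]: densely connected 3x3 convolutions, a 1x1
    fusion convolution, and a scaled residual connection."""
    feats = [x]
    for _ in range(n_layers):
        inp = layers.Concatenate()(feats) if len(feats) > 1 else feats[0]
        y = layers.Conv2D(growth, 3, padding="same")(inp)
        feats.append(layers.LeakyReLU(0.2)(y))
    # The 1x1 convolution compresses the dense features back to the
    # input channel count before the residual addition.
    fused = layers.Conv2D(x.shape[-1], 1, padding="same")(
        layers.Concatenate()(feats))
    return layers.Add()([x, fused * scale])
```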

Fig. 5. The detailed network architectures. (a) is the architecture of models (I), (II), (III) in Neighbor2Neighbor, Noisier2Noise and MAS-Net, respectively; (b) is the architecture of the discriminator network in MAS-Net. Conv: convolutional layer; Deconv: deconvolutional layer; Concat: concatenate layer; LR: Leaky ReLU.

3.3 Loss functions

For the denoiser module, Neighbor2Neighbor trains an N2N model that takes a noisy image pair as input and label. The denoiser tries to minimize the loss function:

$$\begin{aligned} {L_{NBR}} &= \|{f_\theta}({g_1}(y)) - {g_2}(y)\|_2^2 \\ &\quad + \gamma\,\|{f_\theta}({g_1}(y)) - {g_2}(y) - {g_1}({f_\theta}(y)) + {g_2}({f_\theta}(y))\|_2^2 \end{aligned}$$
where γ is a hyper-parameter. The second term is a regularization term that improves the denoising result when there is a sufficiently small non-zero gap between the noisy image pair. Noisier2Noise trains another N2N model with the loss function:
$${L_{Noisier2Noise}} = ||{{F_\theta }(Z )- {g_1}(y )} ||_2^2$$
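Both losses translate directly into TensorFlow; the stop-gradient on the full-image pass follows the Neighbor2Neighbor paper [31], and the shared sampling pattern is again an assumption:

```python
import tensorflow as tf

def nbr_loss(f, y, sampler, gamma=0.5):
    """Eq. (7): base N2N term plus the gamma-weighted regularizer."""
    g1_y, g2_y = sampler(y)
    g1_f, g2_f = sampler(tf.stop_gradient(f(y)))
    base = tf.reduce_mean(tf.square(f(g1_y) - g2_y))
    reg = tf.reduce_mean(tf.square(f(g1_y) - g2_y - g1_f + g2_f))
    return base + gamma * reg

def noisier2noise_loss(f2, z, g1_y):
    """Eq. (8): plain MSE between model (II)'s output and g1(y)."""
    return tf.reduce_mean(tf.square(f2(z) - g1_y))
```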
In MAS-Net, the loss function is divided into generator loss and discriminator loss. The generator loss is defined as:
$${L_G} = {L_C} + \lambda {L_{adv}}$$
where $\lambda$ is a hyper-parameter. ${L_C}$ and ${L_{adv}}$ represent the content loss and the adversarial loss, respectively. The content loss contains the pixel-wise mean square error (MSE) loss and the perceptual loss. The perceptual loss tends to reconstruct fine microstructures, which has a beneficial effect on resolution improvement. The content loss is defined as:
$${L_C} = ||{{f_\theta }(y )- x} ||_2^2 + \beta ||{VG{G_{19}}({{f_\theta }(y )} )- VG{G_{19}}(x )} ||_2^2$$
where $VG{G_{19}}(\cdot )$ represents the feature maps extracted by the VGG-19 network [36] and $\beta$ is a hyper-parameter. We define $G(\cdot )$ as the output of the generator and ${D_i}(\cdot )$ as the output of the $i$-th discriminator. In this work, the adversarial loss is the total loss over the three discriminators' responses to the generator output, shown in Eq. (11).
$${L_{adv}} ={-} \sum\nolimits_{i = 1}^3 {\sum\nolimits_{h,w} {\log ({{D_i}({G(x )} )} )} }$$
where $h, w$ are the height and width of the input image, respectively. Let the denoised high-resolution image be $y^{\prime} = {f_\theta }(y )$; the multi-scale discriminator loss function is the sum of spatial binary cross-entropy losses (with label 1 for real and 0 for generated samples), defined as:
$${L_D} ={-} \sum\nolimits_{i = 1}^3 \sum\nolimits_{h,w} \left[ \log ({{D_i}({y^{\prime}})}) + \log ({1 - {D_i}({G(x)})}) \right]$$
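A sketch of the generator loss in TensorFlow follows. The VGG-19 layer choice ('block5_conv4') and the channel tiling are assumptions, since the paper does not state which feature layer is used; $\lambda$ and $\beta$ follow Section 4.2:

```python
import tensorflow as tf

vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
feat = tf.keras.Model(vgg.input, vgg.get_layer("block5_conv4").output)

def generator_loss(fake, real, disc_outputs, lam=0.001, beta=0.1):
    """Eqs. (9)-(11): MSE + VGG-19 perceptual content loss plus the
    multi-scale adversarial loss. disc_outputs is the list of the three
    discriminators' spatial maps D_i(G(x)); grayscale patches are tiled
    to three channels for VGG-19."""
    mse = tf.reduce_mean(tf.square(fake - real))
    fake3, real3 = (tf.tile(t, [1, 1, 1, 3]) for t in (fake, real))
    perceptual = tf.reduce_mean(tf.square(feat(fake3) - feat(real3)))
    eps = 1e-7  # numerical guard inside the log
    l_adv = -tf.add_n([tf.reduce_sum(tf.math.log(d + eps))
                       for d in disc_outputs])
    return mse + beta * perceptual + lam * l_adv
```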

4. Experiment

4.1 Multiple aperture synthetic OCT and customized dataset

We built the multiple aperture synthetic OCT setup shown in Fig. 1; details of the system construction and signal transduction can be found in [27]. We collected B-scans of polystyrene microparticle calibration samples and of stomach, urothelium, esophagus, and lemon samples with MAS OCT. The B-scan numbers of the polystyrene microparticle calibration samples and the biological samples are 200 and 512, respectively. B-scans from different apertures were selected as low-resolution (LR) images according to their imaging quality and paired with the HR images synthesized by the MAS algorithm. We cropped the images into patches of size $256 \times 256$ to generate the MAS-Net dataset and then applied data augmentation operations such as image flipping (a patch-extraction sketch follows Table 1). The numbers of training and test image patches for each sample are given in Table 1. All HR image patches formed the training dataset of the denoising network and were denoised by the well-trained denoising network to obtain the SHR images used as the new ground truth in the MAS-Net dataset.

Table 1. Dataset details.
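A minimal sketch of the patch extraction and flip augmentation is given below; the stride and the flip-only augmentation are assumptions, since the paper states the crop size and "image flip" but not the stride:

```python
import numpy as np

def extract_patch_pairs(lr, hr, size=256, stride=128, rng=None):
    """Cut aligned LR/HR (or LR/SHR) patch pairs from one B-scan pair
    and randomly flip each pair horizontally."""
    rng = np.random.default_rng() if rng is None else rng
    pairs = []
    for r in range(0, lr.shape[0] - size + 1, stride):
        for c in range(0, lr.shape[1] - size + 1, stride):
            p_lr = lr[r:r + size, c:c + size]
            p_hr = hr[r:r + size, c:c + size]
            if rng.random() < 0.5:  # flip both images identically
                p_lr, p_hr = p_lr[:, ::-1], p_hr[:, ::-1]
            pairs.append((p_lr, p_hr))
    return pairs
```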

Some low-resolution images and the corresponding multiple aperture synthetic images in our customized dataset are shown in Fig. 6. The low-resolution image is the OCT B-scan collected by one of the apertures of MAS OCT, and the corresponding high-resolution image is obtained using the MAS algorithm.

Fig. 6. Low-resolution images and their corresponding MAS images. (a) and (b) are images cropped from different positions of the same low-resolution image of polystyrene microparticle calibration sample, and (e) and (f) are their corresponding MAS images. (c) and (d) are images cropped from different positions of the same low-resolution image of esophagus sample and (g) and (h) are their corresponding MAS images.

4.2 Implementation details

Microparticle data and biological sample data were used together to train the denoising network. The learning rate and the parameters of the Adam optimizer in the denoising model were set to $5 \times {10^{ - 5}}$, ${\beta _1} = 0.9$, and ${\beta _2} = 0.999$, and the weight $\gamma$ of the regularization term was set to 0.5 for better denoising. For the denoiser module, we trained a single model on the HR images of both microparticle and biological samples, because they share the same speckle patterns. We trained two MAS-Net models, one on microparticle data and one on biological sample data, using the same network structure and hyper-parameters. The learning rate and the Adam parameters of the MAS-Net models were set to $1 \times {10^{ - 5}}$, ${\beta _1} = 0.9$, and ${\beta _2} = 0.999$. The hyper-parameters $\lambda$, $\beta$, and $\gamma$ were 0.001, 0.1, and 2, respectively. All networks were trained on an Intel Xeon Silver 4210R CPU, an NVIDIA Quadro RTX 5000 16 GB GPU, and 64 GB of RAM in a TensorFlow 2.5.0-based environment.
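For reference, these optimizer settings translate directly to the Keras API of the stated TensorFlow version:

```python
import tensorflow as tf

# Adam settings from this section (TensorFlow 2.5 / tf.keras).
denoiser_opt = tf.keras.optimizers.Adam(
    learning_rate=5e-5, beta_1=0.9, beta_2=0.999)
masnet_opt = tf.keras.optimizers.Adam(
    learning_rate=1e-5, beta_1=0.9, beta_2=0.999)
```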

4.3 Evaluation

To quantitatively evaluate the denoising and DOF-extension performance of our proposed approach, three state-of-the-art evaluation metrics are employed: peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and signal-to-noise ratio (SNR).

PSNR evaluates the quality of the reconstructed image and is often used to evaluate denoising and super-resolution tasks. The higher the PSNR value, the better the quality of the reconstructed image. The formulation is as follows:

$$PSNR = 10\log_{10}\left( {\frac{{\max {{(I )}^2}}}{{MSE}}} \right),$$
where $\max (I )$ is the theoretical maximum of the pixel in image I, and $MSE$ is the mean square error between the processed image and reference image.

SSIM describes the structural similarity between two images. The closer the SSIM is to 1, the higher the similarity between the two images. Assuming ${\mu _y}$, ${\mu _{\hat{y}}}$, ${\sigma _y}$, ${\sigma _{\hat{y}}}$, and ${\sigma _{y\hat{y}}}$ are the mean values, standard deviations, and cross-covariance of the reference image ${I_y}$ and the predicted image ${I_{\hat{y}}}$, respectively, the SSIM is defined as:

$$SSIM({y,\hat{y}} )= \frac{{({2{\mu_y}{\mu_{\hat{y}}} + {C_1}} )\times ({2{\sigma_{y\hat{y}}} + {C_2}} )}}{{({\mu_y^2 + \mu_{\hat{y}}^2 + {C_1}} )\times ({\sigma_y^2 + \sigma_{\hat{y}}^2 + {C_2}} )}}.$$
SNR represents the ratio of signal to noise in the OCT image, as follows:
$$SNR = 10\log_{10} {\left( {\frac{{{\mu_r} - {\mu_b}}}{{{\sigma_b}}}} \right)^2},$$
where ${\mu _r}$ is the mean value of the selected signal ROIs, and ${\mu _b}$ and ${\sigma _b}$ are the mean value and standard deviation of the background, respectively. We calculated the SNR based on three selected signal regions of interest (ROIs) and one background ROI. These ROIs are shown in Fig. 7, where the yellow rectangle represents the background ROI and the three red rectangles represent the signal ROIs.
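The three metrics are easy to reproduce in NumPy; the SSIM below evaluates Eq. (14) with global statistics, and the $C_1$, $C_2$ defaults are the common $(0.01 \times 255)^2$ and $(0.03 \times 255)^2$ choices, which are our assumptions:

```python
import numpy as np

def psnr(ref, img, max_i=255.0):
    """Eq. (13); assumes an 8-bit dynamic range unless max_i is overridden."""
    mse = np.mean((ref.astype(np.float64) - img) ** 2)
    return 10.0 * np.log10(max_i ** 2 / mse)

def ssim_global(ref, img, c1=6.5025, c2=58.5225):
    """Eq. (14) with image-wide means, variances, and cross-covariance."""
    mu_y, mu_p = ref.mean(), img.mean()
    cov = np.mean((ref - mu_y) * (img - mu_p))
    return ((2 * mu_y * mu_p + c1) * (2 * cov + c2)) / (
        (mu_y ** 2 + mu_p ** 2 + c1) * (ref.var() + img.var() + c2))

def snr(signal_rois, background_roi):
    """Eq. (15) from the signal-ROI and background-ROI statistics."""
    mu_r = np.mean([roi.mean() for roi in signal_rois])
    mu_b, sigma_b = background_roi.mean(), background_roi.std()
    return 10.0 * np.log10(((mu_r - mu_b) / sigma_b) ** 2)
```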

Fig. 7. Results of speckle noise removal. (a) is the noisy high-resolution image of microparticle and (b) is the corresponding denoising result. (c) is the noisy high-resolution image of esophagus sample and (d) is the corresponding denoising result. The orange and green boxes show the zoomed-in regions on the right. The red and yellow boxes are the signal and background ROIs used to calculate SNR, respectively.

5. Results

5.1 Denoising results

We developed a denoising approach that requires neither clean images nor repeated scans to suppress speckle noise in high-resolution OCT images, providing clean high-resolution ground truth for the subsequent training of MAS-Net. An example of denoising is shown in Fig. 7. The two rows of images show the denoising results for high-resolution images of the polystyrene microparticle calibration sample and the esophagus sample, respectively.

Figure 7(a) and (b) are the noisy and denoised images of the microparticle sample, while Fig. 7(c) and (d) are the noisy and denoised images of the esophagus sample. From Fig. 7(b) and the magnified regions marked by the orange and green boxes, we can see that the speckle noise in the image was effectively removed, and the shape and distribution of the microparticles can be clearly observed. Comparing Fig. 7(c) and (d), after noise removal it is easier to distinguish the boundaries of the vessel in the esophagus sample and to see the cellular particles that were largely impossible to observe amid heavy speckle noise. Since there was no reference image, we used the SNR as a quantitative metric to evaluate denoising performance and annotated it in the figure. The SNR of the denoised image was significantly higher than that of the original image, indicating that our proposed denoising method is highly effective.

5.2 Results of MAS-Net OCT on the microparticle sample

We used a polystyrene microparticle calibration sample image to verify the DOF-extension performance of MAS-Net OCT. This sample was made by mixing agarose solution with polystyrene microparticles of 6 µm nominal diameter. Figure 8(a) is the B-scan captured from one of the five apertures. Figure 8(b) is the refocused B-scan obtained via the MAS algorithm from the five B-scans of the distinct apertures, and Fig. 8(c) is the corresponding denoised B-scan. The MAS-Net result is shown in Fig. 8(d). Magnified boxes and transverse line profiles indicated by the dashed lines are shown in the right part of each image. In the magnified orange-box regions, the two microparticles in Fig. 8(a) stick together and cannot be resolved due to the very low transverse resolution, but they are clearly separated in Fig. 8(b), (c), and (d). Because we denoised the high-resolution image used as the ground truth for training MAS-Net, the transverse resolution of the prediction [Fig. 8(d)] is improved and the speckle noise is incidentally removed, resulting in a clean DOF-extended image. The transverse line profiles show a consistent intensity distribution. The DOF-extended result [Fig. 8(d)] has almost the same transverse resolution as the ground truth [Fig. 8(c)], as evidenced by the regions indicated by the green boxes and their corresponding curves.

Fig. 8. Feasibility results of MAS-Net OCT on the polystyrene microparticle calibration sample. (a) is the low-resolution image captured from one aperture of MAS OCT. (b) is the high-resolution image using the MAS algorithm. (c) is the denoised high-resolution image using our denoising approach. (d) is the result of MAS-Net. The orange and green boxes show the zoomed-in regions on the right, and the graphs are transverse line profiles indicated by the dashed lines in the zoomed-in images.

Quantitatively, by calculating the full-width-at-half-maximum (FWHM) of the line profiles, the denoised high-resolution image [Fig. 8(c)] has the best transverse resolution, with particle diameters of 4.69, 4.91, and 7.85 pixels in the orange and green regions; the corresponding values in Fig. 8(d) are 5.94, 5.22, and 7.93 pixels. Some of the FWHM values calculated for Fig. 8(b) are larger than in the denoised result because of speckle noise. For the low-resolution original image, we consider the particle diameter to be larger than 30 pixels, since the low lateral resolution makes it impossible to distinguish two separate particles. With the denoised ground truth as reference, the mean PSNR of the MAS-Net predictions is 33.8862 with a variance of 0.5459, and the mean SSIM is 0.9090 with a variance of 0.0083.

To quantitatively demonstrate the performance of the DOF extension using MAS-Net, we conducted imaging experiments using a polystyrene microparticle calibration sample with a known size and depth. We mixed agarose solution with polystyrene microparticles (No. 64090-15: 6 µm diameter, Sigma-Aldrich, St. Louis, Missouri, USA) and cured the mixture to construct the sample. As shown in Fig. 9, the selected imaging depth range was 1000 µm and the focal plane was at a depth of 853 µm. At the focal plane, the finest transverse FWHMs of the microparticles in Fig. 9(a) and Fig. 9(b) were 9.96 µm and 9.79 µm; thus the transverse resolutions (= transverse FWHM - nominal diameter) were 3.96 µm and 3.79 µm, respectively. However, the transverse resolution of Fig. 9(a) degraded sharply away from the focal range: the maximum transverse FWHM of the microparticles was 41.84 µm at a distance of 762 µm from the focus, so the transverse resolution deteriorated to 35.84 µm. In contrast, the average transverse FWHM of the microparticles in Fig. 9(b) was 11.24 µm, which means that MAS-Net appeared to preserve the transverse resolution over the full axial depth.
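The FWHM measurement itself can be sketched as a half-maximum crossing search on a line profile; the pixel pitch in the usage comment is a hypothetical value used only to illustrate the unit conversion:

```python
import numpy as np

def fwhm(profile, px_size_um=1.0):
    """Estimate the FWHM of a single-peak line profile by linearly
    interpolating the two half-maximum crossings."""
    p = np.asarray(profile, dtype=np.float64)
    p = p - p.min()
    half = p.max() / 2.0
    above = np.where(p >= half)[0]
    lo, hi = above[0], above[-1]
    left = lo - (p[lo] - half) / (p[lo] - p[lo - 1]) if lo > 0 else float(lo)
    right = (hi + (p[hi] - half) / (p[hi] - p[hi + 1])
             if hi < len(p) - 1 else float(hi))
    return (right - left) * px_size_um

# e.g. transverse resolution = fwhm(profile, 1.63) - 6.0, subtracting the
# nominal particle diameter as in this section (1.63 um/px is hypothetical).
```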

Fig. 9. Imaging results using the microparticle calibration sample with a known size and depth. (a) The noisy low-resolution image, (b) the image predicted by MAS-Net, (c) the transverse FWHMs of microparticles at different depths. The dashed line in (a) marks the focus position, and the dashed curve in (c) is a 6th-order polynomial fit to the points.

5.3 Results of MAS-Net OCT generalization on biological samples

We used OCT images of a fresh lemon sample and a lamina cribrosa sample to test the generalization performance of the proposed MAS-Net OCT. Here, the OCT images of the lamina cribrosa sample were never seen by the model. As shown in Fig. 10, Fig. 10(a) shows the image of the fresh lemon sample with low transverse resolution acquired by the conventional OCT system, and Fig. 10(b) shows the image predicted by the MAS-Net trained on the biological tissue training dataset. The lemon cell walls in the original B-scan were blurred and disturbed by speckle noise, but after MAS-Net they became sharper and finer, and the speckle noise was suppressed. As indicated by the zoomed-in regions and the corresponding transverse line profiles, the widths of the cell walls spanned by the orange and green dashed lines were sharpened from 21.26 and 16.07 pixels to 12.72 and 8.76 pixels, respectively. As shown in Fig. 11, Fig. 11(a) is the noisy low-resolution image from one aperture of the MAS OCT setup, Fig. 11(b) is the noisy high-resolution image obtained with the MAS algorithm, and Fig. 11(c) is the result predicted by the proposed MAS-Net. The lamina cribrosa is formed by a multilayered network of collagen fibers extending from the scleral canal wall, so fiber width is an important visual feature. Compared with the original Fig. 11(a), the fiber width in the orange box of Fig. 11(c) is clearly reduced, and the tissue features indicated by the white arrows in the green and blue boxes are reconstructed by MAS-Net.

Fig. 10. Results of MAS-Net's generalization ability on fresh lemon samples. (a) is B-scan obtained from conventional OCT system. (b) is predicted image by MAS-Net. The orange and green boxes show the zoomed-in regions on the right and the graphs are transverse line profiles indicated by the dashed lines in zoomed-in images. The red and yellow boxes are the signal and background ROIs used to calculate SNR, respectively.

Fig. 11. Results of MAS-Net's generalization ability on lamina cribrosa sample. (a) is original B-scan captured from one aperture of MAS OCT system. (b) is the high-resolution image using MAS algorithm. (c) is predicted image by MAS-Net.

6. Discussion

It is necessary to denoise the images before training MAS-Net; otherwise, the complex distribution of speckle patterns reduces the controllability of training. Here we compared seven approaches, including BM3D [37], NLM [38], Noise2Self (N2S) [39], Noise2Void (N2V) [40], Recorrupted-to-Recorrupted (R2R) [41], Neighbor2Neighbor, and our method, to demonstrate the effectiveness of our denoising network. The denoising performance of each method is shown in Fig. 12. BM3D, a conventional denoising method, removes noise well but severely corrupts the detail in the OCT image, so that the particles can no longer be clearly observed. NLM and R2R remove some speckle noise but damage the image, and N2S and N2V leave much noise that affects visual observation. Neighbor2Neighbor retains the detail information but does not effectively remove the speckle noise. Our proposed denoising method not only removes the speckle noise almost completely but also preserves the detailed structure, so that the particles and inter-particle gaps are clearly visible, providing high-quality data for the subsequent training of MAS-Net. The quantitative result of each approach is shown in the lower-left corner of the corresponding image.

Fig. 12. The noisy image and corresponding denoised results for various methods. (a) The noisy image, (b) BM3D, (c) NLM, (d) N2S, (e) N2V, (f) R2R, (g) Neighbor2Neighbor, (h) Ours. The red and yellow boxes are the signal and background ROIs used to calculate SNR, respectively.

To demonstrate the effectiveness of the multi-scale discriminator, we show the comparison results in Fig. 13. The particle diameter in Fig. 13(d) is smaller than that in Fig. 13(c) and closer to that in Fig. 13(b). The multi-scale discriminator helps the GAN to recover small details and improve the transverse resolution. From the transverse line profiles, the FWHMs indicated by the orange dashed lines in Figs. 13(b), (c), and (d) are 4.11, 6.08, and 6.57 pixels, respectively, and the FWHMs indicated by the cyan dashed lines are 5.55, 8.89, and 14.35 pixels.

Fig. 13. Comparison results of single-scale and multi-scale discriminator. (a) is the noisy high-resolution image (cropped from one larger B-scan). (b) is the denoised image from (a). (c) and (d) are the results of single-scale and multi-scale discriminator, respectively. (e) is the transverse line profile curve corresponding to the orange dashed lines in (b), (c), (d), and (f) corresponds to the cyan dashed lines.

To further illustrate the practicability of MAS-Net OCT, we analyzed B-scans (cropped from two larger B-scans) of the urothelium sample captured from one of the five apertures of MAS OCT, as shown in Fig. 14(a) and (e). Figure 14(b) and (f) are the high-resolution images from the MAS algorithm, Fig. 14(c) and (g) are the corresponding denoised high-resolution images from our proposed denoising model, and Fig. 14(d) and (h) are the predicted results of MAS-Net. The cellular particles in the blood vessel structure of the urothelium sample are indicated by the yellow arrows in Fig. 14(a) and (e). The cell particles have upper and lower reflective surfaces, which are heavily disturbed by noise in Fig. 14(a) and (e), and the demarcation of the reflective surfaces is unclear due to the low resolution. This cellular feature becomes clearer in Fig. 14(b) and (f) but is still hard to distinguish from the noise, while in Fig. 14(c) and (g) it can be observed clearly thanks to our denoising method. For the predicted results of MAS-Net, the noise suppression is very significant, but the limited DOF performance of the physical MAS OCT setup on complex biological samples makes it difficult for deep learning networks, which depend heavily on training data, to achieve surprising results. Even so, MAS-Net's predictions still allow the reflective surfaces of the cellular particles and their boundaries to be observed better than in the original images [Fig. 14(a) and (e)].

Fig. 14. Results of MAS-Net OCT on complex biological sample. (a) and (e) are the original clipped B-scans. (b) and (f) are the high-resolution images using MAS algorithm and (c) and (g) are the corresponding denoising result. (d) and (h) are the predicted result of the proposed MAS-Net.

7. Conclusions

In this paper, a speckle-free MAS-Net OCT was proposed to improve the transverse resolution and extend the DOF of the OCT system. We proposed a novel self-supervised denoising algorithm to remove speckle noise from the high-transverse-resolution OCT images produced by MAS OCT. Pairs of low-resolution and denoised high-resolution images were fed into the proposed MAS-Net, a GAN built from a U-Net with RDB blocks and a multi-scale discriminator. The experimental results on homemade microparticle samples and fresh lemon samples demonstrated the excellent performance of the denoising method and the feasibility of MAS-Net in enhancing transverse resolution. Further, the effectiveness of the multi-scale discriminator and the practicability of MAS-Net on the urothelium sample were experimentally illustrated. With further optimization of MAS-OCT performance, our future research will focus on DOF extension in more complex tissues using deep learning methods.

Funding

National Natural Science Foundation of China (61905036); China Postdoctoral Science Foundation (2019M663465, 2021T140090); Medico-Engineering Cooperation Funds from University of Electronic Science and Technology of China (ZYGX2021YGCX019); Fundamental Research Funds for the Central Universities (ZYGX2021J012).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. T. Klein, W. Wieser, L. Reznicek, A. Neubauer, A. Kampik, and R. Huber, “Multi-MHz retinal OCT,” Biomed. Opt. Express 4(10), 1890–1908 (2013). [CrossRef]  

2. M. Paulo, J. Sandoval, V. Lennie, J. Dutary, M. Medina, N. Gonzalo, P. Jimenez-Quevedo, J. Escaned, C. Bañuelos, R. Hernandez, C. Macaya, and F. Alfonso, “Combined use of OCT and IVUS in spontaneous coronary artery dissection,” JACC-Cardiovasc. Imag. 6(7), 830–832 (2013). [CrossRef]  

3. G. Ni, J. Zhong, X. Gao, R. Wu, W. Wang, X. Wang, Y. Xie, Y. Liu, and J. Mei, “Three-dimensional morphological revealing of human placental villi with common obstetric complications via optical coherence tomography,” Bioeng. Transl. Med. 8(1), e10372 (2023). [CrossRef]  

4. X. Ge, S. Chen, S. Chen, and L. Liu, “High resolution optical coherence tomography,” J. Lightwave Technol. 39(12), 3824–3835 (2021). [CrossRef]  

5. G. Ni, J. Zhang, L. Liu, X. Wang, X. Du, J. Liu, and Y. Liu, “Detection and compensation of dispersion mismatch for frequency-domain optical coherence tomography based on A-scan’s spectrogram,” Opt. Express 28(13), 19229–19241 (2020). [CrossRef]  

6. Z. Ding, H. Ren, Y. Zhao, J. Stuart Nelson, and Z. Chen, “High-resolution optical coherence tomography over a large depth range with an axicon lens,” Opt. Lett. 27(4), 243–245 (2002). [CrossRef]  

7. K. Lee and J. P. Rolland, “Bessel beam spectral-domain high-resolution optical coherence tomography with micro-optic axicon providing extended focusing range,” Opt. Lett. 33(15), 1696–1698 (2008). [CrossRef]  

8. J. Kim, J. Xing, H. S. Nam, J. W. Song, J. W. Kim, and H. Yoo, “Endoscopic micro-optical coherence tomography with extended depth of focus using a binary phase spatial filter,” Opt. Lett. 42(3), 379–382 (2017). [CrossRef]  

9. B. Hermann, E. J. Fernández, A. Unterhuber, H. Sattmann, A. F. Fercher, W. Drexler, P. M. Prieto, and P. Artal, “Adaptive-optics ultrahigh-resolution optical coherence tomography,” Opt. Lett. 29(18), 2142–2144 (2004). [CrossRef]  

10. K. Sasaki, K. Kurokawa, S. Makita, and Y. Yasuno, “Extended depth of focus adaptive optics spectral domain optical coherence tomography,” Biomed. Opt. Express 3(10), 2353–2370 (2012). [CrossRef]  

11. M. N. Romodina and K. Singh, “Depth of focus extension in optical coherence tomography using ultrahigh chromatic dispersion of zinc selenide,” J. Biophotonics 15(8), e202200051 (2022). [CrossRef]  

12. T. S. Ralston, D. L. Marks, P. Scott Carney, and S. A. Boppart, “Real-time interferometric synthetic aperture microscopy,” Opt. Express 16(4), 2555–2569 (2008). [CrossRef]  

13. E. Bo, Y. Luo, S. Chen, X. Liu, N. Wang, X. Ge, X. Wang, S. Chen, S. Chen, J. Li, and L. Liu, “Depth-of-focus extension in optical coherence tomography via multiple aperture synthesis,” Optica 4(7), 701–706 (2017). [CrossRef]  

14. E. Bo, X. Ge, L. Wang, X. Wu, Y. Luo, S. Chen, S. Chen, H. Liang, G. Ni, X. Yu, and L. Liu, “Multiple aperture synthetic optical coherence tomography for biological tissue imaging,” Opt. Express 26(2), 772–780 (2018). [CrossRef]  

15. K. Liang, X. Liu, S. Chen, J. Xie, W. Q. Lee, L. Liu, and H. K. Lee, “Resolution enhancement and realistic speckle recovery with generative adversarial modeling of micro-optical coherence tomography,” Biomed. Opt. Express 11(12), 7236–7252 (2020). [CrossRef]  

16. Z. Yuan, D. Yang, H. Pan, and Y. Liang, “Axial super-resolution study for optical coherence tomography images via deep learning,” IEEE Access 8, 204941–204950 (2020). [CrossRef]  

17. Z. Yuan, D. Yang, Z. Yang, J. Zhao, and Y. Liang, “Digital refocusing based on deep learning in optical coherence tomography,” Biomed. Opt. Express 13(5), 3005–3020 (2022). [CrossRef]  

18. Y. Huang, Z. Lu, Z. Shao, M. Ran, J. Zhou, L. Fang, and Y. Zhang, “Simultaneous denoising and super-resolution of optical coherence tomography images based on generative adversarial network,” Opt. Express 27(9), 12289–12307 (2019). [CrossRef]  

19. B. Qiu, Y. You, Z. Huang, X. Meng, Z. Jiang, C. Zhou, G. Liu, K. Yang, Q. Ren, and Y. Lu, “N2NSR-OCT: Simultaneous denoising and super-resolution in optical coherence tomography images using semisupervised deep learning,” J. Biophotonics 14(1), e202000282 (2021). [CrossRef]  

20. G. Ni, Y. Chen, R. Wu, X. Wang, M. Zeng, and Y. Liu, “Sm-Net OCT: a deep-learning-based speckle-modulating optical coherence tomography,” Opt. Express 29(16), 25511–25523 (2021). [CrossRef]  

21. G. Ni, R. Wu, J. Zhong, Y. Chen, L. Wan, Y. Xie, J. Mei, and Y. Liu, “Hybrid-structure network and network comparative study for deep-learning-based speckle-modulating optical coherence tomography,” Opt. Express 30(11), 18919–18938 (2022). [CrossRef]  

22. Q. Zhou, M. Wen, M. Ding, and X. Zhang, “Unsupervised despeckling of optical coherence tomography images by combining cross-scale CNN with an intra-patch and inter-patch based transformer,” Opt. Express 30(11), 18800–18820 (2022). [CrossRef]  

23. B. Qiu, S. Zeng, X. Meng, Z. Jiang, Y. You, M. Geng, Z. Li, Y. Hu, Z. Huang, C. Zhou, Q. Ren, and Y. Lu, “Comparative study of deep neural networks with unsupervised Noise2Noise strategy for noise reduction of optical coherence tomography images,” J. Biophotonics 14(11), e202100151 (2021). [CrossRef]  

24. N. A. Kande, R. Dakhane, A. Dukkipati, and P. K. Yalavarthy, “SiameseGAN: a generative model for denoising of spectral domain optical coherence tomography images,” IEEE Trans. Med. Imaging 40(1), 180–192 (2021). [CrossRef]  

25. Z. Chen, Z. Zeng, H. Shen, X. Zheng, P. Dai, and P. Ouyang, “DN-GAN: Denoising generative adversarial networks for speckle noise reduction in optical coherence tomography images,” Biomed. Signal Process. Control 55, 101632 (2020). [CrossRef]  

26. E. Bo, X. Ge, X. Yu, J. Mo, and L. Liu, “Extending axial focus of optical coherence tomography using parallel multiple aperture synthesis,” Appl. Opt. 57(13), 3556–3560 (2018). [CrossRef]  

27. E. Bo, X. Ge, Y. Luo, X. Wu, S. Chen, H. Liang, S. Chen, X. Yu, P. Shum, J. Mo, N. Chen, and L. Liu, “Cellular-resolution in vivo tomography in turbid tissue through digital aberration correction,” PhotoniX 1(1), 9 (2020). [CrossRef]  

28. Y. Huang, N. Zhang, and Q. Hao, “Real-time noise reduction based on ground truth free deep learning for optical coherence tomography,” Biomed. Opt. Express 12(4), 2027–2040 (2021). [CrossRef]  

29. D. Wu, K. Gong, K. Kim, X. Li, and Q. Li, “Consensus neural network for medical imaging denoising with only noisy training samples,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2019 (Springer International Publishing, 2019), pp. 741–749.

30. T. Huang, S. Li, X. Jia, H. Lu, and J. Liu, “Neighbor2Neighbor: self-supervised denoising from single noisy images,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2021), pp. 14781–14790.

31. T. Huang, S. Li, X. Jia, H. Lu, and J. Liu, “Neighbor2Neighbor: a self-supervised framework for deep image denoising,” IEEE Trans. Image Process. 31, 4023–4038 (2022). [CrossRef]  

32. N. Moran, D. Schmidt, Y. Zhong, and P. Coady, “Noisier2Noise: learning to denoise from unpaired noisy data,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2020), pp. 12061–12069.

33. X. Ge, P. Yang, Z. Wu, C. Luo, P. Jin, Z. Wang, S. Wang, Y. Huang, and T. Niu, “Virtual differential phase-contrast and dark-field imaging of X-ray absorption images via deep learning,” Bioeng. Transl. Med., e10494 (2023).

34. C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” in IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 4681–4690.

35. X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. C. Loy, “ESRGAN: enhanced super-resolution generative adversarial networks,” in Computer Vision – ECCV 2018 Workshops (Springer International Publishing, 2019), pp. 63–79.

36. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556 (2014). [CrossRef]  

37. K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process. 16(8), 2080–2095 (2007). [CrossRef]  

38. A. Buades, B. Coll, and J.-M. Morel, “Nonlocal image and movie denoising,” Int. J. Comput. Vis. 76(2), 123–139 (2008). [CrossRef]  

39. J. Batson and L. Royer, “Noise2Self: blind denoising by self-supervision,” in International Conference on Machine Learning (PMLR, 2019), pp. 524–533.

40. A. Krull, T.-O. Buchholz, and F. Jug, “Noise2Void - learning denoising from single noisy images,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2019), pp. 2129–2137.

41. T. Pang, H. Zheng, Y. Quan, and H. Ji, “Recorrupted-to-Recorrupted: unsupervised deep learning for image denoising,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2021), pp. 2043–2052.
