Denoising of pre-beamformed photoacoustic data using generative adversarial networks

Amir Refaee; Amir Refaee; Corey J. Kelly; Corey J. Kelly; Hamid Moradi; Septimiu E. Salcudean

doi:10.1364/BOE.431997

1. Introduction

Photoacoustic imaging (PAI) is a biomedical imaging technique that leverages the advantages of both ultrasound and optical imaging to non-invasively image vasculature and other optically absorbing targets [1,2]. Significant progress both in hardware and data reconstruction techniques have moved this modality from the lab to the clinic [3], where it has shown promise in the detection and staging of cancer [4,5], the detection of inflammatory conditions [6,7], and for intraoperative guidance [8,9].

Photoacoustic signals are inherently weak, with their intensity being proportional to the dose of laser light reaching the imaging target [2]. A fundamental tradeoff exists between maximizing this intensity while minimizing the laser dose to which the patient is exposed [10]. Although different reconstruction schemes including modified delay-and-sum techniques [11], filtered back-projection [12] and iterative approaches [13–15] have been developed to enhance the quality of the reconstructed images, it is still common to average the raw radio-frequency (RF) ultrasound data generated from multiple consecutive laser pulses, reducing stochastic noise and improving the signal-to-noise ratio (SNR) [16,17]. This approach not only increases the scan time proportionally to the number of frames acquired at each imaging position, lowering the frame rate of the system, but it also increases the total laser dose. More specific noise-reduction techniques such as bandpass filtering to the working range of the ultrasound transducer [12], or the removal of laser-induced noise using singular value decomposition (SVD) denoising [18] have also been employed to improve the quality of photoacoustic RF data. In this paper we propose a deep learning based method that is capable of reducing Gaussian background noise similar to averaging multiple acquisition frames, while simultaneously replicating more sophisticated denoising, in particular the SVD denoising method of Hill et al. [18] , which we have previously found to be particularly effective for our data [19].

Generative adversarial networks (GANs) [20] and conditional GANs (cGANs) [21] emerged in 2014, and were quickly adopted as the state-of-the-art deep learning models for a variety of tasks in different fields [22]. For instance, the introduction of cGANs capable of mapping different image domains, such as the Pix2Pix Model of Isola et al. [23], can be used for tasks such as segmentation, grayscale to RGB transformation, and super-resolution reconstruction and rendering. In recent years, there have been numerous studies investigating applications of deep learning in PAI [24–27]. More specifically, GANs have been applied to PAI both for artifact removal and as a substitution for iterative solutions [28,29] where the input and output of the GAN resemble the initial guess and final iterative solution provided by iterative reconstruction algorithms respectively. However, these studies have mainly focused on post-beamforming processing of photoacoustic data for enhancing reconstruction. Since we already have a gold standard (frame averaging and SVD denoising) for our data, a generative supervised learning approach such as a GAN is ideally suited.

In this study, we have developed a GAN-based method using the Pix2Pix model for denoising pre-beamformed RF data that can mimic both the behaviour of multi-frame temporal averaging noise reduction using only one frame of data while also being able to remove sensor-specific artifacts. Since the Pix2Pix model was designed for image-to-image translation tasks involving highly structured graphical outputs, we hypothesized that it might work as an alternative to our SVD denoising, which excels at removing structured noise.

We achieved comparable results both in terms of raw RF data and also in the resultant reconstructed images using the denoised data. To the best of our knowledge, this study is the first to use GANs as a pre-processing step in photoacoustic imaging to enhance the quality of the raw RF data and consequently the resultant reconstructed images.

2. Methods

In the following sections, scalars are represented by lower-case letters, data matrices by bold lower-case letters, operators by upper-case italic letters. We begin by describing the photoacoustic model and defining our reference dataset. We then propose our method of GAN denoising, as well as our reconstruction methods and metrics used for the quantitative assessment. Finally we describe our data acquisition hardware and experimental parameters.

2.1 Photoacoustic model

The photoacoustic effect describes the generation of propagating acoustic waves by a temporally short pulse of laser light. These waves can be detected by standard ultrasound transducers, and the received signals can be processed to reconstruct the initial pressure distribution $p_0$ which, under the assumption of an acoustically homogeneous sample and spatially uniform illumination, will be proportional to the optical absorption in the sample [2]. When the measured time-varying pressure $p_D \left ( \mathbf {r_{D}}, t \right )$ is acquired at multiple positions $\mathbf {r_{D}}$ in 3D space and $p_{0}( \mathbf {r} )$ is reconstructed at sample points $\mathbf {r}$, this is referred to as PAI reconstruction [12]. In imaging systems, the discretized versions of $p_{0}( \mathbf {r} )$ and $p_D \left ( \mathbf {r_{D}}, t \right )$ are denoted by $\mathbf {p_0}$ and $\mathbf {p_D}$, respectively.

2.2 Reference RF dataset

At each imaging location across a volume, a frame of RF data is acquired using a sensor with the general shape of $(m, n)$ where $m$ is the number of detection elements of the sensor and $n$ is the number of time samples taken, which correspond to how far away is the source from the detection element. This noisy frame of RF data is denoted by $\mathbf {p_{D,noisy}}$. We pre-process these frames by first performing temporal averaging at each location where $n_f$ frames ($n_f=20$ in our example) are averaged together denoted by $\mathbf {p_{D,n_f-avg}}$. We then perform SVD denoising to remove cross-channel noise bands from the data [18,19]. These steps will result in a reference frame for the experiments performed in this paper, which we will denote by $\mathbf {p_{D,ref}}$. Additionally, SVD denoising applied to a single noisy frame of RF data (as opposed to a temporal average of frames) will be denoted by $\mathbf {p_{D,SVD}}$.

For example, a frame of our RF data contains $n=1792$ time samples across $m=384$ elements of the transducer, where for each 128-element, 1792-sample segment of a frame of RF data, we construct a $128 \times 1792$ matrix and compute its SVD (our transducer array of 384 elements is multiplexed into 128 channels; three acquisitions are performed at each transducer location). By isolating the first $k$ singular value components and reverting to the original representation, we are left with only the noise, which dominates those $k$ singular values. We subtract this noise from the original matrix, resulting in the denoised data. This process is illustrated in Fig. 1. An example frame pair of $\mathbf {p_{D,noisy}}$ and $\mathbf {p_{D,ref}}$ can also be seen in Fig. 2.

In SVD denoising, as the $k$ singular values increase, lower-frequency structures are suppressed at the expense of an increased level of background noise, which can be observed in Fig. 3. To strike the best balance between removing the noise bands and preserving the photoacoustic signals, the data in the present study was denoised using $k=15$, obtained empirically.

Fig. 1. Preparation of a reference frame. After imaging one location for $n_f$ times and performing temporal averaging, the resultant frame is then denoised using SVD denoising. We also denoise each frame with the SVD approach.

Download Full Size | PDF

Fig. 2. RF data training frame pair for the GAN. These $384\times 1792$ frames are divided into $128\times 128$ patches to be used with the GAN. The reference frame is the result of averaging 20 frames followed by SVD denoising. The line plot in each frame is an A-line profile of the frame at the location of the dashed white line.

Download Full Size | PDF

Fig. 3. Example results of SVD denoising, illustrating the suppression of signals as the number of discarded singular values, k, increases.

Download Full Size | PDF

2.3 GAN denoising

We use the Pix2Pix Model [23] which trains a cGAN to learn a mapping from an input image and random noise vector $\mathbf {z}$, to an output image. Due to the size of $\mathbf {p_{D,noisy}}$ and $\mathbf {p_{D,ref}}$ in our example $(384, 1792)$, we cannot use these frames at once as an input/output pair and therefore we need to divide them into smaller patches denoted by $\mathbf {{}^ {\boldsymbol i}{}p_{D,noisy}}$ and $ {{}^ {\boldsymbol i}{}{\textbf p}_{\textbf {D,ref}}}$ with $i$ indicating the location of the patch within each frame (see Fig. 4 for an example). Letting $G(\mathbf {.},\mathbf {.})$ and $D(\mathbf {.},\mathbf {.})$ represent the outputs of the generator and discriminator respectively, the training objective of the model presented in [23] will be

(1)$$\begin{aligned} L_{cGAN} \left( G, D \right) = \mathbb{E}_{{}^{i}\mathbf{p}_\mathbf{D,noisy},{}^{i}\mathbf{p}_\mathbf{D,ref}} \left[ \log D\left( {}^{i}\mathbf{p}_\mathbf{D,noisy}, {}^{i}\mathbf{p}_\mathbf{D,ref} \right)\right] \\ + \mathbb{E}_{{}^{i}\mathbf{p}_\mathbf{D,noisy},\mathbf{z}} \left[ \log \left( 1 - D\left( {}^{i}\mathbf{p}_\mathbf{D,noisy}, G\left( {}^{i}\mathbf{p}_\mathbf{D,noisy}, \mathbf{z} \right) \right) \right) \right] \end{aligned}$$

(2)$$ L_{L1} \left(G, D\right) = \mathbb{E}_{{}^{i}\mathbf{p}_\mathbf{D,noisy},{{}^{i}\mathbf{p}_\mathbf{D,ref}},\mathbf{z}} \left[ \left\lVert{{}^{i}\mathbf{p}_\mathbf{D,ref} - G\left( {}^{i}\mathbf{p}_\mathbf{D,noisy}, \mathbf{z} \right)}\right\rVert_{1} \right] $$

(3)$$ G^{*} = \mathrm{arg}\, \underset{G}{\mathrm{min}}\, \underset{D}{\mathrm{max}}\, \{L_{cGAN}\left( G, D \right) + \lambda_1 L_{L1} \left(G, D \right)\}$$

where $L_{cGAN}$ denotes the cGAN loss, $L_{L1}$ the L1 distance norm, and $\mathbb {E}$ the expectation values [23]. Combining Eqs. (1) and (3) results in $G*$ which is the cGAN loss where the generator tries to minimize the objective and the discriminator tries to maximize it, in addition to a sparsifying distance which encourages a more focused image controlled by the parameter $\lambda _1$.

Fig. 4. RF data training patch pair for the GAN. In our example, these $128\times 128$ patches are extracted from $384\times 1792$ RF frames. The patches have been absolute-valued for display only, to increase visibility of the signals. The reference patches are the result of averaging 20 frames followed by SVD denoising.

Download Full Size | PDF

In training the Pix2Pix model, the loss function does not provide a descriptive source for determining the quality of training [23], therefore in addition to having a training dataset and a testing dataset, we also utilized a validation dataset to tune the parameters of the model, after each training trial. While this requires performing an additional experiment, it ensures that the model training is completely blind to the validation data, unlike cross-validation methods such as k-fold or Monte Carlo. We found that we achieved optimal performance using our validation dataset with a patch size of $(128,128)$, without overlap, a batch size of $n_b=5$, a learning rate of $\alpha =0.0002$, and a regularizer $\lambda _1=100$. We used the Adam optimizer with the model weights being initialized randomly from a Gaussian distribution with mean 0 and standard deviation 0.02 as recommended in [23]. As seen from Eq. (4), training occurs with the generator learning to output images that are closer to the reference, while the the discriminator learns to distinguish between real images and fake images from the generator. This training process is outlined in Fig. 6. Our choice of hyper-parameters and input/output image sizes will result in the GAN architecture illustrated in Fig. 5 where the generator is a U-Net architecture [30] and the discriminator is a PatchGAN classifier [23]. Our model was trained using a Tesla V100 GPU on an NVIDIA DGX-1 system.

Fig. 5. GAN architecture portraying the dimensions of the hidden layers fusing 128 × 128 patches. If a different patch size is used, the size of the layers will be adjusted accordingly. Each output of the generator is subsequently used as an input to the discriminator which has to determine if it is a real patch or a fake patch produced by the generator.

Download Full Size | PDF

Fig. 6. The training process of the GAN involves A the generator learning to output more realistic data and B the discriminator learning to distinguish between fake and real images. The combined objective function $L_{cGAN}+ \lambda _1 L_{L1}$ is back-propagated through both networks updating their respective weights.

Download Full Size | PDF

Performing inference on our unseen testing data using the GAN is similar and requires the frames to be first divided into sub-patches, denoised, and subsequently put back together as outlined in Fig. 7. A frame of RF data denoised using the GAN will be denoted by $\mathbf {p_{D,GAN}}$. During inference, only the generator is used to output denoised patches using noisy input patches, while the discriminator is left unused.

Fig. 7. Once the noisy RF data $\mathbf {p_{D,noisy}}$ is acquired using the imaging system, each frame is divided into $\mathbf {{}^ {\boldsymbol i}{}p_{D,noisy}}$ sub-patches ($128\times 128$ in our examples). These patches are then denoised using the trained generator model making $\mathbf {{}^ {\boldsymbol i}{}p_{D,GAN}}$ which are combined into a denoised $\mathbf {p_{D,GAN}}$ frame subsequently used by the reconstruction algorithms to approximate the initial pressure distribution $\mathbf {p_{0}}$.

Download Full Size | PDF

2.4 Reconstructions

Improving the pre-processing and denoising of RF data ultimately serves to improve the quality of reconstructed photoacoustic images. We therefore need to assess the reconstructions that use our GAN-denoised RF dataset compared to reconstructions using temporal averaging and SVD denoising.

2.4.1 Filtered back-projection

In this study, we use the model of Xu et al. [12] for spherical scanning geometry, which relates $p_{0}( \mathbf {r} )$ to $p_D \left ( \mathbf {r_{D}}, t \right )$ in a backprojection form as

(4)$$p_{0}( \mathbf{r} ) = \frac{2}{\Omega_{0}} \int\nolimits_{\Omega_{0}} d \Omega_{0} \left[ p_{D} \left( \mathbf{r_{D}}, t \right) - t \frac{\partial p_{D} \left( \mathbf{r_{D}}, t \right)}{\partial t} \right]_{t = \frac{\lvert \mathbf{r} - \mathbf{r_{D}} \rvert}{v_s}}$$

where $v_s$ is the acoustic speed in the sample, $\Omega _{0}$ is the solid angle of the surface containing the detection points $\mathbf {r_{D}}$, and $d \Omega _{0}$ is the solid angle of the surface element at a location $\mathbf {r_{D}}$ relative to a sample point $\mathbf {r}$. While this form is exact only in the case in which this surface completely encloses the sample ($\Omega _{0} = 4 \pi$), the $\frac {d \Omega _{0}}{\Omega _{0}}$ term serves as a weight to mitigate the effects of the well-known partial view problem [12]. We have previously described our reconstruction scheme in detail [19] but we note that in the present study we assume spatially uniform illumination. We refer to this single-shot reconstruction as filtered backprojection (FBP) in the later sections.

2.4.2 Fast iterative shrinkage thresholding algorithm

To implement the iterative reconstruction algorithm, we also need a forward model describing the generated $p_D$ as a function of $p_0$. If we define a matrix $A^{*}$ which describes the action of Eq. (5) on a discrete sampling of $p_D$ in vector form $\mathbf {p_D}$ (the RF data) as

(5)$$A^{*}\mathbf{p_D} = \mathbf{p_0}$$

where $\mathbf {p_0}$ is a vector of initial pressure values in the sample (the image), then we can also define a matched [13,31] forward (adjoint) operator $A$ such that

(6)$$A\mathbf{p_0} = \mathbf{p_D}$$

where $A^{*} = A^{T}$ since $A$ contains only real-valued elements [13]. We will refer to $A$ and $A^{T}$ as the projection and backprojection operators, respectively. Iterative frameworks for solving the acoustic inverse problem in PAI have been well studied in the past [14]. In this project, we have chosen the Fast Iterative Shrinkage Thresholding Algorithm (FISTA) [32] with a Total Variational (TV) regularizer [33] which is used widely in the field of PAI [13,15]. As we are dealing with the limited-view problem [12], we can never fully compute $\mathbf {p_{0}}$, therefore formulating PAI as an inverse problem and following the steps outlined in [13,15,32,33] with our best possible estimation to $\mathbf {p_{0}}$ being denoted by $\mathbf {\hat {p}_{0}}$, the minimization objective will be

(7)$$\underset{\mathbf{\hat{p}_0}}{\mathrm{min}}\{ F\left( \mathbf{\hat{p}_0} \right) = \left\lVert{ A \mathbf{\hat{p}_0} - \mathbf{p_D} }\right\rVert^{2} + \lambda_2 \left\lVert{\mathbf{\hat{p}_0}}\right\rVert_{TV}\}$$

In Algorithm 1 we are using the monotone version of FISTA (MFISTA) outlined in [33] to prevent divergence of the cost function $F$ due to the lack of an exact solution for the TV denoising problem [33]. We iteratively update $\mathbf {\hat {p}_{0,j}}$, the $j$-th guess, from our initial guess of $\mathbf {\hat {p}_{0,0}}$ being the FBP reconstruction to converge to $\mathbf {\hat {p}_{0}}$. $\mathrm {TV}_{2\lambda _2/L_j}$ is a solution to the TV denoising problem, implemented using the method of Chambolle [34], and $L_j$ is the Lipschitz constant, used as the denoising weight found using backtracking line search [33]. Note that $\mathbf {p_D}$ in Algorithm 1 may be any of the RF datasets explained, i.e. Noisy, Reference, GAN Denoised, and SVD Denoised. In summary, as the algorithm progresses, $\mathbf {\hat {p}_{0,j}}$ converges to $\mathbf {\hat {p}_0}$, a TV-regularized solution to Eq. (6).

Algorithm 1. Monotone FISTA

View Table | View all tables in this article

2.5 Metrics

In this section we present the metrics we use to both assess the result of the GAN denoising directly using the RF data, as well as indirectly by inspecting the resultant reconstructions. In the formulas presented in this section, $\mathbf {y}$ refers to our ground-truth and expected output and $\mathbf {\hat {y}}$ refers to our estimated output. Due to the dual purpose of our assessment, $\mathbf {y}$ and $\mathbf {\hat {y}}$ can be substituted in by both the RF data and the reconstructed images in the formulas below. We begin by using the traditional mean squared error

(8)$$MSE = \frac{1}{n_p}\sum_{i=1}^{n_p}(y_i - \hat{y_i})^{2}$$

where $n_p$ is total number of pixels in an image and $y_i$ and $\hat {y}_i$ the value of the pixel $i$ in the expected and estimated images respectively.

The second metric we use to quantify the performance is the structural similarity index measurement (SSIM) [35]

(9)$$SSIM(\mathbf{y}, \mathbf{\hat{y}}) = \frac{(2\mu_\mathbf{y} \mu_\mathbf{\hat{y}}+c_1)(2\sigma_{\mathbf{y\hat{y}}}+c_2)}{(\mu_\mathbf{y}^{2}+\mu_\mathbf{\hat{y}}^{2}+c_1)(\sigma_\mathbf{y}^{2}+\sigma_\mathbf{\hat{y}}^{2}+c_2)}$$

where $\mu$ is the mean intensity and $\sigma$ the standard deviation of the signal, and $\sigma _{\mathbf {y\hat {y}}}$ is the correlation coefficient of $\mathbf {y}$ and $\mathbf {\hat {y}}$

(10)$$\sigma_{\mathbf{y\hat{y}}} = \frac{1}{n_p-1}\sum_{i=1}^{n_p}(y_i-\mu_\mathbf{y})(\hat{y}_i-\mu_\mathbf{\hat{y}})$$

The constant terms $c_1$ and $c_2$ are used to avoid ill-defined values in Eq. (10); we use the same values presented in [35].

The final metric used for assessing the quality of the denoised RF data directly and indirectly is the feature similarity index measurement (FSIM) [36,37] which assesses the similarity of the low-level features in the images similar to the features the human visual system uses. FSIM is defined as

(11)$$FSIM = \frac{\sum_{\mathbf{x}\in\Omega}S_L(\mathbf{x})\cdot PC_m(\mathbf{x})}{\sum_{\mathbf{x}\in\Omega}PC_m(\mathbf{x})}$$

where the vector $\mathbf {x}$ represents the spatial domain of the the images $\mathbf {y}$ and $\mathbf {\hat {y}}$, and $\Omega$ is the entire spatial domain. $PC_m(\mathbf {x})$ is the maximum phase congruency [38]—a dimensionless quantity providing an absolute measure of significance of feature points, between the two images. $S_L(\mathbf {x})$ is a similarity measure between the two images calculated using

(12)$$S_L(\mathbf{x}) = S_{PC}(\mathbf{x})\cdot S_G(\mathbf{x})$$

where

(13)$$S_{PC}(\mathbf{x}) = \frac{2PC_1(\mathbf{x}) \cdot PC_2(\mathbf{x})+s_1}{PC_1(\mathbf{x})^{2}+PC_2^{2}(\mathbf{x})+s_1}$$

(14)$$S_G(\mathbf{x}) = \frac{2G_1(\mathbf{x}) \cdot G_2(\mathbf{x})+s_2}{G_1^{2}(\mathbf{x})+G_2^{2}(\mathbf{x})+s_2}$$

with $G_i(\mathbf {x})$ being the gradient magnitude and $PC_i(\mathbf {x})$ the phase congruency of image $i$ in domain x. $s_1$ and $s_2$ are constants to avoid ill-defined values of $S_{PC} ( \mathbf {x} )$ and $S_G ( \mathbf {x} )$ which we chose the same values used in [36,37].

2.6 Data acquisition

All RF data were acquired using the SonixEmbrace Automated Breast Ultrasound Scanner (ABUS - Ultrasonix Medical Corporation, Richmond, BC, Canada). This scanner consists of a 384-element transducer with −12 cm radius of curvature, 10 MHz centre frequency, and 90 % bandwidth. The transducer is embedded in a spherical dome attached to a motor, which rotates through 360° to collect volumetric data. A SonixDAQ module (DAQ - BK Medical, Peabody, MA) was used to acquire pre-beamformed RF data at 40 MHz. Our illumination source consists of a Continuum Surelite II laser (Continuum, Santa Clara, CA) pumping an optical parametric oscillator (OPO) from the same manufacturer. The OPO output is homogenized [39] and coupled into a 1 mm silica-core optical fiber, which is coupled to a custom illuminator designed to deliver a fan-shaped beam of diffuse illumination to the sample surface through a window parallel to the ABUS transducer [19]. Synchronization of the RF data acquisition with the laser illumination was accomplished using a custom Arduino-based circuit controlled over USB. With the data download from the SonixDAQ to the PC being the rate-limiting step, we are able to acquire one frame per transducer position at 50 equally spaced positions in about 20 minutes.

Our training data was acquired from a custom, modular wire phantom [19], consisting of black spray-painted monofilament fishing line suspended between attachment points on a 3D printed template as seen in Fig. 8(a). To validate our GAN, we imaged a commercial photoacoustic phantom containing optically absorbing spheres purchased from Computerized Imaging Reference Systems Incorporated (CIRS - Norfolk, VA). For testing and quantification, we used the same black fishing line described above, tied into a small figure-of-eight knot, as shown in Fig. 8(b), which provides several features at various angles within a small imaging volume.

Fig. 8. Custom phantoms used for training and testing the GAN.

Download Full Size | PDF

To test how the model performs on out-of-distribution data, we also developed an anatomically realistic 3D printed vessel phantom using labelled magnetic resonance angiography patient data made publicly available by Lou et al. [40]. Beginning with dataset “Neg_35_Left”, we extracted voxels labelled as blood vessels. We added a rectangular base for the vessels to attach to, and discarded any vessel segments which were not attached to either the base or another vessel, as these would be unsupported in the final print. Finally, we converted the voxel data to a surface mesh which was exported to STL format for 3D printing. The phantom was printed on a Form 2 3D printer (FormLabs, Somerville, MA) with a layer height of 0.1 mm, using “Tough 2000” resin from the same manufacturer. Finally, the print was cleaned up and spray painted black. While some of the smallest vessels were not faithfully reproduced, and others still were lost during the cleanup process, the phantom still provides realistic geometry, and vessels ranging in diameter from 0.2 mm (the voxel size in the original dataset) up to 5 mm. The numerical data, as well as a photo of the final phantom are shown in Fig. 8(c).

The training phantom was imaged at 100 imaging locations, the validation phantom at 10, and the two different testing phantoms each at 50 equally spaced imaging locations in the ABUS dome. Each location was imaged 20 times in all three cases. Since we train our GAN using $(128,128)$ patches, this corresponds to 4300, 430, and 2150 patch pairs for the training, validation and hyperparameter optimization, and testing datasets, respectively.

3. Results

In this section, we first provide an example of a training trial for the model. We then provide our results by first assessing the pre-beamformed RF data followed by the resulting reconstructions.

3.1 Model hyperparameter tuning

As mentioned previously, in order to train our GAN model we used a validation dataset to tune the different hyper-parameters using the CIRS phantom described in 2.6. Figure 9 is an example of this process where we chose the optimal number of training epochs. Ideally we would want the minimum MSE value and the maximum FSIM and SSIM to occur at the same epoch, however, Fig. 9 suggests that considering all three metrics, epoch 820 (dashed red line) provides a good balance of a low MSE value alongside high SSIM and FSIM values which are desirable for our task. This is due to the fact that while low MSE corresponds to the removal of the background noise, SSIM and FSIM correspond to the fidelity of the output signals’ shape compared to the true signals. These 820 training epochs took an average of 28 seconds each, for a total training time of about 6.5 hours.

Fig. 9. Metrics for the validation dataset. The dashed red line indicates the epoch chosen to be applied to our test dataset. Although this epoch is not at the maximum or minimum value of any of the metrics, it achieves an appropriate balance between the three. Note that GAN optimum performance is not necessarily monotone with the number of epoch, and this is known in the literature [41].

Download Full Size | PDF

3.2 RF data

In applying our metrics, we compared our reference RF dataset to the other possible approaches of only frame-averaging, only SVD denoising, and finally GAN denoising. The results are the mean values from 50 imaging locations along the ABUS dome. Additionally, we have also added a 10-frame averaged case for comparison purposes to the 20 frames averaged case. These results are summarized in Table 1 for the knot phantom, and in Table 2 for the vessel phantom. Figure 10 and Fig. 11 illustrates sample denoised frames of the knot and vessel phantom, respectively, using the different approaches for one of the aforementioned 50 imaging locations. We note that increasing the number of averaged frames beyond 20, we found diminishing returns with respect to our quality metrics which did not justify the proportional increase in scan time. A plot of this relationship is included in Figure S1.

Fig. 10. Testing data showing the denoising techniques considered applied to the knot phantom. The blue plot in each frame is an A-line profile of the frame at the location of the dashed white line. The absolute values of the RF data are displayed. The A-lines portray the real range of the data. The green arrow points to the signal at sample 400 for comparison. The red arrow points to the straight line artifact in denoising cases where it has not been removed. Note that the SVD denoised and GAN denoised are single-frame results without temporal averaging.

Download Full Size | PDF

Fig. 11. Testing data showing the denoising techniques considered applied to the vessel phantom. The blue plot in each frame is an A-line profile of the frame at the location of the dashed white line. The absolute values of the RF data are displayed. The A-lines portray the real range of the data. Note that the SVD denoised and GAN denoised are single-frame results without temporal averaging.

Download Full Size | PDF

Table 1. Comparison of denoising methods for the knot phantom RF data. Each row represents the mean of 50 imaging locations around the ABUS dome. Best results are bolded.

View Table | View all tables in this article

Table 2. Comparison of denoising methods for the vessel phantom RF data. Each row represents the mean of 50 imaging locations around the ABUS dome. Best results are bolded.

View Table | View all tables in this article

3.3 Reconstruction using the RF data

Table 3 summarizes the reconstruction results of the knot phantom using the different RF data available with reconstructions resultant from the reference RF dataset as ground-truth. The 2D reconstructions are in the axial/lateral plane of the ABUS transducer at a resolution of 0.1 mm. We achieved convergent FISTA reconstructions with the regularizer parameter $\lambda _2=0.01$ after 20 iterations of Algorithm 1. By examining the minimization objective as a function of iteration number, we can see that there is little improvement beyond 20 iterations. An example of this relationship is included in Figure S2. As before, these results are the mean of 50 imaging locations with Figs. 12 and 13 providing examples for FBP and iterative reconstructions at one of these imaging locations.

Fig. 12. 2D FBP reconstruction of the knot phantom data after the RF data has been processed using different denoising approaches. The line plot in each image is a profile plot along the dashed white line. Reference refers to reconstructions using the reference RF dataset.

Download Full Size | PDF

Fig. 13. 2D FISTA reconstruction of the knot phantom data after the RF data has been processed using different denoising approaches. The line plot in each image is a profile plot along the dashed white line. Reference refers to reconstructions using the reference RF dataset.

Download Full Size | PDF

Table 3. Comparison of denoising methods and their effects on the reconstructed images of the knot phantom. Each row represents the mean of 50 imaging locations around the ABUS dome. Best results are bolded.

View Table | View all tables in this article

4. Discussion

For the knot phantom, as shown in Table 1, the GAN outperforms all of the other denoising cases except in its SSIM value which is slightly below that of 20 frames averaged. This is mainly due to the GAN being trained on the SVD and 20 frames averaged combination which is the theoretical upper limit of its performance. In Table 3, we can see that the reconstructions of the knot phantom data denoised using the GAN outperform all of the other cases, both in FBP and FISTA. This is significant since all of these performance benefits come at a fraction of a time of the frame averaged cases due to the GAN needing only a single frame of data per imaging location. These results are consistent with characteristics of the metrics used. While MSE mainly focuses on the absolute error between images, SSIM and FSIM focus on the structural differences and similarities [42].

Additionally, if we take a closer look at Fig. 10, we can see that the GAN and SVD are the only two cases that have managed to remove the straight line noise band visible at sample 550 (indicated in the A-lines in Fig. 10 with a red arrow). However, the GAN displays less background noise which is consistent with the results of Table 1, one such example being the signal at sample 400 (indicated in the A-lines in Fig. 10 with a green arrow) which is strongest in the GAN output. Once again this is due to the fact that the GAN learns the optimal behaviour from both frame averaging and SVD denoising since it has been trained on their combination. Similar behaviour is also seen in the reconstruction results in Fig. 12 and Fig. 13. The effects of the leftover noise bands can clearly be seen in the FBP($\mathbf {p_{D,10-avg}}$), FBP($\mathbf {p_{D,20-avg}}$), FISTA($\mathbf {p_{D,10-avg}}$), and FISTA($\mathbf {p_{D,10-avg}}$) cases as circular arcs, while the increased background noise can be seen in the FBP($\mathbf {p_{D,SVD}}$) and FISTA($\mathbf {p_{D,SVD}}$) cases.

Despite being significantly different from the training data, we see that the GAN performed similarly well on the vessel phantom data. Qualitatively, we see again that the GAN denoising removed the worst noise bands from the RF data, as is clear in the A-line plots in Fig. 11. Table 2 shows that the GAN performed best with respect to FSIM, and very near to the best result with respect to MSE and SSIM.

The timing benefits of using only a single frame of data for imaging will result in a better PAI frame rate as the GAN model can be loaded prior to the imaging session and denoising a frame takes on average 0.3 seconds on an Nvidia GeForce GTX 1060 GPU using the PyTorch library. This computation time will be reduced once inference-only libraries like TensorRT are used in the clinical deployment stage [43]. This is similar to our GPU implementation of SVD denoising, and since our 384-element transducer requires three 128-element acquisitions per frame (with one laser pulse each), we are still limited by the 10 Hz repetition rate of our laser. Additionally, averaging several frames per imaging location will most likely cause distortions in the data as clinical imaging subjects, i.e. human organs, will move throughout the data acquisition, therefore imaging only a single frame per location provides additional benefits.

Future work will benefit from a set of training data that is independent of both the SVD and frame averaging. One possible approach would be to use simulation data and add noise for training purposes; however, adding sensor specific noise to simulation data correctly will bring forth a new set of challenges. Ultimately, this will enable the GAN to overcome the shortcomings that might arise from using either the SVD or averaging. Further, while the effectiveness of this denoising method is fundamentally limited by the SNR of the training data, it would be useful to explore the absolute limits of this technique in terms of the minimum SNR which still allows recovery of photoacoustic signals. We are exploring the application of this technique to photoacoustic data acquired using a laser diode illuminator, where significant frame averaging is usually required to attain an SNR similar to pulsed laser systems.

While this technique is promising based on the metrics we have chosen, it will be important moving forward to further study the specific effects of this method on the RF data and corresponding reconstructions to ensure, for example, that the attainable resolution is not affected, or that certain reconstruction methods are not rendered less effective.

Finally, for this method to be useful, it must be tested on in-vivo data, which raises the question of how one would gather sufficient training data for such a study. While it would be prohibitively time consuming to acquire such a dataset from a single patient, we are hopeful that smaller scans of a modest cohort of patients or volunteers (10 to 15) would provide a sufficient dataset to train our model. Whether or not such a model would be robust enough for the high variability of in-vivo data will be the ultimate test of this technique. Our tests on the vessel phantom suggest that some out-of-distribution data can be effectively denoised, providing some promise in this regard.

5. Conclusion

In this paper we have shown that using a GAN to denoise pre-beamformed RF data is a viable option both for removing general background noise for which frame-averaging is standard [16]. The approach can be used to remove artifacts where algorithms such the SVD denoising are traditionally employed. Using SVD and frame-averaging as our training standard, we have improved upon the performance of single frame SVD output, producing results similar to SVD and frame averaging combined. Additionally we have shown that improving the quality of the RF data results in improvements in the corresponding reconstructions.

Acknowledgements

The authors acknowledge funding from the Charles Laszlo Chair in Biomedical Engineering held by Professor Salcudean. Corey Kelly acknowledges funding support from the Walter C Sumner Foundation. We thank Yanan Shao for valuable contributions to the editing of this manuscript.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. R. A. Kruger, D. R. Reinecke, and G. A. Kruger, “Thermoacoustic computed tomography-technical considerations,” Med. Phys. 26(9), 1832–1837 (1999). [CrossRef]

2. M. Xu and L. V. Wang, “Photoacoustic imaging in biomedicine,” Rev. Sci. Instrum. 77(4), 041101 (2006). [CrossRef]

3. L. Lin, P. Hu, X. Tong, S. Na, R. Cao, X. Yuan, D. C. Garrett, J. Shi, K. Maslov, and L. V. Wang, “High-speed three-dimensional photoacoustic computed tomography for preclinical research and clinical translation,” Nat. Commun. 12(1), 882 (2021). [CrossRef]

4. N. Nyayapathi and J. Xia, “Photoacoustic imaging of breast cancer: a mini review of system design and image features,” J. Biomed. Opt. 24(12), 1 (2019). [CrossRef]

5. X. Wang, W. W. Roberts, P. L. Carson, D. P. Wood, and J. B. Fowlkes, “Photoacoustic tomography: a potential new tool for prostate cancer,” Biomed. Opt. Express 1(4), 1117 (2010). [CrossRef]

6. J. R. Rajian, G. Girish, and X. Wang, “Photoacoustic tomography to identify inflammatory arthritis,” J. Biomed. Opt. 17(9), 0960131 (2012). [CrossRef]

7. Z. Xie, Y. Yang, Y. He, C. Shu, D. Chen, J. Zhang, J. Chen, C. Liu, Z. Sheng, H. Liu, J. Liu, X. Gong, L. Song, and S. Dong, “In vivo assessment of inflammation in carotid atherosclerosis by noninvasive photoacoustic imaging,” Theranostics 10(10), 4694–4704 (2020). [CrossRef]

8. L. Xi, S. R. Grobmyer, L. Wu, R. Chen, G. Zhou, L. G. Gutwein, J. Sun, W. Liao, Q. Zhou, H. Xie, and H. Jiang, “Evaluation of breast tumor margins in vivo with intraoperative photoacoustic imaging,” Opt. Express 20(8), 8726 (2012). [CrossRef]

9. S. Abbasi, K. Bell, B. Ecclestone, and P. H. Reza, “Live feedback and 3D photoacoustic remote sensing,” Quant. Imaging Med. Surg. 11(3), 1033–1045 (2020). [CrossRef]

10. A. Rosencwaig and A. Gersho, “Theory of the photoacoustic effect with solids,” J. Appl. Phys. 47(1), 64–69 (1976). [CrossRef]

11. R. Paridar, M. Mozaffarzadeh, M. Mehrmohammadi, M. Basij, and M. Orooji, “Delay-multiply-and-standard-deviation weighting factor improves image quality in linear-array photoacoustic tomography,” in Photons Plus Ultrasound: Imaging and Sensing 2019, vol. 10878A. A. Oraevsky and L. V. Wang, eds., International Society for Optics and Photonics (SPIE, 2019), pp. 601–607.

12. M. Xu and L. Wang, “Universal back-projection algorithm for photoacoustic computed tomography,” Phys. Rev. E 71(1), 016706 (2005). [CrossRef]

13. C. Huang, K. Wang, L. Nie, L. V. Wang, and M. A. Anastasio, “Full-wave iterative image reconstruction in photoacoustic tomography with acoustically inhomogeneous media,” IEEE Trans. Med. Imaging 32(6), 1097–1110 (2013). [CrossRef]

14. J. Poudel, Y. Lou, and M. A. Anastasio, “A survey of computational frameworks for solving the acoustic inverse problem in three-dimensional photoacoustic computed tomography,” Phys. Med. Biol. 64(14), 14TR01 (2019). [CrossRef]

15. J. Poudel, S. Na, L. V. Wang, and M. A. Anastasio, “Iterative image reconstruction in transcranial photoacoustic tomography based on the elastic wave equation,” Phys. Med. Biol. 65(5), 055009 (2020). [CrossRef]

16. J. Li, B. Yu, W. Zhao, and W. Chen, “A review of signal enhancement and noise reduction techniques for tunable diode laser absorption spectroscopy,” Appl. Spectrosc. Rev. 49(8), 666–691 (2014). [CrossRef]

17. R. Manwar, M. Hosseinzadeh, A. Hariri, K. Kratkiewicz, S. Noei, and M. N. Avanaki, “Photoacoustic signal enhancement: Towards utilization of low energy laser diodes in real-time photoacoustic imaging,” Sensors 18(10), 3498 (2018). [CrossRef]

18. E. R. Hill, W. Xia, M. J. Clarkson, and A. E. Desjardins, “Identification and removal of laser-induced noise in photoacoustic imaging using singular value decomposition,” Biomed. Opt. Express 8(1), 68–77 (2017). [CrossRef]

19. C. Kelly, A. Refaee, and S. E. Salcudean, “Integrating photoacoustic tomography into a multimodal automated breast ultrasound scanner,” J. Biomed. Opt. 25(11), 1–18 (2020). [CrossRef]

20. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, vol. 27Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Q. Weinberger, eds. (Curran Associates, Inc., 2014), pp. 2672–2680.

21. M. Mirza and S. Osindero, “Conditional generative adversarial nets,” ArXiv e-prints pp. 1–7 (2014).

22. A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath, “Generative adversarial networks: An overview,” IEEE Signal Process. Mag. 35(1), 53–65 (2018). [CrossRef]

23. P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” Proc. - 30th IEEE Conf. on Comput. Vis. Pattern Recognition, CVPR 2017 2017-Janua, 5967–5976 (2017).

24. J. Gröhl, M. Schellenberg, K. Dreher, and L. Maier-Hein, “Deep learning for biomedical photoacoustic imaging: a review,” Photoacoustics 22, 100241 (2021). [CrossRef]

25. E. M. A. Anas, H. K. Zhang, C. Audigier, and E. M. Boctor, “Robust photoacoustic beamforming using dense convolutional neural networks,” in Simulation, Image Processing, and Ultrasound Systems for Assisted Diagnosis and Navigation, D. Stoyanov, Z. Taylor, S. Aylward, J. M. R. Tavares, Y. Xiao, A. Simpson, A. Martel, L. Maier-Hein, S. Li, H. Rivaz, I. Reinertsen, M. Chabanas, and K. Farahani, eds. (Springer International Publishing, 2018), pp. 3–11.

26. E. M. A. Anas, H. K. Zhang, J. Kang, and E. Boctor, “Enabling fast and high quality led photoacoustic imaging: a recurrent neural networks based approach,” Biomed. Opt. Express 9(8), 3852–3866 (2018). [CrossRef]

27. A. Hariri, K. Alipour, Y. Mantri, J. P. Schulze, and J. V. Jokerst, “Deep learning improves contrast in low-fluence photoacoustic imaging,” Biomed. Opt. Express 11(6), 3360–3373 (2020). [CrossRef]

28. H. Lan, K. Zhou, C. Yang, J. Cheng, J. Liu, S. Gao, and F. Gao, “Ki-GAN: knowledge infusion generative adversarial network for photoacoustic image reconstruction in vivo,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, (Springer, 2019), pp. 273–281.

29. T. Vu, M. Li, H. Humayun, Y. Zhou, and J. Yao, “A generative adversarial network for artifact removal in photoacoustic computed tomography with a linear-array transducer,” Exp. Biol. Med. 245(7), 597–605 (2020). [CrossRef]

30. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI), vol. 9351 of LNCS (Springer, 2015), pp. 234–241. (available on arXiv:1505.04597 [cs.CV]).

31. K. Wang, C. Huang, Y.-J. Kao, C.-Y. Chou, A. A. Oraevsky, and M. A. Anastasio, “Accelerating image reconstruction in three-dimensional optoacoustic tomography on graphics processing units,” Med. Phys. 40(2), 023301 (2013). [CrossRef]

32. A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm,” Soc. for Ind. Appl. Math. J. on Imaging Sci. 2, 183–202 (2009).

33. A. Beck and M. Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems,” IEEE Trans. on Image Process. 18(11), 2419–2434 (2009). [CrossRef]

34. A. Chambolle, “An Algorithm for Total Variation Minimization and Applications,” J. Math. Imaging Vis. 20(1/2), 73–87 (2004). [CrossRef]

35. Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]

36. L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Trans. on Image Process. 20(8), 2378–2386 (2011). [CrossRef]

37. M. U. Müller, N. Ekhtiari, R. M. Almeida, and C. Rieke, “Super resolution of multispectral satellite images using convolutional neural networks,” ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci. V-1-2020, 33–40 (2020). [CrossRef]

38. P. Kovesi, “Image features from phase congruency,” Videre: J. Comput. Vis. Res. 1, 1–26 (1999).

39. M. Ai, W. Shu, T. Salcudean, R. Rohling, P. Abolmaesumi, and S. Tang, “Design of high energy laser pulse delivery in a multimode fiber for photoacoustic tomography,” Opt. Express 25(15), 17713 (2017). [CrossRef]

40. Y. Lou, W. Zhou, T. P. Matthews, C. M. Appleton, and M. A. Anastasio, “Generation of anatomically realistic numerical phantoms for photoacoustic and ultrasonic breast imaging,” J. Biomed. Opt. 22(4), 041015 (2017). [CrossRef]

41. I. Goodfellow, “Nips 2016 tutorial: Generative adversarial networks,” (2017).

42. L. Zhang, L. Zhang, X. Mou, and D. Zhang, “A comprehensive evaluation of full reference image quality assessment algorithms,” in 2012 19th IEEE International Conference on Image Processing, (2012), pp. 1477–1480.

43. H. Vanholder, “Efficient inference with tensorrt,” (2016).

Denoising Category	MSE (%)	SSIM	FSIM
Noisy Data	0.28	0.50	0.54
10 Frames Averaged	0.07	0.72	0.71
20 Frames Averaged	0.06	0.78	0.74
SVD Denoised	0.17	0.37	0.75
GAN Denoised	0.05	0.76	0.85

Denoising Category	MSE (%)	SSIM	FSIM
Noisy Data	0.31	0.43	0.50
10 Frames Averaged	0.16	0.47	0.69
20 Frames Averaged	0.17	0.46	0.71
SVD Denoised	0.88	0.15	0.71
GAN Denoised	0.17	0.44	0.84

Denoising Category	MSE(%)
	FBP	FISTA	FBP	FISTA	FBP	FISTA
Noisy Data	2.00	0.51	0.40	0.48	0.69	0.69
10 Frames Averaged	0.85	0.11	0.43	0.64	0.75	0.74
20 Frames Averaged	0.51	0.08	0.49	0.73	0.79	0.76
SVD Denoised	0.15	0.05	0.58	0.71	0.84	0.78
GAN Denoised	0.18	0.05	0.66	0.80	0.83	0.80

Denoising Category	MSE (%)	SSIM	FSIM
Noisy Data	0.28	0.50	0.54
10 Frames Averaged	0.07	0.72	0.71
20 Frames Averaged	0.06	0.78	0.74
SVD Denoised	0.17	0.37	0.75
GAN Denoised	0.05	0.76	0.85

Denoising Category	MSE (%)	SSIM	FSIM
Noisy Data	0.31	0.43	0.50
10 Frames Averaged	0.16	0.47	0.69
20 Frames Averaged	0.17	0.46	0.71
SVD Denoised	0.88	0.15	0.71
GAN Denoised	0.17	0.44	0.84

Denoising of pre-beamformed photoacoustic data using generative adversarial networks

Abstract

1. Introduction

2. Methods

2.1 Photoacoustic model

2.2 Reference RF dataset

2.3 GAN denoising

2.4 Reconstructions

2.4.1 Filtered back-projection

2.4.2 Fast iterative shrinkage thresholding algorithm

2.5 Metrics

2.6 Data acquisition

3. Results

3.1 Model hyperparameter tuning

3.2 RF data

3.3 Reconstruction using the RF data

4. Discussion

5. Conclusion

Acknowledgements

Disclosures

Data availability

Supplemental document

References

Supplementary Material (1)

Data availability

Cited By

Figures (13)

Tables (4)

Equations (14)

Biomedical Optics Express