
Enabling fast and high quality LED photoacoustic imaging: a recurrent neural networks based approach

Open Access

Abstract

Photoacoustic (PA) techniques have shown promise in the imaging of tissue chromophores and exogenous contrast agents in various clinical applications. However, the key drawback of current PA technology is its dependence on a complex and hazardous laser system for the excitation of a tissue sample. Although light-emitting diodes (LED) have the potential to replace the laser, the image quality of an LED-based system is severely corrupted due to the low output power of LED elements. The current standard way to improve the quality is to increase the scanning time, which leads to a reduction in the imaging speed and makes the images prone to motion artifacts. To address the challenges of longer scanning time and poor image quality, in this work we present a deep neural networks based approach that exploits the temporal information in PA images using a recurrent neural network. We train our network using 32 phantom experiments; on the test set of 30 phantom experiments, we achieve a gain in the frame rate of 8 times with a mean peak-signal-to-noise-ratio of 35.4 dB compared to the standard technique.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Photoacoustic (PA) imaging is a combination of optical (photo) and ultrasound (acoustic) imaging techniques; it has the ability to exploit the spectral contrast of optical imaging as well as the spatial resolution of ultrasound. The underlying principle of this imaging technology is the photoacoustic effect [1], which describes the generation of acoustic waves following the absorption of light or near-infrared energy in a tissue sample. Endogenous tissue chromophores (e.g., hemoglobin in blood, melanin in skin) in human organs usually demonstrate this phenomenon due to their high optical absorption coefficients [2]. The mapping of optical absorption properties could, therefore, provide important anatomical and functional information that may be useful in many clinical applications, e.g., ischemic stroke, breast cancer, functional brain mapping or skin melanomas [3–5]. In addition to tissue chromophores, PA technology has been successfully used for imaging of exogenous contrast agents in a number of applications, including molecular imaging and prostate cancer detection [6,7].

The PA imaging work-flow starts with the excitation of a soft-tissue sample by a short, high-intensity light pulse, followed by local transient thermo-elastic expansion that eventually leads to the generation of acoustic waves, usually known as the PA signal. To excite a tissue sample, the current standard technology uses a heavy and complex laser system, e.g., an Nd:YAG or Ti:Sapphire laser [8]. The main advantage of such lasers is the ability to generate high-energy light pulses; the typical pulse energy is in the range of mJ, with a pulse repetition frequency (number of pulses per second) of 10–30 Hz. However, due to the high intensity of the laser source, protective measures (e.g., wearing glasses and gloves) must be taken by the operator to avoid potential health hazards. In addition, such high intensity may decrease the optical contrast of exogenous contrast agents due to the photobleaching effect [9].

A light-emitting diode (LED) is a potential alternative to a laser as the light source; it is inexpensive, portable and safe. However, the key drawback of an LED light source is its limited output power: even a series of LEDs can only generate pulse energies in the range of μJ. As a result, the received PA signal of an LED-based system suffers significantly from a low signal-to-noise ratio (SNR). To improve the SNR, the current technology acquires multiple frames of PA signals and subsequently averages over them to minimize the noise. Though the pulse repetition frequency of an LED-based system is much higher (in the range of kHz), averaging over a large number of frames, typically thousands, reduces the effective frame rate of PA images. In addition, a large number of averaging frames requires a longer scanning time that can lead to motion artifacts in the reconstructed images. A few studies have attempted to improve the SNR while keeping the number of averaged frames low using standard signal processing approaches, e.g., adaptive denoising [10], empirical mode decomposition [11], wavelet transform [12], Wiener deconvolution [13] or coded excitation [14–16].

In recent years, deep neural networks based approaches have outperformed the previous state-of-the-art signal and image processing techniques in various computer vision tasks, including image classification [17,18], image segmentation [19], image denoising [20] and image super-resolution [21–24]; the latter two categories closely fit our problem of image enhancement. The published image enhancement techniques are based on stacked denoising auto-encoders [20], densely connected convolutional networks [24], or the inclusion of a perceptual loss to enhance the spatial structure of images [22]. In addition to computer vision, neural networks based techniques have been reported in various PA applications, e.g., image reconstruction in PA tomography [25–27] and the elimination of reflection artifacts in PA images [28].

In this work, we present a deep neural networks based image enhancement approach to improve the quality of LED-based PA images while reducing the scanning time. Our proposed architecture consists of two key components: a convolutional neural network (CNN) to extract the spatial features, and a recurrent neural network (RNN) to leverage the temporal information in PA images. The CNN is built upon a state-of-the-art densenet-based architecture [24] that uses a series of skip connections to enhance the image content. For the RNN component, we use a convolutional variant of long short-term memory [29,30] to exploit the temporal dependencies in a given PA image sequence. For effective feature propagation and elimination of vanishing gradients, we incorporate skip connections throughout the network, including both the CNN and RNN components.

2. Methods

We start this section by describing the CNN-based image enhancement technique in Section 2.1, which can be used to improve the quality of a single PA image. Next, in Section 2.2, we present how a recurrent connection is incorporated in the proposed network to exploit the temporal dependencies in a given PA image sequence for a further quality improvement.

2.1. Single image enhancement by CNN

Figure 1(a) shows the densenet-based CNN architecture [24] used to improve the quality of a single PA image. The network takes a low quality PA image as input and generates a high quality PA image as output. The details of generating a PA image using a standard beamforming technique are given in Section 3.1. For convenience, the number of feature maps in each convolutional layer is indicated as ‘xx’ in ‘Conv xx’ in Fig. 1(a). The architecture consists of three dense blocks, where each dense block consists of two densely connected 3 × 3 convolutional layers followed by rectified linear units (ReLU). Unlike [24], we do not use any upsampling layers in the architecture, as the sizes of the input and output images are the same (168 × 168) in our case. The key advantage of a dense convolutional layer is that it uses all of the features generated by previous layers as inputs through skip connections; features are therefore propagated more effectively, which helps eliminate the vanishing gradient problem of deep networks [31]. Finally, to estimate the output image, all of the features from the dense blocks are concatenated, and a convolution with one feature map is performed at the end.
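To make the dense-block idea concrete, the following is a minimal Keras-style sketch of such an enhancement CNN. The three dense blocks, the two 3 × 3 convolution + ReLU layers per block, the final concatenation and the single-feature-map output follow the description above; the specific filter counts (16) and names are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of a densenet-style enhancement CNN (illustrative filter counts).
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, growth=16, n_layers=2):
    # Each 3x3 conv layer receives the concatenation of all previous feature maps.
    features = [x]
    for _ in range(n_layers):
        inp = features[0] if len(features) == 1 else layers.Concatenate()(features)
        features.append(layers.Conv2D(growth, 3, padding="same", activation="relu")(inp))
    return layers.Concatenate()(features)

def build_cnn(input_shape=(168, 168, 1), n_blocks=3):
    image = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(image)
    block_outputs = [x]
    for _ in range(n_blocks):
        x = dense_block(x)
        block_outputs.append(x)
    # Concatenate all dense-block outputs, then predict with a single feature map.
    merged = layers.Concatenate()(block_outputs)
    output = layers.Conv2D(1, 3, padding="same")(merged)
    return tf.keras.Model(image, output)

cnn_only = build_cnn()   # input and output are both 168 x 168 images
```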

Fig. 1 A schematic of our proposed approach. (a) The densenet-based CNN architecture to improve the quality of a single PA image. The architecture consists of three dense blocks; each dense block includes two 3 × 3 dense convolutional layers followed by rectified linear units. (b) A schematic of the ConvLSTM cell. In addition to the current input Xt, it exploits the previous hidden and cell states to generate the current states. (c) The proposed architecture that integrates the CNN and ConvLSTM to extract the spatial features and the temporal dependencies, respectively. We also incorporate dense skip connections throughout the network for an improved prediction. The increasing numbers of feature maps are indicated in the figure.

2.2. Recurrent networks for temporal features

The RNN [32,33] is an established and powerful tool for extracting temporal dependencies in a given sequence. The vanilla RNN is usually difficult to train on long sequences due to vanishing or exploding gradients. Several improved variants of the RNN have been reported, and long short-term memory (LSTM) [29] has shown the most promising performance in various applications.

ConvLSTM [30] is an extension of LSTM that leverages the convolution operation to extract temporal features from a series of 2D maps. As shown in Fig. 1(b), it takes the current input Xt, the previous cell state Ct−1 and the previous hidden state Ht−1, and subsequently produces the current cell state Ct and the current hidden state Ht:

$$i_t = \sigma(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i),$$
$$f_t = \sigma(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f),$$
$$o_t = \sigma(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o),$$
$$C_t = f_t \circ C_{t-1} + i_t \circ \tanh(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c),$$
$$H_t = o_t \circ \tanh(C_t).$$
Here, * and ∘ denote the convolution operation and the element-wise (Hadamard) product, respectively. it, ft and ot represent the input, forget and output gates, respectively. Wxi, Whi, Wxf, Whf, Wxc, Whc, Wxo, Who, bi, bf, bo and bc are the convolutional parameters used to model the 2D time-series sequence. σ denotes the sigmoid function, and tanh is used as the activation function.
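As an illustration of these update equations, the following is a minimal single-channel NumPy sketch of one ConvLSTM step. The random 3 × 3 kernels, the zero biases and the toy four-frame sequence are placeholders; a trained layer would learn these parameters.

```python
# A direct, single-channel NumPy sketch of one ConvLSTM step (equations above).
import numpy as np
from scipy.signal import convolve2d

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(X_t, H_prev, C_prev, W, b):
    """W holds 3x3 kernels; '*' is 2D convolution, 'o' is the element-wise product."""
    conv = lambda img, ker: convolve2d(img, ker, mode="same")
    i_t = sigmoid(conv(X_t, W["xi"]) + conv(H_prev, W["hi"]) + b["i"])
    f_t = sigmoid(conv(X_t, W["xf"]) + conv(H_prev, W["hf"]) + b["f"])
    o_t = sigmoid(conv(X_t, W["xo"]) + conv(H_prev, W["ho"]) + b["o"])
    C_t = f_t * C_prev + i_t * np.tanh(conv(X_t, W["xc"]) + conv(H_prev, W["hc"]) + b["c"])
    H_t = o_t * np.tanh(C_t)
    return H_t, C_t

rng = np.random.default_rng(0)
W = {name: rng.normal(scale=0.1, size=(3, 3))
     for name in ["xi", "hi", "xf", "hf", "xo", "ho", "xc", "hc"]}
b = {name: 0.0 for name in ["i", "f", "o", "c"]}
H = C = np.zeros((168, 168))
for X in rng.normal(size=(4, 168, 168)):   # a toy sequence of 4 frames
    H, C = convlstm_step(X, H, C, W, b)
```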

Figure 1(c) shows our proposed architecture, which combines the CNN and ConvLSTM to improve the quality of PA images. The architecture takes a sequence of PA images as input, first uses the CNN to extract spatial features and subsequently applies the ConvLSTM to exploit the temporal dependencies. For the recurrent connection, we use two ConvLSTM layers with skip connections. At the end, we concatenate all of the features generated in the previous layers to predict the final output. The details of training the network with an input sequence and a high quality target image are described in Section 3.3.
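A compact sketch of this combined pipeline, assuming TensorFlow's Keras API, is shown below. For brevity the per-frame CNN is reduced to a single TimeDistributed convolution rather than the full dense blocks of Fig. 1(a), and the filter counts are illustrative; only the overall structure (per-frame spatial features, two ConvLSTM layers with a skip connection, and a one-feature-map prediction) mirrors the description above.

```python
# Minimal sketch of the combined CNN + ConvLSTM network (illustrative filter counts).
import tensorflow as tf
from tensorflow.keras import layers

seq = tf.keras.Input(shape=(None, 168, 168, 1))            # (time, height, width, channels)
feat = layers.TimeDistributed(
    layers.Conv2D(16, 3, padding="same", activation="relu"))(seq)   # per-frame spatial features
r1 = layers.ConvLSTM2D(16, 3, padding="same", return_sequences=True)(feat)
r1 = layers.Concatenate()([feat, r1])                       # skip connection into the next layer
r2 = layers.ConvLSTM2D(16, 3, padding="same", return_sequences=False)(r1)
out = layers.Conv2D(1, 3, padding="same")(r2)               # final high-quality image estimate
rnn_cnn = tf.keras.Model(seq, out)
```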

3. Experiments, materials and training

3.1. Experiments and image reconstruction

We conducted an experimental study to train the proposed network as well as to evaluate its performance. The excitation source in our experimental setup consisted of two LED matrices (PreXionLED AcousticX, Tokyo, Japan) attached to both sides of the ultrasound transducer of our SonixDAQ system (BK Ultrasound, MA, USA). Each LED matrix consisted of 144 LEDs arranged in four rows. The imaging setup also included an LED driver to generate LED pulses with a pulse repetition frequency of 1 kHz, and a synchronizer to synchronize LED excitation and PA signal reception. An ultrasound transducer consisting of 128 piezoelectric elements with a pitch of 0.3 mm was used to acquire the raw pre-beamformed PA signal.

The experiments consisted of scanning phantoms and in vivo human fingers. For the phantom experiments, we acquired the PA signal for a duration of 11 s, which led to sets of 11,000 frames of pre-beamformed signals. It is important to note that we maintained a steady condition during the whole 11 s scanning period in the phantom experiments. In contrast, for the in vivo experiments, we were not able to maintain a steady condition, and acquired data for a shorter duration of 5 s. After acquisition, we averaged the PA signals over a number of frames N, performed beamforming using the delay-and-sum technique, and subsequently detected the envelope to reconstruct the PA image PAN.
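The reconstruction chain described here (frame averaging, delay-and-sum beamforming, envelope detection) can be sketched as follows. The 128-element array and 0.3 mm pitch follow the setup above, while the sampling rate, sound speed and pixel grid are assumed values, and the loop-based delay-and-sum is written for clarity rather than speed.

```python
# Minimal sketch of PA image reconstruction: average N frames, delay-and-sum, envelope.
import numpy as np
from scipy.signal import hilbert

FS, C0, PITCH, N_ELEM = 40e6, 1540.0, 0.3e-3, 128   # sampling rate and sound speed are assumed

def reconstruct(frames, N, z_axis, x_axis):
    """frames: (n_frames, n_samples, n_elements) raw pre-beamformed PA channel data."""
    rf = frames[:N].mean(axis=0)                          # average the first N frames
    elem_x = (np.arange(N_ELEM) - (N_ELEM - 1) / 2) * PITCH
    img = np.zeros((z_axis.size, x_axis.size))
    for iz, z in enumerate(z_axis):
        for ix, x in enumerate(x_axis):
            # One-way delay from pixel (x, z) to each element (PA waves travel one way).
            d = np.sqrt((x - elem_x) ** 2 + z ** 2)
            idx = np.round(d / C0 * FS).astype(int)
            valid = idx < rf.shape[0]
            img[iz, ix] = rf[idx[valid], np.arange(N_ELEM)[valid]].sum()
    return np.abs(hilbert(img, axis=0))                   # envelope detection along depth
```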

Two kinds of phantoms were used in our study: wire and magnetic nanoparticle phantoms. We chose wires and magnetic nanoparticles as optical contrast agents due to their high optical absorption coefficients. For the wire phantom, we acquired a total of 62 sets of PA data corresponding to 62 different image planes, where each set provided a total of 11,000 frames. For the gold magnetic nanoparticle phantom, a schematic is provided in Fig. 2. This phantom was specially designed to investigate the sensitivity of our proposed approach with respect to possible variations of imaging depth, contrast of the targeted objects and optical scattering of the imaging medium (details in Section 4.3). The phantom was built with five cylindrical tubes (cross-section in Fig. 2) located at three different depths. Tubes 1–3 were located at successively decreasing depths, with the same concentration of nanoparticles (10 nM) inside each tube. In addition to tube 3, two tubes (nos. 4 and 5) were placed at the shallowest depth with concentrations of 5 nM and 2.5 nM, respectively; i.e., the concentration successively decreased from left (tube 3) to right (tube 5). Note that we used the nanoparticles with the highest concentration at the different depths for the sensitivity analysis, since our existing LED-based PA system is not sensitive enough to image the nanoparticles with lower concentrations (e.g., 2.5 nM) at greater depths (> 2 cm). In the first part of the experiment, we used water as the phantom medium. After acquisition of one set of PA signals of 11,000 frames, we mixed a very small amount of milk (1% of the total volume of water) into the water in order to increase the optical scattering of the phantom medium, and then acquired another set of PA signals of 11,000 frames. For this phantom experiment, we acquired a total of 10 sets of PA data at 10 different image planes.

Fig. 2 A schematic of the cross-sectional view of the gold magnetic nanoparticle phantom. The phantom consists of five cylindrical tubes. Tubes 1–3 are placed at successively decreasing depths, with the same concentration of nanoparticles in each of them. In addition to tube 3, we include two more tubes (tubes 4 and 5) in the upper row, with the concentration successively decreasing from tube 3 to tube 5.

3.2. Materials

We divide all of our experimentally acquired data into three independent groups: training, validation and test sets. The training set is used for optimizing the network parameters; the validation set is used to fix the hyper-parameters of the architecture, which mainly include the number of dense blocks, the number of convolutional layers in each dense block and the number of ConvLSTM layers; and the test set is used to evaluate the proposed network. For the training and validation sets, we use a subset of the PA images from the wire phantom. Out of the 62 sets, we follow an approximately 50%/15%/35% split to categorize the wire phantom images into training, validation and test sets, respectively. In other words, the training, validation and test sets consist of 32, 10 and 20 sets of PA data from the wire phantom experiment, respectively. Since a deep network requires a large amount of data for effective training, we use most of our data for training and divide the remainder between the validation and test sets.

We have chosen the PA images from the wire phantom to train the neural networks because these images, with multiple point targets at different depths, allow the network to learn how to improve the quality of the point spread function in PA images. As a result, a network trained on point spread functions can be applied to an arbitrary optical target. To evaluate the performance of our technique on an independent test set, we form the test set from the remaining samples in our study, including 20 sets (different image planes) from the wire phantom and 10 sets from the gold nanoparticle experiment. Since we were not able to maintain a steady condition during the scanning period of the in vivo experiment, we did not use the in vivo data in any quantitative analysis; instead, we performed a qualitative analysis on them.

3.3. Training

3.3.1. Generation of input and target sequences

We train the network using input PA images of different qualities. To achieve this difference in quality, we exploit the proportional effect of the averaging frame number (N) on the image quality. Note that this proportional effect only holds when a steady condition is maintained during the scanning period. Since we maintain a steady condition during the scanning period in our phantom experiments, we can exploit this effect to generate PA images with different qualities. The values of N are, therefore, chosen from a range starting at a very low value and extending to the highest possible value of 11,000, where an increased value of N indicates a higher image quality.

For a chosen value of N = N0, we generate one input sequence consisting of PA images reconstructed with different averaging frame numbers ≤ N0. Mathematically, the averaging frame numbers in the sequence can be represented as {Ns, 2Ns, 3Ns, · · · , N0} corresponding to time indices {t1, t2, t3, · · · , tN0}; i.e., the averaging frame number starts at Ns at t = t1 and subsequently increases in steps of Ns frames to the highest value of N0 at t = tN0. Figure 3 shows an example of generating one sequence of length 4 for N0 = 800 and Ns = 200. The length of the input sequence is, therefore, floor(N0/Ns), where floor(x) indicates the largest integer less than or equal to x. Once the averaging frame numbers in a sequence are fixed, we can reconstruct the corresponding sequence of PA images.
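A minimal sketch of this sequence construction, using the values from the example in Fig. 3, could look like the following.

```python
# Build the list of averaging frame numbers {Ns, 2Ns, ..., N0} for one input sequence.
def averaging_sequence(N0, Ns=200):
    return list(range(Ns, N0 + 1, Ns))

print(averaging_sequence(800))   # [200, 400, 600, 800], i.e. length floor(N0 / Ns) = 4
```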

Fig. 3 An example of generating an input sequence of length 4 for N0 = 800 and Ns = 200. The sequence starts at 200, increases in steps of Ns, and ends at 800.

For a fixed N0, the length of the sequence depends on Ns. A lower value of Ns produces a longer sequence that may exceed the memory limit of our GPU during training. Based on our GPU memory size, we were able to set Ns = 200, which gives a maximum input sequence length of 55, corresponding to N0 = 11,000.

Given our collection of 11,000 frames for one phantom sample, we can achieve the highest possible quality PA image by reconstructing it from the signal averaged over all of the frames. Though averaging over 11,000 frames significantly improves the SNR of the PA images, a small amount of noise may still be present in the averaged signal. To suppress this residual noise without affecting the image structure, we apply an anisotropic diffusion filter to the reconstructed image PA11000. Anisotropic diffusion [34], also known as Perona-Malik diffusion, is a filtering technique that aims to suppress image noise while preserving image content (e.g., edges). Such filtering has been shown to improve image interpretation in a significant number of applications. In short, the filtering operation is not only space-variant but also data-driven, encouraging intra-region rather than inter-region smoothing. In our study, the filtered version of PA11000 is considered the gold-standard target image. Note that for each experiment we have only one gold-standard target image corresponding to more than one input sequence.
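For reference, a minimal NumPy sketch of Perona-Malik anisotropic diffusion is given below; the iteration count, conduction parameter kappa and step size gamma are assumed values (the paper does not report them), and the image is assumed to be normalized to [0, 1].

```python
# Minimal sketch of Perona-Malik anisotropic diffusion (assumed parameter values).
import numpy as np

def anisotropic_diffusion(img, n_iter=20, kappa=0.1, gamma=0.2):
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # Finite-difference gradients toward the four neighbours.
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        # Exponential conduction: smoothing is suppressed across strong edges.
        cn, cs = np.exp(-(dn / kappa) ** 2), np.exp(-(ds / kappa) ** 2)
        ce, cw = np.exp(-(de / kappa) ** 2), np.exp(-(dw / kappa) ** 2)
        u += gamma * (cn * dn + cs * ds + ce * de + cw * dw)
    return u
```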

3.3.2. Optimization

We use the mean square error between the predicted and gold-standard target PA images as the loss function. To minimize the loss function, the TensorFlow library (Google, Mountain View, CA) with the Adam [35] optimization technique is utilized. A summary of all of the steps of our training procedure is given below; a minimal code sketch of one such iteration follows the list:

  1. Start an iteration by randomly shuffling the collection of 11,000 frames.
  2. Choose a random integer N0 within 200–11,000, with Ns = 200.
  3. For the chosen N0, generate a sequence of averaging frame numbers {Ns, 2Ns, 3Ns, · · · , N0} corresponding to time indices {t1, t2, t3, · · · , tN0}. Note that the sequence length depends on N0 and may change after each iteration.
  4. Generate the PA image sequence based on the averaging frame numbers in the sequence. In addition, obtain the gold-standard target image by applying the anisotropic diffusion filter to PA11000.
  5. Perform random cropping of the input and target images, followed by resizing of the cropped images to 168 × 168 pixels.
  6. Given the input sequence and target image of the proposed network shown in Fig. 1(c), compute the loss function using forward propagation and subsequently update the network parameters using back-propagation.
  7. Go back to step (1) to start the next iteration until convergence is achieved.
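The sketch below illustrates one such training iteration, reusing the rnn_cnn model and the reconstruct/averaging_sequence helpers sketched earlier and assuming 168 × 168 reconstruction grids; the random cropping and resizing of step 5 are omitted for brevity.

```python
# Minimal sketch of one training iteration (steps 1-7 above), under the stated assumptions.
import numpy as np
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanSquaredError()

def train_step(frames, target, z_axis, x_axis, Ns=200):
    np.random.shuffle(frames)                                    # step 1: shuffle the frames
    N0 = Ns * np.random.randint(1, frames.shape[0] // Ns + 1)    # step 2: random N0 in 200-11,000
    seq = np.stack([reconstruct(frames, n, z_axis, x_axis)       # steps 3-4: image sequence
                    for n in averaging_sequence(N0, Ns)])
    x = tf.convert_to_tensor(seq[None, ..., None], tf.float32)   # shape (1, T, 168, 168, 1)
    y = tf.convert_to_tensor(target[None, ..., None], tf.float32)
    with tf.GradientTape() as tape:                              # step 6: forward pass and loss
        loss = loss_fn(y, rnn_cnn(x, training=True))
    grads = tape.gradient(loss, rnn_cnn.trainable_variables)     # step 6: back-propagation
    optimizer.apply_gradients(zip(grads, rnn_cnn.trainable_variables))
    return float(loss)
```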

4. Evaluation

4.1. Evaluation indices

For quantitative evaluation of the proposed approach on the test set, we use the signal-to-noise ratio (SNR), the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) as evaluation indices.

4.1.1. SNR

The SNR is defined as the ratio of the peak signal intensity to the standard deviation of the background intensities, in decibels. Unlike the PSNR and SSIM, the SNR does not require a reference signal/image; it is purely based on the signal strength and noise statistics. It is mathematically defined as:

$$\mathrm{SNR} = 20\log_{10}\!\left(\frac{\mu_I}{\sigma_b}\right),$$
where μI and σb represent the peak signal amplitude at the target and the standard deviation of the background, respectively.

4.1.2. PSNR

The PSNR is a conventional measure of image quality, in decibels (dB), based on the mean square error between the estimated and reference images:

$$\mathrm{PSNR} = 20\log_{10}\!\left(\frac{I_{\max}}{\sqrt{\mathrm{MSE}}}\right),$$
where
$$\mathrm{MSE} = \frac{1}{MN}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\left(I_{\mathrm{ref}}(m,n) - I_{\mathrm{est}}(m,n)\right)^2.$$
Here, Iref and Iest are the reference and estimated images, respectively, each of size M × N, and Imax indicates the maximum possible value in the given images.

4.1.3. SSIM

The SSIM measures the perceived quality of a digital image; a higher SSIM (with a maximum value of 1.0) indicates a perceptually better representation of the estimated image. Mathematically, the SSIM is defined as:

$$\mathrm{SSIM} = \frac{(2\mu_{\mathrm{ref}}\mu_{\mathrm{est}} + c_1)(2\sigma_{\mathrm{cov}} + c_2)}{(\mu_{\mathrm{ref}}^2 + \mu_{\mathrm{est}}^2 + c_1)(\sigma_{\mathrm{ref}}^2 + \sigma_{\mathrm{est}}^2 + c_2)}.$$
Here, μref (σref²) and μest (σest²) are the mean (variance) of the reference and estimated images, respectively, and σcov represents the covariance between them. The constants c1 and c2 stabilize the division with a weak denominator, and we use the same values as suggested in the original work [36].
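The three indices can be computed with a few lines of Python; the sketch below assumes scikit-image's implementations of PSNR and SSIM (which use the constants of the original SSIM work) and user-supplied target/background masks for the SNR.

```python
# Minimal sketch of the three evaluation indices (masks and data ranges supplied by the user).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def snr_db(image, target_mask, background_mask):
    """SNR = 20 * log10(peak target amplitude / background standard deviation)."""
    return 20 * np.log10(image[target_mask].max() / image[background_mask].std())

def psnr_db(ref, est):
    return peak_signal_noise_ratio(ref, est, data_range=ref.max() - ref.min())

def ssim_index(ref, est):
    return structural_similarity(ref, est, data_range=ref.max() - ref.min())
```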

4.2. Comparison

In addition to evaluating the proposed CNN+RNN approach, we provide comparisons with the simple averaging and densenet [24] based techniques. The simple averaging method is based solely on the averaging operation, i.e., it does not perform any further processing on the PA image reconstructed from the averaged PA signal. The densenet based technique, in contrast, uses a series of dense convolutional layers to improve the quality of a PA image; its architecture, shown in Fig. 1(a), does not use any recurrent connection to extract temporal features. We refer to this technique as the CNN-only method in the rest of the paper. Note that the comparison is performed for different values of the averaging frame number ≤ 11,000.

4.3. Sensitivity analysis

We also investigate the sensitivity of our proposed approach with respect to possible variations of imaging depth, contrast of the targeted objects and optical scattering of the imaging medium. As mentioned earlier, we used the experimental data from the gold magnetic nanoparticle phantom for this purpose; we select five regions of interest (ROIs), each of area 5 mm × 5 mm, around the tubes in Fig. 2, and then compare the SNRs among those ROIs.

4.4. Computation time

We compute the run-time of our proposed method using a GPU (NVIDIA GeForce GTX 1080 Ti) and a CPU (Intel Core i7-7700K @4.20GHz with 32 GB RAM).

5. Results

5.1. Quantitative comparison

Figures 4(a)–4(b) show a comparison of the PSNR and SSIM of our method (RNN+CNN) with those from the simple averaging and CNN-only techniques for different values of the averaging frame number. Each solid line in the figure indicates the mean of the evaluation index (e.g., PSNR or SSIM) computed from the 30 test samples for each of the competing methods, and the corresponding shaded region around the line represents the standard deviation of that evaluation index. We observe improved performance of our RNN+CNN method compared to both the simple averaging and CNN-only techniques for all values of the averaging frame number. On average we achieve improvements in PSNR of 5.9 dB and 2.9 dB with respect to the averaging and CNN-only techniques, respectively. It is also interesting to notice a higher improvement rate (with respect to the averaging frame number) for our technique than for the CNN-only method; in other words, the CNN-only method reaches saturation in image quality at higher averaging frame numbers.

Fig. 4 A comparison of the PSNR and SSIM of our (RNN+CNN) method with those from the simple averaging and CNN-only methods. (a) PSNR vs. averaging frame number. An improvement at all averaging frame numbers is noticed for our method compared to the two other methods. We can also observe a higher improvement rate of our method compared to the CNN-only method. (b) SSIM vs. averaging frame number. Unlike the CNN-only method, we observe an increasing trend of improvement with the averaging frame number for our method. (c) Gain in frame rate vs. mean PSNR.

To quantify the statistical significance of the improvement, we perform a Student's t-test between the evaluation indices of our method and each of the competing methods. Compared to the simple averaging technique, the p-values are less than 0.05 at all of the averaging frame numbers for both PSNR and SSIM. In contrast, for the CNN-only technique, we achieve p < 0.05 only at N > 4400.

We can also interpret the improvement in terms of gain in imaging speed. For this purpose, we choose the simple averaging technique as the reference method, i.e., we compute how much gain in frame rate we achieve with respect to the averaging approach. For example, at a mean PSNR of 35.4 dB, the proposed, CNN-only and averaging techniques need 1,360, 3,680 and 11,000 averaging frames, respectively (dotted line in Fig. 4(a)). If we consider the 11,000 frames (corresponding to 35.4 dB) of the averaging approach as the reference, then the proposed and CNN-only techniques achieve gains in frame rate of 11,000/1,360 ≈ 8.1 and 11,000/3,680 ≈ 3.0 times, respectively. By calculating the gains in frame rate at other mean PSNRs of the averaging approach, we can plot gain in frame rate vs. mean PSNR as in Fig. 4(c). Such a plot demonstrates how much gain in imaging speed we can achieve using our proposed RNN+CNN method; in addition, it shows the significant improvement in imaging speed achieved compared to the CNN-only method.
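As a small worked example of this calculation, using the frame numbers quoted above for a mean PSNR of 35.4 dB:

```python
# Frame-rate gain relative to simple averaging at a mean PSNR of 35.4 dB.
frames_needed = {"averaging": 11_000, "cnn_only": 3_680, "rnn_cnn": 1_360}
gains = {k: frames_needed["averaging"] / v for k, v in frames_needed.items()}
print(gains)   # {'averaging': 1.0, 'cnn_only': ~3.0, 'rnn_cnn': ~8.1}
```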

5.2. Qualitative comparison

Figures 5 and 6 show a qualitative comparison among all three competing methods for a wire phantom example and an in vivo example, respectively. We use three different values of the averaging frame number to demonstrate the comparison. In addition, we provide the gold-standard high quality PA image at the bottom of each figure. The phantom example in Fig. 5 is chosen such that the imaging plane captures the axis of a single wire. Note that this kind of imaging plane is not present in the training set. We can notice the ability of both neural network approaches (CNN-only and RNN+CNN), trained with point spread functions, to enhance a different target function in Fig. 5. The PSNRs obtained with the CNN-only method are 33.4 dB, 34.7 dB and 35.6 dB for averaging frame numbers of 200, 400 and 600, respectively, in this figure. For our RNN+CNN method, we achieve corresponding PSNRs of 34.1 dB, 35.2 dB and 36.1 dB for these three examples.

Fig. 5 Qualitative comparison of our method with the simple averaging and CNN-only techniques for a wire phantom example at three different values of the averaging frame number, where the imaging plane consists of a line object. Though the network has been trained using point spread functions, we can observe its robustness on a line target.

Fig. 6 A comparison of our method with the averaging and CNN-only techniques for an in vivo example. The in vivo data consist of the proper digital arteries of three fingers of a volunteer. We can notice improvements in our results compared to those of the other two methods in recovering the blood vessels (marked by arrows).

The in vivo example in Fig. 6 consists of the proper digital arteries of three fingers of a volunteer (the anatomy is shown at the bottom of the figure). For each finger, we were able to image at most three blood vessels, and we can notice improved detection of the blood vessels (shown by arrows) for our RNN+CNN approach. It is also evident that the PA image averaged from all 5,000 frames (labeled as high quality in the figure) contains some artifacts because of movement during the scanning period.

5.3. Sensitivity analysis

As mentioned earlier, the nanoparticle phantom in Fig. 2 is used for the sensitivity analysis. A summary of the analysis using our approach is provided below.

  • Imaging depth: For the depth analysis, we choose the imaging plane consisting of tubes 1–3 of the nanoparticle phantom (see Fig. 2). Figure 7 shows a comparison among the simple averaging, CNN-only and proposed techniques for three different values of the averaging frame number for that imaging plane. The PA images in Fig. 7 indicate that a larger number of averaging frames is needed to detect an object at greater depth. To investigate the SNR across these tubes, we plot SNR vs. averaging frame number in Fig. 8(a) for three different ROIs corresponding to tube 1 (greatest depth), tube 2 (medium depth) and tube 3 (smallest depth) for the water phantom. Note that for the computation of the SNR, we use our presented technique to compute the peak intensity and the standard deviation of the background for each ROI. A comparison among the SNRs at those ROIs indicates a successive decrease in SNR with an increase in imaging depth. The key reason for this decreasing SNR can be attributed to the lower PA peak intensity at greater depth.
  • Optical contrast: The imaging plane for the optical contrast analysis consists of tubes 3–5 of the nanoparticle phantom in Fig. 2, where the concentration successively decreases from tube 3 to tube 5 (left to right). Figure 9 shows a comparison among the three competing techniques for three different values of the averaging frame number. The PA images in this figure indicate a successive decrease in image quality with a decrease in the concentration of the optical absorber. In addition, we can observe better recovery of the target object with our RNN+CNN method (shown by arrows in Fig. 9). A comparison among the SNRs for tubes 3–5 (for the water phantom) is provided in Fig. 8(b) for different averaging frame numbers to demonstrate the effect of optical contrast on image quality. As expected, we notice a successive reduction in SNR with a decrease in optical contrast.
  • Optical scattering: In addition to comparing the SNRs for the water phantom, we also include comparative results for tubes 3–5 with the water + 1% milk phantom in Fig. 8(b). We observe a corresponding reduction in SNR for all of tubes 3–5 when the phantom medium is changed from a lower-scattering (water only) to a higher-scattering (water + 1% milk) liquid.

Fig. 7 An example of the effect of depth on PA image quality. We choose tubes 1–3 of the nanoparticle phantom for this purpose. A comparison among the simple averaging, CNN-only and RNN+CNN techniques is shown for three different values of the averaging frame number. As shown by the arrows, a decrease in image quality can be observed with an increase in imaging depth. It is also interesting to notice the increased image quality with an increase in the averaging frame number.

Fig. 8 Sensitivity analysis of our method. (a) Effect of depth on SNR vs. averaging frame number. We can notice a successive reduction in SNR from lower to higher depth. (b) Effect of optical contrast and scattering of the phantom medium on image quality. The image quality degrades with a decrease in the concentration of the optical absorber. In addition, a higher optical scattering medium leads to a reduction in image quality.

5.4. Computation time

The GPU computation times are 15 and 4 milliseconds for our method and the CNN-only method, respectively. The corresponding run-times on the CPU are 190 and 105 milliseconds, respectively.

6. Discussion and conclusion

We have presented a deep neural networks based approach to improve the quality of PA images in real time while simultaneously decreasing the scanning period. In addition to using a CNN to extract the spatial features, we employ an RNN in our architecture to exploit the temporal information in PA images. We have trained our network using sequences of PA images from 32 phantom experiments. On the test set of 30 samples, we achieve a gain in frame rate of 8.1 times with a mean PSNR of 35.4 dB compared to the conventional averaging approach.

We have demonstrated that a temporal PA sequence allows the neural network to learn the image and noise content more effectively than a single-image CNN-only network does. In addition, for the CNN-only method we observe saturation in both image quality indices at higher averaging frame numbers (Figs. 4(a)–4(b)), which indicates a reduction in the improvement rate with an increase in the averaging frame number, in contrast to the higher improvement rate of our method.

Though we train our network using PA images with point targets, we also notice its effectiveness on a different target object (Fig. 5). In addition, we have observed improved performance of our method with respect to the CNN-only and averaging techniques in the detection of point targets at greater depth and with lower optical contrast (Figs. 7 and 9).

Fig. 9 Comparative performance of the simple averaging, CNN-only and RNN+CNN methods in the detection of point target objects with different concentrations. We can notice a successive decrease in signal quality with a decrease in concentration. In addition, the improved performance of our RNN+CNN method can be observed (marked by arrows) with respect to the two other methods.

We have also shown the improved performance of the proposed approach for an in vivo example (Fig. 6). In addition, we have observed the effect of artifacts (due to motion during scanning) in the image reconstructed with a higher averaging frame number. The key advantage of our technique is that it can enhance the image quality from an image reconstructed with a low averaging frame number, thereby eliminating the potential effect of such artifacts.

We have investigated the sensitivity of our approach with respect to imaging depth, contrast of the targeted object and optical scattering of the imaging medium. In summary, we observe a reduction in image quality for a greater depth, a lower contrast and a higher-scattering medium. For all of these effects, the key reason can be attributed to the lower SNR of the PA signals. For example, for either a greater depth or a higher optical scattering, the light energy reaching the target decreases, resulting in a reduced SNR. Similarly, a lower optical contrast corresponds to a smaller amount of converted acoustic energy, which in turn gives a lower SNR of the PA signal.

Future work includes an extended evaluation of the proposed approach using more in vivo experiments. In addition, we aim to integrate the proposed technique in our PA imaging work-flow.

In conclusion, we have demonstrated the potential of our approach to exploit the temporal information in PA images for an improvement in image quality as well as a gain in the imaging frame rate.

Funding

NIH Brain Initiative (R24MH106083-03); NIH National Institute of Biomedical Imaging and Bioengineering (R01EB01963).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

References and links

1. M. Xu and L. V. Wang, “Photoacoustic imaging in biomedicine,” Rev. Sci. Instrum. 77, 041101 (2006). [CrossRef]  

2. A. Hariri, J. Lemaster, J. Wang, A. S. Jeevarathinam, D. L. Chao, and J. V. Jokerst, “The characterization of an economic and portable LED-based photoacoustic imaging system to facilitate molecular imaging,” Photoacoustics. 9, 10–20 (2018). [CrossRef]  

3. J. Kang, H. Zhang, E. Kulikowicz, E. Graham, R. Koehler, and E. Boctor, “In vivo photoacoustic quantification of brain tissue oxygenation for neonatal piglet graded ischemia model using microsphere administration,” in Ultrasonics Symposium (IUS), 2017 IEEE International, (IEEE, 2017), p. 1.

4. J. L. Su, B. Wang, K. E. Wilson, C. L. Bayer, Y.-S. Chen, S. Kim, K. A. Homan, and S. Y. Emelianov, “Advances in clinical and biomedical applications of photoacoustic imaging,” Expert. Opin. on Med. Diagn. 4, 497–510 (2010). [CrossRef]  

5. P. Beard, “Biomedical photoacoustic imaging,” Interface Focus. 1, rsfs20110028 (2011). [CrossRef]  

6. A. Agarwal, S. Huang, M. O’donnell, K. Day, M. Day, N. Kotov, and S. Ashkenazi, “Targeted gold nanorod contrast agent for prostate cancer detection by photoacoustic imaging,” J. Appl. Phys. 102, 064701 (2007). [CrossRef]  

7. J. Weber, P. C. Beard, and S. E. Bohndiek, “Contrast agents for molecular photoacoustic imaging,” Nat. Methods 13, 639 (2016). [CrossRef]   [PubMed]  

8. T. J. Allen and P. C. Beard, “High power visible light emitting diodes as pulsed excitation sources for biomedical photoacoustics,” Biomed. Opt. Express 7, 1260–1270 (2016). [CrossRef]   [PubMed]  

9. N. A. George, B. Aneeshkumar, P. Radhakrishnan, and C. Vallabhan, “Photoacoustic study on photobleaching of rhodamine 6g doped in poly (methyl methacrylate),” J. Phys. D: Appl. Phys. 32, 1745 (1999). [CrossRef]  

10. A. Hariri, M. Hosseinzadeh, S. Noei, and M. Nasiriavanaki, “Photoacoustic signal enhancement: towards utilization of very low-cost laser diodes in photoacoustic imaging,” in Photons Plus Ultrasound: Imaging and Sensing 2017, vol. 10064 (International Society for Optics and Photonics, 2017), p. 100645L.

11. M. Sun, N. Feng, Y. Shen, X. Shen, and J. Li, “Photoacoustic signals denoising based on empirical mode decomposition and energy-window method,” Adv. Adapt. Data Analysis 4, 1250004 (2012). [CrossRef]  

12. S. A. Ermilov, T. Khamapirad, A. Conjusteau, M. H. Leonard, R. Lacewell, K. Mehta, T. Miller, and A. A. Oraevsky, “Laser optoacoustic imaging system for detection of breast cancer,” J. Biomed. Opt. 14, 024007 (2009). [CrossRef]   [PubMed]  

13. C. Zhang, K. I. Maslov, J. Yao, and L. V. Wang, “In vivo photoacoustic microscopy with 7.6-μm axial resolution using a commercial 125-MHz ultrasonic transducer,” J. Biomed. Opt. 17, 116016 (2012). [CrossRef]  

14. M. P. Mienkina, C.-S. Friedrich, N. C. Gerhardt, M. F. Beckmann, M. F. Schiffner, M. R. Hofmann, and G. Schmitz, “Multispectral photoacoustic coded excitation imaging using unipolar orthogonal golay codes,” Opt. Express 18, 9076–9087 (2010). [CrossRef]   [PubMed]  

15. H. Zhang, K. Kondo, M. Yamakawa, and T. Shiina, “Simultaneous multispectral coded excitation using gold codes for photoacoustic imaging,” Jpn. J. Appl. Phys. 51, 07GF03 (2012). [CrossRef]  

16. H. K. Zhang, K. Kondo, M. Yamakawa, and T. Shiina, “Coded excitation using periodic and unipolar M-sequences for photoacoustic imaging and flow measurement,” Opt. Express 24, 17–29 (2016). [CrossRef]   [PubMed]  

17. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, (2012), pp. 1097–1105.

18. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE CVPR, (2016), pp. 770–778.

19. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2015), pp. 3431–3440.

20. J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with deep neural networks,” in Advances in neural information processing systems, (2012), pp. 341–349.

21. C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Transactions on Pattern Analysis Mach. Intell. 38, 295–307 (2016). [CrossRef]  

22. J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision, (Springer, 2016), pp. 694–711.

23. C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al., “Photo-realistic single image super-resolution using a generative adversarial network,” arXiv preprint (2016).

24. T. Tong, G. Li, X. Liu, and Q. Gao, “Image super-resolution using dense skip connections,” in 2017 IEEE International Conference on Computer Vision (ICCV), (IEEE, 2017), pp. 4809–4817. [CrossRef]  

25. S. Antholzer, M. Haltmeier, and J. Schwab, “Deep learning for photoacoustic tomography from sparse data,” arXiv preprint arXiv:1704.04587 (2017).

26. J. Schwab, S. Antholzer, R. Nuster, and M. Haltmeier, “DALnet: High-resolution photoacoustic projection imaging using deep learning,” arXiv preprint arXiv:1801.06693 (2018).

27. S. Antholzer, M. Haltmeier, R. Nuster, and J. Schwab, “Photoacoustic image reconstruction via deep learning,” in Photons Plus Ultrasound: Imaging and Sensing 2018, vol. 10494 (International Society for Optics and Photonics, 2018), p. 104944U.

28. A. Reiter and M. A. L. Bell, “A machine learning approach to identifying point source locations in photoacoustic data,” in Photons Plus Ultrasound: Imaging and Sensing 2017, vol. 10064 (International Society for Optics and Photonics, 2017), p. 100643J.

29. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput. 9, 1735–1780 (1997). [CrossRef]   [PubMed]  

30. S. Xingjian, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. Woo, “Convolutional LSTM network: A machine learning approach for precipitation nowcasting,” in Advances in neural information processing systems, (2015), pp. 802–810.

31. G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, “Densely connected convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1 (2017), p. 3.

32. J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” in Spin Glass Theory and Beyond: An Introduction to the Replica Method and Its Applications, (World Scientific, 1987), pp. 411–415.

33. D. E. Rumelhart, G. E. Hinton, R. J. Williams, et al., “Learning representations by back-propagating errors,” Cogn. Model. 5, 1 (1988).

34. P. Perona and J. Malik, “Scale-space and edge detection using anisotropic diffusion,” IEEE Transactions on Pattern Analysis Mach. Intell. 12, 629–639 (1990). [CrossRef]  

35. D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).

36. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Process. 13, 600–612 (2004). [CrossRef]  
