
Deep learning approach to improve tangential resolution in photoacoustic tomography


Abstract

In circular scan photoacoustic tomography (PAT), the axial resolution is spatially invariant and is limited by the bandwidth of the detector. The tangential resolution, however, is spatially variant and depends on the aperture size of the detector. In particular, the tangential resolution improves with decreasing aperture size, but a detector with a smaller aperture is less sensitive. Thus, large-aperture detectors are widely preferred in circular scan PAT imaging systems. Although several techniques have been proposed to improve the tangential resolution, they have inherent limitations such as high cost and the need for customized detectors. Herein, we propose a novel deep learning architecture to counter the spatially variant tangential resolution in circular scanning PAT imaging systems. We used a fully dense U-Net based convolutional neural network architecture along with 9 residual blocks to improve the tangential resolution of the PAT images. The network was trained on simulated datasets, and its performance was verified by experimental in vivo imaging. Results show that the proposed deep learning network improves the tangential resolution eightfold, without compromising the structural similarity or the quality of the image.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Photoacoustic computed tomography/photoacoustic tomography (PACT/PAT) is a hybrid imaging modality that combines the merits of both optical and ultrasound imaging [1–4]. Over the past decade, PAT has shown many potential applications in various clinical and preclinical studies [5–9]. In PAT, nanosecond laser pulses are used to irradiate the target object (e.g., biological tissue). The light energy incident on the target is absorbed by chromophores, leading to thermoelastic expansion and the generation of pressure waves [also known as photoacoustic (PA) waves]. These PA waves are then acquired around the target boundary using an ultrasound transducer (UST). Several scanning geometries are used in PAT; the most widely used is the circular scanning geometry, in which the UST is rotated around the sample in a circular configuration to acquire signals at different locations. The acquired signals are then reconstructed into a cross-sectional PAT image using various reconstruction algorithms [10–13].

In circular scan PAT, the axial (radial, along the ultrasound axis, or perpendicular to the scan direction) resolution is spatially invariant and is determined by the finite bandwidth of the detector, whereas the tangential (lateral, or parallel to the scan direction) resolution is spatially varying and depends on the aperture size (active area) of the detector [14,15]. For a detector with a large aperture, the tangential resolution is poor near the detector surface in comparison with the tangential resolution near the scanning center. A pictorial representation of how the axial and tangential resolutions are defined in the circular scanning configuration is shown in Fig. 1. An easy way to overcome this problem is to increase the scanning radius by moving the UST farther away from the imaging region of interest. However, increasing the scanning radius is not practically feasible owing to the degradation of sensitivity and constraints on space. Another way to improve the tangential resolution is to use a small-aperture transducer (ideally a point detector), but the sensitivity of such detectors is low. Without increasing the radius or decreasing the aperture of the transducer, the tangential resolution can be improved experimentally by attaching a negative acoustic lens to the detector surface [16]. However, the negative acoustic lens is custom-made, its attachment reduces the sensitivity of the detector, and building a transducer with the negative lens integrated can be expensive. The tangential resolution can also be improved by replacing the conventional USTs with high numerical aperture (NA)-based virtual point detectors having a wide acceptance angle, high sensitivity, and negligible aperture effect [17,18]. However, the virtual point detectors are also custom-made and not commercially available. Alternatively, the tangential resolution can be improved by employing a modified delay-and-sum beamformer for image reconstruction [19]. This approach requires neither external lenses nor custom-made detectors. In the conventional delay-and-sum, a large-aperture detector is treated as a point detector during beamforming; this approximation introduces artefacts into the reconstructed image. In the modified delay-and-sum, the entire surface of the detector is instead treated as many small segments. The modified delay-and-sum has been experimentally validated, showing a threefold improvement in tangential resolution [20], and the impact of sensor apodization on the tangential resolution has also been studied with it [21]. However, the modified delay-and-sum suffers from limitations: it cannot preserve the shape of targets at the farthest points from the scanning center, and its computational load is higher. Therefore, an alternative approach to alleviate the problem of tangential resolution needs to be explored.
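To make the beamforming step concrete, a minimal NumPy sketch of the conventional delay-and-sum for circular-scan PAT is given below (for illustration only; the function and variable names are our own, and the scanning radius in the usage example is an assumption). Each detector is approximated as a point, which is precisely the approximation that degrades the tangential resolution for large-aperture elements; the modified delay-and-sum of [19] would instead repeat the inner sum over small sub-segments of each element's surface.

```python
import numpy as np

def delay_and_sum(sinogram, sensor_xy, grid_xy, c=1500.0, fs=25e6):
    """Conventional delay-and-sum for circular-scan PAT.

    sinogram  : (n_sensors, n_samples) array of recorded A-lines.
    sensor_xy : (n_sensors, 2) detector positions in metres.
    grid_xy   : (n_pixels, 2) image pixel positions in metres.
    c, fs     : speed of sound (m/s) and sampling rate (Hz);
                fs = 25 MHz corresponds to the 40 ns step used later.
    """
    image = np.zeros(len(grid_xy))
    for s in range(len(sensor_xy)):
        # Point-detector approximation: one time of flight per pixel.
        dist = np.linalg.norm(grid_xy - sensor_xy[s], axis=1)
        idx = np.round(dist / c * fs).astype(int)
        valid = idx < sinogram.shape[1]
        image[valid] += sinogram[s, idx[valid]]
    return image

# Example: 800 detector stops on a circle, matching the simulation
# geometry described in Section 2.3 (radius of 20 mm assumed here).
angles = np.linspace(0, 2 * np.pi, 800, endpoint=False)
sensor_xy = 0.02 * np.stack([np.cos(angles), np.sin(angles)], axis=1)
```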


Fig. 1. Representation of how the axial and tangential resolutions are defined for the circular scanning geometry used in PAT. UST-ultrasound transducer.


Deep learning is a powerful machine learning approach that utilizes a range of neural networks to improve the quality of images [22–24]. In particular, convolutional neural networks (CNNs) are the most widely employed neural networks and have proven powerful in solving complex imaging problems [25,26]. Recognizing this potential, there has been significant interest in the photoacoustic community in applying CNNs to overcome current limitations and challenges [27,28]. In general, the CNN-based reconstruction techniques employed in PAT can be classified into four approaches based on the type of input data to the CNN: post-processing, pre-processing, direct processing, and hybrid processing [29]. In the post-processing approach, images resulting from conventional reconstruction methods are fed into the CNN to improve their quality. In the pre-processing approach, the quality of the raw PA signals (sinograms) is improved using the CNN before a conventional reconstruction method produces the final improved PA images. In direct processing, the images are obtained from the CNN by directly feeding in the raw PA signals (sinograms), without the need for any conventional reconstruction technique. Lastly, in hybrid processing both the sinograms and the conventionally reconstructed PA images are fed into the CNN to obtain the final PA images. Several CNN-based architectures have been successfully demonstrated for post-processing applications such as artefact removal and contrast improvement in PAT images [30–32]. The potential of CNNs for direct processing and hybrid processing has also been demonstrated for PA imaging [33–35]. However, the application of CNNs to directly process images from the raw data is limited by the quality of the initial reconstruction, and certain features could not be recovered by this approach; furthermore, direct processing suffers from computational complexity and requires tedious parameter adjustments [27]. Thus, it is optimal to use a conventional reconstruction technique to obtain images from the raw data and then employ a CNN to improve their quality (the post-processing approach).

In this work, we propose a post-processing-based deep learning approach to improve the tangential resolution in PAT using a novel CNN-based architecture termed the TARES network. This is the first work in which a CNN-based architecture has been applied to improve the tangential resolution of PAT images. The proposed network was trained on simulated PA data, and its performance was verified with experimental phantom data and in vivo PA imaging. All simulated PA data were generated using the k-Wave toolbox in MATLAB [36]. The trained model performed well on the test dataset as well as on the experimental phantom and in vivo animal imaging. In comparison with the conventional delay-and-sum, the deep learning based method improved the tangential resolution eightfold. Furthermore, the shapes of the target objects were also preserved with the deep learning approach.

2. Methods

2.1 Proposed TARES network architecture

Since its inception, the U-Net has been widely used for various imaging tasks. It incorporates down-sampling and up-sampling layers along with skip connections, resembling a symmetrical U shape [37,38]. However, the U-Net requires extension techniques for improved accuracy [39]. An improved variant of the U-Net, termed the fully dense (FD) U-Net, was first proposed for artefact removal in PAT [40] and has been adopted for other PA applications owing to its better performance [41–43]. The superior performance of the FD U-Net in comparison with the traditional U-Net has already been demonstrated [40]. The FD U-Net comprises dense blocks in the contracting and expanding paths. These dense blocks connect the earlier convolutional layers to all subsequent layers via concatenation, enabling each layer to learn additional feature maps from the knowledge gained by previous layers. Furthermore, the dense layers increase the depth of the network, and the dense connectivity mitigates the problem of vanishing gradients. To improve the tangential resolution in PAT, we propose a network architecture termed the TARES (tangential resolution) network. The proposed network uses the basic structure of the FD U-Net with 9 residual blocks in the bottom layer. Figure 2 shows the detailed architecture of the TARES network. The network expects an input image of 128×128 pixels and returns a resultant image of 128×128 pixels. The addition of the residual blocks is inspired by the de-raining model [44]; their inclusion improves the denoising performance of the FD U-Net, and their shorter paths from earlier to later layers alleviate the problem of vanishing gradients. In the contracting path of our network, the down-sampling operation is performed by a 1×1 convolution block followed by a 3×3 convolution block of stride 2, and the up-sampling operation in the expanding path is carried out by a 3×3 transposed convolution block of stride 2. Each convolution block in the network consists of a convolution and a nonlinear activation (SELU) followed by batch normalization. Using SELU as the activation function improves the learning speed and robustness of the network [45]. All the dense blocks employed in our model take $f$ feature maps (channels) as input and return $2 \times f$ feature maps (channels) as output, with a growth rate of $f/4$; their structure is analogous to the dense blocks used in the FD U-Net [40]. To prevent loss of spatial information, skip connections were used for each layer, and the resultant image is obtained by adding the input image to the residual image.
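The building blocks above can be sketched in TensorFlow/Keras as follows. This is a sketch under stated assumptions: the block ordering (convolution, SELU, then batch normalization), the channel counts ($f$ in, $2f$ out), and the growth rate ($f/4$) follow the text, but the number of layers inside each dense block (four, chosen so that the channel counts work out) and the internal composition of the residual blocks (two convolution blocks plus an identity skip) are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, kernel=3, strides=1):
    # Convolution -> SELU -> batch normalization, the ordering
    # described for the TARES network.
    x = layers.Conv2D(filters, kernel, strides=strides, padding="same")(x)
    x = layers.Activation("selu")(x)
    return layers.BatchNormalization()(x)

def dense_block(x, f):
    # f channels in, 2f channels out with growth rate f/4: assuming four
    # layers, each adding f/4 feature maps through dense concatenation.
    out = x
    for _ in range(4):
        y = conv_block(out, f // 4)
        out = layers.Concatenate()([out, y])
    return out  # f + 4 * (f / 4) = 2f channels

def residual_block(x, f):
    # One of the 9 bottleneck residual blocks (composition assumed):
    # two conv blocks plus an identity skip connection.
    y = conv_block(x, f)
    y = conv_block(y, f)
    return layers.Add()([x, y])
```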

 figure: Fig. 2.

Fig. 2. Schematic representation of the TARES network architecture. L1, L2,…, Ln refer to the levels of compression. X refers to the input image of size 128×128, and Y refers to the output image of size 128×128.


2.2 Network training and implementation

The proposed TARES network was implemented in Python 3.7 with the TensorFlow v2.3 deep learning library [46]. The network training was performed on an Nvidia Tesla V100-32GB GPU using the nodes of the Gekko cluster at the High Performance Computing Centre, Nanyang Technological University, Singapore. The loss function used for training is a combination of the mean absolute error (MAE) and a Fourier loss (FMAE), calculated as $\mathrm{Total\ loss} = \lambda_1 \times \mathrm{MAE} + \lambda_2 \times \mathrm{FMAE}$ [42,43]. The weights assigned to $\lambda_1$ and $\lambda_2$ are 1 and 0.01, respectively. For optimizing the network, the Adam optimization algorithm was used in conjunction with a learning rate scheduler that reduces the learning rate by a factor of 0.5 when the monitored metric stops improving; the initial learning rate was 0.001. The saving metric is a combination of the structural similarity index (SSIM) and the peak signal-to-noise ratio (PSNR), analogous to the saving metric used in [42]. The proposed network was trained for 100 epochs with a batch size of two, and the training process took approximately 4 hours to complete. We observed the best SSIM of 0.96147 and PSNR of 31.909 at the 62nd epoch.
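As a concrete illustration, the combined loss and the training configuration could be expressed in TensorFlow as below. The weights, optimizer, initial learning rate, and halve-on-plateau schedule follow the text; the exact form of the Fourier loss is not spelled out here, so the FMAE shown (MAE between the magnitudes of the 2-D FFTs of ground truth and prediction) is an assumption.

```python
import tensorflow as tf

LAMBDA1, LAMBDA2 = 1.0, 0.01  # weights reported for MAE and Fourier loss

def total_loss(y_true, y_pred):
    # Assumes image tensors of shape (batch, 128, 128, 1).
    # Pixel-domain mean absolute error.
    mae = tf.reduce_mean(tf.abs(y_true - y_pred))
    # Fourier-domain term (assumption: MAE between 2-D FFT magnitudes).
    ft = tf.signal.fft2d(tf.cast(y_true[..., 0], tf.complex64))
    fp = tf.signal.fft2d(tf.cast(y_pred[..., 0], tf.complex64))
    fmae = tf.reduce_mean(tf.abs(tf.abs(ft) - tf.abs(fp)))
    return LAMBDA1 * mae + LAMBDA2 * fmae

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
# Halve the learning rate when the monitored metric plateaus, as described.
scheduler = tf.keras.callbacks.ReduceLROnPlateau(factor=0.5)
```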

2.3 Simulated photoacoustic data sets for training

Deep learning is a data-driven approach whose performance depends on the quality of the data used for training [47]. In general, the training data for a deep learning model comprise the input data and the expected output data (ground truth). Although it is possible to obtain a large amount of input data experimentally using the PAT system, the expected ground truths are unavailable or difficult to obtain. Thus, we used the MATLAB toolbox k-Wave to generate the photoacoustic datasets for training [36]. Three numerical phantoms were used for the simulation [Figs. 3(a)–3(c)]: point targets (four point sources, in which the size [0.5–0.7 mm], position [distance from the scanning center], orientation, and source strength were varied randomly); a triangular shape (whose size, orientation, and position [distance from the scanning center] were varied randomly); and vessel shapes mimicking the venous sinuses of the rodent brain (whose orientation, magnitude, and position [distance from the scanning center] were varied randomly). The simulation geometry is shown in Fig. 3(d). A computational grid of 410×410 pixels (0.1 mm/pixel) with a perfectly matched layer was used for generating the forward PA data. The imaging region was restricted to 40 mm, and the PA data were acquired at 800 sensor locations. A time step of 40 ns with 1500 time steps was used, and 1% noise was added, maintaining the signal-to-noise ratio (SNR) at 40 dB. The speed of sound was 1500 m/s, and the medium was acoustically homogeneous. For the input PA data, two large-aperture unfocused USTs (13 mm active area) with central frequencies of 2.25 MHz and 5 MHz and 70% nominal bandwidth were considered, whereas the ground-truth PA data were generated with ideal point detectors of the same central frequencies and bandwidth. The simulated forward PA data were then reconstructed into cross-sectional PAT images of 128×128 pixels using the traditional delay-and-sum beamformer. The generated dataset comprises 3000 PAT images simulated using both USTs (2.25 MHz and 5 MHz). The dataset was randomly divided into training, testing, and validation datasets in the ratio 90:5:5. The model was optimized using the training dataset, the hyperparameters were tuned using the validation dataset, and the performance of the model was assessed using the testing dataset.
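A minimal sketch of the random 90:5:5 split described above (the fixed seed and the index-array names are our additions, for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed is an assumption
idx = rng.permutation(3000)          # 3000 simulated image pairs
n_train, n_test = 2700, 150          # 90:5:5 split
train_idx = idx[:n_train]
test_idx = idx[n_train:n_train + n_test]
val_idx = idx[n_train + n_test:]
```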


Fig. 3. (a) Point source numerical phantom (four point targets located at 0, 7, 14, and 21 mm from the scanning center). (b) Numerical triangular phantom. (c) Numerical phantom mimicking the venous sinuses of the rodent brain. (d) Schematic diagram of the k-Wave simulation geometry in MATLAB. (e) Point source phantom made of pencil leads. (f) Triangular phantom made of horsehair.


2.4 Experimental phantom photoacoustic imaging

To validate the usefulness of the proposed TARES network in real experimental PA imaging, phantom imaging was performed first. Two types of phantoms, namely four-point targets (pencil leads located at 0, 8.5, 17, and 25.5 mm from the scanning center) and a triangular horsehair phantom, were used for obtaining the PAT images [Figs. 3(e), 3(f)]. A schematic of the experimental setup employed for imaging the phantoms is shown in Fig. 4(a). A ∼532 nm Q-switched Nd:YAG laser delivering 10 pulses per second at 5 ns pulse width was used as the excitation source. Two unfocused USTs with central frequencies of 2.25 MHz (Olympus-NDT, V306-SU) and 5 MHz (Olympus-NDT, V309-SU) (70% nominal bandwidth) were employed to acquire the generated PA waves. The acquired PA signals were amplified by a low-noise amplifier (Olympus-NDT, 5072PR) and stored in a desktop computer (Intel Xeon, 3.7 GHz 64-bit processor, 16 GB RAM) using a data acquisition (DAQ) card (GaGe, CompuScope 4227). The collected PA signals were then reconstructed into PAT images by the conventional delay-and-sum beamformer.


Fig. 4. (a) Schematic of the experimental PAT setup used for phantom imaging. (b) Schematic of the experimental PAT setup used for in vivo imaging. AM-anaesthesia machine, AMP-amplifier, DAQ-data acquisition card, LDU-laser driving unit, PLD-pulsed laser diode, PC-personal computer, SM-stepper motor, UST-ultrasound transducer, GG-ground glass; P1, P2 and P3 are uncoated prisms; S1 is the sample used for imaging.


2.5 In vivo photoacoustic rat brain imaging

The feasibility of the proposed TARES network was also verified with in vivo PA imaging. For the in vivo imaging, two Sprague Dawley female rats (weighing ∼90 ± 5 g) procured from InVivos Pte. Ltd., Singapore were used. A combination of ketamine (100 mg/mL) and xylazine (20 mg/mL) was used to anesthetize the rats. Before imaging, the hair on the rat's head was removed using depilatory cream, and eye gel was applied to the eyes. A layer of ultrasound gel was applied on the rat scalp, and the animal was mounted on the system as shown in Fig. 4(b). During imaging, a continuous supply of 0.75% isoflurane and 1.0 L/min oxygen was provided to maintain the anesthesia. The experimental setup used for acquiring the in vivo PAT images of the rat brain is shown in Fig. 4(b). For in vivo experiments, near-infrared (NIR) wavelengths are usually used to achieve better penetration depth. Thus, a pulsed laser diode (PLD) (Quantel, France) of ∼816 nm wavelength (2 kHz pulse repetition rate, 107 ns pulse width) was used as the excitation source. The PLD was controlled by the laser driving unit (LDU), and the output laser beam was homogenized using an optical diffuser. The laser energy density was maintained at ∼0.17 mJ/cm², which is within the ANSI safety limit [48]. Two unfocused USTs with central frequencies of 2.25 MHz (Olympus-NDT, V306-SU) and 5 MHz (Olympus-NDT, V309-SU) (70% nominal bandwidth) were used to acquire the generated PA signals. The acquired PA signals were amplified with a 48 dB low-noise amplifier (Mini-Circuits, ZFL-500LNBNC), stored in a desktop computer (Intel Xeon, 3.7 GHz 64-bit processor, 16 GB RAM) using a DAQ card (Spectrum, M2i.4932-Exp), and then reconstructed into PAT images using the conventional delay-and-sum beamformer. After the experiments, the animals were euthanized by intraperitoneal administration of Valabarb (sodium pentobarbitone, 300 mg/mL).

3. Results and discussion

3.1 Model comparison

Initially, we compared the performance of the proposed TARES network with that of the FD U-Net using several evaluation metrics (SSIM, PSNR, MAE, and MSE), as shown in Table 1. For the comparison, the FD U-Net architecture was trained with the same training dataset for 100 epochs, and the metrics were evaluated over the test data using the best model (63rd epoch) [40,42]. The TARES network exhibited the best performance across all metrics (SSIM, PSNR, MAE, and MSE) in comparison with the FD U-Net.
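For reference, all four metrics in Table 1 can be computed with standard TensorFlow ops; a sketch is shown below, assuming the image batches are normalized to [0, 1] with shape (N, 128, 128, 1).

```python
import tensorflow as tf

def evaluate(y_true, y_pred, max_val=1.0):
    # Mean metrics over a batch of image pairs.
    return {
        "SSIM": float(tf.reduce_mean(tf.image.ssim(y_true, y_pred, max_val))),
        "PSNR": float(tf.reduce_mean(tf.image.psnr(y_true, y_pred, max_val))),
        "MAE":  float(tf.reduce_mean(tf.abs(y_true - y_pred))),
        "MSE":  float(tf.reduce_mean(tf.square(y_true - y_pred))),
    }
```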


Table 1. Evaluation metrics (mean ± SD) to compare the model architectures TARES network and FD U-Net. PSNR, peak signal-to-noise ratio; SSIM, structural similarity index; MAE, mean absolute error; MSE, mean squared error.

3.2 Simulation results

Figure 5 shows the images of two numerical phantoms obtained using the conventional delay-and-sum, the modified delay-and-sum, and the TARES network. The two numerical phantoms shown in Fig. 5 were simulated with a UST frequency of 2.25 MHz. The reconstructed image of the four-point targets using the conventional delay-and-sum is shown in Fig. 5(a), and the zoomed-in images of the individual point targets 1 to 4 are shown in Figs. 5(d)–5(g). The elongation of the point targets in the tangential direction can be clearly visualized from these images. Note that the elongation is larger (i.e., the tangential resolution is poorer) when the respective point is farther from the scanning center (closer to the UST). Figure 5(b) shows the reconstructed PAT image of the four-point targets using the modified delay-and-sum beamformer; the zoomed-in images of the individual point targets 1 to 4 are shown in Figs. 5(h)–5(k). From Fig. 5(b), it is evident that the modified delay-and-sum improves the tangential resolution as expected. However, it can also be inferred that its ability to improve the tangential resolution and to preserve the shape of the target object degrades with increasing distance from the scanning center. The tangential-resolution-improved image obtained with the TARES network is shown in Fig. 5(c), and the respective zoomed-in images of the individual point targets 1 to 4 are depicted in Figs. 5(l)–5(o).


Fig. 5. (a)–(o) Simulation results for point source phantoms simulated at 2.25 MHz. (a) PAT image of 4-point targets using conventional delay-and-sum. (b) PAT image of 4-point targets using modified delay-and-sum. (c) PAT image of 4-point targets using the TARES network. (d)–(g) Zoomed-in point targets in (a). (h)–(k) Zoomed-in point targets in (b). (l)–(o) Zoomed-in point targets in (c). (p) Comparison of tangential resolution versus distance from center (UST 2.25 MHz). (q) Comparison of tangential resolution versus distance from center (UST 5 MHz). (r)–(s) Simulation results for the triangular phantom simulated at 2.25 MHz. (r) PAT image of triangular phantom using conventional delay-and-sum. (s) PAT image of triangular phantom using the TARES network. SSIM, structural similarity index.


A similar improvement was also noted in the point target images simulated with a 5 MHz UST; those images are not shown here for brevity. The tangential resolution was also quantified at each point target using the edge spread function (ESF). In the ESF method, the resolution is determined by calculating the distance required for the edge response (from the sharp discontinuity at the edge of the point target) to rise from 10% to 90% of the maximum intensity value. The quantified tangential resolution at each point target position was plotted for both USTs of central frequency 2.25 MHz and 5 MHz [Figs. 5(p), 5(q)]. Note that the proposed TARES network outperforms the modified delay-and-sum in improving the tangential resolution at all points located at different distances from the scanning center. In particular, at the farthest point, a threefold improvement in the tangential resolution was observed with the TARES network in comparison with the modified delay-and-sum, and a tenfold improvement in comparison with the conventional delay-and-sum. The axial resolution at all points remains almost the same in all three reconstructed images shown in Figs. 5(a)–5(c). Furthermore, the potential of the TARES network to improve image quality (through tangential resolution improvement) can also be visualized by comparing the conventionally reconstructed and TARES-network-reconstructed images of the numerical triangular phantom shown in Figs. 5(r), 5(s). The yellow arrows in Fig. 5(r) indicate that the reconstructed image quality suffers at the corners of the triangle (owing to their location far from the scanning center), whereas in Fig. 5(s) the corresponding corners have been recovered very well by the TARES network. This improvement in image quality is further substantiated by the high SSIM value [SSIM = 0.96] of the tangential-resolution-improved image of the numerical triangular phantom.
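A sketch of the ESF-based quantification described above, assuming a 1-D intensity profile sampled across a point target along the tangential direction (function and parameter names are our own; the rise up to the peak is assumed monotonic so that the 10% and 90% crossings can be interpolated):

```python
import numpy as np

def esf_resolution(profile, pixel_mm):
    """10%-90% edge-rise distance (in mm) from a 1-D tangential profile.

    profile  : 1-D intensity samples across the point target's edge.
    pixel_mm : reconstruction pixel pitch in mm.
    """
    p = (profile - profile.min()) / (profile.max() - profile.min())
    x = np.arange(len(p)) * pixel_mm
    # Edge response: the (assumed monotonic) rise up to the peak.
    rise = p[: np.argmax(p) + 1]
    x10 = np.interp(0.1, rise, x[: len(rise)])
    x90 = np.interp(0.9, rise, x[: len(rise)])
    return x90 - x10
```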

3.3 Experimental phantom photoacoustic imaging

The success of the TARES network in drastically improving the tangential resolution on simulated PA imaging data led us to apply the network to real experimental PA images. As described earlier, two types of phantoms were used for the experiments, and two USTs with central frequencies of 2.25 MHz and 5 MHz were employed to acquire the PAT data. The acquired PAT data were then reconstructed using the conventional delay-and-sum and the TARES network. Figure 6(a) shows the PAT image of the four-point targets obtained with the 2.25 MHz UST and the conventional delay-and-sum; the zoomed-in images of the individual point targets are shown in Figs. 6(b)–6(e). The elongation of the point targets can also be visualized in the conventionally reconstructed experimental point target PA images. The tangential-resolution-improved image of the four-point targets obtained with the TARES network is shown in Fig. 6(f), and the zoomed-in images of the individual point targets are shown in Figs. 6(g)–6(j). Tangential resolution versus distance from the scanning center for both USTs (2.25 MHz and 5 MHz) is plotted in Figs. 6(k), 6(l). The individual zoomed-in PA images and the tangential resolution versus distance plots confirm the ability of the TARES network to recover the points very well, even the farthest point, in real experimental PAT images. In particular, the TARES network improved the tangential resolution eightfold at the farthest location. The ability of the TARES network to preserve shape and improve image quality in the reconstructed PAT images can be seen by comparing the PAT images of the triangular horsehair phantom shown in Figs. 6(m), 6(n), where Fig. 6(m) shows the PAT image with the conventional delay-and-sum and Fig. 6(n) shows the PAT image with the TARES network. In particular, the edges of the triangular horsehair phantom are better preserved (indicated with yellow arrows) in the image processed with the TARES network [Fig. 6(n)] than in the conventionally reconstructed PAT image [Fig. 6(m)].


Fig. 6. (a)–(l) Experimental results of point source phantom PA imaging using an unfocused UST of 2.25 MHz. (a) PAT image of 4-point targets using conventional delay-and-sum. (b)–(e) Zoomed-in point targets in (a). (f) PAT image of 4-point targets using the TARES network. (g)–(j) Zoomed-in point targets in (f). (k) Comparison of tangential resolution versus distance from center (UST 2.25 MHz). (l) Comparison of tangential resolution versus distance from center (UST 5 MHz). (m)–(n) Experimental results of the triangular horsehair phantom imaged using an unfocused UST of 2.25 MHz. (m) PAT image of horsehair phantom using conventional delay-and-sum. (n) PAT image of horsehair phantom using the TARES network.


3.4 In vivo photoacoustic rat brain imaging

The TARES network's performance was also evaluated on in vivo rat brain imaging. Two USTs with central frequencies of 2.25 MHz and 5 MHz were employed to acquire the PAT data using the PAT system described earlier [Fig. 4(b)]. Figures 7(a) and 7(c) show the in vivo PAT images obtained with the 2.25 MHz and 5 MHz USTs using conventional delay-and-sum. Note that in these images the boundaries of the venous sinuses are not clearly visible (blurred, shown with dashed yellow arrows); in particular, the demarcation of the transverse sinus is weak and, in some cases, the transverse sinus is not visible at all. This degradation of the venous sinuses is a major hurdle in employing single-element UST based PAT systems for in vivo animal studies such as the detection of intracranial hypotension [49] and the detection of cerebral hemorrhage. Furthermore, the white spots (artefacts, shown with dashed yellow boxes) appearing around the venous sinuses also degrade their visibility and hamper the analysis. The tangential-resolution-improved images of the in vivo rat brain obtained using the TARES network are shown in Figs. 7(b) and 7(d), corresponding to the original PAT images of Figs. 7(a) and 7(c), respectively. From the improved in vivo brain images, the enhancement in image quality (due to the tangential resolution improvement) is clearly perceptible on both the transverse sinus (TS) and the sagittal sinus (SS). In particular, the morphology of the transverse sinus along the edges is clearly visible (shown with yellow dashed arrows) in the improved in vivo brain images from the TARES network [Figs. 7(b) and 7(d)] compared with the conventionally reconstructed in vivo brain images [Figs. 7(a) and 7(c)]. The suppression of noise and of the white artefacts in the images can also be clearly noticed [Fig. 7(d)].


Fig. 7. (a)–(b) Experimental results of in vivo rat brain PAT imaging using UST of 2.25 MHz. (a) PAT brain image using conventional delay-and-sum (2.25 MHz). (b) PAT image using TARES network (2.25 MHz). (c)–(d) Experimental results of in vivo rat brain PAT imaging using UST of 5 MHz. (c) PAT brain image using conventional delay-and-sum (5 MHz). (d) PAT brain image using TARES network (5 MHz).


One of the common problems encountered in circular scan PAT is the degradation of tangential resolution with increasing detector aperture size. Although numerous techniques have been proposed to overcome this problem, they are hindered by high cost and the need for customized detectors. To overcome this problem, we proposed an FD U-Net based CNN architecture with 9 residual blocks, termed the TARES network. The use of residual blocks in the network improves its denoising performance along with the quality of the resultant images. Furthermore, the dense connectivity increases the depth of the network without increasing the number of contracting and expanding layers. Simulated numerical phantoms were used to train the network, and its performance was verified with both simulated and experimental PAT imaging. The results of the TARES network on the simulated and experimental imaging data consistently show that the network is highly efficient in improving the tangential resolution and is independent of the PAT imaging system used. However, the approach of employing deep learning to improve the tangential resolution also has certain limitations. First, a large training dataset is required to optimize the network for the application; this can be addressed by using simulated phantoms akin to the intended application. Indeed, developing a simulated dataset close to the intended application enhances the performance of the deep learning network, as exemplified by the TARES network's performance on the experimental phantom and in vivo PAT images. In lieu of a simulated training dataset, it is possible to generate the training dataset from experimental images alone by using the conventional reconstruction for the input PAT images and tangential-resolution-improved PAT images as the ground truth. In such instances, however, the performance of the deep learning network is constrained by the performance of the algorithm used to improve the tangential resolution of the PAT images. Another limitation is the time taken for training the network: training time is well known to grow with the size of the training data. One way to overcome this problem is to distribute the training load over multiple GPUs, as sketched below. The training time also depends on the specifications of the GPU employed, such as CUDA cores, memory, clock, bus, and GPU memory bandwidth [50], as well as on the epochs and batch size used for training. Using deep learning, several drawbacks of photoacoustic imaging can be addressed in one go, and the only prerequisite is an appropriate dataset. We are happy to share the source code of the TARES network; it can be found in the Data Availability section of this paper.
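As an illustration of distributing the training load, a minimal TensorFlow sketch using MirroredStrategy is given below; build_tares_network is a hypothetical stand-in for the published model builder, and total_loss refers to the loss sketch in Section 2.2.

```python
import tensorflow as tf

# Mirror the model across all visible GPUs; each batch is split between them.
strategy = tf.distribute.MirroredStrategy()
print("Devices in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = build_tares_network()  # hypothetical builder for the TARES model
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss=total_loss)

# model.fit(...) then proceeds as usual; the global batch size is divided
# across the replicas automatically.
```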

4. Conclusion

In this work, a CNN-based architecture termed TARES was implemented to improve the tangential resolution of PAT images. The network was trained on simulated phantoms, and its performance was verified on both simulated and experimental datasets. Our results substantiate that the proposed network can improve the tangential resolution eightfold compared with the conventional delay-and-sum. In addition, the proposed network improves the quality of PAT images. Therefore, the proposed TARES network can improve the tangential resolution of PAT images without the use of an external negative acoustic lens on the transducer surface or a modified delay-and-sum beamformer.

Funding

Ministry of Education - Singapore (RG144/18).

Acknowledgments

The authors would like to thank the High Performance Computing Centre (HPC) of Nanyang Technological University, Singapore for providing computational resources.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data Availability

The source code of the TARES network employed in this paper is available at Github: https://github.com/BILAB-NTU/Deep-learning-approach-to-improve-tangential-resolution-in-photoacoustic-tomography.git and the datasets used can be downloaded from Open Science Framework (OSF): https://osf.io/mcjk9/?view_only=898efa10d0944376a5970e91ccd1b7d0.

References

1. L. Li, L. Zhu, C. Ma, L. Lin, J. Yao, L. Wang, K. Maslov, R. Zhang, W. Chen, J. Shi, and L. V. Wang, “Single-impulse Panoramic Photoacoustic Computed Tomography of Small-animal Whole-body Dynamics at High Spatiotemporal Resolution,” Nat. Biomed. Eng. 1(5), 0071 (2017). [CrossRef]  

2. Y. Zhou, J. Yao, and L. V. Wang, “Tutorial on photoacoustic tomography,” J. Biomed. Opt. 21(6), 061007 (2016). [CrossRef]  

3. L. V. Wang and J. Yao, “A practical guide to photoacoustic tomography in the life sciences,” Nat. Methods 13(8), 627–638 (2016). [CrossRef]  

4. A. Taruttis and V. Ntziachristos, “Advances in real-time multispectral optoacoustic imaging and its applications,” Nat. Photonics 9(4), 219–227 (2015). [CrossRef]  

5. Z. Wu, L. Li, Y. Yang, P. Hu, Y. Li, S. Y. Yang, L. V. Wang, and W. Gao, “A microrobotic system guided by photoacoustic computed tomography for targeted navigation in intestines in vivo,” Sci. Robot. 4(32), eaax0613 (2019). [CrossRef]  

6. S. Gottschalk, O. Degtyaruk, B. Mc Larney, J. Rebling, M. A. Hutter, X. L. Deán-Ben, S. Shoham, and D. Razansky, “Rapid volumetric optoacoustic imaging of neural dynamics across the mouse brain,” Nat. Biomed. Eng. 3(5), 392–401 (2019). [CrossRef]  

7. D. Das and M. Pramanik, “Combined ultrasound and photoacoustic imaging of blood clot during microbubbles-assisted sonothrombolysis,” J. Biomed. Opt. 24(12), 1 (2019). [CrossRef]  

8. L. Lin, P. Hu, J. Shi, C. M. Appleton, K. Maslov, L. Li, R. Zhang, and L. V. Wang, “Single-breath-hold photoacoustic computed tomography of the breast,” Nat. Commun. 9(1), 2352 (2018). [CrossRef]  

9. J. Xia and L. V. Wang, “Small-animal whole-body photoacoustic tomography: a review,” IEEE Trans. Biomed. Eng. 61(5), 1380–1389 (2014). [CrossRef]  

10. X. Deán-Ben and D. Razansky, “Optoacoustic image formation approaches - a clinical perspective,” Phys. Med. Biol. 64(18), 18TR01 (2019). [CrossRef]  

11. J. Poudel, Y. Lou, and M. A. Anastasio, “A survey of computational frameworks for solving the acoustic inverse problem in three-dimensional photoacoustic computed tomography,” Phys. Med. Biol. 64(14), 14TR01 (2019). [CrossRef]  

12. S. Gutta, S. K. Kalva, M. Pramanik, and P. K. Yalavarthy, “Accelerated image reconstruction using extrapolated Tikhonov filtering for photoacoustic tomography,” Med. Phys. 45(8), 3749–3767 (2018). [CrossRef]  

13. C. Huang, K. Wang, L. Nie, L. V. Wang, and M. A. Anastasio, “Full-Wave Iterative Image Reconstruction in Photoacoustic Tomography with Acoustically Inhomogeneous Media,” IEEE Trans. Med. Imaging 32(6), 1097–1110 (2013). [CrossRef]  

14. M. Haltmeier and G. Zangerl, “Spatial resolution in photoacoustic tomography: effects of detector size and detector bandwidth,” Inverse Problems 26(12), 125002 (2010). [CrossRef]  

15. M. Xu and L. V. Wang, “Analytic explanation of spatial resolution related to bandwidth and detector aperture size in thermoacoustic or photoacoustic reconstruction,” Phys. Rev. E 67(5), 056605 (2003). [CrossRef]  

16. M. Pramanik, G. Ku, and L. V. Wang, “Tangential resolution improvement in thermoacoustic and photoacoustic tomography using a negative acoustic lens,” J. Biomed. Opt. 14(2), 024028 (2009). [CrossRef]  

17. C. Li and L. V. Wang, “Photoacoustic tomography of the mouse cerebral cortex with a high-numerical-aperture-based virtual point detector,” J. Biomed. Opt. 14(2), 024047 (2009). [CrossRef]  

18. C. Li and L. V. Wang, “High-numerical-aperture-based virtual point detectors for photoacoustic tomography,” Appl. Phys. Lett. 93(3), 033902 (2008). [CrossRef]  

19. M. Pramanik, “Improving tangential resolution with a modified delay-and-sum reconstruction algorithm in photoacoustic and thermoacoustic tomography,” J. Opt. Soc. Am. A 31(3), 621–627 (2014). [CrossRef]  

20. S. K. Kalva and M. Pramanik, “Experimental validation of tangential resolution improvement in photoacoustic tomography using a modified delay-and-sum reconstruction algorithm,” J. Biomed. Opt. 21(8), 086011 (2016). [CrossRef]  

21. P. Warbal, M. Pramanik, and R. K. Saha, “Impact of sensor apodization on the tangential resolution in photoacoustic tomography,” J. Opt. Soc. Am. A 36(2), 245–252 (2019). [CrossRef]  

22. N. Awasthi, G. Jain, S. K. Kalva, M. Pramanik, and P. K. Yalavarthy, “Deep neural network based sinogram super-resolution and bandwidth enhancement for limited-data photoacoustic tomography,” IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control (2020, in press).

23. M. I. Razzak, S. Naz, and A. Zaib, “Deep Learning for Medical Image Processing: Overview, Challenges and the Future,” in Classification in BioApps: Automation of Decision Making, N. Dey, A. S. Ashour, and S. Borra, eds. (Springer International Publishing, Cham, 2018), pp. 323–350.

24. S. Gutta, V. S. Kadimesetty, S. K. Kalva, M. Pramanik, S. Ganapathy, and P. K. Yalavarthy, “Deep neural network-based bandwidth enhancement of photoacoustic data,” J. Biomed. Opt. 22(11), 1 (2017). [CrossRef]  

25. C. Tian, Y. Xu, W. Zuo, B. Zhang, L. Fei, and C. Lin, “Coarse-to-fine CNN for image super-resolution,” IEEE Trans. Multimedia, 1 (2020). [CrossRef]  

26. A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, “Dermatologist-level classification of skin cancer with deep neural networks,” Nature 542(7639), 115–118 (2017). [CrossRef]  

27. C. Yang, H. Lan, F. Gao, and F. Gao, “Deep learning for photoacoustic imaging: a survey,” arXiv, 2008.04221 (2020).

28. N. Awasthi, K. R. Prabhakar, S. K. Kalva, M. Pramanik, R. V. Babu, and P. K. Yalavarthy, “PA-Fuse: deep supervised approach for the fusion of photoacoustic images with distinct reconstruction characteristics,” Biomed. Opt. Express 10(5), 2227–2243 (2019). [CrossRef]  

29. A. Hauptmann and B. Cox, “Deep Learning in Photoacoustic Tomography: Current approaches and future directions,” arXiv, 2009.07608 (2020).

30. A. Hariri, K. Alipour, Y. Mantri, J. Schulze, and J. Jokerst, “Deep Learning Improves Contrast in Low-Fluence Photoacoustic Imaging,” Biomed. Opt. Express 11(6), 3360–3373 (2020). [CrossRef]  

31. T. Vu, M. Li, H. Humayun, Y. Zhou, and J. Yao, “A generative adversarial network for artifact removal in photoacoustic computed tomography with a linear-array transducer,” Exp. Biol. Med. 245(7), 597–605 (2020). [CrossRef]  

32. D. Allman, A. Reiter, and M. A. L. Bell, “Photoacoustic Source Detection and Reflection Artifact Removal Enabled by Deep Learning,” IEEE Trans. Med. Imaging 37(6), 1464–1477 (2018). [CrossRef]  

33. H. Lan, D. Jiang, C. Yang, F. Gao, and F. Gao, “Y-Net: Hybrid deep learning image reconstruction for photoacoustic tomography in vivo,” Photoacoustics 20, 100197 (2020). [CrossRef]  

34. J. Feng, J. Deng, Z. Li, Z. Sun, H. Dou, and K. Jia, “End-to-end Res-Unet based reconstruction algorithm for photoacoustic imaging,” Biomed. Opt. Express 11(9), 5321–5340 (2020). [CrossRef]  

35. H. Lan, K. Zhou, C. Yang, J. Cheng, J. Liu, S. Gao, and F. Gao, “Ki-GAN: Knowledge Infusion Generative Adversarial Network for Photoacoustic Image Reconstruction In Vivo,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, (Springer International Publishing, 2019), 273–281.

36. B. E. Treeby and B. T. Cox, “k-Wave: MATLAB toolbox for the simulation and reconstruction of photoacoustic wave fields,” J. Biomed. Opt. 15(2), 021314 (2010). [CrossRef]  

37. M. Livne, J. Rieger, O. U. Aydin, A. A. Taha, E. M. Akay, T. Kossen, J. Sobesky, J. D. Kelleher, K. Hildebrand, D. Frey, and V. I. Madai, “A U-Net Deep Learning Framework for High Performance Vessel Segmentation in Patients With Cerebrovascular Disease,” Front. Neurosci. 13, 97 (2019). [CrossRef]  

38. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, (Springer International Publishing, 2015), 234–241.

39. S. Cai, Y. Tian, H. Lui, H. Zeng, Y. Wu, and G. Chen, “Dense-UNet: a novel multiphoton in vivo cellular image segmentation model based on a convolutional neural network,” Quant. Imaging Med. Surg. 10(6), 1275–1285 (2020). [CrossRef]  

40. S. Guan, A. A. Khan, S. Sikdar, and P. V. Chitnis, “Fully Dense UNet for 2-D Sparse Photoacoustic Tomography Artifact Removal,” IEEE J. Biomed. Health Inform. 24(2), 568–576 (2020). [CrossRef]  

41. A. Sharma and M. Pramanik, “Convolutional neural network for resolution enhancement and noise reduction in acoustic resolution photoacoustic microscopy,” Biomed. Opt. Express 11(12), 6826–6839 (2020). [CrossRef]  

42. A. DiSpirito III, D. Li, T. Vu, M. Chen, D. Zhang, J. Luo, R. Horstmeyer, and J. Yao, “Reconstructing Undersampled Photoacoustic Microscopy Images using Deep Learning,” IEEE Trans. Med. Imaging, 1 (2020). [CrossRef]  

43. T. Nguyen, Y. Xue, Y. Li, L. Tian, and G. Nehmetallah, “Deep learning approach for Fourier ptychography microscopy,” Opt. Express 26(20), 26470–26484 (2018). [CrossRef]  

44. H. Porav, T. Bruls, and P. Newman, “I Can See Clearly Now: Image Restoration via De-Raining,” 2019 International Conference on Robotics and Automation (ICRA), 7087–7093 (2019).

45. D. Pedamonti, “Comparison of non-linear activation functions for deep neural networks on MNIST classification task,” arXiv, 1804.02763 (2018).

46. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. J. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Józefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems,” arXiv, 1603.04467 (2016).

47. A. Munappy, J. Bosch, H. H. Olsson, A. Arpteg, and B. Brinne, “Data Management Challenges for Deep Learning,” in 2019 45th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), (2019), 140–147.

48. American National Standard for Safe Use of Lasers, ANSI Standard Z136.1-2007, NY (2007).

49. P. Rajendran, S. Sahu, R. A. Dienzo, and M. Pramanik, “In vivo detection of venous sinus distension due to intracranial hypotension in small animal using pulsed-laser-diode photoacoustic tomography,” J. Biophotonics 13(6), e201960162 (2020). [CrossRef]  

50. D. Justus, J. Brennan, S. Bonner, and A. McGough, “Predicting the Computational Cost of Deep Learning Models,” 2018 IEEE International Conference on Big Data (Big Data), 3873–3882 (2018).
