Encoder-decoder deep learning network for simultaneous reconstruction of fluorescence yield and lifetime distributions

Jiaju Cheng; Peng Zhang; Fei Liu; Jie Liu; Hui Hui; Jie Tian; Jianwen Luo

doi:10.1364/BOE.466349

1. Introduction

Fluorescence molecular tomography (FMT) is capable of detecting biological activities in vivo and has become an important functional imaging tool for oncology research, drug development, and other biological studies [1–5]. However, the applications of FMT are hampered not only by its ill-posed inverse problem, which arises from strong scattering in biological tissues and leads to poor spatial resolution, but also by its penetration limit. Many studies have used regularization methods to enhance the spatial resolution, so the penetration limit is now considered the more significant obstacle to the applications of FMT. Conventional FMT requires photons that penetrate the object to provide adequate spatial information. Conventional FMT geometries are therefore penetrating or partially penetrating [6], and the penetration depth of near-infrared photons in biological tissues is limited. As a result, conventional FMT applications are restricted to objects with small cross-sectional diameters, such as mice [3–5] and human fingers [7].

To circumvent the penetration limit, a time-domain FMT in reflective geometry (TD-rFMT) was proposed in our previous work [8]. In reflective geometry, the fluorescence distribution in the external region can be reconstructed with spatial information provided not only by the different source-detector spacings and positions, but also by the photon propagation trajectories, which vary over time. However, reconstruction of the fluorescence distribution in the deep region is challenging because of the low sensitivity of the deep region [8,9] and the dominance of the signal from the shallow region. Using a depth-regularized Tikhonov-regularization-based projecting sparsity pursuit (PrSP-TK-D) method for yield distribution reconstruction and an L1 regularization weighted separation reconstruction algorithm (L1WSR) for lifetime distribution reconstruction [8], the fluorescence yield and lifetime distributions of targets within a 2.5-cm depth could be accurately reconstructed regardless of the object size. Despite this improvement, the positioning accuracy and quantification accuracy are not sufficient when fluorescent targets at different depths are close to each other, which is a particular concern because the fluorescence lifetime can be a quantitative indicator of vital biological tissue properties such as pH and temperature [10–12]. In the extreme case in which fluorescent targets are close together and at different depths, the signal of the deep targets is buried in that of the shallow targets, and it is challenging to reconstruct the fluorescence distribution of the deep targets. Deep learning methods, however, could be practical for reconstructing the distribution of the deep targets.

Deep learning has been widely used to improve the reconstruction performance of many imaging modalities, such as bioluminescence tomography (BLT) [13], photoacoustic tomography (PAT) [14], and FMT [15–18]. Huang et al. [15] proposed a method that combines a deep convolutional neural network, a gated recurrent unit, and a multilayer perceptron. Li et al. [16] achieved fast FMT reconstruction with ResGCN, a network based on graph convolution networks with fewer parameters. Guo et al. [17] proposed a deep encoder-decoder network for end-to-end three-dimensional (3D) reconstruction of FMT. Zhang et al. [18] further improved the spatial resolution of 3D FMT reconstruction through a 3D fusion dual-sampling deep neural network. For FMT, it has been shown that a deep learning network with an encoder-decoder framework, trained on simulation data sets only, can directly reconstruct the fluorescence distribution with high performance from the measured fluorescent signal and may generalize to the reconstruction of phantom or in vivo experiments. However, as far as we know, there have been no reports on deep learning reconstruction of fluorescence yield and lifetime distributions in time-domain FMT. Because the relationship between the fluorescence lifetime distribution and the fluorescent signal is nonlinear and much more complicated, simultaneous reconstruction of fluorescence yield and lifetime distributions by deep learning remains challenging.

In this paper, we propose an end-to-end encoder-decoder network, named En-De-YL, for simultaneous reconstruction of fluorescence yield and lifetime distributions with TD-rFMT. The network extracts features from the organized temporal point spread functions (TPSFs) collected on the surface within a field of view (FOV) of 50 mm and reconstructs the distributions of fluorescence yield and lifetime within a 70×30×10 mm³ region. A customized loss function is adopted in the joint supervised and self-supervised training of the network. The relationship among the fluorescence yield distribution, the lifetime distribution, and the TPSF, together with its linearized approximation, is applied to the loss function as self-supervised terms, to couple the reconstruction of the fluorescence yield and lifetime distributions and improve the reconstruction accuracy of the lifetime distribution. With the En-De-YL method, the reconstruction performance of TD-rFMT is enhanced with higher spatial resolution, more accurate reconstructed target positions, more accurate reconstructed lifetime values, and faster reconstruction, to meet the requirements of conventional small-animal applications and possible extended applications such as surgical navigation.

2. Methods

2.1 Network architecture and system setup

The network architecture of En-De-YL is depicted in Fig. 1. It consists of a 3D convolution encoder, 7 fully connected layers, and a 3D deconvolution decoder. An input block of the organized TPSFs is fed into the network to extract the hidden features, including positions, sizes, yield values, and lifetime values. The two-branch expansive decoder then reconstructs the distributions of fluorescence yield and inverse lifetime from the hidden features. In this paper, the lifetime distribution is reconstructed as inverse lifetime to avoid singular values [8]. The decoder and fully connected layers branch from the feature layer, and the two branches do not share coefficients. At the end of the network, a mask is generated to ensure the spatial consistency between the yield and lifetime distributions. Based on the outputs of the preceding layer, the mask is given by $f(\alpha ({Y_1} - {y_{gate}})) \times f(\alpha ({L_1} - li{f_{gate}})) + \varepsilon$, where Y₁ and L₁ denote the outputs of the yield distribution branch and the inverse lifetime distribution branch, respectively, f is the sigmoid function, α = 150, and ε = 0.001. y_gate = 0.15 and lif_gate = 0.5 ns⁻¹ are the thresholds of Y₁ and L₁, respectively. All the coefficients above are empirically chosen. The fully connected layers transfer the topological information of the source-detector pairs into the spatial information of the distribution. The 5×32×72 reconstructed fluorescence distributions represent the distributions of a detected region with a size of 10×32×72 mm³ (height × depth × width).
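The gating mask can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `sigmoid` and `gate_mask` and the toy voxel values are assumptions made for illustration, and only the constants α, ε, y_gate, and lif_gate are taken from the text.

```python
import numpy as np

ALPHA = 150.0    # sigmoid steepness (value quoted in the text)
EPS = 0.001      # small floor added to the mask (value quoted in the text)
Y_GATE = 0.15    # threshold on the yield branch output Y1
LIF_GATE = 0.5   # threshold on the inverse-lifetime branch output L1, in ns^-1


def sigmoid(x):
    # Clip the argument so np.exp cannot overflow for strongly negative inputs.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -60.0, 60.0)))


def gate_mask(y1, l1):
    """Soft mask f(alpha*(Y1 - y_gate)) * f(alpha*(L1 - lif_gate)) + eps."""
    return sigmoid(ALPHA * (y1 - Y_GATE)) * sigmoid(ALPHA * (l1 - LIF_GATE)) + EPS


# Hypothetical branch outputs for three voxels:
# voxel 0 is background, voxel 1 is inside a target,
# voxel 2 has a yield above threshold but an inverse lifetime below threshold.
y1 = np.array([0.0, 0.8, 0.8])
l1 = np.array([0.0, 1.2, 0.1])
mask = gate_mask(y1, l1)
```

Because α is large, the mask is close to ε wherever either branch falls below its threshold and close to 1 + ε only where both branches exceed their thresholds, which is how it ties the two distributions to the same spatial support.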

Fig. 1. Schematic illustration of the En-De-YL network architecture.

In this paper, all the TPSFs are collected or generated based on the TD-rFMT system built in our previous work [8]. The system consists of a TCSPC module (SPC-150, Becker & Hickl GmbH, Germany), a PMT (PML-16-C, Becker & Hickl GmbH, Germany), a femto-second laser generator (Spectra-Physics, Newport Corporation, Canada) working at 780-nm wavelength (80 MHz, 100 fs pulse-width), a rectangular phantom, a 16×2 optical switch for fibers, and a group of filters, including an achromatic doublet (AC254-030-B, Thorlabs, Newton, NJ) and two bandpass fluorescence filters with center wavelength of 840 nm (ff01-840/12-25, Semrock, Rochester, NY; XBPA840, Asahi Spectra, Torrance, CA) [8]. As shown in Fig. 2, on the surface of the phantom, 11 excitation/detection points are uniformly placed within a 50-mm FOV with an adjacent distance of 5 mm. The FOV and number of points are chosen empirically for fluorescence distribution reconstruction within 25-mm depth. All the excitation/detection points are at the same height called the excitation height and the transverse section at the excitation height is the excitation plane as shown in Fig. 2(b). The cylindrical targets with height of 10 mm are placed symmetrically about the excitation plane. The excitation light is guided into the object through one of the 11 points in sequence, while the other 10 points serve as the detection points. Therefore, for each simulation or phantom experiment, there are 110 TPSFs. At the excitation plane, the locations of the targets are defined by the rectangular coordinate with the corner of the phantom as the origin. The depth and horizontal location of the targets are shown in Fig. 2(b). Specifically, the depth of a target is the distance from its center to the phantom surface.

Fig. 2. Schematic diagram of the phantom and experimental setup. (a) The rectangular phantom and the excitation/detection points. (b) Transverse view of the phantom.

The 11 excitation/detection points are numbered 1–11 in order. The 110 TPSFs are then organized by point position into an 11×11×220 block, denoted TB₁, whose dimensions correspond to the excitation point position, the detection point position, and the time of the TPSF, respectively. Because a TPSF cannot be detected at the same position as the excitation, the TPSFs on the diagonal are padded with zeros. The temporal resolution of the TPSF (Δt) is 25 ps, and the TPSFs in the block span 0.75 ns to 6.25 ns. As shown in Fig. 1, the kernel sizes are larger in the third dimension because of the high temporal resolution of the TPSF. In the block, adjacent TPSFs are spatially related, so the information can be extracted by convolution. TB₁ is normalized by its maximum value for better training of the En-De-YL network. Without normalization, because the signals of targets at different depths differ greatly in TD-rFMT, the value range of the TPSF blocks across different data would be too large for network training. Furthermore, the peak values of TPSFs from different source-detector pairs in the same block also span a large range, and the highest peak value is usually hundreds of times the lowest. In this case, the information in TPSFs with low peak values may be ignored by the network. Therefore, to make full use of the high-dynamic-range information of the TPSFs, the normalized block TB₁ is multiplied by 20 and by 400 and then truncated at the maximum value of TB₁, which is 1, to generate TB₂ and TB₃. The numbers 20 and 400 are empirically chosen. Then TB₁, TB₂, TB₃, and the logarithm of TB₁, denoted TB₄, are stacked into an 11×11×220×4 input block, denoted IB. The composition of the input block (IB) of the network is as follows:

$$TB_2 = \min(20 \times TB_1,\ 1)$$

$$TB_3 = \min(400 \times TB_1,\ 1)$$

$$TB_4 = \ln(TB_1 + \sigma),\quad \sigma = 10^{-6}$$

$$IB = \left[ \begin{array}{cccc} TB_1 & TB_2 & TB_3 & TB_4 \end{array} \right]$$

where TB₁ is the normalized 11×11×220 block described above and σ is a smoothing factor that avoids singular values. The logarithm block TB₄ can provide more direct information about the lifetime value, because the distribution of the inverse lifetime can be estimated from the logarithm of the tail of the TPSF [8].
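The construction of the input block can be sketched in a few lines of NumPy. The function name `build_input_block`, the array layout `[excitation, detection, time]`, and the synthetic random data are assumptions for illustration only; the scaling factors, the truncation at 1, and σ follow the definitions above.

```python
import numpy as np

N_POINTS = 11   # excitation/detection points
N_TIME = 220    # samples between 0.75 ns and 6.25 ns at 25 ps resolution
SIGMA = 1e-6    # smoothing factor inside the logarithm


def build_input_block(tpsf_block):
    """Stack TB1..TB4 into an 11x11x220x4 input block.

    tpsf_block: raw, non-negative TPSFs indexed as [excitation, detection, time].
    """
    tb = np.asarray(tpsf_block, dtype=np.float64).copy()
    # Zero the diagonal: excitation and detection at the same point is not measured.
    idx = np.arange(tb.shape[0])
    tb[idx, idx, :] = 0.0
    tb1 = tb / tb.max()                  # normalize by the block maximum
    tb2 = np.minimum(20.0 * tb1, 1.0)    # amplify weak TPSFs, truncate at 1
    tb3 = np.minimum(400.0 * tb1, 1.0)   # amplify further, truncate at 1
    tb4 = np.log(tb1 + SIGMA)            # logarithm block; sigma avoids log(0)
    return np.stack([tb1, tb2, tb3, tb4], axis=-1)


# Synthetic stand-in data (an assumption, not measured TPSFs).
rng = np.random.default_rng(0)
raw = rng.random((N_POINTS, N_POINTS, N_TIME)) + 0.01
ib = build_input_block(raw)
```

The truncated channels TB₂ and TB₃ saturate the strong TPSFs while lifting the weak ones into a usable range, so the four channels together cover the dynamic range that a single normalized block would compress.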

2.2 Loss function

A customized loss function is adopted in the training of the En-De-YL network, which is shown as follows:

(1)$$\begin{array}{l} Loss = Sc({Y^{\prime},Y} )+ Sc({L^{\prime},L} )+ {\lambda _1}Sc({S^{\prime},S} )+ {\lambda _2}||{l^{\prime} - l} ||_2^2 + {\lambda _3}||{Y^{\prime} - Y} ||_2^2\\ + {\lambda _4}({||{{D_1}Y^{\prime}} ||_2^2\textrm{ + }||{{D_2}L^{\prime}} ||_2^2\textrm{ + }||M ||_2^2} )\textrm{ + }{\lambda _5}({TV({Y^{\prime}} )+ TV({L^{\prime}} )} )\end{array}$$

where Y’ and L’ denote the flattened reconstructed yield and inverse lifetime distributions, respectively, and Y and L are the corresponding true yield and inverse lifetime distributions. Sc denotes the cosine similarity. S’ = Tpsf_i’(t_{j+1}) - Tpsf_i’(t_j), where i = 1, 2, …, 110 is the index of the TPSF, j = 1, 2, …, 11 is the index of the time points chosen for the loss function, and Tpsf_i’ is the ith predicted TPSF. The empirically chosen time points are 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.8, 3.6, 4.4, 5.2, and 6.0 ns, which give 11 successive differences. $Tps{f_i}^{\prime}(t )= \int\limits_r {Y^{\prime}(r ){W_i}({t,r} )\ast L^{\prime}(r ){e^{ - L^{\prime}(r )t}}dr}$, where r is the coordinate, t is the time, * denotes the temporal convolution, and W_i(t,r) is the ith forward model matrix generated by the telegraph equation [19]. Herein, S’ is the temporal first-order derivative of the predicted TPSF, which can improve the reconstruction performance [20]. S is the corresponding true value of S’, that is, the temporal first-order derivative of the corresponding input TPSF. l is the slope of the logarithm of the TPSF tail, which is given by $l(i) = {{({\ln ({Tp\textrm{s}{f_i}({{T_1}} )} )- \ln ({Tp\textrm{s}{f_i}({{T_2}} )} )} )} / {({{T_1} - {T_2}} )}}$, where T₁ and T₂ are two sufficiently late time points, set to 4.85 ns and 5.25 ns in this paper. The logarithm of the TPSF tail is approximately linear in time, and its slope, l, is strongly related to the inverse lifetime distribution. l’ is the linear approximation of l and is given by $l^{\prime}(i) = {{{W_i}({{T_1}} )({Y^{\prime} \circ L^{\prime}} )} / {{W_i}({{T_1}} )Y^{\prime}}}$, where W_i(T₁) is the row vector of W_i(t,r) at t = T₁ and ${\circ}$ denotes the Hadamard product. D₁ and D₂ are the depth regularization diagonal matrices. The ith diagonal element of D₁ is given by D₁(i) = min(max(β(d - d₀), −200), 200), where d is the depth of the ith element, d₀ = 15 mm, and β = 30 mm⁻¹. The ith diagonal element of D₂ is given by D₂(i) = min(max(β(d - d₀), 20), 300). Both D₁ and D₂ are empirically chosen. $M = f(\alpha (Y^{\prime} - {y_{gate}})) \times f(\alpha (L^{\prime} - li{f_{gate}})) + \varepsilon$ is the flattened mask, where f is the sigmoid function, α = 150, ε = 0.001, y_gate = 0.15, and lif_gate = 0.5 ns⁻¹, as mentioned above. TV denotes the total variation. The values of the coefficients of the terms in the loss function are shown in Table 1.
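Two of the ingredients defined above, the depth regularization diagonals and the linearized slope l’, are simple enough to sketch directly. The function names `depth_weights` and `linear_slope`, and the toy sensitivity row, yields, and inverse lifetimes, are hypothetical; only β, d₀, and the clipping bounds come from the text, and this is not the authors' training code.

```python
import numpy as np

BETA = 30.0   # mm^-1
D0 = 15.0     # mm


def depth_weights(depth_mm):
    """Diagonals of D1 (yield) and D2 (inverse lifetime) for voxel depths in mm.

    min(max(x, lo), hi) is the same operation as clipping x to [lo, hi].
    """
    ramp = BETA * (np.asarray(depth_mm, dtype=np.float64) - D0)
    d1 = np.clip(ramp, -200.0, 200.0)
    d2 = np.clip(ramp, 20.0, 300.0)
    return d1, d2


def linear_slope(w_t1, y, l):
    """l'(i) = W_i(T1) (Y o L) / (W_i(T1) Y), one value per row of w_t1."""
    return (w_t1 @ (y * l)) / (w_t1 @ y)


# Toy voxels at 5, 15, and 25 mm depth.
d1, d2 = depth_weights([5.0, 15.0, 25.0])

# Toy example: one measurement whose sensitivity row sees the first two voxels.
w_t1 = np.array([[1.0, 1.0, 0.0]])   # hypothetical row of W at t = T1
y = np.array([1.0, 3.0, 0.0])        # hypothetical yields
l = np.array([0.8, 1.2, 0.0])        # hypothetical inverse lifetimes, ns^-1
l_prime = linear_slope(w_t1, y, l)
```

In this toy case l’ is a sensitivity- and yield-weighted average of the inverse lifetimes, so it always lies between the smallest and largest inverse lifetime of the voxels that the measurement sees.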

Table 1. Coefficients of terms in the loss function

In the loss function, the cosine similarities of the yield and lifetime distributions are used for reconstruction of the relative distributions, which leads to faster convergence during network training. For the reconstruction of the absolute lifetime distribution, the cosine similarity of the time derivative of the TPSF, Sc(S’, S), is used because the absolute lifetime information is contained in the relative change of the TPSF. Moreover, Sc(S’, S) is a self-supervised loss term and may improve the generalization ability of the network [21]. However, because of the GPU memory limit, only a small number of time points (no more than 11, as mentioned above) can be used in Sc(S’, S), so the information about the absolute lifetime value is insufficient. Therefore, despite the deviation caused by the linearized approximation of l, the mean square error between l’ and l is adopted to enhance the absolute lifetime reconstruction. Note that the mean square error between l’ and l is also a self-supervised loss term. The L2-norm depth regularization of the lifetime distribution is used to further improve the accuracy of the reconstructed absolute lifetime value, because the lifetime value is influenced by the depth. In addition, the mean square error between Y’ and Y and the L2-norm depth regularization of the yield distribution are adopted to improve the accuracy of the relative yield values. Finally, the L2-norm regularization of the mask is adopted to increase the sparsity of the distributions, and the total variation is adopted to improve the image quality.

To justify the linearized approximation l’, the ith TPSF is formulated as:

(2)$$Tps{f_i}(t )= \int\limits_r {Y(r )W({t,r} )\ast L(r ){e^{ - L(r )t}}dr} \textrm{ = }H(t )\ast {L_i}{e^{ - {L_i}t}}$$

where L_i is the effective inverse lifetime for the ith measurement, which is also the ith element of l, given by ${{({\ln ({Tp\textrm{s}{f_i}({{T_1}} )} )- \ln ({Tp\textrm{s}{f_i}({{T_2}} )} )} )} / {({{T_1} - {T_2}} )}}$. On the TPSF tail, where the exponential term dominates, Eq. (2) can be simplified as:

(3)$$\sum\limits_\textrm{r} {Y(r )L(r )W({{T_1},r} ){e^{ - A(r )t}} \ast {e^{ - L(r )t}}} = C \cdot {e^{ - {A_S}t}} \ast {L_i}{e^{ - {L_i}t}}$$

where $A(r )={-} {{d\log ({W({{T_1},r} )} )} / {dt}}$ and ${A_S} ={-} {{d\log ({W({{T_1}} )Y} )} / {dt}}$, in which T₁ is a time point on the tail of TPSF. Therefore,

(4)$$\sum\limits_\textrm{r} {\frac{{Y(r )L(r )W({{T_1},r} )}}{{A(r )- L(r )}}({{e^{ - L(r )t}} - {e^{ - A(r )t}}} )} \textrm{ = }\frac{{C \cdot {L_i}}}{{{A_S} - {L_i}}}({{e^{ - {L_i}t}} - {e^{ - {A_S}t}}} )$$
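The step from Eq. (3) to Eq. (4) is the closed form of the temporal convolution of two exponentials. Written out for generic decay rates A ≠ L, and assuming that the convolution is taken over [0, t] (the integration limits are our assumption, since they are not stated explicitly):

$$\begin{aligned} e^{-At} \ast e^{-Lt} &= \int_0^t e^{-A(t-\tau)}\, e^{-L\tau}\, d\tau = e^{-At} \int_0^t e^{(A-L)\tau}\, d\tau \\ &= e^{-At}\, \frac{e^{(A-L)t} - 1}{A - L} = \frac{e^{-Lt} - e^{-At}}{A - L} \end{aligned}$$

Applying this with A = A(r) and L = L(r) to each term on the left-hand side of Eq. (3), and with A = A_S and L = L_i on the right-hand side, gives Eq. (4).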

It is reasonable to assume that L_i can be approximated from the distribution of L(r) as ${L_i} \approx \sum\limits_\textrm{r} {{k_r}L(r )}$ with $\sum\limits_\textrm{r} {{k_r}} \textrm{ = }1$. For TD-rFMT, W(T₁,r) can be considered sparse, with only about 10% of its values higher than one tenth of its maximum value. Moreover, where W(T₁,r) is higher than one tenth of its maximum value, the corresponding values of A(r) range from 1.8 ns⁻¹ to 2.0 ns⁻¹, and A_S is about 1.9 ns⁻¹. In addition, in this paper, the inverse lifetime usually ranges from 0.7 ns⁻¹ to 1.5 ns⁻¹. Because of the sparsity of the distribution, the number of non-zero elements of Y(r) and L(r) ranges from 10 to 200. Based on these value ranges, and assuming that ${k_r}\textrm{ = }\frac{{W({{T_1},r} )Y(r )}}{{\sum {W({{T_1},r} )Y(r )} }}$, 100,000 random simulations of Eq. (4) were carried out. About 80% of the relative errors, given by ${{\left|{{L_i} - \sum\limits_\textrm{r} {{k_r}L(r )} } \right|} / {{L_i}}}$, are lower than 0.4. As a supplement to the self-supervised loss term Sc(S’, S), the errors of the linear approximation ${L_i} \approx \sum\limits_\textrm{r} {{k_r}L(r )}$ are acceptable. Therefore, in this paper, for the loss function, ${L_i} \approx l^{\prime}(i )= {{{W_i}({{T_1}} )({Y \circ L} )} / {{W_i}({{T_1}} )Y}}$.

2.3 Quantification metrics

To demonstrate the reconstruction performance quantitatively, the following quantification metrics are applied in this paper.

The intersection-over-union ratio (IoU) describes how close the shape and location of the reconstructed targets are to the ground truth. IoU is defined as:

(5)$$I\textrm{o}U = {{({Ta{r_{recon}} \cap Ta{r_{true}}} )} / {({Ta{r_{recon}} \cup Ta{r_{true}}} )}}$$

where Tar_recon and Tar_true denote the area of the reconstructed targets and the true targets respectively. They are defined as the area with yield value higher than 0.05 of the maximum yield value of the target.

The positioning error (PE) describes how close the center of the reconstructed target is to the center of the true target. The PE is defined as follows.

(6)$$PE\textrm{ = }{||{{x_{recon}} - {x_{true}}} ||_2}$$

where x_true and x_recon denote the value-weighted center coordinates of the true target and the reconstructed target, respectively. They are given by $x = {{\sum\limits_{r \in Tar} {Y(r)r} } / {\sum\limits_{r \in Tar} {Y(r)} }}$, where Y denotes the corresponding fluorescence yield distribution, r is the coordinate, and Tar denotes the area of the target.

The effective inverse lifetime (EIL) is the yield-value-weighted mean value of inverse lifetime, which is given by:

(7)$$EIL = {{\sum\limits_{r \in Tar} {Y(r)L(r)} } / {\sum\limits_{r \in Tar} {Y(r)} }}$$

where Y and L denote the reconstructed fluorescence yield and inverse lifetime distribution, respectively. r is the coordinate and Tar denotes the area of the target.

Also, because the fluorescence lifetime is an important quantitative indicator, the relative errors of EIL (EILE) are used to show the quantification accuracy of the reconstructed inverse lifetime value. EILE is defined as:

(8)$$EILE = {{|{EI{L_{\textrm{recon}}} - EI{L_{\textrm{true}}}} |} / {EI{L_{\textrm{true}}}}}$$

where EIL_recon and EIL_true denote the effective inverse lifetime of the reconstructed target and the true target, respectively.

The relative yield value (RV) is the ratio of the summed yield values of the two targets. It is defined as:

(9)$$RV\textrm{ = }R{V_1}:R{V_2} = \sum\limits_{r \in Ta{r_1}} {Y(r)} :\sum\limits_{r \in Ta{r_2}} {Y(r)}$$

where Y denotes the reconstructed fluorescence yield distribution. r is the coordinate and Tar denotes the area of the targets.

The relative error of RV (RVE) describes the reconstruction performance on the relative yield values. It is defined as:

(10)$$RVE\textrm{ = }{{|{R{V_{\textrm{re}con}} - R{V_{true}}} |} / {R{V_{true}}}}$$

where RV_recon and RV_true denote the relative yield value of the reconstructed targets and the true targets, respectively. Note that RVE can only be applied to the experiments and simulations with 2 targets.
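The metrics in Eqs. (5)–(8) can be sketched on voxel grids as follows. The function names, the 4×4 toy grid, and the use of the map maximum for the 0.05 threshold (which equals the per-target maximum only in the single-target case) are simplifying assumptions for illustration, not the evaluation code used for the reported results.

```python
import numpy as np


def target_mask(yield_map, frac=0.05):
    """Voxels whose yield exceeds frac of the maximum yield (single-target case)."""
    return yield_map > frac * yield_map.max()


def iou(recon_mask, true_mask):
    """Intersection over union of two boolean target masks, Eq. (5)."""
    union = np.logical_or(recon_mask, true_mask).sum()
    return np.logical_and(recon_mask, true_mask).sum() / union


def weighted_center(yield_map, mask, coords):
    """Yield-weighted center; coords has shape yield_map.shape + (ndim,)."""
    w = yield_map[mask]
    return (w[:, None] * coords[mask]).sum(axis=0) / w.sum()


def positioning_error(c_recon, c_true):
    """Euclidean distance between the two centers, Eq. (6)."""
    return float(np.linalg.norm(np.asarray(c_recon) - np.asarray(c_true)))


def effective_inverse_lifetime(yield_map, inv_life, mask):
    """Yield-weighted mean inverse lifetime inside the target, Eq. (7)."""
    w = yield_map[mask]
    return float((w * inv_life[mask]).sum() / w.sum())


def relative_error(recon, true):
    """Relative error used for EILE, Eq. (8), and likewise for RVE."""
    return abs(recon - true) / true


# Toy 4x4 grid with 1-mm voxels: the reconstruction is shifted by one voxel.
coords = np.stack(np.meshgrid(np.arange(4.0), np.arange(4.0), indexing="ij"), axis=-1)
y_true = np.zeros((4, 4)); y_true[1:3, 1:3] = 1.0
y_rec = np.zeros((4, 4)); y_rec[1:3, 2:4] = 1.0
m_true, m_rec = target_mask(y_true), target_mask(y_rec)
l_true = np.where(m_true, 1.0, 0.0)   # inverse lifetime, ns^-1
l_rec = np.where(m_rec, 1.1, 0.0)

toy_iou = iou(m_rec, m_true)
toy_pe = positioning_error(weighted_center(y_rec, m_rec, coords),
                           weighted_center(y_true, m_true, coords))
toy_eile = relative_error(effective_inverse_lifetime(y_rec, l_rec, m_rec),
                          effective_inverse_lifetime(y_true, l_true, m_true))
```

In the toy case the two 2×2 targets share 2 of 6 occupied voxels, so the IoU is 1/3, the PE is 1 mm, and the EILE is 10%.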

3. Experiments and results

The simulation data for the training of En-De-YL were generated by the finite element method using the telegraph equation [19]. First, 4,800 sets of single-target simulations were generated. In the single-target simulations, the horizontal locations of the targets range from 12 to 58 mm in increments of 2 mm, and the depths of the targets range from 5 to 23 mm in increments of 2 mm. The inverse lifetime values of the targets range from 0.72 to 1.8 ns⁻¹ in increments of 0.12 ns⁻¹, and random deviations following the Gaussian distribution N(0 ns⁻¹, 0.06 ns⁻¹) are added to increase the diversity of inverse lifetime values in the simulation data. The inverse lifetime values of the targets in the single-target simulations therefore follow the Gaussian distributions N(0.12n + 0.6 ns⁻¹, 0.06 ns⁻¹), n = 1, 2, …, 10, which were chosen according to the inverse lifetime values of indocyanine green (ICG) in different solvents. In addition, for each case, two target radii are chosen, one from the uniform distribution U(2.5 mm, 3 mm) and the other from U(3 mm, 3.5 mm). This gives 24 horizontal locations × 10 depths × 10 inverse lifetime levels × 2 radii = 4,800 sets. Moreover, for the optical coefficients of the phantom in each single-target simulation, the absorption coefficient μ_a ∼ N(0.004 mm⁻¹, 0.001 mm⁻¹) and the reduced scattering coefficient μ_s’ ∼ U(0.9 mm⁻¹, 1 mm⁻¹) are also randomly chosen. Furthermore, because the TPSF is linear in the fluorescence yield distribution, the additivity and homogeneity of the TPSF allow 500,000 sets of training data to be generated from the 4,800 sets of single-target data, with yield values varying from 0.5 to 2.0 and the number of targets from 1 to 3. In the generated training data, about 60% are 2-target simulations, and 1-target and 3-target simulations each account for about 20%. A further 1,000 sets of data each are generated for validation and testing. In addition, to approximate actual data, multiplicative noise, additive noise, and time deviation are applied to the generated data. The multiplicative noise, which follows N(1, 0.05), simulates the magnification difference among measurements. The additive noise approximates the signal-to-noise ratio of the actual data, which is about 20 dB for targets at 20-mm depth. The time deviation varies from −3Δt to 3Δt.
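The size of the single-target grid and the superposition used to build multi-target samples can be sketched as follows. The variable names, the function `combine`, and the random stand-in blocks are assumptions for illustration; the noise and time-deviation steps are deliberately left out because their exact form is not specified in enough detail here.

```python
import itertools
import numpy as np

horizontal_mm = range(12, 59, 2)   # 12, 14, ..., 58 mm
depth_mm = range(5, 24, 2)         # 5, 7, ..., 23 mm
# Centers of the inverse-lifetime distributions, 0.12n + 0.6 ns^-1 for n = 1..10.
inv_life_centers = [round(0.12 * n + 0.6, 2) for n in range(1, 11)]
radius_bins = [(2.5, 3.0), (3.0, 3.5)]   # two radii drawn per case, in mm

cases = list(itertools.product(horizontal_mm, depth_mm, inv_life_centers, radius_bins))
n_cases = len(cases)


def combine(single_tpsfs, yields):
    """Superpose single-target TPSF blocks, each scaled by its yield value."""
    total = np.zeros_like(single_tpsfs[0], dtype=np.float64)
    for y, tpsf in zip(yields, single_tpsfs):
        total += y * tpsf
    return total


# Stand-in single-target blocks (random numbers, not simulated TPSFs).
rng = np.random.default_rng(0)
a = rng.random((11, 11, 220))
b = rng.random((11, 11, 220))
two_target = combine([a, b], yields=[0.5, 2.0])
```

Because each multi-target sample is only a weighted sum of stored single-target blocks, a large training set can be assembled without rerunning the finite element solver.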

The reconstruction performance of the En-De-YL network on the testing sets is shown in Fig. 3. A higher IoU indicates better overall positioning of the reconstructed targets, including the accuracy of the target center location and the target shape. As shown in Fig. 3(a), the IoUs of the reconstructed targets decrease significantly as the target depth increases, especially in the multiple-target simulations. The IoUs of the targets are mostly higher than 0.4, except for the deep targets in the 3-target simulations. The quantification accuracy of the reconstructed inverse lifetime value, an important quantitative indicator, is measured by the relative error of the effective inverse lifetime (EILE). Fig. 3(b) shows that the EILEs are mostly within 20%. In addition, even though the relative errors of the relative yield value (RVE) are 27 ± 19% in the 2-target simulations, accurate reconstructed target positions and accurate reconstructed lifetime values are achieved by En-De-YL on the testing sets.

Fig. 3. The reconstruction performance of the En-De-YL network on the testing sets. (a) IoUs and (b) relative errors of the effective inverse lifetime (EILE) of the targets.

To further evaluate the performance of En-De-YL, 2 sets of phantom experiments and 100 sets of simulations on heterogeneous phantoms were carried out, each with two close cylindrical targets with different yield and lifetime values at different depths. In the phantom experiments, to obtain different yield and lifetime values, the two targets, with a radius of 2.5 mm and a height of 10 mm, are filled with 10 μM ICG/dimethylsulphoxide (DMSO) and 2 μM ICG/alcohol (ACH), respectively. According to our measurements, the inverse lifetime value of targets filled with 10 μM ICG/DMSO is 0.95 ns⁻¹, while that of 2 μM ICG/ACH is 1.45 ns⁻¹, and the ratio of the fluorescence yield of targets with 10 μM ICG/DMSO to that of targets with 2 μM ICG/ACH is 1:0.55. In phantom experiment 1, target 1 is at a depth of 10 mm, with a yield value of 0.55 and an inverse lifetime value of 1.45 ns⁻¹, and target 2 is at a depth of 20 mm, with a yield value of 1 and an inverse lifetime value of 0.95 ns⁻¹. The horizontal distance between the two targets is 10 mm, leading to an edge-to-edge distance of 9.1 mm. In phantom experiment 2, target 1 is at a depth of 10 mm, with a yield value of 1 and an inverse lifetime value of 0.95 ns⁻¹, and target 2 is at a depth of 20 mm, with a yield value of 0.55 and an inverse lifetime value of 1.45 ns⁻¹. The horizontal distance is set to 20 mm. The deep target in phantom experiment 2 has a much weaker yield value and a much higher inverse lifetime value than the shallow target, which results in a non-negligible deviation in the lifetime distribution estimation of the deep target [8] and makes reconstruction more difficult. Furthermore, the 100 sets of simulations were generated specifically to quantify the reconstruction performance. In the simulations, 2 non-uniform blocks with a size of 10×10×20 mm³, μ_a ∼ N(0.004 mm⁻¹, 0.001 mm⁻¹), and μ_s’ ∼ U(0.9 mm⁻¹, 1 mm⁻¹) were randomly placed in the phantom. The shallow targets were located at depth ∼ U(5 mm, 15 mm), while the locations of the deep targets depend on those of the shallow targets. The horizontal distance between the targets follows U(0, 10 mm), and the depth distance follows U(5 mm, 12 mm). The targets have radii ∼ U(2.5 mm, 3.5 mm), yield values ∼ U(0.5, 2), and inverse lifetime values ∼ U(0.7 ns⁻¹, 1.9 ns⁻¹). The two targets in the simulations on heterogeneous phantoms are therefore close to each other and at different depths, which makes them difficult to reconstruct accurately. The same multiplicative noise, additive noise, and time deviation were added to the data. Then the fluorescence yield and lifetime distributions of all the phantom experiments and simulations were reconstructed by both the En-De-YL network and the previously proposed PrSP-TK-D/L1WSR method [8].
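The edge-to-edge distance quoted for phantom experiment 1 follows from the target geometry in the excitation plane. A small sketch, in which the function name `edge_to_edge` is an assumption and the targets are treated as circles of radius 2.5 mm in the transverse section:

```python
import math


def edge_to_edge(horizontal_mm, depth1_mm, depth2_mm, r1_mm, r2_mm):
    """Gap between two circular target cross-sections in the excitation plane."""
    center_dist = math.hypot(horizontal_mm, depth2_mm - depth1_mm)
    return center_dist - r1_mm - r2_mm


# Phantom experiment 1: 10 mm horizontal offset, depths of 10 mm and 20 mm.
gap_exp1 = edge_to_edge(10.0, 10.0, 20.0, 2.5, 2.5)
# Phantom experiment 2: same depths, 20 mm horizontal offset.
gap_exp2 = edge_to_edge(20.0, 10.0, 20.0, 2.5, 2.5)
```

For experiment 1 the center distance is √(10² + 10²) ≈ 14.1 mm, which gives the 9.1 mm gap stated above. Under the same geometric assumption, the 20 mm horizontal offset of experiment 2 gives a gap of about 17.4 mm.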

The quantification results of the simulations on heterogeneous phantoms are shown in Fig. 4 and Table 2. As shown in Fig. 4(a), at all target center depths, the IoUs of targets reconstructed by En-De-YL are higher than those by PrSP-TK-D. Table 2 shows more directly that the IoUs by En-De-YL are nearly twice as high as those by PrSP-TK-D, and more than four times as high for the deep targets, which is also shown in Fig. 4(d). Moreover, the positioning errors (PEs) of the target centers in Table 2 show that En-De-YL also performs better, especially for deep targets. As shown in Fig. 4(b), En-De-YL achieves smaller positioning errors at most target center depths, except at about 9-mm depth. The IoUs and PEs indicate that En-De-YL achieves better overall target positioning accuracy, with more accurate target center locations and target shapes. Most importantly, its performance on the deep targets is good and barely deteriorates compared with its performance on the shallow targets. It can be concluded that higher spatial resolution is achieved by En-De-YL. Furthermore, the Pearson correlation coefficient (PPC) of the yield distribution in Table 2 shows that the overall image quality of the yield distribution reconstructed by En-De-YL is also better. Note that, because of the noise added to the data, the deep target is missed in 1% of the yield distributions reconstructed by En-De-YL, while there are artifacts in 15% of the yield distributions reconstructed by PrSP-TK-D. The missed targets and artifacts are not included in the IoUs, PEs, and EILEs. In addition to target position, the artifacts also contribute to the PPC difference between PrSP-TK-D and En-De-YL. For the reconstruction performance of the lifetime distribution, as shown in Table 2 and Fig. 4(c), the EILEs of En-De-YL are also significantly smaller than those of the previously proposed L1WSR method. The En-De-YL method excels in accurate reconstruction of lifetime values at all target center depths, especially for deep targets. The EILEs by En-De-YL are mostly lower than 20%, as shown in Fig. 4(c), which is valuable because accurate absolute lifetime values can provide precise quantification of biological tissue properties such as pH and temperature. The simulations on heterogeneous phantoms show that, although the accuracy of the relative yield values by En-De-YL is similar to that of the previously proposed method, the En-De-YL network significantly increases the accuracy of the reconstructed target positions, especially for close targets, demonstrating its improvement in spatial resolution, and improves the reconstruction of the lifetime distribution with more accurate quantification of the absolute lifetime value. En-De-YL largely increases the target positioning accuracy and the lifetime quantification accuracy of deep targets. This enhancement is valuable because accurate target locations and accurate lifetime values can provide abundant important biological information.

Fig. 4. The reconstruction performance of the En-De-YL and PrSP-TK-D/L1WSR methods on the targets with different center depths in simulations on heterogeneous phantoms. (a)-(c) IoUs, PEs, and EILEs of all the targets. (d)-(f) IoUs, PEs, and EILEs of the deep targets.

Table 2. Quantification analysis of the overall reconstruction performance of En-De-YL and PrSP-TK-D/L1WSR in simulations on heterogeneous phantoms

For the phantom experiments, although the yield and lifetime distributions are successfully reconstructed by the previously proposed PrSP-TK-D and L1WSR methods, the En-De-YL network further enhances the reconstruction performance, which is in agreement with the results of the simulations on heterogeneous phantoms. In phantom experiment 1, as shown in Fig. 5(a) and (b), the target positions are more accurately reconstructed by En-De-YL, especially for target 2, which is deeper. The PE and IoU of target 2 are 2.4 mm and 0.30 for En-De-YL, compared with 4.4 mm and 0.11 for PrSP-TK-D. In addition, as shown in Fig. 5(c) and (d), the quantification accuracy of the reconstructed inverse lifetime value of the deep target is also clearly improved by En-De-YL, with an EILE of 2.1%, while the EILE by L1WSR is 15.8%. In phantom experiment 2, as shown in Fig. 6, the target positioning and the quantification accuracy of absolute lifetime values are also significantly improved by En-De-YL. Because of the deviation in the estimation of the lifetime distribution [8], the yield distribution reconstructed by PrSP-TK-D is not as good as expected, with PEs of targets 1 and 2 of 3.0 mm and 4.4 mm, respectively. In the distribution reconstructed by En-De-YL, the PEs of targets 1 and 2 are reduced to 0.1 mm and 1.3 mm, respectively. Furthermore, for the quantification accuracy of lifetime, the En-De-YL method achieves 1.1% EILE for the shallow target and 4.1% EILE for the deep target, a clear improvement over the results of the previously proposed method. Although it was trained without phantom data, the generalization ability of En-De-YL ensures its good performance in the phantom experiments, with accurate target positioning, accurate reconstructed relative yield values, and accurate reconstructed absolute inverse lifetime values. En-De-YL can thus be applied to the reconstruction of phantom experiments and improve the reconstruction performance.

Fig. 5. The section images of the reconstructed distributions at the excitation height in phantom experiment 1. (a) Yield distribution and (c) inverse lifetime distribution reconstructed by En-De-YL. (b) Yield distribution and (d) inverse lifetime distribution reconstructed by PrSP-TK-D/L1WSR. The yellow and purple circles denote the true positions of the targets.

Fig. 6. The section images of the reconstructed distributions at the excitation height in phantom experiment 2. (a) Yield distribution and (c) inverse lifetime distribution reconstructed by En-De-YL. (b) Yield distribution and (d) inverse lifetime distribution reconstructed by PrSP-TK-D/L1WSR. The yellow and purple circles denote the true positions of the targets.

4. Conclusion

In conclusion, the En-De-YL network is capable of reconstructing the fluorescence yield and lifetime distributions simultaneously and enhances the reconstruction performance in TD-rFMT. For TD-rFMT reconstruction, En-De-YL not only clearly increases the target positioning accuracy and the spatial resolution, but also improves the accuracy of the reconstructed lifetime of the targets. The improvements in the deep region are especially significant. The joint supervised and self-supervised customized loss function and the noise added to the training data improve the generalization ability of En-De-YL, which underlies its improved reconstruction performance in the phantom experiments. In addition, the En-De-YL network has fewer layers and fewer parameters than conventional deep learning reconstruction methods in FMT, and hence can be trained at lower time cost. On the GPU server (2 × Nvidia RTX A6000 GPUs, 48 GB each), En-De-YL can be trained within 2 hours, and reconstructing a batch of 64 samples takes about 80 ms. The reconstruction speed is clearly improved by En-De-YL, but this is not demonstrated in detail because fast reconstruction is a well-known advantage of deep learning methods. These enhancements suggest that the En-De-YL network may meet the requirements of animal experiments and clinical use. To the best of our knowledge, En-De-YL is the first deep learning network developed for simultaneous reconstruction of fluorescence yield and lifetime distributions; reconstructing the lifetime distribution substantially increases the training difficulty, which has been addressed in this paper.
En-De-YL with TD-rFMT provides a high-resolution yield distribution and an accurate absolute lifetime distribution, together with a reflective geometry that circumvents the penetration limit and simplifies the scanning procedure, and it may become a vital tool in biological research, surgical navigation and other applications. However, as mentioned above, because of the GPU memory limitation, only a limited number of TPSF time points are used in the self-supervised training. Although this limitation has been dealt with in this paper, the reconstruction accuracy of absolute lifetime may be further improved if more time points can be used in the self-supervised training. Moreover, the accuracy of relative yield values could also be further improved. In addition, for simplicity, all target geometries in this paper are cylindrical. With more complex target geometries and system upgrades such as k-space illumination [22], the spatial resolution in the horizontal direction may be further improved. Most importantly, in vivo experiments should be carried out for further validation.
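The idea of combining a supervised term with a self-supervised term evaluated on only a subset of TPSF time points, together with noise added to the training data, can be sketched generically as follows. This is a toy illustration under strong assumptions and is not the loss function of this paper, which is defined in Section 2.2: the forward model here is a hypothetical linear map, both terms are plain mean squared errors, and the names `joint_loss`, `add_noise`, `lam` and `time_idx` are invented for the example.

```python
import random

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def forward(rows, x):
    """Toy linear forward model (an assumption): each row maps the
    distribution x to one simulated TPSF sample."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]

def joint_loss(pred, truth, rows, measured, time_idx, lam=1.0):
    """Supervised MSE plus a self-supervised data-consistency term that is
    evaluated only on the time points listed in time_idx."""
    supervised = mse(pred, truth)
    simulated = forward([rows[t] for t in time_idx], pred)
    consistency = mse(simulated, [measured[t] for t in time_idx])
    return supervised + lam * consistency

def add_noise(signal, rel_sigma, rng):
    """Add zero-mean Gaussian noise scaled to the signal peak."""
    peak = max(abs(s) for s in signal)
    return [s + rng.gauss(0.0, rel_sigma * peak) for s in signal]

# Toy data: 3 unknowns, 4 time points, of which only 2 enter the
# self-supervised term (mimicking a memory-limited subset).
truth = [0.0, 1.0, 0.5]
rows = [[1.0, 0.5, 0.2], [0.8, 0.6, 0.3], [0.5, 0.5, 0.4], [0.2, 0.3, 0.5]]
measured = forward(rows, truth)
subset = [0, 2]

perfect = joint_loss(truth, truth, rows, measured, subset)
worse = joint_loss([0.2, 0.7, 0.5], truth, rows, measured, subset)
noisy = add_noise(measured, 0.05, random.Random(0))
```

Using more entries in `subset` makes the consistency term constrain the prediction at more time points, at the cost of evaluating more rows of the forward model per training step, which is the trade-off with GPU memory described above.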

Funding

National Natural Science Foundation of China (61871022, 61871251, 62027901).

Acknowledgment

We would like to acknowledge the instrumental and technical support of the Multi-modal Biomedical Imaging Experimental Platform, Institute of Automation, Chinese Academy of Sciences.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. V. Ntziachristos, E. A. Schellenberger, J. Ripoll, D. Yessayan, E. Graves, A. Bogdanov, L. Josephson, and R. Weissleder, “Visualization of antitumor treatment by means of fluorescence molecular tomography with an annexin V-Cy5.5 conjugate,” Proc. Natl. Acad. Sci. U. S. A. 101(33), 12294–12299 (2004).

2. K. Zhou, Y. Ding, I. Vuletic, Y. Tian, J. Li, J. Liu, Y. Huang, H. Sun, C. Li, Q. Ren, and Y. Lu, “In vivo long-term investigation of tumor bearing mKate2 by an in-house fluorescence molecular imaging system,” Biomed Eng Online 17(1), 187 (2018).

3. J. A. Benitez, C. Zanca, J. Ma, W. K. Cavenee, and F. B. Furnari, “Fluorescence molecular tomography for in vivo imaging of glioblastoma xenografts,” J. Vis. Exp. (2018).

4. F. Stellari, A. Sala, F. Ruscitti, C. Carnini, P. Mirandola, M. Vitale, M. Civelli, and G. Villetti, “Monitoring inflammation and airway remodeling by fluorescence molecular tomography in a chronic asthma model,” J. Transl. Med. 13(1), 336 (2015).

5. L. Li, Y. Du, X. J. Chen, and J. Tian, “Fluorescence molecular imaging and tomography of matrix metalloproteinase-activatable near-infrared fluorescence probe and image-guided orthotopic glioma resection,” Mol Imaging Biol 20(6), 930–939 (2018).

6. X. Wang, B. Zhang, X. Cao, F. Liu, S. Liu, B. Shan, and J. Bai, “In vivo validation of dual-modality system for simultaneous positron emission tomography and optical tomographic imaging,” J Innov Opt Heal Sci 04(02), 165–171 (2011).

7. P. Mohajerani, M. Koch, K. Thuermel, B. Haller, E. J. Rummeny, V. Ntziachristos, and R. Meier, “Fluorescence-aided tomographic imaging of synovitis in the human finger,” Radiology 272(3), 865–874 (2014).

8. J. Cheng, P. Zhang, C. Cai, Y. Gao, J. Liu, H. Hui, J. Tian, and J. Luo, “Depth-recognizable time-domain fluorescence molecular tomography in reflective geometry,” Biomed. Opt. Express 12(7), 3806–3818 (2021).

9. R. Endoh, M. Fujii, and K. Nakayama, “Depth-adaptive regularized reconstruction for reflection diffuse optical tomography,” Opt. Rev. 15(1), 51–56 (2008).

10. M. Y. Berezin, K. Guo, W. Akers, R. E. Northdurft, J. P. Culver, B. Teng, O. Vasalatiy, K. Barbacow, A. Gandjbakhche, G. L. Griffiths, and S. Achilefu, “Near-Infrared Fluorescence Lifetime pH-Sensitive Probes,” Biophys. J. 100(8), 2063–2072 (2011).

11. M. Y. Berezin and S. Achilefu, “Fluorescence Lifetime Measurements and Biological Imaging,” Chem. Rev. 110(5), 2641–2684 (2010).

12. P. R. Kommidi and B. R. Reddy, “Fluorescence lifetime sensing of temperature,” in Advanced Environmental, Chemical, and Biological Sensing Technologies IV, T. VoDinh, R. A. Lieberman, and G. Gauglitz, eds. (Conference on Advanced Environmental, Chemical, and Biological Sensing Technologies IV, 2006).

13. Y. Gao, K. Wang, Y. An, S. X. Jiang, H. Meng, and J. Tian, “Non model-based bioluminescence tomography using a machine-learning reconstruction strategy,” Optica 5(11), 1451–1454 (2018).

14. C. J. Cai, K. X. Deng, C. Ma, and J. W. Luo, “End-to-end deep neural network for optical inversion in quantitative photoacoustic imaging,” Opt. Lett. 43(12), 2752–2755 (2018).

15. C. Huang, H. Meng, Y. Gao, S. Jiang, K. Wang, and J. Tian, “Fast and robust reconstruction method for fluorescence molecular tomography based on deep neural network,” Proc. SPIE 10881, 108811K (2019).

16. D. Li, C. Chen, J. Li, and Q. Yan, “Reconstruction of fluorescence molecular tomography based on graph convolution networks,” J. Opt. 22(4), 045602 (2020).

17. L. Guo, F. Liu, C. Cai, J. Liu, and G. Zhang, “3D deep encoder-decoder network for fluorescence molecular tomography,” Opt. Lett. 44(8), 1892–1895 (2019).

18. P. Zhang, G. Fan, T. Xing, F. Song, and G. Zhang, “UHR-DeepFMT: Ultra-high spatial resolution reconstruction of fluorescence molecular tomography based on 3-D fusion dual-sampling deep neural network,” IEEE T Med Imaging 40(11), 3217–3228 (2021).

19. B. Zhang, X. Cao, F. Liu, X. Liu, X. Wang, and J. Bai, “Early-photon fluorescence tomography of a heterogeneous mouse model with the telegraph equation,” Appl. Opt. 50(28), 5397–5407 (2011).

20. J. Cheng, C. Cai, and J. Luo, “Reconstruction of high-resolution early-photon tomography based on the first derivative of temporal point spread function,” J. Biomed. Opt. 23(6), 1 (2018).

21. J. Zhang, Q. He, Y. Xiao, H. Zheng, C. Wang, and J. Luo, “A general framework for inverse problem solving using self-supervised deep learning: validations in ultrasound and photoacoustic image reconstruction,” 2021 IEEE International Ultrasonics Symposium (IUS), 4 (2021).

22. N. I. Nizam, M. Ochoa, J. T. Smith, and X. Intes, “3D k-space reflectance fluorescence tomography via deep learning,” Opt. Lett. 47(6), 1533–1536 (2022).

Method	Targets	PCC	IoU	PE (mm)	EIFE	RVE
En-De-YL	All targets	0.75 ± 0.08	0.56 ± 0.14	1.66 ± 0.67	9.5 ± 9.6%	38 ± 24%
En-De-YL	Deep targets	0.75 ± 0.08	0.49 ± 0.15	1.86 ± 0.83	13 ± 12%	38 ± 24%
PrSP-TK-D/L1WSR	All targets	0.35 ± 0.17	0.30 ± 0.24	3.14 ± 2.60	19 ± 18%	39 ± 22%
PrSP-TK-D/L1WSR	Deep targets	0.35 ± 0.17	0.11 ± 0.13	5.04 ± 2.37	29 ± 20%	39 ± 22%

Biomedical Optics Express