Experimental Bioluminescence Tomography with Fully Parallel Radiative-transfer-based Reconstruction Framework

Yujie Lu; Hidevaldo B. Machado; Ali Douraghy; David Stout; Harvey Herschman; Arion F. Chatziioannou

doi:10.1364/OE.17.016681

1. Introduction

Bioluminescence Imaging (BLI) has become an increasingly important tool for in vivo preclinical research [1][2]. Currently, planar (two-dimensional) BLI is the conventional imaging modality used in optical imaging. Optical photons emitted from bioluminescence sources in the small animal body (usually mice) propagate in biological tissues and are detected once emitted from the subject surface. Since in vivo biological tissues have high absorption and scattering characteristics, planar BLI only indirectly reflects the activity of the targeted biological object via the surface photon distribution [3]. Optical signals are easily confounded by the tissue properties in in vivo mouse experiments, especially when monitoring the initiation and progression of tumors over time within the same subject. The aim of bioluminescence tomography (BLT) is to quantitatively acquire 3D information about bioluminescence sources, significantly improving the information quality of bioluminescence imaging.

Monte Carlo (MC) methods used with statistical models can obtain very precise simulations by tracking photon propagation in biological tissues [4]. However, these methods are severely time-consuming. Solving the radiative transfer equation (RTE) (i.e. Boltzmann equation) can also obtain precise simulation results as there is no Poisson noise in the simulation data [5]. Currently, nearly all 3D BLT reconstruction algorithms are based on the diffusion equation, which is a simple approximation to the RTE [6][7] [8][9]. Simulation and experimental reconstructions have shown that the diffusion approximation (DA) introduces significant artifacts in BLT reconstruction [3][10]. Therefore, it is necessary to develop high-order approximation-based reconstruction methods to improve BLT reconstruction. The first- and second-order formulations of the RTE are usually used to directly solve the RTE [5]. Discrete ordinates (S_N) and spherical harmonics (P_N) methods, as two usual numerical approximations, can yield simulation solutions based on two types of formulations. Compared with first-order formulations, the operators acting on the second-order forms such as the even-odd parity (EOP) equations are self-adjoint [5]. This distinct advantage provides a straightforward application of the finite element methods (FEM) easily executed on complex heterogeneous geometries [11]. Furthermore, the acquired FEM matrix in the second-order equations is sparse positive-definite (SPD), which yields better numerical stability and efficiency and benefits the development of reconstruction algorithms [12]. In order to generate an accurate simulation model, and regardless of the first- and second-order equations, one has to set N as large as possible and then N(N+2) and (N+1)² coupled equations corresponding to S_N and P_N methods need to be solved. 
This computational complexity, especially for the whole-body of small animals, creates a substantial challenge in the development of novel BLT reconstruction algorithms. Recently, a novel type of second-order approximation form, the simplified spherical harmonics (SP_N) method, has been developed for optical imaging [13], improving computational efficiency. Furthermore, a fully parallel adaptive FEM method was proposed to improve the simulation speed [14]. However, to obtain more accurate BLT reconstructions, a novel BLT reconstruction framework needs to be developed with the radiative transfer-based approximations to the RTE.
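To make the growth in problem size concrete, the short sketch below tabulates the number of coupled equations quoted above for the S_N and P_N methods, next to the (N+1)/2 equations of the SP_N method given in Section 2.1. The function name is hypothetical and the orders are restricted to odd N, as assumed for SP_N; it is only an illustration of the formulas, not part of the reconstruction framework.

```python
def coupled_equation_counts(n_order):
    """Number of coupled equations for S_N, P_N and SP_N at an odd order N."""
    if n_order < 1 or n_order % 2 == 0:
        # SP_N is defined for odd N in the text, so the sketch assumes odd N.
        raise ValueError("N is assumed to be an odd positive integer")
    return {
        "S_N": n_order * (n_order + 2),   # N(N+2)
        "P_N": (n_order + 1) ** 2,        # (N+1)^2
        "SP_N": (n_order + 1) // 2,       # (N+1)/2
    }


for n in (1, 3, 5, 7):
    counts = coupled_equation_counts(n)
    print(f"N={n}: S_N={counts['S_N']}, P_N={counts['P_N']}, SP_N={counts['SP_N']}")
```

At N = 7 the full S_N and P_N systems involve several dozen coupled equations, whereas SP₇ involves only four, which is the efficiency argument made in the paragraph above.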

BLT is an inverse source problem and in the general case, its solution is not unique [15]. A priori information plays an indispensable role in BLT reconstruction. Among the various types of a priori information, multispectral measurements are important for achieving BLT reconstructions [16][17][18][19]. However, spectrally-resolved data sets can significantly increase the computational burden, especially when non-contact measurements are made using highly sensitive CCD cameras which acquire detailed surface photon distribution. In addition, due to the curved surface topography of the mouse and the necessity of using the heterogeneous characteristics of mouse tissues, numerical reconstruction algorithms, such as those based on FEM, are more suitable compared to analytical and statistical modeling-based methods [4].

BLT reconstruction is, in principle, similar to that of single photon emission computed tomography (SPECT) and positron emission tomography (PET) imaging. Therefore, reconstruction algorithms appropriate for PET and SPECT can be introduced to realize the BLT reconstruction [17]. In this case, the system response P-matrix needs to be computed, which is a very time-consuming step, although it can be obtained prior to acquiring the measured data. The BLT reconstruction is sensitive to various factors. Precalculating the P-matrix can affect the reconstruction quality due to the use of different heterogeneous geometries between the calculation and the experiment. Diffuse optical tomography (DOT) has been investigated for several decades and its reconstruction algorithms are easily applied to BLT. In this case, the Jacobian matrix needs to be calculated for each iteration, which is time-consuming [7]. Since the BLT problem is linear, the least-square problem can be solved to realize the BLT reconstruction by establishing the linear relationship between the unknown source variable and the measured surface data [6]. Until now, this method has not been extended to the radiative-transfer-based model. In addition, matrix inversion needs to be performed when using this strategy. Additional investigations should be performed to improve the reconstruction speed.

In this paper, a radiative-transfer-based fully parallel BLT reconstruction framework is developed using simplified spherical harmonics (SP_N) equations. This framework uses finite element methods to process complex reconstruction domain geometries. The linear relationship between the unknown source and the spectrally-resolved measured data using the SP_N approximation is established to achieve BLT reconstruction. To improve the reconstruction speed and enable BLT reconstruction for the whole body of the mouse, the finite element-based matrices are stored and operated in a parallel distribution mode. Furthermore, for the time-consuming problems of the key steps in the reconstruction, corresponding improvements are also performed, which significantly accelerate the reconstruction. Timing analysis demonstrates the improved performance of the proposed framework. Experimental reconstructions using mouse-shaped phantoms and real mice show the potential of this framework for practical BLT applications. The next section introduces the proposed fully parallel framework using the SP_N approximation. In the third section, the performance tests and analysis are described and experimental BLT reconstructions also are demonstrated. Finally, we discuss relevant issues.

2. Methods

2.1. Radiative transfer equation and SPN approximation

The radiative transfer equation (RTE) is an approximation to Maxwell’s equations. In bioluminescence imaging, the source intensity is generally assumed to be time invariant during the data acquisition. In addition, the photons at different wavelengths are considered to be independent, therefore we get

\hat{s} \cdot \nabla ψ (r, \hat{s}, λ) + (μ_{s} (r, λ) + μ_{a} (r, λ)) ψ (r, \hat{s}, λ) = μ_{s} (r, λ) \int_{4 π} p (\hat{s}, \hat{s}') ψ (r, \hat{s}', λ) d \hat{s}' + S (r, \hat{s}, λ)

where ψ(r, ŝ, λ) denotes photons in the unit volume traveling from point r in direction ŝ. Based on the principle of energy conservation, the RTE states that the radiance ψ(r, ŝ, λ) is equal to the sum of all factors affecting it (including absorption µ_a(r, λ), scattering µ_s(r, λ), and source energy S(r, ŝ, λ)) when light photons cross a unit volume [20]. p(ŝ, ŝ′) is the scattering phase function and gives the probability of a photon scattering anisotropically from direction ŝ′ to direction ŝ. Generally, the Henyey-Greenstein (HG) phase function is used to characterize this probability [21]:

p (\cos θ) = \frac{1 - g^{2}}{4 π {(1 + g^{2} - 2 g \cos θ)}^{3 ⁄ 2}}

where g is the anisotropy parameter and θ is the scattering angle, with cos θ = ŝ·ŝ′ when we assume that the scattering probability depends only on the angle between the incoming and scattered directions. When photons reach the body surface of a mouse, that is, r ∈ ∂Ω, some of them are reflected and cannot escape from the mouse body Ω because of the mismatch between the refractive indices n_b of Ω and n_m of the external medium. When the incidence angle θ_b from the mouse body is not larger than the critical angle θ_c (θ_c = arcsin(n_m/n_b), based on Snell’s law), the reflectivity R(cos θ_b) is given by [22]:

R (\cos θ_{b}) = \frac{1}{2} [\frac{\sin^{2} (θ_{b} - θ_{m})}{\sin^{2} (θ_{b} + θ_{m})} + \frac{\tan^{2} (θ_{b} - θ_{m})}{\tan^{2} (θ_{b} + θ_{m})}]

where θ_m is the transmission angle. Furthermore, we can get the exiting partial current J+(r) at each boundary point r [13]:

J^{+} (r, λ) = \int_{\hat{s} \cdot v > 0} [1 - R (\hat{s} \cdot v)] (\hat{s} \cdot v) ψ (r, \hat{s}, λ) d \hat{s}

where v is the unit outer normal vector. After a series of deductions with the P_N method, the SP_N approximation is obtained [13]

- (\frac{n + 1}{2 n + 1}) \nabla \cdot \frac{1}{μ_{a, n + 1} (λ)} \nabla ((\frac{n + 2}{2 n + 3}) ϕ_{n + 2} (λ) + (\frac{n + 1}{2 n + 3}) ϕ_{n} (λ))

where µ_a,n(λ)=µ_a(λ)+µ_s(λ)(1-gⁿ); when ψ(λ) is expanded by the P_N approximation, ϕ_n(λ) are the Legendre moments of ψ(λ) (2≤n≤N, N is an odd positive integer). Although the SP_N solution is asymptotic and cannot converge to an exact radiative transfer solution with the increase of N, the simulation results have shown good agreement between the SP₇ approximation and Monte Carlo methods [14]. Through some further deductions, (N +1)/2 boundary conditions can be obtained corresponding to (N+1)/2 Eqs. (5). These boundary conditions are mixed and consist of linear combinations of the even-order ϕ_n and their first derivatives. With respect to the composite moments φ_n of ϕ_n, which are

φ_{1} = ϕ_{0} + 2 ϕ_{2},

we can get the general equations of the SP_N approximation and its boundary conditions when practical measurements are performed at the wavelength λ_k using a bandpass filter:

- \nabla \cdot 𝓒_{i, \nabla φ_{i}} (λ_{k}) \nabla φ_{i} (λ_{k}) + \sum_{j = 1}^{(N + 1) ⁄ 2} 𝓒_{i, φ_{j}} (λ_{k}) φ_{j} (λ_{k}) = 𝓒_{i, S} (λ_{k}) S_{i} (λ_{k})

\sum_{j = 1}^{(N + 1) ⁄ 2} 𝓒_{i, \nabla φ_{j}}^{b} (λ_{k}) v \cdot \nabla φ_{j} (λ_{k}) = \sum_{j = 1}^{(N + 1) ⁄ 2} 𝓒_{i, φ_{j}}^{b} (λ_{k}) φ_{j} (λ_{k}), \quad i \in [1, (N + 1) ⁄ 2]

where $𝓒_{i, \nabla φ_{i}} (λ_{k})$, $𝓒_{i, φ_{j}} (λ_{k})$, $𝓒_{i, \nabla φ_{j}}^{b} (λ_{k})$, and $𝓒_{i, φ_{j}}^{b} (λ_{k})$ can be calculated. The details and the above coefficients of the SP₁ to SP₇ approximations can be found in [13].
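As a numerical illustration of two ingredients of this section, the following minimal sketch evaluates the Henyey-Greenstein phase function and the reflectivity R(cos θ_b) defined above. It checks that the phase function integrates to one over the unit sphere and that its mean cosine equals g. The refractive indices n_b = 1.37 and n_m = 1.0 are assumed example values, not values taken from this paper, and all function names are hypothetical.

```python
import math


def hg_phase(cos_theta, g):
    """Henyey-Greenstein phase function p(cos θ), normalized over the unit sphere."""
    return (1.0 - g * g) / (4.0 * math.pi * (1.0 + g * g - 2.0 * g * cos_theta) ** 1.5)


def integrate_over_sphere(weight, g, n=100_000):
    """Midpoint rule for 2π ∫_{-1}^{1} weight(μ) p(μ) dμ, with μ = cos θ."""
    h = 2.0 / n
    total = 0.0
    for k in range(n):
        mu = -1.0 + (k + 0.5) * h
        total += weight(mu) * hg_phase(mu, g)
    return 2.0 * math.pi * h * total


def fresnel_reflectivity(cos_b, n_b, n_m):
    """Unpolarized reflectivity R(cos θ_b) for 0 <= cos θ_b <= 1 (photons leaving the body)."""
    sin_b = math.sqrt(max(0.0, 1.0 - cos_b * cos_b))
    sin_m = n_b * sin_b / n_m                 # Snell's law: n_b sin θ_b = n_m sin θ_m
    if sin_m >= 1.0:
        return 1.0                            # at or beyond the critical angle
    if sin_b == 0.0:
        return ((n_b - n_m) / (n_b + n_m)) ** 2   # normal-incidence limit of the formula
    theta_b = math.asin(sin_b)
    theta_m = math.asin(sin_m)
    s = math.sin(theta_b - theta_m) ** 2 / math.sin(theta_b + theta_m) ** 2
    t = math.tan(theta_b - theta_m) ** 2 / math.tan(theta_b + theta_m) ** 2
    return 0.5 * (s + t)


g = 0.9                       # anisotropy parameter
n_b, n_m = 1.37, 1.0          # assumed refractive indices (tissue, air)
print("normalization:", integrate_over_sphere(lambda mu: 1.0, g))
print("mean cosine:  ", integrate_over_sphere(lambda mu: mu, g))
print("critical angle (deg):", math.degrees(math.asin(n_m / n_b)))
print("R at normal incidence:", fresnel_reflectivity(1.0, n_b, n_m))
print("R at 60 degrees:      ", fresnel_reflectivity(math.cos(math.radians(60.0)), n_b, n_m))
```

With these assumed indices the critical angle is below 60 degrees, so the last call returns total internal reflection; this is the mechanism that keeps part of the light inside the body Ω.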

2.2. Fully parallel reconstruction algorithm

Fig. 1. The flowchart of the proposed fully parallel framework.

Based on the finite element analysis, we can get a general weak formulation for the SP_N approximation [23]:

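\int_{Ω} (𝓒_{i, \nabla φ_{i}} (λ_{k}) \nabla φ_{i} (λ_{k}) \cdot \nabla υ + \sum_{j = 1}^{(N + 1) ⁄ 2} 𝓒_{i, φ_{j}} (λ_{k}) φ_{j} (λ_{k}) υ) d Ω - \int_{\partial Ω} 𝓒_{i, \nabla φ_{i}} (λ_{k}) (v \cdot \nabla φ_{i} (λ_{k})) υ d (\partial Ω) = \int_{Ω} 𝓒_{i, S} (λ_{k}) S_{i} (λ_{k}) υ d Ω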

To avoid processing v·∇φ_i directly in the boundary integration, we treat v·∇φ_i as unknown variables in the boundary equations (Eq. (7b)). We obtain $f_{v \cdot φ_{i}} (\cdot)$, which expresses these variables in terms of the φ_j, by solving this set of first-order equations. The boundary conditions can be easily processed using this method.

Figure 1 shows the flowchart of the proposed fully-parallel framework. Fully-parallel reconstruction means that almost all of the steps in the reconstruction framework work in parallel mode. After the reconstruction domain Ω is discretized into the volumetric mesh 𝓣, the next step is to partition this mesh into N_c mesh subdomains 𝓣_c (1≤c≤N_c), where N_c is the number of CPUs used. Regarding the finite element implementation, the space of linear finite elements 𝓥 is introduced on 𝓣. φ_i(λ_k) and S_i(λ_k) are approximated as:

φ_{i} (r, λ_{k}) \approx \sum_{p = 1}^{N_{𝒫}} φ_{i, p} (λ_{k}) υ_{p} (r)

S_{i} (r, λ_{k}) \approx \sum_{p = 1}^{N_{𝒫}} S_{i, p} (λ_{k}) υ_{p} (r)

where φ_i,p(λ_k) and S_i,p(λ_k) are the discretized values at a discretized point p when using the basis function υ_p(r); N_𝓟 is the total number of discretized points over the entire domain. Considering Eqs. (8), (9a), and (9b), for a volumetric element τ_e, we have

[\begin{array}{cccc} m_{1 φ_{1}} (λ_{k}) & m_{1 φ_{2}} (λ_{k}) & \dots & m_{1 φ_{(N + 1) ⁄ 2}} (λ_{k}) \\ m_{2 φ_{1}} (λ_{k}) & m_{2 φ_{2}} (λ_{k}) & \dots & m_{2 φ_{(N + 1) ⁄ 2}} (λ_{k}) \\ ⋮ & ⋮ & ⋱ & ⋮ \\ m_{(N + 1) ⁄ 2 φ_{1}} (λ_{k}) & m_{(N + 1) ⁄ 2 φ_{2}} (λ_{k}) & \dots & m_{(N + 1) ⁄ 2 φ_{(N + 1) ⁄ 2}} (λ_{k}) \end{array}] [\begin{array}{c} φ_{1, τ_{e}} (λ_{k}) \\ φ_{2, τ_{e}} (λ_{k}) \\ ⋮ \\ φ_{(N + 1) ⁄ 2, τ_{e}} (λ_{k}) \end{array}] =

where

$m_{i φ_{j}} (λ_{k}) = \begin{cases} \int_{τ_{e}} (𝓒_{i, \nabla φ_{i}} (λ_{k}) \nabla υ_{p} \cdot \nabla υ_{q} + 𝓒_{i, φ_{i}} (λ_{k}) υ_{p} υ_{q}) d r - \int_{\partial τ_{e}} 𝓒_{i, \nabla φ_{i}} (λ_{k}) f_{v \cdot φ_{i}} (υ_{p}) υ_{q} d r & \text{if } i = j \\ \int_{τ_{e}} 𝓒_{i, φ_{j}} (λ_{k}) υ_{p} υ_{q} d r - \int_{\partial τ_{e}} 𝓒_{i, \nabla φ_{i}} (λ_{k}) f_{v \cdot φ_{i}} (υ_{p}) υ_{q} d r & \text{if } i \neq j \end{cases}$

and

$b_{i, φ_{i}} (λ_{k}) = \int_{τ_{e}} υ_{p} υ_{q} d r$

∂τ_e is the boundary element if τ_e is on the boundary and belongs to the respective subdomain in the parallel implementation. After assembling the submatrices on all the elements, we get

[\begin{array}{cccc} M_{1 φ_{1}}^{𝒯_{c}} (λ_{k}) & M_{1 φ_{2}}^{𝒯_{c}} (λ_{k}) & \dots & M_{1 φ_{(N + 1) ⁄ 2}}^{𝒯_{c}} (λ_{k}) \\ M_{2 φ_{1}}^{𝒯_{c}} (λ_{k}) & M_{2 φ_{2}}^{𝒯_{c}} (λ_{k}) & \dots & M_{2 φ_{(N + 1) ⁄ 2}}^{𝒯_{c}} (λ_{k}) \\ ⋮ & ⋮ & ⋱ & ⋮ \\ M_{(N + 1) ⁄ 2 φ_{1}}^{𝒯_{c}} (λ_{k}) & M_{(N + 1) ⁄ 2 φ_{2}}^{𝒯_{c}} (λ_{k}) & \dots & M_{(N + 1) ⁄ 2 φ_{(N + 1) ⁄ 2}}^{𝒯_{c}} (λ_{k}) \end{array}] [\begin{array}{c} φ_{1}^{𝒯_{c}} (λ_{k}) \\ φ_{2}^{𝒯_{c}} (λ_{k}) \\ ⋮ \\ φ_{(N + 1) ⁄ 2}^{𝒯_{c}} (λ_{k}) \end{array}] = [\begin{array}{c} 𝓒_{1, S} B^{𝒯_{c}} \cdot S^{𝒯_{c}} (λ_{k}) \\ 𝓒_{2, S} B^{𝒯_{c}} \cdot S^{𝒯_{c}} (λ_{k}) \\ ⋮ \\ 𝓒_{(N + 1) ⁄ 2, S} B^{𝒯_{c}} \cdot S^{𝒯_{c}} (λ_{k}) \end{array}]

After inverting the entire matrix at the left side of Eq. (11), we have

[\begin{array}{c} φ_{1}^{𝒯_{c}} (λ_{k}) \\ φ_{2}^{𝒯_{c}} (λ_{k}) \\ ⋮ \\ φ_{(N + 1) ⁄ 2}^{𝒯_{c}} (λ_{k}) \end{array}] = [\begin{array}{c} \sum_{j = 1}^{(N + 1) ⁄ 2} 𝓒_{j, S} {IM}_{1 φ_{j}}^{𝒯_{c}} (λ_{k}) \cdot B^{𝒯_{c}} \cdot S^{𝒯_{c}} (λ_{k}) \\ \sum_{j = 1}^{(N + 1) ⁄ 2} 𝓒_{j, S} {IM}_{2 φ_{j}}^{𝒯_{c}} (λ_{k}) \cdot B^{𝒯_{c}} \cdot S^{𝒯_{c}} (λ_{k}) \\ ⋮ \\ \sum_{j = 1}^{(N + 1) ⁄ 2} 𝓒_{j, S} {IM}_{(N + 1) ⁄ 2 φ_{j}}^{𝒯_{c}} (λ_{k}) \cdot B^{𝒯_{c}} \cdot S^{𝒯_{c}} (λ_{k}) \end{array}]

where ${IM}_{{iφ}_{j}}^{𝒯_{c}} (λ_{k})$ are the submatrices of the entire inverse matrix IM(λ_k) corresponding to $M_{{iφ}_{j}}^{𝒯_{c}} (λ_{k})$. Note that the matrix inversion is calculated with respect to the entire matrix M(λ_k), although ${IM}_{{iφ}_{j}}^{𝒯_{c}} (λ_{k})$ are used to describe the subdomain submatrices. Because matrix inversion is always time-consuming, direct and iterative matrix inversion methods are compared to optimize the execution time; details can be found in Section 2.3. After removing from the matrices $\sum_{j = 1}^{(N + 1) ⁄ 2} 𝓒_{j, S} {IM}_{{iφ}_{j}}^{𝒯_{c}} (λ_{k}) \cdot B^{𝒯_{c}} \cdot S^{𝒯_{c}} (λ_{k})$ the rows that do not correspond to measurable boundary discretized points, we further manipulate Eq. (4) to get [13]

J^{𝒯_{c}, +, b} (λ_{k}) = \sum_{j = 1}^{(N + 1) ⁄ 2} β_{j} (λ_{k}) φ_{j}^{𝒯_{c}, b} (λ_{k}) = \sum_{j = 1}^{(N + 1) ⁄ 2} β_{j} (λ_{k}) G_{j}^{𝒯_{c}} (λ_{k}) S^{𝒯_{c}} (λ_{k})

where β_j(λ_k) can be calculated with respect to Eq. (4) when ψ(r, ŝ,λ) is expanded; $G_{j}^{𝒯_{c}} (λ_{k})$ are the corresponding matrices on subdomain 𝓣_c after removing the rows in Eqs. (12). When the surface optical data are collected at K wavelengths, we get

J^{𝒯_{c}, +, b} = 𝒜^{𝒯_{c}} S^{𝒯_{c}}

where

J^{𝒯_{c}, +, b} = [\begin{array}{c} J^{𝒯_{c}, +, b} (λ_{1}) \\ ⋮ \\ J^{𝒯_{c}, +, b} (λ_{k}) \\ ⋮ \\ J^{𝒯_{c}, +, b} (λ_{K}) \end{array}], 𝒜^{𝒯_{c}} = [\begin{array}{c} γ_{1} G^{𝒯_{c}} (λ_{1}) \\ ⋮ \\ γ_{k} G^{𝒯_{c}} (λ_{k}) \\ ⋮ \\ γ_{K} G^{𝒯_{c}} (λ_{K}) \end{array}]

$𝒜^{𝒯_{c}}$ is the relationship matrix between $J^{𝒯_{c}, +, b}$ and $S^{𝒯_{c}}$; γ_k is the percentage of the total energy at the wavelength λ_k. $𝒜^{𝒯_{c}}$ is usually ill-conditioned because of the ill-posedness of BLT. Because the measured surface data $J^{𝒯_{c}, +, m}$ corresponding to $J^{𝒯_{c}, +, b}$ contain noise, solving Eq. (14) directly will likely lead to reconstruction failure. By solving the bound-constrained least-squares problem

\min_{0 < S^{𝒯_{c}} < S^{\sup}} Θ (S^{𝒯_{c}}) : {\| 𝒜^{𝒯_{c}} S^{𝒯_{c}} - J^{𝒯_{c}, +, m} \|}^{2} + δ η (S^{𝒯_{c}})

we can generate the BLT reconstruction, where S^sup is the upper bound of the source density, δ is the regularization parameter, and η(·) is the penalty function. Since all the data in Eq. (16) are distributed over N_c CPUs, the optimization algorithm must also work in parallel mode.
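The last two steps of this section, stacking the per-wavelength matrices into the relationship matrix and solving the bound-constrained least-squares problem, can be sketched on a toy problem. The sketch below is serial, uses δ = 0, treats the bounds as the closed interval [0, S^sup], and uses a plain projected-gradient iteration rather than the parallel quasi-Newton solver adopted in this paper. The 2×2 matrices, the spectral weights and all function names are made-up illustration values, not data from the experiments.

```python
def stack_system(G_by_wavelength, gammas):
    """Stack the blocks γ_k G(λ_k) row-wise into one relationship matrix A."""
    return [[gamma * a for a in row]
            for G, gamma in zip(G_by_wavelength, gammas)
            for row in G]


def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def projected_gradient(A, J, s_sup, n_iter=2000):
    """Minimize ||A S - J||^2 subject to 0 <= S <= s_sup by projected gradient."""
    n = len(A[0])
    # 1 / (2 ||A||_F^2) is a safe step: it never exceeds 1 / L, where
    # L = 2 * lambda_max(A^T A) is the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * sum(a * a for row in A for a in row))
    S = [0.0] * n
    for _ in range(n_iter):
        residual = [p - m for p, m in zip(matvec(A, S), J)]
        grad = [2.0 * sum(A[i][j] * residual[i] for i in range(len(A)))
                for j in range(n)]
        # Gradient step followed by projection onto the box [0, s_sup].
        S = [min(max(s - step * g, 0.0), s_sup) for s, g in zip(S, grad)]
    return S


# Hypothetical toy data: two wavelengths, two boundary points, two source nodes.
G = [[[1.0, 0.2], [0.1, 0.8]],
     [[0.5, 0.4], [0.3, 0.6]]]
gammas = [0.6, 0.4]
A = stack_system(G, gammas)
S_true = [1.0, 0.5]
J_measured = matvec(A, S_true)          # noise-free synthetic "measurement"

print(projected_gradient(A, J_measured, s_sup=10.0))
print(projected_gradient(A, J_measured, s_sup=0.8))   # upper bound becomes active
```

With noise-free data and inactive bounds the iteration recovers the synthetic source; with the tighter bound the first component is held at the bound and the second adjusts to compensate, which is the behavior a bound-constrained solver is expected to show.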

2.3. Further demonstration

Overall, BLT reconstruction can be obtained by solving Eq. (16) using parallel optimization algorithms. Although a fully parallel spectrally-resolved reconstruction framework can be realized, performance optimization is necessary to accelerate the reconstruction. With respect to the steps of the proposed framework, three aspects are further demonstrated:

Load balancing is critical for high performance parallel reconstructions. If there are large differences between the sizes of the mesh subdomains on different CPUs, the performance is adversely affected. At the beginning of the proposed framework, mesh partitioning is an important step. In this framework, a multilevel k-way method is used to perform the partitioning [24]. The method improves performance by coarsening the mesh to reduce its dimensions, partitioning the coarsened mesh, and then refining the partition back onto the original mesh. Another consideration is the distribution of the relationship matrix 𝓐 between the spectrally-resolved measured data and the unknown source variable. The distribution of the measurable boundary discretized points is not uniform on each CPU. In addition, $𝒜^{𝒯_{c}}$ is formed by combining the K matrices $G^{𝒯_{c}} (λ_{k})$. The redistribution of 𝓐 is therefore necessary in order to optimize the performance of parallel optimization algorithms.

Matrix inversion is the major component of the Relationship Forming step. Although it is very time-consuming, it is unavoidable in the proposed framework. Furthermore, as the order N of the expansion of the radiance increases, the dimensions of the matrix M(λ_k) become very large. It is therefore essential to find a very efficient matrix inversion algorithm [25]. Since M(λ_k) is sparse positive-definite, LU factorization performs well among the various direct inversion algorithms. Another strategy is to use iterative methods, whose performance has been significantly improved in recent years by preconditioning strategies [26]. Since the performance of the two strategies depends on the matrix properties, it is necessary to decide which strategy is suitable for the specified problem.

Regarding optimization methods, Eq. (16) is a least-squares problem, and it is easy to obtain the Hessian matrix for Newton-type optimization methods [27]. However, due to the large scale of the problem, a significant amount of memory is required during the optimization procedure. Even if the Hessian matrix can be calculated at each iteration, the process is extremely time-consuming. In addition, when computing the search direction, it is necessary to invert the Hessian matrix, which is also time-consuming. The speed of BLT reconstruction is therefore severely affected when Hessian matrix-based optimization algorithms are used. One solution to this problem is to use quasi-Newton methods. Generally, these methods build up an approximate Hessian matrix from the gradients and iterates of previous iterations. This approximate matrix is obtained by vector-vector multiplications in real time and is easy to invert, saving memory and time. Here, the limited memory variable metric bound constrained quasi-Newton method (BLMVM) in parallel mode is used for BLT reconstruction [28].
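The iterative inversion strategy discussed above can be illustrated with a small serial sketch: each column of the inverse is obtained by solving M x = e_j with a preconditioned conjugate gradient iteration. For brevity the sketch uses a simple diagonal (Jacobi) preconditioner on a made-up 3×3 sparse positive-definite matrix; it is a stand-in for, not a reproduction of, the parallel preconditioned solver used in the framework, and the function names are hypothetical.

```python
import math


def matvec(M, x):
    """Dense matrix-vector product (a real FEM matrix would be stored sparse)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(M, b, tol=1e-12, max_iter=200):
    """Jacobi-preconditioned conjugate gradient for a symmetric positive-definite M."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual b - M x for x = 0
    z = [r[i] / M[i][i] for i in range(n)]        # apply the diagonal preconditioner
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            break
        Mp = matvec(M, p)
        alpha = rz / dot(p, Mp)
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * mp_i for r_i, mp_i in zip(r, Mp)]
        z = [r[i] / M[i][i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z_i + (rz_new / rz) * p_i for z_i, p_i in zip(z, p)]
        rz = rz_new
    return x


def inverse_by_columns(M):
    """Build the inverse one column at a time by solving M x = e_j."""
    n = len(M)
    cols = [pcg(M, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]   # row-major layout


# Made-up sparse positive-definite test matrix (tridiagonal).
M = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
IM = inverse_by_columns(M)
for row in IM:
    print(row)
```

Because the columns are independent, such solves can be distributed, and only the rows needed for the measurable boundary points have to be kept; whether this beats a direct factorization depends on the matrix, as the text notes.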

3. Results

The bioluminescence imaging experiments were performed on a Maestro 2 in vivo imaging system (CRI, Woburn, Massachusetts). This system uses a cooled CCD camera with a custom lens as the detector. The distinct characteristic of this system is that a liquid crystal tunable filter (LCTF) is used to acquire multispectral data. Generally, the filter bandpass width was set to 20nm and the optical data were collected from a single view. The exposure time for each wavelength was adjusted to obtain a high signal-to-noise ratio (SNR). After completing each optical signal acquisition, the phantom or mouse was scanned using an X-ray microCAT system (Siemens Preclinical Solutions, Knoxville, TN) to obtain CT images. The software Amira (Mercury Computer Systems, Inc. Chelmsford, MA) was used to generate volumetric meshes from the CT images for BLT reconstructions.

The framework was implemented in libMesh [29]. LibMesh is an open-source, high-quality software package developed to meet the needs of parallel FEM-based simulation. LibMesh provides almost all of the components used in parallel PDE-based simulation with unstructured discretization. Its design concept is to use existing software packages as far as possible. PETSc, developed by Argonne National Laboratory (ANL), was used to solve the linear systems in parallel mode [30]. By default, an open-source serial graph partitioning package, METIS, which implements the multilevel k-way partitioning algorithm, was used to partition the whole domain in libMesh [24]. In addition, in order to observe the effect of the model errors on the reconstruction quality, the regularization parameter δ was set to zero in the reconstructions. In order to test the proposed framework, we selected the SP₁ (DA), SP₃ and SP₇ approximations to perform BLT reconstructions. All the simulations were performed on a cluster of 27 nodes (2 CPUs of 3.2GHz and 4 GB RAM at each node).

3.1. Mouse-shaped phantom experiments

Table 1. Optical properties of a mouse-shaped phantom and actual mouse muscle

Fig. 2. (a) Photograph of Caliper mouse-shaped phantom in a Maestro 2 system; (b) and (c) are the photon distribution at 580nm and 640nm corresponding to (a).

Fig. 3. (a) shows the volumetric mesh and the mapped photon distribution at 640nm. (b) is the mesh partitioning results when 10 CPUs are used in BLT reconstruction.

In the first case, a commercial mouse-shaped phantom (Caliper Life Sciences, Hopkinton, Massachusetts, USA) was used to acquire multispectral data. The phantom was fabricated from a polyester resin, TiO₂ and Disperse Red. To imitate the bioluminescence source, an optical fiber coupled to a green LED was embedded within the phantom. The emission spectrum of the LED was similar to that of a bioluminescence source. Its wavelength range was from 500nm to 700nm with a peak at about 567nm. The photon distribution data at two wavelengths (580nm and 640nm) were used in the BLT reconstruction. Table 1 shows the optical properties (µ_a and µ′_s) at two wavelengths measured with the inverse adding-doubling method [31]. For the high-order SP_N approximations, we set the anisotropic parameter g to 0.9. More detailed information about this phantom can be obtained elsewhere [16]. Figure 2(a) shows a photograph of the phantom in the Maestro 2 system. To avoid the curved surface effect in the measured data, the bottom flat surface was used as the detection surface. Figures 2(b) and 2(c) show the photon distribution at 580nm and 640nm respectively. They were acquired using an exposure time of 5min. There are distinct differences between them because of the different optical properties at different wavelengths, which benefit the 3D source localization.

Fig. 4. BLT reconstructions with SP_N approximations. (a), (b) and (c) are the reconstructed results corresponding to SP₁ (DA), SP₃ and SP₇. Green dotted lines are used to align the boundaries of CT slices with those of reconstructed slices. Thin red lines pass through the center of the source in CT slices. (Unit: mm)

With respect to the photon distribution, about two-thirds of the overall phantom was selected for mesh generation. The volumetric mesh, as shown in Fig. 3(a), contained 4,969 nodes and 21,348 tetrahedral elements. Figure 3(a) also shows that the photon distribution was mapped on the mesh surface using a manual co-registration method. Figure 3(b) shows the results after the mesh was partitioned when 10 CPUs were used for the BLT reconstruction. The number of discretized points in each subdomain is similar, avoiding a load imbalance. Figure 4 further shows the reconstructed results based on the SP₁, SP₃ and SP₇ approximations. Because no regularization was applied (δ = 0), the SP₁-based reconstruction was very sensitive to the measurement noise and we could not obtain good source localization, as shown in Fig. 4(a). Figures 4(b) and 4(c) show the reconstructed results when the SP₃ and SP₇ approximations were used. From the figures it is clear that the source positions are reconstructed well when using high-order approximations. The localization errors are less than 1mm in two directions, which can clearly be observed from Figs. 4(b) and 4(c). These reconstructed results show the importance and performance of high-order SP_N approximations for BLT reconstruction.

3.2. Reconstruction performance optimization

Although the high-order SP_N-based BLT reconstructions yield good source localization, the reconstruction memory and time costs are significantly increased with respect to the first-order approximation (SP₁). To store the dense inverse matrix IM(λ_k), the SP₁-based reconstruction requires only about 94 MB of space, compared to about 1.5 GB for the SP₇-based reconstruction. Although the proposed fully parallel framework has the ability to process matrices with large dimensions by distributing the storage, the reconstruction time becomes impractical as the approximation order N and the total number of wavelengths K increase. Performance optimization is therefore indispensable to improve the efficiency of the proposed framework. The quasi-Newton optimization method (BLMVM) was selected to significantly reduce the reconstruction time compared to general Newton-type methods. Additionally, the choice of matrix inversion method and the number of CPUs used can further optimize the reconstruction framework.
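The storage figures quoted above can be checked with a short back-of-the-envelope calculation. The sketch assumes that the dense inverse has ((N+1)/2)·N_𝓟 rows and columns for the 4,969-node phantom mesh and that each entry occupies 4 bytes (single precision); the precision is an assumption, since it is not stated in the text, and the function name is hypothetical.

```python
def dense_inverse_bytes(n_nodes, sp_order, bytes_per_entry=4):
    """Storage of a dense square matrix with ((N+1)/2) * n_nodes rows.

    bytes_per_entry=4 assumes single-precision entries (an assumption of this sketch).
    """
    n_unknowns = (sp_order + 1) // 2 * n_nodes
    return n_unknowns ** 2 * bytes_per_entry


n_nodes = 4969                      # nodes in the phantom mesh of Fig. 3(a)
sp1_bytes = dense_inverse_bytes(n_nodes, 1)
sp7_bytes = dense_inverse_bytes(n_nodes, 7)
print(f"SP1: {sp1_bytes / 2**20:.1f} MiB")
print(f"SP7: {sp7_bytes / 2**30:.2f} GiB")
print(f"ratio SP7/SP1: {sp7_bytes // sp1_bytes}")
```

Under these assumptions the estimate gives roughly 94 MiB for SP₁ and just under 1.5 GiB for SP₇, in line with the quoted values; the factor of 16 between them is the square of the four coupled SP₇ unknowns per node.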

3.2.1. Direct vs. iterative inversions

Table 2. Performance comparison between direct and iterative inversions when 10 CPUs are used in reconstructions. DI is Direct Inversion; II denotes Iterative Inversion; and DI/II is the ratio of total time between DI and II.

When the reconstruction domain is discretized into N_𝓟 points, the SP_N-based BLT reconstruction must process an ((N+1)/2)N_𝓟 × ((N+1)/2)N_𝓟 matrix, because (N+1)/2 coupled equations are solved at each point. The computational complexity of the matrix inversion is O((((N+1)/2)N_𝓟)³) if direct inversion methods are used, so the computational burden increases significantly with N and N_𝓟. When 10 CPUs were used in the above BLT reconstructions, the SP₁-based reconstruction required only 1,163.1sec, as opposed to 3,086.1sec for SP₃ and 10,754.5sec for SP₇ (Table 2). LU-factorization-based relationship forming (i.e. forming Eq. (14)) took up most of the total reconstruction time. From the SP₁ to the SP₇ approximation, its share increased from 58.0% to 96.8%, making it critically important to improve the performance of the matrix inversion. For iterative matrix inversion, the parallel conjugate gradient (CG) method with an incomplete LU (ILU) preconditioner was used to accelerate the inversion. This preconditioner was provided by the Hypre open source package [32], developed by Lawrence Livermore National Laboratory (LLNL). For the SP₁- and SP₇-based reconstructions, the total reconstruction time decreased sharply to 644.5sec and 3,367.5sec, respectively, as shown in Table 2. Although the percentage of the relationship forming part in the total time increased for both the SP₁ and SP₇ approximations, the reconstruction speed was improved by factors of 1.80 and 3.19 for the SP₁ and SP₇ approximations, respectively.
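The speed-up factors can be reproduced directly from the timings quoted in this paragraph. The sketch below uses only those quoted numbers (it does not reproduce Table 2) and also converts the quoted percentages into the approximate time spent in relationship forming under direct inversion; the variable names are illustrative.

```python
# Total reconstruction times quoted in the text, in seconds (10 CPUs).
direct_total = {"SP1": 1163.1, "SP7": 10754.5}      # direct (LU) inversion
iterative_total = {"SP1": 644.5, "SP7": 3367.5}     # preconditioned CG inversion

# Share of relationship forming in the total time with direct inversion.
forming_share_direct = {"SP1": 0.580, "SP7": 0.968}

for name in ("SP1", "SP7"):
    speedup = direct_total[name] / iterative_total[name]
    forming_time = forming_share_direct[name] * direct_total[name]
    print(f"{name}: DI/II = {speedup:.2f}, "
          f"relationship forming with DI ~ {forming_time:.0f} s")
```

The ratios round to the factors of 1.80 and 3.19 stated above, confirming that the iterative strategy pays off more as the approximation order grows.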

3.2.2. Parallel reconstruction performance

Iterative matrix inversion methods improve the performance of the Relationship Forming step. Ideally, parallel execution on an increased number of CPUs should further improve reconstruction performance. In order to evaluate the proposed fully-parallel reconstruction framework, BLT reconstructions with the SP₁, SP₃ and SP₇ approximations were performed using different numbers of CPUs. Iterative matrix inversion was used in these evaluations. Figure 5(a) shows the total reconstruction time as a function of the number of CPUs. The SP₁-based reconstruction time increased with an increased number of CPUs. For the SP₃- and SP₇-based reconstructions, there was an optimal number of CPUs that provided the shortest reconstruction time (3 and 23 CPUs for SP₃ and SP₇, respectively). In the proposed framework, the four main steps are 1) Mesh Partitioning, 2) Matrix Assembly, 3) Relationship Forming, and 4) Optimization. To further examine the above behavior, a timing analysis of these steps was performed, as shown in Figs. 5(b) and 5(c). Since Mesh Partitioning required the least time among the four steps, it was negligible with respect to the entire reconstruction (data not shown in Figs. 5(b) and 5(c)). With the increase in the number of CPUs, the time cost of Matrix Assembly was gradually reduced. Although the matrix assembly time increased with the order of the approximation, Matrix Assembly accounted for only a small percentage of the overall reconstruction time. Relationship Forming and Optimization comprised nearly all of the reconstruction time. Furthermore, both of these steps had an optimal number of CPUs that minimized their time cost. Since the BLT reconstructions were performed on a cluster, the time cost of the communication between the CPUs became significant compared with the performance improvement from parallel execution. Higher-speed communication methods, such as shared memory mode, could significantly improve the reconstruction speed. With the current hardware architecture and software settings, the number of CPUs must be preselected to obtain the optimal reconstruction time.

Fig. 5. Performance comparison depending on CPU number in SP_N-based BLT reconstruction. (a) is the total reconstruction time depending on CPU number; (b) and (c) are the reconstruction time of the major steps in the proposed framework and the percentages of the total reconstruction time respectively. Note that SP₇-based reconstruction becomes possible when at least 4 CPUs are used.

3.3. Real mouse experiments

Fig. 6. (a) shows the volumetric mesh and the mapped photon distribution at 660nm for real mouse experiments. (b) is the mesh partitioning results when 30 CPUs are used in BLT reconstruction.

To further validate the proposed framework, experiments with a living mouse were performed in the Maestro 2 system. To simulate the bioluminescence source, a luminescent bead (Mb-Microtec, Bern, Switzerland) whose emission spectrum is similar to the in vivo spectrum of a firefly luciferase-based source was used. This bead uses tritium (the half life is about 12 years) to excite phosphor which generates photons, making it a very stable source. The bead dimensions are 0.9mm in diameter and 2.5mm in length. Prior to performing the experiments, the mouse was anesthetized and the bead was injected into the thigh using a syringe. The photon distribution data at 580nm and 660nm were collected from the ventral view. The exposure time was set to 1.5min. The volumetric mesh used in the reconstruction was generated using CT images of the mouse and contained 5,932 points and 24,120 tetrahedral elements. Figure 6(a) shows the mapped photon distribution after co-registration between the photograph of the mouse and the volumetric mesh. From the CT images, the tritium source can be clearly identified, as shown in Fig. 7. Since the photon propagation path consists almost totally of muscle, the reconstruction domain was considered to be homogeneous muscle tissue. The corresponding mouse muscle optical properties were then used in the reconstruction (Table 1).

Fig. 7. BLT reconstructions with SP_N approximations for the real mouse experiments. (a), (b) and (c) are the reconstructed results corresponding to SP₁ (DA), SP₃ and SP₇. Green dotted lines are used to align the boundaries of CT slices with those of reconstructed slices. Thin red lines pass through the center of the source in CT slices. (Unit: mm)


Figure 6(b) shows the partitioned mesh when 30 CPUs were used in the BLT reconstruction. The reconstructed results corresponding to the SP₁, SP₃, and SP₇ approximations are shown in Fig. 7. The actual center position of the tritium source was at (51.8, -0.1) mm. The reconstructed center position obtained from the SP₁, SP₃, and SP₇ approximations was at about (51.1, 0.2) mm. The SP₇-based reconstruction was similar to the SP₃-based one in terms of the source center position. Although the SP₁-based reconstruction differed somewhat from the SP₃- and SP₇-based results, the source localization errors were less than 1 mm in both directions. This result was similar to the SP₃- and SP₇-based reconstructions in the phantom experiments. The difference between the phantom and real mouse reconstructions was that the tritium source could also be localized well by the SP₁-based reconstruction. This was likely because the tritium source was more superficial than the LED source in the mouse-shaped phantom. In addition, the mouse surface was more irregular than the mouse-shaped phantom surface, which may also have affected the reconstruction.
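The localization error quoted above can be checked directly from the two reported center positions. The short sketch below computes the per-axis and Euclidean errors; the coordinates are the two in-plane values given in the text (in mm), and the function name is illustrative.

```python
import math

# Center positions reported in the text, in mm (two in-plane coordinates).
actual_center = (51.8, -0.1)
reconstructed_center = (51.1, 0.2)


def localization_error(actual, reconstructed):
    """Return the per-axis absolute errors and the Euclidean distance."""
    per_axis = tuple(abs(r - a) for a, r in zip(actual, reconstructed))
    distance = math.hypot(*per_axis)
    return per_axis, distance


per_axis_error, distance_error = localization_error(
    actual_center, reconstructed_center
)
print("per-axis error (mm):", [round(e, 2) for e in per_axis_error])
print("Euclidean error (mm):", round(distance_error, 2))
```

Both per-axis errors (about 0.7 mm and 0.3 mm) and the combined in-plane distance (about 0.76 mm) are below 1 mm, in agreement with the statement in the text.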

4. Discussions and Conclusion

In this paper, a radiative-transfer-based fully parallel reconstruction framework was developed for spectrally resolved bioluminescence tomography. Although the BLT reconstruction was performed with the simplified spherical harmonics approximation, the proposed framework is also suitable for other high-order self-adjoint approximations to the RTE. The use of finite element methods makes the framework suitable for complex geometries. Fully parallel execution makes BLT reconstruction over the whole body of a small animal feasible. The reconstruction performance optimization significantly improved the reconstruction speed. The experimental reconstructions using the mouse-shaped phantom and a real mouse further demonstrated the effectiveness of the proposed framework.

Given the successful and extensive application of planar bioluminescence imaging in biological research, and since bioluminescence tomography can provide more accurate bioluminescent source information, developing mature BLT technologies is important. The diffusion approximation leads to inaccurate reconstructions, so more accurate approximation models are necessary for BLT reconstruction. However, the corresponding computational burden has prevented the practical realization of such reconstruction algorithms. The proposed framework addresses this problem. Although good reconstruction performance can be obtained using high-order approximation models, the memory and time costs cannot be neglected, and the balance between reconstruction quality and cost should be further explored. From the reconstruction cases presented in this paper, after comparison with the SP₁- and SP₇-based results, we find that the SP₃ approximation is suitable for obtaining good BLT reconstructions.
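The cost side of this trade-off can be illustrated by counting coupled equations. The introduction gives (N+1)² coupled equations for the P_N method; the SP_N count of (N+1)/2 coupled diffusion-like equations for odd N comes from the standard SP_N formulation rather than from this section. The estimate of unknowns per wavelength additionally assumes one unknown per mesh node for each SP_N equation (linear elements), which is an illustrative assumption and not a statement about the actual implementation.

```python
MESH_POINTS = 5932  # nodes of the mouse mesh reported in Section 3.3


def coupled_equations(order):
    """Number of coupled equations for P_N and SP_N at odd order N."""
    if order < 1 or order % 2 == 0:
        raise ValueError("order N must be a positive odd integer")
    return {"P_N": (order + 1) ** 2, "SP_N": (order + 1) // 2}


def unknowns_per_wavelength(order, mesh_points=MESH_POINTS):
    """Assumed unknown count: one value per node per SP_N equation."""
    return coupled_equations(order)["SP_N"] * mesh_points


for n in (1, 3, 7):
    counts = coupled_equations(n)
    print(
        f"N={n}: P_N equations = {counts['P_N']}, "
        f"SP_N equations = {counts['SP_N']}, "
        f"assumed SP_N unknowns per wavelength = {unknowns_per_wavelength(n)}"
    )
```

Under these assumptions, moving from SP₁ to SP₃ doubles the number of unknowns and moving to SP₇ quadruples it, whereas a full P₇ model would involve 64 coupled equations. This is one way to see why SP₃ can be an attractive compromise when its reconstructions are close to those of SP₇.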

The performance optimization presented here is relevant not only to the improvement of the algorithms involved, but also to the development of the computational hardware. While the application of quasi-Newton optimization methods and iterative matrix inversion has effectively improved the reconstruction performance, further optimization strategies should be developed to obtain higher reconstruction speed. Since the matrix M(λ_k) is sparse, sparse approximate matrix inversion can be considered to accelerate the reconstruction. With respect to the hardware, it is necessary to improve the data communication speed between CPUs. Advances in multi-core CPU technology and shared-memory high-performance computers will significantly benefit the proposed algorithm.
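As a minimal sketch of the idea of combining an iterative solver with an approximate inverse, the example below runs a conjugate gradient iteration preconditioned with the inverse of the matrix diagonal (Jacobi preconditioning), which is the simplest possible sparse approximate inverse. The test matrix is a small symmetric positive-definite tridiagonal system chosen only for illustration; it is not the matrix M(λ_k) of the framework, and all names are hypothetical.

```python
import math


def build_spd_tridiagonal(n):
    """Toy sparse SPD matrix stored as one {column: value} dict per row."""
    rows = []
    for i in range(n):
        row = {i: 2.0 + 0.05 * (1 + i % 7)}  # varying, dominant diagonal
        if i > 0:
            row[i - 1] = -1.0
        if i < n - 1:
            row[i + 1] = -1.0
        rows.append(row)
    return rows


def matvec(rows, x):
    """Sparse matrix-vector product touching only stored entries."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def jacobi_pcg(rows, b, tol=1e-10, max_iter=1000):
    """Conjugate gradients preconditioned with the inverse diagonal."""
    inv_diag = [1.0 / row[i] for i, row in enumerate(rows)]
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for iteration in range(1, max_iter + 1):
        ap = matvec(rows, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, iteration
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


rows = build_spd_tridiagonal(200)
b = [1.0] * 200
solution, iterations = jacobi_pcg(rows, b)
residual = max(abs(ax - bi) for ax, bi in zip(matvec(rows, solution), b))
print(f"converged in {iterations} iterations, max residual {residual:.2e}")
```

Each iteration needs only one sparse matrix-vector product and a few vector updates, so memory use stays proportional to the number of nonzero entries. A more elaborate sparse approximate inverse would replace `inv_diag` with a sparse matrix applied in the same place.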

In conclusion, we have developed a fully parallel, spectrally resolved BLT reconstruction framework for radiative-transfer-based high-order approximations. A performance optimization was also performed and described. Preliminary experimental reconstructions show the feasibility and effectiveness of the proposed framework. Further investigations will focus on experiments with real mice used as disease models.

Acknowledgement

We would like to thank Dr. Laurent Bentolila from the Department of Chemistry & Biochemistry, University of California Los Angeles, for providing us with access to the Maestro 2 system. We are grateful to Judy Edwards and Waldemar Ladno at the small-animal imaging facility of the Crump Institute for Molecular Imaging for their assistance with mouse experiments. This work is supported by the NIBIB R01-EB001458, a NIH/NCI 2U24 CA092865 cooperative agreement, the Department of Energy DE-FC02-02ER63520, and the NCI grant 5-R01 CA08572.

References and links

1. V. Ntziachristos, J. Ripoll, L. V. Wang, and R. Weissleder, “Looking and listening to light: the evolution of whole body photonic imaging,” Nat. Biotechnol. 23, 313–320 (2005).

2. R. Weissleder, “Scaling down imaging: Molecular mapping of cancer in mice,” Nat. Rev. Cancer 2, 11–18 (2002).

3. J. Virostko, A. C. Powers, and E. D. Jansen, “Validation of luminescent source reconstruction using single-view spectrally resolved bioluminescence images,” Appl. Opt. 46, 2540–2547 (2007), http://www.opticsinfobase.org/abstract.cfm?URI=ao-46-13-2540.

4. A. P. Gibson, J. C. Hebden, and S. R. Arridge, “Recent advances in diffuse optical imaging,” Phys. Med. Biol. 50, R1–R43 (2005).

5. E. E. Lewis and W. F. Miller Jr., Computational Methods of Neutron Transport, (John Wiley & Sons, New York, 1984).

6. W. Cong, G. Wang, D. Kumar, Y. Liu, M. Jiang, L. V. Wang, E. A. Hoffman, G. McLennan, P. B. McCray, J. Zabner, and A. Cong, “Practical reconstruction method for bioluminescence tomography,” Opt. Express 13, 6756–6771 (2005), http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-13-18-6756.

7. X. Gu, Q. Zhang, L. Larcom, and H. Jiang, “Three-dimensional bioluminescence tomography with model-based reconstruction,” Opt. Express 12, 3996–4000 (2004), http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-12-17-3996.

8. Y. Lv, J. Tian, W. Cong, G. Wang, J. Luo, W. Yang, and H. Li, “A multilevel adaptive finite element algorithm for bioluminescence tomography,” Opt. Express 14, 8211–8223 (2006), http://www.opticsinfobase.org/abstract.cfm?URI=oe-14-18-8211.

9. H. Dehghani, S. C. Davis, S. Jiang, B. W. Pogue, K. D. Paulsen, and M. S. Patterson, “Spectrally resolved bioluminescence optical tomography,” Opt. Lett. 31, 365–367 (2006), http://www.opticsinfobase.org/abstract.cfm?URI=ol-31-3-365.

10. A. D. Klose, “Transport-theory-based stochastic image reconstruction of bioluminescent sources,” J. Opt. Soc. Am. A 24, 1601–1608 (2007), http://www.opticsinfobase.org/josaa/abstract.cfm?URI=josaa-24-6-1601.

11. C. R. E. de Oliveira, “An arbitrary geometry finite element method for multigroup neutron transport with anisotropic scattering,” Progr. Nucl. Energ. 18, 227–236 (1986).

12. S. Wright, M. Schweiger, and S. R. Arridge, “Reconstruction in optical tomography using the PN approximations,” Meas. Sci. Technol. 18, 79–86 (2007).

13. A. D. Klose and E. W. Larsen, “Light transport in biological tissue based on the simplified spherical harmonics equations,” J. Comput. Phys. 220, 441–470 (2006).

14. Y. Lu and A. F. Chatziioannou, “A parallel adaptive finite element method for the simulation of photon migration with the radiative-transfer-based model,” Commun. Numer. Methods Eng. 25, 751–770 (2009).

15. G. Wang, Y. Li, and M. Jiang, “Uniqueness theorems in bioluminescence tomography,” Med. Phys. 31, 2289–2299 (2004).

16. C. Kuo, O. Coquoz, T. L. Troy, H. Xu, and B. W. Rice, “Three-dimensional reconstruction of in vivo bioluminescent sources based on multispectral imaging,” J. Biomed. Opt. 12, 024007 (2007).

17. A. J. Chaudhari, F. Darvas, J. R. Bading, R. A. Moats, P. S. Conti, D. J. Smith, S. R. Cherry, and R. M. Leahy, “Hyperspectral and multispectral bioluminescence optical tomography for small animal imaging,” Phys. Med. Biol. 50, 5421–5441 (2005).

18. Y. Lv, J. Tian, W. Cong, G. Wang, W. Yang, C. Qin, and M. Xu, “Spectrally resolved bioluminescence tomography with adaptive finite element analysis: methodology and simulation,” Phys. Med. Biol. 52, 4497–4512 (2007).

19. G. Alexandrakis, F. R. Rannou, and A. F. Chatziioannou, “Tomographic bioluminescence imaging by use of a combined optical-PET (OPET) system: a computer simulation feasibility study,” Phys. Med. Biol. 50, 4225–4241 (2005).

20. T. Vo-Dinh, Biomedical Photonics Handbook, (CRC Press, 2002).

21. A. Ishimaru, Wave propagation and scattering in random media, (IEEE Press, 1997).

22. R. C. Haskell, L. O. Svaasand, T. Tsay, T. Feng, M. S. McAdams, and B. J. Tromberg, “Boundary conditions for the diffusion equation in radiative transfer,” J. Opt. Soc. Am. A 11, 2727–2741 (1994), http://www.opticsinfobase.org/abstract.cfm?URI=josaa-11-10-2727.

23. S. S. Rao, The finite element method in engineering, (Butterworth-Heinemann, Boston, 1999).

24. G. Karypis and V. Kumar, “Multilevel k-way partitioning scheme for irregular graphs,” J. Parallel Distrib. Comput. 48, 96–129 (1998).

25. G. H. Golub and C. F. Van Loan, Matrix computations (3rd ed.), (Johns Hopkins University Press, 1996).

26. M. Benzi, “Preconditioning techniques for large linear systems: a survey,” J. Comput. Phys. 182, 418–477 (2002).

27. J. Nocedal and S. J. Wright, Numerical Optimization, (Springer, New York, 1999).

28. S. J. Benson and J. Moré, “A limited-memory variable-metric algorithm for bound-constrained minimization,” Technical Report ANL/MCS-P909-0901, Mathematics and Computer Science Division, Argonne National Laboratory (2001).

29. B. Kirk, J. W. Peterson, R. H. Stogner, and G. F. Carey, “libMesh: A C++ Library for Parallel Adaptive Mesh Refinement/Coarsening Simulations,” Eng. Comput. 22, 237–254 (2006).

30. S. Balay, K. Buschelman, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang, PETSc Web page, 2001, http://www.mcs.anl.gov/petsc.

31. S. A. Prahl, M. J. C. van Gemert, and A. J. Welch, “Determining the optical properties of turbid media by using the adding-doubling method,” Appl. Opt. 32, 559–568 (1993), http://www.opticsinfobase.org/abstract.cfm?URI=ao-32-4-559.

32. R. D. Falgout and U. M. Yang, “hypre: A library of high performance preconditioners,” In Proceedings of the International Conference on Computational Science-Part III, p. 632–641 (2002).

Table 1. Optical properties of the mouse-shaped phantom and mouse muscle.

Quantity	Phantom, 580 nm	Phantom, 640 nm	Muscle, 580 nm	Muscle, 660 nm
µ_a(λ_k) [mm⁻¹]	0.038	0.004	0.463	0.08
µ′_s(λ_k) [mm⁻¹]	1.82	1.57	0.975	0.902

Optics Express

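Table 2. Reconstruction time with direct inversion (DI) and iterative inversion (II).

Approximation	Total time, DI (s)	Total time, II (s)	Relationship forming, DI (s)	Relationship forming, II (s)	Percentage of total time, DI	Percentage of total time, II	Total time ratio, DI/II
SP₁	1,163.1	644.5	674.1	478.3	58.0%	74.2%	1.80
SP₃	3,086.1	954.3	2,701.8	866.3	87.5%	90.8%	3.23
SP₇	10,754.5	3,367.5	10,405.4	3,236.0	96.8%	96.1%	3.19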
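The percentage and ratio columns of this table follow directly from the timing columns. The sketch below recomputes them, reading the percentage as the relationship-forming time divided by the total time and the ratio as the DI total time divided by the II total time (DI and II are taken here to mean direct and iterative inversion, following Section 3.2.1). The data structure and function name are illustrative.

```python
# Timings in seconds, copied from the table above.
TIMINGS = {
    "SP1": {"total": {"DI": 1163.1, "II": 644.5},
            "forming": {"DI": 674.1, "II": 478.3}},
    "SP3": {"total": {"DI": 3086.1, "II": 954.3},
            "forming": {"DI": 2701.8, "II": 866.3}},
    "SP7": {"total": {"DI": 10754.5, "II": 3367.5},
            "forming": {"DI": 10405.4, "II": 3236.0}},
}


def derived_columns(entry):
    """Return forming-time share (%) per method and the DI/II time ratio."""
    share = {
        method: 100.0 * entry["forming"][method] / entry["total"][method]
        for method in ("DI", "II")
    }
    ratio = entry["total"]["DI"] / entry["total"]["II"]
    return share, ratio


derived = {name: derived_columns(entry) for name, entry in TIMINGS.items()}
for name, (share, ratio) in derived.items():
    print(
        f"{name}: forming share DI = {share['DI']:.1f}%, "
        f"II = {share['II']:.1f}%, DI/II = {ratio:.2f}"
    )
```

The recomputed values agree with the tabulated ones, which confirms that relationship forming dominates the run time for SP₃ and SP₇ with either inversion method and that iterative inversion is roughly three times faster overall for those orders.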