Illumination pattern optimization in tomography based on the Kalman estimation filter and optimal experiment design

Open Access

Abstract

Tomography is widely used in medical imaging and industrial non-destructive testing applications. One costly and time-consuming operation in any form of tomography is the process of data acquisition, where a large number of measurements are made and the collected data is used for image reconstruction. Data acquisition can slow down tomography to the point that the scanner cannot keep up with the speed of changes in the medium under test. By optimizing the information content of each measurement, we can reduce the number of measurements needed to achieve the target precision. The development of algorithms to optimize the information content of tomography measurements is the main goal of this article. Here, the dynamics of the medium and the tomography measurements are formulated in the form of a Kalman estimation filter. A mathematical algorithm is developed to compute the optimal measurement matrix, which minimizes the uncertainty remaining in the estimate of the distribution the tomography scanner is reconstructing. Results, as presented in the paper, show a noticeable improvement in the quality of generated images when the medium is scanned with optimal measurements instead of traditional raster or random scanning protocols.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Data acquisition is a costly and time-consuming operation in any form of tomography. During the data acquisition phase, a large number of measurements are made to collect enough data to achieve the target precision in the reconstructed images. Data collection can reduce the speed of the scanner, or the temporal resolution of the platform, to the point that the system cannot keep up with the speed of changes when imaging dynamically evolving objects. Increasing the number of measurements can also increase the operation cost and, in some tomography platforms (e.g., X-ray based imaging systems), cause damage to the medium under test. Therefore, it is desirable to reduce the number of measurements as long as the target precision is achieved.

The information content of different measurements is not equal. Some measurements are more informative than others. If, instead of conducting all possible measurements, we compute and conduct only a handful of the most informative measurements, we can speed up the data acquisition process without noticeably sacrificing precision. The main objective of this article is to introduce a systematic procedure to compute the optimal measurement at each step of the data acquisition process, in real time, using the prior information and the outcome of all previous measurements made while scanning the sample.

It is possible to improve the information content of measurements by changing the number or the positions of the sources/detectors installed on the scanner, or the geometry of the scanner in general. However, without loss of generality, in this article the focus is on a scanner with a given geometry and a constant number of sources/detectors. Here, we optimize measurements by computing the radiation intensity of each source, a.k.a. the illumination pattern, to make the measurements more informative. The same algorithms can be used to optimize source or detector positioning or other changes in the scanner's geometry.

In previous studies, two general approaches were tested to find optimal illumination patterns. The first approach was based on the idea that stimulating a system with complex input patterns can provide more information about the system than measuring a set of impulse responses, as in raster scanning. Several illumination patterns were tested, including checkerboard [1], sinusoidal [2], wavelet [3], or diffractive optics patterns [4]. The improvement achieved by these algorithms was not significant. The second approach was designed based on the theory of optimal experiments, mostly formulated via maximum likelihood estimation theory. Here, illumination patterns were designed to minimize the trace of the error covariance matrix (A-optimal), as a measure of uncertainty in reconstruction, or to minimize its determinant (D-optimal) or norm (E-optimal) [5]. An example of this approach is the study in [6], where illumination patterns were computed to optimize the condition number of the Fisher information matrix, an instance of an E-optimal solution. While this second approach was more successful in improving the quality of reconstructed images, it still has some major drawbacks. In this approach, all illumination patterns are computed in advance, at the beginning of scanning, mainly based on the geometry of the scanner. Therefore, the prior information about the medium and the outcomes of ongoing measurements are not taken into consideration. This is a major drawback, specifically when the scanner is imaging dynamically evolving objects. In the current manuscript, the optimization is formulated for a dynamical system, and optimal experiments are computed at each iteration based on all information collected up to that point. The theory is formulated based on optimizing experiments in a Kalman estimation filter and falls under the category of A-optimal experiment design.

The Kalman filter was introduced in 1960 [7] as a mathematical framework to compute the minimum variance estimation of the state by combining the prior estimation with the outcome of a noise-contaminated measurement [8]. Sequential measurements followed by state estimation updates form the main structure of Kalman filter iterations. The theory has been quite successful since its early days and has found a large number of applications in different industries [9]. Several more advanced versions of the theory were developed, including the Extended Kalman Filter (EKF) algorithm, which was designed to make the filter applicable to nonlinear systems [10]. In his original work, Kalman did not offer a recipe for the design of measurements to further improve the precision of the computed estimations. In this article, a systematic procedure is introduced to design optimal measurements for Kalman iterations in general. Through this analysis, we also demonstrate the connection between the Kalman estimation filter and the theory of principal components (PCs). This approach can also be used for system identification [11] or sensor position optimization applications in tomography [12]. In the article, the developed theory is applied to a typical optical tomography problem to evaluate the improvement that can be achieved via this optimization process.

2. Theoretical background

In the published literature, the Kalman filter, together with the well-known Bayesian learning and the theory of maximum likelihood, forms the main framework of estimation theory [8]. In the current article, optimal measurements are computed based on the formulation of the Kalman filter. Nonetheless, since these theories are closely related, it is possible to derive the same results, for example, based on Bayesian learning. In the following section, we first review the main formulation of the Kalman filter. Then, we show the connection between the Kalman filter, Bayesian learning, and the theory of maximum likelihood.

2.1 Basics of the Kalman filter

Consider a linear dynamical system formulated by the following transition model:

$$\begin{aligned} x(n+1) &= A(n)~x(n) + B(n)~u(n) + \varepsilon_p(n),\\ y(n) &\;= C(n)~x(n) + \varepsilon_m(n). \end{aligned}$$

In these equations, $n$ is the step in time, $x(n) \in \mathbb {R}^p$ is the state variable, $y(n) \in \mathbb {R}^q$ is the measurement outcome or observation, and $u(n) \in \mathbb {R}^r$ is the input applied to the system. Here, $\varepsilon _p(n) \sim \mathcal {N}(0,R_p)$ and $\varepsilon _m(n) \sim \mathcal {N}(0,R_m)$ are process and measurement noises modeled by a zero-mean normal distribution with covariance matrices $R_p$ and $R_m$, respectively. The process noise $\varepsilon _p(n)$ models our lack of confidence in the model we have for the dynamics of the system and the way input affects the dynamics. Based on Kalman theory, if $x^{n|n-1}$ and $x^{n|n}$ are estimations of the state variable before (prior) and after (posterior) the $n$th measurement, then [8]:

$$x^{n|n} = x^{n|n-1} + K(n) ~[~y(n) - C(n)~x^{n|n-1}~].$$

As the equation shows, the posterior estimation is a combination of the prior estimation and the difference between the actual and estimated value of the measurement. The matrix $K(n)$, a.k.a. the Kalman gain, is the weight that decides whether we should rely more on the prior estimation or on the measurement outcome to obtain a more accurate posterior. Kalman computed the optimal value for $K(n)$ by minimizing the trace of the posterior covariance via solving the equation $\nabla _{K(n)} Tr[P^{n|n}]=0$ to get:

$$K(n) = P^{n|n-1} C^T(n) \left\{C(n) P^{n|n-1} C^T(n) + R_m \right\}^{{-}1}.$$

Here, $P^{n|n-1}=Cov(x^{n|n-1})$ and $P^{n|n}=Cov(x^{n|n})$ are the prior and posterior covariance matrices. In Kalman filter, posterior covariance is computed iteratively using the following equation:

$$P^{n|n} = \left\{~I - K(n) C(n)~ \right\} ~ P^{n|n-1}.$$

After updating the state estimation by Eq. (2), the dynamics is pushed one step forward to compute the prior for the next iteration:

$$\begin{aligned} x^{n+1|n} &= A x^{n|n} + B u(n+1),\\ P^{n+1|n} &= A P^{n|n} A^T + R_p. \end{aligned}$$

Kalman iterations can continue with a finite or infinite horizon.
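For concreteness, the following is a minimal numerical sketch (not part of the original formulation; numpy, with illustrative variable names) of one measurement update, Eqs. (2)-(4), followed by one time update, Eq. (5):

```python
import numpy as np

def kalman_step(x_prior, P_prior, y, C, A, Bu, R_m, R_p):
    """One Kalman cycle: measurement update (Eqs. 2-4), then time update (Eq. 5)."""
    S = C @ P_prior @ C.T + R_m                        # innovation covariance
    K = P_prior @ C.T @ np.linalg.inv(S)               # Kalman gain, Eq. (3)
    x_post = x_prior + K @ (y - C @ x_prior)           # posterior state, Eq. (2)
    P_post = (np.eye(len(x_prior)) - K @ C) @ P_prior  # posterior covariance, Eq. (4)
    x_next = A @ x_post + Bu                           # prior state for the next step, Eq. (5)
    P_next = A @ P_post @ A.T + R_p                    # prior covariance for the next step
    return x_post, P_post, x_next, P_next
```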

2.2 Correlation between Kalman filter, Bayesian learning, and maximum likelihood estimation

We assumed that the prior estimation has the following normal distribution:

$$p(x) \sim \mathcal{N} \left(x^{n|n-1},P^{n|n-1}\right).$$

By using this equation and the equation of measurement in (1), we can compute the probability distribution of the measurement:

$$p(y) \sim \mathcal{N} \left(C~x^{n|n-1}, C~P^{n|n-1}~C^T+R_m\right).$$

From these two distributions, one can compute the covariance between the state and the corresponding measurement:

$$Cov(x,y) = Cov(x,Cx+\varepsilon_m) = P^{n|n-1}~C^T.$$

In this case, the joint probability distribution of the state and the measurement, $p(x,y)$, prior to the $n$th measurement, has the following normal distribution:

$$p(x,y) \sim \mathcal{N} \left( \left[ \begin{array}{c} x^{n|n-1} \\ C(n)x^{n|n-1} \end{array} \right], \Sigma_{x,y}=\left[ \begin{array}{cc} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{array} \right] \right),$$
where:
$$\begin{aligned} \Sigma_{11}&=Cov(x)=P^{n|n-1},\\ \Sigma_{12}&=Cov(x,y)=P^{n|n-1}C^T(n)=\Sigma^T_{21},\\ \Sigma_{22}&=Cov(y)=C(n)P^{n|n-1}C^T(n)+R_m. \end{aligned}$$

In Bayesian learning, the posterior is defined as the conditional probability $p(x|y)$ which is the probability of the state given the outcome of the measurement. We can prove (Appendix A) that the posterior estimation has the following normal distribution:

$$\begin{aligned} &p(x|y) \sim \mathcal{N} \left( x^{n|n}, \Sigma_{x|y} \right),\\ &x^{n|n} = x^{n|n-1} + \Sigma_{12} ~\Sigma_{22}^{{-}1}\left[~y(n)-C(n)x^{n|n-1}~\right],\\ &\Sigma_{x|y} = \Sigma_{11} - \Sigma_{12} ~\Sigma^{{-}1}_{22} ~ \Sigma_{21} = \{I - K(n)C(n)\}P^{n|n-1}. \end{aligned}$$

By comparing the equation for $x^{n|n}$ in (2) with the same equation in (11), we can see that if $K(n) = \Sigma _{12} ~\Sigma _{22}^{-1}$ the equation for the posterior estimation (and its covariance) in the Bayesian framework matches the Kalman estimation filter.

In Kalman formulation, the posterior covariance is usually calculated from the prior covariance via Eq. (4). It is possible to show that these two covariance matrices are also related by (Appendix B):

$$(P^{n|n})^{{-}1} = (P^{n|n-1})^{{-}1} + C^T(n) R_m^{{-}1} C(n).$$

In the case of no prior information and before conducting the first measurement, we have an infinite amount of uncertainty regarding the state of the system in the next time step, i.e., $P^{1|0} \rightarrow \infty$, which translates to:

$$P^{1|1} = \left[~C^T(1) R_m^{{-}1} C(1)~\right]^{{-}1}.$$

We can rewrite the Kalman gain Eq. (3) as:

$$K(n) \{C(n) P^{n|n-1} C^T(n) + R_m \} = P^{n|n-1} C^T(n) .$$

Simply rearrange this equation to get:

$$K(n)= \left\{ I - K(n)C(n) \right\} P^{n|n-1} C^T(n)R^{{-}1}_m = P^{n|n}C^T(n)R^{{-}1}_m.$$

If we use the equation of posterior covariance for the case of no prior, the Kalman gain becomes:

$$K(1) = \left[~C^T(1) R_m^{{-}1} C(1)~\right]^{{-}1} C^T(1)R^{{-}1}_m.$$

The posterior estimation equation in this case is:

$$x^{1|1} = x^{1|0} + K(1) \left\{~y(1)-C(1)x^{1|0}~ \right\}.$$

Assume that the coordinate system is adjusted so that $x^{1|0}=0$; then:

$$x^{1|1} = \left[~C^T(1) R_m^{{-}1} C(1)~\right]^{{-}1} C^T(1)~R^{{-}1}_m~y(1).$$

It is possible to show that the Kalman state estimation equation for the case of no prior information matches the estimation obtained by the theory of maximum likelihood. To see this, assume that a measurement is made following Eq. (1), or $y=Cx + \varepsilon _m$. The probability distribution of the observed outcome is:

$$p(y) \sim \mathcal{N} \left(~Cx~,~C~Cov(x)~C^T + R_m ~\right).$$

When the state variable is given, we have: $Cov(x) = 0$. Then, the likelihood probability is:

$$p(y|x) \sim \mathcal{N} \left(Cx,R_m\right) = \frac{1}{(2 \pi)^{\frac{q}{2}} |R_m|^{\frac{1}{2}}} exp\left\{-\frac{1}{2}(y-Cx)^TR^{{-}1}_m(y-Cx) \right\}.$$

To find the state that maximizes the probability of this observation, we take the derivative of the logarithm of the likelihood probability w.r.t. the state variable and set it equal to zero:

$$\frac{d}{dx} log~p(y|x) = C^TR^{{-}1}_m y - C^TR^{{-}1}_mCx = 0.$$

From this, we compute the maximum likelihood estimation of the state variable, $\hat {x}_{ML}$:

$$\hat{x}_{ML} = \left[~C^T R_m^{{-}1} C~\right]^{{-}1} C^TR^{{-}1}_m y.$$

We can compute the covariance of this estimation directly from Eq. (22):

$$Cov(\hat{x}_{ML}) = \left[C^T R_m^{{-}1} C\right]^{{-}1} C^TR^{{-}1}_m \cdot Cov(y) \cdot \left\{ \left[C^T R_m^{{-}1} C\right]^{{-}1} C^TR^{{-}1}_m \right\}^T.$$

Simplifying this equation, we arrive at:

$$Cov(\hat{x}_{ML}) = \left[~ C^T R^{{-}1}_m C ~ \right]^{{-}1}.$$

It is now clear that the state estimation and its covariance computed by the maximum likelihood theory, Eqs. (22) and (24), match the ones computed by the Kalman filter in the case of no prior, as stated in Eqs. (18) and (13).
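As a quick numerical sanity check (illustrative sizes, not from the paper), one can verify that the maximum likelihood estimate of Eq. (22) coincides with the Kalman update of Eqs. (2)-(3) when the prior covariance is made very large, emulating the no-prior case of Eq. (18):

```python
import numpy as np

rng = np.random.default_rng(0)
q, p = 8, 3                                # detectors, state dimension (illustrative)
C = rng.normal(size=(q, p))                # measurement matrix
R_m = 0.01 * np.eye(q)                     # measurement-noise covariance
y = rng.normal(size=q)                     # an observed outcome

# Maximum likelihood estimate, Eq. (22)
Rm_inv = np.linalg.inv(R_m)
x_ml = np.linalg.solve(C.T @ Rm_inv @ C, C.T @ Rm_inv @ y)

# Kalman update with (numerically) no prior information: P^{1|0} -> infinity
P_prior = 1e9 * np.eye(p)
K = P_prior @ C.T @ np.linalg.inv(C @ P_prior @ C.T + R_m)   # Eq. (3)
x_kf = K @ y                                                  # Eq. (2) with x^{1|0} = 0

print(np.max(np.abs(x_ml - x_kf)))         # ~0: the two estimates agree
```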

3. Optimal experiment

In a dynamical system formulated by Eq. (1), $C(n)$ is the measurement matrix. Assume that every time we make a measurement, we have the option to choose our measurement matrix. By choosing the best possible measurement, we can further reduce the trace of the posterior covariance, which is our measure of uncertainty. The trace of the posterior covariance is a function of the Kalman gain and the measurement matrix. We can consider an iterative process in which we first optimize the trace over the Kalman gain for a given measurement matrix; then, treating the gain as constant, we optimize the posterior trace over the measurement matrix, and we continue these iterations until convergence. Previously, we took the derivative of the trace of the posterior covariance matrix w.r.t. the Kalman gain and found the optimal gain for a given measurement matrix. Now, we take the derivative of the posterior trace w.r.t. the measurement matrix and solve $\nabla _{C(n)} Tr[P^{n|n}]=0$ to find the optimal measurement for the given gain. In fact, this leads to an interesting result. For $C^*(n)$ being the optimal measurement matrix, we get:

$$C^*(n) = K^{{\dagger}}(n) = \left[ \Sigma_{12}(n) ~\Sigma_{22}^{{-}1}(n) \right]^{{\dagger}}.$$

Here, $K^{\dagger }(n)=[K^T(n)~K(n)]^{-1}K^T(n)$. This equation suggests that, for a given gain, the optimal measurement matrix is the left pseudo-inverse of $K(n)$. We can compute the optimal gain and measurement matrix using Eqs. (3) and (25) iteratively until the algorithm converges to an extremum of the trace function. The algorithm converges to an extremum point of the trace function (possibly a saddle point), but not necessarily to the global minimum, which is the ideal outcome. Nevertheless, Eq. (25) provides a new vision toward computing the optimal experiment.
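A minimal sketch of these $K(n)$-$C(n)$ iterations, assuming a given prior covariance and measurement-noise covariance (all names illustrative):

```python
import numpy as np

def kc_iterations(P_prior, R_m, C0, n_iter=10):
    """Alternate between the optimal gain for the current C(n), Eq. (3),
    and the optimal measurement C*(n) = K^dagger(n), Eq. (25)."""
    p = P_prior.shape[0]
    C = C0.copy()
    for _ in range(n_iter):
        K = P_prior @ C.T @ np.linalg.inv(C @ P_prior @ C.T + R_m)  # Eq. (3)
        C = np.linalg.pinv(K)                                        # Eq. (25)
    K = P_prior @ C.T @ np.linalg.inv(C @ P_prior @ C.T + R_m)
    trace_post = np.trace((np.eye(p) - K @ C) @ P_prior)             # Tr[P^{n|n}], Eq. (4)
    return C, trace_post
```

Consistent with Fig. 1 below, the rows of the returned $C$ align with principal components of $P^{n|n-1}$ while their norms keep growing from one iteration to the next.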

One question here is: is it possible to find the global optimal measurement matrix without even running the iterations between $K(n)$ and $C(n)$? We focus on answering this question in the following.

It is known that the prior covariance is a symmetric positive definite matrix. Therefore, we can write the following eigenvalue decomposition for the prior covariance:

$$P^{n|n-1} = W~\Lambda~W^T.$$

Here, $W=[w_1~\cdots ~w_p]$ is the matrix of eigenvectors of $P^{n|n-1}$ and $\Lambda =diag[\lambda _1~\cdots ~\lambda _p]$ the matrix of eigenvalues, assuming $\lambda _1 \geq \lambda _2\cdots \geq \lambda _p$. Eigenvectors of the covariance matrix are traditionally known as the principal components (PCs). Eigenvectors with largest eigenvalues are the main PCs of a covariance matrix. For the posterior covariance matrix, when the experiment is optimal, we have:

$$P^{n|n} = \{I - K(n) K^{{\dagger}}(n) \} P^{n|n-1}.$$

Assume that $Col[K(n)]$ and $Col^{\bot }[K(n)]$ are $K(n)$'s column space and the corresponding orthogonal complement. The matrix $I - K(n) K^{\dagger }(n)$ in this equation is a projection operator that projects the columns of $P^{n|n-1}$ onto $Col^{\bot }[K(n)]$ [13]. This means that the posterior covariance will have no eigenvector in $Col[K(n)]$.

The trace of the covariance matrix has extrema along its principal components. Iterations between $K(n)$ and $C(n)$, depending on the initial point of the iterations, converge to one of these extrema. Since the measurement matrix has $q$ rows, $K(n)$ has $q$ columns. We can choose each column of $K(n)$ to be an eigenvector of the prior covariance. In this case, when $C^*(n)=K^{\dagger }(n)$, the drop in the trace of the posterior covariance equals the sum of the $q$ corresponding eigenvalues. However, if we choose $K(n)=[w_1~\cdots ~w_q]$ so that its columns are the main principal components of the prior covariance, those with the largest eigenvalues, then we can expect the maximum possible drop in the trace of the posterior covariance in an experiment with $q$ detectors. Therefore, $C^*(n)=[w_1~\cdots ~w_q]^{\dagger }$ is the equation for the global optimal experiment.

This analysis shows the connection between the Kalman filter and the theory of principal components. It suggests that the best directions along which to measure components of the state vector are the directions of maximum variance, i.e., the main principal components of the prior covariance.
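The following sketch (synthetic prior covariance, illustrative names, not from the paper) illustrates the claim: when the columns of $K(n)$ are the $q$ main principal components and the signal power is large, the drop in the posterior trace approaches the sum of the $q$ largest eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 20, 3                                    # state dimension, detectors per measurement
M = rng.normal(size=(p, p))
P_prior = M @ M.T + 0.1 * np.eye(p)             # a synthetic prior covariance

lam, W = np.linalg.eigh(P_prior)                # eigendecomposition, Eq. (26)
lam, W = lam[::-1], W[:, ::-1]                  # sort eigenvalues in decreasing order

power = 1e4                                     # allowed signal power (row norm of C)
K = W[:, :q]                                    # main principal components as columns of K
C_opt = power * np.linalg.pinv(K)               # C*(n) = [w_1 ... w_q]^dagger, scaled up

R_m = 1e-3 * np.eye(q)
G = P_prior @ C_opt.T @ np.linalg.inv(C_opt @ P_prior @ C_opt.T + R_m)   # gain, Eq. (3)
drop = np.trace(P_prior) - np.trace((np.eye(p) - G @ C_opt) @ P_prior)   # trace reduction
print(drop, lam[:q].sum())                      # drop approaches the sum of the q largest eigenvalues
```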

Sequential iterations between computing the optimal $K(n)$ and $C(n)$ lead to a measurement matrix in which each row of $C(n)$ is a principal component of the prior covariance. However, the convergence of these iterations is only asymptotic. To show this, consider the following equation derived from Eq. (15):

$$K(n) R_m = P^{n|n} C^T(n).$$

In the case of optimal measurement, there is a paradox embedded in this equality which can be resolved only in the limit as the number of iterations goes to infinity. Since for the optimal measurement $P^{n|n}$ belongs to $Col^{\bot }[K(n)]$, it has no component in $Col[K(n)]$. Therefore, the right side of this equation can never match the left side, which lies in $Col[K(n)]$, unless both sides are zero. This can happen only in the absence of noise, $R_m=0$. However, when $R_m \neq 0$, the iterations resolve this paradox by continuously increasing the second norm of the rows of the measurement matrix in each $K(n)$-$C(n)$ iteration, which translates to a reduction of the second norm of the columns of $K(n)$, since $C^*(n)K(n)=I$. In other words, the optimization process continuously increases the signal power in the measurements to minimize the effect of noise. As the number of iterations goes to infinity, the signal power is increased so much that the noise is effectively canceled and both sides of the equation match, but only asymptotically.

Now, we answer the question we asked before. Based on this analysis, it is possible to avoid the $K(n)$-$C(n)$ iterations. There are two factors in synthesizing the optimal measurement matrix. First, we should choose the right directions (the directions of the rows of the measurement matrix), which are the directions of the main principal components; then we should increase the signal power (the second norm of the rows of the measurement matrix) to the maximum level that is practically allowed.

Convergence of $K(n)$-$C(n)$ iterations for a simple two-dimensional state vector is shown in Fig. 1. Only one measurement is made at a time in this example, therefore $C(n)$ has only one row. The figure shows the result of the first $10$ iterations when iterations were started at different initial points. The graph shows how these iterations gradually align the measurement vector with the main principal component of the prior covariance and then increase the length of the vector to improve the signal power.


Fig. 1. Asymptotic convergence of $K(n)$-$C(n)$ iterations. A two-dimensional normal distribution of a state vector is shown in both panels. In this example only one measurement is made at a time. The figure shows the result of the first $10$ iterations for the measurement vector $C(n)$ while starting at two different initial points in the left and right panels. Even when the initial point is closer to the second principal component, the iterations move the measurement vector toward the main PC along which the variance is maximum. As shown, during iterations the second norm of the measurement vector is increasing. In other words, these iterations correct the direction of the measurements and increase the signal power to overcome the noise. If the second norm of the $C(n)$, representing the signal power, is limited to $1.0$, then the optimal measurement is a point along the direction of the main PC on the unit circle shown in both panels.


While this approach seems logical, in most real-world applications there are practical constraints that make it impossible to synthesize the optimal measurement matrix. In such cases, running $K(n)$-$C(n)$ iterations is a reasonable approach to solve the optimization problem and search for an optimal measurement matrix that is practically feasible. In the following, we demonstrate this scenario in an optical tomography problem.

4. Optimal scanning in diffuse optical tomography

Consider the tomography scanner displayed in Fig. 2. Here, the turbid medium under test, modeled by absorption ($\mu _a$) and reduced scattering ($\mu '_s$) coefficients, is located between an array of point light sources and photodetectors [14]. Each source radiates light into the medium, which is then detected by the detectors after interacting with the medium. Assume that the scanner has $s$ sources and $d$ detectors, and the medium is divided into $v$ pixels. The goal of tomography in this example is to identify the distribution of the absorption coefficient in each pixel, $\eta _k$ for $k=1,\ldots,v$, from the data collected by the detectors. The state vector in this example is $\eta =[\eta _1,\ldots,\eta _v]^T$. The energy radiated from the point source $S_i$, located at $r_{S_i}$, that reaches the pixel $V_k$ at $r_{V_k}$ is $I_i~G(r_{V_k};r_{S_i})$. Here, $I_i$ is the radiation intensity of the source $S_i$. The light that is scattered from this pixel propagates to the detectors such that the light intensity sensed by the detector $D_j$, located at $r_{D_j}$, is $\sum ^v_{k=1}I_i\eta _k G(r_{D_j};r_{V_k}) G(r_{V_k};r_{S_i})$, where $G(r_{V_k};r_{S_i})$ and $G(r_{D_j};r_{V_k})$ are the Green's functions that model the propagation of the light from the $i$th point source to the $k$th pixel, and the propagation of the scattered light from the $k$th pixel to the $j$th detector. Using matrix-vector notation, the equation of measurement for this scanner can be written as [11], [6]:

$$y = \left\{~G_2~diag[~G_1~I(n)~]~\right\}~ \eta + \varepsilon_m,$$


Fig. 2. Structure of a diffuse optical tomography scanner. The medium is surrounded by an array of sources and detectors. $G(r_{V_k};r_{S_i})$ and $G(r_{D_j};r_{V_k})$ are Green's functions that model the propagation of the light from the $i$th source to the $k$th pixel, and the propagation of the scattered light from the $k$th pixel to the $j$th detector. $\eta _k$ is the concentration of absorbing molecules in pixel $V_k$. The scanner has $s$ point sources and $d$ point detectors, and the medium is divided into $v$ differential pixels.


Here, $G_2$ and $G_1$ are matrices of dimensions $d \times v$ and $v \times s$ with elements $G^{jk}_2=G(r_{D_j};r_{V_k})$ and $G^{ki}_1=G(r_{V_k};r_{S_i})$. In this equation, $diag(\cdot)$ is the operator that takes a vector and returns a diagonal matrix. $I=[I_1,I_2,\ldots,I_s]^T$ is the source intensity vector, or the illumination pattern. We assume that $0\le I_i \le I_{Max}$, where $I_{Max}$ is the maximum radiation intensity of each source. The measurement matrix in this example is $C(n)=G_2~diag[G_1~I(n)]$, which is determined by the geometry of the scanner and the illumination pattern. We can optimize the performance of the scanner by changing the geometry or the illumination patterns. In this example, we assume that the geometry is given and $I(n)$ is the optimization variable. $y = [y_1,y_2,\ldots,y_d]^T$ is the output of the detectors and $\varepsilon _m$ is the measurement noise. The fixed geometry of the scanner and the upper limit on the source intensity restrict our ability to synthesize desired measurement matrices.
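As an illustration of Eq. (29) (not from the paper; names are illustrative), the measurement matrix for a given illumination pattern, together with a raster-style pattern, could be assembled as:

```python
import numpy as np

def measurement_matrix(G1, G2, I):
    """C(n) = G_2 diag[G_1 I(n)], Eq. (29).
    G1 is v x s, G2 is d x v, and I is the length-s illumination pattern."""
    return G2 @ np.diag(G1 @ I)

def raster_pattern(s, i, I_max=1.0):
    """Raster-style pattern: only source i is on, at its maximum intensity."""
    I = np.zeros(s)
    I[i] = I_max
    return I
```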

Traditionally, for scanning a sample, sources are turned on sequentially such that only one source is turned on at its maximum radiation intensity during each measurement, and the scanner records the output of all detectors. This protocol is known as raster scanning [14]. It is also possible to turn on all sources with random radiation intensities within the acceptable range. This random scanning protocol is used in some tomography platforms. Our goal is to compare the performance of these two commonly used scanning protocols with the optimal measurement algorithm, where the illumination pattern, $I(n)$, is tuned in each measurement to maximize the information we obtain about the state variable $\eta$. By taking the trace of the covariance of both sides of Eq. (2), we arrive at:

$$\begin{aligned}Tr(P^{n|n}) &= Tr(P^{n|n-1}) - 2~Tr \left[K(n)C(n)P^{n|n-1}\right]\\ &+ Tr\left[ [K(n)C(n)]~P^{n|n-1}~[K(n)C(n)]^T \right] \end{aligned}$$

To find the optimal illumination pattern, we minimize $Tr(P^{n|n})$ over the variable $I(n)$. This optimization problem can be formulated as:

$$\begin{aligned} Minimize^{}_{I(n)} ~ \psi &={-}2~Tr \left[K(n)C(n)P^{n|n-1}\right] +\\ &\;Tr\left[ [K(n)C(n)]~P^{n|n-1}~[K(n)C(n)]^T \right]\\ s.t. ~~~~~~&0\le I_i(n) \le I_{Max}, ~\forall i \in \{1,2,\ldots,s\}. \end{aligned}$$

When $C(n)$ has a single row, this optimization is a quadratic program, which is convex [5]. In general, however, when the measurement matrix $C(n)$ has more than one row, this optimization is non-convex. Fortunately, it is possible to compute the gradient of the cost function and use gradient-based optimization algorithms (such as the projected gradient descent method [15]) to search for local minima. We can prove that (Appendix D):

$$\begin{aligned}\nabla_{I(n)}~\psi = &-2G^T_1~diag^*\left[G^T_2K^T(n)P^{n|n-1}\right]\\ &+ 2G^T_1~diag^* \left[G^T_2K^T(n)K(n)C(n)P^{n|n-1} \right]. \end{aligned}$$
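A sketch of a projected gradient descent over the illumination pattern based on Eq. (32), holding the gain fixed during the search (all names illustrative; $diag^*(\cdot)$ extracts the diagonal of a matrix as a vector, as defined in the next paragraph):

```python
import numpy as np

def optimal_illumination(G1, G2, K, P_prior, I_max, n_steps=200, lr=1e-2):
    """Projected gradient descent on psi, Eq. (31), over I(n), using the gradient of Eq. (32)."""
    diag_star = lambda A: np.diag(A).copy()           # diag*: matrix -> vector of diagonal elements
    s = G1.shape[1]
    I = np.full(s, 0.5 * I_max)                       # start in the middle of the feasible box
    for _ in range(n_steps):
        C = G2 @ np.diag(G1 @ I)                      # C(n) = G_2 diag[G_1 I(n)]
        grad = (-2.0 * G1.T @ diag_star(G2.T @ K.T @ P_prior)
                + 2.0 * G1.T @ diag_star(G2.T @ K.T @ K @ C @ P_prior))
        I = np.clip(I - lr * grad, 0.0, I_max)        # project onto 0 <= I_i <= I_Max
    return I
```

In practice, one option is to alternate such a search with updates of the gain via Eq. (3), in the spirit of the $K(n)$-$C(n)$ iterations of Section 3.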

In Eq. (32), $diag^*(A)$ is the operator that takes the matrix $A$ and returns the vector of diagonal elements of $A$.

Consider the two scanner configurations shown in Fig. 3. Scanner dimensions are in an arbitrary unit of length. In the scanner shown at the top, the medium is surrounded by $54$ sources and $23$ detectors and divided into $225$ pixels, $15$ in each direction. The figure also shows the initial actual (left panel) and assumed (right panel) distributions of the absorbing molecules in the pixels. The assumed distribution is our prior estimation of the distribution, which is clearly far from the reality in this example. The prior covariance matrix had eigenvalues uniformly distributed between $0.001$ and $1$, with a condition number of $10^3$. We assumed that the point sources and detectors are implemented as optical fiber tips assembled on the sides of the scanner. The scanner at the bottom of Fig. 3 is similar to the scanner shown on top, but the numbers of sources and detectors are reduced to $36$ and $13$, respectively.


Fig. 3. Configuration of the tomography scanner used in the simulation: For the top scanner, the turbid medium is surrounded by $54$ sources and $23$ detectors. For the scanner shown at the bottom, the numbers of sources and detectors are reduced to $36$ and $13$, respectively. The medium is divided into $225$ pixels, $15$ pixels in each direction. The actual and assumed initial distributions are shown in the left and right panels for both scanners. Dimensions are in an arbitrary unit of length. The figure shows the actual and assumed distributions of the perturbation in the absorption coefficient, not the background. The color bar shows the percentage of change in the absorption coefficient compared to the background.


The following equation was used for the Green’s function:

$$G(r;r') = \sqrt{\frac{2\delta}{\pi|r-r'|}} \frac{exp(-|r-r'|/\delta)}{2\pi\mu_a\delta^2} exp\left[ \frac{-[\angle(\hat{n},r-r')-\pi/2]^2}{\alpha}\right],$$
where $\delta =\sqrt {D/\mu _a}$, $D = 1/[3(\mu _a+\mu '_s)]$, $\mu _a=\mu '_s=1.0$ (per arbitrary unit of length), and $\angle (\hat {n},r-r')$ is the angle between the normal to the surface of the source/detector and the vector $r-r'$. In this simulation, $R_m = \sigma U$ and $R_p = \sigma U$, where $\sigma = 0.001$ and $U$ is the identity matrix of appropriate dimension. The first part of the equation is the Green's function for a point source/detector, and the last exponential term is included to add some minimum directivity caused by the numerical aperture of the optical fibers, modeled by the parameter $\alpha =0.6$.
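A direct transcription of Eq. (33) as a short function (a sketch; the reading $D = 1/[3(\mu_a+\mu'_s)]$ and all parameter names follow the text above, and the two-dimensional geometry is an assumption of this example):

```python
import numpy as np

def greens(r, r_prime, n_hat, mu_a=1.0, mu_s_prime=1.0, alpha=0.6):
    """Green's function of Eq. (33). r, r_prime, n_hat are 2D vectors in the
    arbitrary unit of length used in the simulation; n_hat is the fiber-face normal."""
    D = 1.0 / (3.0 * (mu_a + mu_s_prime))
    delta = np.sqrt(D / mu_a)
    d = np.asarray(r, float) - np.asarray(r_prime, float)
    dist = np.linalg.norm(d)
    angle = np.arccos(np.clip(np.dot(n_hat, d) / dist, -1.0, 1.0))   # angle(n_hat, r - r')
    radial = np.sqrt(2.0 * delta / (np.pi * dist)) * np.exp(-dist / delta) / (2.0 * np.pi * mu_a * delta**2)
    directivity = np.exp(-(angle - np.pi / 2.0) ** 2 / alpha)
    return radial * directivity
```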

Figure 4 shows snapshots of the reconstructed images produced by the first scanner. It was assumed that the system has dynamics and that the distribution changes over time by moving toward the top-right of the scanner while simultaneously diffusing into the medium (the matrix of the dynamics is given in Appendix C). Snapshots of this dynamics are shown in the first row. We assumed that measurements are made much faster than changes in the distribution, so that during each measurement the change in the state variable is negligible. Optimal, random, and raster protocols were tested, and the results are shown in the figure. Borders of the actual and reconstructed distributions are marked by red and green contours, respectively. The simulation clearly demonstrates the superior performance of the optimal scanner regarding the precision of the reconstructed images. Figure 5 shows snapshots of the reconstructed images generated by the second scanner with the reduced number of sources and detectors. From the comparison between Figs. 4 and 5, we can see the effect of the number of sources/detectors on the quality of the reconstructed images. A larger number of sources/detectors, particularly when the counts are small, improves the precision of the reconstructed images. Beyond some point, however, this improvement becomes almost negligible because of the close correlation between the recordings when these numbers are large.


Fig. 4. Snapshots of the actual dynamics (top row) and of the estimated distributions after optimal (second row), random (third row), and raster measurements (fourth row), generated by the first scanner. The perturbation in this example changes as a function of time by slowly diffusing in the medium while moving toward the upper-right corner of the scanner. Borders of the actual and reconstructed distributions are shown in all snapshots. Snapshots are for state estimations at time steps $n = [0,72,143,215,286,358,429,500]$. The simulations show how optimal measurements outperform the traditional raster and random scanning in terms of revealing more accurate pictures of the absorption coefficient distribution in the medium.


Fig. 5. The same snapshots as Fig. 4 but generated by the second scanner with a smaller number of sources/detectors.


Figure 6 shows changes in the normalized uncertainty and error recorded as a function of the measurement number for both scanners. The normalized uncertainty is the trace of the covariance matrix after each measurement divided by the same value before conducting any measurements. The normalized error is the sum of squared differences between the actual and reconstructed pixels divided by the same value before conducting any measurements. The superior performance of the optimal scanning protocol is clear in these plots. For instance, for the first scanner, the normalized error reaches $0.58$ after $188$ measurements. The random scanning protocol reaches the same error after $500$ iterations, while raster scanning does not reach this level within the first $500$ measurements. A more interesting result comes from the comparison between the performance of the first and second scanners. The second scanner, with a much smaller number of sources and detectors, when following the optimal scanning protocol, has superior performance compared to the first scanner using either the raster or the random protocol. For instance, the second scanner reached the normalized error of $0.63$ via optimal scanning at iteration $258$. The first scanner reached this normalized error after about $400$ iterations using the random protocol and about $500$ iterations using the raster protocol. Therefore, following the optimal protocol makes it possible to reduce the complexity of the hardware and the operation cost while improving the speed and temporal resolution of the scanner.
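For reference, the two figures of merit could be computed as in the following sketch (illustrative names, not from the paper):

```python
import numpy as np

def normalized_uncertainty(P_post, P_initial):
    """Trace of the covariance after a measurement, divided by its value before any measurement."""
    return np.trace(P_post) / np.trace(P_initial)

def normalized_error(eta_est, eta_true, eta_est_initial):
    """Sum of squared pixel errors, divided by the same quantity before any measurement."""
    return np.sum((eta_est - eta_true) ** 2) / np.sum((eta_est_initial - eta_true) ** 2)
```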


Fig. 6. Normalized uncertainty and normalized error plotted as a function of the measurement number for optimal, random, and raster scanning algorithms for the first (top) and second (bottom) scanners. Optimal protocol outperforms other algorithms based on these two quantitative measures. Data shows that the optimal algorithm reaches the normalized error $0.58$ after $188$ measurements. The random protocol, which performs better than the raster scan, reaches the same error rate after $500$ measurements.


5. Conclusion

A mathematical approach for the design of optimal experiments in the Kalman filter was introduced in this article. By optimizing experiments, it is possible to reduce the resources used for measurements and the cost of the operation while achieving the same precision in the computed estimations. The compromise is the computational power used to calculate the optimal measurement matrix in each step.

When parameters of the dynamics (matrices $A(n)$ or $B(n)$, the initial value of the state vector and its covariance) are known, we can compute all optimal measurements in advance. Nonetheless, in many applications, parameters of the dynamics are unknown and system identification algorithms, such as the expectation maximization [16], subspace analysis [17], or dynamic mode decomposition [18], [19], are run concurrently to simultaneously estimate these parameters from the collected data in addition to estimating the state variable. In those cases, we cannot compute the direction of optimal measurements in advance and optimal measurements should be computed online. Conducting optimal experiments can also significantly improve the performance of system identification algorithms and the precision of state estimations [11].

There are two aspects to investigate further in the future. The first is the concept of observability. In control theory, observability is a measure of how well the state of a system can be inferred from feasible partial measurements [13]. Observability is a function of the measurement matrix. By optimizing the measurement matrix, we can improve the observability of the partial measurements in closed-loop control systems. The approach taken here can be applied to optimize the observability Gramian and synthesize experiments that offer optimal observability.

The second direction for future studies is in the field of signal sampling. It was shown in the theory of compressed sensing that a signal which has a sparse representation in an isometric transform domain can be reconstructed from far fewer samples than prescribed by the Nyquist theorem [20], [21]. The theory of optimal measurements in this article was formulated around the idea of using the covariance between different components of the hidden state vector to find the most informative experiments. In the signal sampling problem, it is possible to use the correlation between values of a signal at different points in time to sample the signal at only a limited number of points and yet reconstruct the entire signal, with high confidence, based on these correlations. The sparsity that is used in compressed sensing is the result of some form of correlation between samples that is extracted by an isometric transform. But do we know which transform is optimal and generates the sparsest representation? The optimal experiment theory as discussed can perform the same task, but it has the potential to find the optimal transform domain. Therefore, there is a potential to surpass the performance of traditional compressed sensing algorithms.

Appendix A.

To compute the posterior conditional probability $p(x|y)$, we factorize the joint probability distribution using $p(x,y)=p(x|y)p(y)$. Previously, it was assumed that:

$$\begin{aligned}p(x,y) \sim &\mathcal{N} \left( \left[ \begin{array}{c} \hat{x} \\ \hat{y} \end{array} \right], \Sigma_{x,y}=\left[ \begin{array}{cc} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{array} \right] \right)=\\ & \qquad \frac{1}{(2 \pi)^{\frac{p+q}{2}}~ |\Sigma_{x,y}|^{\frac{1}{2}} } exp \left\{ -\frac{1}{2} \left[ \begin{array}{c} x- \hat{x} \\ y - \hat{y} \end{array} \right]^T \Sigma^{{-}1}_{x,y} \left[ \begin{array}{c} x-\hat{x} \\ y-\hat{y} \end{array} \right] \right\}. \end{aligned}$$

For brevity, we used the notation: $\hat {x} = x^{n|n-1}$, and $\hat {y}=C(n) x^{n|n-1}$. In this equation, $|\Sigma _{x,y}|$ is the determinant of the covariance matrix. In the following, we will use two lemmas to factorize this distribution and extract the equation of the conditional probability.

Lemma 1: Determinant factorization,

$$|\Sigma_{x,y}| = |\Sigma_{x,y}/\Sigma_{22}|~\cdot~|\Sigma_{22}|,$$
where $\Sigma _{x,y}/\Sigma _{22}=\Sigma _{11} - \Sigma _{12} \Sigma ^{-1}_{22} \Sigma _{21}$ is the Schur complement of $\Sigma _{x,y}$. Based on this lemma, we can write:
$$(2 \pi)^{\frac{p+q}{2}} |\Sigma_{x,y}|^{\frac{1}{2}} = (2 \pi)^{\frac{p}{2}} |\Sigma_{x,y}/\Sigma_{22}|^{\frac{1}{2}} \cdot (2 \pi)^{\frac{q}{2}} |\Sigma_{22}|^{\frac{1}{2}}.$$

Lemma 2: Matrix inversion,

$$\Sigma^{{-}1}_{x,y} = \left[ \begin{array}{cc} I & 0 \\ -\Sigma^{{-}1}_{22}\Sigma_{21} & I \end{array} \right] \left[ \begin{array}{cc} \Sigma_{x,y}/\Sigma_{22} & 0 \\ 0 & \Sigma_{22} \end{array} \right]^{{-}1} \left[ \begin{array}{cc} I & -\Sigma_{12}\Sigma^{{-}1}_{22} \\ 0 & I \end{array} \right].$$

If we use this matrix inversion lemma in Eq. (34), the exponential term can be factorized as:

$$\begin{aligned} &exp \left\{ -\frac{1}{2} \left[ \begin{array}{c} x- \hat{x} \\ y - \hat{y} \end{array} \right]^T \Sigma^{{-}1}_{x,y} \left[ \begin{array}{c} x- \hat{x} \\ y - \hat{y} \end{array} \right] \right\} =\\ & \qquad exp \left\{ -\frac{1}{2} [x - \gamma]^T (\Sigma_{x,y}/\Sigma_{22})^{{-}1} [x - \gamma] \right\} \times exp \left\{ -\frac{1}{2} [y - \hat{y}]^T \Sigma^{{-}1}_{22} [y - \hat{y}] \right\}. \end{aligned}$$

Here, $\gamma = \hat {x} + \Sigma _{12}\Sigma ^{-1}_{22}(y-\hat {y})$. If we use the factorization of Eqs. (36) and (38) in Eq. (34), we get:

$$p(x,y) \sim \mathcal{N} \left( \left[ \begin{array}{c} \hat{x} \\ \hat{y} \end{array} \right], \Sigma_{x,y} \right) = \mathcal{N} \left( \gamma, \Sigma_{x,y}/\Sigma_{22} \right) \cdot \mathcal{N}(\hat{y},\Sigma_{22}).$$

Therefore,

$$p(x|y) \sim \mathcal{N} \left( \hat{x} + \Sigma_{12}\Sigma^{{-}1}_{22}(y-\hat{y}), \Sigma_{x,y}/\Sigma_{22} \right).$$

By using Eq. (10), and the formulation of Kalman gain, in Eq. (3), we can show that:

$$\Sigma_{x,y}/\Sigma_{22} = \left[~ I - K(n)C(n) ~\right] P^{n|n-1}.$$

Therefore,

$$\begin{aligned}&p(x|y) \sim \mathcal{N} (x^{n|n},\Sigma_{x|y}),\\ &x^{n|n} = x^{n|n-1} + \Sigma_{12}\Sigma^{{-}1}_{22} \left\{~y-C(n)x^{n|n-1} ~\right\},\\ &\Sigma_{x|y} = \Sigma_{11} - \Sigma_{12} \Sigma^{{-}1}_{22} \Sigma_{21}. \end{aligned}$$

For the covariance matrix $\Sigma _{x|y}=P^{n|n}$, we can write:

$$\Sigma_{x|y} = P^{n|n-1} - P^{n|n-1}C^T \{CP^{n|n-1}C^T+R_m\}^{{-}1} CP^{n|n-1}$$

If we use the Kalman gain Eq. (3) here, we get:

$$\Sigma_{x|y} = \{I - KC\}~P^{n|n-1} = P^{n|n}.$$

For further details, see [22], pages 144-149.
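A quick numerical check of the two lemmas with a synthetic joint covariance (illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 4, 3
A = rng.normal(size=(p + q, p + q))
Sigma = A @ A.T + np.eye(p + q)                       # a synthetic joint covariance
S11, S12 = Sigma[:p, :p], Sigma[:p, p:]
S21, S22 = Sigma[p:, :p], Sigma[p:, p:]
S22_inv = np.linalg.inv(S22)
schur = S11 - S12 @ S22_inv @ S21                     # Schur complement Sigma_{x,y}/Sigma_{22}

# Lemma 1: determinant factorization, Eq. (35)
print(np.isclose(np.linalg.det(Sigma), np.linalg.det(schur) * np.linalg.det(S22)))

# Lemma 2: block inversion, Eq. (37)
L = np.block([[np.eye(p), np.zeros((p, q))], [-S22_inv @ S21, np.eye(q)]])
Mid = np.block([[schur, np.zeros((p, q))], [np.zeros((q, p)), S22]])
R = np.block([[np.eye(p), -S12 @ S22_inv], [np.zeros((q, p)), np.eye(q)]])
print(np.allclose(np.linalg.inv(Sigma), L @ np.linalg.inv(Mid) @ R))
```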

Appendix B.

Consider the following matrix inversion lemma:

Lemma 3: Assume that $Z\in \mathbb {R}^{n \times n}$ and $Y \in \mathbb {R}^{m \times m}$ are invertible square matrices and $X \in \mathbb {R}^{n \times m}$ and $U \in \mathbb {R}^{m \times n}$. We can prove that:

$$( Z-X Y^{{-}1} U )^{{-}1} = Z^{{-}1} + Z^{{-}1} X ( Y - U Z^{{-}1} X )^{{-}1} U Z^{{-}1}.$$

If we assume that $Z^{-1}={-}P^{n|n-1}$, $X=C^T(n)$, $U = C(n)$, and $Y=R_m$, then the right side of this matrix inversion equation equals $-P^{n|n}$, with $P^{n|n}$ as derived in (42). Therefore:

$$P^{n|n} ={-}(Z-X Y^{{-}1}U)^{{-}1} =\left\{(P^{n|n-1})^{{-}1}+C^T(n)R^{{-}1}_mC(n)\right\}^{{-}1}.$$

As a result:

$$(P^{n|n})^{{-}1} = (P^{n|n-1})^{{-}1} + C(n)^TR^{{-}1}_m C(n),$$
which proves the claim.

Appendix C.

Dynamics was a combination of diffusion and motion. Diffusion was modeled by $\partial \eta /\partial t = \phi \nabla ^2 \eta$. Motion was modeled by the equation $\partial \eta / \partial t = v_x \partial \eta / \partial x + v_y \partial \eta / \partial y$. Each equation was discretized to form a matrix operator; matrix $\mathcal {D}$ for diffusion and matrix $\mathcal {M}$ for motion and the matrix of the dynamics was the multiplication of these two operators, $A = \mathcal {D}~ \mathcal {M}$.

We used the following finite difference equations to compute elements of each matrix:

$$\begin{aligned}&\mathcal{D}^{n+1}_{i,j}=\left(1-\frac{4\Delta t \phi}{h^2}\right) \mathcal{D}^{n}_{i,j} + \frac{\Delta t \phi}{h^2} \left(\mathcal{D}^{n}_{i,j-1} + \mathcal{D}^{n}_{i-1,j} + \mathcal{D}^{n}_{i+1,j} + \mathcal{D}^{n}_{i,j+1} \right),\\ &\mathcal{M}^{n+1}_{i,j}=\left(1+\frac{\Delta t (v_x + v_y) }{2 h}\right) \mathcal{M}^{n}_{i,j} + \frac{\Delta t v_x}{2 h} \left(\mathcal{M}^{n}_{i+1,j} - \mathcal{M}^{n}_{i-1,j}\right) + \frac{\Delta t v_y}{2 h} \left(\mathcal{M}^{n}_{i,j+1} - \mathcal{M}^{n}_{i,j-1}\right). \end{aligned}$$

Here, $n$ represents the progress in time and $i$ and $j$ are indices of the location along horizontal and vertical axes. In the simulations we chose: $h=0.2$, $\Delta t=0.1$, $v_x = -v_y = 0.005$ and $\phi =10^{-4}$.
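A sketch of how the dynamics matrix could be assembled from these stencils on an $m \times m$ pixel grid (the zero treatment of out-of-grid neighbours at the boundary is an assumption, not specified above; names are illustrative):

```python
import numpy as np

def dynamics_matrix(m, h=0.2, dt=0.1, vx=0.005, vy=-0.005, phi=1e-4):
    """Build A = D M from the diffusion and motion stencils of Appendix C."""
    v = m * m
    D = np.zeros((v, v))                      # diffusion operator
    M = np.zeros((v, v))                      # motion (advection) operator
    idx = lambda i, j: i * m + j
    for i in range(m):
        for j in range(m):
            k = idx(i, j)
            D[k, k] = 1.0 - 4.0 * dt * phi / h**2
            M[k, k] = 1.0 + dt * (vx + vy) / (2.0 * h)
            # neighbour couplings: (di, dj, diffusion weight, motion weight)
            for di, dj, wd, wm in [(-1, 0, dt * phi / h**2, -dt * vx / (2 * h)),
                                   ( 1, 0, dt * phi / h**2,  dt * vx / (2 * h)),
                                   ( 0,-1, dt * phi / h**2, -dt * vy / (2 * h)),
                                   ( 0, 1, dt * phi / h**2,  dt * vy / (2 * h))]:
                ii, jj = i + di, j + dj
                if 0 <= ii < m and 0 <= jj < m:
                    D[k, idx(ii, jj)] = wd
                    M[k, idx(ii, jj)] = wm
    return D @ M
```

With the parameters above, `dynamics_matrix(15)` would give a $225 \times 225$ matrix matching the pixel grid of Fig. 3.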

Appendix D.

From Eq. (31) we have:

$$\psi ={-}2~Tr \left[K(n)C(n)P^{n|n-1}\right] + Tr\left[ [K(n)C(n)]~P^{n|n-1}~[K(n)C(n)]^T \right]$$

The Frobenius product of two matrices $A$ and $B$ is defined as $A:B=Tr(A^TB)$, for which we have $A:BC=B^TA:C=AC^T:B$. The equation for $\psi$ can be rewritten as:

$$\psi ={-}2 K^T(n) P^{n|n-1}:C(n) + K(n)C(n) : K(n)C(n) P^{n|n-1}.$$

Now, we can take the gradient of both sides w.r.t. the illumination pattern $I(n)$:

$$\begin{aligned}\nabla_{I(n)} \psi &={-}2 K^T(n) P^{n|n-1}: \nabla_{I(n)} C(n) +\\ &\nabla_{I(n)} [K(n)C(n)]:K(n)C(n) P^{n|n-1} + K(n)C(n): \nabla_{I(n)} [K(n)C(n) P^{n|n-1} ] =\\ &-2 K^T(n) P^{n|n-1}: \nabla_{I(n)} C(n) + 2 K(n)C(n) P^{n|n-1} : K(n) \nabla_{I(n)} C(n). \end{aligned}$$

If we use the equation $C(n)=G_2 diag[G_1 I(n)]$:

$$\begin{aligned}\nabla_{I(n)} \psi &={-}2 G_2^T K^T(n) P^{n|n-1}: \nabla_{I(n)} diag[G_1 I(n)] +\\ & \qquad + 2 G_2^T K^T(n) K(n)C(n) P^{n|n-1} : \nabla_{I(n)} diag[G_1 I(n)]. \end{aligned}$$

We can show that $A:diag(B) = diag^*(A):B$. Therefore,

$$\begin{aligned}\nabla_{I(n)} \psi &= diag^*\left[{-}2 G_2^T K^T(n) P^{n|n-1}\right]: \nabla_{I(n)} [G_1 I(n)] +\\ & \qquad + diag^*\left[ 2 G_2^T K^T(n) K(n)C(n) P^{n|n-1} \right] : \nabla_{I(n)} [G_1 I(n)], \end{aligned}$$

which leads to:

$$\nabla_{I(n)} \psi ={-}2 G_1^T diag^*\left[G_2^T K^T(n) P^{n|n-1}\right] + 2 G_1^T diag^*\left[G_2^T K^T(n) K(n)C(n) P^{n|n-1} \right],$$
which proves the claim.
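The result can be checked numerically by comparing Eq. (52) against a central finite-difference gradient of $\psi$ (a sketch with synthetic matrices of illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(3)
d, v, s = 5, 12, 7                              # detectors, pixels, sources (illustrative)
G1, G2 = rng.normal(size=(v, s)), rng.normal(size=(d, v))
K = rng.normal(size=(v, d))                     # a fixed gain
Q = rng.normal(size=(v, v)); P = Q @ Q.T        # a synthetic (symmetric) prior covariance
diag_star = lambda A: np.diag(A).copy()

def psi(I):                                     # cost of Eq. (31)
    C = G2 @ np.diag(G1 @ I)
    return -2.0 * np.trace(K @ C @ P) + np.trace(K @ C @ P @ C.T @ K.T)

def grad_psi(I):                                # analytic gradient, Eq. (52)
    C = G2 @ np.diag(G1 @ I)
    return (-2.0 * G1.T @ diag_star(G2.T @ K.T @ P)
            + 2.0 * G1.T @ diag_star(G2.T @ K.T @ K @ C @ P))

I0, eps = rng.uniform(size=s), 1e-6
numeric = np.array([(psi(I0 + eps * np.eye(s)[i]) - psi(I0 - eps * np.eye(s)[i])) / (2 * eps)
                    for i in range(s)])
print(np.allclose(numeric, grad_psi(I0), rtol=1e-4, atol=1e-6))   # True
```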

Funding

National Science Foundation (2154267).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. G. Zhang, F. Liu, B. Zhang, et al., “Imaging of pharmacokinetic rates of indocyanine green in mouse liver with a hybrid fluorescence molecular tomography/x-ray computed tomography system,” J. Biomed. Opt. 18(4), 040505 (2013). [CrossRef]  

2. Y. Zhang, L. Zhang, G. Yin, et al., “In vivo pharmacokinetics assessment of indocyanine green-loaded nanoparticles in tumor tissue with a dynamic diffuse fluorescence tomography system,” Mol. Imaging Biol. 21(6), 1044–1053 (2019). [CrossRef]  

3. D. Kawashima, T. Yuki, S. Li, et al., “Non-invasive imaging of ion concentration distribution around cell spheroids by electrical impedance tomographic sensor printed on circuit board under temporal compensation by ion transport impedance model,” Biosens. Bioelectron. 212, 114432 (2022). [CrossRef]  

4. G. R. Myers, A. M. Kingston, T. K. Varslot, et al., “Dynamic tomography with a priori information,” Appl. Opt. 50(20), 3685–3690 (2011). [CrossRef]  

5. S. P. Boyd and L. Vandenberghe, Convex optimization (Cambridge university press, 2004).

6. J. Dutta, S. Ahn, A. A. Joshi, et al., “Illumination pattern optimization for fluorescence tomography: theory and simulation studies,” Phys. Med. Biol. 55(10), 2961–2982 (2010). [CrossRef]  

7. R. E. Kalman, “A new approach to linear filtering and prediction problems,” Transactions ASME–Journal Basic Eng. 82(1), 35–45 (1960). [CrossRef]  

8. S. Kay, “Fundamentals of statistical signal processing, volume I and II: estimation theory and detection theory,” (Beijing: Publishing House of Electronics Industry, 2006).

9. F. Auger, M. Hilairet, J. M. Guerrero, et al., “Industrial applications of the kalman filter: A review,” IEEE Trans. Ind. Electron. 60(12), 5458–5471 (2013). [CrossRef]  

10. S. J. Julier and J. K. Uhlmann, “Unscented filtering and nonlinear estimation,” Proc. IEEE 92(3), 401–422 (2004). [CrossRef]  

11. M. Javidan, H. Esfandi, R. Anderson, et al., “Optimal data acquisition in tomography,” J. Opt. Soc. Am. A 40(12), 2259–2276 (2023). [CrossRef]  

12. M. Javidan, H. Esfandi, and R. Pashaie, “Optimization of data acquisition operation in optical tomography based on estimation theory,” Biomed. Opt. Express 12(9), 5670–5690 (2021). [CrossRef]  

13. J. P. Hespanha, Linear systems theory (Princeton university press, 2018).

14. D. A. Boas, C. Pitris, and N. Ramanujam, Handbook of biomedical optics (CRC press, 2016).

15. R. T. Rockafellar, “Monotone operators and the proximal point algorithm,” SIAM J. Control Optim. 14(5), 877–898 (1976). [CrossRef]  

16. Z. Ghahramani and G. E. Hinton, “Parameter estimation for linear dynamical systems,” Technical Report CRG-TR-96-2, University of Toronto, Dept. of Computer Science (1996).

17. P. Van Overschee and B. De Moor, Subspace identification for linear systems: Theory—Implementation—Applications (Springer Science & Business Media, 2012).

18. P. J. Schmid, “Dynamic mode decomposition of numerical and experimental data,” J. Fluid Mech. 656, 5–28 (2010). [CrossRef]  

19. J. H. Tu, “Dynamic mode decomposition: Theory and applications,” Ph.D. thesis, Princeton University (2013).

20. D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006). [CrossRef]  

21. M. Lustig, D. L. Donoho, J. M. Santos, et al., “Compressed sensing MRI,” IEEE Signal Process. Mag. 25(2), 72–82 (2008). [CrossRef]

22. R. Shadmehr and S. Mussa-Ivaldi, Biological learning and control: how the brain builds representations, predicts events, and makes decisions (MIT Press, 2012).
