Reconstructing a 3D heart surface with stereo-endoscope by learning eigen-shapes

Bo Yang; Chao Liu; Wenfeng Zheng; Shan Liu; Keli Huang

doi:10.1364/BOE.9.006222

1. Introduction

In robotic-assisted off-pump heart surgery, heart beating considerably influences the accuracy of surgical operations, resulting in longer surgical duration and increased surgical risks. By measuring heart motion and actively synchronizing surgical instruments with this motion, a technique called active motion compensation, it is possible to provide a virtually stable operating environment to surgeons [1, 2]. Passive vision using stereo-endoscope, compared with other sensing techniques such as structured lighting [3, 4] and ultrasound [5], is more appropriate for measuring heart motion in a Minimally Invasive Surgery (MIS) because no additional instrument ports are required, as indicated in [6]. The 3D reconstruction with endoscope is also important for other advanced surgical techniques, e.g. augmented reality guidance [7], and multispectral [8] or multimodal [9] imaging. In addition, it is useful for offline applications as well, such as surgery simulation [10] and visual medical record [11], for learning, training or evaluation purposes.

However, in a highly dynamic MIS, it is very challenging to track and reconstruct Regions of Interest (ROIs) with complex soft-tissue deformations from real-time endoscopic videos. Model-based methods have been explored [12–18]. With a parameterized deformable model, an initial ROI is warped as a template to match pixels at subsequent stereo-pair frames. Model complexity (rank) can be simply measured by the number of model parameters. Low-rank models, such as rigid and affine models, are robust and computationally efficient but difficult to deal with complex soft-tissue deformations, while high-rank models generally suffer from problems of parameter convergence and heavy computational burden, difficult to meet real-time and robustness requirements, as indicated in [16].

On the other hand, most existing models, typically based on the mesh [2, 11, 13, 19] or spline [12, 14, 16–18], are designed for general deformations with certain continuity and smoothness constraints, which do not take into account the statistics and specificities of ROIs. Statistical shape models have been explored to reconstruct static anatomical structures, such as bone surfaces from X-ray images [20] and employed as a priori information in a wide range of segmentation tasks from CT [21], MRI [22], 3D ultrasound [23], and optical coherence tomography [24, 25]. The training shapes in the literature are usually from CT databases or obtained manually by experts from pre-operative imaging. To the author’s knowledge, statistical modeling in full 3D space for highly dynamic soft-tissues from stereo-endoscope is still open. It is almost impossible to establish a valid pre-operative shape database for beating heart. The shape similarity of anatomical structures from a given population, which has been used for statistical modeling in the literature, is unavailable for modeling beating heart due to high dynamics of cardiac soft-tissues.

In this work, we first derive a general 3D shape representation of ROI by decoupling a Thin Plate Spline (TPS) mapping into shape and position components. The high-rank version of the decoupled TPS is employed to pre-reconstruct a ROI for a number of frames to obtain ROI-specific shape data. Eigen-shapes are then learnt from these data, on which a low-rank 3D model is built and used to reconstruct subsequent frames. The linear transferability of Principal Component Analysis (PCA) is proved, which significantly reduces the computational burden of eigen-shape learning and thus makes real-time statistical modeling possible. A general model-based reconstruction framework with an iterative algorithm for estimating model parameter is presented. The feasibility and effectiveness of the proposed method is finally validated on stereo-endoscopic videos recorded by da Vinci (Intuitive Surgical, CA) robots.

2. Methods

2.1. Representing 3D shape with TPS

The TPS has been widely used in image alignment and shape matching, and recently used to handle soft-tissue deformations [14, 17]. We extend and decouple the classical TPS mapping into two independent components related to shape and position, respectively.

Let m ∈ ℝ² denote pixel coordinates. The TPS of K Control Points (CPs) is written as

f (m) = φ (m) α + [m^{T} 1] β,

where φ (m) is a row vector of elements

{φ (‖ m - c_{k} ‖)}_{k = 1}^{K}

with φ (r) = r² ln r and c_k being the k-th CP, and {α, β} are parameter vectors associated with non-affine and affine transformations, respectively. All vectors are column vectors, unless otherwise specified. To obtain square-integrable second derivatives, the following constraint is imposed.

[\begin{matrix} {\tilde{c}}_{1} & {\tilde{c}}_{2} & \dots & {\tilde{c}}_{K} \end{matrix}] α = 0

where

\tilde{x}

denotes the augmented vectors of x by adding 1 as the last element. By substituting it into Eq. (1), we can re-parameterize the non-affine part of the TPS as φ′(m)α′, where α′ is a K − 3 dimensional subvector of α (three elements of α are eliminated).

Let p ∈ ℝ³ denote 3D coordinates in the real space. By separately mapping in the x, y, z directions, the TPS can be extended to ℝ² → ℝ³ that maps m to p with the parameter vectors ${α_{X}^{'}, β_{X}, α_{Y}^{'}, β_{Y}, α_{Z}^{'}, β_{Z}}$ , which contain 3K model parameters (scalars) in total.

Let $ℛ = {m_{n}}_{n = 1}^{N}$ be a ROI. Assume that m_o is the center of ℛ, and its 3D counterpart p_o can denote the position of ℛ. The relative 3D coordinates of N pixels with respect to p_o can represent the 3D shape of R that is position-independent. Let p^′ = p − p_o,

p^{'} = [\begin{matrix} b (m) & 0 & 0 \\ 0 & b (m) & 0 \\ 0 & 0 & b (m) \end{matrix}] [\begin{matrix} θ_{X} \\ θ_{y} \\ θ_{Z} \end{matrix}] = B (m) θ,

where b(m) = [φ′(m)− φ′(m_o), (m − m_o)^T], and θ_x/y/z is a parameter vector stacked by

α_{x / y / z}^{'}

and the first two elements of β_x/y/z. Accordingly, the 3D shape of the ROI is represented as

s = B θ,

where s is a N′-dimensional shape vector stacked by

{p_{n}^{'}}_{n = 1}^{N}

with N= 3N, and B is a N′× K′ transformation matrix stacked by

{B (m_{n})}_{n = 1}^{N}

with K= 3(K − 1).

The shape and position parameters {θ, p_o} together determine a 3D surface in the real space, and the decoupled TPS model can be rewritten as p = B(m)θ + p_o.

2.2. Low-rank shape modeling based on eigen-shapes

With enough CPs, the TPS model can accurately describe any 3D surface. Such general models, however, are inefficient for the ROI reconstruction of beating heart. Since heart motion is normally quasi-periodic with stable statistics [26], it is possible to learn a low-rank model for a specific ROI from its past shapes. For this, the PCA is applied to the shape data.

Assume that ${s_{l}}_{l = 1}^{L}$ are past shapes. A shape data matrix of row-wise zero mean is built:

S = [\begin{matrix} s_{1} - \bar{s} & s_{2} - \bar{s} & \dots & s_{L} - \bar{s} \end{matrix}], \bar{s} = \sum_{l = 1}^{L} s_{l} / L

The PCA of S can be computed by using the Eigen Decomposition (ED):

S S^{T} = U Λ U^{T},

where Λ is a diagonal matrix of the eigenvalues

{λ_{j}}_{j = 1}^{N^{'}}

sorted in descending order, and the columns of U are the corresponding eigenvectors. Each eigenvector identifies a specific shape in the 3D space, so we call it eigen-shape.

According to the Eckart-Young theorem, a data matrix can be optimally rebuilt on the first few eigenvectors. Assume that $\hat{S}$ is the rebuilt matrix on the first J eigenvectors. The rebuilding error defined with the Frobenius norm is

{‖ S - \hat{S} ‖}_{F}^{2} = \sum_{j = J + 1}^{N^{'}} λ_{j}

Since the eigenvectors corresponding to the largest eigenvalues are reserved, information loss caused by $\hat{S}$ is minimal of all possible J-rank rebuilding. In fact, the rebuilt data may be closer to ground truth, as the original data are usually contaminated by measurement noise, which is more likely to rest on the discarded eigenvectors.

For quasi-periodic heartbeat, the shape vectors of a ROI at different moments are correlated and sparse-distributed in the high-dimensional shape vector space. Accordingly, the past shapes of the ROI can be recovered accurately on the principal eigen-shapes. With the assumption of stable statistics, the shapes of the ROI in the future can also be reconstructed accurately with

s = U_{J} w + \bar{s},

where U_J is the submatrix of U formed by the first J eigenvectors, and w is a new parameter vector of J dimension. Compared with the general shape model in Eq. (4), the new model in Eq. (8) is more efficient as statistics of the ROI are introduced.

However, the PCA for a large matrix is computationally expensive in both time and memory. The complexity for analyzing S is O (N^′L²⁾ by assuming L ≤ N^′. It is impractical for most real-time systems, as N^′ and L are very large for a typical reconstruction task. Here, we develop a fast eigen-shape algorithm, which reduces the complexity to O (LK^′2) where K^′ ≪ L.

Since rank (B) = K^′, for the shape matrix S there are at most K^′ valid eigen-shapes that correspond to non-zero eigenvalues. Assume that Θ is the shape-parameter matrix of S, i.e. S = BΘ. Let $U_{Θ} Λ_{Θ} U_{Θ}^{T}$ be the ED of ΘΘ^T, then it can be easily proved that:

Proposition 1

Λ_Θ and BU_Θ are the non-zero eigenvalues and eigenvectors of SS^T, respectively, if and only if B is column-orthonormal.

The proposition shows that with an orthonormal transformation matrix (called transferring condition), the PCA can be transferred from a large data matrix to a small parameter matrix. Unfortunately, the shape transformation matrix B of the TPS is not orthonormal, which distorts the mapping from the eigen-parameters (U_Θ) to eigen-shapes (U).

To overcome this problem, we orthonormalize B with the QR decomposition B = B^′R, and re-parameterize Eq. (4) as s = B′θ′. Correspondingly, the parameter matrix Θ′ is built. By using Proposition 1, the eigen-shapes can be computed fast with B′U_Θ′, and the eigenvalues are obtained from Λ_Θ′ directly. The computational complexity is hence reduced to O(LK^′2).

Though proposed for shape modeling, Proposition 1 is useful to other PCA tasks associated with linear systems. We can extend it from the point view of function composition:

Proposition 2

Given two real vector spaces that are isomorphic, there is a linear function $ℱ : X \mapsto B_{V} B_{V}^{T} X$ , mapping one vector set (matrix) X in $U$ to another in $V$ , where $B_{U / V}$ is a transformation matrix formed by an orthonormal basis of $U / V$ . Let $P : X \mapsto U_{X}$ and $S : X \mapsto Λ_{X}$ be the functions of calculating eigenvectors and eigenvalues from a vector set, respectively. Then there are $P \circ ℱ = ℱ \circ P$ and $S \circ ℱ = S$ for any vector set in $U$ .

The proof is given in Appendix A. Proposition 2 shows that between any two isomorphic real vector spaces, we can always find an invertible linear transformation (bijection) that is commutative with the eigenvector operation and transparent to the eigenvalue operation, such that we can map the eigenvectors from one space to another and keep the eigenvalues unchanged.

2.3. Model-based tracking and reconstruction

The past shapes can be collected with high computational cost and even manual intervention by employing a high-rank TPS for pre-reconstruction. The shape transformation matrix of the TPS can be orthonormalized offline because it is constant for any ROI of given size.

The complete steps of the proposed reconstruction method are summarized as follows:

Given a ROI (ℛ) in the template image, the shape transformation matrix of a K-CP TPS is constructed and orthonormalized with B′ = BR⁻¹.
The first L frames are pre-reconstructed with the orthonormalized TPS model, written as $\forall m \in ℛ, p = B^{'} (m) θ^{'} + p_{o},$ where B^′(m) is the submatrix of B′ associated with m. As a result, ${θ_{l}^{'}}_{l = 1}^{L}$ are saved.
The parameter matrix Θ′ is built and decomposed by $Θ^{'} {Θ^{'}}^{T} = U_{Θ^{'}} Λ_{Θ^{'}} U_{Θ^{'}}^{T}$ .
A low-rank deformable model is built on the first J columns of U_Θ′ (denoted as [U_Θ′]_J), $\forall m \in ℛ, p = U_{J} (m) w + \bar{s} (m) + p_{o},$ where U_J (m) = B^′(m)[U_Θ′]_J and $\bar{s} (m) = B^{'} (m) {\bar{θ}}^{'}$ .
The subsequent frames from L + 1 are reconstructed with the new model in Eq. (10).

With a calibrated stereo-endoscope, the ROI reconstruction involved in the steps 2) and 5) can be converted into an optimization problem, i.e., finding the best model parameters that minimize the alignment error of ROI between the template image and the current stereo-pair images.

We discuss the problem, without loss of generality, based on a general ℝ² → ℝ³ model:

p = T (m) ξ + d (m),

where ξ is the parameter vector, T (m) and d (m) are the transformation matrix and offset vector associated with m, respectively. Many existing models can be written in this form. For linear models, both T (m) and d (m) are linear with respect to m, such as the affine model [2], while for nonlinear models at least one of them is nonlinear, such as the TPS, B-Spline [12], Piecewise Bilinear Mapping (PBM) [13], as well as the new statistical model in Eq. (10). The TPS in Eq. (9), for instance, is obtained by letting T (m) = [B′(m), I₍₃₎], d(m) = 0, and ξ be a combined vector of θ′ and p_o, where I₍₃₎ denotes a 3-dimensional identity matrix.

As illustrated in Fig. 1, given a pixel m in the template image, its 3D coordinates p in the real space are determined by the model parameter ξ to be estimated by using Eq. (11). According to the pinhole camera model, the perspective projection of p onto the left/right camera plane of the stereo-endoscope can be represented by the homogeneous coordinates

p^{L / R} = C_{L / R} \tilde{p},

where C_L/R is the 3×4 left/right camera matrix, which can be calibrated with Zhang’s method [27] before surgery. The corresponding pixel coordinates can be calculated by

m^{L / R} = ℋ (p^{L / R}),

where ℋ maps homogeneous coordinates to pixel coordinates,

ℋ ([\begin{matrix} a \\ b \\ c \end{matrix}]) = [\begin{matrix} a / c \\ b / c \end{matrix}]

Fig. 1 Illustration of model-based ROI reconstruction

Download Full Size | PDF

Let $ℐ_{T}$ denote the intensity function of the template image and $ℐ_{L / R}$ be that of the current left/right image. For the given ROI, there should be

ℐ_{T} (m) \equiv ℐ_{L / R} (m^{L / R} |_{ξ^{*}}), \forall m \in ℛ

with an ideal parameter ξ^∗. However, in most cases, the ideal parameter does not exist, involving the solution of an overdetermined nonlinear system of equations. With the sum of squared pixel differences. The reconstruction problem can be formulated as

\arg \min_{ξ} ({‖ I_{L} - I_{T} ‖}^{2} + {‖ I_{R} - I_{T} ‖}^{2}),

where I_L/R and I_T are N-dimensional intensity vectors stacked by

{ℐ_{L / R} (m_{n}^{L / R})}_{n = 1}^{N}

and

{ℐ_{T} (m_{n})}_{n = 1}^{N}

, respectively. Regularization items can be added in Eq. (16) for robust solution, as in [28].

The Efficient Second-order Minimization (ESM) technique [29] is used to solve Eq. (16), which has a wider convergence basin and a faster convergence rate than traditional minimization algorithms. In the ESM, the model parameter is updated iteratively at each frame, ξ ← ξ + Δξ, until ║Δξ║ is less than a preset threshold. The increment is calculated by

Δ ξ = - 2 {[\begin{matrix} \nabla_{ξ} I_{L} + \nabla_{ξ^{*}} I_{L} \\ \nabla_{ξ} I_{R} + \nabla_{ξ^{*}} I_{R} \end{matrix}]}^{+} [\begin{matrix} I_{L} - I_{T} \\ I_{R} - I_{T} \end{matrix}],

where ∇_xY denotes the Jacobian matrix of Y at x, and “+” denotes the matrix pseudo-inverse. The computations of the Jacobians in Eq. (17) are given in Appendix B.

It should be noted that the proposed modeling method as well as the reconstruction algorithm, though derived from the TPS, apply equally to other deformable models. The shape representation in Eq. (4) and thus the statistical shape model in Eq. (8) can be derived from any ℝ² → ℝ³ deformable model that can be written in the form of Eq. (11), including the B-spline and the PBM mentioned before. In fact, the ℝ² → ℝ B-spline based statistical shape model used for tissue segmentation in [23] is sub-optimal, because it is learnt simply from the parameter spaces without orthonormalizing the transformation matrix, which distorts the mapping from eigen-parameters to eigen-shapes, as proved in Proposition 1.

3. Results

Taking shape statistics into account, the low-rank model can be efficient in computation and robust against divergence of model parameter, when incorporated into the model-based reconstruction framework. Four stereo-endoscopic videos captured by the da Vinci robots are used to validate the proposed method. Each video consists of 800 stereo-pair frames with a frame rate of 25 Hz. As shown in Fig. 2, the videos I and II record the motion of a phantom heart model [30, 31],and the videos III and IV are in vivo beating hearts in real TECAB procedures [32]. Color images are converted into 8-bit grayscale for efficient processing.

Fig. 2 Stereo-endoscopic videos (I-IV) for validation

Download Full Size | PDF

3.1. Sparsity and statistical stability of shape data

The rank of eigen-shape based model (J) is adjustable according to the efficiency and robustness requirements of reconstruction tasks in practice, as well as the sparsity of shape data itself, which can be indicated by the eigenvalues. A guideline for choosing J is given in our experiments.

To obtain shape data, an orthonormalized 9-CP TPS is employed to reconstruct a 120 × 120 ROI on each video for 600 frames. At each frame, the ESM is used to estimate the shape and position parameters iteratively. Specular reflections caused by glossy tissue surfaces are removed by using a simple thresholding method [14]. The reconstruction flow is restarted manually when it is interrupted. We finally obtain four 24 × 600 shape-parameter matrices, which identify four hyper-high dimensional shape matrices (43200 × 600).

The eigenvalues and eigen-shapes are computed from shape-parameter matrices by using Proposition 1. As shown in Fig. 3(a), the eigenvalues on each video are sparse, indicating that there is obvious correlation between the shape vectors. The 24-dimensional shape subspace spanned by the 9-CP TPS is redundant for the ROI. Therefore, it is feasible to rebuild the shapes in a lower-dimensional space spanned by few eigen-shapes.

According to Eq. (7), the rebuilding error can be indicated more intuitively with the Signal-to-Noise Ratio (SNR) γ and the Root-Mean-Square Error (RMSE) σ_p, where

γ = 10 \lg (\frac{\sum_{j = 1}^{K^{'}} λ_{j}}{\sum_{j = J + 1}^{K^{'}} λ_{j}}), σ_{p} = \sqrt{\frac{\sum_{j = J + 1}^{K^{'}} λ_{j}}{N L}}

Figure 3(b) gives the SNR and RMSE curves with respect to J. In our study, we choose the least J that ensures γ > 20 dB. Following this guideline, the first 8/6/12/10 eigen-shapes are reserved for statistical modeling on the video I/II/III/IV, corresponding to the RMSE of 0.048/0.072/0.070/0.094 mm/pixel, much lower than the measurement error (about 0.1 mm) of the stereo-endoscope image system itself and thus ignorable.

Fig. 3 PCA of shape data on videos I-IV

Download Full Size | PDF

To show the statistical stability of shape data, we compared the PCA results computed from the first 600 frame and from all 800 frames. Figure 4 shows the corresponding mean shapes and first eigen-shapes on each video. The differences between the corresponding shapes are trivial (of the order of 0.01 mm/pixel), showing that the statistics of heartbeats are stable and thus reconstructing future frames with the statistical model trained from past frames is feasible.

3.2. Reconstruction experiments

In the reconstruction experiments, the first L = 600 frames in each video, serving as training frames, are pre-reconstructed with the 9-CP TPS, while the subsequent 200 frames, serving as test frames, are reconstructed with the eigen-shape based Statistical Deformable Models (SDMs).

Fig. 4 Mean shapes and 1st eigen-shapes calculated from all 800 frames vs. from the first 600 frames. For each video, the first row shows the mean shapes, the second row the 1st eigen-shapes; the left shapes are from all 800 frames, the right shapes from the first 600 frame.

Download Full Size | PDF

3.2.1. Qualitative analyses

Since the ground truth is unknown, reconstructed ROI is usually evaluated qualitatively by visually inspecting its concordance with heart beating and by analyzing its motion trajectories.

Figure 5 shows some reconstructed frames in each video. The SDMs can successfully handle soft-tissue deformations on beating hearts. The 3D shapes reconstructed at various frames are consistent with those perceived by human vision, and the center of the ROI is located accurately.

For comparison, the corresponding frames reconstructed by a 4-CP TPS on the video I are given in Fig. 6. Even though the 4-CP TPS has 12 parameters more than those (8 eigen-shape weights + 3 position parameters) of the SDM learnt for the video I, the 3D surfaces represented by the 4-CP TPS appear to be over-smooth with many 3D details missed out, comparing with those in Fig. 5(a), and therefore the advantages of the SDM are demonstrated.

The motion trajectories of the ROIs on the phantom videos (I and II) and their Power Spectral Densities (PSD) are shown in Fig. 7. The trajectories of p_o along x, y, z axes exhibit highly regular periodicity, and their power concentrates at 1.5 Hz and its second (3.0 Hz) and third (4.5 Hz) harmonics, which is consistent with the given rate of phantom heart beating.

3.2.2. Quantitative evaluation

The SDMs are evaluated quantitatively and compared with well-established deformable models based on the ESM solution in terms of robustness, accuracy and speed. The reference models under comparison include the 9-CP TPS, 9-CP PBM and 9-CP B-Spline (all of them contain 27 parameters), and the Quasi-Spherical Triangle (QST) [16] of 10 parameters. In addition, the 4-CP TPS is tested on the phantom videos and the 5-CP TPS on the in vivo videos, since their ranks are close to those of the SDMs built for the corresponding videos.

Fig. 5 Reconstructed ROI with the statistical deformable models on various videos. (Visualization 1, Visualization 2, Visualization 3, Visualization 4)

Download Full Size | PDF

Fig. 6 Reconstructed ROI with the 4-CP TPS on video I.

Download Full Size | PDF

Fig. 7 Motion trajectory analysis of ROI (p_o) tracked by the statistical deformable model.

Download Full Size | PDF

Robustness

The ROI reconstruction with stereo-endoscope is very challenging. The processing chain for continuous image sequence is often interrupted by dynamic effects in the MIS. As recursive structure is usually adopted for efficiency, in which the final estimate at the current frame serves as an initial estimate for the next frame, reconstruction algorithms normally cannot recover by themselves when interrupted and must be restarted. The robustness of reconstruction algorithms can be measured by the number of frames processed continuously.

All 200 test frames on the phantom videos can be continuously processed by all investigated models, due to ideal imaging conditions and less dynamic disturbance. On the in vivo videos, the SDMs and the QST can take all 200 test frames, performing much better than the 9-CP TPS, 9-CP PBM and 9-CP B-Spline models which take averagely 58, 47 and 42 frames, respectively. It verifies that the high-rank models with too many parameters are susceptible to dynamic effects in the MIS, making the ESM prone to divergence, while the low-rank models with fewer parameters are easier to find optimal solutions in the noisy environments. The SDMs also perform better than the 5-CP TPS (taking averagely 152 frames) on the in vivo videos, though the ranks of these models are identical or similar. The result shows that for a specific ROI, the solution-space designed online by statistical modelling is more efficient than the fixed solution-space spanned by a general deformable model of identical or similar rank.

Accuracy

For evaluating accuracy, natural landmarks in the ROIs are manually tracked in both left and right image sequences. The 3D coordinates of the landmarks obtained by the model-based reconstruction are projected back onto the left and right images, and compared with the manually tracked results. To ensure the comparability of the results, interrupted algorithms are restarted manually. The joint pixel error is calculated at each frame

ϵ = \sqrt{{‖ m^{L} - {m^{'}}^{L} ‖}^{2} + {‖ m^{R} - {m^{'}}^{R} ‖}^{2}},

where m^L/R denotes the pixel coordinates projected onto the left/right image and m′^L/R denotes those manually tracked. The Mean Error (ME) and Standard Deviation (SD) of ϵ over 200 test frames on each video are shown in Table 1.

On the phantom videos, good results are obtained by all models, as in the robustness test. The high-rank (9-CP) models have a slight advantage over the low-rank models (4-CP TPS, SDMs and QST). However, the SDM of 8/6 eigen-shapes on the video I/II performs better than the 4-CP TPS and QST, though have similar ranks. On the in vivo videos, dynamic effects in the MIS complicate the reconstruction task. The best results are obtained by the SDMs, showing that the proposed method is efficient for handling complicated deformations on in vivo hearts. Although the SDMs are trained from the shape data obtained by the 9-CP TPS, they are still able to achieve more accurate results than the latter in in vivo settings, thanks to the intrinsic advantage that the eigen-shapes extracted by the SDMs are insensitive to small noises in the training data.

Speed

The computational cost of reconstruction algorithms depends on the rank of the employed model. The most time-consuming operation in the ESM iteration is the pseudo-inverse calculation of the Jacobian matrix in Eq. (17). For a K-rank model on a N-size ROI, the computational complexity for one frame is O(µNK²), where µ is the number of the ESM iterations. Typically, on the in vivo videos, the average µ of the 12 eigen-shape SDM is 6.8, close to that of the QST and about half those of the 9-CP models, confirming that models with fewer parameters are easier to converge. For the same ROI, the computational cost of a typical 12 eigen-shape SDM is nearly 15% that of a 9-CP model.

To evaluate practical speed, the average CPU time is computed based on our experimental platform (Matlab R2012a, Intel Xeon E3-1231 CPU and 8GB RAM). The speed of the 12 eigen-shape SDM is 2.9 s/frame, much faster than 9.2 s/frame of the 9-CP TPS. By means of statistical modeling, real-time reconstruction is feasible if Graphics Processor Units (GPU) and more efficient programming languages (e.g. C/C++) are adopted.

Finally, the SDMs built with different SNR guidelines (γ > 30 dB and γ > 10 dB) are tested. For γ > 30 dB, the first 12/10/19/16 eigen-shapes are reserved on the video I/II/III/IV. There is not significant improvement in the reconstruction accuracy though more eigen-shapes are employed, while the robustness and speed are decreased obviously. Their performances are between those of the SDM (γ > 20 dB) and the 9-CP TPS. For γ > 10 dB, only the first 4/2/5/5 eigen-shapes are reserved on the video I/II/III/IV, which results in faster and more robust reconstructions. However, the reconstructed surfaces are inaccurate due to too few shape degrees of freedom. The tracking errors are even larger than those of the QST in Table 1.

Overall, high-rank models can render tissue surfaces more accurately in theory; however, in practice they often confuse parameter estimation algorithms due to too many degrees of freedom. For general deformable models based on spline or mesh, increasing CPs will result in a quadratic growth of computational burden and cause model parameters difficult to convergence, while reducing CPs may lead to over-smoothing and loss of 3D details. It is a dilemma for these general models to choose an appropriate rank, since they were designed for handling general surfaces without taking into account the statistics and specificities of the ROI. The proposed method provides a good solution to this dilemma, which can balance the robustness, accuracy and speed well. As shown in the experiments, the SDMs are efficient in computation and robust against dynamic disturbances without sacrificing reconstruction accuracy, and they are fully superior to the general models of the same or similar rank.

Table 1. Tracking errors (ME±SD) compared with manually tracked results.

View Table

4. Conclusions

We propose a statistical modeling method to reconstruct ROIs on beating heart from stereo-endoscopic video. Statistical shape models are trained online by learning eigen-shapes from past shapes of ROIs. To obtain past shapes, the high-rank versions of classical deformable models (such as but not limited to the TPS) after decoupling and orthonormalizing, can be employed to pre-reconstruct a ROI for a number of frames. The linear transferability of PCA is proved, which can significantly reduce the computational burden of eigen-shape learning. Since the statistical model is trained specially for fitting the 3D structure of the ROI, it is very efficient when incorporated into a model-based reconstruction framework, which formulates the dynamic reconstruction as an optimization problem of model parameters for each stereo image frame. Based on a general model equation, the efficient second-order minimization algorithm is derived to iteratively solve model parameters. We validate the modeling method on the stereo-endoscopic videos collected by the da Vinci robots. The analyses of shape data from various videos show the sparsity of eigenvalues and statistical stability of eigen-shapes, and thus the feasibility of low-rank statistical modeling. The ROI reconstruction experiments demonstrate the efficiency of the statistical models and their advantages over the traditional models in balancing robustness, accuracy and speed. The proposed method provides an efficient and robust way to track and reconstruct dynamic soft-tissues with stereo-endoscope, which is important for further improving the functionality and applicability of surgical robots. Besides, the proposed method is also useful for offline tasks, e.g. surgical simulation or visual medical record, which require efficient representation and rendering of 3D tissues.

Appendix A. Proof of proposition 2

Since $U ≅ V, U$ and $V$ have the same dimension, denoted to be M without loss of generality. Since $B_{U / V}$ is a column-orthonormal, for any vector set D ⊂ ℝ^M, using Proposition 1 gives

P (B_{U / V} D) = B_{U / V} P (D)

Suppose $X \subset U$ . Since $U ≅ ℝ^{M}$ , there is a unique coordinates set D ⊂ ℝ^M that identifies X with the orthonormal basis B, i.e. $X = B_{U} D$ and $D = B_{U}^{T} X$ . From Eq. (20), one can get

P (B_{V} B_{U}^{T} X) = P(B_{V} D) = B_{V} P (D)

P (D) = B_{U}^{T} B_{U} P (D) = B_{U}^{T} P (B_{U} D) = B_{U}^{T} P (X),

and thus

P \circ ℱ (X) = P (B_{V} B_{U}^{T} X) = B_{V} B_{U}^{T} P (X) = ℱ \circ P (X)

.

Similarly, using Proposition 1 gives $S (B_{U / V} D) = S (D)$ , and thus $S \circ ℱ (X) = S (B_{V} D) = S (B_{U} D) = S (X)$ .

Appendix B. Computations of Jacobians

B.1. Computation of ∇_ξI_L/R

The n-th row of ∇_ξI_Y (with Y denoting L or R) can be decomposed into four sub-Jacobians

\nabla_{ξ} ℐ_{Y} (m_{n}^{Y}) = \nabla_{m_{n}^{Y}} ℐ_{Y} \nabla_{p_{n}^{Y}} ℋ \nabla_{p} p^{Y} \nabla_{ξ} p |_{m_{n}},

where:

$\nabla_{m_{n}^{Y}} ℐ_{Y}$ is the gradient (1 × 2) of the image $ℐ_{Y}$ at $m_{n}^{Y}$ .
$\nabla_{p_{n}^{Y}} ℋ$ is the derivative of function ℋ at $p_{n}^{Y}$ . From Eq. (14), one derives $\nabla_{x} ℋ = [\begin{matrix} 1 / c & 0 & - a / c^{2} \\ 0 & 1 / c & - b / c^{2} \end{matrix}]$
∇_pp^Y ≡[C_Y]₃ and $\nabla_{ξ} p |_{m_{n}} \equiv T (m_{n})$ are constant for a given ROI.

B.2. Computation of $\nabla_{ξ^{*}} I_{L / R}$

The Jacobian cannot be computed directly because the ideal parameter ξ^∗ is unknown. An effective approximation is derived by using the template image and the current parameter ξ.

Given a parameter ξ, the mapping ℳ_ξ : m ↦ m^Y is normally invertible, i.e., there is $m = ℳ_{ξ}^{- 1} (m^{Y})$ . By using Eq. (15), the n-th row of $\nabla_{ξ^{*}} I_{Y}$ can be written as

\nabla_{ξ^{*}} ℐ_{Y} (m_{n}^{Y}) = \nabla_{ξ^{*}} ℐ_{T} (m_{n}),

and similarly decomposed into

\nabla_{m_{n}} ℐ_{T} \nabla_{m_{n}^{Y}} ℳ_{ξ^{*}}^{- 1} \nabla_{p_{n}^{Y} |_{ξ^{*}}} ℋ \nabla_{p} p^{Y} \nabla_{ξ} p |_{m_{n}}

where:

$\nabla_{m_{n}} ℐ_{T}$ is the gradient of $ℐ_{T}$ at mn.
∇_pp^Y and $\nabla_{ξ} p |_{m_{n}}$ are constant, independent of ξ^∗, the same as in Eq. (23).
$\nabla_{m_{n}^{Y}} ℳ_{ξ^{*}}^{- 1}$ is the inverse of the Jacobian $\nabla_{m_{n}} ℳ_{ξ^{*}} = \nabla_{p_{n}^{Y} |_{ξ *}} ℋ \nabla_{p} p^{Y} \nabla_{m_{n}} p |_{ξ^{*}},$ where $p_{n}^{Y} |_{ξ^{*}}$ , the ideal homogeneous coordinates determined by ξ^∗, can be approximated by $p_{n}^{Y}$ , and $\nabla_{m_{n}} p |_{ξ^{*}}$ , the derivative (3 × 2) of the deformable model at m_n with ξ^∗, for m = [u v]^T, $\nabla_{m_{n}} p |_{ξ^{*}} = [{\frac{\partial T (m)}{\partial u} |}_{u_{n}} ξ^{*} {\frac{\partial T (m)}{\partial v} |}_{u_{n}} ξ^{*}] + \nabla_{m_{n}} d (m),$ can be approximated by letting ξ^* = ξ.

In Eq. (28), ∂T(m)/∂η (with η denoting u or v for convenience) and ∇_md(m) are constant and can be pre-computed for a given ROI. Take the statistical model in Eq. (10) for example:

\frac{\partial T (m)}{\partial η} = [\begin{matrix} \frac{\partial B (m)}{\partial η} R^{- 1} {[U_{Θ^{'}}]}_{J} & 0 \end{matrix}],

\nabla_{m} d (m) = [\begin{matrix} \frac{\partial B (m)}{\partial u} R^{- 1} {\bar{θ}}^{'} & \frac{\partial B (m)}{\partial v} R^{- 1} {\bar{θ}}^{'} \end{matrix}]

where, according to Eq. (3), ∂B(m)/∂η is formed diagonally with

\frac{\partial b (m)}{\partial η} = [\begin{matrix} \frac{φ^{'} (m)}{\partial η} & \underset{for η = u / v}{\underset{︸}{\begin{matrix} 1 / 0 & 0 / 1 \end{matrix}}} \end{matrix}]

Let C₁ and C₂ be two complementary submatrices of the left matrix in Eq. (2), where C₁ is 3 × 3 invertible, corresponding to the three eliminated elements of α, and C₂ corresponds to α′. Then

φ^{'} (m) = φ (m) [\begin{matrix} - C_{1}^{- 1} C_{2} \\ I_{(K - 3)} \end{matrix}]

and the derivatives of the k-th radial basis kernel in φ (m) can be pre-computed by using

[\begin{matrix} \frac{\partial φ (‖ m - c_{k} ‖)}{\partial u} & \frac{\partial φ (‖ m - c_{k} ‖)}{\partial v} \end{matrix}] = (2 \ln ‖ m - c_{k} ‖ + 1) {(m - c_{k})}^{T}

Funding

National Natural Science Foundation of China (61305022); Sichuan Science and Technology Program (2018SZDZX0013, 2017HH0054); State Key Laboratory Virtual Reality Technology and Systems (BUAA-VR-16KF-11).

Acknowledgments

We would like to acknowledge the da Vinci data provided by the Hamlyn Centre, Imperial College London.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

References

1. Y. Nakamura, K. Kishi, and H. Kawakami, “Heartbeat synchronization for robotic cardiac surgery,” in Proceedings of IEEE Conference on Robotics and Automation (IEEE, 2001), pp. 2014–2019.

2. T. Ortmaier, M. Gröger, D. H. Boehm, V. Falk, and G. Hirzinger, “Motion estimation in beating heart surgery,” IEEE T. Biomed. Eng. 52(10), 1729–1740 (2005). [CrossRef]

3. M. Hayashibe, N. Suzuki, and Y. Nakamura, “Laser-scan endoscope system for intraoperative geometry acquisition and surgical robot safety management,” Med. Image Anal. 10(4), 509–519 (2006). [CrossRef] [PubMed]

4. N. T. Clancy, D. Stoyanov, L. Maier-Hein, A. Groch, G. Z. Yang, and D. S. Elson, “Spectrally encoded fiber-based structured lighting probe for intraoperative 3D imaging,” Biomed. Opt. Express 2(11), 3119–3128 (2011). [CrossRef] [PubMed]

5. O. Bebek and M. C. Cavusoglu, “Intelligent control algorithms for robotic-assisted beating heart surgery,” IEEE T. Robot. 23(3), 468–480 (2007). [CrossRef]

6. P. Mountney, D. Stoyanov, and G. Z. Yang, “Three-dimensional tissue deformation recovery and tracking,” IEEE Signal Proc. Mag. 27(4), 14–24 (2010). [CrossRef]

7. S. Bernhardt, S. A. Nicolau, V. Agnus, L. Soler, C. Doignon, and J. Marescaux, “Automatic localization of endoscope in intraoperative CT image: A simple approach to augmented reality guidance in laparoscopic surgery,” Med. Image Anal. 30, 130–143 (2016). [CrossRef] [PubMed]

8. N. T. Clancy, D. Stoyanov, D. R. C. James, A. Di Marco, V. Sauvage, J. Clark, G. Z. Yang, and D. S. Elson, “Multispectral image alignment using a three channel endoscope in vivo during minimally invasive surgery,” Biomed. Opt. Express 3(10), 2567–2578 (2012). [CrossRef] [PubMed]

9. K. L. Lurie, R. Angst, E. J. Seibel, J. C. Liao, and A. K. E. Bowden, “Registration of free-hand OCT daughter endoscopy to 3D organ reconstruction,” Biomed. Opt. Express 7(12), 4995–5009 (2016). [CrossRef] [PubMed]

10. J. Zhang, J. Chang, X. Yang, and J. J. Zhang, “Virtual reality surgery simulation: A survey on patient specific solution,” in Next Generation Computer Animation Techniques, J. Chang, J. J. Zhang, N. M. Thalmann, S. M. Hu, R. Tong, and W. Wang, eds. (Springer, 2017), pp. 220–233. [CrossRef]

11. K. L. Lurie, R. Angst, D. V. Zlatev, J. C. Liao, and A. K. E. Bowden, “3D reconstruction of cystoscopy videos for comprehensive bladder records,” Biomed. Opt. Express 8(4), 2106–2123 (2017). [CrossRef] [PubMed]

12. W. Lau, N. A. Ramey, J. J. Corso, N. V. Thakor, and G. D. Hager, “Stereo-based endoscopic tracking of cardiac surface deformation,” in Medical Image Computing and Computer-Assisted Intervention-MICCAI 2004, C. Barillot, D. R. Haynor, and P. Hellier, eds. (Springer, 2004), pp. 494–501. [CrossRef]

13. D. Stoyanov, A. Darzi, and G. Z. Yang, “A practical approach towards accurate dense 3D depth recovery for robotic laparoscopic surgery,” Comput. Aided. Surg. 10(4), 199–208 (2005). [CrossRef]

14. R. Richa, P. Poignet, and C. Liu, “Three-dimensional motion tracking for beating heart surgery using a thin-plate spline deformable model,” Int. J. Robot. Res . 29(2-3), 218–230 (2010). [CrossRef]

15. E. Bogatyrenko, P. Pompey, and U. D. Hanebeck, “Efficient physics-based tracking of heart surface motion for beating heart surgery robotic systems,” Int. J. Comput. Ass. Rad. 6(3), 387–399 (2011).

16. W. K. Wong, B. Yang, C. Liu, and P. Poignet, “A quasi-spherical triangle-based approach for efficient 3-D soft-tissue motion tracking,” IEEE-ASME T. Mech. 18(5), 1472–1484 (2013). [CrossRef]

17. B. Yang, W. K. Wong, C. Liu, and P. Poignet, “3D soft-tissue tracking using spatial-color joint probability distribution and thin-plate spline model,” Pattern Recogn. 47(9), 2962–2973 (2014). [CrossRef]

18. B. Yang, C. Liu, K. L. Huang, and W. F. Zheng, “A triangular radial cubic spline deformation model for efficient 3D beating heart tracking,” Signal Image Video P. 11(7), 1329–1336 (2017). [CrossRef]

19. N. T. Clancy, S. Arya, J. Qi, D. Stoyanov, G. B. Hanna, and D. S. Elson, “Polarised stereo endoscope and narrowband detection for minimal access surgery,” Biomed. Opt. Express 5(12), 4108–4117 (2014). [CrossRef]

20. N. Baka, B. L. Kaptein, M. de. Bruijne, T. van. Walsum, J. E. Giphart, W. J. Niessen, and B. P. F. Lelieveldt, “2D-3D shape reconstruction of the distal femur from stereo X-ray imaging using statistical shape models,” Med. Image Anal. 15(6), 840–850 (2011). [CrossRef] [PubMed]

21. E. M. A. Anas, A. Rasoulian, A. Seitel, K. Darras, D. Wilson, P. S. John, D. Pichora, P. Mousavi, R. Rohling, and P. Abolmaesumi, “Automatic segmentation of wrist bones in CT using a statistical wrist shape + pose model,” IEEE T. Med. Imag. 35(8), 1789–1801 (2016). [CrossRef]

22. X. Albà, M. Pereañez, C. Hoogendoorn, A. J. Swift, J. M. Wild, A. F. Frangi, and K. Lekadir, “An algorithm for the segmentation of highly abnormal hearts using a generic statistical shape model,” IEEE T. Med. Imag. 35(8), 845–859 (2016). [CrossRef]

23. J. Pedrosa, S. Queirós, O. Bernard, J. Engvall, T. Edvardsen, E. Nagel, and J. D’hooge, “Fast and fully automatic left ventricular segmentation and tracking in echocardiography using shape-based B-spline explicit active surfaces,” IEEE T. Med. Imag. 36(11), 2287–2296 (2017). [CrossRef]

24. V. Kajić, M. Esmaeelpour, B. Považay, D. Marshall, P. L. Rosin, and W. Drexler, “Automated choroidal segmentation of 1060 nm OCT in healthy and pathologic eyes using a statistical model,” Biomed. Opt. Express 3(1), 86–103 (2012). [CrossRef]

25. M. Pilch, Y. Wenner, E. Strohmayr, M. Preising, C. Friedburg, E. M. zu Bexten, B. Lorenz, and K. Stieger, “Automated segmentation of retinal blood vessels in spectral domain optical coherence tomography scans,” Biomed. Opt. Express 3(7), 1478–1491 (2012). [CrossRef] [PubMed]

26. B. Yang, C. Liu, W. F. Zheng, and S. Liu, “Motion prediction via online instantaneous frequency estimation for vision-based beating heart tracking,” Information Fusion 35, 58–67 (2017). [CrossRef]

27. Z. Zhang, “A flexible new technique for camera calibration,” IEEE T. Pattern Anal. 22(11), 1330–1334 (2000). [CrossRef]

28. A. I. Aviles-Rivero, S. M. Alsaleh, and A. Casals A, “Sliding to predict: vision-based beating heart motion estimation by modeling temporal interactions,” Int. J. Comput. Ass. Rad. 13(3), 353–361 (2018).

29. S. Benhimane and E. Malis, “Homography-based 2D visual tracking and servoing,” Int. J. Robot. Res. 26(7), 661–676 (2007). [CrossRef]

30. D. Stoyanov, M. V. Scarzanella, P. Pratt, and G. Z. Yang, “Real-time stereo reconstruction in robotically assisted minimally invasive surgery,” in Medical Image Computing and Computer-Assisted Intervention-MICCAI 2010, T. Jiang, N. Navab, J. P. W. Pluim, and M. A. Viergever, eds. (Springs, 2010), pp. 275–282. [CrossRef]

31. P. Pratt, D. Stoyanov, M. V. Scarzanella, and G. Z. Yang, “Dynamic guidance for robotic surgery using image-constrained biomechanical models,” in Medical Image Computing and Computer-Assisted Intervention-MICCAI 2010, T. Jiang, N. Navab, J. P. W. Pluim, and M. A. Viergever, eds. (Springs, 2010), pp. 77–85. [CrossRef]

32. D. Stoyanov, G. P. Mylonas, F. Deligianni, A. Darzi, and G. Z. Yang, “Soft-tissue motion tracking and structure estimation for robotic assisted MIS procedures,” in Medical Image Computing and Computer-Assisted Intervention MICCAI 2010, J. S. Duncan and G. Gerig, eds. (Springs, 2005), pp. 139–146.

Name	Description
Visualization 1	3D Reconstruction result by using the proposed statistical deformable modeling method.on the stereo-endoscopic video (I) that was captured by da Vinci robot.
Visualization 2	3D Reconstruction result by using the proposed statistical deformable modeling method.on the stereo-endoscopic video (II) that was captured by da Vinci robot.
Visualization 3	3D Reconstruction result by using the proposed statistical deformable modeling method.on the stereo-endoscopic video (III) that was captured by da Vinci robot.
Visualization 4	3D Reconstruction result by using the proposed statistical deformable modeling method.on the stereo-endoscopic video (IV) that was captured by da Vinci robot.

Methods	Video I	Video II	Video III	Video IV
SDM	1.26±0.46	1.27±0.62	2.52±0.98	2.40±0.90
QST	1.42±0.51	1.51±0.70	3.09±1.10	2.80±1.09
4/5-CP TPS	1.40±0.52	1.44±0.68	3.03±1.11	2.75±1.01
9-CP TPS	1.21±0.51	1.23±0.64	2.83±1.04	2.59±0.87
9-CP PBM	1.23±0.53	1.29±0.66	2.92±1.09	2.64±0.92
9-CP B-Spline	1.21±0.53	1.27±0.63	2.90±1.10	2.60±0.90

Reconstructing a 3D heart surface with stereo-endoscope by learning eigen-shapes

Abstract

1. Introduction

2. Methods

2.1. Representing 3D shape with TPS

2.2. Low-rank shape modeling based on eigen-shapes

Proposition 1

Proposition 2

2.3. Model-based tracking and reconstruction

3. Results

3.1. Sparsity and statistical stability of shape data

3.2. Reconstruction experiments

3.2.1. Qualitative analyses

3.2.2. Quantitative evaluation

Robustness

Accuracy

Speed

4. Conclusions

Appendix A. Proof of proposition 2

Appendix B. Computations of Jacobians

B.1. Computation of ∇_ξI_L/R

B.2. Computation of $\nabla_{ξ^{*}} I_{L / R}$

Funding

Acknowledgments

Disclosures

References

Supplementary Material (4)

Cited By

Figures (7)

Tables (1)

Equations (33)

Biomedical Optics Express

Abstract

1. Introduction

2. Methods

2.1. Representing 3D shape with TPS

2.2. Low-rank shape modeling based on eigen-shapes

Proposition 1

Proposition 2

2.3. Model-based tracking and reconstruction

3. Results

3.1. Sparsity and statistical stability of shape data

3.2. Reconstruction experiments

3.2.1. Qualitative analyses

3.2.2. Quantitative evaluation

Robustness

Accuracy

Speed

4. Conclusions

Appendix A. Proof of proposition 2

Appendix B. Computations of Jacobians

B.1. Computation of ∇ξIL/R

B.2. Computation of ∇ξ*IL/R

Funding

Acknowledgments

Disclosures

References

Supplementary Material (4)

Cited By

Figures (7)

Tables (1)

Equations (33)

Biomedical Optics Express

B.1. Computation of ∇_ξI_L/R

B.2. Computation of $\nabla_{ξ^{*}} I_{L / R}$