Optica Publishing Group

Registration of free-hand OCT daughter endoscopy to 3D organ reconstruction

Open Access

Abstract

Despite the trend to pair white light endoscopy with secondary image modalities for in vivo characterization of suspicious lesions, challenges remain to co-register such data. We present an algorithm to co-register two different optical imaging modalities as a mother-daughter endoscopy pair. Using white light cystoscopy (mother) and optical coherence tomography (OCT) (daughter) as an example, we developed the first forward-viewing OCT endoscope that fits in the working channel of flexible cystoscopes and demonstrated our algorithm’s performance with optical phantom and clinical imaging data. The ability to register multimodal data opens opportunities for advanced analysis in cancer imaging applications.

© 2016 Optical Society of America

1. Introduction

As the clinical need for early detection of cancer and other diseases demands better imaging tools, light-based mother-daughter endoscopy systems, where the “mother” white light endoscope (WLE) is complemented by a “daughter” endoscope of a secondary modality placed into its working channel, are becoming more prevalent. Daughter endoscopes from imaging modalities such as spectroscopy [1], endomicroscopy [2], or optical coherence tomography (OCT) [3] provide additional contrast and resolution that permit the detection or classification of cancerous tissue in a breadth of endoscopy applications (e.g., pulmonology, gastroenterology, urology) for which WLE is insufficient on its own. However, because the mother-WLE video and daughter-endoscopy data are recorded and viewed separately, the onus falls to the physician to remember or painstakingly record where the daughter endoscopy data were collected relative to WLE. This need for cognitive fusion hinders many opportunities for advanced analysis: for example to create a comprehensive map of tumor margins from both imaging modalities.

Herein we combine techniques from computer vision and image processing with our design of a forward-viewing daughter endoscope to introduce a new strategy to register regions of daughter-endoscopy images to a 3D image reconstruction of an organ created from WLE data. Two key enabling technologies for the current work include an algorithm we developed to generate 3D reconstructions that capture the shape and appearance of an organ from standard clinical WLE videos [4] and a high-speed, forward-viewing daughter-endoscope [5]. Importantly, the 3D reconstructions can be generated without specialized endoscopy equipment or major deviations of the endoscopy workflow, which is paramount for clinical acceptance. In this manuscript, we extend our 3D reconstruction algorithm to include a complementary method for detecting the pose of the daughter endoscope (i.e., its position and orientation excepting rotation about the cylindrical axis) with respect to the mother WLE. In brief, the relative pose between the daughter endoscope and mother endoscope along with the known geometry of the daughter endoscope dictate the region in the WLE image from which daughter-endoscopy data are collected; we then project this region into 3D using the reconstructed organ model and the global pose of this WLE image. Hence, our reconstruction and registration algorithms enable both mother- and daughter-endoscopy data to be localized to the appropriate anatomical location.

While the mapping of daughter-endoscopy data to WLE images has been explored previously [6–9], prior research has primarily focused on algorithms for re-localizing tissue regions that have been imaged by the daughter endoscope within a single imaging session (i.e., the instrument is not removed from the patient). In contrast, our technique focuses on creating a comprehensive record of the entire imaging session, which may permit tracking of the organ appearance across several imaging sessions. Furthermore, the inability of many existing techniques to generate 3D reconstructions of the tissue [6–8] makes them unsuitable for the current application, while those that can generate 3D reconstructions rely upon a stereoscopic camera system [9] that is not common in endoscopy applications. Moreover, most existing methods represent the position of the daughter endoscope as a single point rather than a region, which inhibits generation of larger fields of view of daughter-endoscope data. These methods also rely on specific endoscope motions or manual input to detect when important daughter-endoscopy data are collected (i.e., when the daughter endoscope is in contact with the tissue), which further hamper their translational potential. Our method avoids these problems by automatically detecting these important daughter-endoscopy data through image analysis.

As an initial application, we demonstrate our technique in the bladder. Early detection of bladder cancer is of significant importance given its high recurrence rate of 50–90% [10] and the high cost burden of bladder cancer surveillance on the health care system. Bladder cancer patients undergo an office-based white light cystoscopy (WLC) at least once a year, but the limited specificity of WLC and its inability to stage tumors can require unnecessary and costly surgeries. Optical coherence tomography (OCT) is a promising complement to WLC due to its ability to image in depth, which allows it to distinguish cancerous from healthy tissue (i.e., based on the number of subsurface layers). While OCT has successfully been used to classify cancerous tissue [3,11–15], the existing workflow only permits on-the-fly tissue classification, as the two imaging modalities are not registered temporally or spatially. OCT data that are co-registered to a 3D reconstruction of the bladder wall may enable complete staging of a tumor or identification of surgical margins, a visualization that could help a surgeon prepare for surgery or track tumor recurrence.

In this manuscript, we describe a novel technique to register OCT-based daughter-endoscopy data to a 3D reconstruction of an organ created from mother data generated with a monoscopic WLC. We further reduced the outer diameter and rigid length of our previously published rapid-scanning, forward-viewing OCT endoscope [5] to permit the technique to be extended to the clinic; the modified endoscope presented here is the first such scope capable of being inserted into the working channels of flexible cystoscopes, which are the standard tools used for bladder cancer surveillance in the clinic. Using our OCT endoscope and a commercial WLC, we demonstrate the qualitative and quantitative performance of the registration algorithm with a custom bladder phantom and intraoperative cystoscopy data. We then validate our registration accuracy by directly comparing the appearance of the 3D reconstruction with the co-registered OCT data. This comparison provides direct evidence that the OCT data are registered to the correct location on the 3D reconstruction, in contrast to techniques that validate registration using position sensors as proxies [9]. Overall, the technique and validation strategy we present are poised to augment the current standard of care for bladder cancer monitoring and lay the groundwork for a more general approach to registering other mother-daughter endoscopy combinations.

2. System setup

The custom-built, swept-wavelength source (center wavelength 1060 nm) OCT engine described in [5] was modified to include a revised and miniaturized lens assembly for the OCT endoscope (Fig. 1). The lens assembly consisted of a 1-mm OD, 2.3-mm-length GRIN lens (GoFoton, SLW10) and a 1-mm OD, 2.1-mm-length glass rod. The length of the rod was designed so that the focus of the endoscope was just below the surface of the tissue when the endoscope was in contact with the tissue. These modifications reduced the diameter and rigid length of the OCT endoscope from 3.0 mm and 25 mm, respectively, to 1.3 mm and 19 mm, and enable the OCT endoscope to fit into a wide range of cystoscope working channels, including those of some flexible cystoscopes. The OCT system had a sensitivity of 94 dB, a depth of penetration of approximately 700 μm, a lateral field of view of a 700-μm-diameter circle, an axial resolution of 9.6 μm, and a lateral resolution of 10 μm. The lateral resolution expands to 19 μm at a distance of 0.5 mm from the focus (Rayleigh range of 290 μm). The OCT volumes contained 7000 A-scans arranged in a spiral pattern [5] and were collected at a volume rate of 12.5 Hz. The WLC images had a resolution of 720 × 1280 pixels and a frame rate of 30 Hz.

We calibrated the OCT scan pattern and WLC to map the lateral position of each A-scan, eliminate distortions, and determine the intrinsic camera parameters [5]. Each OCT volume was associated with a given WLC image (i.e., creation of frame pairs) based upon the known OCT and WLC frame rates and a time offset. The time offset was determined by imaging a series of horizontal black (high absorption under OCT) and white (low absorption under OCT) lines translated underneath the combined OCT-WLC system and maximizing the correlation between the intensity data obtained from both systems within the field of view of the OCT endoscope.
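The temporal-alignment step can be sketched as follows. This is a minimal illustration, assuming the per-frame mean intensities within the OCT field of view have already been extracted from both systems; the function name and the common resampling rate are our own choices, not the authors' implementation.

```python
import numpy as np

def estimate_time_offset(oct_intensity, wlc_intensity, fs_oct=12.5, fs_wlc=30.0):
    """Estimate the time offset (seconds) between OCT and WLC intensity traces.

    Both traces are mean intensities within the OCT field of view while
    black/white stripes translate under the instruments. The traces are
    resampled to a common rate, normalized, and the lag maximizing the
    cross-correlation is returned (positive when the OCT trace lags).
    """
    fs = 100.0  # common resampling rate, Hz (assumption for this sketch)
    t_oct = np.arange(len(oct_intensity)) / fs_oct
    t_wlc = np.arange(len(wlc_intensity)) / fs_wlc
    t = np.arange(0.0, min(t_oct[-1], t_wlc[-1]), 1.0 / fs)
    a = np.interp(t, t_oct, oct_intensity)
    b = np.interp(t, t_wlc, wlc_intensity)
    a = (a - a.mean()) / (a.std() + 1e-12)   # zero-mean, unit-variance
    b = (b - b.mean()) / (b.std() + 1e-12)
    corr = np.correlate(a, b, mode="full")
    lag = np.argmax(corr) - (len(a) - 1)     # samples by which a lags b
    return lag / fs
```

In the actual pipeline the recovered offset, together with the 12.5-Hz and 30-Hz frame rates, fixes which WLC image accompanies each OCT volume.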

3. Registration algorithm

The registration algorithm consists of four steps (Fig. 2):

  1. 3D model generation: Transform a WLC video (mother data) into a 3D reconstruction that captures the shape and appearance of the bladder [4].
  2. Interest frame identification: Label those frame pairs (comprising an OCT volume (daughter data) and a WLC image acquired simultaneously) that contain a high-quality OCT volume of the bladder wall as interest frame pairs.
  3. OCT footprint detection: Detect positions in the interest images (the WLC image in an interest frame pair) from which the interest volumes (the OCT volume in the interest frame pair) were captured. Denote the projection of the OCT volume onto the 2D image plane of the interest image as the OCT footprint.
  4. OCT footprint projection: Project the footprint onto the 3D bladder reconstruction to find the OCT projection (3D position of the footprint), given the known camera poses associated with the WLC images and the position of the 2D OCT footprint.

A more detailed description follows. All steps were combined into a complete, automated pipeline using a combination of Matlab, Python, and C++.


Fig. 1 WLC and OCT system setup showing optical and electronic system design. The inset shows a cross-section of the distal end of the OCT endoscope. DAQ: data acquisition device, GRIN: graded index lens, PC: polarization controller, PZT: piezo-electric transducer.



Fig. 2 Registration pipeline overview comprising inputs and outputs of the four main steps (black boxes) of the algorithm.


3.1. 3D model generation (Step A)

The aim of this step is to generate a 3D reconstruction of the bladder, based exclusively on the WLC data, to which the OCT data can later be co-registered. Briefly, we first preprocess images from a WLC video to (1) calibrate the camera to determine intrinsic camera parameters and remove distortions endemic to endoscopic optics, and (2) adjust the color to enable robust feature extraction and minimize lighting artifacts. These processed images are inserted into a structure-from-motion (SfM) step based on a state-of-the-art sequential SfM pipeline [16]. Images selected for further processing are denoted as keyframes. Keyframes are selectively matched with other keyframes on the basis of feature descriptors extracted from interest points (positions of image features to be matched) common to the two keyframes. The result of the SfM step is an initial sparse point cloud, containing 3D points that represent the surface of the bladder, and a set of poses (Pj) that represent the position and orientation of the cystoscope corresponding with each keyframe. The surface of the bladder is represented in a standard way using a triangle mesh [17], after which the bladder appearance is finalized by using a texture-reconstruction algorithm [18] to assign, blend, and overlay selected image patches from the keyframes onto the bladder mesh. The bladder is assumed to be rigid during imaging, which is a valid assumption since the bladder is distended with fluid during imaging.

3.2. Interest frame pair identification (Step B)

Although OCT volumes are captured continuously, only the volumes where the OCT endoscope is placed near the bladder wall are likely to contain meaningful data. The aim of this step is to identify interest frame pairs, or frame pairs in which the OCT data are meaningful. Two key aspects of the imaging protocol facilitate interest frame pair identification: (1) deployment of the OCT endoscope into the working channel of the cystoscope makes it visible in the WLC images, and (2) the shallow depth of focus associated with OCT imaging limits the OCT data channel to produce high-signal-to-noise-ratio (SNR) data only when the tissue is nearly in focus. (High OCT backscatter intensity and thus SNR occurs when the OCT endoscope is nearly in contact with the bladder wall.) Hence, interest frame pairs are uniquely characterized as those having both (1) a visible OCT endoscope in the WLC image and (2) a high-SNR signal in the OCT volume.

We first identify WLC images that contain a visible OCT endoscope using a “blue-by-red” image calculated as I_B/R(i, j) = min(I_B(i, j)/I_R(i, j), 1), where an image I is a 720×1280 matrix whose intensity at pixel (i, j) is denoted I(i, j). Here we exploit the contrast between the bluish color of the OCT endoscope and the reddish appearance of the bladder. From this image we generate an initial binary mask, M_scope^init, described by

M_scope^init(i, j) = 1 if I_B/R(i, j) > threshold, and 0 otherwise,
where the mask M is a 720×1280 matrix whose value at pixel (i, j) is denoted M(i, j). A typical value for the threshold is 1.4. The mask M_scope^init is eroded to remove noise and anomalous regions (e.g., saturated areas due to debris) and then dilated to produce the final binary endoscope mask, M_scope. If the sum of the pixel values in M_scope lies above a minimum threshold (e.g., 2% of the image area), the image is classified as endoscope-present.

For images classified as endoscope-present, we next evaluate whether the OCT image has a high-SNR signal. Voxels in an in-focus OCT volume will have an intensity that is considerably larger than the intensity of a background volume taken when there is no sample in front of the endoscope. A volume is classified as high-intensity when its average exceeds a fixed threshold (typically 30 when OCT volumes are stored as the log-magnitude of the intensity at 8-bits). Interest frame pairs thus comprise an endoscope-present WLC image and a high-SNR OCT volume; only these frame pairs are considered for further processing.
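A compact sketch of both tests in Step B follows. The helper names and iteration counts are illustrative; pure-NumPy 4-neighbour passes stand in for a full morphology library, and the ratio is thresholded directly so the 1.4 threshold can operate on uncapped values.

```python
import numpy as np

def _erode(m):
    # One 4-neighbour binary erosion pass (wrap-around edges suffice here).
    return (m & np.roll(m, 1, 0) & np.roll(m, -1, 0)
              & np.roll(m, 1, 1) & np.roll(m, -1, 1))

def _dilate(m):
    # One 4-neighbour binary dilation pass.
    return (m | np.roll(m, 1, 0) | np.roll(m, -1, 0)
              | np.roll(m, 1, 1) | np.roll(m, -1, 1))

def is_interest_frame_pair(I_rgb, oct_volume, ratio_thresh=1.4,
                           area_frac=0.02, snr_thresh=30.0):
    """Return True when (1) the blue endoscope is visible in the WLC frame
    and (2) the 8-bit log-magnitude OCT volume has a high mean intensity."""
    R = I_rgb[..., 0].astype(float)
    B = I_rgb[..., 2].astype(float)
    I_br = B / np.maximum(R, 1e-6)                 # blue-by-red ratio
    mask = I_br > ratio_thresh
    mask = _dilate(_dilate(_erode(_erode(mask))))  # denoise, then restore
    endoscope_present = mask.mean() > area_frac    # e.g., 2% of image area
    high_snr = oct_volume.mean() > snr_thresh      # e.g., 30 of 255
    return bool(endoscope_present and high_snr)
```

Only frame pairs passing both tests proceed to footprint detection.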

3.3. OCT footprint detection (Step C)

The aim of this step is to localize the en face projection of the OCT volume, which we denote the OCT footprint, within the associated WLC image. The footprint equivalently denotes the location on the bladder wall from which the OCT data were collected; finding it amounts to determining the rigid transformation between the OCT and WLC coordinate systems (Fig. 3). Because the diameter of the OCT endoscope is narrower than the working channel of the WLC, the transformation between the coordinate systems is not constrained and is thus described by three rotational and three translational degrees of freedom.


Fig. 3 (a) Side view and (b) bottom view depicting the relationship between the OCT shaft, WLC shaft, and OCT footprint and their respective coordinate systems. (c) Appearance of the OCT endoscope in a WLC image with important features indicated, including shaft lines l1 and l2 and the regions r_i into which the shaft lines split the plane. Although the shaft edges are parallel in 3D space, the shaft lines intersect in the WLC image due to the perspective projection of the WLC.


We make the following assumptions about the OCT endoscope to facilitate footprint detection: (1) it is visible in the WLC image, (2) it is in contact with the tissue, and (3) it has a cylindrical shape. These assumptions are reasonable because (1) our definition of an interest frame pair requires that the endoscope be visible in the WLC image, (2) the endoscope is designed to produce in-focus images only when in contact (or nearly in contact) with the tissue, and (3) the endoscope is designed to have a cylindrical shape. We also assume that the volume imaged by the OCT endoscope is cylindrical and concentric with the OCT endoscope. These are reasonable assumptions because the OCT endoscope scan pattern is programmed to be nearly circular and the OCT endoscope system is designed such that the optics, scan pattern, and housing are all approximately concentric. Finally, our OCT footprint detection relies on the blue color of the OCT endoscope. While this reliance is a limitation, tool-segmentation algorithms [9, 19] that rely on other color statistics or expected tool motion can be applied to endoscopes with a wider range of appearances.

3.3.1. Representation of the endoscope with single-view geometry

We use the following convention to define the WLC and OCT coordinate systems: the origin of the OCT coordinate system is centered at the distal end of the OCT endoscope shaft. The z-axis points from the distal to the proximal end of its cylindrical shaft (Fig. 3). The WLC is modeled as a pinhole camera with intrinsic matrix K ∈ ℝ^(3×3), which describes how a 3D point (p_i) is projected into the image plane. The intrinsic matrix is determined during calibration in Sec. 2 and dictates the origin and orientation of the WLC coordinate system. The transformation between the WLC and OCT coordinate systems can be represented as T_OCT→WL = [R | t] ∈ ℝ^(3×4), where R ∈ ℝ^(3×3) is a rotation matrix and t = [t_x t_y t_z]^T ∈ ℝ^(3×1) is a translation vector. This transformation matrix allows the OCT footprint to be represented as 2D points in the WLC image plane. Specifically,

p_WL = K T_OCT→WL p_OCT,
where we follow principles from projective geometry in which equality is understood as equality up to an unknown scalar multiple. Although the OCT footprint is defined in 2D, p_OCT is defined in 3D; the 2D footprint positions all lie in the plane z = 0.

To describe the cylindrical appearance of the endoscope in the WLC images, we note that a cylinder is a quadric surface and that the image of a quadric in the image plane of a pinhole camera is a conic [20]. Given our coordinate convention, the (infinite) cylinder is described with the point-quadric

Q_OCT = [ 1 0 0 0 ; 0 1 0 0 ; 0 0 0 0 ; 0 0 0 −r_scope² ],
with rscope defined as the radius of the cylinder. In the WLC coordinate system, the quadric is then given by
Q_WL = T_OCT→WL^T Q_OCT T_OCT→WL = [ Q_3×3 q ; q^T q_4,4 ],
where Q_3×3 ∈ ℝ^(3×3), q ∈ ℝ^(3×1), and q_4,4 ∈ ℝ represent the quadric Q_WL in block-matrix form. The projection of Q_WL into the undistorted WLC image is a conic given by C_rend = q_4,4 Q_3×3 − q q^T [20], where a conic C ∈ ℝ^(3×3). The conic can also be measured from the given WLC image. Specifically, it is determined by the apparent contour of the endoscope, which is spanned by the two straight lines l1 and l2 (shaft lines) that outline the shaft edges (Fig. 3(c)) and represent the exterior of the OCT endoscope running parallel to the z_OCT-axis. Algebraically, the conic in the undistorted WLC image is C_obj = K^T (l1 l2^T + l2 l1^T) K. Note that in the world coordinate system the shaft lines are parallel and thus intersect at infinity; the projection of this intersection point into the WLC image is known as the vanishing point. We use these two representations of the conic (C_obj and C_rend) to solve for the coordinate-system transformation (T_OCT→WL) for a single interest frame pair. Note that this method cannot determine the rotation of the OCT endoscope about the z_OCT-axis due to the rotational symmetry of the cylindrical shaft of the device.

3.3.2. Detection of the apparent contours of the OCT endoscope

The aim of this step is to extract the shaft lines that outline the contour of the endoscope. First, applying Canny edge detection to I_B/R produces a binary image, I_edge, in which each pixel is labeled as an edge or not based upon the strength of the gradient at that pixel and the presence of edges in surrounding pixels. Edges that lie outside of the binary endoscope mask M_scope, which represents an area slightly dilated compared with the true endoscope pose, are set to zero; this removes edges that are not associated with the endoscope shaft. The shaft lines are then identified using a Hough transform. Briefly, each edge pixel (a pixel for which I_edge(i, j) = 1) lies on a subset of all possible lines, and the Hough transform determines lines on which a significant number of edge pixels lie via a voting scheme. Lines with a significant number of votes are selected as putative shaft lines. If the Hough transform detects spurious lines (i.e., more than two), the two lines selected as shaft lines are those that intersect at the position closest to the intersection point of the two lines found in the previous interest image (v_(n−1)): argmin_(i,j) ‖(l_i^n × l_j^n) − v_(n−1)‖₂. This criterion makes sense because the endoscope pose remains nearly stationary between sequential interest images; hence, the position of the vanishing point is roughly constant. If no intersection point has been previously determined, the two line segments that are farthest apart are chosen as the shaft lines. The shaft lines are denoted l1 and l2, and the shape traced by these two lines is denoted the “apparent contour” of the OCT endoscope.
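When more than two Hough lines survive, the selection rule reduces to a small search over line pairs. A sketch with homogeneous line coordinates follows; the Hough voting itself is assumed to have been done elsewhere, and the function name is ours.

```python
import numpy as np

def select_shaft_lines(lines, v_prev):
    """Pick the two putative shaft lines from Hough candidates.

    `lines` is an (N, 3) array of homogeneous 2D lines (ax + by + c = 0).
    Among all pairs, choose the pair whose intersection (the cross product
    of the two lines, dehomogenized) lies closest to the previously
    detected vanishing point `v_prev` = (x, y). Returns the index pair.
    """
    best, best_d = None, np.inf
    n = len(lines)
    for i in range(n):
        for j in range(i + 1, n):
            v = np.cross(lines[i], lines[j])   # homogeneous intersection
            if abs(v[2]) < 1e-9:
                continue                        # parallel in the image plane
            p = v[:2] / v[2]
            d = np.linalg.norm(p - v_prev)
            if d < best_d:
                best, best_d = (i, j), d
    return best
```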

3.3.3. Extraction of the rotation matrix and x- and y- translations

To compute the rotation matrix R, we first consider that the back projections of the two shaft lines are tangent planes to the endoscope whose contact points lie on two 3D lines parallel to the endoscope axis. Crucially, the intersection of the two shaft lines in the WLC image provides the vanishing point corresponding to the endoscope axis. Algebraically, the position of the vanishing point can be written as v = l1 × l2 = KRa, where we choose the vector a = [0 0 1]^T to be the direction of the endoscope axis in the OCT coordinate system [20]. Hence, given the vanishing point v and the intrinsic matrix K (Sec. 2), the rotation matrix can easily be recovered. We first determine the third column of the rotation matrix: r3 = K⁻¹v/‖K⁻¹v‖. As the first two columns of the matrix must be orthogonal complements to the third, the complete rotation matrix can be determined (up to a rotation about the z-axis): R = [(r3)^⊥, r3]·|[(r3)^⊥, r3]|, where |A| denotes the determinant of a matrix A and A^⊥ denotes its orthogonal complement (i.e., A^T A^⊥ = 0 and (A^⊥)^T A^⊥ = I).
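The recovery of R from the vanishing point can be sketched in a few lines of NumPy. The orthogonal complement of r3 is taken from an SVD, and a determinant check enforces a proper rotation; the function name is ours.

```python
import numpy as np

def rotation_from_vanishing_point(K, v):
    """Recover R (up to rotation about the endoscope z-axis) from the
    vanishing point v = l1 x l2 of the shaft lines (homogeneous 3-vector)
    and the camera intrinsic matrix K."""
    r3 = np.linalg.solve(K, v.astype(float))   # K^-1 v
    r3 /= np.linalg.norm(r3)                   # unit third column
    # Rows 1..2 of Vt from the SVD of r3^T span the orthogonal complement.
    perp = np.linalg.svd(r3[None, :])[2][1:].T  # 3x2, columns orthonormal
    R = np.column_stack([perp, r3])
    if np.linalg.det(R) < 0:                   # enforce det(R) = +1
        R[:, 0] = -R[:, 0]
    return R
```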

Next, we determine the translation along the x- and y-axes between the two coordinate systems (i.e., t_x and t_y, respectively). We solve for the two translation parameters by comparing the observed conic, C_obj, with the rendered conic, C_rend, given the coordinate-system transformation T_OCT→WL in the WLC image plane. Here we set t_z = 0 (i.e., t = [t_x t_y 0]^T), which yields two translation parameters without loss of generality (the translation along the endoscope axis will be determined in the last step using the detected tip of the endoscope). Note that when the translation values for t_x and t_y are correct, the rendered and observed conics should be equal up to a scale factor s ∈ ℝ: P(t_x, t_y, s) = C_rend − sC_obj = 0. Thus, P(t_x, t_y, s) is a polynomial system of equations parametrized by s, t_x, and t_y; after some algebraic manipulation it can be rewritten as B[t_x², t_x t_y, t_y², s, 1]^T = 0_(6×1). A solution for s, t_x, and t_y can be extracted from the unique null vector of the known system matrix B. In actuality, four solutions for the pair (t_x, t_y) (two values for each variable) can be extracted. When the OCT endoscope is in front of the cystoscope, each of the four solutions corresponds to the appearance of the OCT endoscope in one of the four regions of the WLC imaging plane defined by the shaft lines (Fig. 3(c)). The final (t_x, t_y) values are selected by finding the pair that projects the endoscope into the same quadrant in which the observed endoscope lies using M_scope.
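Extracting the candidates from the null vector of B can be sketched as below, assuming B has already been assembled from C_rend and C_obj. The last right-singular vector of the SVD spans the null space, and normalizing its homogeneous entry to 1 fixes the unknown scale; the helper name is ours.

```python
import numpy as np

def solve_translation(B):
    """Extract (s, candidate (tx, ty) pairs) from the null vector of B.

    B stacks the constraints C_rend - s*C_obj = 0 as a linear system in
    the monomial vector m = [tx^2, tx*ty, ty^2, s, 1]^T, i.e. B m = 0.
    """
    _, _, Vt = np.linalg.svd(B)
    m = Vt[-1]
    m = m / m[-1]                      # enforce the homogeneous entry m[4] = 1
    tx2, _, ty2, s = m[0], m[1], m[2], m[3]
    tx = np.sqrt(max(tx2, 0.0))
    ty = np.sqrt(max(ty2, 0.0))
    # Four sign combinations (+-tx, +-ty); the correct one is chosen later
    # by checking which quadrant of the image contains the observed scope.
    cands = [(tx, ty), (tx, -ty), (-tx, ty), (-tx, -ty)]
    return s, cands
```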

3.3.4. Extraction of the endoscope tip and the z-axis translation

The final step is to determine the translation along the z-axis between the OCT and WLC coordinate systems. Our general strategy is to find the z-axis translation (consistent with the computed R, t_x, and t_y) that best captures the appearance of the endoscope in the WLC image. We follow an analysis-by-synthesis approach that consists of two steps: (synthesis) rendering the appearance of the OCT endoscope in the WLC image given an assumed z-axis translation and (analysis) comparing the rendered image appearance with the original WLC image. Using the coordinate-system transformation (T_OCT→WL), we render a binary image of the endoscope, I_z, in which only pixels that correspond to locations on the endoscope are set to one.

We compare the rendered image to I_B/R in a region near the rendered endoscope tip using the metric f(z) = (1/N) Σ_(x,y) |I_z(x, y) − I_B/R(x, y)|, where N is the number of pixels in the evaluation region. Only pixels near the tip are evaluated, because the position of the tip is most critical and pixels outside this region may bias the comparison. The “correct” z-axis position (i.e., where f(z) is minimized) should occur where the two images I_z and I_B/R are best matched, because both images have pixel values equal or close to one on the endoscope and lower values outside it. This analysis-by-synthesis is fairly efficient, since it reduces the problem to a 1D search along the z-axis.
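The 1D search can be sketched generically. Here `render_fn` and `tip_region_fn` are hypothetical stand-ins for the actual renderer built from K and the coordinate-system transformation.

```python
import numpy as np

def best_tz(render_fn, I_br, tip_region_fn, z_values):
    """1D analysis-by-synthesis search for the z translation.

    render_fn(z)      -> binary image I_z of the endoscope rendered at z
    tip_region_fn(z)  -> boolean mask of pixels near the rendered tip
    Minimizes f(z) = (1/N) * sum |I_z - I_B/R| over the tip region.
    """
    def f(z):
        region = tip_region_fn(z)
        n = max(int(region.sum()), 1)          # N pixels in the region
        return np.abs(render_fn(z) - I_br)[region].sum() / n
    scores = [f(z) for z in z_values]
    return z_values[int(np.argmin(scores))]
```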

3.4. OCT footprint projection (Step D)

To determine the 3D position of the OCT footprints, points contained within the 2D OCT footprint are projected onto the 3D mesh by casting a ray from each 2D point through the camera center of the given WLC image, whose pose with respect to the 3D mesh was determined in Step A. The intersections of these rays with the 3D mesh provide the OCT footprint in 3D. One limitation of this approach occurs when the camera position for an interest frame pair is unknown, as can happen when a WLC image is not used in the SfM step (Step A). This may occur if the WLC image contained insufficient features to be included in the 3D reconstruction, or if it exhibited significant deformation compared with nearby images. Both of these situations are possible for WLC images collected during OCT data acquisition, because bringing the cystoscope close to the tissue reduces the number of features in the image; meanwhile, placing the OCT endoscope in contact with the tissue may induce deformation. Although deformations are ignored in the present implementation, we assume the amount of deformation is negligible within the WLC field of view, as the physician can manually control the amount of deformation with visual feedback from live WLC video.
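The per-point projection reduces to ray/mesh intersection. A standard Möller–Trumbore ray/triangle test, shown below, is one way to implement it (the authors' implementation is not specified); the ray runs from the WLC camera center through the back-projected footprint pixel and is tested against each mesh triangle.

```python
import numpy as np

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore ray/triangle intersection.

    Returns the 3D hit point, or None when the ray misses the triangle.
    """
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = e1 @ p
    if abs(det) < eps:
        return None                    # ray parallel to the triangle plane
    inv = 1.0 / det
    s = origin - v0
    u = (s @ p) * inv                  # first barycentric coordinate
    if u < 0 or u > 1:
        return None
    q = np.cross(s, e1)
    v = (direction @ q) * inv          # second barycentric coordinate
    if v < 0 or u + v > 1:
        return None
    t = (e2 @ q) * inv                 # distance along the ray
    return origin + t * direction if t > eps else None
```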

To handle the case where there is no calculated WLC pose associated with an interest image, the OCT footprint is registered to the 3D mesh by chaining together two transformations, T_WL(i)→RI and T_OCT(i)→WL(i):

p_RI = K T_WL(i)→RI T_OCT(i)→WL(i) p_OCT.
The transformation T_WL(i)→RI is computed by registering the interest image to a nearby image (a “registration image,” RI) whose WLC pose is known. We assume that the images capture a relatively planar region of the bladder, an assumption that is justified because we require the physician to scan the bladder close to the mucosal wall. Each interest image is initially registered to the RI by matching SIFT features between the two images and determining an affine transformation T_WL(i)→RI that describes their relationship. If an insufficient number of feature matches is found, the OCT footprint is not registered to the mesh.
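Given putative SIFT correspondences, the affine transformation can be estimated by least squares. A minimal sketch without outlier rejection follows (in practice a robust wrapper such as RANSAC would guard against mismatches); the function name is ours.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src -> dst (N x 2 arrays).

    Stand-in for T_WL(i)->RI: the correspondences would come from SIFT
    matches between the interest image and the registration image.
    Returns a 2x3 matrix A such that dst ~= A @ [x, y, 1]^T.
    """
    n = len(src)
    X = np.hstack([src, np.ones((n, 1))])        # homogeneous source points
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)  # 3x2 coefficient matrix
    return A.T                                   # 2x3 affine matrix
```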

Using the 3D footprints, a second mesh (OCT overlay mesh) is created containing the areas imaged with OCT. The OCT footprint can either be displayed in a solid color or by using the OCT en face images as the texture. Using the original mesh and OCT overlay mesh, the regions imaged with OCT can be visualized relative to the bladder anatomy and compared with the appearance of the bladder under WLC.

4. Evaluation and results

4.1. Samples

We validated our technique using in vivo samples collected during intraoperative cystoscopy and an optical phantom. The intraoperative cystoscopy data provide realistic imaging conditions to validate our algorithm qualitatively, while the optical phantom data provide ground-truth data in a more controlled imaging environment, which allows for quantitative and more convincing qualitative evaluation of our algorithm.

In vivo data were collected during an intraoperative cystoscopy in patients scheduled to undergo bladder biopsy or tumor resection. Data were collected from consenting patients undergoing endoscopic procedures in the operating room as part of their standard of care. This protocol was approved by the Stanford Institutional Review Board and the Veterans Affairs Palo Alto Health Care System. For a proof-of-concept validation, a mock probe (a 2-mm, blue ureteral catheter (Cook Medical)) was inserted into the working channel of a standard 21-Fr rigid cystoscope to mimic the placement of the OCT endoscope in images, as OCT data were not collected in vivo.

Phantom data were collected in a laboratory setting with the combined OCT and WLC systems. The phantom consisted of a 3D-printed, three-inch inner-diameter semi-cylinder with a length of four inches onto which a high-resolution bladder image was color-printed and affixed to the interior. The high-resolution bladder image was constructed from the texture of a reconstructed human bladder (Sec. 3.1). Small dark brown circles were added to the bladder image to provide a marker clearly distinguishable in both WLC and OCT data: the brown circles are clearly visible in the WLC images, and the increased absorption of the brown regions in the spectrum of the OCT light source compared with the rest of the image provided a simple confirmation of imaging with the OCT endoscope. The increased absorption of the dark brown circles (containing black printer ink) compared with the rest of the image (containing little or no black ink) was tested by comparing OCT images of test regions (e.g., a black square printed on paper). The OCT endoscope was kept fixed and the paper containing the test regions was passed underneath the endoscope to ensure that no false conclusions were made on the basis of varying OCT endoscope orientation with respect to the test regions. To match the appearance of the probe between in vivo and phantom imaging conditions, a small segment of the blue catheter was placed over the housing of the OCT endoscope during phantom imaging.

4.2. Evaluation of OCT footprint detection

Using the phantom, we evaluated the OCT footprint detection algorithm (Sec. 3.3). The OCT endoscope was deployed from the working channel of the WLC and protruded to various distances (4.5 mm to 13.5 mm, in increments of 1 mm, along the z_OCT direction). At each distance, the phantom was translated underneath the cystoscope so that different regions of the phantom were imaged while the OCT endoscope remained stationary relative to the WLC. Three hundred images were collected for each distance; the relative transformation between the WLC and OCT coordinate systems was computed and the OCT footprint was determined according to Eq. 1 (Fig. 4(a–b)). Using these data, the detected OCT footprint was assessed (1) by comparing the measurements of the footprint radius and center position with expected values based on a projective camera model and (2) by comparing the variation in the measurements when the OCT endoscope was kept fixed relative to the WLC.


Fig. 4 Radii and center points of OCT endoscope as a function of the distance the OCT endoscope protrudes from the end of the WLC (“protrusion distance,” d): (a) Representative WLC image with shaft lines and OCT footprints. (b) Overlay of footprints on WLC image mask. Trends and data for (c) footprint radius and (d) center point in pixels. Error bars and ellipses show ±1σ from mean. Standard deviation for (e) footprint radius and (f) center position in μm from trend.


Assuming the OCT endoscope is protruded in a consistent direction for all distances, the radius should be inversely proportional to the distance between the tip of the OCT endoscope and the center of the WLC camera: that is, r = α(d + d0)−1, where r is the radius in pixels, α is a proportionality constant in units of pixels per mm, d is the protrusion distance, and d0 is the distance between the WLC tip and the camera center. The measured average radii for each protrusion distance were fit with this equation by solving for α and d0. The measured radii fit the expected trend well (R2 = 0.984): as the OCT endoscope protrudes further from the WLC, the radius of the footprint decreases (Fig. 4(c)). We similarly assessed the trajectory of the center point of the OCT footprint as the OCT endoscope was protruded from the working channel (Fig. 4(d)). From the projective camera equation (Eq. 2), it can be shown that the changes in the x and y coordinates of the center position during protrusion should have a linear relationship, provided the direction of protrusion is consistent. The measured center points were fit with a line (R2 = 0.978) to highlight this linear relationship. Some of the points oscillate about the line, which we believe is partially due to the experimental setup: the protrusion of the OCT endoscope exceeded its rigid length and, because the diameter of the OCT endoscope was much smaller than that of the working channel, the protrusion direction varied slightly between protrusion distances.
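As an illustration of this fit (not the authors' code), the model r = α(d + d0)−1 can be linearized as 1/r = d/α + d0/α and solved with ordinary least squares; the α and d0 values below are arbitrary synthetic choices:

```python
import numpy as np

def fit_inverse_radius(d, r):
    """Fit r = alpha / (d + d0) by linearizing: 1/r = (1/alpha)*d + (d0/alpha).

    d : protrusion distances (mm); r : measured footprint radii (pixels).
    Returns (alpha, d0).
    """
    slope, intercept = np.polyfit(d, 1.0 / np.asarray(r, dtype=float), 1)
    alpha = 1.0 / slope
    d0 = intercept * alpha
    return alpha, d0

# Synthetic example with hypothetical alpha = 700 px*mm and d0 = 5 mm.
d = np.arange(4.5, 14.0, 1.0)          # protrusion distances, mm
r = 700.0 / (d + 5.0)                  # noiseless radii, px
alpha, d0 = fit_inverse_radius(d, r)
print(round(alpha, 1), round(d0, 1))   # recovers 700.0 and 5.0
```

With noisy measured radii, the same linearized solve yields the α and d0 used to draw the trend line in Fig. 4(c).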

In addition to comparing the measured data with the expected trends, we evaluated the precision of the measurements, computed as the standard deviation of a footprint parameter (e.g., radius) at a given protrusion distance. Because the sample was translated underneath the endoscope during the measurement, these results also attest to the robustness of the measurement to noise and imaging conditions. The precision is shown as error bars in pixels in Fig. 4(c–d) and in μm in Fig. 4(e–f). The standard deviation of the radius was 19.92 ± 8.18 μm (2.64 ± 0.92 pixels). For the 2D center points, the standard deviation representing motion around the average center point for each protrusion distance was 84.87 ± 51.96 μm (10.60 ± 5.30 pixels). The values were converted from pixels (Fig. 4(c–d)) to μm (Fig. 4(e–f)) using the ratio of the known OCT volume radius (350 μm) to the measured radius in pixels as a conversion factor.
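The pixel-to-μm conversion described above amounts to scaling by the ratio of the known physical footprint radius (350 μm) to the measured radius in pixels; a minimal sketch, with a hypothetical measured radius of 46.4 pixels:

```python
import numpy as np

def px_to_um(value_px, measured_radius_px, oct_radius_um=350.0):
    """Convert a length in pixels to micrometers using the known physical
    OCT footprint radius (350 um) and its measured radius in pixels."""
    return value_px * (oct_radius_um / measured_radius_px)

# Hypothetical example: a 2.64 px standard deviation for a footprint
# whose measured radius is 46.4 px.
sigma_um = px_to_um(2.64, 46.4)
print(round(sigma_um, 1))  # ~19.9 um, i.e., the scale reported above
```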

In general, larger protrusion distances led to larger errors. This result is consistent with the observation that at larger protrusion distances, far fewer pixels represent the diameter of the OCT footprint (78 pixels for d = 4.5 mm vs. 30 pixels for d = 13.5 mm). However, for all measurements the precision is below 200 μm – well below what is required for the identification and standard endoscopic resection of mucosal tumors – which suggests that our algorithm works well for this application.

4.3. Qualitative evaluation of OCT registration

Using both in vivo and phantom samples, we demonstrated the complete registration algorithm. In vivo, we recorded data for 6:40 min comprising 368 interest frame pairs, which required 167:33 min of processing time with unoptimized code. Due to existing restrictions on the use of our OCT endoscope in humans, only WLC data were recorded in vivo, and a standard blue catheter served as the daughter endoscope (i.e., a "dummy" OCT endoscope) to mimic the procedure of collecting OCT data. Although we could detect the appearance of the dummy endoscope in the WLC images, no actual OCT data were collected, which prevented us from using the high SNR of OCT images as a criterion to identify interest frame pairs according to Step B of the algorithm. Instead, we selected every fourth frame in which the OCT endoscope was visible in the WLC image as an interest frame pair. WLC data were collected over the entire surface of the bladder, and the OCT endoscope was deployed at a few discrete locations of the bladder to mimic how a physician could utilize the OCT endoscope during a combined WL-OCT cystoscopy. Figure 5 shows a complete reconstruction of the human bladder sample overlaid with the positions where the "dummy" OCT volumes were collected. Due to the limited number of features seen when the WLC is brought close to the bladder wall, only 24% of the OCT footprints of the interest frame pairs could be registered to the 3D reconstruction. In the future, improved tracking of the WLC pose in Step A (e.g., through a simultaneous localization and mapping (SLAM) algorithm [21]) could help to recover a larger fraction of OCT footprint positions. For the footprints that could be registered, the similar vasculature patterns surrounding the registered dummy volumes and the original WLC images indicate that the volumes are registered in the correct position. However, the position of the OCT footprint is slightly misaligned and varies in size. For example, the bifurcation marked by arrow 5 (A5) in Fig. 5(b) is not visible in Fig. 5(c). This discrepancy arises because there was no noticeable contact between the OCT endoscope and the urothelium during this preliminary in vivo testing, and without actual OCT data we had no way to ensure contact between the OCT endoscope and the bladder wall. Nonetheless, the locations of the OCT footprints in the 3D reconstruction appear consistent with the original corresponding WLC images.


Fig. 5 Example reconstruction and registration for in vivo bladder: (a) full reconstruction with registered OCT volumes (green), (b) zoomed-in region (yellow box), and (c) original WLC images that correspond to two interest frame pairs. Color differences between the reconstruction and the original images are due to image preprocessing (Step A) that reduces lighting gradients. The yellow box in (a–b) represents an area of approximately 1 cm2. Arrows indicate similarities between the reconstructed texture and the original images.


To further validate our algorithm, we imaged a tissue-mimicking phantom to obtain data for which we could be certain the OCT endoscope was in contact with the sample. We collected both WLC and OCT data for a total video length of 3:37 min comprising 437 interest frame pairs. The data were processed according to the steps outlined in Sec. 3, and the computations required 96:38 min with unoptimized code. Figure 6 shows the 3D reconstruction with the en face OCT images overlaid with a false green-scale colormap. The example interest frame pairs show that when the OCT endoscope images a brown circle (#1), there is increased loss of signal at shallow depths compared with when the endoscope images a non-brown region (#2) (Fig. 6(c)). This result is expected due to the higher absorption of the brown ink compared with the red ink. This observation affects the en face images that are projected onto the 3D reconstruction: the appearance of the en face image is notably darker when the OCT image is registered to a region within the brown circles (indicated by blue dashed lines). We extended the comparison by classifying both the OCT image and the corresponding registered WL region as imaging or not imaging a brown circle. We classified the OCT image as viewing a brown circle if the average en face intensity was greater than a threshold of 100, and we classified the registered WL region as imaging a brown circle if the center of the en face image corresponded to a brown pixel in the circle. The classification agreement between the WL and OCT data was 93.6%, and the only misalignment came from a region where the texture was poorly reconstructed. These errors were likely due to inaccuracies in the camera pose (Step A) and could be reduced by an additional step that refines the camera poses. This agreement between the 3D reconstruction and the OCT data suggests that the OCT volumes are registered accurately to the 3D reconstruction. Additionally, this method of validation enables a direct comparison between the registrations of the two imaging modalities and does not require a separate positioning system for validation, as in [9].
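The classification check can be sketched as follows; the thresholding rule and its value (100) come from the text, while the synthetic frame pairs and helper names are our own illustrative assumptions:

```python
import numpy as np

def classify_pair(enface, center_is_brown, threshold=100.0):
    """Classify one interest frame pair as imaging a brown circle or not.

    Follows the rule stated in the text: the OCT en face image is labeled
    'brown' if its mean intensity exceeds the threshold (100); the WL label
    is whether the registered center falls on a brown pixel.
    """
    oct_brown = float(np.mean(enface)) > threshold
    return oct_brown, bool(center_is_brown)

def agreement(pairs):
    """Fraction of frame pairs whose OCT and WL labels agree."""
    labels = [classify_pair(e, b) for e, b in pairs]
    return float(np.mean([o == w for o, w in labels]))

# Synthetic example: four en face images with constant intensity.
pairs = [(np.full((8, 8), 150.0), True),   # bright + brown pixel  -> agree
         (np.full((8, 8), 60.0), False),   # dark + non-brown      -> agree
         (np.full((8, 8), 140.0), False),  # bright + non-brown    -> disagree
         (np.full((8, 8), 70.0), False)]   # dark + non-brown      -> agree
print(agreement(pairs))  # 3 of 4 pairs agree -> 0.75
```

Running the same comparison over all 437 registered interest frame pairs is what yields the 93.6% agreement reported above.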


Fig. 6 Example reconstruction and registration for phantom bladder: (a) full reconstruction overlaid with registered OCT volumes shown as en face projections (green), (b) zoomed-in region of the complete reconstruction, and (c) example interest frame pairs from a tissue region [1] and brown-circle region [2]. In (b), the white circles in the OCT en face images correspond to the OCT B-scans shown in (c). The color differences between the reconstruction and the original images are due to an image-preprocessing algorithm (Step A), which causes the brown circles to appear pink. To emphasize the brown circles, they are outlined with a blue dotted line. The yellow box in (a–b) represents an area of 6.6 × 6.1 mm2. The blue boxes in (c) represent an area of 100 μm2. Arrows indicate similar vasculature between the reconstructed texture and the original WLC images.


5. Conclusion

We developed a registration technique that enables localization of volumetric OCT data to a 3D reconstruction of the bladder obtained with a standard cystoscope and a miniature daughter endoscope. The main novelty of our work is a method for detecting the arbitrary pose (position and orientation) of an OCT daughter endoscope and then using this information to register the OCT volume to a 3D bladder reconstruction. In this regard, our work outlines a general strategy for co-registration of mother-daughter endoscopes of different secondary modalities. Additionally, we present the first forward-viewing OCT endoscope small enough to be inserted into the working channel of flexible cystoscopes.

Our strategy for automated detection of OCT frames expedites and simplifies the process of determining viable OCT volumes to analyze compared to other approaches that require manual input or make assumptions about the motion of the endoscope during imaging [6,9]. Moreover, our strategy to detect the OCT footprint can visualize the entire region that has been imaged with OCT, in contrast with the single-point localization commonly seen in other approaches. In the future, the algorithm can be extended to enable generation of larger-FOV panels of OCT volume data (e.g., mosaics) [22] by improving our algorithm to detect the z-axis rotation between the WLC and OCT coordinate systems (e.g., through discernible markings on the OCT endoscope) and to decrease the localization error of the OCT footprint (e.g., through use of a Kalman filter, which would use the transformation TOCTWL of previous frames to help inform the transformation in subsequent frames). Additional improvements to the algorithm may include real-time reconstruction and registration.
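One possible form of the Kalman-filter refinement mentioned above is a constant-position filter applied per coordinate of the detected footprint center; this is a sketch under assumed noise parameters, not the authors' implementation:

```python
import numpy as np

def kalman_smooth(measurements, q=1.0, r=25.0):
    """1D constant-position Kalman filter over a sequence of noisy
    measurements (e.g., one coordinate of the footprint center, in px).

    q: process variance, r: measurement variance (illustrative values).
    """
    x, p = measurements[0], r          # initialize from the first measurement
    out = [x]
    for z in measurements[1:]:
        p = p + q                      # predict: uncertainty grows
        k = p / (p + r)                # Kalman gain
        x = x + k * (z - x)            # update with the new measurement
        p = (1.0 - k) * p
        out.append(x)
    return np.array(out)

# Synthetic noisy center-x track oscillating about 320 px.
rng = np.random.default_rng(0)
track = 320.0 + rng.normal(0.0, 5.0, 50)
smoothed = kalman_smooth(track)
print(smoothed.std() < track.std())  # smoothing reduces the scatter
```

In practice the filter would run on both center coordinates (and possibly the radius), with q and r tuned to the frame-to-frame motion and detection noise of the system.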

Future work will also involve more substantial in vivo validation using an in vivo OCT endoscope in conjunction with a WLC. Based upon our preliminary in vivo experiment presented in Sec. 4.3 and our prior work [4], we observed that some bladders do not exhibit high vascular contrast over the entire bladder surface. Ultimately, the lack of vascular contrast limits the number of features that are available for image registration and mostly impacts the ability to develop a robust initial 3D reconstruction of the bladder. Emerging strategies for improving the quality of the WL image data, such as the use of narrow-band imaging, are likely to help. Nevertheless, this current limitation may necessitate an alternative strategy for registration of OCT footprints to the 3D reconstruction of the bladder. Such an alternative strategy should allow for better localization of the endoscope position within the bladder for each frame, and may involve computation of an image-based depth map [23,24], which would provide a surface-based feature for co-registration. An alternative approach could include hardware modifications [25,26] to determine the position of the WLC independent of the WLC image of the bladder. The improved registration of individual OCT volumes to one another, as discussed in the previous paragraph, will also lessen the dependency on the WLC-to-OCT co-registration procedure.

Additionally, because we used a rigid phantom in our experiments and did not have OCT data in the in vivo experiment, we did not consider tissue deformation in our algorithm. We anticipate that the deformation caused by the OCT endoscope in contact with the bladder will need to be addressed in both the 3D reconstruction and co-registration steps.

Our registration algorithm can provide a powerful post-procedural review tool in applications where white light endoscopy is complemented with a secondary imaging modality (e.g., confocal laser endomicroscopy, for which daughter endoscopes are already available for clinical use). In the case of OCT-WLC for bladder cancer, this technique could be useful for reviewing patient history prior to cystoscopic surveillance examinations, surgical planning, or longitudinal tracking of suspicious lesions to improve early detection rates.

Acknowledgments

We would like to thank R. Johnston, D. Melville, A. A. Gurjarpadhye, T. Carver, and R. G. Lord for help with the development of the OCT-SFE system, and M. T. Davenport and T. J. Metzner for assistance with the human data collection. KLL was supported by NSF GRFP and NDSEG fellowships and NSF IIP-1602118. RA was supported by the Max Planck Center for Visual Computing and Communication.

References and links

1. R. M. Cothren, R. Richards-Kortum, M. V. Sivak, et al., "Gastrointestinal tissue diagnosis by laser-induced fluorescence spectroscopy at endoscopy," Gastrointest. Endosc. 36, 105–111 (1990).

2. G. A. Sonn, S. N. E. Jones, T. V. Tarin, C. B. Du, K. E. Mach, K. C. Jensen, and J. C. Liao, "Optical biopsy of human bladder neoplasia with in vivo confocal laser endomicroscopy," J. Urol. 182, 1299–1305 (2009).

3. S. P. Lerner, A. C. Goh, N. J. Tresser, and S. S. Shen, "Optical coherence tomography as an adjunct to white light cystoscopy for intravesical real-time imaging and staging of bladder cancer," Urology 72, 133–137 (2008).

4. K. L. Lurie, R. Angst, D. Z. Zlatev, J. C. Liao, and A. K. Bowden, "3D reconstruction and co-registration of endoscopic video sequences for longitudinal studies," (in review).

5. K. L. Lurie, A. A. Gurjarpadhye, E. J. Seibel, and A. K. Ellerbee, "Rapid scanning catheterscope for expanded forward-view volumetric imaging with optical coherence tomography," Opt. Lett. 40, 3165–3168 (2015).

6. B. Allain, M. Hu, L. B. Lovat, R. J. Cook, T. Vercauteren, S. Ourselin, and D. J. Hawkes, "Re-localisation of a biopsy site in endoscopic images and characterisation of its uncertainty," Med. Image Anal. 16, 482–496 (2012).

7. S. Atasoy, B. Glocker, S. Giannarou, D. Mateus, A. Meining, G.-Z. Yang, and N. Navab, "Probabilistic region matching in narrow-band endoscopy for targeted optical biopsy," Med. Image Comput. Comput. Assist. Interv. 5761, 499–506 (2009).

8. M. Ye, E. Johns, S. Giannarou, and G.-Z. Yang, "Online scene association for endoscopic navigation," Med. Image Comput. Comput. Assist. Interv. 8674, 316–323 (2014).

9. P. Mountney, S. Giannarou, D. Elson, and G.-Z. Yang, "Optical biopsy mapping for minimally invasive cancer screening," Med. Image Comput. Comput. Assist. Interv. 12, 483–490 (2009).

10. P. Clark, N. Agarwal, M. C. Biagioli, et al., "Clinical Practice Guidelines in Oncology," J. Natl. Compr. Canc. Netw. 11, 446–475 (2013).

11. H. Ren, W. C. Waltzer, R. Bhalla, et al., "Diagnosis of bladder cancer with microelectromechanical systems-based cystoscopic optical coherence tomography," Urology 74, 1351–1357 (2009).

12. C. A. Lingley-Papadopoulos, M. H. Loew, M. J. Manyak, and J. M. Zara, "Computer recognition of cancer in the urinary bladder using optical coherence tomography and texture analysis," J. Biomed. Opt. 13, 024003 (2008).

13. E. Sanchez, A. Goh, S. Soni, and S. Lerner, "Optical coherence tomography (OCT) as an adjunct to conventional cystoscopy and pathology for non-invasive endoscopic staging of bladder tumors," Urology 78, 2011 (2011).

14. E. V. Zagaynova, O. S. Streltsova, N. D. Gladkova, et al., "In vivo optical coherence tomography feasibility for bladder disease," J. Urol. 167, 1492–1496 (2002).

15. J. Schmidbauer, M. Remzi, T. Klatte, M. Waldert, J. Mauermann, M. Susani, and M. Marberger, "Fluorescence cystoscopy with high-resolution optical coherence tomography imaging as an adjunct reduces false-positive findings in the diagnosis of urothelial carcinoma of the bladder," Eur. Urol. 56, 914–919 (2009).

16. C. Zach and M. Pollefeys, "Practical methods for convex multi-view reconstruction," Lect. Notes Comput. Sci. 6314, 354–367 (2010).

17. M. Kazhdan, M. Bolitho, and H. Hoppe, "Poisson surface reconstruction," Symp. Geom. Process. 7, 61–70 (2006).

18. M. Waechter, N. Moehrle, and M. Goesele, "Let there be color! Large-scale texturing of 3D reconstructions," in Proc. ECCV (2014), pp. 836–850.

19. C. Doignon, P. Graebling, and M. de Mathelin, "Real-time segmentation of surgical instruments inside the abdominal cavity using a joint hue saturation color feature," Real-Time Imaging 11, 429–442 (2005).

20. R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision (Cambridge University Press, 2000).

21. H. Durrant-Whyte and T. Bailey, "Simultaneous localization and mapping," IEEE Robot. Autom. Mag. 13, 99–116 (2006).

22. K. L. Lurie, G. T. Smith, S. A. Khan, J. C. Liao, and A. K. Ellerbee, "Three-dimensional, distendable bladder phantom for optical coherence tomography and white light cystoscopy," J. Biomed. Opt. 19, 036009 (2014).

23. C. Q. Forster and C. Tozzi, "Towards 3D reconstruction of endoscope images using shape from shading," in Proc. SIBGRAPI (2000), pp. 90–96.

24. R. Zhang, P.-S. Tsai, J. E. Cryer, and M. Shah, "Shape from shading: a survey," IEEE Trans. Pattern Anal. Mach. Intell. 21, 690–706 (1999).

25. M. Agenant, H.-J. Noordmans, W. Koomen, and J. L. H. R. Bosch, "Real-time bladder lesion registration and navigation: a phantom study," PLOS ONE 8, e54348 (2013).

26. J. Penne, K. Höller, M. Stürmer, T. Schrauder, A. Schneider, R. Engelbrecht, H. Feussner, B. Schmauss, and J. Hornegger, "Time-of-flight 3-D endoscopy," Med. Image Comput. Comput. Assist. Interv. 12, 467–474 (2009).



Figures (6)

Fig. 1 WLC and OCT system setup showing optical and electronic system design. The inset shows a cross-section of the distal end of the OCT endoscope. DAQ: data acquisition device, GRIN: graded index lens, PC: polarization controller, PZT: piezo-electric transducer.

Fig. 2 Registration pipeline overview comprising inputs and outputs of the four main steps (black boxes) of the algorithm.

Fig. 3 (a) Side view and (b) bottom view depicting the relationship between OCT shaft, WLC shaft and OCT footprint and their respective coordinate systems. (c) Appearance of OCT endoscope in WLC image with important features indicated including shaft lines l1 and l2 and regions ri in which the shaft lines split the plane. Although the shaft edges are parallel in 3D space, the shaft lines intersect in the WLC image due to the perspective projection of the WLC.

Equations (5)


$$M_{\text{scope}}^{\text{init}}(i,j) = \begin{cases} 1 & I_{B/R} > \text{threshold} \\ 0 & \text{otherwise}, \end{cases} \tag{1}$$

$$p_{\text{WL}} = K\, T_{\text{OCT}}^{\text{WL}}\, p_{\text{OCT}}, \tag{2}$$

$$Q_{\text{OCT}} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -r_{\text{scope}}^{2} \end{bmatrix}, \tag{3}$$

$$Q_{\text{WL}} = \left(T_{\text{OCT}}^{\text{WL}}\right)^{-\top} Q_{\text{OCT}} \left(T_{\text{OCT}}^{\text{WL}}\right)^{-1} = \begin{bmatrix} Q_{3\times 3} & q \\ q^{\top} & q_{44} \end{bmatrix}, \tag{4}$$

$$p_{\text{RI}} = K\, T_{\text{WL}(i)}^{\text{RI}(i)}\, T_{\text{OCT}(i)}^{\text{WL}(i)}\, p_{\text{OCT}}. \tag{5}$$
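To illustrate the projective relationship of Eq. (2), the footprint boundary can be projected by sampling points on a circle of radius r_scope in the OCT frame and applying p_WL = K T_OCT^WL p_OCT; the intrinsics, pose, and radius below are hypothetical values, not those of the actual system:

```python
import numpy as np

def project_footprint(K, T_oct_wl, r_scope_mm, z_mm, n=64):
    """Project the circular OCT footprint boundary into the WLC image.

    Samples n points on a circle of radius r_scope_mm at depth z_mm in the
    OCT frame, maps them to WLC coordinates with the rigid transform
    T_oct_wl (4x4), and projects with the 3x3 camera matrix K.
    Returns an (n, 2) array of pixel coordinates.
    """
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    p_oct = np.stack([r_scope_mm * np.cos(theta),
                      r_scope_mm * np.sin(theta),
                      np.full(n, z_mm),
                      np.ones(n)])                 # homogeneous, 4 x n
    p_cam = (T_oct_wl @ p_oct)[:3]                 # 3 x n in the WLC frame
    p_img = K @ p_cam                              # pinhole projection
    return (p_img[:2] / p_img[2]).T

# Hypothetical intrinsics and pose (not taken from the paper).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)                                      # scope aligned with camera axis
pts = project_footprint(K, T, r_scope_mm=0.35, z_mm=10.0)
radius_px = np.linalg.norm(pts - [320.0, 240.0], axis=1).mean()
print(round(radius_px, 1))  # 800 * 0.35 / 10 = 28.0 px
```

With the scope axis aligned with the optical axis, the projected boundary is a circle of pixel radius f·r_scope/z, consistent with the inverse-distance trend measured in Sec. 4.2.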