
Automated segmentation of dermal fillers in OCT images of mice using convolutional neural networks

Open Access

Abstract

We present a system for automatic determination of the intradermal volume of hydrogels based on optical coherence tomography (OCT) and deep learning. Volumetric image data was acquired using a custom-built OCT prototype that employs an akinetic swept laser at ~1310 nm with a bandwidth of 87 nm, providing an axial resolution of ~6.5 μm in tissue. Three-dimensional data sets of a 10×10 mm skin patch comprising the intradermal filler and the surrounding tissue were acquired. A convolutional neural network using a u-net-like architecture was trained from slices of 100 OCT volume data sets where the dermal filler volume was manually annotated. Using six-fold cross-validation, a mean accuracy of 0.9938 and a Jaccard similarity coefficient of 0.879 were achieved.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Soft tissue augmentation describes the process of injecting dermal filler material into the skin. Most commercially available products consist of hyaluronic acid, a natural polymer which can be stabilized by chemical crosslinking [1]. While the most common application follows from the desire to reduce classic signs of aging, other fields of application include treatment of scars in accident victims and reducing the social stigma of HIV patients suffering from lipoatrophy. Dermal fillers provide only temporary enhancement, as they will be gradually absorbed by the surrounding tissue. A major aim in development of new filler formulations is therefore the evaluation of the residence time in tissue.

Optical coherence tomography (OCT) is a contactless imaging modality that can capture high resolution three-dimensional images of material specimens or tissue samples [2]. Analogous to ultrasound, images are acquired by measuring the magnitude and echo time delay of back-reflected or back-scattered light. OCT systems are able to achieve axial and lateral resolutions of a few micrometers. OCT was first demonstrated ex vivo in 1991 [3], while the first in vivo measurements were performed two years later on the posterior pole of the eye [4,5]. Today OCT is used in different medical fields, most often still in ophthalmology.

A promising field of application for OCT is dermatology, where it has two distinct virtues: While traditional dermatoscopy provides only a high-resolution en face view of the skin surface, OCT offers cross-sectional imaging revealing tissue morphology down to a depth of a few millimeters. Additionally, compared to the gold standard histology, its non-invasive character allows repeated applications to monitor progress over time without the need of taking tissue samples.

Medical imaging approaches facilitated a number of advances in medical care. For new modalities like OCT to become established in clinical routine, they need to provide a clear benefit in diagnosis and treatment of medical conditions. In some cases, this benefit might consist of presenting additional, previously not accessible information, e.g. displaying blood vessels by OCT-mediated angiography. In other cases, a new modality might enable precise measurements helping to quantitatively assess distinct medical conditions. For those applications computer-based analysis is usually a prerequisite. The classical approach for implementing the respective analysis software is to develop custom algorithms that capture relevant features. A segmentation algorithm, for example, might employ region growing, i.e. comparing pixels to their neighbors and clustering those that satisfy a given similarity criterion. A recent alternative method for segmentation tasks or pattern recognition often yielding superior results is machine learning with artificial neural networks. While early research on neural networks dates back to the 1950s [6], vital progress in computational capabilities has only been made in recent years. In 2012, Krizhevsky’s paramount success in improving image recognition by using a deep convolutional neural network [7] certainly can be regarded as a breakthrough, which led to convolutional neural networks even surpassing human-level performance at the ILSVRC (ImageNet Large Scale Visual Recognition Challenge), an annual competition evaluating algorithms that perform object detection and image classification at large scale, only three years later [8].

By now, neural networks have been successfully applied to various problems like speech recognition [9], autonomous car navigation [10], lip-reading [11] and playing chess [12]. Regarding medical imaging, neural networks have been used for making diagnoses, e.g. of age-related macular degeneration from retinal fundus images [13], and for disease prognosis, e.g. lung cancer prognosis from pathology slides [14]. Sometimes neural networks find completely new correlations, for example predicting age, gender, systolic blood pressure and risk of a cardiovascular event only from retinal fundus images [15]. Regarding segmentation tasks, the autoencoder-like u-net architecture [16] delivered impressive performance. In OCT, convolutional neural networks have been employed for segmentation of retinal layer boundaries [17,18], intraretinal fluid [19], subretinal fluid [20], retinal pigment epithelium detachment [21] and coronary lumen in intravascular OCT [22].

In the current paper, a custom-built OCT system for dermal applications able to non-invasively visualize soft tissue fillers is presented. Furthermore, a convolutional neural network capable of automatically segmenting the filler in OCT images with nearly the accuracy of a manual segmentation is introduced. Together, this enables repeated quantification of the volumes of intradermal filler depots and thus the determination of the absorption rate of the used filler material.

2. Materials and methods

2.1. OCT system

Intradermal volumes of soft tissue fillers were assessed using a custom-built OCT prototype. A schematic representation of the system is given in Fig. 1. The setup is based on an akinetic swept laser (Insight Photonic Solutions Inc., Lafayette, Colorado) with a central wavelength of ∼1310 nm. The laser produces a flat optical spectrum with a bandwidth of 87 nm, which results in a theoretical axial resolution of ∼9 μm in air, corresponding to ∼6.5 μm in tissue. The fiber-based system uses optical circulators (FOC-12N-111-9/125-SSS-1260/1360-40-3A3A3A-1-1-WB, OZ Optics Ltd., Ottawa, Ontario, Canada) to separate the two light propagation directions, i.e. the light coming from the laser and the light reflected or back-scattered from the sample or the reference mirror. On the input side, a fiber-optic coupler with a splitting ratio of 75:25 (TW1300R3A2, Thorlabs GmbH, Dachau, Germany) splits the light from the laser into sample and reference arms, with the sample arm receiving 75% of the power. In the sample arm, a fiber collimator with a focal length of 10 mm (60FC-4-M10-03, Schäfter + Kirchhoff GmbH, Hamburg, Germany) produces a free-space beam that is scanned across the sample by two galvanometer mirrors (6210HSM40, Cambridge Technology, Novanta Europe GmbH, Planegg, Germany) and focused on the sample by an f = 36 mm scan lens (Thorlabs GmbH). The lateral resolution was measured as ∼22 μm. The reference arm comprises a fiber collimator, a glass block compensating the dispersion introduced by the scan lens, a variable neutral density filter and a mirror. Polarization controllers are integrated into both sample and reference arms. Two circulators redirect the light returning from the sample and reference arms to a second fiber coupler with a 50:50 splitting ratio (TW1300R5A2, Thorlabs GmbH), where the light back-scattered and back-reflected from sample and reference is superimposed. A dual balanced photodetector (BPD-1, Insight Photonic Solutions Inc.) converts the interference signal into an analog electrical signal, which is then digitized using an ATS-9360 digitizer card (Alazar Technologies Inc., Pointe-Claire, Quebec, Canada). An FPGA (field programmable gate array, PCIe-7841R, National Instruments GmbH, Salzburg-Bergheim, Austria) controls the galvo mirrors. The synchronization between laser source, digitizer card and galvo mirrors is described in detail in [23]. The custom-built OCT setup achieves a sensitivity of >110 dB at a radiant power of 33 mW on the sample, which is well below the limit specified by the International Electrotechnical Commission [24].
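As a plausibility check of the resolution figures quoted above, the coherence-length formula for a Gaussian-shaped spectrum can be evaluated with the stated source parameters. The akinetic laser actually provides a flat rather than Gaussian spectrum, and the group refractive index of skin of n ≈ 1.35 is an assumption, so the numbers are indicative only:

$$\delta z_{\mathrm{air}} = \frac{2\ln 2}{\pi}\,\frac{\lambda_0^2}{\Delta\lambda} = \frac{2\ln 2}{\pi}\,\frac{(1310\ \mathrm{nm})^2}{87\ \mathrm{nm}} \approx 8.7\ \mu\mathrm{m}, \qquad \delta z_{\mathrm{tissue}} = \frac{\delta z_{\mathrm{air}}}{n} \approx \frac{8.7\ \mu\mathrm{m}}{1.35} \approx 6.5\ \mu\mathrm{m}$$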


Fig. 1 Schematic representation of the optical coherence tomography system.


The dermal OCT system was employed for repeated determination of the volume of intradermal fillers and for evaluation of their degradation rates. The exemplary image data presented below originates from a comparative study of different filler formulations in BALB/c mice. The study has been approved by the ethics committee of the Medical University of Vienna, Austria and was performed fully compliant with Austrian legislation.

2.2. Image acquisition

OCT acquisition was performed over a square skin patch of 10×10 mm. Each volumetric data set comprises 1536 OCT images recorded at 512 different positions, with each OCT image consisting of 512 A-scans. A-scans were calculated from spectra that contain approximately 1630 useful data points. To reduce speckle noise, the three cross-sectional images taken at each distinct position were averaged. Thereafter, the OCT images were resized in the axial direction to yield isometric voxel spacing. The final data sets had a size of 512×512×304 voxels.
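The acquisition and preprocessing pipeline can be summarized by the following minimal sketch. The array layout, axis order and linear interpolation are assumptions; only the frame counts, the averaging of three repeats per position and the final axial size are taken from the description above.

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess_volume(bscans, n_positions=512, n_repeats=3, target_depth=304):
    """Average repeated B-scans and resample the axial direction to
    isometric voxel spacing.

    `bscans` is assumed to have shape (n_positions * n_repeats, depth, n_ascans);
    this layout and the interpolation order are illustrative assumptions.
    """
    n_frames, depth, n_ascans = bscans.shape
    assert n_frames == n_positions * n_repeats

    # Group the three repeated frames per position and average them
    # to reduce speckle noise.
    averaged = bscans.reshape(n_positions, n_repeats, depth, n_ascans).mean(axis=1)

    # Resample only the axial (depth) axis so that the voxel spacing becomes
    # isometric; the lateral dimensions are left unchanged.
    axial_factor = target_depth / depth
    return zoom(averaged, (1.0, axial_factor, 1.0), order=1)
```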

2.3. Convolutional neural network architecture

A convolutional neural network was trained to automatically segment the dermal filler structure. Its architecture is based on the u-net published by Ronneberger et al. in 2015 [16] and is designed similarly to an autoencoder, but with additional skip connections (see Fig. 2). The network consists of a contracting path and an expanding path, which together form a "U" shape. The contracting path contains four downsampling steps, each of which is made up of two 3×3 convolutions with batch normalization and rectified linear units (ReLU), followed by a 2×2 maximum pooling layer that cuts the resolution in half. With every downsampling step, the number of feature channels is doubled. The expanding path mirrors the contracting path, with the maximum pooling layers replaced by upsampling layers followed by 2×2 convolutions that cut the number of feature channels in half. Before again applying two 3×3 convolutions, the current feature maps are concatenated with those from the corresponding downsampling step, thus providing additional high-resolution information through these skip connections. The combination of the main path providing high-level context and the four skip connections retaining local features provides excellent segmentation results.


Fig. 2 U-net-like architecture of the neural network. The contracting path forms the left side of the "U" and the expanding path the right side. The horizontal arrows are the skip connections. The numbers h×w×f denote the shape of the layer, with h the height and w the width of the image and f the number of feature channels.


A few modifications were implemented as compared to the originally published method [16]: Firstly, a batch normalization layer was introduced before every rectified linear unit (ReLU) activation layer. Batch normalization was first presented by Ioffe and Szegedy [25] at approximately the same time as u-net and is now widely used in convolutional neural networks. The aim of this technique is to keep the distributions of the inputs of each layer fixed. Without batch normalization, the means and variances of the layers’ inputs would change with every training iteration, requiring the weights of the layers to be continuously adapted. Batch normalization improves training stability and allows faster learning.

Secondly, the number of model parameters [26] was reduced: For the application presented here, a reduction of the number of feature channels per level from (64, 128, 256, 512, 1024) to (32, 64, 128, 256, 512) had no influence on segmentation accuracy but, due to the smaller model size, led to a reduction in training and prediction times.

Thirdly, image padding was used for all convolutions. Padding introduces additional pixels around the edges of the image so that the convolutions can use neighboring pixels even for pixels at the edge of the image. Ronneberger et al. [16] used unpadded convolutions because their large input images had to be split into smaller tiles anyway, so overlapping the tiles to avoid padding was a natural choice. The images in the current analysis are small enough to be processed in full at once, which makes padded convolutions necessary for the prediction to cover the whole image area.

Fourthly, since the network was trained for only two classes (selected structure and background), a sigmoid function was used instead of soft-max in the last layer of the network.

The convolutional neural network was implemented using the Keras library [27] with the TensorFlow backend [28].
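The following sketch illustrates the modified architecture in Keras notation: reduced feature channels (32 to 512), batch normalization before every ReLU, padded convolutions and a sigmoid output layer. The input shape, function names and exact layer arrangement are illustrative assumptions, not the authors' implementation.

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    """Two padded 3x3 convolutions, each followed by batch normalization
    and ReLU (the modifications to the original u-net described above)."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    return x

def build_unet(input_shape=(512, 304, 1), filters=(32, 64, 128, 256, 512)):
    """Minimal sketch of the modified u-net; the input shape is an assumption
    based on the 512x512x304 volume size quoted above."""
    inputs = layers.Input(shape=input_shape)
    skips, x = [], inputs

    # Contracting path: four downsampling steps, channels doubled each time.
    for f in filters[:-1]:
        x = conv_block(x, f)
        skips.append(x)
        x = layers.MaxPooling2D(pool_size=2)(x)

    x = conv_block(x, filters[-1])  # bottleneck

    # Expanding path: upsample, halve the channels, concatenate skip features.
    for f, skip in zip(reversed(filters[:-1]), reversed(skips)):
        x = layers.UpSampling2D(size=2)(x)
        x = layers.Conv2D(f, 2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = conv_block(x, f)

    # Two classes (filler vs. background): a single sigmoid output channel.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return Model(inputs, outputs)
```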

2.4. Training objective

In the analyzed dataset, the area to be segmented covered only approximately 11% of the total image area. A class imbalance like this may result in a network that only learns the more common class and ignores the minority class, which in the current case would mean a prediction of only background pixels. To avoid this effect, Ronneberger et al. introduced pixel weighting into their cross-entropy loss function, with the weights depending on both the class frequency and the distance to the nearest structure border. By using a loss function that depends only on the sizes of the ground truth and predicted structures, not on the total image size, we were able to forego the creation of weighting maps.

Our loss function is a differentiable version of the Jaccard distance. Differentiability is required for calculating the gradient, which is used to adjust the weights of the neural network during training. The Jaccard similarity coefficient of two sets A and B is defined as the intersection over the union:

$$J(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$

The Jaccard distance is obtained by subtracting the Jaccard coefficient from 1:

$$d_J(A,B) = 1 - J(A,B) = 1 - \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$

In order to be applied in a loss function, it needs to be made differentiable:

$$L = 1 - \frac{\sum_{x} l(x)\,p(x)}{\sum_{x} l(x) + \sum_{x} p(x) - \sum_{x} l(x)\,p(x) + \epsilon}$$

Here, $x \in \Omega$ with $\Omega \subset \mathbb{Z}^2$ denotes the pixel position, $l\colon \Omega \to \{0,1\}$ is the true label of each pixel and $p\colon \Omega \to [0,1]$ is the prediction of the network. In order to avoid division by zero, a small number $\epsilon > 0$ is added to the denominator.

We used stochastic gradient descent with a very small learning rate of $10^{-5}$, a high Nesterov momentum of 0.999 and a batch size of eight.
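A minimal Keras sketch of the differentiable Jaccard distance and the optimizer settings stated above is given below; the value of ε and the summation over the whole batch are assumptions.

```python
from tensorflow.keras import backend as K
from tensorflow.keras.optimizers import SGD

def jaccard_distance_loss(y_true, y_pred, eps=1e-7):
    """Differentiable Jaccard distance: the soft intersection sum_x l(x) p(x)
    replaces |A intersect B|; eps avoids division by zero (its value is an
    assumption, not taken from the paper)."""
    intersection = K.sum(y_true * y_pred)
    union = K.sum(y_true) + K.sum(y_pred) - intersection
    return 1.0 - intersection / (union + eps)

# Optimizer settings as stated in the text; the batch size of eight would be
# passed to model.fit() during training.
optimizer = SGD(learning_rate=1e-5, momentum=0.999, nesterov=True)
# model.compile(optimizer=optimizer, loss=jaccard_distance_loss)
```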

2.5. Training data and image augmentation

To acquire training data for the neural network, 100 volumetric OCT data sets recorded from 67 different dermal filler depots in 24 mice were manually annotated. Because of the rather long acquisition time of approximately 14 seconds, motion artifacts caused by the breathing of the mice appeared in the images. To minimize these artifacts, all slices of the volume image were registered using enhanced correlation coefficient maximization [29] implemented in OpenCV [30]. The registration algorithm was restricted to rigid transformations only, i.e. translation and rotation, in order to preserve the size of the filler. Figure 3 gives an example of a severe case of motion artifacts together with the corrected image.
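A sketch of how such a rigid, slice-wise registration could look with OpenCV's findTransformECC is shown below; the choice of the previously registered slice as reference and the termination criteria are assumptions, and some OpenCV versions additionally require an input mask and a Gaussian filter size argument.

```python
import cv2
import numpy as np

def register_slices(volume):
    """Rigidly align consecutive slices of one OCT volume using enhanced
    correlation coefficient (ECC) maximization [29,30].
    `volume` is assumed to be a float or integer array of shape (N, H, W)."""
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 100, 1e-6)
    registered = [volume[0]]
    for i in range(1, len(volume)):
        ref = registered[-1].astype(np.float32)
        mov = volume[i].astype(np.float32)
        warp = np.eye(2, 3, dtype=np.float32)
        # MOTION_EUCLIDEAN restricts the transform to rotation + translation,
        # so the apparent filler size is preserved.
        _, warp = cv2.findTransformECC(ref, mov, warp, cv2.MOTION_EUCLIDEAN, criteria)
        h, w = mov.shape
        aligned = cv2.warpAffine(mov, warp, (w, h),
                                 flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
        registered.append(aligned)
    return np.stack(registered)
```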


Fig. 3 (a) Example of strong motion artifacts. Image shows a cut through the volume in the slow scanning direction of the OCT system. (b) Same image after motion correction.


Dermal fillers were manually annotated in every 8th frame and interpolated in between using morphological closing, i.e. dilation followed by erosion. The neural network was trained on 2D slices of the volumetric data set in two orthogonal directions. Only images with a certain minimum filler size (5000 pixels, i.e. ∼14% of the maximum filler size) were used for training and 2D validation, because manual annotation was not pixel-perfect and slices that cut the filler depot close to the edge of the three-dimensional structure tended to have annotations of lower quality (particularly in the direction orthogonal to the original annotation planes). In most images, the dermal filler stretches over roughly half of the horizontal image plane in both lateral directions, resulting in on average nearly 2 × 250 annotated slices per volume image and 45509 annotated slices in total.
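The interpolation of the sparse manual annotations and the minimum-size filter might be sketched as follows; the layout of the mask volume and the structuring element of the morphological closing are assumptions, since only the closing operation itself and the 5000-pixel threshold are stated above.

```python
import numpy as np
from scipy.ndimage import binary_closing

def interpolate_annotations(sparse_mask, step=8):
    """Fill the gaps between manually annotated frames (every `step`-th frame)
    by morphological closing, i.e. dilation followed by erosion.
    `sparse_mask` is assumed to be a boolean volume of shape (frames, H, W);
    the structuring element is chosen just long enough to bridge one gap."""
    structure = np.ones((step + 1, 1, 1), dtype=bool)
    return binary_closing(sparse_mask, structure=structure)

def used_for_training(mask_slice, min_pixels=5000):
    """Slices whose annotated filler covers fewer than ~5000 pixels were
    excluded from training and 2D validation."""
    return mask_slice.sum() >= min_pixels
```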

Data augmentation, a process that generates new samples from existing training data, was performed in order to provide more variability than the training set originally encompassed. Using extensive image augmentation, Ronneberger et al. [16] were able to train their model from only 30 annotated transmission electron microscopy images of Drosophila neuronal structures. While our training data set was not as limited in size, many of the images of intradermal fillers are very similar, as they were recorded adjacent to each other; the main difference between adjacent images lies in the speckle noise pattern. Image augmentation allowed the network to be trained with a larger variance of shapes. We used Augmentor [31] to implement three kinds of transformations: mirroring, random distortion using a 10 × 6 displacement grid to increase shape variability, and random translation to avoid training the network to only recognize fillers near the center of the image area. Figure 4 presents several examples of image augmentation.
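For illustration, a library-free sketch of two of the three augmentations (mirroring and random translation, applied identically to image and mask) is given below; the authors used Augmentor [31], the 10 × 6 grid distortion is omitted here for brevity, and the shift range is an assumption.

```python
import numpy as np
from scipy.ndimage import shift

def augment_pair(image, mask, max_shift=40, rng=None):
    """Apply the same random mirroring and translation to an OCT slice and
    its annotation mask. `max_shift` (in pixels) is an illustrative value."""
    rng = rng or np.random.default_rng()

    # Occasional mirroring.
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]

    # Random translation so that the network does not only learn to
    # recognize fillers near the center of the image area.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    image = shift(image, (dy, dx), order=1, cval=0.0)
    mask = shift(mask.astype(np.uint8), (dy, dx), order=0, cval=0)
    return image, mask
```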


Fig. 4 Examples for image augmentation. Manual filler annotation in green. (a) Original image before augmentation. (b–d) Three examples for the same image with a mixture of random deformation, translation and occasional mirroring.


We applied six-fold cross-validation and split the 45509 annotated 2D images into six subsets of 6714 to 8036 images. Six networks were trained, each on a different five of the six subsets, with the remaining subset used for validation. Great care was taken to put all images of the same mouse into the same subset, even when recorded at different time points, so that network performance would always be validated only with images from mice other than those used for training. The results are presented as the averages over the six trained models.
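Grouping all slices of one mouse into the same fold can be approximated with scikit-learn's GroupKFold, as sketched below; the placeholder slice indices and mouse IDs are hypothetical, and the authors' exact splitting procedure is not specified.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical per-slice metadata: one entry per annotated 2D slice.
slice_indices = np.arange(45509)
mouse_ids = np.arange(45509) % 24   # placeholder assignment to 24 mice

# Six folds with all slices of one mouse confined to a single subset, so that
# validation is always performed on mice not seen during training.
gkf = GroupKFold(n_splits=6)
for fold, (train_idx, val_idx) in enumerate(gkf.split(slice_indices, groups=mouse_ids)):
    print(f"fold {fold}: {len(train_idx)} training / {len(val_idx)} validation slices")
```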

3. Results

We manually annotated 100 volumetric data sets for training and validation and applied six-fold cross-validation. The network was trained on 2D slices extracted from the OCT volume data sets in both the fast and the slow scan direction, excluding slices that showed no or only a very small filler area. Training was carried out for 62500 iterations with a batch size of eight, i.e. 500000 images were shown to the network during training. Because of the strong regularization introduced by image augmentation, we observed hardly any overfitting. Training a model from scratch took approximately six hours on an NVIDIA GeForce GTX 1080 Ti graphics card. Prediction of the intradermal filler in a single cross-sectional image required approximately 15 ms, resulting in a total processing time of approximately 8 s for one three-dimensional data set.

We calculated six different measures from the segmentations predicted by the neural networks: The accuracy, which is widely used for assessment of segmentation tasks, describes the proportion of correctly predicted voxels. For our data set, in which background pixels make up a large share of the volume and are easy to predict correctly, accuracy will usually yield numbers close to one, with only small differences between good and bad segmentations. Two alternatives that are more sensitive to the quality of the segmentation are the Dice coefficient (also known as the F1 score) and the Jaccard similarity coefficient (also known as intersection over union). Both are independent of the size of the correctly predicted background. Furthermore, sensitivity, i.e. the proportion of correctly predicted structure voxels, and specificity, i.e. the proportion of correctly predicted background voxels, are given. Finally, since the main focus was on the quantification of the volume of the dermal filler, the ratio of predicted to manually annotated volume was calculated.
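The six measures can be computed from binary prediction and ground-truth volumes as in the following sketch (function and variable names are illustrative only):

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Evaluation measures listed above, computed from two binary volumes
    of identical shape (prediction and manual annotation)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    tn = np.logical_and(~pred, ~truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    return {
        "accuracy":     (tp + tn) / (tp + tn + fp + fn),
        "dice":         2 * tp / (2 * tp + fp + fn),
        "jaccard":      tp / (tp + fp + fn),
        "sensitivity":  tp / (tp + fn),
        "specificity":  tn / (tn + fp),
        "volume_ratio": pred.sum() / truth.sum(),
    }
```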

The results of this analysis are summarized in Table 1 for three different kinds of evaluations. Firstly, we evaluated the networks on the same kind of images as used for training, i.e. 2D slices with a filler size above the threshold of 5000 pixels. Secondly, we present results for full 3D data sets including filler sizes below 5000 pixels, with the network applied on the different slices without any post-processing. As application of the network on slices in fast and slow OCT scan direction produced very similar metrics, the values given here are the averages over all slices in both directions. Thirdly, we applied the same networks in both directions independently and averaged the two predicted probabilities, which further improved the segmentation quality.


Table 1. Six-fold cross-validated results. Data are presented for 2D slices containing fillers of the minimum size used for training, for full 3D volumes with the network applied in one direction only (3D, one direction), and for full 3D volumes with the network applied in two directions independently and the predicted probabilities averaged to reduce false positives (3D, two directions). The last column (Human) presents the inter-operator variability between two human annotators. Numbers in brackets are the standard deviations calculated over 100 volumes (neural networks) or 10 volumes (human), respectively. TP: number of true positives, TN: number of true negatives, FP: number of false positives, FN: number of false negatives. Key results discussed in the text are highlighted in bold.

Considering the results for the 2D slices, all measures were calculated once for each of the six folds of cross-validation and then averaged to avoid outliers skewing the results. With respect to the results for the 3D volumes, all measures were calculated for each of the 100 volumes individually instead of once per fold. Averaging over the 100 volumes instead of the six folds enabled us to also calculate the standard deviations as a measure of statistical spread.

In order to compare the neural network results with human performance, a training data subset of 10 volumes was independently annotated by two equally experienced operators. The result of this inter-operator comparison is also given in Table 1.

Simple prediction by the neural network in one direction yielded a cross-validated accuracy of 0.9927 and a Jaccard similarity coefficient of 0.860. Using the technique of averaging two orthogonal predictions of the same networks, we were able to improve the accuracy to 0.9938 and the Jaccard similarity coefficient to 0.879. The deviation between predicted and manually annotated filler volumes was not significant. For comparison, the inter-operator analysis revealed an accuracy of 0.9959, a Jaccard similarity coefficient of 0.926 and a volume difference that was likewise not significant.

To visualize network performance, comparisons between manual and automatic dermal filler segmentations are given in Figs. 5–7. Figure 5 shows five typical examples. In the first one, image (a), manual annotation was performed in the orthogonal direction and thus resulted in an uneven surface line; in this case, the automatic result is superior to the manual one. For images (b) and (c), the neural network again provides satisfactory results. Example (d) shows a slightly too large filler area at the bottom left of the automatic segmentation, where the posterior border of the depot provided very little contrast compared to the artifacts in the OCT image. In image (e), the neural network is slightly distracted by a pattern within the dermal filler depot at one position of the posterior boundary, but still provides very accurate results for the rest of the boundary.


Fig. 5 Typical cross-sectional images with annotated filler. (a1)–(e1) OCT image. (a2)–(e2) Manual annotation. (a3)–(e3) Automatic segmentation by the neural network without any post-processing.



Fig. 6 Examples of insufficient automatic segmentation. (a1)–(c1) OCT image. (a2)–(c2) Manual annotation. (a3)–(c3) Automatic segmentation by the neural network without any post-processing. Images (a3) and (b3) give examples where the neural network did not correctly assess the posterior border of the filler area. Image (c3) correctly delineates the filler area, but also contains false positives.



Fig. 7 Worst-case examples of incorrect annotation of small filler areas. As images with a small filler area are mostly cut near the edge of the three-dimensional filler volume, the not-pixel-perfect manual segmentation shows the most frequent errors here. Therefore, images with a small filler area below a chosen threshold were excluded from training and 2D validation and included only in the final 3D validation set. (a1)–(b1) OCT image. (a2)–(b2) Manual annotation. (a3)–(b3) Automatic segmentation by the neural network without any post-processing.


Figure 6 gives three examples where the segmentation by the neural network exhibits larger errors: In images (a) and (b), the neural network is not able to correctly delineate the full posterior boundary and thus fails to capture some of the filler area. In image (c), the neural network successfully recognizes the filler, but also shows small false positive regions at the bottom of the image. For both kinds of mistakes, our technique of applying the neural network in two orthogonal directions and averaging the predictions very likely eliminates these inaccuracies.

For completeness, Fig. 7 also gives two examples of images with a very small filler area. Even though images of this filler size were not used for training as they more commonly exhibited an unsatisfactory manual annotation quality, the neural network is still able to correctly recognize the boundaries of these small fillers (Fig. 7(a3) and 7(b3)).

4. Discussion and conclusion

We built a custom OCT system able to visualize intra-dermally injected soft-tissue fillers. We then trained a convolutional neural network to segment the filler depots in OCT images recorded by our system.

Because OCT uses infrared light, the images have a limited depth range of 1–2 mm and the signal-to-noise ratio changes with depth. The penetration depth is limited by two effects, both of which depend on the wavelength used: While scattering of light in biological tissue decreases with increasing wavelength, water absorption roughly increases with wavelength in the near-infrared range [32]. The 1300 nm range, in which our OCT system operates, provides a comparatively high penetration depth [33]. Compared to other imaging technologies like ultrasound or X-ray, OCT images provide lower contrast and contain speckle noise that cannot easily be avoided. On the other hand, since our system uses an akinetic swept light source, it lacks the typical sensitivity roll-off of spectrometer-based OCT systems.

A custom-written segmentation algorithm for OCT images needs to explicitly deal with these limitations. In contrast, a machine learning approach has the advantage that the artificial neural network automatically learns to handle the characteristics of the images in the training data set. The main requirement for a machine learning approach is therefore a sufficiently large training data set that adequately represents the whole range of possible images. To meet this requirement, we manually annotated the dermal filler depots in 100 OCT volume data sets and applied image augmentation using random deformation, translation and mirroring. Because training is performed by the computer without human intervention, the network can learn a complicated set of rules to adapt to the wide range of imaging features appearing in in vivo images. An equally elaborate custom-written algorithm would be very difficult to realize and, if feasible at all, would take much longer to implement than a machine learning approach producing results of comparable quality.

Our convolutional neural network uses a modified u-net architecture and works with 2D slices of the recorded volumetric data sets. A three-dimensional extension of u-net has been demonstrated to work with confocal microscopy images of Xenopus kidney [34], but to keep the model size viable, the network had to work on downsampled versions of the original images and process the volumes in multiple small tiles instead of the full image at once. In contrast, two-dimensional neural networks can work with images of significantly higher resolution, but run the risk of misclassification due to the loss of context in the third dimension. As our segmentation task does not have a preferred direction, we were able to partially mitigate this disadvantage with the following technique: We trained our convolutional neural network on 2D images in two orthogonal directions and also performed predictions in both directions independently. The final probability was then calculated as the average over the two orthogonal predictions. Using this technique, we were able to reduce both false positives and false negatives and improve segmentation quality.
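A sketch of this dual-direction prediction is given below; the axis conventions, the 0.5 decision threshold and the input layout expected by the model are assumptions.

```python
import numpy as np

def predict_two_directions(model, volume, threshold=0.5):
    """Apply a trained 2D segmentation network to slices along the two
    lateral scan directions and average the predicted probabilities.
    `volume` is assumed to have shape (512, 512, 304) so that both slice
    orientations share the same slice shape."""
    # Slices along the first lateral direction.
    p_fast = model.predict(volume[..., np.newaxis], verbose=0)[..., 0]

    # Slices along the orthogonal lateral direction: swap the two lateral
    # axes, predict, and swap back so both probability volumes are aligned.
    swapped = np.swapaxes(volume, 0, 1)
    p_slow = model.predict(swapped[..., np.newaxis], verbose=0)[..., 0]
    p_slow = np.swapaxes(p_slow, 0, 1)

    # Average the two predictions and threshold to obtain the final mask.
    probability = 0.5 * (p_fast + p_slow)
    return probability > threshold
```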

In the results given in Table 1, sensitivity is the proportion of correctly predicted structure voxels [TP/(TP + FN)] and specificity the proportion of correctly predicted background voxels [TN/(TN + FP)]. While the numbers given for sensitivity are significantly lower than those for specificity, the only reason for this dissimilarity is the substantial difference between the number of filler and background pixels. In fact, the number of false positives is nearly equal to the number of false negatives.

Our data set originates from a pre-clinical study performed on dermal fillers in mice. Application in humans should also be possible provided that the fillers are injected at a depth within the penetration limit of the OCT probe beam. As the network architecture itself is not specifically optimized for dermal fillers, use of the method on other dermal structures such as cysts could be viable.

Manual annotation is a time-consuming process. On average, it took 20 minutes to manually segment the filler in one three-dimensional OCT data set. Application of the trained network for the same task takes approximately 16 seconds per volume. For a study involving 100 depots monitored over 10 time points to determine the absorption rate of the filler material, manual segmentation would be a full-time endeavor taking a few months. In addition, manual segmentation is too time-consuming to be applied clinically. In contrast, neural network processing is fast enough that it could be directly integrated into the image acquisition workflow and provide results in nearly real-time.

Our approach, which combines OCT with automated segmentation by a convolutional neural network to repeatedly quantify the residual volume of intradermal filler depots and thus determine the absorption rate of the applied filler material, has significant potential for future studies and clinical applications in this field.

Funding

Christian Doppler Research Association; Austrian Federal Ministry for Digital and Economic Affairs; National Foundation for Research, Technology and Development; Croma-Pharma GmbH, 2100 Leobendorf, Austria, as industrial partner of the Christian Doppler Laboratory for Ocular and Dermal Effects of Thiomers.

Acknowledgments

Different formulations of dermal fillers were kindly provided by Croma-Pharma GmbH. We thank Christine Hohenadl for critically reviewing the manuscript.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

References

1. J. L. Herrmann, R. K. Hoffmann, C. E. Ward, J. M. Schulman, and R. C. Grekin, “Biochemistry, Physiology, and Tissue Interactions of Contemporary Biodegradable Injectable Dermal Fillers,” Dermatol. Surg. p. 1 (2018).

2. W. Drexler and J. G. Fujimoto, eds., Optical Coherence Tomography: Technology and Applications (Springer International Publishing, 2015), 2nd ed. [CrossRef]

3. D. Huang, E. A. Swanson, C. P. Lin, J. S. Schuman, W. G. Stinson, W. Chang, M. R. Hee, T. Flotte, K. Gregory, C. A. Puliafito, and J. G. Fujimoto, “Optical coherence tomography,” Science 254, 1178–1181 (1991). [CrossRef]   [PubMed]

4. E. A. Swanson, J. A. Izatt, M. R. Hee, D. Huang, C. P. Lin, J. S. Schuman, C. A. Puliafito, and J. G. Fujimoto, “In vivo retinal imaging by optical coherence tomography,” Opt. Lett. 18, 1864–1866 (1993). [CrossRef]   [PubMed]  

5. A. F. Fercher, C. K. Hitzenberger, W. Drexler, G. Kamp, and H. Sattmann, “In Vivo Optical Coherence Tomography,” Am. J. Ophthalmol. 116, 113–114 (1993). [CrossRef]   [PubMed]  

6. F. Rosenblatt, “The perceptron, a perceiving and recognizing automaton,” Tech. Report 85-460-1, Cornell Aeronautical Laboratory, Inc. (1957).

7. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Advances in Neural Information Processing Systems, vol. 25 (2012), pp. 1090–1098.

8. K. He, X. Zhang, S. Ren, and J. Sun, “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,” arXiv:1502.01852 [cs] (2015).

9. G. Hinton, L. Deng, D. Yu, G. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, B. Kingsbury, and T. Sainath, “Deep Neural Networks for Acoustic Modeling in Speech Recognition,” IEEE Signal Process. Mag. 29, 82–97 (2012). [CrossRef]  

10. M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, X. Zhang, J. Zhao, and K. Zieba, “End to End Learning for Self-Driving Cars,” arXiv:1604.07316 [cs] (2016).

11. Y. M. Assael, B. Shillingford, S. Whiteson, and N. de Freitas, “LipNet: End-to-End Sentence-level Lipreading,” arXiv:1611.01599 [cs] (2016).

12. D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm,” arXiv:1712.01815 [cs] (2017).

13. P. M. Burlina, N. Joshi, M. Pekala, K. D. Pacheco, D. E. Freund, and N. M. Bressler, “Automated Grading of Age-Related Macular Degeneration From Color Fundus Images Using Deep Convolutional Neural Networks,” JAMA Ophthalmol. 135, 1170 (2017). [CrossRef]   [PubMed]  

14. K.-H. Yu, C. Zhang, G. J. Berry, R. B. Altman, C. Ré, D. L. Rubin, and M. Snyder, “Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features,” Nat. Commun. 7, 12474 (2016). [CrossRef]   [PubMed]  

15. R. Poplin, A. V. Varadarajan, K. Blumer, Y. Liu, M. V. McConnell, G. S. Corrado, L. Peng, and D. R. Webster, “Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning,” Nat. Biomed. Eng. 2, 158–164 (2018). [CrossRef]  

16. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, eds. (Springer International Publishing, 2015), pp. 234–241. [CrossRef]  

17. L. Fang, D. Cunefare, C. Wang, R. H. Guymer, S. Li, and S. Farsiu, “Automatic segmentation of nine retinal layer boundaries in OCT images of non-exudative AMD patients using deep learning and graph search,” Biomed. Opt. Express 8, 2732–2744 (2017). [CrossRef]   [PubMed]  

18. A. Shah, L. Zhou, M. D. Abrámoff, and X. Wu, “Multiple surface segmentation using convolution neural nets: Application to retinal layer segmentation in OCT images,” Biomed. Opt. Express 9, 4509–4526 (2018). [CrossRef]  

19. C. S. Lee, A. J. Tyring, N. P. Deruyter, Y. Wu, A. Rokem, and A. Y. Lee, “Deep-learning based, automated segmentation of macular edema in optical coherence tomography,” Biomed. Opt. Express 8, 3440–3448 (2017). [CrossRef]   [PubMed]  

20. T. Schlegl, S. M. Waldstein, H. Bogunovic, F. Endstraßer, A. Sadeghipour, A.-M. Philip, D. Podkowinski, B. S. Gerendas, G. Langs, and U. Schmidt-Erfurth, “Fully Automated Detection and Quantification of Macular Fluid in OCT Using Deep Learning,” Ophthalmology 125, 549–558 (2018). [CrossRef]  

21. Y. Xu, K. Yan, J. Kim, X. Wang, C. Li, L. Su, S. Yu, X. Xu, and D. D. Feng, “Dual-stage deep learning framework for pigment epithelium detachment segmentation in polypoidal choroidal vasculopathy,” Biomed. Opt. Express 8, 4061–4076 (2017). [CrossRef]   [PubMed]  

22. Y. L. Yong, L. K. Tan, R. A. McLaughlin, K. H. Chee, and Y. M. Liew, “Linear-regression convolutional neural network for fully automated coronary lumen segmentation in intravascular optical coherence tomography,” J. Biomed. Opt. 22, 126005 (2017). [CrossRef]  

23. Z. Chen, M. Liu, M. Minneman, L. Ginner, E. Hoover, H. Sattmann, M. Bonesi, W. Drexler, and R. A. Leitgeb, “Phase-stable swept source OCT angiography in human skin using an akinetic source,” Biomed. Opt. Express 7, 3032 (2016). [CrossRef]   [PubMed]  

24. “Safety of laser products - Part 1: Equipment classification and requirements (IEC 60825–1:2014),” Standard, International Electrotechnical Commission (2014).

25. S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” arXiv:1502.03167 [cs] (2015).

26. M. Thieme and T. Reina, “Biomedical Image Segmentation with U-Net,” https://ai.intel.com/biomedical-image-segmentation-u-net/ (2018).

27. F. Chollet, et al., “Keras,” https://keras.io (2015).

28. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems,” https://tensorflow.org (2015).

29. G. Evangelidis and E. Psarakis, “Parametric Image Alignment Using Enhanced Correlation Coefficient Maximization,” IEEE Trans. Pattern Anal. Mach. Intell. 30, 1858–1865 (2008). [CrossRef]

30. G. Bradski, “The OpenCV Library,” Dr. Dobb’s J. Softw. Tools (2000).

31. M. D. Bloice, C. Stocker, and A. Holzinger, “Augmentor: An Image Augmentation Library for Machine Learning,” arXiv:1708.04680 [cs, stat] (2017).

32. S. L. Jacques, “Optical properties of biological tissues: A review,” Phys. Medicine Biol. 58, R37–R61 (2013). [CrossRef]  

33. A. Aneesh, B. Považay, B. Hofer, S. V. Popov, C. Glittenberg, S. Binder, and W. Drexler, “Multispectral in vivo three-dimensional optical coherence tomography of human skin,” J. Biomed. Opt. 15, 026025 (2010). [CrossRef]  

34. Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3D U-Net: Learning dense volumetric segmentation from sparse annotation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, (Springer, 2016), pp. 424–432.
