
Physics-guided neural network for tissue optical properties estimation

Open Access

Abstract

Knowledge of tissue optical properties is essential for various biomedical diagnostic and therapeutic applications, such as monitoring of blood oxygenation, tissue metabolism, skin imaging, photodynamic therapy, low-level laser therapy, and photo-thermal therapy. Hence, more accurate and versatile optical properties estimation techniques have always been a primary interest of researchers, especially in the fields of bioimaging and bio-optics. In the past, most prediction methods were based on physics-based models such as the well-known diffusion approximation. More recently, with the advancement and growing popularity of machine learning techniques, most prediction methods have become data-driven. While both approaches have proven useful, each suffers from shortcomings that could be complemented by the other. There is thus a need to bring the two domains together to obtain superior prediction accuracy and generalizability. In this work, we propose a physics-guided neural network (PGNN) for tissue optical properties regression which integrates a physics prior and a physics-based constraint into an artificial neural network (ANN) model. The prediction accuracy and generalizability of the networks were evaluated on single-layered tissue samples simulated with Monte Carlo simulation, using two test datasets: an in-domain test dataset and an out-domain dataset. The PGNN showed superior generalizability over its pure ANN counterpart for both in-domain and out-domain prediction.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The optical properties of tissue are important for both diagnostic and therapeutic applications of light [1]. From the diagnostic perspective, the optical properties affect the reflectance or transmittance of light, which can be utilized for diagnostic purposes such as monitoring blood oxygenation, monitoring tissue metabolism, skin imaging, and tracking of particles in the tissue [2,3]. From the therapeutic perspective, the energy deposition of light in the tissue is the key to effective treatment strategies, such as photodynamic therapy, photo-thermal therapy, and low-level laser therapy [4,5]. For those therapeutic applications, accurate optical properties are required for effective and safe pre-treatment planning [6]. Hence, the research for a more accurate optical parameter estimation method has always been a primary interest of researchers. However, tissue optical properties measurement is difficult due to the intrinsically coupled nature of scattering and absorption [7]. Light transport in biological tissue is mainly characterized by scattering and absorption. Since scattering is the dominant transport process, the light propagation profile in biological tissue can be characterized by the reduced scattering coefficient ($\mu _s^{\prime}$) and the absorption coefficient (${\mu _a}$). Due to the importance of these properties in characterizing light propagation in tissue, numerous methods for estimating $\mu _s^{\prime}$ and ${\mu _a}$ have been proposed. Commonly, there are two approaches to estimating these properties: 1) estimating the optical properties by minimizing the difference between the measured spectra and the output of a forward model of light transport in tissue [8–10]; 2) using an inverse model which directly regresses the optical properties from the measured spectra [11–17].

The first method has been extensively researched in the past. This direction of research often involves modelling the forward model of light propagation in tissue and subsequently estimating the properties by minimizing the difference between the measured signal and the model output. Common forward models such as diffusion approximation (DA) and Monte Carlo (MC) simulation are often deployed to estimate optical properties from the diffuse reflectance/transmittance spectra obtained at the tissue boundary [9,10,18,19]. While Monte Carlo simulation is considered the gold standard for simulating light transport for a given set of optical properties [20–23], it is computationally expensive, and a long computation time is needed to reduce statistical noise. Hence, direct deployment of the Monte Carlo model for properties estimation is often considered impractical. The diffusion approximation, on the other hand, is more computationally tractable. It approximates light transport when the reduced scattering coefficient ($\mu _s^{\prime}$) is much larger than the absorption coefficient $({{\mu_a}} )$. Under this condition, scattering is the dominant transport process. Interestingly, multiple directional scattering steps can then be mimicked as an isotropic scattering process, and the underlying radiation transport equation (RTE) can be reduced to a simpler diffusion equation (DE) [24]. However, even when $\mu _s^{\prime} \gg \;{\mu _a}$, the diffusion theory is inaccurate near the light source, which can be a source of error for optical parameter estimation [25].

The second method, which directly models the inverse relationship between the measured signal (usually acquired at the tissue boundary for non-invasive measurements) and the optical properties, has recently gained popularity, mostly through machine learning. The analytical solution of the forward model is difficult to obtain, let alone an analytical inversion of the whole model. Hence, most inverse models are data-driven, spanning from empirically determined equations [26] to more advanced algorithms such as random forest regression [13], multi-layer perceptron (MLP) or artificial neural networks (ANN) [7,15–17], and convolutional neural networks (CNN) [12,14]. These models are often trained in a supervised manner, where labelled training data are used to fit a model that maps the input to the output by minimizing some cost function. Such surrogate models, especially those with highly complex architectures such as MLPs and CNNs, are often called ‘black box models’: they consist of large numbers of parameters whose relationships are highly complex and hence not interpretable to humans. Although this is a relatively convenient way to build the inverse model, one major limitation of black box models is the lack of consistency of their predictions with known laws of physics [27]. In the context of tissue optical properties estimation, it is entirely possible for such a model to predict negative values. These black box models also often show poor out-domain generalizability.

The physics-guided neural network (PGNN) has been shown to be an effective approach to incorporate physics knowledge into a neural network [27–32]. The physics knowledge can be incorporated either by using physics-based models’ results as additional input features [27,28,30] or by introducing a physics-based loss function to constrain the training process [27,29]. The resulting networks were shown to have superior prediction performance and better physical consistency compared to conventional machine learning models.

In this work, three physics-guided neural networks are proposed to improve the estimation accuracy of tissue optical properties. The proposed networks are multi-layer perceptron networks with four hidden layers which take diffuse reflectance as input and estimate $\mu _s^{\prime}$ and ${\mu _a}$. Diffuse reflectance is defined as the scattered light emitted from the tissue surface upon irradiation by a pencil light source, as shown in Fig. 1(a). The proposed method measures the diffuse reflectance at different radial locations on the tissue surface and can therefore estimate the optical properties non-invasively. Inspired by the PGNN concept [27], the networks also take diffusion approximation results as additional inputs, and a physics-based loss is introduced in the training process to regularize the networks for better physical consistency. The training and test datasets were obtained by Monte Carlo simulation using the open-source PyXOpto library in Python [33]. The dataset consists of matched-boundary, single-layer slabs whose optical properties fulfil the diffusion approximation criterion ($\mu _s^{\prime} \gg \;{\mu _a}$). To our knowledge, this is the first work to adopt the idea of PGNN in the field of tissue optical properties estimation. We demonstrate that the PGNN significantly improves estimation generalizability and that the incorporation of the physics-based loss enhances the physical consistency of the neural network.

Fig. 1. (a) The schematic of diffuse reflectance measurement at different radial locations upon pencil light beam excitation. (b) The simulation setup for dataset generation using Monte-Carlo simulation. The simulation geometry was set to fulfill the refractive-index-matched-boundary case where the ambient medium (water) and the tissue have the same refractive index. (c) Conversion of single-slab tissue diffuse reflectance measurement to diffuse reflectance measurement using diffusion approximation theory.

2. Methods

2.1 Dataset preparation

For the training of the network, we need datasets containing different optical properties and the corresponding diffuse reflectance. Since experimental data were not available, we used numerical simulations to generate the datasets. In these simulations, the tissue geometry was restricted to single-layer slabs of infinite width. The tissues were assigned a constant refractive index, and the non-scattering ambient medium and the scattering medium were assumed to share the same refractive index $n = 1.33$, as shown in Fig. 1(b). The light source was an infinitely narrow pencil beam normally incident on the single-layer slabs. This setting ensured that all simulated samples fulfilled the matched-boundary diffusion approximation conditions described in [25], so their optical properties could subsequently be estimated using the diffuse reflectance equation derived in [25]. Diffuse reflectance at 20 radial locations lying between 0 mm and 10 mm from the light source was recorded.

Two datasets, the in-domain dataset and the out-domain dataset, were created using Monte Carlo simulation. For the in-domain dataset, the reduced scattering coefficient $\mu _s^{\prime}$ was varied from 5–60 $c{m^{ - 1}}$ while ${\mu _a}$ was varied from 0.01–0.5 $c{m^{ - 1}}$. These values were chosen because they fulfill the diffusion approximation condition and are commonly found in tissues [26]. 50,000 Monte Carlo runs were performed to generate the in-domain dataset, with the optical properties for each run randomly sampled from the domain of interest. For each run, 1 million photons were simulated to ensure low statistical noise. The Monte Carlo simulations were performed using the open-source PyXOpto Python library [33], whose GPU-accelerated simulations greatly accelerated the sample preparation process. Running on an NVIDIA GeForce GTX 1650, it took around 5 hours to complete all 50,000 Monte Carlo simulations. For the out-domain dataset, $\mu _s^{\prime}$ was varied from 70–90 $c{m^{ - 1}}$ while ${\mu _a}$ was varied from 0.25–0.8 $c{m^{ - 1}}$. 2000 Monte Carlo runs were used to generate the out-domain dataset, following the same procedure as for the in-domain dataset.
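To make the sampling step concrete, the snippet below sketches how the optical property pairs could be drawn. It is illustrative only: the variable names are ours, and the actual Monte Carlo runs used the PyXOpto API, which is not reproduced here.

```python
# A minimal sketch of the random sampling of optical properties described above.
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_properties(n, mus_range, mua_range):
    """Uniformly sample n (mu_s', mu_a) pairs (cm^-1) from the given ranges."""
    mus = rng.uniform(*mus_range, size=n)
    mua = rng.uniform(*mua_range, size=n)
    return np.stack([mus, mua], axis=1)

in_domain = sample_properties(50_000, (5.0, 60.0), (0.01, 0.5))   # training domain
out_domain = sample_properties(2_000, (70.0, 90.0), (0.25, 0.8))  # generalization test

# Each sampled pair would then parameterize one PyXOpto Monte Carlo run
# (1e6 photons, matched-boundary single-layer slab, n = 1.33), recording
# the diffuse reflectance at 20 radial locations between 0 and 10 mm.
```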

The in-domain dataset was split into training and test sets with a ratio of 80:20, and the training set was further split into training and validation sets with a ratio of 80:20. A block-split strategy was used for both splits; the rationale is discussed in section 2.4. The out-domain dataset served as an additional test set that probes the networks’ ability to generalize beyond the sampled parameter domain.

2.2 Estimation of optical properties by diffusion approximation

The physics-based estimates of the tissue optical properties were obtained by minimizing the mismatch between the measured reflectance and the diffusion approximation equation described in [25]. As described in [25], the diffuse reflectance from a matched-boundary semi-infinite scattering medium in response to an infinitely narrow pencil beam illumination is given by:

$${R_d}(r) = \frac{a^{\prime} z^{\prime}(1 + \mu_{eff}\rho_1)\exp(-\mu_{eff}\rho_1)}{4\pi\rho_1^3} + \frac{a^{\prime}(z^{\prime} + 4D)(1 + \mu_{eff}\rho_2)\exp(-\mu_{eff}\rho_2)}{4\pi\rho_2^3}$$
where $a^{\prime} = \frac{\mu_s^{\prime}}{\mu_a + \mu_s^{\prime}}$ is the transport albedo, ${\mu_{eff}} = \sqrt{\frac{\mu_a}{D}}$ is the effective attenuation coefficient of light in response to an infinitely narrow beam illumination in an infinite homogeneous scattering medium, $D = \frac{1}{3(\mu_a + \mu_s^{\prime})}$ is the diffusion coefficient, $z^{\prime} = l_t^{\prime} = \frac{1}{\mu_a + \mu_s^{\prime}}$ is the transport mean free path, ${\rho_1} = \sqrt{r^2 + z^{\prime 2}}$ is the distance between the observation point $(r, 0, 0)$ and the original source point $(0, 0, z^{\prime})$, and ${\rho_2} = \sqrt{r^2 + (z^{\prime} + 4D)^2}$ is the distance between the observation point and the image source point $(0, 0, -z^{\prime} - 4D)$. The original source point and observation point are illustrated in Fig. 1(c).

As shown in Fig. 1(c), the diffuse reflectance can be approximated by diffusion approximation through several conversions. The unit-power pencil beam was converted to an equivalent isotropic point source (the original source) with power equal to the transport albedo $a^{\prime}$ at depth $z = l_t^{\prime}$. An additional image source (mirror-symmetric with the original point source) was added above the surface at $z = -({l_t^{\prime} + 2{z_b}})$, where ${z_b} = 2D$ for the matched-boundary case, consistent with the image source position $(0, 0, -z^{\prime} - 4D)$ above. Also, the scattering coefficient of the medium was converted to the reduced scattering coefficient $\mu _s^{\prime} = ({1 - g} ){\mu _s}$, with the light assumed to scatter isotropically in the converted medium. These conversions allow the diffuse reflectance of the tissue illuminated by a unit-power pencil beam to be evaluated directly with Eq. (1), which is derived based on them [25].

In short, Eq. (1) is derived from the diffusion approximation for an infinitely narrow photon beam normally incident on a semi-infinite scattering medium. It takes ${\mu _a}$, $\mu _s^{\prime}$ and the radial distance r as inputs and produces the relative diffuse reflectance at that distance. The optical properties can be estimated by minimizing the difference between the diffuse reflectance from the Monte Carlo simulation and that obtained through Eq. (1). In principle, the more accurate the estimated optical properties, the smaller the difference between the reflectance obtained through the two methods (MC and diffusion approximation).
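For concreteness, Eq. (1) can be implemented directly. The following is a minimal NumPy sketch, assuming consistent units (e.g. cm and cm⁻¹) and the matched-boundary definitions above; the function name is ours.

```python
import numpy as np

def diffuse_reflectance_da(r, mua, mus_r):
    """Eq. (1): matched-boundary DA diffuse reflectance at radial distance r.

    r     : radial distance(s) from the pencil beam (same length unit as 1/mua)
    mua   : absorption coefficient mu_a
    mus_r : reduced scattering coefficient mu_s'
    """
    mut_r = mua + mus_r                        # reduced total attenuation
    a = mus_r / mut_r                          # transport albedo a'
    D = 1.0 / (3.0 * mut_r)                    # diffusion coefficient
    mueff = np.sqrt(mua / D)                   # effective attenuation coefficient
    z = 1.0 / mut_r                            # z' = l_t', depth of the isotropic source
    rho1 = np.sqrt(r**2 + z**2)                # distance to the original source
    rho2 = np.sqrt(r**2 + (z + 4.0 * D)**2)    # distance to the image source
    term1 = a * z * (1 + mueff * rho1) * np.exp(-mueff * rho1) / (4 * np.pi * rho1**3)
    term2 = a * (z + 4.0 * D) * (1 + mueff * rho2) * np.exp(-mueff * rho2) / (4 * np.pi * rho2**3)
    return term1 + term2
```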

In this work, the optimization was performed by minimizing the mean squared log error (MSLE) between the diffusion approximation reflectance and the Monte Carlo reflectance. The MSLE is defined as follows:

$$\frac{1}{n}\sum_{i=1}^{n} \left( \log({R_{MC}} + 1) - \log({R_{DA}} + 1) \right)^2$$

The rationale for choosing MSLE instead of the more commonly used mean absolute error (MAE) or mean squared error (MSE) was to prevent the relatively large reflectance near the light source from saturating the loss value. In this context, MSLE behaves more desirably because it measures the relative error between the two signals, as shown in Eq. (3), and hence gives more equal weight to all measured reflectance values.

$$\frac{1}{n}\sum_{i=1}^{n} \left( \log\frac{{R_{MC}} + 1}{{R_{DA}} + 1} \right)^2$$

The minimization was performed using the differential evolution algorithm provided by the open-source Python library SciPy. Differential evolution was chosen for two reasons: 1) it has exhibited good performance in a variety of optimization problems and makes no assumptions about the problem being optimized; 2) it is a global optimizer, which helps reduce the risk of settling in poor local minima, although global convergence is not always guaranteed [34,35].
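A minimal sketch of this fitting step, reusing the diffuse_reflectance_da helper above, might look as follows. The bounds shown and the default optimizer settings are illustrative assumptions, not the exact values used in this work.

```python
from scipy.optimize import differential_evolution
import numpy as np

def msle(r_mc, r_da):
    """Eq. (2)/(3): mean squared log error between MC and DA reflectance."""
    return np.mean((np.log(r_mc + 1.0) - np.log(r_da + 1.0)) ** 2)

def fit_da(radii, r_mc, mus_bounds=(5.0, 60.0), mua_bounds=(0.01, 0.5)):
    """Estimate (mu_s', mu_a) by globally minimizing the MSLE with
    differential evolution. In this work, reflectance within ~1.5 mm of the
    source was omitted from the fit (see section 3.1); that masking is not
    shown here.
    """
    def objective(params):
        mus_r, mua = params
        return msle(r_mc, diffuse_reflectance_da(radii, mua, mus_r))

    result = differential_evolution(objective, bounds=[mus_bounds, mua_bounds], seed=0)
    return result.x  # (mu_s', mu_a) estimate
```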

2.3 PGNN implementation and physics-based loss

Since one aim of this work is to incorporate the physics prior obtained through diffusion approximation into the neural network, three PGNN variants inspired by [27] were adopted. The pure artificial neural network (ANN), which takes no DA inputs, was set as the baseline model; its detailed architecture is discussed in section 2.4. With the pure ANN as the model backbone, the three proposed PGNN variants are shown in Fig. 2.

Fig. 2. Schematic of the physics guided neural networks. ANN network is a conventional neural network which takes Monte-Carlo diffuse reflectance as input to estimate the optical properties. Model D1, D2, D3 are variants of the pure ANN to incorporate optical properties estimated using the diffusion approximation. Model D1 takes the DA-estimated optical properties as additional inputs. Model D2 adds the DA-estimated optical properties to the output regressed by ANN to give final outputs. Model D3 is a combination of D1 and D2.

Model D1 is a hybrid-physics-data model which takes the DA estimates as additional input features. As the DA-based diffuse reflectance deviates from the MC-based diffuse reflectance [25], which can lead to estimation error, it was envisioned that this error could be compensated by the data-driven optimization, while the incorporation of the DA prior could improve the model’s physical consistency. The second model, Model D2, is called the “residual model” [27]: instead of modelling the optical properties directly, the neural network models the residual of the diffusion approximation estimate. The third model, Model D3, can be seen as a combination of D1 and D2. It models the residual of the diffusion approximation as in D2, but with the diffusion approximation results cascaded as additional input features as in D1. A sketch of the three variants is given below.
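The following PyTorch sketch illustrates how the three variants could be wired. The class names (Backbone, ModelD1, etc.) are ours, and the backbone dimensions follow section 2.4 (20 reflectance inputs, four hidden layers of 10 nodes); this is an illustration of the wiring, not the authors' exact code.

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Small MLP backbone shared by all variants (see section 2.4)."""
    def __init__(self, n_in, n_hidden=10, n_layers=4, n_out=2):
        super().__init__()
        layers, width = [], n_in
        for _ in range(n_layers):
            layers += [nn.Linear(width, n_hidden), nn.ReLU()]
            width = n_hidden
        layers.append(nn.Linear(width, n_out))  # outputs [mu_s', mu_a]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class ModelD1(nn.Module):
    """Hybrid-physics-data: DA estimates appended to the reflectance input."""
    def __init__(self, n_reflectance=20):
        super().__init__()
        self.backbone = Backbone(n_reflectance + 2)

    def forward(self, reflectance, da_estimate):
        return self.backbone(torch.cat([reflectance, da_estimate], dim=-1))

class ModelD2(nn.Module):
    """Residual model: the network corrects the DA estimate."""
    def __init__(self, n_reflectance=20):
        super().__init__()
        self.backbone = Backbone(n_reflectance)

    def forward(self, reflectance, da_estimate):
        return da_estimate + self.backbone(reflectance)

class ModelD3(nn.Module):
    """Combination of D1 and D2: DA estimate as extra input and residual base."""
    def __init__(self, n_reflectance=20):
        super().__init__()
        self.backbone = Backbone(n_reflectance + 2)

    def forward(self, reflectance, da_estimate):
        return da_estimate + self.backbone(torch.cat([reflectance, da_estimate], dim=-1))
```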

In addition, a physics-based loss was included in the training loss to embed domain knowledge into the networks’ training process. Specifically, since the optical properties are always positive, a simple ReLU function was used to penalize negative estimates, as such predictions violate the strictly positive nature of tissue optical properties. The physics-based loss is as follows:

$$Loss_{PHY}({\mu_s^{\prime}, {\mu_a}}) = \frac{1}{2}\left( ReLU({-\mu_s^{\prime}}) + ReLU({-{\mu_a}}) \right)$$
where the ReLU function is:
$$ReLU(x) = \begin{cases} x, & \text{if } x > 0 \\ 0, & \text{if } x \le 0 \end{cases}$$
As shown in Eq. (4), the negative signs inside the ReLU functions invert the sign of the predictions, so only negative predictions produce non-zero loss values and are hence penalized. A minimal PyTorch sketch of this penalty is given below.
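Here, loss_phy is an illustrative name, and the averaging over the batch is our assumption; Eq. (4) itself is stated per sample.

```python
import torch

def loss_phy(pred):
    """Eq. (4): physics penalty for a batch of predictions.

    pred: tensor of shape (batch, 2) holding [mu_s', mu_a] estimates.
    ReLU(-x) is nonzero only for negative predictions, which are penalized.
    """
    return 0.5 * torch.relu(-pred).sum(dim=-1).mean()
```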

2.4 Baseline model and experimental design

In this study, three preliminary experiments were performed to determine, respectively, the optimal number of hidden layers, the optimal block-splitting strategy for the dataset split, and the optimal hyperparameter for the physics-based training loss. Subsequently, two experiments were performed to evaluate, respectively, the effect of incorporating diffusion approximation results into the network and the effect of incorporating the physics loss in the training process. To ensure consistency and interpretability of the results, the hyperparameters for all five experiments were kept the same and are listed in Table 1.

Table 1. The training hyperparameters for all experiments in this study.

As described in section 2.3, the pure ANN was set as the baseline model of this study, and the three PGNN variants were evaluated against it. All four networks adopt the same backbone structure, so it was important to design a good baseline ANN. As the problem domain in this study was relatively small and the number of training samples was only around 50,000, the ANN was kept small to avoid overfitting. To determine the optimal network architecture, a preliminary experiment was performed by setting the number of nodes in each hidden layer to 10 and varying the number of hidden layers from 3 to 7. At this stage, the networks were trained with the in-domain dataset split by 0.125 block spacing. For all five model architectures, the total number of parameters was kept below 1000. Each architecture was run 5 times to account for the stochastic nature of neural network optimization. The ANN with 4 hidden layers was found to be optimal in terms of in-domain performance and out-domain generalizability (see Fig. S1 in Supplement 1) and was hence chosen as the baseline model. The sketch below illustrates this parameter budget.
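Reusing the Backbone sketch from section 2.3, the depth sweep and its parameter budget can be checked as follows; the printed counts assume the 10-node hidden widths stated above.

```python
# Illustrative depth sweep: 10 nodes per hidden layer, 3-7 hidden layers,
# total parameters kept below 1000 in every configuration.
for n_layers in range(3, 8):
    model = Backbone(n_in=20, n_hidden=10, n_layers=n_layers, n_out=2)
    n_params = sum(p.numel() for p in model.parameters())
    print(n_layers, n_params)  # with these widths, 4 hidden layers -> 562 parameters
```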

Next, we performed another preliminary experiment to determine the optimal block split for our in-domain dataset. The block-splitting strategy was used to avoid data leakage between the train, validation, and test datasets. As the sampling frequency was dense relative to the sampling domain, uniform random splitting as shown in Fig. 3(a) would lead to serious data leakage. More precisely, the train and test datasets would be highly correlated, leading to highly optimistic evaluations of predictive power [36,37]. This would mask overfitting, as superior test-set results would be obtained due to the correlation between the train and test sets. The most direct consequence of this flaw would be the inability of the trained networks to predict inputs originating beyond the initial dataset. To avoid this, a block-splitting strategy was applied across the in-domain dataset. Splitting strategies with different block spacings (0.025, 0.05, 0.075, 0.1, 0.125, 0.15, 0.175, 0.2, as shown in Fig. S2 in Supplement 1) relative to the normalized parameter range, where each parameter was normalized by min-max normalization with respect to its minimum and maximum values, were evaluated. The splitting was done in the normalized parameter domain instead of the real parameter domain because ${\mu _a}$ and $\mu _s^{\prime}$ differ greatly in scale, so each parameter would contribute differently to the spatial distance and yield a non-optimal split. Each strategy was evaluated by training a 4-hidden-layer neural network 5 times and evaluating the in-domain and out-domain test set performance (as shown in Fig. S3 in Supplement 1). The spatial split strategy with 0.05 spacing, shown in Fig. 3(b), was chosen as it yielded the best performance for both in-domain and out-domain test sets. A minimal sketch of such a block split follows.
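One plausible implementation of the block split is sketched below. The random block-to-set assignment here is an assumption for illustration; the actual assignment used in this work is the one shown in Fig. 3(b).

```python
import numpy as np

def block_split(params, spacing=0.05, rng=None):
    """Assign samples to train/val/test by blocks in normalized parameter space.

    params : (n, 2) array of (mu_s', mu_a) values.
    Returns one of 'train'/'val'/'test' per sample. The per-block random
    assignment below is illustrative only.
    """
    rng = rng or np.random.default_rng(0)
    # Min-max normalize each parameter so both contribute equally to the grid.
    norm = (params - params.min(0)) / (params.max(0) - params.min(0))
    block_id = np.floor(norm / spacing).astype(int)         # 2-D block indices
    _, flat = np.unique(block_id, axis=0, return_inverse=True)
    flat = flat.ravel()
    n_blocks = flat.max() + 1
    # 64:16:20 overall ratio (80:20 split, then 80:20 of the training part).
    labels = rng.choice(['train', 'val', 'test'], size=n_blocks, p=[0.64, 0.16, 0.2])
    return labels[flat]
```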

Fig. 3. (a) Uniform random split would lead to significant data leakage between the train, validate and test data set. This made the test set evaluation to be inappropriate as the spatially correlated data would lead to an overly optimistic interpretation. (b) The final spatial block split for the overall dataset. This set of train, validation and test data were used across all experiments in this work. The optical parameters ${\mu _a}$ and $\mu _s^{\prime}$ were normalized using min-max normalization with respect to their respective minimum and maximum values.

With the baseline model and data-splitting strategy determined, two experiments were conducted to evaluate whether the incorporation of domain knowledge improves model performance and generalizability. The first experiment evaluated the effect of incorporating the physics prior without the physics-based loss: all four network architectures, the pure ANN and the PGNNs, were trained solely with the empirical mean squared error (MSE) loss. Each architecture was trained 30 times to account for the stochastic nature of neural network training, and the mean and standard deviation of the performance metrics across the 30 runs are reported [38]. In the second experiment, the effect of the physics-based loss was evaluated: the training loss was replaced with Eq. (6) with hyperparameter $\lambda = 0.7$ (a preliminary experiment was performed to determine this optimum, as shown in Fig. S4 in Supplement 1). The training protocol was otherwise the same as in the first experiment. For both experiments, the hyperparameters were set as shown in Table 1.

$$\begin{array}{c} Loss = \lambda \;Los{s_{Emp}} + ({1 - \lambda } )\;Los{s_{PHY}}\\ Loss = \frac{1}{N}\mathop \sum \limits_{i = 1}^N \lambda {({y - \hat{y}} )^2} + ({1 - \lambda } )\frac{1}{2}({ReLU({ - \mu_s^{\prime}} )+ ReLU({ - {\mu_a}} )} ) \end{array}$$
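Combining Eq. (6) with the loss_phy sketch above, one training step might look as follows. The Adam optimizer and the dummy tensors are assumptions for the sake of a runnable example; the actual hyperparameters are those of Table 1.

```python
import torch

def combined_loss(pred, target, lam=0.7):
    """Eq. (6): weighted sum of the empirical MSE and the physics penalty of Eq. (4)."""
    loss_emp = torch.mean((target - pred) ** 2)
    return lam * loss_emp + (1.0 - lam) * loss_phy(pred)

model = ModelD2()  # residual PGNN variant from section 2.3
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

reflectance = torch.rand(32, 20)  # dummy batch of 20-point reflectance curves
da_estimate = torch.rand(32, 2)   # dummy DA property estimates
target = torch.rand(32, 2)        # dummy ground-truth (mu_s', mu_a)

pred = model(reflectance, da_estimate)
loss = combined_loss(pred, target, lam=0.7)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```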

It is worth mentioning that the stochasticity of the experiments came from the random weight initialization and the random mini-batch optimization process. All the neural networks, in both the preliminary and the main experiments, were trained using the PyTorch library (version 1.13.0) on an NVIDIA GeForce GTX 1650 GPU; one round of training and evaluation took around 10 minutes.

2.5 Evaluation metrics

The networks were evaluated with the metrics below on both the in-domain dataset and the out-domain dataset. The in-domain dataset reflects the networks’ prediction performance, while the out-domain dataset reflects their generalizability.

$\boldsymbol{\mu }_{\boldsymbol{s}}^{\prime},\;{\boldsymbol{\mu }_{\boldsymbol{a}}}$ mean relative error: the average of the relative error of the respective parameter across the whole test set. This metric was chosen instead of RMSE because it weighs data of all ranges equally, whereas RMSE would mask the high relative error of small-valued predictions.

$\boldsymbol{\mu }_{\boldsymbol{s}}^{\prime},\;{\boldsymbol{\mu }_{\boldsymbol{a}}}$ number of negative values predicted: a measure of the networks’ physical inconsistency. As the optical properties can only be positive, negative values predicted by the networks are treated as violations of the underlying physics. Both metrics amount to the short functions sketched below.
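The function names are illustrative:

```python
import numpy as np

def mean_relative_error(pred, truth):
    """Average relative error of one parameter across the whole test set."""
    return float(np.mean(np.abs(pred - truth) / truth))

def count_negative(pred):
    """Number of physically inconsistent (negative) predictions."""
    return int(np.sum(pred < 0))
```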

3. Results and discussion

3.1 Optical properties estimation using diffusion approximation

Because the selected data domain adheres to the diffusion approximation (DA) conditions, reasonably good results were obtained with the DA minimization method, especially for the prediction of $\mu _s^{\prime}$. Evaluated on the whole dataset, the average relative error was 4.09% for $\mu _s^{\prime}$ and 9.55% for ${\mu _a}$. For $\mu _s^{\prime}$, the maximum relative error was around 20% and occurred in the range of 10–20 $c{m^{ - 1}}$, as shown in Fig. 4(a) and Fig. S6(a) (Supplement 1). However, the method did not perform equally well in predicting ${\mu _a}$: the maximum relative error was around 350% (Fig. S6(b) in Supplement 1) and occurred at small values of ${\mu _a}$. This was expected, as a small deviation produces a large relative error for small ground-truth values. For large ${\mu _a}$ ground truths, the spread of the deviation was large, as shown in Fig. 4(b), indicating that the diffusion approximation could not accurately predict ${\mu _a}$ and that there was room for a data-driven method to complement it. To enable a fair comparison with the subsequent neural network evaluations, the test set data with 0.05 block spacing were isolated and evaluated. As the test set is a good representation of the whole data domain, the performance metrics were similar to those of the whole dataset: the DA method had an average relative error of 4.75% for $\mu _s^{\prime}$ and 8.98% for ${\mu _a}$ on the in-domain test set.

Fig. 4. (a) Prediction of $\mu _s^{\prime}$ obtained through diffusion approximation minimization versus the ground truth. (b) Prediction of ${\mu _{a}}$ obtained through diffusion approximation minimization versus the ground truth. The 45-degree yellow line corresponds to perfect prediction.

It is also important to note that the prediction error stems from the reflectance deviation between the diffusion approximation model and the Monte Carlo simulation model. The DA signals evaluated with the minimized results deviated less from the MC signals than DA signals evaluated with the ground-truth properties, indicating that the minimization algorithm performed reasonably well and that the error indeed came from the deviation between DA and MC modelling of diffuse reflectance. Note that the diffusion approximation has an inherent limitation: it is inaccurate at distances from the light source shorter than one transport mean free path, $l_t^{\prime} = \frac{1}{\mu_t^{\prime}} = \frac{1}{\mu_a + \mu_s^{\prime}}$. Hence, the reflectance at the first few radial locations (0–1.5 mm) was omitted in the optimization process. However, those locations carry rich information, as they contain strong signal responses; omitting them reduced the prediction accuracy of the DA model.

Hence, we propose that a data-driven prediction method could complement the DA model by correcting two of its main limitations: 1) a data-driven model can exploit the near-source reflectance to better differentiate the optical properties; 2) a data-driven model can leverage ground-truth training samples to compensate for the reflectance deviation between the two models.

3.2 Effects of incorporating physics prior to ANN

Figure 5 shows the neural networks’ prediction performance over 30 different runs. The x-axis corresponds to the baseline neural network and its physics-guided variants as explained in section 2.3, while the y-axis shows the relative error of the network prediction for the respective parameter. Note that the midpoint numbers and the error bars represent the mean and standard deviation of the whole test set’s mean relative error across 30 runs. In other words, each trained network yields one total mean relative error per optical property, obtained by averaging across the whole test dataset, and Fig. 5 shows the mean and standard deviation of this quantity across 30 different trained networks, not the mean and standard deviation of the relative error across the test dataset for a single trained network. It is also important to note that the DA method, whose performance for the respective parameters is shown as green triangular markers in Fig. 5, is deterministic; it therefore has a single performance value per parameter.

Fig. 5. The mean relative errors between predicted properties and the ground truths. Top row (a,c) corresponds to the mean relative error of $\mu _s^{\prime}$ for both in-domain and out-domain data. Bottom row (b,d) corresponds to the mean relative error of ${\mu _a}$ for both in-domain and out-domain data. The midpoint numbers and error bars correspond to mean and standard deviation of relative errors for each specified model. In all figures, the green markers and lines correspond to the relative error of DA method.

As shown in Fig. 5, the neural networks outperformed the diffusion approximation method on in-domain data, even the pure ANN driven solely by empirical data. On in-domain data, the ANN reduced the mean prediction error by 66.38% in $\mu _s^{\prime}$ and 34.07% in ${\mu _a}$ compared to the DA method. However, the pure ANN had a large performance variance across the 30 runs, which might result in inconsistent prediction accuracy under different weight initializations. Moreover, the pure ANN showed low prediction accuracy on out-domain data, performing worse than the DA method with almost 2 times and 3 times more error in $\mu _s^{\prime}$ and ${\mu _a}$, respectively. This indicates low generalizability of the network towards out-domain data. Incorporating the DA prior into the network architecture proved beneficial for both in-domain and out-domain prediction. Specifically, all three PGNN variants reduced the performance variability for both in-domain and out-domain predictions, showing that the physics information provides a more uniform training result regardless of the choice of weight initialization. More importantly, all three PGNN variants increased the out-domain generalizability of the network. Quantitatively, the PGNN models D1, D2 and D3 reduced the prediction error by 32.28%, 76.99% and 48.26% in $\mu^{\prime}_s$ and 50.54%, 83.19% and 62.21% in ${\mu _a}$, respectively, compared to the ANN. Although all three networks improved performance, they did so to distinctly different degrees, which raises the question of which PGNN variant should be considered the best candidate within the scope of this study.

In this experiment, Model D2 appeared to be the best PGNN variant for predicting the optical properties. It was the only neural network that outperformed the diffusion approximation in all four scenarios (2 properties and 2 test sets). While its in-domain results were comparable to those of the other variants, it was much more accurate on out-domain data. Interestingly, it does not use the physics prior as an additional network input; it is a residual modelling strategy that captures the residual of the physics-based model instead of estimating the complete functional mapping from input to output [27]. This might indicate two things: 1) modelling the systematic bias of the diffusion approximation is easier than modelling the whole inverse mapping; 2) Model D3, as a fusion of D1 and D2, did not show superior out-domain generalizability, possibly because using the physics prior as an input driver increases variance relative to Model D2, leading to better in-domain prediction but worse out-domain generalizability.

More interestingly, as shown in Fig. 6, incorporating the physics prior significantly reduced the physical inconsistency of the neural network predictions. Across 10,000 in-domain test set predictions, on average fewer than 5 predictions were negative for all PGNN variants, while the pure ANN produced significantly more negative predictions. Note that the networks at this stage were trained solely with the empirical MSE loss; it is therefore notable that incorporating the physics prior alone already improved the networks’ physical consistency.

Fig. 6. The number of negative values predicted by each neural network variant across 30 runs. The numbers represent the mean while the error bar represents the standard deviation. Note that the numbers correspond to the occurrence of negative predictions across 10,000 in-domain predictions and 2,000 out-domain predictions. In all figures, the green markers and lines correspond to the number of negative predictions made by DA method.

3.3 Effects of incorporating physics loss into training process

Incorporating the physics prior boosted neural network performance and generalizability; however, the effect of incorporating physics knowledge through a physics-based loss function had not yet been evaluated. As explained in section 2.3, a ReLU-based loss function was introduced as our physics-based loss to penalize negative predictions, which are treated as physical inconsistencies. In this second experiment, the ANN and all PGNN variants were trained with the loss function of Eq. (6) with hyperparameter $\lambda$ equal to 0.7. As shown in Fig. 7, there was no substantial improvement in prediction accuracy; in some cases, performance even deteriorated. This might be because the number of negative predictions without the physics-based loss was already relatively small, so penalizing them did not yield a perceivable boost in performance. On the other hand, $\lambda = 0.7$ reduces the weight of the MSE term relative to the loss function without the physics-based loss; this slowed training, so more iterations would be needed to reach the same performance as the counterparts trained without the physics-based loss. As the maximum number of iterations was set to 250, this might have prematurely truncated the training of the networks trained with the physics-based loss. Nonetheless, as shown in Fig. 7(c), the physics-based loss did push the networks towards fewer negative predictions, making them more physically consistent and justifying the usefulness of the ReLU penalty for enhancing neural network physical consistency in the context of optical properties estimation.

Fig. 7. The prediction performance of the neural network across 30 runs. The annotated numbers are the mean while error bars are the standard deviation of respective metrics. The green lines (with square markers) correspond to the prediction performance of neural networks trained with physics-incorporated loss function. The lines with other colours (with round markers) are prediction performance of neural networks trained solely with empirical loss function MSE. (a,d) The mean relative error for $\mu _s^{\prime}$ for both in-domain and out-domain data. (b,e) The mean relative error for ${\mu _a}$ for both in-domain and out-domain data. (c,f) The number of negative predictions predicted by different network variants tested on in-domain and out-domain data, respectively.

4. Conclusion and future work

In this work, we adopted the idea of physics-guided neural networks in the field of tissue optical properties estimation. Specifically, we adopted the three PGNN variants described in [27] to incorporate diffusion approximation results as a prior into the artificial neural network architecture. Moreover, a physics-based loss relying on the ReLU function to penalize negative predictions was introduced to enhance the physical consistency of the networks. The PGNNs were shown to give better prediction performance on in-domain data than the diffusion approximation method and much better generalizability on out-domain data than the pure ANN. They also showed better physical consistency, producing fewer negative predictions than their ANN counterpart. These benefits come with negligible computational cost, as training both the ANN and the PGNN took around the same 10 minutes and evaluating one sample takes milliseconds. This work could readily be extended to more complicated tissue optical parameter estimation cases, such as mismatched-boundary single-layered tissue samples. However, the extensibility of this strategy depends on the availability of a solvable physics model. While it might be possible to extend it to slightly more complex geometries such as multi-layered slabs [39], it cannot be extended to arbitrarily complex geometries such as a multi-organ mouse model [40], for which no solvable physics model is available.

Disclosures

The authors declare that they have no conflict of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. S. L. Jacques, “Optical properties of biological tissues: a review,” Phys. Med. Biol. 58(11), R37–R61 (2013). [CrossRef]  

2. H. Liu, D. A. Boas, Y. Zhang, A. G. Yodh, and B. Chance, “Determination of optical properties and blood oxygenation in tissue using continuous NIR light,” Phys. Med. Biol. 40(11), 1983–1993 (1995). [CrossRef]  

3. R. C. Mesquita, T. Durduran, G. Yu, E. M. Buckley, M. N. Kim, C. Zhou, R. Choe, U. Sunar, and A. G. Yodh, “Direct measurement of tissue blood flow and metabolism with diffuse optics,” Philos. Trans. R. Soc., A 369(1955), 4390–4406 (2011). [CrossRef]  

4. H. Chung, T. Dai, S. K. Sharma, Y.-Y. Huang, J. D. Carroll, and M. R. Hamblin, “The nuts and bolts of low-level laser (light) therapy,” Ann. Biomed. Eng. 40(2), 516–533 (2012). [CrossRef]  

5. S. H. Yun and S. J. Kwok, “Light in diagnosis, therapy and surgery,” Nat. Biomed. Eng. 1(1), 008 (2017). [CrossRef]  

6. A. N. Bashkatov, E. A. Genina, V. I. Kochubey, and V. V. Tuchin, “Quantification of tissue optical properties: perspectives for precise optical diagnostics, phototherapy and laser surgery,” J. Phys. D: Appl. Phys. 49(50), 501001 (2016). [CrossRef]  

7. B. H. Hokr and J. N. Bixler, “Machine learning estimation of tissue optical properties,” Sci. Rep. 11(1), 6561 (2021). [CrossRef]  

8. A. Zidouk, Recovering the Optical properties of a Tissue using Maximum A Posteriori Based Estimation, (University of Birmingham, 2015).

9. G. J. Mueller and S. Jacques, “Video reflectometry to specify optical properties of tissue in vivo,” presented at Medical Optical Tomography: Functional Imaging and Monitoring 1993.

10. A. Kienle, L. Lilge, M. S. Patterson, R. Hibst, R. Steiner, and B. C. Wilson, “Spatially resolved absolute diffuse reflectance measurements for noninvasive determination of the optical scattering and absorption coefficients of biological tissue,” Appl. Opt. 35(13), 2304–2314 (1996). [CrossRef]  

11. M. S. Patterson, B. Chance, and B. C. Wilson, “Time resolved reflectance and transmittance for the noninvasive measurement of tissue optical properties,” Appl. Opt. 28(12), 2331–2336 (1989). [CrossRef]  

12. S. Sabir, S. Cho, Y. Kim, R. Pua, D. Heo, K. H. Kim, Y. Choi, and S. Cho, “Convolutional neural network-based approach to estimate bulk optical properties in diffuse optical tomography,” Appl. Opt. 59(5), 1461–1470 (2020). [CrossRef]  

13. S. Panigrahi and S. Gioux, “Machine learning approach for rapid and accurate estimation of optical properties using spatial frequency domain imaging,” J. Biomed. Opt. 24(07), 1–6 (2018). [CrossRef]  

14. M. Zhang, S. Li, Y. Zou, and Q. Zhu, “Deep learning-based method to accurately estimate breast tissue optical properties in the presence of the chest wall,” J. Biomed. Opt. 26(10), 106004 (2021). [CrossRef]  

15. D. Warncke, E. Lewis, S. Lochmann, and M. Leahy, “A neural network based approach for determination of optical scattering and absorption coefficients of biological tissue,” J. Phys.: Conf. Ser. 178 (2009). [CrossRef]  

16. L. Zhang, Z. Wang, and M. Zhou, “Determination of the optical coefficients of biological tissue by neural network,” J. Mod. Opt. 57(13), 1163–1170 (2010). [CrossRef]  

17. Y. W. Chen and S. H. Tseng, “Efficient construction of robust artificial neural networks for accurate determination of superficial sample optical properties,” Biomed. Opt. Express 6(3), 747–760 (2015). [CrossRef]  

18. G. M. Palmer and N. Ramanujam, “Monte Carlo-based inverse model for calculating tissue optical properties. Part I: Theory and validation on synthetic phantoms,” Appl. Opt. 45(5), 1062–1071 (2006). [CrossRef]  

19. R. M. Doornbos, R. Lang, M. C. Aalders, F. W. Cross, and H. J. Sterenborg, “The determination of in vivo human tissue optical properties and absolute chromophore concentrations using spatially resolved steady-state diffuse reflectance spectroscopy,” Phys. Med. Biol. 44(4), 967–981 (1999). [CrossRef]  

20. S. Wang, J. Chen, F. Zhang, M. Zhao, X. Cui, and S. Chen, “Accelerating Monte Carlo simulation of light propagation in tissue mimicking turbid medium based on generative adversarial networks,” Med. Phys. 49(2), 1209–1215 (2022). [CrossRef]  

21. A. Liemert, D. Reitzle, and A. Kienle, “Analytical solutions of the radiative transport equation for turbid and fluorescent layered media,” Sci. Rep. 7(1), 3819 (2017). [CrossRef]  

22. L. Wang, S. L. Jacques, and L. Zheng, “CONV—convolution for responses to a finite diameter photon beam incident on multi-layered tissues,” Comput. Methods Programs Biomed. 54(3), 141–150 (1997). [CrossRef]  

23. L. Wang and S. L. Jacques, “Monte Carlo modeling of light transport in multi-layered tissues in standard C,” Comput. Methods Programs Biomed. 47(2), 131–146 (1995). [CrossRef]  

24. T. Vo-Dinh, Biomedical Photonics Handbook: (Fundamentals, Devices, and Techniques) (CRC Press, 2014).

25. L. V. Wang and H. Wu, Biomedical Optics: Principles and Imaging (Wiley, 2012).

26. M. Johns, C. A. Giller, D. C. German, and H. Liu, “Determination of reduced scattering coefficient of biological tissue from a needle-like probe,” Opt. Express 13(13), 4828–4842 (2005). [CrossRef]  

27. A. Daw, A. Karpatne, W. Watkins, J. Read, and V. Kumar, “Physics-guided neural networks (pgnn): An application in lake temperature modeling,” arXiv, arXiv:1710.11431 (2017). [CrossRef]  

28. N. Muralidhar, J. Bu, Z. Cao, L. He, N. Ramakrishnan, D. Tafti, and A. Karpatne, “Physics-guided deep learning for drag force prediction in dense fluid-particulate systems,” Big Data 8(5), 431–449 (2020). [CrossRef]  

29. N. Muralidhar, M. R. Islam, M. Marwah, A. Karpatne, and N. Ramakrishnan, “Incorporating prior domain knowledge into deep neural networks,” in 2018 IEEE international conference on big data (big data) (IEEE, 2018), pp. 36–45.

30. S. Pawar, O. San, B. Aksoylu, A. Rasheed, and T. Kvamsdal, “Physics guided machine learning using simplified theories,” Phys. Fluids 33(1), 011701 (2021). [CrossRef]  

31. X. W. Jia, J. Willard, A. Karpatne, et al., “Physics guided RNNs for modeling dynamical systems: a case study in simulating lake temperature profiles,” in Proceedings of the SIAM International Conference on Data Mining, pp. 558–566.

32. R. Zhang, Y. Liu, and H. Sun, “Physics-guided convolutional neural network (PhyCNN) for data-driven seismic response modeling,” Eng. Struct. 215, 110704 (2020). [CrossRef]  

33. P. Naglič, Y. Zelinskyi, F. Pernuš, B. Likar, and M. Bürmen, “PyXOpto: an open-source Python library with utilities for fast light propagation modeling in turbid media,” in European Conference on Biomedical Optics (Optical Society of America, 2021), p. EM3C. 2.

34. R. Storn and K. Price, “Differential Evolution – A Simple and Efficient Heuristic for global Optimization over Continuous Spaces,” J. Glob. Optim. 11(4), 341–359 (1997). [CrossRef]  

35. P. Virtanen, R. Gommers, T. E. Oliphant, et al., “SciPy 1.0: fundamental algorithms for scientific computing in Python,” Nat. Methods 17(3), 261–272 (2020). [CrossRef]  

36. P. Ploton, F. Mortier, M. Rejou-Mechain, N. Barbier, N. Picard, V. Rossi, C. Dormann, G. Cornu, G. Viennois, N. Bayol, A. Lyapustin, S. Gourlet-Fleury, and R. Pelissier, “Spatial validation reveals poor predictive performance of large-scale ecological mapping models,” Nat. Commun. 11(1), 4540 (2020). [CrossRef]  

37. D. R. Roberts, V. Bahn, S. Ciuti, M. S. Boyce, J. Elith, G. Guillera-Arroita, S. Hauenstein, J. J. Lahoz-Monfort, B. Schröder, W. Thuiller, D. I. Warton, B. A. Wintle, F. Hartig, and C. F. Dormann, “Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure,” Ecography 40(8), 913–929 (2017). [CrossRef]  

38. S. Lathuilière, P. Mesejo, X. Alameda-Pineda, and R. Horaud, “A comprehensive analysis of deep regression,” IEEE Trans. Pattern Anal. Mach. Intell. 42(9), 2065–2081 (2020). [CrossRef]  

39. X. Wang, “The diffusion equation in multilayered rectangular biological tissue with finite thickness,” Optik 180, 144–150 (2019). [CrossRef]  

40. H. Shen and G. Wang, “A tetrahedron-based inhomogeneous Monte Carlo optical simulator,” Phys. Med. Biol. 55(4), 947–962 (2010). [CrossRef]  
