Blood species identification based on deep learning analysis of Raman spectra

Shan Huang; Peng Wang; Yubing Tian; Pengli Bai; DaQing Chen; Ce Wang; JianSheng Chen; ZhaoBang Liu; Jian Zheng; WenMing Yao; JianXin Li; Jing Gao

doi:10.1364/BOE.10.006129

1. Introduction

Discriminating human from nonhuman blood and identifying the species of origin of a blood sample play a vital role in customs inspection, forensic science, veterinary practice and wildlife preservation [1]. Several techniques have been developed for this purpose, such as high-performance liquid chromatography (HPLC) [2,3], mass spectrometry (MS) [4,5], quantitative PCR [6], and DNA profiling. HPLC methods for determining blood species offer good sensitivity and resolution; Inoue et al. used HPLC to analyze fresh blood and bloodstains from human and nonhuman species [2]. Espinoza et al. used MS to analyze and quantitate bloodstains and blood mixtures from over 16 different animal species [4]. MS was shown to be an effective tool for species identification: it detected minor interspecies differences in the molecular masses of the α- and β-chains (α/β-pairs) of hemoglobin from 62 different species. HPLC and MS perform well, but they require reagents and complex operation, are time-consuming, and destroy the sample. Moreover, they expose inspectors to risk through contact with the blood samples.

Raman spectroscopy and Fourier-transform infrared (FTIR) spectroscopy have proven effective for analyzing blood droplets and stains in recent years. Vibrational spectroscopy is rapid and noninvasive, and it yields a fingerprint-like spectral profile. The first study of blood identification using FTIR and Raman spectroscopy was reported by K. De Wael et al. in 2008 [7]. They were able to identify bloodstains on different substrates but, because no multivariate statistical method was used, failed to differentiate cat, dog and human bloodstains. In 2009, the I. K. Lednev group used Raman spectroscopy and principal component analysis (PCA) to distinguish human, cat, and dog blood with 99% confidence [8]. Blood from the same three species was also successfully discriminated by attenuated total reflectance Fourier-transform infrared (ATR FT-IR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) [9]. Applying statistical models to Raman spectra enhances the selectivity of Raman spectroscopy, and similar classification models have been built extensively. PLS-DA of blood spectra collected by Raman spectroscopy, diffuse reflectance spectroscopy, or spatially resolved near-infrared transmission spectroscopy has been used to discriminate human and animal blood [10–16]. Later, the I. K. Lednev group expanded the number of animal species to 16 and built a binary model for discriminating human from nonhuman blood [17]. To date, the most frequently studied samples have been blood droplets or bloodstains in forensic science. A nondestructive, noncontact method is needed to discriminate liquid whole blood directly in vacuum blood tubes without sampling, because such a method is both time-saving and safe for inspectors. In our previous work, we studied the discrimination of fresh blood droplets and of whole blood in vacuum blood tubes using Raman spectroscopy combined with PLS [18–21].

As a branch of machine learning, deep learning uses multi-layer networks to transform data features. The concept of deep learning was first reported by G. Hinton in 2006 [22]. With improvements in algorithms and in computer processing power, deep learning entered a new era in 2012. A convolutional neural network (CNN) can extract effective features from complex spectral or image data and learn the inner structure of those features for classification. CNNs have been used for time series signals [23] and face recognition [24] in the industrial field, and for EGG signals [25], CT and MRI images, and pathological images in the biomedical field. CNNs have been applied to CT images to detect tumors such as mammographic lesions [26,27] and pulmonary nodules [28,29]. They have also been used to assist diagnosis from histological images or stimulated Raman scattering microscopy images [30]. These successful applications suggest that CNNs have more powerful modeling capability than traditional models.

To the best of our knowledge, CNNs have not previously been used to classify blood species. In this study, CNN models were combined with Raman spectroscopy to identify blood species, covering both the discrimination of human from nonhuman blood and the identification of individual animal species. Firstly, Raman spectra of human and animal blood were acquired directly through the blood collection tubes by a Raman spectrometer equipped with a long-focal-length microscope objective. This arrangement simplifies operation and reduces the chance of contact with blood. Secondly, CNN models with convolutional layers, pooling layers and fully connected layers were constructed and trained with a total of 2177 Raman spectra from human and 19 kinds of animal blood. The animals include domestic fowl, livestock, laboratory animals and wildlife; this breadth is intended to support the application scope, robustness and specificity of the CNN model. Thirdly, because blood components change slightly over time, which can affect the quality of the trained model, Raman spectra were measured at 8, 24, 48 and 72 h as a second means of ensuring robustness. The CNN model was then evaluated, and its structure and parameters were optimized for accuracy, sensitivity and specificity. The CNN model trained with enough data showed higher accuracy.

2. Materials and methods

2.1 Sample preparation

Human blood and 19 kinds of animal blood were collected. The animals were: chicken, duck, goose, pigeon, Bama Xiang pig, dog (beagle), Oriental Short-tailed cat, New Zealand rabbit, SD rat, Kunming mouse, monkey (rhesus and cynomolgus), sika deer, fallow deer, cattle, carp, argali sheep, Asian swamp eel, alpaca and alpaca (Suri). Human blood was provided by Dongzhu hospital; the chicken, duck, goose and pigeon blood was provided by the poultry market in Dongzhu Town, Suzhou city. Bama Xiang pig, beagle, Oriental Short-tailed cat, New Zealand rabbit, SD rat and Kunming mouse blood was provided by the experimental breeding center of modern agriculture, Shanghai Jiao Tong University. Monkey blood was provided by Suzhou Xishan Zhongke Laboratory Animal Co., Ltd. The blood samples of fallow deer, sika deer, argali, alpaca and alpaca (Suri) were provided by Suzhou Zoo. The blood samples of cattle, carp and Asian swamp eel were provided by Guangzhou Hongquan Biological Co., Ltd. All of the above sources were licensed and met the safety and quarantine standards.

All samples were fresh whole blood, contained in labeled 2 mL EDTA-K2 glass blood collection tubes. Between 2 and 20 tubes of blood were collected for each kind of animal. Because deterioration of the blood would affect the Raman spectra, the samples were refrigerated immediately after collection and kept at 4 °C.

2.2 Equipment and instrument

A Renishaw inVia confocal Raman spectrometer equipped with a long-focal-length Leica microscope (50× objective, numerical aperture 0.35, focal length ∼7.5 mm) and WiRE 4.3 software was used. The 532 nm laser has a maximum power of ∼50 mW. The instrument was calibrated against a silicon reference (central peak at 520.5 cm⁻¹) before each group of experiments. The spectral range was set to 100–2000 cm⁻¹, the exposure time was 10 s, and the laser power at the sample was about 4.8 mW. The experiments were conducted in a clean booth (room temperature 20–25 °C, atmospheric humidity below 45%). Data were analyzed with Matlab R2018b and PyCharm 2017.

The experimental device is shown in Fig. 1. The glass tube containing fresh blood was placed on the fixture. The laser was focused on the interface between the blood and the glass tube wall to minimize fluorescence interference. The remaining fluorescence induced by the glass tube could be removed by baseline correction during data preprocessing. In this way, the practitioner could take the measurement directly and without contact.

Fig. 1. Diagram of the experimental device.

2.3 Experimental procedure

Blood components may change slightly over time, which could affect the quality of the experimental data. Therefore, measurements were taken at 8, 24, 48 and 72 h. For each group of samples, 5 sets of spectra were measured at each time point, giving a total of 20 sets of spectra per group. This approach increases the variability of the Raman spectra and improves the robustness of the model. After each measurement, the tube was removed from the fixture and rolled or shaken to remix its contents before the next measurement, so that the Raman spectra of successive measurements were relatively independent.

2.4 Data preprocessing

Preprocessing removes interference from uninformative components of the Raman spectrum, including background fluorescence, instrument noise and environmental noise. For example, sharp, narrow and abnormally high peaks may occasionally occur during measurement; such peaks can be identified as cosmic ray peaks and need to be eliminated. In this study, the Raman signal of blood was subject to interference from the background fluorescence generated by the vacuum glass tube. This fluorescence background was subtracted by third-order polynomial baseline fitting, performed with the baseline function of the WiRE software. All spectra, together with their category labels, were then assembled into a matrix in Matlab. Finally, all data were normalized to the range 0–1, which brings spectra of different intensities onto a uniform scale. The effects of the preprocessing are shown in Fig. 2.

Fig. 2. The effects of baseline correction and normalization. (a) Raman spectrum before baseline correction, (b) Raman spectrum after baseline correction, (c) Raman spectrum after baseline correction and normalization.
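
The two preprocessing steps can be sketched in plain Python as follows. This is a minimal illustration and not the WiRE routine, whose internals are not described here: it assumes a single ordinary least-squares fit of a third-order polynomial to the whole spectrum, and the function names and the synthetic spectrum are hypothetical.

```python
import math

def polyfit(x, y, degree):
    """Least-squares polynomial coefficients (lowest order first) via normal equations."""
    n = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs

def remove_baseline(shifts, intensities, degree=3):
    """Subtract a fitted polynomial baseline (assumption: plain, non-iterative fit)."""
    lo, hi = min(shifts), max(shifts)
    t = [2.0 * (s - lo) / (hi - lo) - 1.0 for s in shifts]  # rescale to [-1, 1] for stability
    coeffs = polyfit(t, intensities, degree)
    baseline = [sum(ck * ti ** k for k, ck in enumerate(coeffs)) for ti in t]
    return [y - bl for y, bl in zip(intensities, baseline)]

def min_max_normalize(values):
    """Map values linearly onto the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

# Synthetic spectrum: smooth fluorescence-like background plus one Raman-like peak
shifts = [100 + 10 * i for i in range(191)]  # 100 to 2000 cm^-1
raw = [50 + 0.02 * s + 1e-5 * s ** 2 + 100 * math.exp(-((s - 1000) / 15.0) ** 2)
       for s in shifts]
corrected = remove_baseline(shifts, raw)
normalized = min_max_normalize(corrected)
```

A plain fit is slightly biased by the peaks themselves; iterative variants that exclude peak regions are common in practice, but whether the software used here does so is not stated.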

2.5 Training and testing

Establishing a trained convolutional neural network model and using it to identify blood samples of unknown species is a systematic process. Firstly, the Raman spectra of the blood were preprocessed uniformly to form labeled feature data, and the whole training data set was fed into the initialized network for iterative training. During each round of training, 20% of the data in the training set was held out as a validation set to monitor the training; this served as cross-validation. After a preliminary network model was obtained, the test set was preprocessed in the same way and fed into the model in full. The network model then classified the Raman spectra of unknown blood species and gave the recognition result. The flow chart is shown in Fig. 3.

Fig. 3. Training and test flow charts

In this study, Raman spectra of 20 kinds of blood formed the training data set. To measure the external validation accuracy of the trained model, Raman spectra of blood of known species were used as a test set. A total of 3138 Raman spectra were collected in the experiment: 2177 were divided into the training and validation sets, and 961 formed the test set. The number of blood samples of each kind and the numbers of spectra in the training and test sets are shown in Table 1.
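
The 20% validation hold-out described above can be sketched as follows. This is an illustrative assumption about how the split might be drawn (a seeded random shuffle of spectrum indices); the function name is hypothetical and the actual splitting code is not given in the text.

```python
import random

def split_train_validation(n_samples, validation_fraction=0.2, seed=0):
    """Return (train_indices, validation_indices) from a shuffled index list."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # a different seed per round redraws the hold-out
    n_val = round(n_samples * validation_fraction)
    return indices[n_val:], indices[:n_val]

# 2177 spectra are available for training and validation; 961 more form the test set
train_idx, val_idx = split_train_validation(2177)
n_total = len(train_idx) + len(val_idx) + 961
```

With these numbers, 435 spectra are held out for validation and 1742 remain for fitting in each round.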

Table 1. Sample size of each species and partition of all spectral data

2.6 Analytical method

The deep learning method was implemented by constructing a network model. Because a Raman spectrum is one-dimensional data, a one-dimensional convolutional neural network (1D-CNN) model capable of multi-class identification was proposed to identify blood species. Figure 4 shows the layered structure of the convolutional neural network applied to blood classification. It consists of an input layer, hidden layers, fully connected layers and an output layer. The hidden layers comprise two convolution layers and two pooling layers. The preprocessed data were loaded into the input layer of the network, with 1030 features per spectrum. Features were extracted by two convolution operations and two pooling operations, and the result was flattened and passed to the fully connected layers. To counter over-fitting, we introduced dropout, which randomly discards some neurons in the fully connected layers. After the two fully connected layers, the output is obtained through the softmax function, which gives the probability distribution over the candidate classes. The formula is as follows:

$$\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{J} e^{x_j}}, \quad i = 1, \ldots, J \tag{1}$$
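
As a small worked illustration of Eq. (1), the softmax function can be written in plain Python. Subtracting the maximum input before exponentiating is a standard numerical safeguard added here, not something stated in the text; it does not change the result.

```python
import math

def softmax(logits):
    """Convert raw output scores into a probability distribution (Eq. 1)."""
    shift = max(logits)  # avoids overflow in exp for large scores
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Three hypothetical output scores, one per class
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))
```

The outputs are positive and sum to 1, and the class with the largest score receives the largest probability.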

Fig. 4. Architecture diagram of blood recognition based on one-dimensional convolutional neural network model.
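
To make the feature-extraction stage concrete, the sketch below passes a 1030-point input through two convolution and pooling stages and records the feature length after each. The kernel values, kernel size of 3, pooling size of 2, ReLU activation and the single filter per layer are hypothetical choices for illustration; the actual settings are those listed in Table 3.

```python
def conv1d(signal, kernel):
    """Stride-1 convolution without padding (cross-correlation, as in CNN libraries)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Rectified linear activation."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

# Hypothetical normalized spectrum with 1030 features and a single peak
spectrum = [0.0] * 1030
spectrum[500] = 1.0
kernel = [0.25, 0.5, 0.25]  # assumed smoothing-like kernel; real kernels are learned

features = spectrum
lengths = [len(features)]
for _ in range(2):  # two convolution + pooling stages
    features = max_pool1d(relu(conv1d(features, kernel)), size=2)
    lengths.append(len(features))
```

Under these assumptions the feature length goes from 1030 to 514 to 256 before flattening; a real layer would apply many learned kernels in parallel and produce one such feature vector per kernel.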

During training, the backward propagation (BP) algorithm was used to train the convolution kernels, the weights between neurons, and the bias values. BP is an iterative algorithm: in each iteration, a generalized perceptron learning rule is used to update the parameter estimates. Under continued iterative training, the network parameters change in the direction that gradually decreases the value of the loss function, which measures the error between the predicted value and the true value. When training finishes, the loss has reached a very low level and the network is regarded as converged, meaning that the predicted values are close to the true values. The network weights at this point constitute the network model. However, training may converge to a local optimum, which would lower the accuracy of actual prediction, so the convergence of the model should be judged further by external validation. External validation tests the predictive performance of the model on Raman spectra of blood outside the training set, and thereby its generalization performance.
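
The iterative decrease of the loss can be illustrated with a deliberately tiny example: gradient descent on a single sigmoid neuron with a cross-entropy loss. This is not the 1D-CNN of this study and does not propagate gradients through several layers; the toy data, learning rate and number of epochs are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, learning_rate=0.5, epochs=200):
    """Fit weight w and bias b by batch gradient descent; return them with the loss history."""
    w, b = 0.0, 0.0
    history = []
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = loss = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(w * x + b)
            loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            grad_w += (p - y) * x  # derivative of the loss with respect to w
            grad_b += (p - y)      # derivative of the loss with respect to b
        history.append(loss / n)
        w -= learning_rate * grad_w / n  # step against the gradient
        b -= learning_rate * grad_b / n
    return w, b, history

samples = [-2.0, -1.0, 1.0, 2.0]
labels = [0, 0, 1, 1]
w, b, history = train(samples, labels)
```

The recorded loss falls from its initial value of ln 2 as the parameters are updated, which is the behavior monitored during network training.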

3. Results and discussion

3.1 Time effect

It is important to determine whether the Raman spectra of blood change gradually with time. For liquid blood stored at low temperature, Raman spectra were measured at different times (up to 100 days). The results for dog blood are shown in Fig. 5. The positions and intensities of the Raman signals hardly changed over 100 days, and no significant changes were observed for other animals (rabbit, mouse, cat, etc.). Nevertheless, we cannot conclude directly that the Raman spectra of blood do not change over time. The result merely illustrates that untreated blood stored at low temperature can be preserved for a long time. It is also possible that the blood changes slightly in ways that could not be observed, which would require a longer experiment to verify. For this study, it could be determined that Raman spectra of blood collected within 72 hours can be used for research.

Fig. 5. Raman spectra of cryopreserved liquid blood at different times.

3.2 Biochemical analysis

Fresh blood carries the most original information about blood components. In previous customs quarantine security incidents, a large amount of illegal blood, including human blood, passed through ports. To address this problem, blood identification and detection should be targeted so as to ensure legitimate blood transport. The composition of blood is extremely complex. Blood mainly consists of plasma, red blood cells, white blood cells and platelets [31]. These components have important functions in the blood system, and many factors affect their composition. Previous studies have shown that blood composition differs among species, and even among subgroups of the same species. Blood composition differs between donors of different ages, genders, and even races [13–15]. Patients with certain diseases (such as diabetes or malaria) also have blood components different from those of healthy people [31]. These differences can be characterized by Raman spectroscopy combined with chemometric analysis. The average Raman spectrum of a species summarizes the blood component information of multiple samples of that species. In this paper, to observe the positions and intensities of the characteristic Raman peaks of different blood species, we averaged hundreds of spectra of each blood species and compared the average Raman spectra. Figure 6 shows the average Raman spectra of the 20 blood species.

Fig. 6. Comparison of the average Raman spectra of the 20 blood species. The locations of the main characteristic peaks are marked. From these locations, the corresponding vibration modes can be assigned and the blood components inferred, which enables biochemical analysis.

The Raman spectra in Fig. 6 cover 300–1800 cm⁻¹, and the characteristic peaks are concentrated in 650–1650 cm⁻¹. The positions and intensities of the characteristic peaks differ between any two kinds of blood in the figure. The main characteristic peaks are at 672, 752, 974, 1000, 1083, 1169, 1210, 1300, 1335, 1355, 1373, 1393, 1423, 1544, 1582, 1603 and 1636 cm⁻¹. Similarities and differences between the Raman spectra of different blood species can be roughly observed. Most obviously, the Raman spectrum of chicken blood has two characteristic peaks, at 1155 and 1521 cm⁻¹, that differ significantly from the other blood species. This indicates that chicken blood may contain components distinctly different from those of other kinds of blood. Unlike mature mammalian red blood cells, mature red blood cells in chicken blood contain a cell nucleus. The vibration modes corresponding to the characteristic peak positions observed in the figure include A_1g (ν₄, ν₅, ν₇), B_1g (ν₁₀, ν₁₁, ν₁₅, ν₁₈, ν₂₁), A_2g (ν₂₀), B_2g (ν₂₈, ν₃₀) and E_u (ν₃₇, ν₄₁), as shown in Table 2 [14,32,33]. These vibration modes derive from blood components such as hemoglobin, albumin, globulin, amino acids, glucose, cholesterol and triglycerides. According to literature reports, hemoglobin accounts for more than 95% of the dry weight of red blood cells, and most of the Raman signal obtained from whole blood is contributed by hemoglobin [31,33,34]. In addition, some studies have shown that some bands are shifted between the Raman spectra of oxygenated hemoglobin (oxyHb) and deoxygenated hemoglobin (deoxyHb) obtained under the same experimental conditions [33], which needs to be considered when analyzing Raman spectra of blood. Similarly, the positions and intensities of the characteristic peaks clearly depend on the excitation light chosen [35].

Table 2. Band position, assignment and local coordinate for Raman spectra of whole blood

The characteristic peak positions of one blood species are similar to those of the others, while there are visible differences in spectral intensity. However, such minor differences alone are not enough to determine the species of a blood sample directly. It is therefore necessary to use statistical analysis to identify blood species by extracting features and building statistical models.

3.3 Network optimization and design

The convolutional neural network model obtained by training with default settings is not necessarily optimal. Its recognition ability can be improved by modifying the parameters and network structure. The adjustable items include the training batch size, the number of training rounds, convolution kernel size, learning rate of the optimizer, number of neurons in the fully connected layers, number of hidden layers, optimizer, activation function and loss function. The tuning record of each hyperparameter and the corresponding accuracies are shown in Fig. 7.

Fig. 7. Record of Adjusting Parameters

Modifying the hyperparameters changed the recognition performance markedly. There were two kinds of hyperparameters in this study. The first kind is numerical. To optimize a numerical hyperparameter, the other parameters were first held at their default values and a graded series of values was tried for the parameter in question; for example, the learning rate was set to the values listed in Fig. 7. The model was trained for each value in turn to obtain the corresponding blind-test accuracy, and the value with the highest accuracy was selected as the optimum for that parameter. The second kind is non-numerical, such as the choice of activation function; these were adjusted in the same way. We also tuned the model by adding feature layers, adding the dropout mechanism, selecting the most appropriate loss function, and so on. The individually optimal hyperparameters were then combined, and the resulting model was taken as the overall optimum. To save time, this tuning can first be carried out with a small amount of data. Under the optimal conditions, the model has excellent generalization performance. The optimal and default parameters in our study are listed in Table 3. With the default network parameters, the prediction accuracy of the network model was 89.94%. After the network structure was designed and more appropriate hyperparameters were set, a better-performing network model was obtained, and the prediction accuracy improved to 97.33%.
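
The one-at-a-time tuning procedure described above can be sketched as follows. The function `toy_accuracy` is a hypothetical stand-in for training the network and measuring its blind-test accuracy, and the hyperparameter names, candidate values and defaults are illustrative assumptions rather than the settings of Table 3.

```python
import math

def tune_one_at_a_time(defaults, candidates, evaluate):
    """Vary one hyperparameter at a time, holding the others at their defaults."""
    best = dict(defaults)
    for name, values in candidates.items():
        scores = {}
        for value in values:
            trial = dict(defaults)  # all other hyperparameters stay at default
            trial[name] = value
            scores[value] = evaluate(trial)
        best[name] = max(scores, key=scores.get)  # keep the highest-scoring value
    return best

def toy_accuracy(params):
    """Hypothetical scoring function; a real one would train and test the CNN."""
    score = 0.90
    score -= 0.02 * abs(math.log10(params["learning_rate"]) + 3)  # peaks at 1e-3
    score += 0.03 if params["activation"] == "relu" else 0.0
    return score

defaults = {"learning_rate": 0.01, "activation": "sigmoid"}
candidates = {
    "learning_rate": [0.1, 0.01, 0.001, 0.0001],
    "activation": ["sigmoid", "tanh", "relu"],
}
best = tune_one_at_a_time(defaults, candidates, toy_accuracy)
```

Because each hyperparameter is tuned independently, this search costs one training run per candidate value, but it can miss interactions between hyperparameters that a joint search would capture.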

Table 3. Default and Optimal Parameters for CNN models of Blood Recognition

3.4 Classification results

The similarities and differences among blood components of different species are reflected in the positions and intensities of the characteristic peaks in the Raman spectra. Different kinds of animals may have similar blood components and therefore similar characteristic Raman peaks. This leads to errors in predicting unknown blood samples, that is, to confusion in blood identification. A confusion matrix was built to determine whether recognition confusion occurs among different species. Figure 8a shows the recognition counts for all kinds of samples; there are a few confusions in the identification of blood species. Figure 8b is the normalized confusion matrix of the classification result, which shows the proportions of samples.

Fig. 8. Confusion matrix and normalized confusion matrix of the multi-class classification. (a) The abscissa corresponds to the predicted blood species and the ordinate to the actual blood species. Entries on the diagonal, where the predicted species agrees with the actual species, are the numbers of correctly identified samples; entries off the diagonal are the numbers of confused samples. The color bar on the right indicates that the darker the color, the more samples in the cell. (b) Axes as in (a), with the data normalized. Proportions on the diagonal were predicted correctly, and those off the diagonal were confused.
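
The construction of a confusion matrix and its normalized form can be sketched as below, with rows for the actual species and columns for the predicted species as in Fig. 8. The three species and the ten toy predictions are hypothetical and much smaller than the real test set.

```python
def confusion_matrix(actual, predicted, labels):
    """Count samples by (actual species, predicted species)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def normalize_rows(matrix):
    """Divide each row by its total so entries become proportions of the actual class."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized

labels = ["human", "dog", "cat"]
actual = ["human"] * 4 + ["dog"] * 4 + ["cat"] * 2
predicted = ["human", "human", "human", "human",
             "dog", "dog", "dog", "cat",
             "cat", "dog"]
cm = confusion_matrix(actual, predicted, labels)
cm_normalized = normalize_rows(cm)
```

In this toy case one of four dog samples is predicted as cat, so the dog row of the normalized matrix reads 0, 0.75, 0.25.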

As a result, some blood species were confused with one another. 10% (1/10) of cat blood samples were mistaken for dog blood. About 4% (2/45) of cat blood samples were mistaken for dog blood. About 14% (8/57) of dog blood samples were mistaken for cat blood. About 2% (2/101) of monkey blood samples were mistaken for cat blood. About 2% (1/49) of pig blood samples were misidentified as dog blood and about 4% (2/49) as monkey blood. 4% (1/25) of rat blood samples were mistaken for monkey blood. The cases listed above are confusions between different animal bloods. For human blood, there are two situations. In the first, human blood is mistaken for animal blood: in this study, about 1% (1/195) and 3% (6/195) of human blood samples were misidentified as argali blood and monkey blood, respectively. In the second, animal blood is mistaken for human blood: about 2% (1/49) of rabbit blood samples were misidentified as human blood. Apart from these cases, the large majority of samples were recognized correctly.

These confusions may have the following causes: firstly, the blood components of the confused species are similar, so their Raman spectra are similar; secondly, the Raman spectra of individual samples were abnormal owing to the measurement process; thirdly, the fit obtained in training deviates somewhat, because the random elements of training influence the model. It is therefore particularly important to ensure a unified standard for data acquisition and data preprocessing and to adjust the hyperparameters of the model, which may improve the recognition ability.

3.5 Model assessment

The evaluation results of multi-class blood recognition are shown in Fig. 8. The macro-precision rate measures the precision of blood recognition. The multi-class classification is divided into n binary classifications, whose precision and recall are recorded as (P₁, R₁), (P₂, R₂), …, (P_n, R_n), and the average precision is calculated as:

$$\text{Macro-}P = \frac{1}{n}\sum_{i=1}^{n} P_i \tag{2}$$

The macro-recall rate is calculated to reflect the ability of the classifier to find all positive blood samples. It is defined as follows:

$$\text{Macro-}R = \frac{1}{n}\sum_{i=1}^{n} R_i \tag{3}$$

The macro-F_β score is a weighted harmonic mean of the precision and recall rates, in which β sets the relative importance of the two. When β < 1, precision has more influence; when β > 1, recall has more influence; and when β = 1, the two are weighted equally (the standard case). It is defined as follows:

$$\text{Macro-}F_\beta = \frac{(1 + \beta^2) \times \text{Macro-}P \times \text{Macro-}R}{\beta^2 \times \text{Macro-}P + \text{Macro-}R} \tag{4}$$
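
Equations (2) to (4) can be evaluated directly from a confusion matrix, as in the sketch below. It assumes rows hold the actual classes and columns the predicted classes; the 3×3 matrix is a hypothetical toy example, not data from this study.

```python
def macro_metrics(matrix, beta=1.0):
    """Return (macro-P, macro-R, macro-F_beta) for a square confusion matrix."""
    n = len(matrix)
    precisions, recalls = [], []
    for i in range(n):
        true_positive = matrix[i][i]
        predicted_as_i = sum(matrix[r][i] for r in range(n))  # column total
        actually_i = sum(matrix[i])                           # row total
        precisions.append(true_positive / predicted_as_i if predicted_as_i else 0.0)
        recalls.append(true_positive / actually_i if actually_i else 0.0)
    macro_p = sum(precisions) / n  # Eq. (2)
    macro_r = sum(recalls) / n     # Eq. (3)
    b2 = beta ** 2
    denominator = b2 * macro_p + macro_r
    macro_f = (1 + b2) * macro_p * macro_r / denominator if denominator else 0.0  # Eq. (4)
    return macro_p, macro_r, macro_f

toy_matrix = [[4, 0, 0],
              [0, 3, 1],
              [0, 1, 1]]
macro_p, macro_r, macro_f1 = macro_metrics(toy_matrix, beta=1.0)
_, _, macro_f2 = macro_metrics(toy_matrix, beta=2.0)
```

For this symmetric toy matrix the per-class precision and recall are both 1, 0.75 and 0.5, so macro-P, macro-R and every macro-F_β equal 0.75.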

To check the consistency of the multi-class model, the kappa coefficient can be used to characterize the agreement between predicted and actual classes. The degree of consistency corresponding to different κ values is shown in Table 4. For a 3×3 confusion matrix as shown in Table 5, the kappa coefficient is defined as follows [35]:

$$\kappa = \frac{P_o - P_e}{1 - P_e}, \quad P_o = \frac{\sum_i a_{ii}}{N}, \quad P_e = \frac{\sum_i A_i B_i}{N^2}, \quad i = 1, 2, 3 \tag{5}$$
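
A short sketch of Eq. (5) follows. It assumes, as in the usual definition of Cohen's kappa, that a_ii are the diagonal entries of the confusion matrix, N is the total number of samples, and A_i and B_i are the row and column totals of class i; the toy matrix is hypothetical.

```python
def cohen_kappa(matrix):
    """Kappa coefficient of a square confusion matrix (Eq. 5)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total            # P_o
    row_totals = [sum(row) for row in matrix]                         # A_i
    column_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]  # B_i
    expected = sum(a * b for a, b in zip(row_totals, column_totals)) / total ** 2  # P_e
    return (observed - expected) / (1 - expected)

toy_matrix = [[4, 0, 0],
              [0, 3, 1],
              [0, 1, 1]]
kappa = cohen_kappa(toy_matrix)
```

Here P_o = 0.8 and P_e = 0.36, giving κ = 0.6875; a matrix with all samples on the diagonal gives κ = 1.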

Table 4. Kappa coefficient consistency scale

Table 5. 3×3 contingency tables

The performance of the blood species identification was measured by the above methods, and the results are shown in Table 6. We evaluated the model in terms of precision rate, recall rate, F_β score, consistency test, receiver operating characteristic (ROC) curve and area under the ROC curve (AUC).

Table 6. Performance measurement of the classification model

For the identification of blood species, it is important to ensure the precision of the classifier, because it has great practical significance in situations such as judicial investigation, customs security and quarantine, and wildlife protection investigation. For example, in customs security and quarantine, human blood is regarded as positive and nonhuman blood as negative, so the precision rate is the proportion of samples identified as human blood that really are human blood, and the recall rate is the proportion of all human blood samples that were correctly identified. In customs security inspection, the precision rate is relatively important; in wildlife protection investigation, the recall rate is relatively important; other application scenarios can choose their emphasis as needed. The F_β score allows a comprehensive comparison, with the emphasis selected through the value of β. In this paper, the F_β scores for β = 0.5, 1 and 2 are given. The kappa coefficient is then used to characterize the classification consistency of the whole model and to check the recognition precision for the various blood species. Comparing the calculated κ value with the consistency levels in Table 4 shows that the model reaches almost perfect consistency. The ROC curves and AUC values of the 20 classifiers are drawn in Fig. 9, with AUC values rounded to three decimal places. The ROC curve can be used to evaluate the generalization performance of a classifier [36]. The abscissa is the false positive rate (FPR), and the ordinate is the true positive rate (TPR). In this study, the ROC curves of all classifiers are plotted in the same coordinate system so that their strengths and weaknesses can be compared intuitively. The closer an ROC curve lies to the upper left corner, the more accurate the classifier. The AUC allows the performance of different classifiers to be compared: the larger the AUC (with values in the range [0.5, 1]), the better the classifier, and an area of 1 is the ideal state. These two performance metrics give an intuitive picture of classification accuracy and allow the different classifiers to be compared for the multi-class recognition of blood species.
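
For one species treated as positive and all others as negative, an ROC curve and its AUC can be computed as sketched below. The labels and scores are hypothetical; in practice the score would be the softmax probability that the model assigns to the species in question, and the calculation would be repeated for each of the 20 species.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold step by step."""
    positives = sum(labels)
    negatives = len(labels) - positives
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    previous = None
    for i in order:
        if previous is not None and scores[i] != previous:
            points.append((false_pos / negatives, true_pos / positives))
        if labels[i] == 1:
            true_pos += 1
        else:
            false_pos += 1
        previous = scores[i]
    points.append((false_pos / negatives, true_pos / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# 1 = sample of the target species, 0 = any other species (toy data)
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
points = roc_points(labels, scores)
auc = area_under_curve(points)
```

In this toy case eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9.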

Fig. 9. ROC curve and AUC of multiple blood classification.

4. Conclusions

In this paper, Raman spectroscopy and deep learning analysis were used to identify multiple blood species. The work was mainly motivated by the practical needs of customs security inspection, but the combination of Raman spectroscopy and deep learning can also be applied in other fields. Considering the one-dimensional nature of Raman spectra, we proposed a 1D-CNN model to recognize different blood species. With a total of 3138 spectra from 20 blood species, a recognition accuracy as high as 97.33% was achieved through data preprocessing, model training, optimization and evaluation. In addition, the time effect and the biochemical analysis of the Raman spectra of blood were studied. Compared with previously reported studies, this study can not only distinguish human from nonhuman blood but also identify the species of the blood. Finally, future work needs to add more animal blood species and build a more comprehensive classification model.

Funding

National Key R&D Program of China (2018YFF01011104); National High-tech Research and Development Program (2015AA021105); Natural Science Foundation of Jiangsu Province (BK20180220); Six Talent Climax Foundation of Jiangsu (SWYY-285).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

References

1. S. C. Renner, D. Neumann, M. Burkart, U. Feit, P. Giere, A. Gröger, A. Paulsch, C. Paulsch, M. Sterz, and K. Vohland, “Import and export of biological samples from tropical countries–considerations and guidelines for research teams,” Org. Divers. Evol. 12(1), 81–98 (2012).

2. H. Inoue, F. Takabe, O. Takenaka, M. Iwasa, and Y. Maeno, “Species identification of blood and bloodstains by high-performance liquid chromatography,” Int. J. Legal Med. 104(1), 9–12 (1990).

3. J. Andrasko, “The estimation of age of bloodstains by HPLC analysis,” J. Forensic Sci. 42(4), 14171J (1997).

4. E. O. Espinoza, N. C. Lindley, K. M. Gordon, J. A. Ekhoff, and M. A. Kirms, “Electrospray ionization mass spectrometric analysis of blood for differentiation of species,” Anal. Biochem. 268(2), 252–261 (1999).

5. H. Yang, B. Zhou, M. Prinz, D. Siegel, and H. Deng, “Body fluid identification by mass spectrometry,” Int. J. Legal Med. 127(6), 1065–1077 (2013).

6. E. Sauer, A. K. Reinke, and C. Courts, “Differentiation of five body fluids from forensic samples by expression analysis of four microRNAs using quantitative PCR,” Forensic Sci. Int.: Genet. 22, 89–99 (2016).

7. K. De Wael, L. Lepot, F. Gason, and B. Gilbert, “In search of blood–detection of minute particles using spectroscopic methods,” Forensic Sci. Int. 180(1), 37–42 (2008).

8. K. Virkler and I. K. Lednev, “Blood species identification for forensic purposes using Raman spectroscopy combined with advanced statistical analysis,” Anal. Chem. 81(18), 7773–7777 (2009).

9. E. Mistek and I. K. Lednev, “Identification of species’ blood by attenuated total reflection (ATR) Fourier transform infrared (FT-IR) spectroscopy,” Anal. Bioanal. Chem. 407(24), 7435–7442 (2015).

10. G. McLaughlin, K. C. Doty, and I. K. Lednev, “Discrimination of human and animal blood traces via Raman spectroscopy,” Forensic Sci. Int. 238, 91–95 (2014).

11. G. McLaughlin, K. C. Doty, and I. K. Lednev, “Raman spectroscopy of blood for species identification,” Anal. Chem. 86(23), 11628–11633 (2014).

12. C. K. Muro, K. C. Doty, L. D. S. Fernandes, and I. K. Lednev, “Forensic body fluid identification and differentiation by Raman spectroscopy,” Forensic Chem. 1, 31–38 (2016).

13. E. Mistek, L. Halámková, K. C. Doty, C. K. Muro, and I. K. Lednev, “Race differentiation by Raman spectroscopy of a bloodstain for forensic purposes,” Anal. Chem. 88(15), 7453–7456 (2016).

14. A. Sikirzhytskaya, V. Sikirzhytski, and I. K. Lednev, “Determining gender by Raman spectroscopy of a bloodstain,” Anal. Chem. 89(3), 1486–1492 (2017).

15. K. C. Doty and I. K. Lednev, “Differentiating donor age groups based on Raman spectroscopy of bloodstains for forensic purposes,” ACS Cent. Sci. 4(7), 862–867 (2018).

16. J. Fujihara, Y. Fujita, T. Yamamoto, N. Nishimoto, K. Kimura-Kataoka, S. Kurata, Y. Takinami, T. Yasuda, and H. Takeshita, “Blood identification and discrimination between human and nonhuman blood using portable Raman spectroscopy,” Int. J. Legal Med. 131(2), 319–322 (2017).

17. K. C. Doty and I. K. Lednev, “Differentiation of human blood from animal blood using Raman spectroscopy: A survey of forensically relevant species,” Forensic Sci. Int. 282, 204–210 (2018).

18. H. Bian and J. Gao, “Error analysis of the spectral shift for partial least squares models in Raman spectroscopy,” Opt. Express 26(7), 8016–8027 (2018).

19. H. Bian, P. Wang, N. Wang, Y. Tian, P. Bai, H. Jiang, and J. Gao, “Dual-model analysis for improving the discrimination performance of human and nonhuman blood based on Raman spectroscopy,” Biomed. Opt. Express 9(8), 3512–3522 (2018).

20. H. Bian, Y. Zhang, W. Gao, and J. Gao, “Fourier based partial least squares algorithm: New insight into influence of spectral shift in ‘frequency domain’,” Opt. Express 27(3), 2926–2936 (2019).

21. P. Bai, J. Wang, H. Yin, Y. Tian, W. Yao, and G. Jing, “Discrimination of human and nonhuman blood by Raman spectroscopy and partial least squares discriminant analysis,” Anal. Lett. 50(2), 379–388 (2017).

22. G. E. Hinton, S. Osindero, and Y. W. Teh, “A fast learning algorithm for deep belief nets,” Neural Comput. 18(7), 1527–1554 (2006).

23. C. L. Liu, W. H. Hsaio, and Y. C. Tu, “Time series classification with multivariate convolutional neural network,” IEEE Trans. Ind. Electron. 66(6), 4788–4797 (2019).

24. S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, “Face recognition: A convolutional neural-network approach,” IEEE Trans. Neural Netw. Learn. Syst. 8(1), 98–113 (1997).

25. K. Tsiouris, V. C. Pezoulas, M. Zervakis, S. Konitsiotis, D. D. Koutsouris, and D. I. Fotiadis, “A long short-term memory deep learning network for the prediction of epileptic seizures using EEG signals,” Comput. Biol. Med. 99, 24–37 (2018).

26. T. Kooi, G. Litjens, B. van Ginneken, A. Gubern-Mérida, C. I. Sánchez, R. Mann, A. den Heeten, and N. Karssemeijer, “Large scale deep learning for computer aided detection of mammographic lesions,” Med. Image Anal. 35, 303–312 (2017).

27. A. J. Bekker, M. Shalhon, H. Greenspan, and J. Goldberger, “Multi-view probabilistic classification of breast microcalcifications,” IEEE T. Med. Imaging 35(2), 645–653 (2016).

28. A. A. A. Setio, F. Ciompi, G. Litjens, P. Gerke, C. Jacobs, S. J. van Riel, M. M. W. Wille, M. Naqibullah, C. I. Sánchez, and B. van Ginneken, “Pulmonary nodule detection in CT images: False positive reduction using multi-view convolutional networks,” IEEE T. Med. Imaging 35(5), 1160–1169 (2016).

29. Q. Dou, H. Chen, L. Yu, J. Qin, and P. A. Heng, “Multi-level contextual 3D CNNs for false positive reduction in pulmonary nodule detection,” IEEE Trans. Biomed. Eng. 64(7), 1558–1567 (2017).

30. L. Zhang, Y. Wu, B. Zheng, L. Su, Y. Chen, S. Ma, Q. Hu, X. Zou, L. Yao, and Y. Yang, “Rapid histology of laryngeal squamous cell carcinoma with deep-learning based stimulated Raman scattering microscopy,” Theranostics 9(9), 2541–2554 (2019).

31. C. G. Atkins, K. Buckley, M. W. Blades, and R. F. Turner, “Raman spectroscopy of blood and blood components,” Appl. Spectrosc. 71(5), 767–793 (2017).

32. B. R. Wood, P. Caspers, G. J. Puppels, S. Pandiancherri, and D. McNaughton, “Resonance Raman spectroscopy of red blood cells using near-infrared laser excitation,” Anal. Bioanal. Chem. 387(5), 1691–1703 (2007).

33. P. Lemler, W. R. Premasiri, A. Delmonaco, and L. D. Ziegler, “NIR Raman spectra of whole human blood: Effects of laser-induced and in vitro hemoglobin denaturation,” Anal. Bioanal. Chem. 406(1), 193–200 (2014).

34. H. Sato, H. Chiba, H. Tashiro, and Y. Ozaki, “Excitation wavelength-dependent changes in Raman spectra of whole blood and hemoglobin: Comparison of the spectra with 514.5-, 720-, and 1064-nm excitation,” J. Biomed. Opt. 6(3), 366–370 (2001).

35. J. Cohen, “A coefficient of agreement for nominal scales,” Educ. Psychol. Meas. 20(1), 37–46 (1960).

36. T. Fawcett, “An introduction to ROC analysis,” Pattern Recognit. Lett. 27(8), 861–874 (2006).

Parameters	Default model	Optimized model
Hidden-layer structure	Conv1 + Pool1 + Conv2	Conv1 + Pool1 + Conv2 + Pool2
Loss function	Categorical hinge	Categorical hinge
Optimizer	SGD	SGD
Activation	ReLU	Tanh
Batch size	10	60
Epochs	20	45
Filter1	256×1	2048×1
Filter2	256×1	512×1
Dense1×Dense2	256×128	64×128
Learning rate	0.01	0.01
Accuracy	89.94%	97.33%

Label \ Predict	0	1	2	Total	Ratio
0	a₁₁	a₁₂	a₁₃	A₁	α₁
1	a₂₁	a₂₂	a₂₃	A₂	α₂
2	a₃₁	a₃₂	a₃₃	A₃	α₃
Total	B₁	B₂	B₃	N
Ratio	β₁	β₂	β₃
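The symbols in this confusion-matrix layout can be related to the evaluation indices as sketched below. The totals and the kappa coefficient follow the standard definitions [35]. Reading α_i as the per-class recall and β_j as the per-class precision is an assumption made here for illustration, since the table itself does not define the ratios. The β_j in the table is also a different quantity from the weight β in the F_β-score.

```latex
\begin{aligned}
A_i &= \sum_{j} a_{ij}, \qquad
B_j = \sum_{i} a_{ij}, \qquad
N = \sum_{i} A_i = \sum_{j} B_j, \\
\alpha_i &= \frac{a_{ii}}{A_i} \quad \text{(assumed: recall of class } i\text{)}, \qquad
\beta_j = \frac{a_{jj}}{B_j} \quad \text{(assumed: precision of class } j\text{)}, \\
p_o &= \frac{1}{N} \sum_{i} a_{ii}, \qquad
p_e = \frac{1}{N^{2}} \sum_{i} A_i B_i, \qquad
\kappa = \frac{p_o - p_e}{1 - p_e}.
\end{aligned}
```

Here p_o is the observed agreement (the overall accuracy) and p_e is the agreement expected by chance from the row and column totals.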

Evaluation Index	Value
Precision rate	97.33%
Recall rate	97.69%
F_β score (β = 2)	97.54%
F_β score (β = 1)	97.40%
F_β score (β = 0.5)	97.33%
Kappa coefficient	0.9715

Biomedical Optics Express

Species	Number of samples	Training spectra	Testing spectra	Total spectra
chicken	11	157	61	218
fallow deer	2	19	6	25
carp	8	105	45	150
monkey	20	300	101	401
alpaca	2	30	10	40
dog	10	133	57	190
rabbit	9	127	48	175
pigeon	10	143	56	199
Asian swamp eel	5	70	30	100
cat	10	155	45	200
human	20	110	195	305
argali	6	90	30	120
mouse	4	61	19	80
duck	10	143	58	201
goose	11	150	65	215
cattle	8	105	45	150
rat	5	76	25	101
alpaca suri	3	39	11	50
pig	10	144	49	193
sika deer	2	20	5	25
Total	166	2177	961	3138

Position (cm⁻¹)	Assignment	Local coordinate
672	ν₇	δ(pyr deform)_sym
752	ν₁₅	ν(pyr breathing)
974		γ(C_aH=)
1000	ν₄₇	(C_bC₁)_asym
1083	δ(=C_bH₂)₄	δ(=C_bH₂)₄
1169	ν₃₀	ν(pyr half-ring)_asym
1210	ν₅,ν₁₈	δ(C_mH)
1300	ν₂₁	δ(C_mH)
1335	ν₄₁	ν(pyr half-ring)_sym
1355		ν(pyr half-ring)_sym
1373	ν₄	ν(pyr half-ring)_sym
1393	ν₂₀	ν(pyr quarter-ring)
1423	ν₂₈	ν(C_αC_m)_sym
1544	ν₁₁	ν(C_βC_β)
1582	ν₃₇	ν(C_βC_m)_asym
1603	ν(C=C)_vinyl	ν(C_a=C_b)
1636	ν₁₀	ν(C_βC_m)_asym