
Machine learning methods for identification and classification of events in $\phi$-OTDR systems: a review

Open Access

Abstract

The phase sensitive optical time-domain reflectometer ($\varphi$-OTDR), in some applications called distributed acoustic sensing (DAS), has become a widely used technology for long-distance monitoring of vibrational signals in recent years. Since $\varphi$-OTDR systems usually operate in complicated and dynamic environments, they must deal with multiple intrusion event signals as well as numerous noise interferences, which have been a major obstacle to the system's efficiency and effectiveness. Many studies have proposed techniques to mitigate this problem, mainly through upgrades of the $\varphi$-OTDR setup and improvements in data processing. Most recently, machine learning methods for event classification, which help identify and categorize intrusion events, have become a research hotspot. In this paper, we review recent technologies, from conventional machine learning algorithms to deep neural networks, for event classification aimed at increasing recognition/classification accuracy and reducing nuisance alarm rates (NARs) in $\varphi$-OTDR systems. We present a comparative analysis of the current classification methods and evaluate their performance in terms of classification accuracy, NAR, precision, recall, identification time, and other parameters.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. INTRODUCTION

Phase sensitive optical time-domain reflectometer ($\varphi$-OTDR) [1] is a distributed fiber optic sensing technique based on detection of Rayleigh backscattered signals (RBSs); some researchers refer to it as distributed acoustic sensing (DAS) or distributed vibration sensing (DVS) [1] according to the application. Hereafter in this paper, we use $\varphi$-OTDR to refer to this sensor. Capable of monitoring acoustic signals over a long distance, fiber optic acoustic systems are well suited for identifying various external disturbances [2]. Distributed fiber optic sensors use optical fibers as their sensing unit and can measure hundreds of thousands of points simultaneously [3]. Over the years, various sensing technologies for fiber optic acoustic sensing have been developed, ranging from quasi-distributed to fully distributed optical fiber sensing, such as Fiber Bragg gratings (FBGs) [2,4–7], Michelson interferometry (MI) [4,6,8–11], the Fabry–Perot interferometer (FPI) [12–14], Mach–Zehnder interferometry (MZI) [15–17], Sagnac interference (SI) [11,18,19], and $\varphi$-OTDR, each with its own trade-offs in implementation cost, complexity, and performance. However, recent years have witnessed the rise of $\varphi$-OTDR as the most extensively used sensing technology due to its ability to achieve effective distributed monitoring over long distances [3]. $\varphi$-OTDR systems have also attracted more interest for several reasons, including high sensitivity, high dynamic range, fully distributed measurement, and a relatively simple processing scheme compared to most optical fiber sensors [20]. Over the past few years, there has been growing interest among researchers in both academia and industry in applying various data processing methods to $\varphi$-OTDR to ensure efficient and effective event recognition and classification; those methods are reviewed and discussed further in this paper.

A. Significance of $\phi$-OTDR Systems

Recently, there have been tremendous advancements of $\varphi$-OTDR systems in a number of applications, including perimeter security surveillance [21], seismic wave prediction [22], airport runway monitoring of aircraft takeoff and landing in place of conventional radar systems [23], oil and gas pipeline safety and integrity [24–31,31–43], and monitoring of underground tunnels, submarine power cables [44,45], engineering structures [46], railways [1,34,47–49], bridges [50], underwater seismic signals [14], and others. According to Timofeev [51], $\varphi$-OTDR systems are useful in areas with high electromagnetic interference (EMI) owing to the dielectric fiber material and its inherent immunity to EMI. They are stable and can operate in dynamic environments regardless of weather conditions such as rain, fog, wind, and snow [51]. Long-distance monitoring capability was demonstrated in 2014 by Peng et al. [28], who reported an ultra-long, high-sensitivity $\varphi$-OTDR system of over 131.5 km that can be used for military bases and national border security due to its high sensitivity and relatively low nuisance alarm rate (NAR).

$\varphi$-OTDR systems are cost effective considering the monitoring length, and they can be integrated and used to detect cracks in civil structures, making them suitable for structural health monitoring of engineering buildings [52]. They can also be used for detection of engine anomalies for disaster prevention, railway safety monitoring, chemical leakage monitoring for oil-gas pipelines [53], and intrusion detection for perimeter security [52], even in places with high EMI [54]. Along with many other benefits, $\varphi$-OTDR systems have a simple deployment structure; they also offer higher positioning accuracy as well as multipoint vibration detection compared to similar systems [55,56].

B. Challenges Facing $\phi$-OTDR Systems

However, $\varphi$-OTDR techniques do not always perform smoothly, owing to the dynamic nature of events and the noise in the environments they operate in; thus, more robust and sophisticated approaches must be deployed to ensure accurate event detection [32]. Numerous challenges hinder the effectiveness and efficiency of $\varphi$-OTDR event detection, so better denoising techniques are needed to reduce the NAR, together with better classification methods to accurately identify events and separate them from one another.

According to Fedorov et al. in 2016, the key challenges facing $\varphi$-OTDR are signal interference caused by excessive phase noise during photodetection, attenuation as signals travel over longer distances, and false positives (false alarms) caused by incorrect event identification, all of which may lead to inefficient allocation of resources or delays in processing time, which in turn can lead to destruction of property and/or fatal accidents [57]. According to Wu [58], $\varphi$-OTDR is constantly affected by dynamic environmental changes such as rapid air movements, laser frequency drift, transient acoustic interference, and environmental noise, which can result in high NARs. In 2019, Adeel et al. [59] argued that the frequency drift and linewidth of the laser source result in Rayleigh noise as well as non-coherent addition of RBS interference, introducing a random relation between the input and the fiber response that is responsible for certain noise levels in differential RBS signals [59]. In [52], the authors noted that detection in coherent $\varphi$-OTDR systems is highly affected by fading noise and that even available techniques like signal averaging and differentiating may not be adequate for high-frequency event detection. Shao et al. [60] argued that most $\varphi$-OTDR systems suffer from low signal-to-noise ratio (SNR) and that improvements in SNR are vital to increase accuracy in identifying and locating external intrusions. According to Wu et al. [61], most $\varphi$-OTDR systems are affected by intrinsically weak backscattered signals. To mitigate this challenge, Wu et al. [61] proposed the fabrication of ultra-weak fiber Bragg gratings (UWFBGs) in single-mode fibers through Ti-doped silica outer cladding for use in $\varphi$-OTDR. Butov et al. [62] likewise proposed a high Rayleigh scattering fiber (HRF) to further increase the sensitivity of $\varphi$-OTDR by introducing a nitrogen-doped single-mode fiber with enhanced Rayleigh scattering; however, those techniques shorten the measurement length.

Toward achieving an optimal pattern recognition method in $\varphi$-OTDR systems, the main hindering factor is the absence of a universally best feature extraction method, owing to the dynamic nature of data in different environments [55].

Generally, there have been several attempts to mitigate the aforementioned $\varphi$-OTDR challenges, including hardware improvements like parallel computing [56], which are extremely costly, and other methods like Canny edge detection [49], which are not based on artificial intelligence (AI), machine learning (ML), or deep learning (DL). According to many studies, data processing methods based on AI, ML, and DL are more advanced approaches, as they do not modify but can be integrated into current $\varphi$-OTDR data acquisition systems. Provided with large sets of field data and trained properly, they can deliver better accuracy and lower NAR and can adapt to varying environmental conditions. In this review paper, we therefore provide, to the best of our knowledge, a thorough survey of work on ML methods (DL included) used for classifying events in $\varphi$-OTDR systems.

C. Brief Introduction to This Review

We have briefly introduced $\varphi$-OTDR, the significance of $\varphi$-OTDR systems, and the challenges facing them in Section 1. Section 2 briefly presents the underlying working principle of $\varphi$-OTDR. The main objective of this review is to explore and evaluate ML and DL algorithms used for event classification in a wide range of $\varphi$-OTDR application domains, as presented in Sections 5 and 6. Since the performance of a classifier depends on well-structured, denoised signals acting as input feature vectors, we briefly introduce various signal preprocessing methods used in $\varphi$-OTDR in Section 3. Evaluation metrics are briefly introduced in Section 4, while the discussion and summary are presented in Section 7. Finally, in Section 8, we provide a general conclusion and recommendations for possible future research directions.

In total, we covered over 100 papers, many of them very recent (2020 to date), regarding event classification methods in $\varphi$-OTDR systems from different domain areas. In Section 7, we present the pros and cons of each discussed ML/DL method used for event identification in $\varphi$-OTDR under different circumstances. A summary table (Table 1) is also organized, which shows the event classification methods, the number of events, the length of the fiber optic cable, the signal processing methods used for feature extraction, and the performance results; the performance of the methods is evaluated based on average classification accuracy, $f$-measure, identification time (IDT), and NAR.


Table 1. Comparative Analysis of the Events Classification Methods in $\varphi$-OTDR

| Classification Method | No. of Events | Fiber Length | Application Field | Spatial Resolution | Preprocessing Method | Accuracy | Precision | Recall | f-score | NAR | IDT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1DCNN + RF [37] | 5 | 34 (35) km | Pipeline monitoring | 8 (10) m | WPD | 98% | 96.98% | 95.39% | 96.13% | | |
| 1DCNN [26] | 5 | | Oil pipeline safety monitoring | | WPD | 95.5% | 94.97% | 94.89% | 95.63% | | |
| 1DCNN [15] | 5 | 40 km | Urban safety monitoring | 5 m | No preprocessing | 92.9% | 92.76% | 92.86% | 92.70% | | |
| 2DCNN [26] | 5 | | Oil pipeline safety monitoring | | WPD | 89.12% | 87.96% | 85.76% | 86.66% | | |
| 2DCNN [15] | 5 | 40 km | Urban safety monitoring | 5 m | No preprocessing | 94.90% | 94.66% | 94.90% | 94.76% | | |
| CLDNN [33] | 3 | 33 km | Oil pipeline monitoring | 8 m | No preprocessing | 97.2% | | | | | |
| ATCN-BiLSTM [70] | 3 | | | | No preprocessing | 99.6% | | | | 0% | |
| DPN [34] | 7 | | Railway safety monitoring | 10 m | | 97% | 99.29% | 99.28% | 99.27% | | |
| SVM [71] | 5 | 50 km | Perimeter security monitoring | | FFT | 92.62% | 98.6% | 91.2% | | | |
| SVM [72] | 3 | | Vehicle detection | | PCA | 88.9% | | | | | |
| SVM [30] | 5 | 37.5 (34) km | Pipeline safety monitoring | 10 (8) m | MFCC | 91.9% | | | | | |
| SVM [73] | | 40 km | Real-time train tracking | 10 m | PCA | 98% | | | | | |
| SVM [74] | 4 | 40 km | Long perimeter monitoring | 20 m | Spectral subtraction | 93.3% | | | | | 0.6 s |
| LSVM [54] | 3 | | | | WD, VMD | 79.5% | | | | | |
| RVM [75] | 3 | 20 km | Pipeline safety monitoring | | WPT | 97.8% | | | | | < 1 s |
| RVM [76] | 3 | 10 km | Near-ground military target detection | 20 m | Wavelet energy spectrum analysis | 88.6% | 88.6% | 88.9% | 88.7% | | |
| NC-SVM [97] | 5 | 25.05 km | Long-distance perimeter monitoring | 50 m | WPD | 94.3% | | | | 5.62% | 0.55 s |
| Multiclass-SVM [51] | 7 | 5 (1.5) km | Seismic waves prediction | 3–10 m | Spectral subtraction, FFT | 98% | | | | | |
| CNN-SVM [55] | 4 | 40 km | | 20 m | Spectral subtraction, STFT | 93.3% | | | | | |
| Gradient Boosting [51] | 7 | 5 (1.5) km | Seismic waves prediction | 3–10 m | Spectral subtraction, FFT | 98.67% | | | | | |
| XGBoost [51] | 7 | 5 (1.5) km | Seismic waves prediction | 3–10 m | Spectral subtraction, FFT | 99% | | | | | |
| XGBoost [77] | 5 | 25.05 km | Perimeter monitoring | 50 m | EMD energy analysis | 95.90% | 95.96% | 95.95% | 95.93% | 4.1% | 0.093 s |
| XGBoost [30] | 5 | 37.5 (34) km | Pipeline safety monitoring | 10 (8) m | | 93.7% | | | | | |
| RF [78] | 4 | 25.05 km | | 50 m | | 96.58% | | | | | |
| RF [30] | 5 | 37.5 (34) km | Pipeline safety monitoring | 10 (8) m | | 92.8% | | | | | |
| RF [79] | 2 | | Perimeter monitoring | | Filter method | 98.67% | 95.5% | | | | |
| F-ELM [80] | 5 | 25.05 km | Airport safety surveillance | 50 m | Fisher score | 95% | | | | 4.67% | < 0.1 s |
| GMM [39] | 8 | 45 km | Pipeline safety monitoring | 5 m | ST-FFT | 68.11% | | | | 55.6% | |
| GMM [57] | 2 | a few km | Perimeter monitoring | 10–50 m | MFCC | 90% | | | | | |
| GMM [38] | 8 | 45 km | Pipeline safety monitoring | 5 m | Contextual feature extraction | 69.7% | | | | 31.2% | |
| mCNN + HMM [81] | 4 | 34 (18) km | Long-distance monitoring | 20 m | | 98.1% | 98.07% | 98.07% | 98.05% | | |
| HMM [30] | 5 | 37.5 (34) km | Pipeline safety monitoring | 10 (8) m | WPD | 98.2% | | | | | |
| GMM-HMM [35] | 3 | 45 km | Pipeline safety monitoring | 5 m | ST-FFT | 91% | | | | 53.7% | |
| DT [30] | 5 | 37.5 (34) km | Pipeline safety monitoring | 10 (8) m | | 89.2% | | | | | |
| BN [30] | 5 | 37.5 (34) km | Pipeline safety monitoring | 10 (8) m | | 78.3% | | | | | |
| LSTM [82] | 5 | 50 km | Long-distance monitoring | 20 m | Spectral subtraction | 90.6% | | | | | 0.87 s |
| ALSTM [82] | 5 | 50 km | Long-distance monitoring | 20 m | Spectral subtraction | 94.3% | | | | | 0.91 s |
| ConvLSTM [47] | 3 | 40 km | High-speed railway | 10 m | | 85.6% | 69.3% | 85.7% | | 8% | 8.25 s |
| SSAE [24] | 4 | 85 km | Long-distance pipeline safety monitoring | 20 m | | 94.47% (100 Hz), 97.06% (500 Hz) | | | | | 0.68 ms, 1.73 ms |

2. UNDERLYING $\phi$-OTDR WORKING PRINCIPLE

$\varphi$-OTDR systems are used to detect faults or intrusion events by analyzing changes in vibration signals of the fiber optic sensors then identifying the precise locations of these events through measuring the Rayleigh backscattered light from across the entire fiber optic spectrum [75]. Any disturbance by some perturbation events will lead to amplitude/phase changes of the backscattering light; hence, by measuring the changes of the intensity or phase, the distributed acoustic sensors based on $\varphi$-OTDR can provide important information about the position, frequency, event’s pattern, and so forth [3].

$\varphi$-OTDR systems were originally developed based on OTDR, which uses a broadband light source, while now $\varphi$-OTDR uses narrow linewidth laser (NLL, for long coherence length) as the light source [83,84]. Basically, there are two major detection schemes employed in $\varphi$-OTDR, namely direct and coherent detection. The direct detection scheme usually straightforwardly relies on the registration of local changes in the backscattered intensity over time [48], whereas in coherent detection the backscattered signal is mixed with a local oscillator [85] to enhance the scattered light’s SNR.

In $\varphi$-OTDR, the function generator (FG) generates pulses, and the coherent probe light from an NLL is sent into an acoustic optical modulator (AOM), which converts the continuous wave (CW) light into optical pulses; these are directed to an erbium-doped fiber amplifier (EDFA) to boost the input power. The amplified signal is sent to the sensing fiber [or fiber under test (FUT)] through a circulator. The Rayleigh backscattered light is then routed to a photoelectric detector (PD) through the same circulator and recorded by the data acquisition card (DAQ), ready to be processed by a computer for further analysis [84,86,87]. The architecture of $\varphi$-OTDR with the direct detection scheme is shown in Fig. 1.


Fig. 1. $\phi$-OTDR architecture for the direct detection scheme ([49], Fig. 2). (NLL, narrow linewidth laser; FG, functional generator; AOM, acoustic optical modulator; EDFA, erbium-doped fiber amplifier; PC, personal computer; DAQ, data acquisition card; PD, photoelectric detector.)


Figure 2 shows a sample architecture for a $\varphi$-OTDR with the coherent detection scheme. In this scheme, the light from an NLL source is split into a lower and an upper branch through a coupler. The lower branch serves as the local light to implement heterodyne detection, while the upper branch is employed as the probe light, as in the direct detection scheme [88]. The Rayleigh backscattered light coming from the fiber is then mixed with the local light through another coupler, sent to a balanced photoelectric detector (BPD), and finally received by the DAQ before being passed to a computer for further processing of the backscattered light's information [88].


Fig. 2. $\phi$-OTDR architecture for coherent detection scheme ([88], Fig. 5). (NLL, narrow linewidth laser; FG, functional generator; AOM, acoustic optical modulator; EDFA, erbium-doped fiber amplifier; PC, personal computer; DAQ, data acquisition card; BPD, balanced photoelectric detector.)


3. SIGNAL PREPROCESSING TECHNIQUES IN $\phi$-OTDR

Signal preprocessing mainly includes denoising and feature extraction in $\varphi$-OTDR systems; we focus mainly on the latter in this paper. Classical ML algorithms require raw input signals to be processed first so that their features can be extracted properly and fed as inputs into a classifier. For end-to-end DL networks, however, feature extraction is not required, since DL networks can learn features automatically. During feature extraction, important signal features are extracted before being fed as inputs to the desired traditional ML classification algorithm. In $\varphi$-OTDR systems, the vibration signals as initially recorded are not always meaningful, as they usually contain additional random noise and, hence, may produce unsuitable results if used directly. Fortunately, DL algorithms can take raw data, learn the patterns through forward and backward propagation, and converge to an output. With legacy ML algorithms, however, we may not get the desired results, and processing may take considerable time, especially when dealing with large sets of un-normalized data. Thus, we need to apply good signal processing methods and techniques in order to extract better quality features ready for accurate event identification during the classification stage [28].

Principally, there exist a number of signal processing techniques, also known as feature extraction or signal denoising methods. Commonly used techniques include, but are not limited to, the fast Fourier transform (FFT), wavelet packet transform (WPT), discrete wavelet transform (DWT), continuous wavelet transform (CWT), wavelet decomposition (WD), and wavelet packet decomposition (WPD). These methods operate in the time domain, the frequency domain, or both. FFT is perhaps the best known, as it can transform signals from the time domain into the frequency domain, and the inverse FFT (IFFT) can convert the frequency-domain signals back to the time domain after denoising or similar operations [56]. DWT is a feature extraction method that uses a mother wavelet function in order to analyze a signal simultaneously in the time and frequency domains [89].
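To make the frequency-domain feature extraction concrete, the following is a minimal Python sketch (not taken from any of the reviewed papers) that turns a raw vibration trace into a small FFT-based feature vector; the band count, sampling rate, and function name are illustrative assumptions.

```python
import numpy as np

def fft_band_features(signal, fs, n_bands=8):
    """Split the one-sided FFT magnitude spectrum into n_bands equal-width
    bands and return the normalized energy in each band as a feature vector."""
    spectrum = np.abs(np.fft.rfft(signal))           # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    edges = np.linspace(0, freqs[-1], n_bands + 1)   # equal-width frequency bands
    features = [np.sum(spectrum[(freqs >= lo) & (freqs < hi)] ** 2)
                for lo, hi in zip(edges[:-1], edges[1:])]
    return np.asarray(features) / (np.sum(features) + 1e-12)

# Example: a 1 kHz trace with a 120 Hz vibration component plus noise
fs = 1000
t = np.arange(0, 1, 1 / fs)
trace = np.sin(2 * np.pi * 120 * t) + 0.3 * np.random.randn(len(t))
print(fft_band_features(trace, fs))
```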

However, some conventional signal processing methods like the FFT work better with stationary signals, while in practice the events that cause most intrusion signals are transient and non-stationary; thus, the conventional FFT may not be suitable [90]. According to recent studies, methods that can process and interpret signals in the time-frequency domain are highly recommended. According to [15,50,58], conventional methods like the short-time Fourier transform (STFT) and CWT are time consuming, especially when multiple intrusion events occur simultaneously, as they first need to pinpoint the exact location of the event before extracting its time-domain signal for the recognition process. Both methods are therefore prone to unnecessarily long recognition times, because pinpointing the precise location of the intrusion signal is usually difficult as intrusions mostly occur within a given range rather than at a single point [75], leading to an increase in the number of false alarms. Due to the non-linear and dynamic nature of $\varphi$-OTDR signals, the WPT is commonly suggested to mitigate the number of false alarms [15,75]. In the signal processing phase [82], Chen and Xu suggest that, for each frame of the disturbance signal, mel-frequency cepstral coefficients (MFCCs) should be extracted as frequency-domain features, while the short-time energy ratio as well as the short-time level crossing (LC) rate should be extracted as time-domain features.

The non-linearity, dynamism, and unpredictable nature of $\varphi$-OTDR signals have led to the introduction of many different signal processing techniques, mostly derived from the traditional methods above. An MFCC-related algorithm was proposed to reduce NAR [91]. WD and WPD have been thoroughly compared on similar datasets before being fed to a neural network, with the performance results showing that WPD is more convenient for practical pre-warning applications in oil pipeline safety monitoring due to its higher identification rate and accuracy and lower NAR [40]. In [77], an empirical mode decomposition (EMD) energy analysis method is proposed and argued to be advantageous over WD as it does not require prior setting of the basis function. In 2019, Zhao et al. [29] proposed a multi-dimensional feature extraction algorithm based on polynomial least squares for removing trend terms from vibration signals and wavelet threshold denoising for reducing noise interference, whereby the multi-dimensional features of the signals are extracted using a combination of short-time analysis (in the time domain) and wavelet analysis (in the wavelet domain). Cubical smoothing [49] and spectral subtraction [56] are effective algorithms for signal denoising, with the latter being especially popular and widely used to enhance signal features. Power spectrum estimation [72] and wavelet energy spectrum analysis [76] have also been utilized to extract feature vectors from acoustic signals.
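As an illustration of the spectral subtraction idea (a minimal sketch, not the implementation used in the papers cited above), the following Python snippet estimates the noise magnitude spectrum from an assumed noise-only lead-in segment and subtracts it frame by frame; the segment length and STFT parameters are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(x, fs, noise_seconds=0.5, nperseg=256):
    """Estimate the noise magnitude spectrum from a noise-only lead-in segment
    and subtract it from every STFT frame (negative magnitudes clipped to zero)."""
    f, t, X = stft(x, fs=fs, nperseg=nperseg)
    noise_frames = t <= noise_seconds                  # frames assumed noise-only
    noise_mag = np.abs(X[:, noise_frames]).mean(axis=1, keepdims=True)
    mag = np.maximum(np.abs(X) - noise_mag, 0.0)       # subtract the noise floor
    X_clean = mag * np.exp(1j * np.angle(X))           # keep the original phase
    _, x_clean = istft(X_clean, fs=fs, nperseg=nperseg)
    return x_clean[: len(x)]
```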

In summary, there have been several signal preprocessing techniques/methods applied in $\varphi$-OTDR in which most of them have been discussed in this section. There is not a single superior method suitable for all $\varphi$-OTDR applications as each method is preferred more under different circumstances. The performance or efficiency of each method depends on several factors including, but not limited to, the application scenario, nature of the data, goal of the $\varphi$-OTDR application, and/or the classification algorithm used.

Among the papers covered in this review, WPD is the most frequently used preprocessing method. Most of its usage comes from oil-gas pipeline safety monitoring and a few other areas, and it has been shown to perform better than most signal processing methods. In [40], the authors compared WPD and WD for extracting frequency-domain features for three events on a 65 km oil pipeline and concluded that WPD is a suitable feature extraction method for oil pipelines. WPD applies the decomposition to both the approximations and the details; as a result, it offers a far richer frequency analysis than WD, which applies the decomposition only to the approximations. The authors added that WPD is a better frequency-spectrum analysis method because it can obtain a more accurate frequency-band decomposition than WD. In [37], the authors also argue that WPD offers a much richer frequency analysis than WD; according to that paper, a three-level WPD with a db6 mother wavelet separates the useful signal from the noise more accurately than WD, which helps the classifiers yield higher accuracy. In [26], the author suggests the use of WPD over WD because WPD can divide the high-frequency parts of the signal more finely than WD. In [30], the authors present a WPD for extracting time-frequency-domain features, including the WPD energy entropy and energy spectra.
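For concreteness, a minimal sketch of such WPD energy features is shown below, assuming the PyWavelets package and a three-level db6 decomposition as in [37]; the function name and normalization are our own illustrations, not code from the reviewed papers.

```python
import numpy as np
import pywt

def wpd_energy_features(signal, wavelet="db6", level=3):
    """Three-level wavelet packet decomposition; return the normalized energy
    of the 2**level terminal sub-bands as the feature vector."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")           # terminal nodes, low to high frequency
    energies = np.array([np.sum(np.square(node.data)) for node in nodes])
    return energies / (energies.sum() + 1e-12)          # relative sub-band energies

# e.g., an 8-dimensional feature vector for one raw vibration frame
# features = wpd_energy_features(frame)
```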

Spectral subtraction and FFT have also appeared in several similar cases, mostly in long perimeter monitoring and seismic wave prediction, achieving very high accuracy (up to 99% for seismic waves). Other methods perform fairly well, maintaining accuracy above 90%, while poorer performers include short-time (ST)-FFT (68.11% accuracy, 55.6% NAR) and contextual feature extraction (69.7% accuracy, 31.2% NAR) (see Table 1).

To finalize, although WPD has been widely used and has performed better in most instances covered in this review, good or poor results cannot be attributed to the signal preprocessing method alone. We do believe, however, that a good signal preprocessing technique helps to reduce noise, which in turn helps the classifier make better decisions. An overall summary showing the signal preprocessing methods, other parameters, and their performance results is given in Table 1.

4. EVALUATION METRICS FOR MACHINE/DEEP LEARNING METHODS IN $\phi$-OTDR

After signal preprocessing, we need to input our data into ML models/classifiers for data analysis. In order to evaluate the efficiency and effectiveness of any classifier, we need sound performance metrics. Usually, the performance of ML/DL algorithms is measured using the confusion matrix parameters, namely accuracy, recall, precision, and $f$-measure (also called $f1$-score or $f$-score), as explained in Table 2. However, for $\varphi$-OTDR, we need to include additional parameters like NAR, and some studies have gone further by adding parameters like IDT.


Table 2. General Structure of a Confusion Matrix

A. Nuisance Alarm Rates

NAR quantifies erroneous or deceptive reports of non-event disturbances that cause unnecessary attention and result in the misuse of resources. $\varphi$-OTDR suffers from high NAR [92]; therefore, abundant effort has been put into alleviating NAR in $\varphi$-OTDR event classification systems [59]. Equation (1) shows the general formula for computing NAR in terms of recall, while Eq. (2) further expresses it in terms of true positives (TP) and false negatives (FN),

$${\rm NAR} = 1 - {\rm Recall}.$$

Since recall can be expressed in terms of the confusion matrix parameters (TP and FN), therefore,

$${\rm NAR} = \frac{{\rm FN}}{{(\rm TP + FN)}}.$$

B. Identification Time

IDT actually explains how fast the classifier can process and recognize signals before classification. It is basically the time taken to identify the classes to which a given signal belongs.

C. Confusion Matrix

In supervised ML, we usually measure the efficiency of a classifier in terms of accuracy, recall, precision, and $f$-measure. Together these parameters are illustrated in terms of a table-like matrix called the confusion matrix or the error matrix as presented in Table 2. This helps to visualize the performance of an algorithm in terms of its efficiency and effectiveness. A binary classifier can classify instances as either positives or negatives. Table 2 shows a confusion matrix with predicted and actual classes. The confusion matrix parameters TP, FP, FN, and TN are used to compute the precision, recall, $f$-measure, and accuracy of a classifier as shown in Eqs. (3)–(6),

$${\rm Precision} = \frac{{\rm TP}}{{\rm TP + FP}},$$
$${\rm Recall} = \frac{{\rm TP}}{{\rm TP + FN}},$$
$$F -{\rm measure} = 2 \times \frac{{(\rm Precision \times Recall)}}{{(\rm Precision + Recall)}},$$
$${\rm Accuracy} = \frac{{\rm TP + TN}}{{\rm TP + TN + FP + FN}}.$$
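The following minimal sketch ties Eqs. (1)–(6) together; the function name and the example counts are our own illustrative choices.

```python
def classifier_metrics(tp, fp, fn, tn):
    """Confusion-matrix metrics used in Table 1, following Eqs. (1)-(6)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    nar = 1 - recall                       # equivalently fn / (tp + fn)
    return {"precision": precision, "recall": recall,
            "f-measure": f_measure, "accuracy": accuracy, "NAR": nar}

# Example: 95 true positives, 3 false positives, 5 missed events, 97 true negatives
print(classifier_metrics(tp=95, fp=3, fn=5, tn=97))
```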

5. MACHINE LEARNING ALGORITHMS FOR EVENTS CLASSIFICATIONS IN $\phi$-OTDR

ML is the science of training a computer to act on newly incoming feature sets based on given input features (datasets). In computer science, ML is usually categorized into four main parts, namely supervised learning, semi-supervised learning, unsupervised learning, and reinforcement learning. Given sufficient learning samples, also known as training datasets, the computer can learn and understand data patterns in a process called training; based on the training datasets, a classifier (classification model) can then identify a new unknown sample and place it in its most relevant category (class). In supervised learning the machine uses labeled data, while in unsupervised learning it uses unlabeled data during the training process. Semi-supervised learning uses a mixture of labeled and unlabeled training data, while reinforcement learning is based on trial and error, whereby the system is rewarded if the error is low and punished otherwise.

Some common algorithms for data-preprocessing and classification as discussed in this paper are demonstrated in Fig. 3. Classification algorithms can be divided into traditional ML and deep neural networks, which accomplish the same task (classification) but mainly differ in their structures and the way they process data. As a matter of fact, both are useful, but their applications are different depending on the size of datasets, nature of datasets, classification task ahead, and so forth.


Fig. 3. Some current machine learning methods for event classification in $\phi$-OTDR. (GAN, generative adversarial network; PCA, principal component analysis; ANN, artificial neural networks; KNN, $k$-nearest neighbors; SVM, support vector machine; RF, random forest; GMM, Gaussian mixture model; XGBoost, extreme gradient boosting; ELM, extreme learning machine; DT, decision tree; HMM, hidden Markov model; CNN, convolution neural network; LSTM, long short-term memory.)


As far as event classification in $\varphi$-OTDR systems is concerned, many methods have been proposed for different kinds of environments, each contributing differently to the performance of $\varphi$-OTDR in the aforementioned real-life applications. Some papers refer to these ML classification algorithms interchangeably as pattern recognition systems, and several methods have been described for event identification. However, some studies reviewed in this paper have proposed approaches to the identification problem in $\varphi$-OTDR that are not ML-based, including Canny edge detection [49] and GPU-based parallel computing to reduce time consumption [56].

A. Artificial Neural Networks

Artificial neural networks (ANNs), or simply neural networks, are a branch of ML with the ability to learn on their own how to extract relevant features once trained. They can be considered smart computational algorithms with a unique ability to extract meaningful information from imprecise or complex datasets, draw out the patterns, and detect trends that are otherwise too convoluted for simpler ML techniques. As in many other domains, $\varphi$-OTDR systems have witnessed a significant performance impact from applying ANNs in a wide range of applications, including event localization and classification. Multilayer perceptrons (MLPs) have been commonly adopted by several researchers in $\varphi$-OTDR [38,41,63], and probability-based neural networks (PNNs) have also been incorporated into $\varphi$-OTDR systems [65,93–95]; despite their popularity, most recent studies have preferred DL approaches like convolutional neural networks (CNNs) and long short-term memory (LSTM), which have appeared in the last one or two years. A typical ANN architecture used in $\varphi$-OTDR is shown in Fig. 4. This ANN has one input layer that begins the workflow by taking the initial data (feature vectors from the wavelet packet energy feature space) and performing calculations via its neurons before sending its outputs to the subsequent hidden layers (${{{\rm h}}_1}$ and ${{{\rm h}}_2}$) for further processing. Finally, the output layer takes the results of the hidden layers to provide the final event classification.


Fig. 4. General ANN architecture with one input layer, two hidden layers (${{{\rm h}}_1}$ and ${{{\rm h}}_2}$) and one output layer, where ${{{\rm O}}_1}$ and ${{{\rm O}}_2}$ are the output neurons ([40], Fig. 3).


Principally, an ANN is a layered approach with three categories of layers, namely the input layer, hidden layer(s), and output layer. The network can have as many hidden layers as necessary depending on the complexity of the datasets; however, too many layers can lead to a slower network with longer processing time, or even overfitting. Below we present ANN and related algorithms as applied in $\varphi$-OTDR.

1. Basic ANN

In 2017, Wu et al. [40] presented a four-layer backward propagation (BP) ANN training model for event identification in order to reduce NARs on a 65 km long oil pipeline. The ANN classifier was fed input feature vectors extracted using two commonly used feature extraction methods, WD and WPD, which were comparatively analyzed on three different signals collected from the field as the testing datasets. In the experimental results, WPD clearly outperformed its counterpart, recording up to a 94.4% identification rate with a 5.6% NAR, whereas WD recorded only a 91.1% average identification rate and a considerably higher NAR of 8.9%. These experimental results therefore suggest that WPD is the better option for $\varphi$-OTDR signal processing in oil pipeline safety monitoring applications.

2. MLP

Classical ANNs are fully connected (FC) feed-forward networks where each neuron in one layer is connected to all neurons of the previous layer. This kind of ANN is also referred to as an MLP, a classical ML algorithm with good performance and low execution time [41]. In 2017, Tejedor et al. [38] proposed an MLP-based method for $\varphi$-OTDR, which achieved a fair result of 61.8% accuracy for eight event classes on a 45 km long fiber. In 2021, Bublin [63] reported an MLP plus a feature extraction method, achieving 99.88% accuracy with a processing time of 0.55 ms and fewer than one false alarm per month.
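As a rough illustration of this kind of fully connected classifier, here is a minimal sketch assuming scikit-learn; the layer sizes and hyperparameters are our own and are not taken from [38,41,63].

```python
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: (n_samples, 8) wavelet-packet energy features, y: integer event labels
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16),   # two hidden layers, as in Fig. 4
                  activation="relu",
                  max_iter=500,
                  random_state=0),
)
# model.fit(X_train, y_train)
# y_pred = model.predict(X_test)
```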

3. Probability-Based Neural Network

PNN is basically an implementation of the "kernel discriminant analysis" statistical algorithm. It was introduced in 1990 by Specht based on Bayesian probability (BP) theory [93]. It works by mapping the input patterns into a number of different class levels, and its network can be organized as a multilayered feed-forward neural network with four major layers, namely (1) the input layer, (2) the pattern layer, (3) the summation layer, and (4) the output layer [94]. Wu [65] presented a PNN achieving a 1.5% NAR, over 98% recognition accuracy, and a 0% leakage alarm rate using a combination of FFT, power spectrum estimation, and WD feature extraction methods. The major advantages of PNN include faster training compared to backpropagation networks, no local minima problem, and guaranteed convergence to an optimal classifier as the training dataset grows. Meanwhile, key disadvantages of PNN include slow network execution due to the composition of several layers and high memory requirements [95].

ANNs have been extensively used, and experimental results show an increase in performance, achieving higher event identification rates with essentially low NAR. ANN training models and their derivatives such as PNN work better when combined with conventional ML algorithms like ${{k}}$-nearest neighbors (KNN) or support vector machines (SVM) and with a good feature extraction method, which together can increase the recognition rate and lower the NAR.

B. Support Vector Machines

SVMs are supervised ML techniques that can perform both classification and regression. Linear SVMs were originally designed for binary classification problems [96]. By use of a special decision boundary called the hyperplane, an SVM model can distinguish between two classes based on the datasets closest to the hyperplane. Basically, SVM originated from research on the optimal separating hyperplane, which is required to separate all the samples exactly while maximizing the margin on both sides of the hyperplane [72]. SVM is one of the most commonly used event classification methods in $\varphi$-OTDR and has achieved suitable results. In the subsections below, we review SVM algorithms and their variations as used in $\varphi$-OTDR. Figure 5 shows the architecture of a standard SVM used for event classification in $\varphi$-OTDR, which takes input features ${x_1}$ to ${x_n}$ through the kernel function, which maps the inputs into a feature space where the classes become easier to separate.
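A minimal sketch of such an RBF-kernel SVM classifier, assuming scikit-learn and three-dimensional features of the kind used in [71]; the feature choice and hyperparameters are illustrative assumptions, not the settings of the reviewed papers.

```python
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: (n_samples, 3) features such as low-frequency energy ratio,
#    total energy, and peak-to-mean ratio; y: event labels
svm = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", C=10.0, gamma="scale"))
# svm.fit(X_train, y_train)
# print(svm.score(X_test, y_test))   # multiclass handled one-vs-one internally
```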


Fig. 5. General SVM architecture ([71], Fig. 3).


1. Basic SVM

A perimeter security monitoring system using an SVM classifier and an FFT signal processing algorithm was presented in [71] to classify five different events, namely a stable state, walking on the lawn, a vibration exciter, shaking the fence, and the fence exposed to the wind, in a 50 km long $\varphi$-OTDR system. During the signal processing phase, three-dimensional (3D) feature vectors expressed in terms of the low-frequency-to-total-energy ratio (Feature 1), total energy (Feature 2), and peak-to-mean-value ratio (Feature 3) were extracted and fed to the SVM classifier using the radial basis function (RBF) kernel. The SVM achieved an average identification rate of 92.62%, an intrusion detection rate of up to 98.6%, and an event classification rate of 91.2%. An SVM classifier was also employed in a paper [73] by Wiesmeyr et al. to monitor and extract the active positions of a train with a 40 km long fiber optic cable. In this case, an FFT was used for signal processing, and principal component analysis (PCA) was used to reduce the dimensionality from 10 feature values to 2. The experimental results recorded over 98% accuracy.

The acoustic signals generated by three different types of vehicles (cars, trucks, and tractors) were classified using Library-SVM (LIBSVM) with an RBF kernel [72]. The feature vectors were generated using a power spectrum estimation method, and PCA of the normalized spectrum lines was then implemented in MATLAB to form four principal components with a cumulative contribution rate above 90%, which were selected as the final feature vectors. The average identification accuracy on the training datasets was 95.5% while the average accuracy on the testing samples was 88.9%, demonstrating the effectiveness of the PCA-SVM approach for automatic vehicle type detection using acoustic signals.

Aiming at improving SNR, another multiclass SVM with spectral subtraction feature processing method is presented [74]. Over 800 samples generated by four vibration events (taping, shaking, striking, and crushing) were processed using spectral subtraction method to reduce wideband background noise and to enhance the time-frequency properties of the generated signals. More than 90% recognition accuracy was achieved with only under 0.6 s recognition time in a 20 km long $\varphi$-OTDR system.

2. Near Category Support Vector Machine

In an attempt to mitigate the rate of nuisance alarms, a paper [97] suggests the near category support vector machine (NC-SVM) as an improvement of the legacy binary SVM classifier that supports multiclass classification using the KNN algorithm. In their experiments, five different event types, i.e., watering, climbing, pressing, knocking, and a false disturbance, were trained and tested in the NC-SVM classification model within a 25.05 km range. The experimental results show an average identification rate above 94%, with a 0.55 s IDT and a NAR of 5.62%.

3. Linear Support Vector Machine

The linear SVM can be used efficiently along with DL algorithms by replacing the softmax layer of a CNN to maximize recognition accuracy [96]. In [54], a linear support vector machine (LSVM)-based classifier was proposed to classify three different activities, namely digging with a shovel, a hammer, and a pickaxe along the buried fiber. In the first stage, the wavelet denoising method was used to reduce excessive noise from the measured backscattered signal, then high-pass filtering was performed using a "difference in time domain" approach, and finally autocorrelation was applied to remove uncorrelated signals by comparing each signal to itself. In the second stage, a variational mode decomposition (VMD) technique was used to decompose the detected activity's signal into a band-limited series, from which the event signals are reconstructed. Finally, higher order statistical features are extracted, including variance, skewness, and kurtosis. In the classification stage, the LSVM is then employed under different levels of SNR. The confusion matrix shows a higher accuracy of about 79.5% for higher SNRs from ${-}{{4}}$ to ${-}{{8}}\;{\rm{dB}}$, while lower SNR levels from ${-}{{8}}$ to ${-}{{18}}\;{\rm{dB}}$ lead to a decrease in accuracy to 75.2%.

4. Relevance Vector Machine

The relevance vector machine (RVM) is a Bayesian probability framework that is generally sparser than the commonly implemented SVM algorithm, offering a shorter recognition time and higher recognition accuracy [98]. Hence, it is more suitable for recognition in fiber optic pre-warning systems [37,76]. Sun et al. [75] analyzed signals in the two-dimensional (2D, time and space) domain during the feature extraction phase instead of the noisy and time consuming one-dimensional (1D, time) domain and then fed the extracted feature vectors into the proposed RVM classification model. Three events (walking, digging, and vehicle passing) were successfully identified in a 20 km fiber sensing system during the experiment. The RVM yielded 97.8% recognition accuracy with a short (${\lt}{{1}}\;{\rm{s}}$) computation time.

Another study demonstrated the application of the RVM method [76], where an RVM classification algorithm is deployed to classify three events (walking, jogging, and striking through the fiber) along a 10 km sensing fiber. Ten-fold cross validation was applied to ensure standardized results. The results show a macro-accuracy of 88.6% and precision of 88.61%, with a recall of 88.99% and an $f$-measure of 88.79%. Figures 6 and 7 show the two major processes for RVM, namely training and recognition, respectively.


Fig. 6. Training phase for RVM with three classifiers ([75], Fig. 14).



Fig. 7. Recognition phase for RVM with three classifiers ([75], Fig. 15).


C. ${{K}}$-Nearest Neighbors

KNN is one of the simplest supervised ML algorithms for classification problems, although it has rarely been applied in $\varphi$-OTDR. In 2013, George et al. [66] presented an efficient detection and classification method for acoustic signals using ANN and KNN algorithms to help detect moving vehicles in traffic monitoring. The study extracted MFCC features from different vehicles; however, the results show that KNN achieved a poor classification accuracy of only 50.62%. Although this experiment did not involve $\varphi$-OTDR technology, the approach is in many ways similar to those used in $\varphi$-OTDR, since both involve event classification of acoustic signals. Therefore, we believe there is an opportunity for future work to adopt a similar approach and improve performance for event classification in $\varphi$-OTDR.

In 2019, Jia et al. [97] presented a combination of KNN and SVM to form a hybrid classifier called the near category SVM (NC-SVM) for five events (watering, climbing, pressing, knocking, and a false disturbance) in a 25.05 km $\varphi$-OTDR, as explained in Subsection 5.B.2 of this paper. According to the authors, introducing the KNN algorithm into the SVM helped to boost performance, attaining a classification accuracy of up to 94%, with a 5.62% NAR and a good IDT of 0.55 s. The algorithm for computing KNN is shown in Fig. 8.
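A minimal KNN baseline of this kind, sketched with scikit-learn; the neighbor count, weighting, and feature choice are illustrative assumptions rather than the settings used in [66] or [97].

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# X: MFCC or wavelet-energy feature vectors, y: event labels
knn = KNeighborsClassifier(n_neighbors=5, weights="distance")
# scores = cross_val_score(knn, X, y, cv=10)   # 10-fold cross validation
# print(scores.mean())
```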


Fig. 8. Flow chart for KNN algorithm ([97], Fig. 8).


D. Random Forest

Random forest (RF) models are ML models that predict the output by combining the outcomes of an ensemble of decision trees (DTs). Each tree is constructed independently and depends on a random vector sampled from the input data [99–101]. Major advantages of an RF algorithm over a single DT include, but are not limited to, easily tuned hyper-parameters, high accuracy without overfitting problems, no need for feature scaling, resilience to noise, and robustness with respect to the selection of training samples [101]. However, the major disadvantage is that RF classifiers are harder to interpret than DT classifiers. Figure 9 shows the decision trees of an RF classifier with two classes.
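A minimal sketch of such a random forest classifier using scikit-learn; the tree count and other hyperparameters are illustrative and are not taken from the reviewed papers.

```python
from sklearn.ensemble import RandomForestClassifier

# X: time/frequency-domain features, y: event labels
rf = RandomForestClassifier(n_estimators=200,    # number of trees in the forest
                            max_features="sqrt", # random feature subset per split
                            oob_score=True,      # out-of-bag estimate of accuracy
                            random_state=0)
# rf.fit(X_train, y_train)
# print(rf.oob_score_, rf.feature_importances_)
```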


Fig. 9. RF classifier for two classes ([78], Fig. 3).


In 2018, Wang et al. [79] used RF to identify two event signals, digging and normal, whose features were extracted in the time and frequency domains, respectively. The classifier showed a high accuracy of 98.67%, as presented in Table 3. Wang et al. [78] presented an RF event classifier in $\varphi$-OTDR with the aim of reducing NAR. The RF is based on learning time-domain disturbance signal features prior to classification, and the experiment was done using four different events: three disturbance events (watering, pressing, and knocking) along with one non-disturbance event. The experimental results recorded a 96.58% average accuracy, with individual accuracies of 93.79% for watering, 97.06% for pressing, 97.36% for knocking, and 98.12% for the non-disturbance event. These results are fairly high and well balanced, as there is no large gap between the highest and the lowest recorded accuracy for individual events. Even though the authors claim to have reduced NAR, the paper does not state the exact NAR obtained from their experiment, but the average identification accuracy (96.58%) is high and convincing.


Table 3. Analysis of Several Classification Algorithms Compared to F-ELM

E. Extreme Learning Machine

A combination of an extreme learning machine (ELM) and the feature extraction method called the Fisher score has been presented to form a method called F-ELM [80]. This method is proposed as an event identification technique to reduce NARs. The F-ELM is basically an improvement on previous algorithms, namely the between-category to within-category (BW) ELM [102] and the distributed generalized and regularized ELM [103]. In 2020, Jia et al. [80] conducted an experiment to identify five kinds of events, namely watering, climbing, pressing, knocking, and a false disturbance event, in a 25.05 km long $\varphi$-OTDR. The results show over 95% average classification accuracy in less than 0.1 s (IDT) and a 4.67% NAR using 25 selected features. The authors also performed a small survey of other articles and demonstrated a comparative analysis of a few classification methods including SVM [75], CNN + SVM [40], RF [79], CNN [55], the BP neural network [104], and RVM [76] before comparing their results with their newly proposed F-ELM algorithm. The experimental results suggest that F-ELM is second only to RF in terms of identification rate, with a much shorter IDT than SVM and CNN + SVM, as shown in Table 3.

In Table 3, the identification rates are arranged in descending order, with the most accurate being the RF with an identification rate of 98.67%. However, the RF had the fewest disturbance events (only two), so it may not be safe to conclude that it is better than the other classifiers, as the proposed F-ELM achieved a 95.33% identification rate on five different events in less than 0.1 s, which makes it the most effective under the circumstances.

According to Zhang [105], features with higher intra-class relationship and lower inter-class similarity usually lead to the best classification accuracy, and vice versa. The Fisher score is important for removing unrelated eigenvalues; thus, for an $m$-class identification problem, the Fisher score can be given by Eq. (7) below [80]:

$$f(d) = \sum_{0 \lt i \lt j \le m} \frac{{({\mu _{\textit{id}}} - {\mu _{\textit{jd}}})}^2}{\sigma _{\textit{id}}^2 + \sigma _{\textit{jd}}^2},$$
where ${\mu _{\textit{id}}}$ and ${\mu _{\textit{jd}}}$ are the means of classes “$i$” and “$j$” corresponding to the ${d^{{\rm th}}}$ feature, and $\sigma _{\textit{id}}^2$ and $\sigma _{\textit{jd}}^2$ are the variances.

According to [80], during feature selection phase, Fisher scores of every feature should be computed first, and then only features with larger scores are selected for the next (classification) phase.
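A minimal sketch of this Fisher-score-based feature selection, written as our own illustration of Eq. (7) rather than code from [80]:

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher score per Eq. (7): sum over class pairs (i, j) of
    (mean_i - mean_j)^2 / (var_i + var_j)."""
    classes = np.unique(y)
    scores = np.zeros(X.shape[1])
    for a in range(len(classes)):
        for b in range(a + 1, len(classes)):
            Xi, Xj = X[y == classes[a]], X[y == classes[b]]
            num = (Xi.mean(axis=0) - Xj.mean(axis=0)) ** 2
            den = Xi.var(axis=0) + Xj.var(axis=0) + 1e-12
            scores += num / den
    return scores

# keep, e.g., the 25 highest-scoring features before classification
# selected = np.argsort(fisher_scores(X, y))[::-1][:25]
```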

During the practical experiment, five events were used, and the experimental results show that four different types of disturbance events, which are watering, climbing, pressing, and knocking, could effectively be identified and separated from the fifth false disturbance with above 95% average identification rate and less than 0.1 s IDT. Meanwhile, the NAR is about 4.67% using 25 selected features with a 25.05 km long fiber. The architecture for the ELM algorithm is shown in Fig. 10.


Fig. 10. Architecture of an ELM model for multiclass recognition ([80], Fig. 2).


F. Extreme Gradient Boosting

In 2019, Timofeev and Groznov [51] presented a classification of seismic-acoustic waves using $\varphi$-OTDR in the time domain through time-reconstruction of the interference signal phase. In their experiments, they evaluated several classifiers including a multiclass SVM, gradient boosting (GB), and extreme gradient boosting (XGBoost), all attaining classification accuracies of over 98% in a 20 km system. Alternating between FFT, wavelet denoising, and MFCC feature extraction methods, their experiments showed that a multilayered ANN lags behind the above-mentioned classifiers in accuracy, and the multiclass SVM proved to be more robust when using a cross validation technique for generalization. In 2020, a paper by Wang et al. indicated that XGBoost is superior to most other common classifiers, including SVM, RF, and GB, especially when using the EMD energy analysis method for feature extraction [77].
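A minimal sketch of a gradient-boosted classifier for such feature vectors, assuming the xgboost Python package; the hyperparameters are illustrative, not those of [51] or [77].

```python
from xgboost import XGBClassifier

# X: EMD energy (or other) feature vectors, y: integer event labels
xgb = XGBClassifier(n_estimators=300,
                    max_depth=6,
                    learning_rate=0.1,
                    subsample=0.8)
# xgb.fit(X_train, y_train)
# y_pred = xgb.predict(X_test)
```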

G. Probabilistic Approach: Gaussian Mixture Models and Hidden Markov Models

A Gaussian mixture model (GMM) is a clustering algorithm based on a probabilistic approach that assumes the data points are generated from a finite mixture of Gaussian distributions with several unknown parameters [57,106], while a hidden Markov model (HMM) is a statistical model used to describe the evolution of observable events that depend on internal factors that are not directly observable. An observed event is called a "symbol," while the invisible factor underlying the observation is called a "state" [107]. The hidden states of the HMM form a Markov chain, and the probability of the observed symbols depends on the underlying states.
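A minimal sketch of GMM-based event recognition, assuming scikit-learn: one GMM is fitted per event class and a new frame is assigned to the class with the highest log-likelihood. The mixture count and function names are illustrative assumptions, not the setup of the papers reviewed below.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_class_gmms(X, y, n_mixtures=10):
    """Fit one diagonal-covariance GMM per class on that class's feature frames."""
    return {c: GaussianMixture(n_components=n_mixtures,
                               covariance_type="diag").fit(X[y == c])
            for c in np.unique(y)}

def classify(gmms, frame):
    """Assign a frame to the class whose GMM gives the highest log-likelihood."""
    frame = frame.reshape(1, -1)
    return max(gmms, key=lambda c: gmms[c].score(frame))
```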

1. HMMs

In 2019, Wu et al. [30] proposed a different approach, dynamic time sequence recognition and knowledge mining, based on HMMs. According to the authors, this approach can deal with the non-linearity of the non-stationary vibration signals in long-distance underground pipelines caused by a range of complicated dynamic events. The experimental results using real testing datasets from the field show a high average recognition accuracy of 98.2% for five commonly encountered events along buried pipelines. Additionally, other performance metrics like precision, recall, and $f$-score are also better than those of traditional ML methods such as RF, XGBoost, DT, and the Bayesian network (BN), as shown in Table 4. Table 4 presents a performance summary of six classifiers for five different events.


Table 4. Performance Comparison of the Six Classification Methods Discussed

2. mCNN-HMM

In 2021, Wu et al. [81] proposed an end-to-end combined model with a modified multi-scale CNN (mCNN) and an HMM for long-distance safety surveillance. According to the authors, this new approach can effectively identify vibrational signals by simultaneously extracting the multi-scale structural features and the sequential information of the signals. In their experiment, the mCNN is used to extract local structural features of the DAS signals from a multi-level perspective and their relationships, whereas the HMM is used to mine the sequential information of the previously extracted features. The experiment was conducted on a 34 km long fiber cable, and the results show 98.1% classification accuracy, 98.07% for both precision and recall, and a 98.05% ${\rm{f1}}$-score. The authors then compared their proposed mCNN-HMM model with three other models, namely handcrafted features with HMM (93.4% average accuracy), CNN-HMM (96% average accuracy), and MS-CNN (75% average accuracy). The comparison shows that the mCNN-HMM method achieved better average accuracy than the rest of the models.

3. GMMs

In 2016, Fedorov et al. [57] used a GMM to recognize two event classes (single target passage and digging near the cable), both assumed to follow Gaussian distributions. The feature space for their GMM was formed by cepstral coefficients, using ${\rm{M}} = {{10}}$ as the optimal value. Experimental results showed that the highest probability of correct event recognition can reach up to 0.94. However, according to the authors, the probability of correct event identification depends on the number and properties of the testing samples, which leaves room for future work.

GMM was also applied to cluster two classes (threats and non-threats) using real data collected from the field in the Fiber Network Distributed Acoustic Sensor (FINDAS) project for energy pipeline surveillance [39]. In their experiments, GMM was used along with the ST-FFT signal processing method to generate the spectral information. The expectation-maximization algorithm was used for GMM training, and acoustic input frames were then assigned to the class with the highest probability. The experiment recorded a threat classification rate of 68.11% and more than 55% false alarms (according to the authors), using six-fold cross validation in a 45 km long $\varphi$-OTDR system. However, the authors state that these results are preliminary, and future work is aimed at reducing noise and creating more robust feature vectors.

In 2016, a study based on GMM for monitoring the integrity of gas pipelines was presented [38]. In this paper, contextual feature extraction based on the tandem approach is employed to produce the tandem feature vectors, and a three-layer MLP is then employed to integrate the feature-level contextual information. The length of the fiber optic cable is 45 km, and eight different activities were recognized. The contextual feature extraction module recorded a 69.7% classification accuracy, which is fairly low, and a 31.2% NAR, which is very high. Furthermore, the paper reported a fair 80.7% threat detection rate; however, the authors did not clearly specify the difference between the two metrics. Overall, the results in terms of classification accuracy and NAR are not convincing.

According to the above results, we can fairly say that GMM is a good clustering model, as it could achieve up to 94% clustering accuracy [57] under normal operating conditions. However, the results of Martins et al. [39] are not as good when compared with other classification methods. The authors urge that future work should focus on better noise-reduction methods in order to extract robust feature vectors, as well as on new strategies to deal with the non-linear behaviors of the sensing system.

4. GMM-HMM

In 2018, Tejedor et al. [35] proposed a GMM-HMM-based system to detect potential threats in a $\varphi$-OTDR. The presented results show an improvement of over 45.15% in classification accuracy compared with the traditional GMM. The proposed algorithm also shows 91% threat detection with a very high 53.7% false alarm rate, whereas the traditional GMM shows 80% threat detection and 40% false alarms according to the authors. The false alarm rates reported in this paper appear to be too high, but the authors did not provide any explanation for this.

Generally, given enough normalized training samples for fairly easy and common activities, GMM clustering has the potential of yielding a higher clustering probability (i.e., higher classification accuracy) during the pattern recognition process. It can also be used as the learning algorithm during feature extraction for DL algorithms. HMM works best for complex activities in which an activity has more than one behavior, since it can record them in different states of a Markov chain. A combination of GMM and HMM therefore provides a better threat detection rate, as indicated in the studies above. Even though the GMM-HMM approach has a higher detection rate than GMM or HMM alone, it suffers from a higher NAR, as demonstrated in the FINDAS project: the GMM-HMM's NAR was more than 53.7%, whereas the traditional GMM had only a 40% NAR. This suggests that a GMM-HMM pattern recognition technique is best deployed where event classification is of higher priority than a low NAR.
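
A minimal sketch of such a per-class GMM-HMM classifier, assuming the hmmlearn library and synthetic observation sequences, is given below; the number of hidden states, mixtures, and the event names are illustrative, not those of [35] or [39].

```python
# Hedged sketch of a per-class GMM-HMM classifier with hmmlearn; segment shapes
# and hyperparameters are toy placeholders.
import numpy as np
from hmmlearn.hmm import GMMHMM

def train_gmm_hmm(sequences, n_states=3, n_mix=4):
    """Fit a GMM-HMM on a list of (T_i, n_features) observation sequences."""
    X = np.concatenate(sequences)
    lengths = [len(s) for s in sequences]
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=50, random_state=0)
    model.fit(X, lengths)
    return model

def classify(models, sequence):
    """Score one observation sequence under each class model; return the best class."""
    return max(models, key=lambda label: models[label].score(sequence))

rng = np.random.default_rng(1)
train = {"threat": [rng.normal(0, 1, (80, 12)) for _ in range(20)],
         "non_threat": [rng.normal(1.5, 1, (80, 12)) for _ in range(20)]}
models = {label: train_gmm_hmm(seqs) for label, seqs in train.items()}
print(classify(models, rng.normal(0, 1, (80, 12))))
```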

6. DEEP LEARNING ALGORITHMS FOR EVENT CLASSIFICATION IN $\phi$-OTDR

Most ML algorithms are designed to work on simplified datasets with only up to a few hundred features, whereas DL algorithms run the data through numerous layers of neural networks, with each layer processing the data into a simpler representation before feeding it to the next layer until the final output. In this section, we survey recent DL methods deployed in $\varphi$-OTDR systems, such as the recurrent neural network (RNN), CNN, temporal convolutional network (TCN), LSTM, generative adversarial network (GAN), and sparse stacked auto-encoder (SSAE), as well as some hybrid approaches that involve both DL and ML methods, e.g., CNN + SVM, CNN + KNN, CNN + LSTM, CNN + RF, and others. These techniques have proved to classify better, increasing accuracy while lowering NARs, and the future of event classification greatly depends on smart DL algorithms able to handle large datasets, learn intelligently, and train with a higher degree of adaptation. They require less effort in feature processing, since raw data can be fed into the network and learned from directly, hence less manual work is needed [31,43,45,98]. DL algorithms can also combine the softmax or FC layer with a traditional ML classifier to form a single robust algorithm.

An approach for developing good DL algorithms for signal recognition in long perimeter monitoring with fiber optic sensors was presented in [78]. The author demonstrated how an efficient DL algorithm can accurately identify an activity in long perimeter security using $\varphi$-OTDR. However, the author did not review all DL-based classification approaches, but rather the underlying principles for developing a strong and robust DL algorithm.

In subsections below, we present some DL approaches from different studies regarding event recognitions in $\varphi$-OTDR systems.

A. Long Short-Term Memory

The LSTM network is a powerful and commonly used subset of the RNN [82]. LSTM is useful in sequence prediction problems since it takes both sequence and time into account [108,109]. LSTMs, especially when combined with CNNs, form robust, intelligent, and more consolidated classification methods for event recognition in $\varphi$-OTDR systems. The bidirectional LSTM (BiLSTM) is a popular variation of the regular LSTM that processes the sequence in both the forward and backward directions, unlike its unidirectional counterpart, which only moves forward [110]. Bi-LSTM networks are therefore useful in pattern recognition for $\varphi$-OTDR, since they can better connect and retrieve features from the sensing points using bilateral spatial and unidirectional time relationships. Figure 11 shows the architecture of a Bi-LSTM network.

Fig. 11. Basic structure of the BLSTM network ([110], Fig. 3). (IL, input layer; FFL, forward feeding layer; BPL, backward propagation layer; AFL, activation function layer; OL, output layer.)
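
For illustration, the following minimal PyTorch sketch wires a bidirectional LSTM to a fully connected output layer in the spirit of Fig. 11; the feature size, hidden size, and five-class head are placeholder assumptions, not the configuration used in [110].

```python
# Hedged sketch of a bidirectional LSTM classifier; all layer sizes are illustrative.
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, n_features=64, hidden=128, n_classes=5):
        super().__init__()
        # bidirectional=True runs one LSTM forward and one backward in time,
        # mirroring the forward-feeding / backward-propagation pairing in Fig. 11
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)   # 2x: concatenated directions

    def forward(self, x):                 # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])     # last time step -> class logits

logits = BiLSTMClassifier()(torch.randn(8, 200, 64))
print(logits.shape)                       # torch.Size([8, 5])
```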

1. Basic LSTM

Manie et al. [108] proposed a regular LSTM classification algorithm integrated with a DWT feature extraction technique for signal denoising in a $\varphi$-OTDR system. The results show that an LSTM model without DWT achieves 92% accuracy, while the accuracy rises to as high as 98% when DWT denoising is applied.
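
A minimal sketch of DWT-based denoising that could precede such an LSTM classifier is shown below, assuming the PyWavelets library; the wavelet, decomposition level, and soft universal threshold are common defaults and not necessarily the choices of Manie et al. [108].

```python
# Hedged sketch of wavelet-threshold denoising of a 1D vibration trace.
import numpy as np
import pywt

def dwt_denoise(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Noise level estimated from the finest detail band (median absolute deviation)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(len(signal)))      # universal threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

noisy = np.sin(np.linspace(0, 20 * np.pi, 4096)) + 0.4 * np.random.randn(4096)
clean = dwt_denoise(noisy)
print(clean.shape)
```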

2. Attention-Based Long Short-Term Memory

Attention-based long short-term memory (ALSTM) is an improved version of the traditional LSTM method that can focus most of its attention on the main or key parts of a signal; like the LSTM, it is derived from the RNN [82]. Chen et al. [82] introduced an ALSTM and compared the performance of the ALSTM and the legacy LSTM. In their experiment, they used five different kinds of disturbing activities, namely digging, walking, vehicle passing, climbing, and digging at different positions along a 50 km long fiber buried 20 cm deep underground. Experimental results show that the ALSTM model has a faster convergence speed, a lower training loss, a higher classification accuracy of 94.3%, and a 0.91 s recognition time, while the traditional LSTM has a classification accuracy of 90.6% and a 0.87 s recognition time. Additionally, the paper shows the superiority of ALSTM over other well-reputed classification methods: CNN with 89.9% classification accuracy and 0.49 s recognition time, and morphologic feature extraction (MFE) with 88.1% accuracy and 0.21 s recognition time. Generally, with the above results, both LSTM and ALSTM with proper signal processing techniques can achieve high accuracy; while the ALSTM is more accurate, converges faster, and has a lower training loss, its counterpart has a shorter recognition time. However, in both cases, the papers did not report NARs or the other confusion matrix parameters.
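
One simple way to add attention on top of an LSTM is to learn a relevance score per time step and pool the hidden states with those scores, as in the hedged PyTorch sketch below; this is only an illustration of the general idea, not the exact ALSTM of Chen et al. [82].

```python
# Hedged sketch of attention pooling over LSTM outputs; sizes are illustrative.
import torch
import torch.nn as nn

class AttentionLSTM(nn.Module):
    def __init__(self, n_features=64, hidden=128, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)              # scores each time step
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                              # x: (batch, time, features)
        h, _ = self.lstm(x)                            # h: (batch, time, hidden)
        weights = torch.softmax(self.attn(h), dim=1)   # (batch, time, 1)
        context = (weights * h).sum(dim=1)             # attention-weighted summary
        return self.fc(context)

print(AttentionLSTM()(torch.randn(4, 300, 64)).shape)  # torch.Size([4, 5])
```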

B. Convolutional Neural Networks

A CNN, also referred to as a ConvNet, is an essential class of DL (deep ANNs) most commonly applied in computer vision [111] for visual imagery, but which has also claimed state-of-the-art performance in a wide range of other tasks, including natural language processing [112]. In recent years, CNNs have been widely employed in various fields of $\varphi$-OTDR for pattern recognition. A typical CNN architecture consists of a few convolutional blocks, each formed by a Conv layer and a pooling layer, followed by an FC layer and a softmax layer, as shown in Fig. 12. Different CNN algorithms used for event classification in $\varphi$-OTDR are presented in this section.

Fig. 12. Basic CNN architecture. (Conv layer, convolution layer; FC, fully connected.)
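
The generic Conv-pooling-FC-softmax pipeline of Fig. 12 can be written compactly in PyTorch as in the sketch below; the input size (a 64 × 64 time-frequency patch), channel counts, and five-class output are illustrative assumptions.

```python
# Hedged sketch of a basic CNN over a spectrogram-like input.
import torch
import torch.nn as nn

class BasicCNN(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(                 # two Conv + pooling blocks
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 128), nn.ReLU(),
            nn.Linear(128, n_classes),                 # logits; softmax applied below
        )

    def forward(self, x):                              # x: (batch, 1, 64, 64)
        return self.classifier(self.features(x))

probs = torch.softmax(BasicCNN()(torch.randn(2, 1, 64, 64)), dim=1)
print(probs.shape)                                     # torch.Size([2, 5])
```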

1. Basic CNNs

Shi et al. [32] proposed a CNN to classify five different kinds of events, namely background, walking, jumping, beating, and digging with a shovel. The main difference of their approach from traditional CNN deployments lies in the input data: the temporal-spatial data matrix acquired from the fiber is plugged directly into the CNN as the input feature vectors. In their experiment, 5644 event samples were processed in under 7 min, achieving a high classification accuracy of 96.67% with a very short recognition time on a 1 km long fiber. The method was compared with common legacy CNNs such as LeNet, AlexNet, VggNet, GoogleNet, and ResNet, and its biggest advantage over these legacy networks is the reduced training time obtained by making the network relatively smaller and faster.

In 2021, Wang et al. [69] proposed a deep CNN to classify four events, namely switches, highway below the railway, cracking, and beam crevices, for a 1.5 km section of a high-speed railway track using $\varphi$-OTDR. Their experimental results yielded 98.04% accuracy. Although the results are satisfactory, the authors note that a large amount of data is needed, and since the amount of labeled data is limited, future work should involve semi-supervised DL models to further improve performance.

Aktas et al. [67] put forward a deep CNN trained with real sensing data from a $\varphi$-OTDR for event classification on a 40 km long fiber buried 1 m underground. The algorithm achieved over 93% accuracy in classifying six events: walking, pickaxe digging, shovel digging, harrow digging, strong wind, and facility noise caused by water pipes, generators, and/or air conditioning. Other confusion matrix metrics show over 97.6% precision with a time-frequency signal approach, while the precision drops to 73.7% with a time-domain signal approach.

Another attempt to protect pipelines against malicious activities was made by Peng et al. [25]. Using a CNN, they achieved over 85% accuracy in identifying four different events. In their experiment, they used three convolutional layers with a max pooling layer. However, the paper did not state the length of the fiber cable used.

Chen et al. [26] introduced a 1D-CNN capable of intelligently learning and identifying distinguishable features of different disturbance sources directly from the raw event signals. According to their experimental results from a real oil pipeline monitoring environment, the proposed 1D-CNN performs slightly better than a 2D-CNN in terms of recognition metrics and processing speed. As the authors explain, a 2D-CNN uses 2D convolution kernels in its convolution layers while a 1D-CNN uses 1D kernels, so the two differ in the form of their inputs: a 1D-CNN accepts a single vector, whereas a 2D-CNN requires a 2D matrix. For a 2D-CNN to recognize 1D sensing signals, the signal must therefore be transformed into a 2D image through time-frequency analysis or reshaped into a matrix, which in both cases is time consuming and computationally expensive; hence they introduced a 1D-CNN. Without any transformation of the raw signals, the network structure is reduced and the computational efficiency increases. The confusion matrix metrics show an overall average accuracy above 95%, compared with only 89.1% for the 2D-CNN; the 1D-CNN also averages 99.5% for precision, recall, and ${\rm{f1}}$-score, while the 2D-CNN lags behind with an average of 93% for the same metrics. However, some key details, such as the length of fiber covered, were not stated.
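
As an illustration of operating directly on the raw 1D trace, the following PyTorch sketch builds a small 1D-CNN with adaptive pooling so that the input length is not fixed; the layer sizes and four-class head are assumptions for demonstration, not the architecture of [26].

```python
# Hedged sketch of a 1D-CNN that consumes a raw vibration trace directly.
import torch
import torch.nn as nn

class CNN1D(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),      # length-independent pooling
            nn.Linear(32, n_classes),
        )

    def forward(self, x):                               # x: (batch, 1, n_samples)
        return self.net(x)

print(CNN1D()(torch.randn(8, 1, 4000)).shape)           # torch.Size([8, 4])
```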

Another paper also proposed a 1D-CNN [37] in preference to a range of conventional ML methods. In the first phase, the 1D-CNN is employed to extract distinguishable features of the signals from the $\varphi$-OTDR. In the second phase, the softmax layer is replaced by one of several candidate classifiers, such as SVM, RF, or GB. Based on their experiments, the paper suggests combining the 1D-CNN with an SVM for feature classification; this method achieves up to 98% recognition accuracy on five classes of disturbance signals in oil/gas pipeline monitoring systems, proving superior to the 2D-CNN and most other conventional methods. However, the paper did not state important parameters like the length of the sensing fiber and the types of events used.

In 2018, Xu et al. [55] employed a combination of a CNN and a multiclass SVM to form a hybrid classification method for intelligently identifying four intrusion events (walking, digging, vehicle passing, and striking) along a 40 km fiber. The network contains five convolution layers and two FC layers, with the SVM replacing the traditional softmax layer of the CNN. Two sets of experiments were conducted: the traditional CNN with its softmax layer yielded 88% classification accuracy, while the combined CNN-SVM, with the softmax layer replaced by the SVM, yielded up to 93.3% classification accuracy.
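
The softmax-to-SVM swap can be sketched as follows: a (here untrained and purely illustrative) convolutional feature extractor produces fixed-length vectors, and a scikit-learn SVM is fitted on them in place of the softmax head; this is a hedged outline of the general recipe, not the network of Xu et al. [55].

```python
# Hedged sketch: CNN used as a feature extractor, SVM used as the final classifier.
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

extractor = nn.Sequential(                  # stands in for a trained CNN minus its softmax
    nn.Conv1d(1, 16, 7, padding=3), nn.ReLU(), nn.AdaptiveAvgPool1d(8), nn.Flatten()
)

def to_features(signals):                   # signals: (n, n_samples) numpy array
    with torch.no_grad():
        x = torch.as_tensor(signals, dtype=torch.float32).unsqueeze(1)
        return extractor(x).numpy()         # (n, 16 * 8) feature vectors

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 4000)), rng.integers(0, 4, 200)
svm = SVC(kernel="rbf").fit(to_features(X_train), y_train)
print(svm.predict(to_features(rng.normal(size=(5, 4000)))))
```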

Makarenko [68] demonstrated a well-designed DL classifier that uses three different CNNs as the primary classifier. Each of these CNNs has a separate FC layer and uses a sigmoid activation function; the main differences lie in their convolution layers. The primary classifier achieved up to 91% average detection accuracy, 92.06% average precision, and a 91.39% ${\rm{f1}}$-score on data from complicated real environments, with seven intrusion events detected over a 50 km long sensing system. A secondary classifier was added to stabilize NARs; however, Makarenko did not disclose further details about it, as the research was still in progress.

In 2019, Peng et al. [42] presented a two-layer classifier based on a CNN for real-time security monitoring of a buried municipal pipeline. Layer 1 is designed to separate third-party threats from traffic and pedestrian noise, while layer 2 determines the specific type of third-party interference. To reduce NARs, a time-space matrix is deployed to correct possible errors. Six activities generating different vibration signals (excavation, hammering, electrical hammering, shoveling, pickaxing, and metro passing) were identified, and results show 91% recognition accuracy over an 8 km long fiber.

In 2019, Wang et al. [34] presented an algorithm called DPN92. DPN stands for dual path network, an improved deep CNN architecture; the authors customized their DPN to have 92 layers, hence the name DPN92. In their experiment, they classified seven disturbance events, namely excavator operation, concrete fence breaking, pedestrians walking, tamping operation, ambient noise, moving train, and local wind blowing. Their datasets were collected from a real-life Shanghai railway using a fiber optic cable buried alongside the railway. The experimental results show the efficiency of DPN92, with 97% average classification accuracy and over 99% precision, recall, and f-score. However, the paper did not state the length of the cable used.

2. Convolutional Long Short-Term Neural Network

Bai et al. [33] proposed a deep neural network based algorithm called the convolutional long short-term memory deep neural network (CLDNN) in a 33 km long fiber optic sensing system to identify external intrusion events for pipeline safety. The authors defined three classes of events: percussive tap (PT), mechanical digging (MD), and normal (non-intrusion) events. PT events include all harmful labor activities that involve using tools to tap the ground and produce soil vibration, for example, ramming, digging, and drilling. MD events include all activities performed by heavy machines, such as excavation. Both PT and MD are considered harmful to the safety of a pipeline. According to the authors, the proposed algorithm is composed of two convolutional layers, one LSTM layer, and one FC layer, as shown in Fig. 13, and it works by directly inputting the time series data into the DL network. The experimental results suggest that CLDNN is an effective and robust algorithm for fast and accurate localization of event signals in complex environments, and a relatively better approach than those used in previous works, especially when dealing with huge volumes of data. Testing results show a 97.2% average recognition rate [33].

Fig. 13. CLDNN architecture ([33], Fig. 6).

3. One-Dimensional Convolutional Neural Networks and Bidirectional Long Short-Term Memory

In [15], a combination of a 1D-CNN and a BiLSTM forming a 1DCNN-BiLSTM classification algorithm outperformed the 1D-CNN, the 2D-CNN, and a regular CNN in terms of accuracy, precision, recall, and number of events, as demonstrated in Table 5. In this model, the 1D-CNN extracts detailed temporal-structural features for each signal node, and a customized Bi-LSTM network then constructs spatial relationships among the different signal nodes; the proposed method thus works by relating spatial and temporal information [15].

Table 5. Comparison of Experimental Results of Five Events for the Aforementioned Methods

In 2021, Yang et al. [27] also proposed a 1DCNN and BiLSTM method for classifying four events, namely background noise, manual excavation, mechanical excavation, and vehicle driving, in a pipeline early warning system. In their experiment, the input features are first fed into the 1DCNN to extract spatial features before being fed into the BiLSTM to obtain bidirectional and complex relations. The experiment was conducted at two different sampling frequencies: 500 Hz, yielding 99.26% accuracy, and 100 Hz, yielding 97.20% accuracy.

In 2021, Tian et al. [70] designed an attention-based TCN with a BiLSTM model (ATCN-BiLSTM) for $\varphi$-OTDR, achieving an average classification accuracy of 99.6% with zero NAR on three types of events. In the BiLSTM model, two LSTMs are used separately to accept inputs: the first LSTM takes the forward sequence of raw data, while the second takes the reversed sequence. Applying the BiLSTM helps capture long-term dependence and bidirectional, complex relations in the space domain. The attention mechanism is used to focus on the key features and fiber sections, helping to reduce the number of parameters and speed up processing, while the TCN captures the long-term dependence of the signals in the time domain.
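
A compact sketch of chaining a 1D-CNN front end with a BiLSTM back end is given below in PyTorch; it only illustrates the data flow (convolutional features re-ordered into a sequence for the BiLSTM) and is not the exact architecture of [15], [27], or [70].

```python
# Hedged sketch of a 1DCNN-BiLSTM pipeline; channel counts and lengths are toy values.
import torch
import torch.nn as nn

class CNN_BiLSTM(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.cnn = nn.Sequential(            # temporal feature extraction per segment
            nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
        )
        self.bilstm = nn.LSTM(32, 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(128, n_classes)

    def forward(self, x):                    # x: (batch, 1, n_samples)
        f = self.cnn(x).transpose(1, 2)      # -> (batch, time, channels) for the LSTM
        h, _ = self.bilstm(f)
        return self.fc(h[:, -1, :])

print(CNN_BiLSTM()(torch.randn(2, 1, 4000)).shape)     # torch.Size([2, 4])
```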

In summary, CNNs and similar networks have recently been widely trained and applied in dozens of $\varphi$-OTDR event classification applications. However, in most cases, studies indicate that replacing the traditional softmax layer of the model with another classifier, such as a multiclass or linear SVM, usually yields better classification results in terms of accuracy and the other confusion matrix metrics. More than 93% accuracy was attained by replacing the softmax layer after the FC layer with a multiclass SVM on a 40 km fiber optic sensor cable [55]. Wu et al. demonstrated a notable rise in accuracy by combining a 1D-CNN with an SVM classifier, which achieved approximately 98% accuracy, better than the 95% accuracy of a 1D-CNN with a regular softmax layer in oil and gas pipeline monitoring [37]. However, one thing to keep in mind is that the network should be kept as uncluttered and as small as possible, which ultimately increases the training and classification speed.

C. Generative Adversarial Network

The GAN was introduced in recent years, but it has since been applied in many different areas, $\varphi$-OTDR being one of them. A GAN is a useful tool for learning the distribution of a given dataset, usually consisting of two models, a generator and a discriminator, which are trained as adversaries [64,113]. The generator is trained to capture the data distribution, while the discriminator is trained to differentiate between the generated data and the real data. When the generator produces data that the discriminator fails to distinguish from the real data, the training is terminated [114].
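
The adversarial loop can be sketched in a few lines of PyTorch, as below; the generator and discriminator are toy fully connected networks over stand-in signal segments, and the training schedule is illustrative only.

```python
# Hedged sketch of a generator/discriminator adversarial training loop on 1D segments.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 256), nn.Tanh())
D = nn.Sequential(nn.Linear(256, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_batch = torch.randn(64, 256)                       # stand-in for real DAS segments
for step in range(100):
    # Discriminator step: real segments -> label 1, generated segments -> label 0
    fake = G(torch.randn(64, 32)).detach()
    d_loss = bce(D(real_batch), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: try to make the discriminator label its output as real
    fake = G(torch.randn(64, 32))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
print(float(d_loss), float(g_loss))
```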

In 2018, Shiloh et al. [50] introduced a GAN model for efficient training on $\varphi$-OTDR data. In this paper, the GAN was used to transform simulated data to mimic genuine data based on a relatively small, manually labeled experimental dataset. The technique was verified to be effective, yielding up to 94% classification accuracy in a 5 km long $\varphi$-OTDR sensing system; however, the authors did not report the NAR.

In 2019, Shiloh et al. [64] demonstrated a modified GAN architecture called C-GAN to classify three different events (footsteps, noise, and vehicles) on 5 km and 20 km long fibers. The experimental results show 83% accuracy for the shorter fiber and 80.2% for the longer one. On the 5 km sensing fiber, the NAR was initially as high as 54% but was reduced to 45% after fine-tuning the network with both experimental and simulation data, while the ${\rm{f1}}$-score climbed from an initial 87.72% to 89.85% after refinement. For the 20 km system, the experiments show an evident decrease in SNR, with the ${\rm{f1}}$-score reaching 87.23% from an initial 82.64% obtained when training with experimental data only. These experiments were conducted using three different kinds of training data: case 1, the experimental dataset only; case 2, the simulation dataset only; and case 3, the experimental dataset with the simulation dataset and a refiner for fine-tuning the algorithms. It is evident that combining all of these converges the algorithm to the best results. Overall, GANs seem to provide good classification accuracy, but the NAR is still too high to be convincing for now.

D. Semi-Supervised DL Method

In 2021, Yang et al. [24] proposed a novel semi-supervised DL model for long-distance pipeline safety early warning. Because collecting labeled data is costly, especially for longer pipes, the method is designed to capitalize on unlabeled data, which the authors believe can reduce experimental costs and also address the model migration problem thanks to a smaller model size and lower latency. The event recognition and localization method is based on several components, namely an SSAE, a BiLSTM, and self-attention. The SSAE is the event recognizer; it comprises a stacked auto-encoder (for layer-wise training of the data) and a sparse auto-encoder (for sparse compression of the data), which together help to reduce the model size and improve performance. The BiLSTM and self-attention techniques are utilized as independent components within the SSAE model.
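
As a rough illustration of the sparse auto-encoder building block, the following PyTorch sketch trains a single auto-encoder layer with an L1 activity penalty on the hidden code; the stacked layer-wise training, BiLSTM, and self-attention parts of [24] are omitted, and all sizes are assumptions.

```python
# Hedged sketch of one sparse auto-encoder layer with an L1 sparsity penalty.
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    def __init__(self, n_in=256, n_hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        z = self.encoder(x)                             # compressed (sparse) code
        return self.decoder(z), z

model, mse = SparseAE(), nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(128, 256)                               # stand-in for signal segments
for _ in range(50):
    recon, z = model(x)
    loss = mse(recon, x) + 1e-3 * z.abs().mean()        # reconstruction + sparsity term
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```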

The experiment was conducted to recognize four events, namely background noise, manual excavation, mechanical excavation, and vehicle driving, in different sections of an 85 km real-life oil pipeline operated by the PipeChina Northern Pipeline Company. The experimental results show high average recognition accuracies of 94.47% for 100 Hz data and 97.06% for 500 Hz data. Additionally, the average latency is 0.68 ms for 100 Hz data and 1.73 ms for 500 Hz data. The architecture of the SSAE is shown in Fig. 14.

Fig. 14. Sparse stacked auto-encoder model ([24], figure in abstract).

The authors suggest further using a few-shot learning mechanism and porting the implementation from TensorFlow to C++, by which they believe the model latency can be reduced by a factor of 15.

7. DISCUSSION

In Table 1, we presented a comparative analysis of the different event classification methods in $\varphi$-OTDR systems. The results in this table have been summarized from the experimental results presented in the research papers discussed in the above sections of this review. For every classification method, we have included key details, including the number of disturbance events, the length of the fiber optic cable (measurement range), the application field, the spatial resolution, the feature extraction method used (if any), and the experimental results. The experimental results are discussed in terms of recognition accuracy (for most papers), while some papers also presented other performance metrics such as precision, recall, ${\rm{f1}}$-score, NAR, and IDT.

The results in Table 1 indicate the efficiency and/or effectiveness of an algorithm; however, several key factors influence these results, so we should be careful when evaluating the performance of these algorithms. In some cases, the training datasets are too small, which may cause a generalization problem, so we cannot really jump to conclusions. Due to the absence of public open datasets for $\varphi$-OTDR (and with the code of the models not open), it is hard to draw fair and consistent comparisons across all discussed experiments. In some cases there can also be over-fitting, where the training dataset is fitted too perfectly by the classification model, leading to an inflated accuracy. Another issue is the inconsistency in the number of events to be classified: as can be seen from Table 1, some algorithms classified as few as two events while others classified up to eight. If an algorithm can recognize many types of events with high accuracy, it is much more useful. Another factor to consider is the length of the sensing fiber. Some methods are only effective over a very short distance, such as 1 km, while others are more robust against noise and disturbances and can maintain their accuracy over a long distance (up to 65 km).

In Table 1, the MLP records an accuracy of over 99% with a NAR of less than once per month, which can be regarded as the most effective method since no other algorithm comes close in terms of NAR; however, only a two-event dataset was used. The PNN records an accuracy of over 98% with a NAR of just 1.5% using FFT, power spectrum, and wavelet denoising methods during signal processing. ATCN + BiLSTM records a high accuracy of 99.6% with the lowest NAR of 0% for three-event classification, although the dataset is relatively small. SVM and its variations have also performed well, whether standalone or combined with a CNN to replace the softmax layer. Nevertheless, only the NC-SVM has succeeded in both attaining high accuracy (94.3%) for five different event classes and lowering the NAR (5.62%) within a 0.55 s IDT, using the wavelet packet denoising technique for feature preprocessing. These are convincing results, especially for an over 25 km long optical fiber. The XGBoost algorithm has proved to be one of the best classifiers according to [51], recording 99% accuracy for seven events on a 20 km cable using spectral subtraction and FFT signal processing, while in another study [77] XGBoost recorded over 95% accuracy using EMD energy analysis with a low NAR of about 4.1% and a 0.09 s IDT. The ELM method combined with the Fisher score feature extraction method to form the F-ELM model may not be a very popular ML method, but it has proved very effective, recording a 4.67% NAR, which is the third lowest after PNN and XGBoost. The F-ELM also attains high classification accuracy (95%) for five different events in under 0.1 s IDT (even faster than the PNN) on a 25.05 km long fiber cable.

Moreover, a semi-supervised DL method SSAE for pipeline surveillances also achieved very good results of over 94% and 97% average identification accuracy for 100 Hz and 500 Hz data, respectively. The method has proved to be very effective since it uses a significant amount of unlabeled data with only a small amount of labeled data, which makes semi-supervised methods useful especially for real-time applications with new data coming and an ever-growing dataset.

Another successful method in our review is the CLDNN, which has a high accuracy of 85% in a 40 km long fiber cable, and an 8% NAR, which is the lowest by any DL algorithm in our review. Other metrics recorded an 85% ${\rm{f1}}$-score with a lower recall of 69% [39].

A combination of CNN and HMM in [81] proved very effective, achieving over 98% for all confusion matrix parameters (accuracy, recall, precision, and ${\rm{f}}$-score) on a 34 km long fiber cable. A combination of GMM and HMM in [35] also performed well, achieving a high accuracy of 91%, especially for a long-distance (50 km) cable, but with a very high NAR (53.7%).

RF was used on three different occasions, and in all cases it performed well: its highest recorded accuracy is 98% and its lowest is 92%, while its counterpart DT recorded a fair 89% accuracy when classifying five events on a 37.5 km fiber cable.

One GMM instance recorded 90% accuracy, which is fairly satisfying; however, the other two instances saw GMM recording the lowest results. In one instance, GMM achieved an accuracy of 68.11%, the lowest in this review, with the highest NAR of 55.6%, using the ST-FFT feature extraction method on a 45 km fiber; another instance using the contextual feature extraction method achieved 69.7% accuracy and a 31.2% NAR, which is still poor. For these latter two GMM cases, the results are not convincing compared with most algorithms discussed in our paper, and in both cases the authors did not state the reasons for such low accuracy and such high NARs.

In summary, the algorithms presented in this paper have achieved classification accuracies of 90%–99.6%, with some lying between 80% and 90% and a few between 60% and 70%. Since these experiments were conducted in different environments under different conditions, it is unfair to jump to the conclusion that higher scores always guarantee better $\varphi$-OTDR system performance. However, the aim is to achieve as much accuracy (along with the other confusion matrix parameters) as possible while maintaining the lowest possible NAR. A detailed summary of these methods is given in Table 1.

A. Summary of the Comparative Analysis

Table 1 provides a descriptive analytical comparison between the dozens of approaches applied in $\varphi$-OTDR, whose main goal is to mitigate the major challenges facing $\varphi$-OTDR systems in different environments. There is no single optimal classifier for all $\varphi$-OTDR applications; the best choice depends on a number of factors, such as the nature of the environment, the sensitivity of the sensors, the length of the sensing fiber, the application domain, and many others. To improve the efficiency and effectiveness of traditional ML algorithms, attention should be paid to both the feature extraction stage and the classification stage; for DL models, we only need to concentrate on the classification stage, thanks to their automatic feature processing ability.

Currently in order to accomplish the main $\varphi$-OTDR objectives, many attempts have been made, including the addition of intelligent and extra-sensitive sensors, the overall $\varphi$-OTDR architecture improvements, the deployment of better signal processing methods for quality feature extractions from traditional to modern techniques, and many others. However, most importantly, there has been a huge development in terms of event classification algorithms, ranging from conventional ML methods to DL classifiers that are capable of dealing with complex data generated in different application areas. As a matter of fact, the latter has been the basis of this paper.

The technique of combining two or more algorithms to form one coherent and robust classifier seems to work better, outperforming most single-algorithm approaches in most cases; e.g., DL algorithms like the CNN combined with methods like SVM, RF, or LSTM have demonstrated better results than the CNN acting alone, other conditions remaining constant. Even though most research works have presented good performance results, some still lack adequate information on how particular results were attained, while others lack enough performance measures to evaluate the classifiers; for example, some articles claim high event identification accuracy but do not report the NAR or the IDT, making it hard to judge the system's performance based on accuracy alone. Other papers, despite reporting high accuracy, low NAR, and low IDT, do not state useful experimental parameters such as the length of the sensing fiber, the number and types of events used for signal generation, or even the feature extraction method, which are key ingredients when assessing the success of $\varphi$-OTDR systems.

B. Future Works and Recommendation

According to our review, both classic ML and DL methods have demonstrated fair performance in $\varphi$-OTDR systems. Although DL models are not yet significantly better than their ML counterparts, they are likely to be the future focus, as they can be greatly improved and have unique merits, i.e., no need to design an extra feature extraction module, end-to-end modeling, better transfer learning abilities, and potentially better performance with larger datasets (the datasets available so far are relatively small). Lowering the NAR while maintaining high accuracy remains a major ongoing challenge for $\varphi$-OTDR systems, as most papers have managed to achieve high recognition accuracy but only a few have recorded low NARs. We believe more attention should be paid to the algorithms that recorded low NARs (below 10%), which include the PNN [65] (just 1.5% NAR while maintaining 98% accuracy), ANN [40], NC-SVM [97], XGBoost [77], F-ELM [80], ConvLSTM [47], and ATCN [70].

In future works, we believe the application of semi-supervised learning techniques should be recommended. Since semi-supervised learning models can work on huge amounts of unlabeled data with only a small amount of labeled data, this could be a major advantage for improving the performance of $\varphi$-OTDR systems, especially in real-time applications where labeled data are limited. Semi-supervised learning models can also reduce data collection costs in environments where it is technically hard or very expensive to generate enough labeled data for training. Additionally, $\varphi$-OTDR systems can be deployed in the wide range of applications discussed in Subsection 1.A of this paper. However, from our review, we have observed that most research works have focused on perimeter monitoring, followed by pipeline surveillance and high-speed railway monitoring, while the remaining areas have not been explored as much. We therefore encourage future research to explore other practical areas of $\varphi$-OTDR application, such as underwater surveillance, military and airport surveillance, seismic wave prediction, and so forth.

Finally, almost every dataset in this review is small considering the requirements of DL methods, and the datasets do not follow a uniform format. Also, unlike in the broader computer science field, the code for each model is rarely accessible outside its respective research group. It would be very helpful for both scientific research and industry in this area if a uniform, public open dataset were built and the codes of the models were publicized for fair comparison.

8. CONCLUSION

In this survey paper, we first introduced the $\varphi$-OTDR working mechanism. As our main focus, we reviewed and presented in detail the recent ML and DL methods for event classification in $\varphi$-OTDR systems, and also reviewed the feature extraction methods used for signal processing in these systems. We prepared a table (Table 1) to summarize each event classification algorithm reviewed, along with its important details, including the performance results. Finally, we discussed the performance of the event classification algorithms as presented in recent papers, outlined some pros and cons of each, and recommended possible ways forward for future work to improve the performance of $\varphi$-OTDR systems.

Funding

Fundamental Research Funds for the Central Universities (2020JBM024); National Natural Science Foundation of China (61805008); Outstanding Chinese and Foreign Youth Exchange Program of China Association of Science and Technology; National Research Foundation Singapore (NRF) Central Gap Fund (NRF2020NRF-CG001-040).

Acknowledgment

Deus F. Kandamali would like to extend his sincere gratitude to the China Scholarship Council (CSC) scholarship for funding his Ph.D.

Disclosures

The authors declare no conflicts of interest.

Data availability

No data were generated or analyzed in the presented research.

REFERENCES

1. K. Yüksel, J. Jason, and M. Wuilpart, “Development of a phase-OTDR interrogator based on coherent detection scheme,” Uludağ Univ. J. Fac. Eng. 23, 355–370 (2018). [CrossRef]  

2. M. M. Sherif, E. M. Khakimova, J. Tanks, and O. E. Ozbulut, “Cyclic flexural behavior of hybrid SMA/steel fiber reinforced concrete analyzed by optical and acoustic techniques,” Compos. Struct. 201, 248–260 (2018). [CrossRef]  

3. Y. Wang, H. Yuan, X. Liu, Q. Bai, H. Zhang, Y. Gao, and B. Jin, “A comprehensive study of optical fiber acoustic sensing,” IEEE Access 7, 85821–85837 (2019). [CrossRef]  

4. K. O. Hill, Y. Fujii, D. C. Johnson, and B. S. Kawasaki, “Photosensitivity in optical fiber waveguides: application to reflection filter fabrication,” Appl. Phys. Lett. 32, 647–649 (1978). [CrossRef]  

5. C. Li, Z. Mei, J. Tang, K. Yang, and M. Yang, “Distributed acoustic sensing system based on broadband ultra-weak fiber Bragg grating array,” in 26th International Conference on Optical Fiber Sensors (Optical Society of America, 2018), paper ThE14.

6. S. K. Ibrahim, M. Farnan, D. M. Karabacak, and J. M. Singer, “Enabling technologies for fiber optic sensing,” Proc. SPIE 9899, 98990Z (2016). [CrossRef]  

7. Y.-N. Tan, Y. Zhang, and B.-O. Guan, “Simultaneous measurement of temperature, hydrostatic pressure and acoustic signal using a single distributed Bragg reflector fiber laser,” Proc. SPIE 7753, 77539S (2011). [CrossRef]  

8. C. Wang, Y. Shang, X. Liu, C. Wang, H. Wang, and G. Peng, “Interferometric distributed sensing system with phase optical time-domain reflectometry,” Photon. Sens. 7, 157–162 (2017). [CrossRef]  

9. M. Chojnacki and N. Palka, “Demodulation of output signals from unbalanced fibre optic Michelson interferometer,” in Modern Problems of Radio Engineering, Telecommunications and Computer Science (IEEE Cat. No.02EX542) (2002), pp. 249–250.

10. X. Liu, C. Wang, Y. Shang, C. Wang, W. Zhao, G. Peng, and H. Wang, “Distributed acoustic sensing with Michelson interferometer demodulation,” Photon. Sens. 7, 193–198 (2017). [CrossRef]  

11. O. Kilic, M. J. F. Digonnet, G. S. Kino, and O. Solgaard, “Miniature photonic-crystal hydrophone optimized for ocean acoustics,” J. Acoust. Soc. Am. 129, 1837–1850 (2011). [CrossRef]  

12. J. Leng and A. Asundi, “Structural health monitoring of smart composite materials by using EFPI and FBG sensors,” Sens. Actuators A Phys. 103, 330–340 (2003). [CrossRef]  

13. L. Liu, P. Lu, S. Wang, X. Fu, Y. Sun, D. Liu, J. Zhang, H. Xu, and Q. Yao, “UV adhesive diaphragm-based FPI sensor for very-low-frequency acoustic sensing,” IEEE Photon. J. 8, 1–9 (2016). [CrossRef]  

14. F. Wang, Z. Shao, Z. Hu, H. Luo, J. Xie, and Y. Hu, “Micromachined fiber optic Fabry-Perot underwater acoustic probe,” Proc. SPIE 9283, 52–58 (2014). [CrossRef]  

15. H. Wu, M. Yang, S. Yang, H. Lu, C. Wang, and Y. Rao, “A novel das signal recognition method based on spatiotemporal information extraction with 1DCNNs-BiLSTM network,” IEEE Access 8, 119448 (2020). [CrossRef]  

16. L. Tie-Gen, Y. Zhe, J. Jun-Feng, L. Kun, Z. Xue-Zhi, D. Zhen-Yang, W. Shunag, H. Hao-Feng, H. Qun, Z. Hong-Xia, and L. Zhi-Hong, “Advances of some critical technologies in discrete and distributed optical fiber sensing research,” Acta Phys. Sin. 66, 070705 (2017). [CrossRef]  

17. Q. Chen, C. Jin, Y. Bao, Z. Li, J. Li, C. Lu, L. Yang, and G. Li, “A distributed fiber vibration sensor utilizing dispersion induced walk-off effect in a unidirectional Mach-Zehnder interferometer,” Opt. Express 22, 2167–2173 (2014). [CrossRef]  

18. W. Fang, Q. Jia, S. Zhen, J. Chen, X. Cheng, and B. Yu, “Low coherence fiber differentiating interferometer and its passive demodulation schemes,” Opt. Fiber Technol. 21, 34–39 (2015). [CrossRef]  

19. M. Zyczkowski, M. Szustakowski, N. Palka, and M. Kondrat, “Fiber optic perimeter protection sensor with intruder localization,” Proc. SPIE 5611, 71–78 (2004). [CrossRef]  

20. R. Zinsou, X. Liu, Y. Wang, J. Zhang, Y. Wang, and B. Jin, “Recent progress in the performance enhancement of phase-sensitive OTDR vibration sensing systems,” Sensors (Switzerland) 19, 1709 (2019). [CrossRef]  

21. H. Wu, S. Xiao, X. Li, Z. Wang, J. Xu, and Y. Rao, “Separation and determination of the disturbing signals in phase-sensitive optical time domain reflectometry (Φ-OTDR),” J. Lightwave Technol. 33, 3156–3162 (2015). [CrossRef]  

22. M. R. Fernández-Ruiz, M. A. Soto, E. F. Williams, S. Martin-Lopez, Z. Zhan, M. Gonzalez-Herraez, and H. F. Martins, “Distributed acoustic sensing for seismic activity monitoring,” APL Photon. 5, 030901 (2020). [CrossRef]  

23. S. Merlo, P. Malcovati, M. Norgia, A. Pesatori, C. Svelto, A. Pniov, A. Zhirnov, E. Nesterov, and V. Karassik, “Runways ground monitoring system by phase-sensitive optical-fiber OTDR,” in IEEE International Workshop on Metrology for AeroSpace (MetroAeroSpace) (2017), pp. 523–529.

24. Y. Yang, H. Zhang, and Y. Li, “Long-distance pipeline safety early warning: a distributed optical fiber sensing semi-supervised learning method,” IEEE Sens. J. 21, 19453–19461 (2021). [CrossRef]  

25. Z. Peng, J. Jian, H. Wen, A. Gribok, M. Wang, H. Liu, S. Huang, Z.-H. Mao, and K. P. Chen, “Distributed fiber sensor and machine learning data analytics for pipeline protection against extrinsic intrusions and intrinsic corrosions,” Opt. Express 28, 27277–27292 (2020). [CrossRef]  

26. J. Chen, H. Wu, X. Liu, Y. Xiao, M. Wang, M. Yang, and Y. Rao, “A real-time distributed deep learning approach for intelligent event recognition in long distance pipeline monitoring with DOFS,” in Proceedings International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC) (2018), pp. 290–296.

27. Y. Yang, Y. Li, T. Zhang, Y. Zhou, and H. Zhang, “Early safety warnings for long-distance pipelines: a distributed optical fiber sensor machine learning approach,” Proc. AAAI Conf. Artif. Intell. 35, 14991–14999 (2021).

28. F. Peng, H. Wu, X.-H. Jia, Y.-J. Rao, Z.-N. Wang, and Z.-P. Peng, “Ultra-long high-sensitivity Φ-OTDR for high spatial resolution intrusion detection of pipelines,” Opt. Express 22, 13804–13810 (2014). [CrossRef]  

29. Z. Zhao, D. Liu, L. Wang, and S. Liu, Feature Extraction and Identification of Pipeline Intrusion Based on Phase-Sensitive Optical Time Domain Reflectometer BT—Wireless and Satellite Systems, M. Jia, Q. Guo, and W. Meng, eds. (Springer International Publishing, 2019), pp. 665–675.

30. H. Wu, X. Liu, Y. Xiao, and Y. Rao, “A dynamic time sequence recognition and knowledge mining method based on the hidden Markov models (HMMs) for pipeline safety monitoring with Φ-OTDR,” J. Lightwave Technol. 37, 4991–5000 (2019). [CrossRef]  

31. J. Li, Y. Wang, P. Wang, Q. Bai, Y. Gao, H. Zhang, and B. Jin, “Pattern recognition for distributed optical fiber vibration sensing: a review,” IEEE Sens. J. 21, 11983–11998 (2021). [CrossRef]  

32. Y. Shi, Y. Wang, L. Zhao, and Z. Fan, “An event recognition method for Φ-OTDR sensing system based on deep learning,” Sensors (Switzerland) 19, 3421 (2019). [CrossRef]  

33. Y. Bai, J. Xing, F. Xie, S. Liu, and J. Li, “Detection and identification of external intrusion signals from 33 km optical fiber sensing system based on deep learning,” Opt. Fiber Technol. 53, 102060 (2019). [CrossRef]  

34. Z. Wang, H. Zheng, L. Li, J. Liang, X. Wang, B. Lu, Q. Ye, R. Qu, and H. Cai, “Practical multi-class event classification approach for distributed vibration sensing using deep dual path network,” Opt. Express 27, 23682–23692 (2019). [CrossRef]  

35. J. Tejedor, J. Macias-Guarasa, H. F. Martins, S. Martin-Lopez, and M. Gonzalez-Herraez, “A Gaussian mixture model-hidden Markov model (GMM-HMM)-based fiber optic surveillance system for pipeline integrity threat detection,” in Optics InfoBase Conference Papers (2018), Part F124, pp. 3–6.

36. H. Maral and M. Aktaş, “Field independent target classification analysis in distributed acoustic sensing systems,” in 27th Signal Processing and Communications Applications Conference (SIU) (2019), pp. 1–4.

37. H. Wu, J. Chen, X. Liu, Y. Xiao, M. Wang, Y. Zheng, and Y. Rao, “One-dimensional CNN-based intelligent recognition of vibrations in pipeline monitoring with DAS,” J. Lightwave Technol. 37, 4359–4366 (2019). [CrossRef]  

38. J. Tejedor, J. Macias-Guarasa, H. F. Martins, D. Piote, J. Pastor-Graells, S. Martin-Lopez, P. Corredera, and M. Gonzalez-Herraez, “A novel fiber optic based surveillance system for prevention of pipeline integrity threats,” Sensors (Switzerland) 17, 355 (2017). [CrossRef]  

39. H. F. Martins, D. Piote, J. Tejedor, J. Macias-Guarasa, J. Pastor-Graells, S. Martin-Lopez, P. Corredera, F. De Smet, W. Postvoll, C. H. Ahlen, and M. Gonzalez-Herraez, “Early detection of pipeline integrity threats using a smart fiber optic surveillance system: the PIT-STOP project,” Proc. SPIE 9634, 96347X (2015). [CrossRef]  

40. H. Wu, Y. Qian, W. Zhang, and C. Tang, “Feature extraction and identification in distributed optical-fiber vibration sensing system for oil pipeline safety monitoring,” Photon. Sens. 7, 305–310 (2017). [CrossRef]  

41. J. Tejedor, J. Macias-Guarasa, H. F. Martins, J. Pastor-Graells, P. Corredera, and S. Martin-Lopez, “Machine learning methods for pipeline surveillance systems based on distributed acoustic sensing: a review,” Appl. Sci. 7, 841 (2017). [CrossRef]  

42. R. Peng, Z. Liu, and S. Li, “Perimeter monitoring of urban buried pipeline subject to third-party intrusion based on fiber optic sensing and convolutional neural network,” Proc. SPIE 11209, 112091Z (2019). [CrossRef]  

43. J. Tejedor, H. F. Martins, D. Piote, J. Macias-Guarasa, J. Pastor-Graells, S. Martin-Lopez, P. C. Guillén, F. De Smet, W. Postvoll, and M. González-Herráez, “Toward prevention of pipeline integrity threats using a smart fiber-optic surveillance system,” J. Lightwave Technol. 34, 4445–4453 (2016). [CrossRef]  

44. Y. Hu, Z. Meng, M. Zabihi, Y. Shan, S. Fu, F. Wang, X. Zhang, Y. Zhang, and B. Zeng, “Performance enhancement methods for the distributed acoustic sensors based on frequency division multiplexing,” Electronics 8, 617 (2019). [CrossRef]  

45. A. Lv and J. Li, “On-line monitoring system of 35 kV 3-core submarine power cable based on φ-OTDR,” Sens. Actuators A Phys. 273, 134–139 (2018). [CrossRef]  

46. M. L. Filograno, C. Riziotis, and M. Kandyla, “A low-cost phase-OTDR system for structural health monitoring: design and instrumentation,” Instruments 3, 46 (2019). [CrossRef]  

47. Z. Li, J. Zhang, M. Wang, Y. Zhong, and F. Peng, “Fiber distributed acoustic sensing using convolutional long short-term memory network: a field test on high-speed railway intrusion detection,” Opt. Express 28, 2925–2938 (2020). [CrossRef]  

48. J. Jason, K. Yüksel, and M. Wuilpart, “Laboratory evaluation of a phase-OTDR setup for railway monitoring applications,” in IEEE Photonics Society, 22nd Annual Symposium (2017), pp. 2–6.

49. M. He, L. Feng, and J. Fan, “A method for real-time monitoring of running trains using Φ-OTDR and the improved Canny,” Optik (Stuttgart) 184, 356–363 (2019). [CrossRef]  

50. L. Shiloh, A. Eyal, and R. Giryes, “Deep learning approach for processing fiber-optic DAS seismic data,” in Optics InfoBase Conference Papers (2018), Part F124.

51. A. V. Timofeev and D. I. Groznov, “Classification of seismoacoustic emission sources in fiber optic systems for monitoring extended objects,” Optoelectron. Instrum. Data Process. 56, 50–60 (2020). [CrossRef]  

52. I. Ölçer and A. Öncü, “Adaptive temporal matched filtering for noise suppression in fiber optic distributed acoustic sensing,” Sensors (Switzerland) 17, 1288 (2017). [CrossRef]  

53. S. Grosswig, H. Dijk, M. Den Hartogh, T. Pfeiffer, M. Rembe, M. Perk, and L. Domurath, “Leakage detection in a casing string of a brine production well by means of simultaneous fibre optic DTS/DAS measurements,” Oil Gas Eur. Mag. 45(4), 161–169 (2019). [CrossRef]  

54. S. A. Abufana, Y. Dalveren, A. Aghnaiya, and A. Kara, “Variational mode decomposition-based threat classification for fiber optic distributed acoustic sensing,” IEEE Access 8, 100152–100158 (2020). [CrossRef]  

55. C. Xu, J. Guan, M. Bao, J. Lu, and W. Ye, “Pattern recognition based on time-frequency analysis and convolutional neural networks for vibrational events in φ-OTDR,” Opt. Eng. 57, 016103 (2018). [CrossRef]  

56. T. Wen, P. Zhu, W. Ye, M. Bao, and J. Guan, “Application of graphics processing unit parallel computing in pattern recognition for vibration events based on a phase-sensitive optical time domain reflectometer,” Appl. Opt. 58, 7127–7133 (2019). [CrossRef]  

57. A. K. Fedorov, M. N. Anufriev, A. A. Zhirnov, K. V. Stepanov, E. T. Nesterov, D. E. Namiot, V. E. Karasik, and A. B. Pnev, “Note: Gaussian mixture model for event recognition in optical time-domain reflectometry based sensing systems,” Rev. Sci. Instrum. 87, 036107 (2016). [CrossRef]  

58. H. Wu, X. Li, Z. Peng, and Y. Rao, “A novel intrusion signal processing method for phase-sensitive optical time-domain reflectometry (Φ-OTDR),” Proc. SPIE 9157, 915750 (2014). [CrossRef]  

59. M. Adeel, C. Shang, K. Zhu, and C. Lu, “Nuisance alarm reduction: using a correlation based algorithm above differential signals in direct detected phase-OTDR systems,” Opt. Express 27, 7685 (2019). [CrossRef]  

60. L. Y. Shao, S. Liu, S. Bandyopadhyay, F. Yu, W. Xu, C. Wang, H. Li, M. I. Vai, L. Du, and J. Zhang, “Data-driven distributed optical vibration sensors: a review,” IEEE Sens. J. 20, 6224–6239 (2020). [CrossRef]  

61. J. Wu, Z. Peng, M. Wang, R. Cao, M. J. Li, H. Wen, H. Liu, and K. P. Chen, “Fabrication of ultra-weak fiber Bragg grating (UWFBG) in single-mode fibers through Ti-doped silica outer cladding for distributed acoustic sensing,” in Optical Sensors and Sensing Congress (ES, FTS, HISE, Sensors), OSA Technical Digest (Optica Publishing Group, 2019), paper ETh1A.4.

62. O. V. Butov, Y. K. Chamorovskii, K. M. Golant, A. A. Fotiadi, J. Jason, S. M. Popov, and M. Wuilpart, “Sensitivity of high Rayleigh scattering fiber in acoustic/vibration sensing using phase-OTDR,” Proc. SPIE 10680, 106801B (2018). [CrossRef]  

63. M. Bublin, “Event detection for distributed acoustic sensing: combining knowledge-based, classical machine learning, and deep learning approaches,” Sensors 21, 7527 (2021). [CrossRef]  

64. L. Shiloh, A. Eyal, S. Member, and R. Giryes, “Efficient processing of distributed acoustic sensing data using a deep learning approach,” J. Lightwave Technol. 37, 4755–4762 (2019). [CrossRef]  

65. L. Wu, “Study on the fiber-optic perimeter sensor signal processor based on neural network classifier,” in Proceedings—IEEE 2011 10th International Conference on Electronic Measurement & Instruments (ICEMI) (2011), Vol. 1, pp. 93–97.

66. J. George, L. Mary, and K. S. Riyas, “Vehicle detection and classification from acoustic signal using ANN and KNN,” in International Conference on Control Communication & Computing (ICCC) (2013), pp. 436–439.

67. M. Aktas, T. Akgun, M. U. Demircin, and D. Buyukaydin, “Deep learning based multi-threat classification for phase-OTDR fiber optic distributed acoustic sensing applications,” Proc. SPIE 10208, 102080G (2017). [CrossRef]  

68. A. V. Makarenko, “Deep learning algorithms for signal recognition in long perimeter monitoring distributed fiber optic sensors,” IEEE International Workshop on Machine Learning for Signal Processing (MLSP), November 2016, pp. 1–11.

69. S. Wang, F. Liu, and B. Liu, “Research on application of deep convolutional network in high-speed railway track inspection based on distributed fiber acoustic sensing,” Opt. Commun. 492, 126981 (2021). [CrossRef]  

70. M. Tian, H. Dong, and K. Yu, “Attention based Temporal convolutional network for φ-OTDR event classification,” in 19th International Conference on Optical Communications and Networks (ICOCN) (2021), pp. 1–3.

71. C. Cao, X. Fan, Q. Liu, and Z. He, “Practical pattern recognition system for distributed optical fiber intrusion monitoring based on Φ-COTDR,” ZTE Commun. 27, 2282–2283 (2017).

72. X. X. Qi, J. W. Ji, X. W. Han, and Z. H. Yuan, “An approach of passive vehicle type recognition by acoustic signal based on SVM,” in 3rd International Conference on Genetic and Evolutionary Computing (WGEC) (2009), pp. 545–548.

73. C. Wiesmeyr, M. Litzenberger, M. Waser, A. Papp, H. Garn, G. Neunteufel, and H. Döller, “Real-time train tracking from distributed acoustic sensing data,” Appl. Sci. 10, 448 (2020). [CrossRef]  

74. C. Xu, J. Guan, M. Bao, J. Lu, and W. Ye, “Pattern recognition based on enhanced multifeature parameters for vibration events in φ-OTDR distributed optical fiber sensing system,” Microw. Opt. Technol. Lett. 59, 3134–3141 (2017). [CrossRef]  

75. Q. Sun, H. Feng, X. Yan, and Z. Zeng, “Recognition of a phase-sensitivity OTDR sensing system based on morphologic feature extraction,” Sensors 15, 15179–15197 (2015). [CrossRef]  

76. Y. Wang, P. Wang, K. Ding, H. Li, J. Zhang, X. Liu, Q. Bai, D. Wang, and B. Jin, “Pattern recognition using relevant vector machine in optical fiber vibration sensing system,” IEEE Access 7, 5886–5895 (2019). [CrossRef]  

77. Z. Wang, S. Lou, S. Liang, and X. Sheng, “Multi-class disturbance events recognition based on EMD and XGBoost in φ-OTDR,” IEEE Access 8, 63551–63558 (2020). [CrossRef]  

78. X. Wang, Y. Liu, S. Liang, W. Zhang, and S. Lou, “Event identification based on random forest classifier for Φ-OTDR fiber-optic distributed disturbance sensor,” Infrared Phys. Technol. 97, 319–325 (2019). [CrossRef]  

79. J. Wang, Y. Hu, and Y. Shao, “The digging signal identification by the random forest algorithm in the phase-OTDR technology,” IOP Conf. Ser. Mater. Sci. Eng. 394, 032005 (2018). [CrossRef]  

80. H. Jia, S. Lou, S. Liang, and X. Sheng, “Event identification by F-ELM model for φ-OTDR fiber-optic distributed disturbance sensor,” IEEE Sens. J. 20, 1297–1305 (2020). [CrossRef]  

81. H. Wu, S. Yang, X. Liu, C. Xu, H. Lu, C. Wang, K. Qin, Z. Wang, Y. J. Rao, and A. O. Olaribigbe, “Simultaneous extraction of multi-scale structural features and the sequential information with an end-to-end mCNN-HMM combined model for DAS,” J. Lightwave Technol. 39, 6606–6616 (2021). [CrossRef]  

82. X. Chen and C. Xu, “Disturbance pattern recognition based on an ALSTM in a long-distance φ-OTDR sensing system,” Microw. Opt. Technol. Lett. 62, 168–175 (2020). [CrossRef]  

83. F. Uyar, T. Onat, C. Unal, T. Kartaloglu, E. Ozbay, and I. Ozdur, “A direct detection fiber optic distributed acoustic sensor with a mean SNR of 7.3 dB at 102.7 km,” IEEE Photon. J. 11, 1–8 (2019). [CrossRef]  

84. A. Ahmed and Z. Yu, “Research status of distributed optical fiber sensing system based on phase-sensitive optical time domain reflectometry,” Int. J. Sci. Res. 9, 834–840 (2019). [CrossRef]  

85. Y. Muanenda, “Recent advances in distributed acoustic sensing based on phase-sensitive optical time domain reflectometry,” J. Sens. 2018, 3897873 (2018). [CrossRef]  

86. Z. Pan, K. Liang, Q. Ye, H. Cai, R. Qu, and Z. Fang, “Phase-sensitive OTDR system based on digital coherent detection,” in Asia Communications and Photonics Conference and Exhibition (ACP) (2011), pp. 1–6.

87. F. Uyar, T. Onat, C. Unal, T. Kartaloglu, I. Ozdur, and E. Ozbay, “94.8 km-range direct detection fiber optic distributed acoustic sensor,” in Conference on Lasers Electro-Optics (CLEO) (2019), pp. 2–3.

88. H. He, L. Yan, H. Qian, Y. Zhou, X. Zhang, B. Luo, W. Pan, X. Fan, and Z. He, “Suppression of the interference fading in phase-sensitive OTDR with phase-shift transform,” J. Lightwave Technol. 8724, 295–302 (2020). [CrossRef]  

89. X. Liang, Z. Ge, L. Sun, M. He, and H. Chen, “LSTM with wavelet transform based data preprocessing for stock price prediction,” Math. Probl. Eng. 2019, 1340174 (2019). [CrossRef]  

90. Z. Qin, L. Chen, and X. Bao, “Continuous wavelet transform for non-stationary vibration detection with phase-OTDR,” Opt. Express 20, 20459–20465 (2012). [CrossRef]  

91. K. Liu, P. Ma, J. An, Z. Li, J. Jiang, P. Li, L. Zhang, and T. Liu, “Endpoint detection of distributed fiber sensing systems based on STFT algorithm,” Opt. Laser Technol. 114, 122–126 (2019). [CrossRef]  

92. S. Liang, X. Sheng, and S. Lou, “Experimental investigation on lower nuisance alarm rate phase-sensitive OTDR using the combination of a Mach–Zehnder interferometer,” Infrared Phys. Technol. 75, 117–123 (2016). [CrossRef]  

93. Z. Rehman, M. T. Mirza, A. Khan, and H. Xhaard, “Predicting G-protein-coupled receptors families using different physiochemical properties and pseudo amino acid composition,” in G Protein Coupled Receptors, P. Conn, ed. (Academic, 2013), Vol. 522, Chap. 4, pp. 61–79.

94. S. K. Satapathy, S. Dehuri, A. K. Jagadev, and S. Mishra, “Introduction,” in EEG Brain Classification for Epileptic Seizure Detection, S. K. Satapathy, S. Dehuri, A. K. Jagadev, and S. Mishra, eds. (Academic, 2019), Chap. 1, pp. 1–25.

95. B. Mohebali, A. Tahmassebi, A. Meyer-Baese, and A. H. Gandomi, “Probabilistic neural networks: a brief overview of theory, implementation, and application,” in Handbook of Probabilistic Models, P. Samui, D. Tien Bui, S. Chakraborty, and R. C. Deo, eds. (Butterworth-Heinemann, 2020), Chap. 14, pp. 347–367.

96. Y. Tang, “Deep learning using linear support vector machines,” arXiv:1306.0239 (2013).

97. H. Jia, S. Liang, S. Lou, and X. Sheng, “A k-nearest neighbor algorithm-based near category support vector machine method for event identification of φ-OTDR,” IEEE Sens. J. 19, 3683–3689 (2019). [CrossRef]

98. J. Hu and P. W. Tse, “A relevance vector machine-based approach with application to oil sand pump prognostics,” Sensors 13, 12663–12686 (2013). [CrossRef]

99. D. Shrivastava, S. Sanyal, A. K. Maji, and D. Kandar, “Bone cancer detection using machine learning techniques,” in Smart Healthcare for Disease Diagnosis and Prevention, S. Paul and D. Bhatia, eds. (Academic, 2020), Chap. 17, pp. 175–183.

100. B. Williams, C. Halloin, W. Löbel, F. Finklea, E. Lipke, R. Zweigerdt, and S. Cremaschi, “Data-driven model development for cardiomyocyte production experimental failure prediction,” in 30th European Symposium on Computer Aided Process Engineering, S. Pierucci, F. Manenti, G. L. Bozzano, and D. Manca, eds. (Elsevier, 2020), Vol. 48, pp. 1639–1644.

101. S. Misra and H. Li, “Noninvasive fracture characterization based on the classification of sonic wave travel times,” in Machine Learning for Subsurface Characterization, S. Misra, H. Li, and J. He, eds. (Gulf Professional Publishing, 2020), Chap. 9, pp. 243–287.

102. Z.-L. Sun, H. Wang, W.-S. Lau, G. Seet, and D. Wang, “Application of BW-ELM model on traffic sign recognition,” Neurocomputing 128, 153–159 (2014). [CrossRef]  

103. F. K. Inaba, E. O. Teatini Salles, S. Perron, and G. Caporossi, “DGR-ELM–distributed generalized regularized ELM for classification,” Neurocomputing 275, 1522–1530 (2018). [CrossRef]  

104. C. Cao, X. Fan, Q. Liu, and Z. He, “Practical pattern recognition system for distributed optical fiber intrusion monitoring system based on phase-sensitive coherent OTDR,” in Asia Communication Photonics Conference (ACPC) (2015), pp. 2–4.

105. M. Zhang, Y. Li, J. Chen, Y. Song, J. Zhang, and M. Wang, “Event detection method comparison for distributed acoustic sensors using φ-OTDR,” Opt. Fiber Technol. 52, 101980 (2019). [CrossRef]  

106. P. Kannadaguli and V. Bhat, “A comparison of Gaussian mixture modeling (GMM) and hidden Markov modeling (HMM) based approaches for automatic phoneme recognition in Kannada,” in International Conference on Signal Processing and Communication (ICSC) (2015), pp. 257–260.

107. B.-J. Yoon, “Hidden Markov models and their applications in biological sequence analysis,” Curr. Genomics 10, 402–415 (2009). [CrossRef]  

108. Y. C. Manie, J. W. Li, P. C. Peng, R. K. Shiu, Y. Y. Chen, and Y. T. Hsu, “Using a machine learning algorithm integrated with data de-noising techniques to optimize the multipoint sensor network,” Sensors 20, 1070 (2020). [CrossRef]

109. Z. Jiang, Y. Lai, J. Zhang, H. Zhao, and Z. Mao, “Multi-factor operating condition recognition using 1D convolutional long short-term network,” Sensors 19, 5488 (2019). [CrossRef]

110. Ö. Yildirim, “A novel wavelet sequences based on deep bidirectional LSTM network model for ECG signal classification,” Comput. Biol. Med. 96, 189–202 (2018). [CrossRef]  

111. B. Rosenhahn and B. Andres, Pattern Recognition, D. Hutchison and T. Kanade, eds. (Springer Nature, 2016).

112. T. N. Sainath, A. R. Mohamed, B. Kingsbury, and B. Ramabhadran, “Deep convolutional neural networks for LVCSR,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2013), pp. 8614–8618.

113. J. Yoon, D. Jarrett, and M. van der Schaar, “Time-series generative adversarial networks,” in Advances in Neural Information Processing Systems (2019), Vol. 32, pp. 1–11.

114. I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems (2014), Vol. 3, pp. 2672–2680.

Data availability

No data were generated or analyzed in the presented research.

Figures (14)

Fig. 1. $\Phi$-OTDR architecture for direct detection scheme ([49], Fig. 2). (NLL, narrow linewidth laser; FG, functional generator; AOM, acoustic optical modulator; EDFA, erbium-doped fiber amplifier; PC, personal computer; DAQ, data acquisition card; PD, photoelectric detector.)
Fig. 2. $\phi$-OTDR architecture for coherent detection scheme ([88], Fig. 5). (NLL, narrow linewidth laser; FG, functional generator; AOM, acoustic optical modulator; EDFA, erbium-doped fiber amplifier; PC, personal computer; DAQ, data acquisition card; BPD, balanced photoelectric detector.)
Fig. 3. Some of the current machine learning methods for event classification in $\phi$-OTDR. (GAN, generative adversarial network; PCA, principal component analysis; ANN, artificial neural network; KNN, $k$-nearest neighbors; SVM, support vector machine; RF, random forest; GMM, Gaussian mixture model; XGBoost, extreme gradient boosting; ELM, extreme learning machine; DT, decision tree; HMM, hidden Markov model; CNN, convolutional neural network; LSTM, long short-term memory.)
Fig. 4. General ANN architecture with one input layer, two hidden layers (${h_1}$ and ${h_2}$), and one output layer, where ${O_1}$ and ${O_2}$ are the output neurons ([40], Fig. 3).
Fig. 5. General SVM architecture ([71], Fig. 3).
Fig. 6. Training phase for RVM with three classifiers ([75], Fig. 14).
Fig. 7. Recognition phase for RVM with three classifiers ([75], Fig. 15).
Fig. 8. Flow chart for the KNN algorithm ([97], Fig. 8).
Fig. 9. RF classifier for two classes ([78], Fig. 3).
Fig. 10. Architecture of an ELM model for multiclass recognition ([80], Fig. 2).
Fig. 11. Basic structure of the BLSTM network ([110], Fig. 3). (IL, input layer; FFL, forward feeding layer; BPL, backward propagation layer; AFL, activation function layer; OL, output layer.)
Fig. 12. Basic CNN architecture. (Conv layer, convolution layer; FC, fully connected.)
Fig. 13. CLDNN architecture ([33], Fig. 6).
Fig. 14. Sparse stacked auto-encoder model ([24], figure in abstract).

Tables (5)

Table 1. Comparative Analysis of the Events Classification Methods in φ-OTDR
Table 2. General Structure of a Confusion Matrix
Table 3. Analysis of Several Classification Algorithms Compared to F-ELM
Table 4. Performance Comparison of Six Classification Methods Discussed
Table 5. Comparison of Experimental Results of Five Events for the Aforementioned Methods

Equations (7)

$${\rm NAR} = 1 - {\rm Recall}.$$
$${\rm NAR} = \frac{FN}{TP + FN}.$$
$${\rm Precision} = \frac{TP}{TP + FP},$$
$${\rm Recall} = \frac{TP}{TP + FN},$$
$$F\text{-}{\rm measure} = \frac{2 \times ({\rm Precision} \times {\rm Recall})}{{\rm Precision} + {\rm Recall}},$$
$${\rm Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
$$f(d) = \sum_{0 < i < j \le m} \frac{(\mu_{id} - \mu_{jd})^2}{\sigma_{id}^2 + \sigma_{jd}^2},$$
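
For concreteness, the short Python sketch below computes the confusion-matrix metrics defined above for a binary intrusion/no-intrusion case, treating intrusion events as the positive class. It is a minimal illustration only; the function name and the example counts are hypothetical and are not taken from any of the reviewed works.

```python
# Minimal sketch: evaluation metrics from binary confusion-matrix counts,
# with intrusion events as the positive class. Hypothetical example, not
# code from any reviewed φ-OTDR system.
def evaluation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0   # detection rate
    nar = 1.0 - recall                              # NAR = FN / (TP + FN)
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "NAR": nar,
            "F-measure": f_measure, "accuracy": accuracy}

# Hypothetical counts: 90 detected intrusions, 5 false alarms,
# 10 missed intrusions, 895 correctly rejected background frames.
print(evaluation_metrics(tp=90, fp=5, fn=10, tn=895))
```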