Optical signal detection in turbid water using multidimensional integral imaging with deep learning

Open Access

Abstract

Optical signal detection in turbid and occluded environments is a challenging task due to light scattering and beam attenuation inside the medium. Three-dimensional (3D) integral imaging is an imaging approach that integrates two-dimensional (2D) images from multiple perspectives and has proved useful under challenging conditions such as occlusion and turbidity. In this manuscript, we present an approach for the detection of optical signals in turbid water and occluded environments using multidimensional integral imaging employing temporal encoding with deep learning. In our experiments, an optical signal is temporally encoded with a gold code and transmitted through turbid water via a light-emitting diode (LED). A camera array captures videos of the optical signals from multiple perspectives and performs 3D reconstruction of the temporal signal. A convolutional neural network-based bidirectional long short-term memory (CNN-BiLSTM) network is trained with clear water video sequences to perform classification on the binary transmitted signal. The testing data were collected in turbid water scenes with partial signal occlusion, and a sliding window with CNN-BiLSTM-based classification was applied to the reconstructed 3D video data to detect the encoded binary data sequence. The proposed approach is compared with previously reported correlation-based detection models. Furthermore, we compare 3D integral imaging with conventional 2D imaging for signal detection using the proposed deep learning strategy. The experimental results show that the multidimensional integral imaging-based methodology significantly outperforms the previously reported approaches and conventional 2D sensing-based methods. To the best of our knowledge, this is the first report on underwater signal detection using multidimensional integral imaging with deep neural networks.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Underwater signal detection in turbid water and occluded conditions is of great importance for many applications such as target detection, information transmission, and marine exploration [1–5]. Optical sensing using conventional image sensors in these environments is difficult, as the captured images may suffer from reduced visibility and a low signal-to-noise ratio (SNR). Various approaches have been proposed for signal detection in a turbid medium, such as correlation filter-based detectors [2,3], polarimetric-based approaches [3], and single-pixel detectors [6]. Deep learning-based approaches have recently gained traction for classification and recognition tasks due to their generalized learning capabilities and higher accuracy [7–9]. However, optical signal detection under degraded conditions remains a challenge, especially in cases where signals are partially occluded or embedded in a scattering medium.

In the previously reported correlation-based approach using multidimensional integral imaging for the detection of temporally encoded optical signals in a turbid medium [1–3], 3D integral imaging was shown to outperform conventional 2D imaging in terms of signal detection capability. However, correlation-based filters have limitations, such as reduced generalization capability compared with CNNs for classification and recognition tasks.

In recent years, different types of deep neural networks, namely convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory (LSTM) networks, have been proposed to detect signals in atmospheric turbulence and scattering media [10–13], and they can achieve better detection accuracy than conventional methods. In this paper, temporally encoded optical signals are transmitted through a turbid and occluded underwater environment. The signal is captured using a CMOS image sensor array, and the recorded 2D elemental video frames are reconstructed using 3D integral imaging to improve the visibility of the optical signal in degraded conditions [14]. A CNN-BiLSTM-based detector is used to detect the source signal in the reconstructed 3D video sequence. The proposed deep neural network is trained with clear water video sequences to perform classification on the binary transmitted signal. To improve the performance and generalization capability of the neural network model, we use data augmentation to increase the size of the training dataset and to simulate noisy conditions. Since we use a 7-bit encoding scheme for the transmitted data sequence, a sliding window-based approach is used to extract each 7-frame video sequence from the recorded data. Each video sequence is fed to a pretrained CNN to extract the spatial feature vectors representing each frame. These features are then fed to a BiLSTM network, which learns the temporal pattern hidden inside the encoded data. We demonstrate through our experiments that conventional signal detection algorithms may fail in turbid and occluded underwater environments, whereas the proposed deep learning approach enhances signal detection performance under such conditions. The performance of the proposed CNN-BiLSTM-based detector is measured using performance metrics such as receiver operating characteristic (ROC) curves, area under the curve (AUC), and the number of bit errors. The signal detection results are compared for various turbidity conditions to demonstrate the improved performance of the proposed approach.

The rest of the paper is organized as follows: integral imaging and CNN-BiLSTM-based detection are described in Section 2, followed by the details of the optical imaging system and the underwater data collection procedure in Section 3. The experimental results and discussion are given in Section 4, and finally, the conclusions are presented in Section 5.

2. Methodology

2.1 3D integral imaging

Integral imaging (InIm) is a three-dimensional (3D) imaging technique, first proposed by Lippmann [15], which consists of a camera pickup stage and a computational reconstruction process. In the pickup or capture process, multiple 2D elemental images are captured, and both the intensity and directional information of the optical rays are recorded [16–19]. These elemental images can be recorded by a lenslet array, an array of cameras, or a moving camera, as shown in Fig. 1(a). In the reconstruction process, a pinhole model is assumed, and the rays are back-propagated through their corresponding virtual pinholes to a particular depth plane to provide depth information about the 3D scene [20–22]. Mathematically, the reconstruction process can be described as follows:

$$R(x,y;z;t) = \frac{1}{O(x,y;t)}\sum_{m = 0}^{K - 1}\sum_{n = 0}^{L - 1} EI_{m,n}\!\left(x - m\frac{N_x P_x f}{C_x z},\; y - n\frac{N_y P_y f}{C_y z};\, t\right) \tag{1}$$
where R(x, y; z; t) is the integral imaging reconstructed image at depth z and time frame t, and x and y are the pixel indices of each elemental image. O(x, y; t) is the number of overlapping pixels at time frame t. z is the reconstruction distance, given by z = zair + zw/nw, where zair is the distance in air, zw is the distance in water, and nw is the refractive index of water. Nx and Ny are the total numbers of pixels of each elemental image in the x and y directions, K and L are the numbers of elemental images obtained in the x and y directions, and EIm,n(·) is the elemental image in the mth column and nth row. By shifting and overlapping the elemental images, the reconstructed image is obtained at a specific depth plane. Px and Py are the pitches between adjacent image sensors on the camera array, and f is the focal length of the camera lens. Cx and Cy are the width and height of the image sensor, respectively. The camera pickup process and the computational reconstruction process are depicted in Figs. 1(a) and 1(b), respectively.
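The shift-and-sum reconstruction of Eq. (1) can be summarized by the following minimal numpy sketch (an illustrative implementation, not the authors' code). The array `elemental`, the pixel counts, and the geometry parameters Px, Py, Cx, Cy, f, and z follow the definitions above; rounding the pixel shifts to integers is an assumption made for simplicity.

```python
import numpy as np

def reconstruct(elemental, z, f, Px, Py, Cx, Cy):
    """Shift-and-sum reconstruction of Eq. (1) at depth z for one time frame.
    elemental[m][n]: grayscale elemental image (Ny x Nx) in column m, row n."""
    K, L = len(elemental), len(elemental[0])
    Ny, Nx = elemental[0][0].shape
    recon = np.zeros((Ny, Nx))
    overlap = np.zeros((Ny, Nx))                 # O(x, y; t): overlapping-pixel count
    for m in range(K):
        for n in range(L):
            # Pixel shifts m*Nx*Px*f/(Cx*z) and n*Ny*Py*f/(Cy*z) from Eq. (1),
            # rounded to the nearest integer pixel.
            sx = int(round(m * Nx * Px * f / (Cx * z)))
            sy = int(round(n * Ny * Py * f / (Cy * z)))
            recon[sy:, sx:] += elemental[m][n][:Ny - sy, :Nx - sx]
            overlap[sy:, sx:] += 1.0
    return recon / np.maximum(overlap, 1.0)      # normalize by the overlap count
```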

Fig. 1. 3D integral imaging process: (a) Pickup stage of the integral imaging system and (b) computational volumetric reconstruction process for integral imaging.

2.2 CNN-BiLSTM based optical signal detection

Figure 2 shows the detailed framework of the CNN-BiLSTM network used for optical signal detection. First, a 3D reconstructed video is obtained using the integral imaging-based computational reconstruction algorithm. During the training phase, reconstructed 7-frame video sequences are used to train an end-to-end CNN-BiLSTM network. A GoogLeNet [23] pretrained on the ImageNet dataset [24] is used to extract the spatial features of each frame of the captured video sequence, followed by a bidirectional recurrent neural network that extracts the temporal information content of the transmitted video sequence for each class [9]. The feature vectors are taken from the last pooling layer (“pool5-7 × 7_s1”) of the GoogLeNet and used as the feature vector representing the spatial information content of each video frame. The bidirectional long short-term memory (BiLSTM) network is a variant of the long short-term memory (LSTM) network in which two separate LSTM networks extract the temporal information along the forward and backward directions [25–28]. The details of the LSTM cell structure are described in Appendix A. Assuming an M-bit encoding scheme is utilized, the information encoded in an M-frame sequence can be used to produce a k-dimensional feature vector ${{\textbf x}_i}$, i=1,2,3,…,M for each frame ${I_n}$, n=1,2,3,…,M. Thus, for each M-bit frame sequence, we obtain the feature matrix ${\textbf X}$ by concatenating the feature vectors ${{\textbf x}_i}$. The feature matrix ${\textbf X}$ for each video is then fed into a BiLSTM network, and its output is fed to a fully connected layer followed by a softmax layer for classification. During the testing phase, a sliding window-based approach is used to extract 7-frame video sequences from the recorded data sequence, which are fed to the CNN-BiLSTM model. We use a BiLSTM layer with 100 hidden units to learn the temporal dependency of the feature vectors. The BiLSTM classification network was trained for 100 epochs with a batch size of 10 at each iteration. The network weights and biases were optimized using the Adam optimizer with a learning rate of 0.0001 for the first 50 epochs and 0.00001 for the final 50 epochs. Adam is a first-order gradient-based optimization algorithm for stochastic objective functions based on adaptive estimates of lower-order moments; it is suitable for non-stationary objectives and noisy/sparse gradients and has proved robust and well suited for a wide class of non-convex machine learning problems [29]. We use a dropout layer with a dropout rate of 0.3. All the above parameters were chosen by tuning the hyperparameters of the network using 10% of the training dataset as validation data.
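For concreteness, the architecture described above can be sketched in PyTorch as follows. This is an illustrative approximation, not the authors' implementation: torchvision's GoogLeNet with its classifier replaced by an identity stands in for the pretrained CNN, so the 1024-dimensional output of its global average pooling layer plays the role of the “pool5-7 × 7_s1” features.

```python
import torch
import torch.nn as nn
from torchvision import models

class CnnBiLstmDetector(nn.Module):
    def __init__(self, num_classes=3, hidden=100, feat_dim=1024):
        super().__init__()
        backbone = models.googlenet(weights="IMAGENET1K_V1")
        backbone.fc = nn.Identity()                     # keep the 1024-D pooled features
        self.cnn = backbone
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(0.3)                  # dropout rate quoted in the text
        self.fc = nn.Linear(2 * hidden, num_classes)    # forward + backward hidden states

    def forward(self, clips):                           # clips: (B, T=7, 3, 224, 224)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))           # per-frame spatial features (B*T, 1024)
        feats = feats.view(b, t, -1)                    # feature matrix X: (B, T, 1024)
        out, _ = self.bilstm(feats)                     # temporal features (B, T, 2*hidden)
        return self.fc(self.dropout(out[:, -1]))        # scores for classes '0', '1', 'idle'
```

Taking the final time step of the BiLSTM output before the fully connected and softmax layers is one common design choice; the paper does not specify this detail, so it is an assumption here.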

Fig. 2. Block diagram for Integral imaging-based optical signal detection using proposed deep learning architecture: InIm: Integral imaging, CNN: convolutional neural network, LSTM: Long Short-Term Memory Network. Red dotted block represents the bidirectional LSTM architecture and the arrows inside the block indicate the connection to forward and backward LSTMs.

3. Experimental methods

This section describes the experimental setup, underwater data collection, and method for the proposed multidimensional integral imaging-based system in a turbid and occluded environment. The experimental setup is shown in Fig. 3. The light source is a light-emitting diode (LED) operating at 630 nm, whose light passes through a turbid water tank with dimensions of 500(W) × 250(L) × 250(H) mm. The turbidity of the water is controlled by adding antacid, and experiments were carried out at different turbidity levels to assess the performance of the proposed system. The optical signals were coded using a gold code sequence [30]. The signal is transmitted through the turbid underwater medium with occlusion at a speed of 20 bits per second. When the transmitted data bit is 1, the 7-bit gold code is transmitted, and when the transmitted data bit is 0, the 7-bit flipped gold code is transmitted. Thus, the temporal structural difference between the videos representing class “1” and class “0” can be exploited by the LSTM network to create a decision boundary between the two classes. For optical signal transmission, we choose an 8-bit data sequence [1, 0, 0, 1, 1, 0, 1, 0], wherein each bit of the original data is coded with the 7-bit gold code ([1,1,0,0,1,0,1]), yielding a 56-bit encoded signal. An occluded underwater environment is created by placing an artificial plant in front of the light source inside the water tank. To quantify the turbidity level, we use Beer's coefficient, derived from the Beer-Lambert law, which states that I = I0exp(−αd), where I0 is the initial intensity, I is the intensity after propagating a distance d in the turbid medium, and α is Beer's coefficient. To measure α, a sample of water was taken at each turbidity condition, and the intensities I0 and I were measured at two locations separated by d = 10 mm using an optical power meter and detector (Newport 818-SL/DB Silicon) at a wavelength of 630 nm with a 10 mm aperture. In this experiment, α ranges from 0.0047 to 0.0318 mm−1. The transmitted optical signals were captured using a 3 × 3 camera array consisting of G-192 GigE cameras and C-mount zoom lenses with a focal length of 20 mm. The pitch between cameras is 80 mm in both the horizontal and vertical directions. The spatial resolution of each camera sensor is 1600(H) × 1200(V) pixels with a pixel size of 4.5 µm × 4.5 µm, and the camera array is synchronized to record video data at a frame rate of 20 fps. In our experiments, the signal transmission and the receiving cameras are synchronized such that each recorded frame corresponds to the transmission of exactly 1 bit of the transmitted signal. This synchronization was done in order to transmit signals at the maximum speed allowed by the recording camera's frame rate. Recent developments in camera technology allow much faster frame rates, up to one million frames per second.
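The encoding scheme and the turbidity measurement described above amount to the following short sketch (illustrative only; the gold code, the data sequence, and the 10 mm path length are taken from the text, and “flipped” is assumed here to mean bit-inverted).

```python
import numpy as np

GOLD = np.array([1, 1, 0, 0, 1, 0, 1])        # 7-bit gold code used in the experiment
DATA = np.array([1, 0, 0, 1, 1, 0, 1, 0])     # 8-bit data sequence to transmit

def encode(data, code=GOLD):
    """Data bit '1' -> gold code; data bit '0' -> flipped (assumed bit-inverted) code."""
    return np.concatenate([code if b == 1 else 1 - code for b in data])   # 56 bits

def beer_coefficient(i0, i, d_mm=10.0):
    """Beer's coefficient alpha (mm^-1) from I = I0*exp(-alpha*d)."""
    return np.log(i0 / i) / d_mm

encoded = encode(DATA)            # 56-bit sequence driving the LED at 20 bits per second
```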

Fig. 3. Water tank with turbid water and underwater plant to mimic underwater occlusion, a 3 × 3 camera array for 3D integral imaging pickup. Optical signal sent by the red LED light source. The external white light LED is used to mimic ambient light for shallow water scenes.

The sliding window approach (depicted in Fig. 4) is used to slice each possible 7-frame video sequence from the transmitted data sequence. Thus, a series of 7-frame video sequences, Ui (i, i+1,…, i+6), is obtained from the transmitted signal using the sliding window approach, where i = 1, 2, 3,…, N−6 and N is the total number of frames in the encoded transmitted video sequence. There are 36 unique possible 7-frame video sequences, and these can be categorized into three cases: case 1), where the 7-frame video sequence exactly overlaps with the gold code sequence (class ‘1’); case 2), where the 7-frame video sequence exactly overlaps with the flipped gold code sequence (class ‘0’); and case 3), the remaining 34 possibilities, wherein the sliced 7-frame video sequence matches neither the gold code nor the flipped gold code sequence (‘idle’ class). Thus, to train our proposed CNN-BiLSTM optical signal detection network, we created three class labels: ‘1’, ‘0’, and ‘idle’. Our primary interest is to detect the classes ‘1’ and ‘0’; however, in our experiments, including the ‘idle’ class in the training dataset results in better discrimination between the two classes of interest. For training the network, the transmitted signals are recorded in clear water. We recorded data with two different F-numbers, F1.8 and F8. Thus, in the training dataset, before augmentation, we have two videos (one per F-number) each for class ‘1’ and class ‘0’, and 68 ‘idle’ class videos. To reduce network instability and bias towards the ‘idle’ class, the datasets of classes ‘0’ and ‘1’ are repeated 34 times to obtain 68 videos for each class.
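A minimal sketch of this slicing and labeling step is given below (illustrative; it assumes the 7 code bits underlying each window are known at training time, which holds for the clear-water training recordings).

```python
import numpy as np

GOLD = np.array([1, 1, 0, 0, 1, 0, 1])

def sliding_windows(frames, width=7):
    """Yield every 7-frame window U_i = (i, i+1, ..., i+6) of a recorded video."""
    for i in range(len(frames) - width + 1):
        yield i, frames[i:i + width]

def label_window(bits7, code=GOLD):
    """Assign the training label from the 7 code bits underlying a window."""
    if np.array_equal(bits7, code):
        return "1"                    # exact overlap with the gold code
    if np.array_equal(bits7, 1 - code):
        return "0"                    # exact overlap with the flipped gold code
    return "idle"                     # any of the remaining 34 possibilities
```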

Fig. 4. Proposed sliding window with CNN-BiLSTM-based classification approach.

In total, we have 204 training videos, with 68 video sequences corresponding to each class. To improve the performance and generalization capability of the CNN-BiLSTM network in degraded environments, we performed data augmentation on the training dataset. In a turbid water environment, the optical signal undergoes attenuation and blurring. Therefore, we adopted two data augmentation models: (1) attenuation plus additive Gaussian noise, and (2) blurring plus additive Gaussian noise. The attenuation plus additive Gaussian noise model is Idegraded = β × Ioriginal + n, where Idegraded is the degraded captured image (data), β is the attenuation factor, Ioriginal is the original image (data), and n is additive noise modeled as Gaussian with mean µ and variance σ2. Similarly, the blurring plus additive Gaussian noise model can be written as Idegraded = h ⊗ Ioriginal + n, where ⊗ represents the 2D convolution operator and h is a 2D Gaussian kernel, h(x, y) = 1/(2πs2) exp(−(x2 + y2)/(2s2)), with zero mean and standard deviation s. To generate the augmented dataset, the augmentation models are applied to each of the training video sequences. To estimate the parameters of the augmentation models, a dataset of signal ‘on’ and ‘off’ images was recorded at various turbidity levels ranging from α = 0.005 to 0.033 mm−1 without occlusion. The parameters µ and σ2 are estimated by fitting a normal distribution to the histograms of the ‘on’ and ‘off’ images using maximum likelihood estimation at the various turbidities. The mean and variance of the additive Gaussian noise distribution vary in the ranges 0.015 to 0.032 and 0.004 to 0.01, respectively. The parameter s is drawn from a uniform distribution over [1, 10]. The parameter β is drawn from a uniform distribution over [0, 1]. Since the test data contain clear water images (β = 1) and turbid water images (β < 1), it is reasonable to assume a range of β from 0 to 1.
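The two augmentation models can be sketched as follows (illustrative; it assumes frames are numpy arrays normalized to [0, 1], and scipy's gaussian_filter stands in for convolution with the kernel h).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng()

def attenuate_and_noise(frame, mu=(0.015, 0.032), var=(0.004, 0.01)):
    """Model (1): I_degraded = beta * I_original + n, with additive Gaussian n."""
    beta = rng.uniform(0.0, 1.0)                               # attenuation factor
    n = rng.normal(rng.uniform(*mu), np.sqrt(rng.uniform(*var)), frame.shape)
    return np.clip(beta * frame + n, 0.0, 1.0)

def blur_and_noise(frame, mu=(0.015, 0.032), var=(0.004, 0.01)):
    """Model (2): I_degraded = h (*) I_original + n, with a 2D Gaussian kernel h."""
    s = rng.uniform(1.0, 10.0)                                 # kernel standard deviation
    n = rng.normal(rng.uniform(*mu), np.sqrt(rng.uniform(*var)), frame.shape)
    return np.clip(gaussian_filter(frame, sigma=s) + n, 0.0, 1.0)
```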

The CNN-BiLSTM classifier outputs classification scores for the respective classes; the maximum of the classification scores for ‘0’ and ‘1’ is selected, and if the selected value corresponds to class ‘0’, it is multiplied by −1, otherwise (class ‘1’) it remains unchanged, as shown in Fig. 4. This score selection method is applied to each video Ui (i, i+1,…, i+6) to generate a transformed classification score sequence S(i). To test the performance of the proposed approach, we recorded data with occlusion and turbidity; a sample 2D elemental image, recorded using the central camera with an aperture of F1.8, is shown in Fig. 5(a). The integral imaging-based computational reconstruction algorithm, as given in Eq. (1), enhances the visibility of the optical signal in a turbid and occluded environment, as illustrated in Fig. 5(b) at α = 0.0047 mm−1.
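The score transformation can be written compactly as below (a sketch; it assumes the classifier's softmax outputs for each window are arranged in the order '0', '1', 'idle').

```python
import numpy as np

def transformed_scores(window_scores):
    """window_scores: array of shape (N-6, 3) with softmax scores for '0', '1', 'idle'.
    Returns the signed score sequence S(i)."""
    S = np.empty(len(window_scores))
    for i, (p0, p1, _idle) in enumerate(window_scores):
        S[i] = p1 if p1 >= p0 else -p0        # keep the '1' score, negate the '0' score
    return S
```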

Fig. 5. (a) Images of the underwater occluded signal viewed from the central camera perspective, taken at α = 0.0047 mm−1, and (b) reconstructed image using 3D integral imaging with occlusion at α = 0.0047 mm−1.

The transformed classification scores S(i) of the transmitted video sequence should have high and low peaks corresponding to the transmission of binary 1s and 0s, respectively. Given the 8-bit original data and the 7-bit coding scheme, the transformed classification score S(i) is expected to have 8 prominent local maxima or minima, each separated by 7 frames. By summing the prominences of these peaks separated by 7 frames, the correct start frame of the signal transmission is found as the frame that gives the maximum sum of prominences across the recorded signal [2]. The final classification of the transmitted binary data sequence is then done by thresholding the transformed classification score S(i). For our results, we set the threshold to 0. For each turbidity level, the 56-bit encoded signal transmission experiment is repeated eight times, corresponding to 64 bits of original data to be decoded, to compute the performance metrics. Thus, in total for testing, we have 2880 7-frame videos, with 576 videos for each turbidity level. The overall flow chart for optical signal transmission and signal detection is shown in Fig. 6.
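A decoding sketch along these lines is shown below (illustrative, not the authors' code; scipy's peak-finding and peak-prominence routines are assumed as one possible way to compute the prominences used for start-frame estimation).

```python
import numpy as np
from scipy.signal import find_peaks, peak_prominences

def decode(S, code_len=7, n_bits=8, threshold=0.0):
    """Estimate the start frame from S(i) and decode the original data bits."""
    prom = np.zeros_like(S)
    for sign in (1, -1):                              # prominences of maxima and minima
        peaks, _ = find_peaks(sign * S)
        prom[peaks] += peak_prominences(sign * S, peaks)[0]
    offsets = np.arange(len(S) - (n_bits - 1) * code_len)
    start = offsets[np.argmax([prom[o::code_len][:n_bits].sum() for o in offsets])]
    samples = S[start::code_len][:n_bits]             # one sample per transmitted data bit
    return (samples > threshold).astype(int)          # threshold at 0
```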

Fig. 6. Flow chart of the proposed system for (a) optical signal transmission and (b) deep learning-based detection in underwater communication. InIm denotes Integral Imaging.

4. Results and discussion

The performance of the proposed underwater signal detection system on the experimental data under varying turbidity conditions is evaluated using receiver operating characteristic (ROC) curves, the area under the curve (AUC), and the number of detection errors. The ROC analysis for underwater signal detection is shown in Fig. 7(a) for a lower turbidity (α = 0.0114 mm−1) and in Fig. 7(b) for a higher turbidity (α = 0.0318 mm−1). The 3D integral imaging system with the CNN-BiLSTM method (blue line) achieves an AUC of 1 at α = 0.0114 mm−1 and 0.8965 at α = 0.0318 mm−1, outperforming the other tested methods. The conventional 2D imaging system with CNN-BiLSTM (black line) gives AUC values of 0.5684 and 0.4805, respectively. The 3D integral imaging system with nonlinear correlation (green line) gives AUC values of 0.899 and 0.4561, respectively. Finally, conventional 2D imaging with nonlinear correlation (red line) [1,31] gives AUC values of 0.5117 and 0.4453, respectively, at turbidity levels of α = 0.0114 mm−1 and α = 0.0318 mm−1.

Fig. 7. ROC (receiver operating characteristic) curves for underwater signal detection at two turbidity levels (α = 0.0114 mm−1 and α = 0.0318 mm−1). Results are compared between the 3D integral imaging reconstructed video data with CNN-BiLSTM (blue line), a conventional 2D imaging system with CNN-BiLSTM (black line), a 3D integral imaging system with nonlinear correlation (green line), and conventional 2D imaging with nonlinear correlation (red line).

The AUC and number of detection errors are shown in Fig. 8 as a function of Beer's coefficient. Figure 8(a) shows the AUC versus Beer's coefficient; the AUC of the proposed method remains higher than that of all other tested methods. Also, as shown in Fig. 8(b), the number of detection errors increases for all methods as the turbidity increases; however, the number of errors using the proposed CNN-BiLSTM approach is lower than that of all other tested methods.

Fig. 8. (a) Area under curves and (b) number of detection errors for underwater signal detection at various turbidity levels. Results are compared between the 3D integral imaging reconstructed video data with CNN-BiLSTM (blue line), a conventional 2D imaging system with CNN-BiLSTM (black line), a 3D integral imaging system with nonlinear correlation (green line), and conventional 2D imaging with nonlinear correlation (red line).

From Figs. 7 and 8, the experimental comparison shows that the proposed CNN-BiLSTM approach is more effective than the previously reported correlation-based approaches [2,3] in challenging experimental conditions, including occlusion and turbidity. Compared with deep neural network-based approaches, correlation-based approaches are less generalizable but remain useful when only limited training data are available. Although the 3D imaging system has an increased computational load and calibration requirements compared with a single-camera system, it proves useful in degraded environments and in applications where accuracy is of prime concern. Thus, the proposed 3D integral imaging-based CNN-BiLSTM approach is promising for underwater signal detection in occluded and turbid environments. The estimated computation times of the proposed system are 9.2 s for the 3D integral imaging reconstruction and 6.10 s for the sliding window-based CNN-BiLSTM classification, measured on a 72-frame video (0.21 seconds/frame in total). For 2D imaging, the computation time for the sliding window approach and classification remains the same as for 3D imaging; thus, the total computation time for 2D imaging is 6.10 seconds (0.085 seconds/frame). However, the computation time for the 3D integral imaging reconstruction can be further reduced by using GPU-based stream processing [32] and much faster cameras operating at mega-frame-per-second rates, which is a focus of our future work.

5. Conclusion

In summary, we have presented an underwater signal detection approach for turbid and occluded media based on multidimensional integral imaging and deep neural networks. We have compared the performance of the proposed method with conventional 2D imaging and correlation-based approaches. The results show that multidimensional integral imaging substantially improves optical signal detection performance in comparison with other imaging modalities under degraded environments such as partial occlusion and turbidity. Future work may extend the experiments to more challenging environments such as underwater turbulence [33], employ much faster cameras, and explore other deep learning strategies that greatly reduce the computational and calibration requirements for enhanced optical signal detection under degraded conditions [34].

Appendix A: long short-term memory (LSTM) cell computation

Consider an input sequence of T time steps, ${\textbf x} = [{x_1},{x_2},{x_3},\ldots,{x_T}]$, which is passed through a BiLSTM network [28]. In a BiLSTM network, the update of the t-th hidden vector depends on both the forward and backward directions. The forward LSTM flow is given by Eqs. (2)–(7) [25,28]. The backward equations can be obtained by replacing → with ←.

$$\overrightarrow{i_t} = \sigma\!\left(\overrightarrow{W_{xi}}\,x_t + \overrightarrow{W_{hi}}\,h_{t-1} + \overrightarrow{b_i}\right) \tag{2}$$
$$\overrightarrow{f_t} = \sigma\!\left(\overrightarrow{W_{xf}}\,x_t + \overrightarrow{W_{hf}}\,h_{t-1} + \overrightarrow{b_f}\right) \tag{3}$$
$$\overrightarrow{o_t} = \sigma\!\left(\overrightarrow{W_{xo}}\,x_t + \overrightarrow{W_{ho}}\,h_{t-1} + \overrightarrow{b_o}\right) \tag{4}$$
$$\overrightarrow{g_t} = \tanh\!\left(\overrightarrow{W_{xc}}\,x_t + \overrightarrow{W_{hc}}\,h_{t-1} + \overrightarrow{b_c}\right) \tag{5}$$
$$\overrightarrow{c_t} = \overrightarrow{f_t} \odot \overrightarrow{c_{t-1}} + \overrightarrow{i_t} \odot \overrightarrow{g_t} \tag{6}$$
$$\overrightarrow{h_t} = \overrightarrow{o_t} \odot \tanh\!\left(\overrightarrow{c_t}\right) \tag{7}$$
where → and ← denote the forward and backward directions, respectively. it, ft, and ot denote the input gate, forget gate, and output gate, respectively. These three gates use a sigmoid function, $\sigma(x) = 1/({1 + {e^{ - x}}})$, to rescale the signal to [0, 1]. gt and ht denote the modulated gate and the hidden state at the t-th time step, respectively. The modulated gate uses a hyperbolic tangent function, $\tanh(x) = ({{e^x} - {e^{ - x}}})/({{e^x} + {e^{ - x}}})$, to rescale the signal to [−1, 1]. Wkl and bl, with k ∈ {x, h} and l ∈ {i, f, o, c}, represent the corresponding weight matrices and bias terms of the network, respectively, and ⊙ denotes element-wise multiplication. The outputs of the forward and backward layers are merged, i.e., ${h_t} = f(\overrightarrow {{h_t}} ,\overleftarrow {{h_t}} )$, and we choose the merging operator f to be concatenation.
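A worked numpy sketch of one forward-direction LSTM step from Eqs. (2)–(7) is given below for illustration; the backward pass is identical on the time-reversed sequence, and the BiLSTM output concatenates the two hidden states.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    """One forward LSTM step. W_x, W_h, b: dicts of weight matrices and biases
    keyed by gate l in {'i', 'f', 'o', 'c'}."""
    i_t = sigmoid(W_x['i'] @ x_t + W_h['i'] @ h_prev + b['i'])   # input gate, Eq. (2)
    f_t = sigmoid(W_x['f'] @ x_t + W_h['f'] @ h_prev + b['f'])   # forget gate, Eq. (3)
    o_t = sigmoid(W_x['o'] @ x_t + W_h['o'] @ h_prev + b['o'])   # output gate, Eq. (4)
    g_t = np.tanh(W_x['c'] @ x_t + W_h['c'] @ h_prev + b['c'])   # modulated gate, Eq. (5)
    c_t = f_t * c_prev + i_t * g_t                               # cell state, Eq. (6)
    h_t = o_t * np.tanh(c_t)                                     # hidden state, Eq. (7)
    return h_t, c_t

# BiLSTM merge at time t: h_t = concatenate(forward h_t, backward h_t)
```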

Funding

Air Force Office of Scientific Research (FA9550-18-1-0338, FA9550-21-1-0333); Office of Naval Research (N000141712405, N000142012690).

Acknowledgments

We wish to acknowledge support from the Office of Naval Research (ONR) (N000141712405, N000142012690) and the Air Force Office of Scientific Research (FA9550-18-1-0338, FA9550-21-1-0333). T. O'Connor acknowledges support from the Department of Education through the GAANN Fellowship.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. S. Komatsu, A. Markman, and B. Javidi, “Optical sensing and detection in turbid water using multidimensional integral imaging,” Opt. Lett. 43(14), 3261–3264 (2018). [CrossRef]  

2. R. Joshi, T. O’Connor, X. Shen, M. Wardlaw, and B. Javidi, “Optical 4D signal detection in turbid water by multidimensional integral imaging using spatially distributed and temporally encoded multiple light sources,” Opt. Express 28(7), 10477–10490 (2020). [CrossRef]  

3. R. Joshi, G. Krishnan, T. O’Connor, and B. Javidi, “Signal detection in turbid water using temporally encoded polarimetric integral imaging,” Opt. Express 28(24), 36033–36045 (2020). [CrossRef]  

4. B. Javidi, A. Carnicer, J. Arai, T. Fujii, H. Hua, H. Liao, M. Martínez-Corral, F. Pla, A. Stern, L. Waller, Q.-H. Wang, G. Wetzstein, M. Yamaguchi, and H. Yamamoto, “Roadmap on 3D integral imaging: sensing, processing, and display,” Opt. Express 28(22), 32266–32293 (2020). [CrossRef]  

5. M. Dubreuil, P. Delrot, I. Leonard, A. Alfalou, C. Brosseau, and A. Dogariu, “Exploring underwater target detection by imaging polarimetry and correlation techniques,” Appl. Opt. 52(5), 997–1005 (2013). [CrossRef]  

6. E. Tajahuerce, V. Durán, P. Clemente, E. Irles, F. Soldevila, P. Andrés, and J. Lancis, “Image transmission through dynamic scattering media by single-pixel photodetection,” Opt. Express 22(14), 16945–16955 (2014). [CrossRef]  

7. N. Cohen, S. Shmilovich, Y. Oiknine, and A. Stern, “Deep neural network classification in the compressively sensed spectral image domain,” J. Electron. Imag. 30(04), 1–10 (2021). [CrossRef]  

8. H. Lee, I. Lee, T. Q. S. Quek, and S. H. Lee, “Binary signaling design for visible light communication: a deep learning framework,” Opt. Express 26(14), 18131–18142 (2018). [CrossRef]  

9. G. Krishnan, R. Joshi, T. O’Connor, F. Pla, and B. Javidi, “Human gesture recognition under degraded environments using 3D-integral imaging and deep learning,” Opt. Express 28(13), 19711–19725 (2020). [CrossRef]  

10. M. S. M. Alamgir, M. N. Sultana, and K. Chang, “Link Adaptation on an Underwater Communications Network Using Machine Learning Algorithms: Boosted Regression Tree Approach,” IEEE Access 8, 73957–73971 (2020). [CrossRef]  

11. B. Karanov, M. Chagnon, F. Thouin, T. A. Eriksson, H. Bülow, D. Lavery, P. Bayvel, and L. Schmalen, “End-to-End Deep Learning of Optical Fiber Communications,” J. Lightwave Technol. 36(20), 4843–4855 (2018). [CrossRef]  

12. M. A. Amirabadi, M. H. Kahaei, and S. A. Nezamalhosseini, “Deep learning based detection technique for FSO communication systems,” Phys. Commun. 43, 101229 (2020). [CrossRef]  

13. S. Avramov-Zamurovic, A. T. Watnik, J. R. Lindle, K. P. Judd, and J. M. Esposito, “Machine learning-aided classification of beams carrying orbital angular momentum propagated in highly turbid water,” J. Opt. Soc. Am. A 37(10), 1662–1672 (2020). [CrossRef]  

14. X. Xiao, B. Javidi, M. Martinez-Corral, and A. Stern, “Advances in three-dimensional integral imaging: Sensing, display, and applications [Invited],” Appl. Opt. 52(4), 546–560 (2013). [CrossRef]  

15. G. Lippmann, “Épreuves réversibles donnant la sensation du relief,” J. Phys. Theor. Appl. 7(1), 821–825 (1908). [CrossRef]  

16. S.-H. Hong, J.-S. Jang, and B. Javidi, “Three-dimensional volumetric object reconstruction using computational integral imaging,” Opt. Express 12(3), 483–491 (2004). [CrossRef]  

17. N. Davies, M. McCormick, and L. Yang, “Three-dimensional imaging systems: a new development,” Appl. Opt. 27(21), 4520–4528 (1988). [CrossRef]  

18. F. Okano, H. Hoshino, J. Arai, and I. Yuyama, “Real-time pickup method for a three-dimensional image based on integral photography,” Appl. Opt. 36(7), 1598–1603 (1997). [CrossRef]  

19. G. Scrofani, J. Sola-Pikabea, A. Llavador, E. Sanchez-Ortiga, J. C. Barreiro, G. Saavedra, J. Garcia-Sucerquia, and M. Martínez-Corral, “FIMic: design for ultimate 3D-integral microscopy of in-vivo biological samples,” Biomed. Opt. Express 9(1), 335–346 (2018). [CrossRef]  

20. J. Arai, E. Nakasu, T. Yamashita, H. Hiura, M. Miura, T. Nakamura, and R. Funatsu, “Progress Overview of Capturing Method for Integral 3-D Imaging Displays,” Proc. IEEE 105(5), 837–849 (2017). [CrossRef]  

21. M. Yamaguchi, “Full-Parallax Holographic Light-Field 3-D Displays and Interactive 3-D Touch,” Proc. IEEE 105(5), 947–959 (2017). [CrossRef]  

22. M. Martínez-Corral and B. Javidi, “Fundamentals of 3D imaging and displays: a tutorial on integral imaging, light-field, and plenoptic systems,” Adv. Opt. Photonics 10(3), 512–566 (2018). [CrossRef]  

23. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1–9 (2015).

24. J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 248–255 (2009).

25. S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput. 9(8), 1735–1780 (1997). [CrossRef]  

26. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

27. M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Trans. Signal Process. 45(11), 2673–2681 (1997). [CrossRef]  

28. J. Y.-H. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, “Beyond short snippets: Deep networks for video classification,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 4694–4702 (2015).

29. S. Bock and M. Weiß, “A Proof of Local Convergence for the Adam Optimizer,” in 2019 International Joint Conference on Neural Networks (IJCNN), 1–8 (2019).

30. R. Gold, “Optimal binary sequences for spread spectrum multiplexing (Corresp.),” IEEE Trans. Inf. Theory 13(4), 619–621 (1967). [CrossRef]  

31. B. Javidi and D. Painchaud, “Distortion-invariant pattern recognition with Fourier-plane nonlinear filters,” Appl. Opt. 35(2), 318–331 (1996). [CrossRef]  

32. F. Yi, I. Moon, J.-A. Lee, and B. Javidi, “Fast 3D Computational Integral Imaging Using Graphics Processing Unit,” J. Disp. Technol. 8(12), 714–722 (2012). [CrossRef]  

33. Z. Vali, A. Gholami, Z. Ghassemlooy, M. Omoomi, and D. G. Michelson, “Experimental study of the turbulence effect on underwater optical wireless communications,” Appl. Opt. 57(28), 8314–8319 (2018). [CrossRef]  

34. M. Li and H. Li, “Application of deep neural network and deep reinforcement learning in wireless communication,” PLoS One 15(7), e0235447 (2020). [CrossRef]  
