Optical signal detection in turbid water using multidimensional integral imaging with deep learning

Open Access

Abstract

Optical signal detection in turbid and occluded environments is a challenging task due to light scattering and beam attenuation inside the medium. Three-dimensional (3D) integral imaging is an imaging approach that integrates two-dimensional (2D) images from multiple perspectives and has proved useful under challenging conditions such as occlusion and turbidity. In this manuscript, we present an approach for the detection of optical signals in turbid water and occluded environments using multidimensional integral imaging employing temporal encoding with deep learning. In our experiments, an optical signal is temporally encoded with a gold code and transmitted through turbid water via a light-emitting diode (LED). A camera array captures videos of the optical signals from multiple perspectives and performs 3D reconstruction of the temporal signal. A convolutional neural network-based bidirectional long short-term memory (CNN-BiLSTM) network is trained with clear water video sequences to perform classification on the binary transmitted signal. The testing data were collected in turbid water scenes with partial signal occlusion, and a sliding window with CNN-BiLSTM-based classification was applied to the reconstructed 3D video data to detect the encoded binary data sequence. The proposed approach is compared with previously reported correlation-based detection models. Furthermore, we compare 3D integral imaging with conventional 2D imaging for signal detection using the proposed deep learning strategy. The experimental results show that the multidimensional integral imaging-based methodology significantly outperforms the previously reported approaches and conventional 2D sensing-based methods. To the best of our knowledge, this is the first report on underwater signal detection using multidimensional integral imaging with deep neural networks.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Underwater signal detection in turbid water and occluded conditions is of great importance for many applications such as target detection, information transmission, and marine exploration [1–5]. Optical sensing using conventional image sensors in these environments is difficult, as the captured images may suffer from reduced visibility and a low signal-to-noise ratio (SNR). Various approaches have been proposed for signal detection in a turbid medium, such as correlation filter-based detectors [2,3], polarimetric-based approaches [3], and single-pixel detectors [6]. Deep learning-based approaches have recently gained traction for classification and recognition tasks due to their generalized learning capabilities and higher accuracy [7–9]. However, optical signal detection under degraded conditions remains a challenge, especially in cases where signals are partially occluded or embedded in a scattering medium.

In the previously reported correlation-based approach using multidimensional integral imaging for the detection of temporally encoded optical signals in a turbid medium [1–3], 3D integral imaging was shown to outperform conventional 2D imaging in terms of signal detection capability. However, correlation-based filters have limitations, such as reduced generalization capability compared with CNNs for classification and recognition tasks.

In recent years, different types of deep neural networks, namely convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory (LSTM) networks, have been proposed to detect signals in atmospheric turbulence and scattering media [10–13], and they can achieve better detection accuracy than conventional methods. In this paper, temporally encoded optical signals are transmitted through a turbid and occluded underwater environment. The signal is captured using a CMOS image sensor array, and the recorded 2D elemental video frames are reconstructed using 3D integral imaging to improve the visibility of the optical signal in degraded conditions [14]. A CNN-BiLSTM-based detector is used to detect the source signal in the reconstructed 3D video sequence. The proposed deep neural network is trained with clear water video sequences to perform classification on the binary transmitted signal. To improve the performance and generalization capability of the neural network model, we use data augmentation to increase the size of the training dataset and to simulate noisy conditions. Since we use a 7-bit encoding scheme for the transmitted data sequence, a sliding window-based approach is used to extract each 7-frame video sequence from the recorded data. Each video sequence is fed to a pretrained CNN to extract the spatial feature vectors representing each frame. These features are then fed to a BiLSTM network, which learns the temporal pattern hidden inside the encoded data. We demonstrate through our experiments that conventional signal detection algorithms may fail in turbid and occluded underwater environments, whereas the proposed deep learning approach enhances signal detection performance under such conditions. The performance of the proposed CNN-BiLSTM-based detector is measured using performance metrics such as receiver operating characteristic (ROC) curves, area under the curve (AUC), and the number of bit errors. The signal detection results are compared for various turbidity conditions to demonstrate the improved performance of the proposed approach.

The rest of the paper is organized as follows: integral imaging and CNN-BiLSTM-based detection are described in Section 2, followed by the details of the optical imaging system and the underwater data collection procedure in Section 3. The experimental results and discussion are given in Section 4, and finally, the conclusions are presented in Section 5.

2. Methodology

2.1 3D integral imaging

Integral imaging (InIm) is a three-dimensional (3D) imaging technique, first proposed by Lippmann [15], which consists of a camera pickup stage and a computational reconstruction process. In the pickup or capture process, multiple 2D elemental images are captured, and both the intensity and directional information of the optical rays are recorded [16–19]. These elemental images can be recorded by a lenslet array, an array of cameras, or a moving camera, as shown in Fig. 1(a). In the reconstruction process, a pinhole model is assumed, and the rays are back-propagated through their corresponding virtual pinholes to a particular depth plane to provide depth information about the 3D scene [20–22]. Mathematically, the reconstruction process can be described as follows:

$$R(x,y;z;t) = \frac{1}{O(x,y;t)}\sum_{m = 0}^{K - 1}\sum_{n = 0}^{L - 1} EI_{m,n}\!\left(x - m\frac{N_x P_x f}{C_x z},\; y - n\frac{N_y P_y f}{C_y z};\, t\right) \tag{1}$$
where R(x, y; z; t) is the integral imaging reconstructed image at depth z and time frame t, and x and y are the pixel indices of each elemental image. O(x, y; t) is the number of overlapping pixels at time frame t. z is the reconstruction distance, given by z = zair + zw/nw, where zair is the distance in air, zw is the distance in water, and nw is the refractive index of water. Nx and Ny are the total numbers of pixels of each elemental image in the x and y directions, K and L are the numbers of elemental images obtained in the x and y directions, and EIm,n(·) is the elemental image in the mth column and nth row. By shifting and overlapping the elemental images, the reconstructed image is obtained at a specific depth plane. Px and Py are the pitches between adjacent image sensors on the camera array, and f is the focal length of the camera lens. Cx and Cy are the width and height of the image sensor, respectively. The camera pickup process and the computational reconstruction process are depicted in Figs. 1(a) and 1(b), respectively.
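The shift-and-sum reconstruction of Eq. (1) can be summarized by the following minimal numpy sketch (an illustrative implementation, not the authors' code). The array `elemental`, the pixel counts, and the geometry parameters Px, Py, Cx, Cy, f, and z follow the definitions above; rounding the pixel shifts to integers is an assumption made for simplicity.

```python
import numpy as np

def reconstruct(elemental, z, f, Px, Py, Cx, Cy):
    """Shift-and-sum reconstruction of Eq. (1) at depth z for one time frame.
    elemental[m][n]: grayscale elemental image (Ny x Nx) in column m, row n."""
    K, L = len(elemental), len(elemental[0])
    Ny, Nx = elemental[0][0].shape
    recon = np.zeros((Ny, Nx))
    overlap = np.zeros((Ny, Nx))                 # O(x, y; t): overlapping-pixel count
    for m in range(K):
        for n in range(L):
            # Pixel shifts m*Nx*Px*f/(Cx*z) and n*Ny*Py*f/(Cy*z) from Eq. (1),
            # rounded to the nearest integer pixel.
            sx = int(round(m * Nx * Px * f / (Cx * z)))
            sy = int(round(n * Ny * Py * f / (Cy * z)))
            recon[sy:, sx:] += elemental[m][n][:Ny - sy, :Nx - sx]
            overlap[sy:, sx:] += 1.0
    return recon / np.maximum(overlap, 1.0)      # normalize by the overlap count
```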

Fig. 1. 3D integral imaging process: (a) Pickup stage of the integral imaging system and (b) computational volumetric reconstruction process for integral imaging.

2.2 CNN-BiLSTM based optical signal detection

Figure 2 shows the detailed framework of the CNN-BiLSTM network used for optical signal detection. First, a 3D reconstructed video is obtained using the integral imaging-based computational reconstruction algorithm. During the training phase, reconstructed 7-frame video sequences are used to train an end-to-end CNN-BiLSTM network. A GoogLeNet [23] pretrained on the ImageNet dataset [24] is used to extract the spatial features of each frame of the captured video sequence, followed by a bidirectional recurrent neural network that extracts the temporal information content of the transmitted video sequence for each class [9]. The feature vectors are taken from the last pooling layer (“pool5-7 × 7_s1”) of the GoogLeNet and used as the feature vector representing the spatial information content of each video frame. The bidirectional long short-term memory (BiLSTM) network is a variant of the long short-term memory (LSTM) network in which two separate LSTM networks extract the temporal information along the forward and backward directions [25–28]. The details of the LSTM cell structure are described in Appendix A. Assuming an M-bit encoding scheme is utilized, the information encoded in an M-frame sequence can be used to produce a k-dimensional feature vector ${{\textbf x}_i}$, i=1,2,3,…,M for each frame ${I_n}$, n=1,2,3,…,M. Thus, for each M-bit frame sequence, we obtain the feature matrix ${\textbf X}$ by concatenating the feature vectors ${{\textbf x}_i}$. The feature matrix ${\textbf X}$ for each video is then fed into a BiLSTM network, and its output is fed to a fully connected layer followed by a softmax layer for classification. During the testing phase, a sliding window-based approach is used to extract 7-frame video sequences from the recorded data sequence, which are fed to the CNN-BiLSTM model. We use a BiLSTM layer with 100 hidden units to learn the temporal dependency of the feature vectors. The BiLSTM classification network was trained for 100 epochs with a batch size of 10 at each iteration. The network weights and biases were optimized using the Adam optimizer with a learning rate of 0.0001 for the first 50 epochs and 0.00001 for the final 50 epochs. Adam is a first-order gradient-based optimization algorithm for stochastic objective functions based on adaptive estimates of lower-order moments; it is suitable for non-stationary objectives and noisy/sparse gradients and has proved robust and well suited for a wide class of non-convex machine learning problems [29]. We use a dropout layer with a dropout rate of 0.3. All the above parameters were chosen by tuning the hyperparameters of the network using 10% of the training dataset as validation data.
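For concreteness, the architecture described above can be sketched in PyTorch as follows. This is an illustrative approximation, not the authors' implementation: torchvision's GoogLeNet with its classifier replaced by an identity stands in for the pretrained CNN, so the 1024-dimensional output of its global average pooling layer plays the role of the “pool5-7 × 7_s1” features.

```python
import torch
import torch.nn as nn
from torchvision import models

class CnnBiLstmDetector(nn.Module):
    def __init__(self, num_classes=3, hidden=100, feat_dim=1024):
        super().__init__()
        backbone = models.googlenet(weights="IMAGENET1K_V1")
        backbone.fc = nn.Identity()                     # keep the 1024-D pooled features
        self.cnn = backbone
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(0.3)                  # dropout rate quoted in the text
        self.fc = nn.Linear(2 * hidden, num_classes)    # forward + backward hidden states

    def forward(self, clips):                           # clips: (B, T=7, 3, 224, 224)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))           # per-frame spatial features (B*T, 1024)
        feats = feats.view(b, t, -1)                    # feature matrix X: (B, T, 1024)
        out, _ = self.bilstm(feats)                     # temporal features (B, T, 2*hidden)
        return self.fc(self.dropout(out[:, -1]))        # scores for classes '0', '1', 'idle'
```

Taking the final time step of the BiLSTM output before the fully connected and softmax layers is one common design choice; the paper does not specify this detail, so it is an assumption here.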

Fig. 2. Block diagram for Integral imaging-based optical signal detection using proposed deep learning architecture: InIm: Integral imaging, CNN: convolutional neural network, LSTM: Long Short-Term Memory Network. Red dotted block represents the bidirectional LSTM architecture and the arrows inside the block indicate the connection to forward and backward LSTMs.

3. Experimental methods

This section describes the experimental setup, underwater data collection, and method for the proposed multidimensional integral imaging-based system in a turbid and occluded environment. The experimental setup is shown in Fig. 3. The light source is a light-emitting diode (LED) operating at 630 nm, whose light passes through a turbid water tank with dimensions of 500(W) × 250(L) × 250(H) mm. The turbidity of the water is controlled by adding antacid, and experiments were carried out at different turbidity levels to assess the performance of the proposed system. The optical signals were coded using a gold code sequence [30]. The signal is transmitted through the turbid underwater medium with occlusion at a speed of 20 bits per second. When the transmitted data bit is 1, the 7-bit gold code is transmitted, and when the transmitted data bit is 0, the 7-bit flipped gold code is transmitted. Thus, the temporal structural difference between the videos representing class “1” and class “0” can be exploited by the LSTM network to create a decision boundary between the two classes. For optical signal transmission, we choose an 8-bit data sequence [1, 0, 0, 1, 1, 0, 1, 0], wherein each bit of the original data is coded with the 7-bit gold code ([1,1,0,0,1,0,1]), yielding a 56-bit encoded signal. An occluded underwater environment is created by placing an artificial plant in front of the light source inside the water tank. To quantify the turbidity level, we use Beer's coefficient, derived from the Beer-Lambert law, which states that I = I0exp(−αd), where I0 is the initial intensity, I is the intensity after propagating a distance d in the turbid medium, and α is Beer's coefficient. To measure α, a sample of water was taken at each turbidity condition, and the intensities I0 and I were measured at two locations separated by d = 10 mm using an optical power meter and detector (Newport 818-SL/DB Silicon) at a wavelength of 630 nm with a 10 mm aperture. In this experiment, α ranges from 0.0047 to 0.0318 mm−1. The transmitted optical signals were captured using a 3 × 3 camera array consisting of G-192 GigE cameras and C-mount zoom lenses with a focal length of 20 mm. The pitch between cameras is 80 mm in both the horizontal and vertical directions. The spatial resolution of each camera sensor is 1600(H) × 1200(V) pixels with a pixel size of 4.5 µm × 4.5 µm, and the camera array is synchronized to record video data at a frame rate of 20 fps. In our experiments, the signal transmission and the receiving cameras are synchronized such that each recorded frame corresponds to the transmission of exactly 1 bit of the transmitted signal. This synchronization was done in order to transmit signals at the maximum speed allowed by the recording camera's frame rate. Recent developments in camera technology allow much faster frame rates, up to one million frames per second.
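The encoding scheme and the turbidity measurement described above amount to the following short sketch (illustrative only; the gold code, the data sequence, and the 10 mm path length are taken from the text, and “flipped” is assumed here to mean bit-inverted).

```python
import numpy as np

GOLD = np.array([1, 1, 0, 0, 1, 0, 1])        # 7-bit gold code used in the experiment
DATA = np.array([1, 0, 0, 1, 1, 0, 1, 0])     # 8-bit data sequence to transmit

def encode(data, code=GOLD):
    """Data bit '1' -> gold code; data bit '0' -> flipped (assumed bit-inverted) code."""
    return np.concatenate([code if b == 1 else 1 - code for b in data])   # 56 bits

def beer_coefficient(i0, i, d_mm=10.0):
    """Beer's coefficient alpha (mm^-1) from I = I0*exp(-alpha*d)."""
    return np.log(i0 / i) / d_mm

encoded = encode(DATA)            # 56-bit sequence driving the LED at 20 bits per second
```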

Fig. 3. Water tank with turbid water and underwater plant to mimic underwater occlusion, a 3 × 3 camera array for 3D integral imaging pickup. Optical signal sent by the red LED light source. The external white light LED is used to mimic ambient light for shallow water scenes.

The sliding window approach (depicted in Fig. 4) is used to slice each possible 7-frame video sequence from the transmitted data sequence. Thus, a series of 7-frame video sequences, Ui (i, i+1,…, i+6), is obtained from the transmitted signal using the sliding window approach, where i = 1, 2, 3,…, N−6 and N is the total number of frames in the encoded transmitted video sequence. There are 36 unique possible 7-frame video sequences, and these can be categorized into three cases: case 1), where the 7-frame video sequence exactly overlaps with the gold code sequence (class ‘1’); case 2), where the 7-frame video sequence exactly overlaps with the flipped gold code sequence (class ‘0’); and case 3), the remaining 34 possibilities, wherein the sliced 7-frame video sequence matches neither the gold code nor the flipped gold code sequence (‘idle’ class). Thus, to train our proposed CNN-BiLSTM optical signal detection network, we created three class labels: ‘1’, ‘0’, and ‘idle’. Our primary interest is to detect the classes ‘1’ and ‘0’; however, in our experiments, including the ‘idle’ class in the training dataset results in better discrimination between the two classes of interest. For training the network, the transmitted signals are recorded in clear water. We recorded data with two different F-numbers, F1.8 and F8. Thus, in the training dataset, before augmentation, we have two videos (one per F-number) each for class ‘1’ and class ‘0’, and 68 ‘idle’ class videos. To reduce network instability and bias towards the ‘idle’ class, the datasets of classes ‘0’ and ‘1’ are repeated 34 times to obtain 68 videos for each class.
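A minimal sketch of this slicing and labeling step is given below (illustrative; it assumes the 7 code bits underlying each window are known at training time, which holds for the clear-water training recordings).

```python
import numpy as np

GOLD = np.array([1, 1, 0, 0, 1, 0, 1])

def sliding_windows(frames, width=7):
    """Yield every 7-frame window U_i = (i, i+1, ..., i+6) of a recorded video."""
    for i in range(len(frames) - width + 1):
        yield i, frames[i:i + width]

def label_window(bits7, code=GOLD):
    """Assign the training label from the 7 code bits underlying a window."""
    if np.array_equal(bits7, code):
        return "1"                    # exact overlap with the gold code
    if np.array_equal(bits7, 1 - code):
        return "0"                    # exact overlap with the flipped gold code
    return "idle"                     # any of the remaining 34 possibilities
```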

Fig. 4. Proposed sliding window with CNN-BiLSTM-based classification approach.

In total, we have 204 training videos, with 68 video sequences corresponding to each class. To improve the performance and generalization capability of the CNN-BiLSTM network in degraded environments, we performed data augmentation on the training dataset. In a turbid water environment, the optical signal undergoes attenuation and blurring. Therefore, we adopted two data augmentation models: (1) attenuation plus additive Gaussian noise, and (2) blurring plus additive Gaussian noise. The attenuation plus additive Gaussian noise model is Idegraded = β × Ioriginal + n, where Idegraded is the degraded captured image (data), β is the attenuation factor, Ioriginal is the original image (data), and n is additive noise modeled as Gaussian with mean µ and variance σ2. Similarly, the blurring plus additive Gaussian noise model can be written as Idegraded = h ⊗ Ioriginal + n, where ⊗ represents the 2D convolution operator and h is a 2D Gaussian kernel, h(x, y) = 1/(2πs2) exp(−(x2 + y2)/(2s2)), with zero mean and standard deviation s. To generate the augmented dataset, the augmentation models are applied to each of the training video sequences. To estimate the parameters of the augmentation models, a dataset of signal ‘on’ and ‘off’ images was recorded at various turbidity levels ranging from α = 0.005 to 0.033 mm−1 without occlusion. The parameters µ and σ2 are estimated by fitting a normal distribution to the histograms of the ‘on’ and ‘off’ images using maximum likelihood estimation at the various turbidities. The mean and variance of the additive Gaussian noise distribution vary in the ranges 0.015 to 0.032 and 0.004 to 0.01, respectively. The parameter s is drawn from a uniform distribution over [1, 10]. The parameter β is drawn from a uniform distribution over [0, 1]. Since the test data contain clear water images (β = 1) and turbid water images (β < 1), it is reasonable to assume a range of β from 0 to 1.
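The two augmentation models can be sketched as follows (illustrative; it assumes frames are numpy arrays normalized to [0, 1], and scipy's gaussian_filter stands in for convolution with the kernel h).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng()

def attenuate_and_noise(frame, mu=(0.015, 0.032), var=(0.004, 0.01)):
    """Model (1): I_degraded = beta * I_original + n, with additive Gaussian n."""
    beta = rng.uniform(0.0, 1.0)                               # attenuation factor
    n = rng.normal(rng.uniform(*mu), np.sqrt(rng.uniform(*var)), frame.shape)
    return np.clip(beta * frame + n, 0.0, 1.0)

def blur_and_noise(frame, mu=(0.015, 0.032), var=(0.004, 0.01)):
    """Model (2): I_degraded = h (*) I_original + n, with a 2D Gaussian kernel h."""
    s = rng.uniform(1.0, 10.0)                                 # kernel standard deviation
    n = rng.normal(rng.uniform(*mu), np.sqrt(rng.uniform(*var)), frame.shape)
    return np.clip(gaussian_filter(frame, sigma=s) + n, 0.0, 1.0)
```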

The CNN-BiLSTM classifier outputs classification scores for the respective classes; the maximum of the classification scores for ‘0’ and ‘1’ is selected, and if the selected value corresponds to class ‘0’, it is multiplied by −1, otherwise (class ‘1’) it remains unchanged, as shown in Fig. 4. This score selection method is applied to each video Ui (i, i+1,…, i+6) to generate a transformed classification score sequence S(i). To test the performance of the proposed approach, we recorded data with occlusion and turbidity; a sample 2D elemental image, recorded using the central camera with an aperture of F1.8, is shown in Fig. 5(a). The integral imaging-based computational reconstruction algorithm, as given in Eq. (1), enhances the visibility of the optical signal in a turbid and occluded environment, as illustrated in Fig. 5(b) at α = 0.0047 mm−1.
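The score transformation can be written compactly as below (a sketch; it assumes the classifier's softmax outputs for each window are arranged in the order '0', '1', 'idle').

```python
import numpy as np

def transformed_scores(window_scores):
    """window_scores: array of shape (N-6, 3) with softmax scores for '0', '1', 'idle'.
    Returns the signed score sequence S(i)."""
    S = np.empty(len(window_scores))
    for i, (p0, p1, _idle) in enumerate(window_scores):
        S[i] = p1 if p1 >= p0 else -p0        # keep the '1' score, negate the '0' score
    return S
```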

Fig. 5. (a) Images of the underwater occluded signal viewed from the central camera perspective, taken at α = 0.0047 mm−1, and (b) reconstructed image using 3D integral imaging with occlusion at α = 0.0047 mm−1.

The transformed classification scores S(i) of the transmitted video sequence should have high and low peaks corresponding to the transmission of binary 1s and 0s, respectively. Given the 8-bit original data and the 7-bit coding scheme, the transformed classification score S(i) is expected to have 8 prominent local maxima or minima, each separated by 7 frames. By summing the prominences of these peaks separated by 7 frames, the correct start frame of the signal transmission is found as the frame that gives the maximum sum of prominences across the recorded signal [2]. The final classification of the transmitted binary data sequence is then done by thresholding the transformed classification score S(i). For our results, we set the threshold to 0. For each turbidity level, the 56-bit encoded signal transmission experiment is repeated eight times, corresponding to 64 bits of original data to be decoded, to compute the performance metrics. Thus, in total for testing, we have 2880 7-frame videos, with 576 videos for each turbidity level. The overall flow chart for optical signal transmission and signal detection is shown in Fig. 6.
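A decoding sketch along these lines is shown below (illustrative, not the authors' code; scipy's peak-finding and peak-prominence routines are assumed as one possible way to compute the prominences used for start-frame estimation).

```python
import numpy as np
from scipy.signal import find_peaks, peak_prominences

def decode(S, code_len=7, n_bits=8, threshold=0.0):
    """Estimate the start frame from S(i) and decode the original data bits."""
    prom = np.zeros_like(S)
    for sign in (1, -1):                              # prominences of maxima and minima
        peaks, _ = find_peaks(sign * S)
        prom[peaks] += peak_prominences(sign * S, peaks)[0]
    offsets = np.arange(len(S) - (n_bits - 1) * code_len)
    start = offsets[np.argmax([prom[o::code_len][:n_bits].sum() for o in offsets])]
    samples = S[start::code_len][:n_bits]             # one sample per transmitted data bit
    return (samples > threshold).astype(int)          # threshold at 0
```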

Fig. 6. Flow chart of the proposed system for (a) optical signal transmission and (b) deep learning-based detection in underwater communication. InIm denotes Integral Imaging.

4. Results and discussion

The performance of the proposed underwater signal detection system on the experimental data under varying turbidity conditions is evaluated using receiver operating characteristic (ROC) curves, the area under the curve (AUC), and the number of detection errors. The ROC analysis for underwater signal detection is shown in Fig. 7(a) for a lower turbidity (α = 0.0114 mm−1) and in Fig. 7(b) for a higher turbidity (α = 0.0318 mm−1). The 3D integral imaging system with the CNN-BiLSTM method (blue line) achieves an AUC of 1 at α = 0.0114 mm−1 and 0.8965 at α = 0.0318 mm−1, outperforming the other tested methods. The conventional 2D imaging system with CNN-BiLSTM (black line) gives AUC values of 0.5684 and 0.4805, respectively. The 3D integral imaging system with nonlinear correlation (green line) gives AUC values of 0.899 and 0.4561, respectively. Finally, conventional 2D imaging with nonlinear correlation (red line) [1,31] gives AUC values of 0.5117 and 0.4453, respectively, at turbidity levels of α = 0.0114 mm−1 and α = 0.0318 mm−1.

Fig. 7. ROC (receiver operating characteristic) curves for underwater signal detection at two turbidity levels (α = 0.0114 mm−1 and α = 0.0318 mm−1). Results are compared between the 3D integral imaging reconstructed video data with CNN-BiLSTM (blue line), a conventional 2D imaging system with CNN-BiLSTM (black line), a 3D integral imaging system with nonlinear correlation (green line), and conventional 2D imaging with nonlinear correlation (red line).

The AUC and number of detection errors are shown in Fig. 8 as a function of Beer's coefficient. Figure 8(a) shows the AUC versus Beer's coefficient; the AUC of the proposed method remains higher than that of all other tested methods. Also, as shown in Fig. 8(b), the number of detection errors increases for all methods as the turbidity increases; however, the number of errors using the proposed CNN-BiLSTM approach is lower than that of all other tested methods.

Fig. 8. (a) Area under curves and (b) number of detection errors for underwater signal detection at various turbidity levels. Results are compared between the 3D integral imaging reconstructed video data with CNN-BiLSTM (blue line), a conventional 2D imaging system with CNN-BiLSTM (black line), a 3D integral imaging system with nonlinear correlation (green line), and conventional 2D imaging with nonlinear correlation (red line).

From Figs. 7 and 8, the experimental comparison shows that the proposed CNN-BiLSTM approach is more effective than the previously reported correlation-based approaches [2,3] in challenging experimental conditions, including occlusion and turbidity. Compared with deep neural network-based approaches, correlation-based approaches are less generalizable but remain useful when only limited training data are available. Although the 3D imaging system has an increased computational load and calibration requirements compared with a single-camera system, it proves useful in degraded environments and in applications where accuracy is of prime concern. Thus, the proposed 3D integral imaging-based CNN-BiLSTM approach is promising for underwater signal detection in occluded and turbid environments. The estimated computation times of the proposed system are 9.2 s for the 3D integral imaging reconstruction and 6.10 s for the sliding window-based CNN-BiLSTM classification, measured on a 72-frame video (0.21 seconds/frame in total). For 2D imaging, the computation time for the sliding window approach and classification remains the same as for 3D imaging; thus, the total computation time for 2D imaging is 6.10 seconds (0.085 seconds/frame). However, the computation time for the 3D integral imaging reconstruction can be further reduced by using GPU-based stream processing [32] and much faster cameras operating at mega-frame-per-second rates, which is a focus of our future work.

5. Conclusion

In summary, we have presented an underwater signal detection approach for turbid and occluded media based on multidimensional integral imaging and deep neural networks. We have compared the performance of the proposed method with conventional 2D imaging and correlation-based approaches. The results show that multidimensional integral imaging substantially improves optical signal detection performance in comparison with other imaging modalities under degraded environments such as partial occlusion and turbidity. Future work may extend the experiments to more challenging environments such as underwater turbulence [33], employ much faster cameras, and explore other deep learning strategies that greatly reduce the computational and calibration requirements for enhanced optical signal detection under degraded conditions [34].

Appendix A: long short-term memory (LSTM) cell computation

Consider an input sequence of T time steps, ${\textbf x} = [{x_1},{x_2},{x_3},\ldots,{x_T}]$, which is passed through a BiLSTM network [28]. In a BiLSTM network, the update of the t-th hidden vector depends on both the forward and backward directions. The forward LSTM flow is given by Eqs. (2)–(7) [25,28]. The backward equations can be obtained by replacing → with ←.

$$\overrightarrow{i_t} = \sigma\!\left(\overrightarrow{W_{xi}}\,x_t + \overrightarrow{W_{hi}}\,h_{t-1} + \overrightarrow{b_i}\right) \tag{2}$$
$$\overrightarrow{f_t} = \sigma\!\left(\overrightarrow{W_{xf}}\,x_t + \overrightarrow{W_{hf}}\,h_{t-1} + \overrightarrow{b_f}\right) \tag{3}$$
$$\overrightarrow{o_t} = \sigma\!\left(\overrightarrow{W_{xo}}\,x_t + \overrightarrow{W_{ho}}\,h_{t-1} + \overrightarrow{b_o}\right) \tag{4}$$
$$\overrightarrow{g_t} = \tanh\!\left(\overrightarrow{W_{xc}}\,x_t + \overrightarrow{W_{hc}}\,h_{t-1} + \overrightarrow{b_c}\right) \tag{5}$$
$$\overrightarrow{c_t} = \overrightarrow{f_t} \odot \overrightarrow{c_{t-1}} + \overrightarrow{i_t} \odot \overrightarrow{g_t} \tag{6}$$
$$\overrightarrow{h_t} = \overrightarrow{o_t} \odot \tanh\!\left(\overrightarrow{c_t}\right) \tag{7}$$
where → and ← denote the forward and backward directions, respectively. it, ft, and ot denote the input gate, forget gate, and output gate, respectively. These three gates use a sigmoid function, $\sigma(x) = 1/({1 + {e^{ - x}}})$, to rescale the signal to [0, 1]. gt and ht denote the modulated gate and the hidden state at the t-th time step, respectively. The modulated gate uses a hyperbolic tangent function, $\tanh(x) = ({{e^x} - {e^{ - x}}})/({{e^x} + {e^{ - x}}})$, to rescale the signal to [−1, 1]. Wkl and bl, with k ∈ {x, h} and l ∈ {i, f, o, c}, represent the corresponding weight matrices and bias terms of the network, respectively, and ⊙ denotes element-wise multiplication. The outputs of the forward and backward layers are merged, i.e., ${h_t} = f(\overrightarrow {{h_t}} ,\overleftarrow {{h_t}} )$, and we choose the merging operator f to be concatenation.
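A worked numpy sketch of one forward-direction LSTM step from Eqs. (2)–(7) is given below for illustration; the backward pass is identical on the time-reversed sequence, and the BiLSTM output concatenates the two hidden states.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    """One forward LSTM step. W_x, W_h, b: dicts of weight matrices and biases
    keyed by gate l in {'i', 'f', 'o', 'c'}."""
    i_t = sigmoid(W_x['i'] @ x_t + W_h['i'] @ h_prev + b['i'])   # input gate, Eq. (2)
    f_t = sigmoid(W_x['f'] @ x_t + W_h['f'] @ h_prev + b['f'])   # forget gate, Eq. (3)
    o_t = sigmoid(W_x['o'] @ x_t + W_h['o'] @ h_prev + b['o'])   # output gate, Eq. (4)
    g_t = np.tanh(W_x['c'] @ x_t + W_h['c'] @ h_prev + b['c'])   # modulated gate, Eq. (5)
    c_t = f_t * c_prev + i_t * g_t                               # cell state, Eq. (6)
    h_t = o_t * np.tanh(c_t)                                     # hidden state, Eq. (7)
    return h_t, c_t

# BiLSTM merge at time t: h_t = concatenate(forward h_t, backward h_t)
```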

Funding

Air Force Office of Scientific Research (FA9550-18-1-0338, FA9550-21-1-0333); Office of Naval Research (N000141712405, N000142012690).

Acknowledgments

We wish to acknowledge support from the Office of Naval Research (ONR) (N000141712405, N000142012690) and the Air Force Office of Scientific Research (FA9550-18-1-0338, FA9550-21-1-0333). T. O'Connor acknowledges support from the Department of Education through the GAANN Fellowship.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. S. Komatsu, A. Markman, and B. Javidi, “Optical sensing and detection in turbid water using multidimensional integral imaging,” Opt. Lett. 43(14), 3261–3264 (2018). [CrossRef]  

2. R. Joshi, T. O’Connor, X. Shen, M. Wardlaw, and B. Javidi, “Optical 4D signal detection in turbid water by multidimensional integral imaging using spatially distributed and temporally encoded multiple light sources,” Opt. Express 28(7), 10477–10490 (2020). [CrossRef]  

3. R. Joshi, G. Krishnan, T. O’Connor, and B. Javidi, “Signal detection in turbid water using temporally encoded polarimetric integral imaging,” Opt. Express 28(24), 36033–36045 (2020). [CrossRef]  

4. B. Javidi, A. Carnicer, J. Arai, T. Fujii, H. Hua, H. Liao, M. Martínez-Corral, F. Pla, A. Stern, L. Waller, Q.-H. Wang, G. Wetzstein, M. Yamaguchi, and H. Yamamoto, “Roadmap on 3D integral imaging: sensing, processing, and display,” Opt. Express 28(22), 32266–32293 (2020). [CrossRef]  

5. M. Dubreuil, P. Delrot, I. Leonard, A. Alfalou, C. Brosseau, and A. Dogariu, “Exploring underwater target detection by imaging polarimetry and correlation techniques,” Appl. Opt. 52(5), 997–1005 (2013). [CrossRef]  

6. E. Tajahuerce, V. Durán, P. Clemente, E. Irles, F. Soldevila, P. Andrés, and J. Lancis, “Image transmission through dynamic scattering media by single-pixel photodetection,” Opt. Express 22(14), 16945–16955 (2014). [CrossRef]  

7. N. Cohen, S. Shmilovich, Y. Oiknine, and A. Stern, “Deep neural network classification in the compressively sensed spectral image domain,” J. Electron. Imag. 30(04), 1–10 (2021). [CrossRef]  

8. H. Lee, I. Lee, T. Q. S. Quek, and S. H. Lee, “Binary signaling design for visible light communication: a deep learning framework,” Opt. Express 26(14), 18131–18142 (2018). [CrossRef]  

9. G. Krishnan, R. Joshi, T. O’Connor, F. Pla, and B. Javidi, “Human gesture recognition under degraded environments using 3D-integral imaging and deep learning,” Opt. Express 28(13), 19711–19725 (2020). [CrossRef]  

10. M. S. M. Alamgir, M. N. Sultana, and K. Chang, “Link Adaptation on an Underwater Communications Network Using Machine Learning Algorithms: Boosted Regression Tree Approach,” IEEE Access 8, 73957–73971 (2020). [CrossRef]  

11. B. Karanov, M. Chagnon, F. Thouin, T. A. Eriksson, H. Bülow, D. Lavery, P. Bayvel, and L. Schmalen, “End-to-End Deep Learning of Optical Fiber Communications,” J. Lightwave Technol. 36(20), 4843–4855 (2018). [CrossRef]  

12. M. A. Amirabadi, M. H. Kahaei, and S. A. Nezamalhosseini, “Deep learning based detection technique for FSO communication systems,” Phys. Commun. 43, 101229 (2020). [CrossRef]  

13. S. Avramov-Zamurovic, A. T. Watnik, J. R. Lindle, K. P. Judd, and J. M. Esposito, “Machine learning-aided classification of beams carrying orbital angular momentum propagated in highly turbid water,” J. Opt. Soc. Am. A 37(10), 1662–1672 (2020). [CrossRef]  

14. X. Xiao, B. Javidi, M. Martinez-Corral, and A. Stern, “Advances in three-dimensional integral imaging: Sensing, display, and applications [Invited],” Appl. Opt. 52(4), 546–560 (2013). [CrossRef]  

15. G. Lippmann, “Épreuves réversibles donnant la sensation du relief,” J. Phys. Theor. Appl. 7(1), 821–825 (1908). [CrossRef]  

16. S.-H. Hong, J.-S. Jang, and B. Javidi, “Three-dimensional volumetric object reconstruction using computational integral imaging,” Opt. Express 12(3), 483–491 (2004). [CrossRef]  

17. N. Davies, M. McCormick, and L. Yang, “Three-dimensional imaging systems: a new development,” Appl. Opt. 27(21), 4520–4528 (1988). [CrossRef]  

18. F. Okano, H. Hoshino, J. Arai, and I. Yuyama, “Real-time pickup method for a three-dimensional image based on integral photography,” Appl. Opt. 36(7), 1598–1603 (1997). [CrossRef]  

19. G. Scrofani, J. Sola-Pikabea, A. Llavador, E. Sanchez-Ortiga, J. C. Barreiro, G. Saavedra, J. Garcia-Sucerquia, and M. Martínez-Corral, “FIMic: design for ultimate 3D-integral microscopy of in-vivo biological samples,” Biomed. Opt. Express 9(1), 335–346 (2018). [CrossRef]  

20. J. Arai, E. Nakasu, T. Yamashita, H. Hiura, M. Miura, T. Nakamura, and R. Funatsu, “Progress Overview of Capturing Method for Integral 3-D Imaging Displays,” Proc. IEEE 105(5), 837–849 (2017). [CrossRef]  

21. M. Yamaguchi, “Full-Parallax Holographic Light-Field 3-D Displays and Interactive 3-D Touch,” Proc. IEEE 105(5), 947–959 (2017). [CrossRef]  

22. M. Martínez-Corral and B. Javidi, “Fundamentals of 3D imaging and displays: a tutorial on integral imaging, light-field, and plenoptic systems,” Adv. Opt. Photonics 10(3), 512–566 (2018). [CrossRef]  

23. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1–9 (2015).

24. J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 248–255 (2009).

25. S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput. 9(8), 1735–1780 (1997). [CrossRef]  

26. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

27. M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Trans. Signal Process. 45(11), 2673–2681 (1997). [CrossRef]  

28. J. Y.-H. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, “Beyond short snippets: Deep networks for video classification,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 4694–4702 (2015).

29. S. Bock and M. Weiß, “A Proof of Local Convergence for the Adam Optimizer,” in 2019 International Joint Conference on Neural Networks (IJCNN), 1–8 (2019).

30. R. Gold, “Optimal binary sequences for spread spectrum multiplexing (Corresp.),” IEEE Trans. Inf. Theory 13(4), 619–621 (1967). [CrossRef]  

31. B. Javidi and D. Painchaud, “Distortion-invariant pattern recognition with Fourier-plane nonlinear filters,” Appl. Opt. 35(2), 318–331 (1996). [CrossRef]  

32. F. Yi, I. Moon, J.-A. Lee, and B. Javidi, “Fast 3D Computational Integral Imaging Using Graphics Processing Unit,” J. Disp. Technol. 8(12), 714–722 (2012). [CrossRef]  

33. Z. Vali, A. Gholami, Z. Ghassemlooy, M. Omoomi, and D. G. Michelson, “Experimental study of the turbulence effect on underwater optical wireless communications,” Appl. Opt. 57(28), 8314–8319 (2018). [CrossRef]  

34. M. Li and H. Li, “Application of deep neural network and deep reinforcement learning in wireless communication,” PLoS One 15(7), e0235447 (2020). [CrossRef]  
