Optica Publishing Group

Dispersion characterization and pulse prediction with machine learning

Open Access

Abstract

In this work, we demonstrate the efficacy of neural networks in the characterization of dispersive media. We also develop a neural network to make predictions for input probe pulses that propagate through a nonlinear dispersive medium, which may be applied to predicting optimal pulse shapes for a desired output. The setup requires only a single pulse for the probe, providing considerable simplification of the current method of dispersion characterization that requires frequency scanning across the entirety of the gain and absorption features. We show that the trained networks are able to predict pulse profiles as well as dispersive features that are nearly identical to their experimental counterparts. We anticipate that the use of machine learning in conjunction with optical communication and sensing methods, both classical and quantum, can provide signal enhancement and experimental simplifications even in the face of highly complex, layered non-linear light-matter interactions.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Optical pulse propagation through dispersive media often results in significant temporal distortion. The ability to characterize a medium based on its dispersive effects on pulses provides, for example, a route toward remote sensing of unknown media, and can provide information as to how to optimize an optical communications platform. Here we develop and experimentally implement neural networks with the ability to make predictions of input pulse shapes and dispersive features at the output of a four-wave mixing interaction in a warm gas of atoms; i.e., at the receiving end of an optical communications or remote sensing scheme. Four-wave mixing (FWM) in atomic vapor may be used to generate twin beams that have been shown to be useful in imaging [1,2], spectroscopy [3], and communications, both classical and quantum [4]. Furthermore, the resultant two-mode squeezed light [5] may be used to realize quantum steering [6] and continuous-variable quantum teleportation [7], to improve on metrological limits [8], and to generate high-purity narrow-band single photons [9], among many other applications [10–12]. FWM has many advantages over other methods of generating intensity-correlated beams; for example, there is no need for a cavity, and the bright output modes are generated such that they are spatially separated (as well as frequency-separated) from the pump beam(s) [3]. The generally phase-insensitive modes also need not be spatially Gaussian or symmetric [13–15], and the system is readily expanded either by cascading [16,17] or adding more input beams [18–20].

In many of these FWM applications, it is critical to measure the frequency response of the intensity of the “probe,” or seed, beam – referred to as a gain line measurement – in order to characterize the dispersion in the atomic medium, which depends in a complex manner on the atomic makeup, temperature, and so on. This is done by scanning the probe in frequency and detecting the resultant intensity while the pump passes through the medium, then subtracting the same measurement taken while the pump is blocked, in order to account for frequency dependence outside of the FWM interaction. In practice, this requires either a tunable probe laser or a frequency shifter such as an acousto-optic modulator (AOM) or an electro-optic modulator (EOM). Accordingly, this measurement adds undesirable time and expensive equipment to quantum communications schemes that rely on four-wave mixing. It is often much simpler and faster, in practice, to pulse light than to scan it over a broad frequency range. To address these issues, we introduce a scheme for predicting a portion of the gain line of an atomic FWM system using only single pulses with fixed center frequency on the probe mode, by measuring the distorted output pulses. In this way we take advantage of the fact that a short pulse is broad in its frequency spectrum, and no experimental frequency scanning or tuning is necessary. To do this, we train a convolutional neural network (CNN) to predict the gain line measurement; the same network may also be used to correct for external dispersion or temporal distortion on the probe [21] by optimizing the probe pulses at the input of the system.
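The pump-on minus pump-off subtraction described above can be sketched numerically. This is a toy numpy illustration only: the detuning axis matches the 3.017–3.117 GHz window used later in the paper, but the Gaussian line shape, amplitudes, and variable names are invented for illustration.

```python
import numpy as np

# Toy sketch of the gain-line measurement: the frequency-scanned probe
# intensity with the pump on, minus the same scan with the pump blocked.
# Line shape and amplitudes below are illustrative, not measured data.
freq = np.linspace(3.017, 3.117, 300)                        # probe detuning (GHz)
pump_on = 1.0 + 0.8 * np.exp(-((freq - 3.05) / 0.01) ** 2)   # toy gain peak on a flat background
pump_off = np.ones_like(freq)                                # same scan, pump blocked
gain_line = pump_on - pump_off                               # isolates the FWM response
```

Subtracting the pump-off trace removes any probe-frequency dependence that is unrelated to the FWM interaction, leaving only the gain feature.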

To generate FWM we pump a one-inch cell of rubidium vapor with 200 mW of 795 nm CW laser light. A probe beam, red-detuned by $\approx$ 3 GHz relative to the pump, crosses the pump at an angle of 0.8 degrees. This probe beam may be scanned in frequency via an acousto-optic modulator (AOM) in order to generate gain line data in the standard frequency-scanning method, or amplitude modulated via the AOM (at a fixed frequency) with an arbitrary waveform generator in order to generate pulses, as shown in Fig. 1.

 figure: Fig. 1.

Fig. 1. (a.) Experimental setup for detecting reference (input) pulses. A Ti:Sapphire laser is locked to a wavelength of approximately 795 nm and passed through a $\lambda /2$ waveplate (WP) then split on a polarizing beam splitter (PBS). A portion of the light is passed through an acousto-optic modulator (AOM), which is modulated with various pulses and waveforms (inset a-I) from an arbitrary waveform generator. The intensity-modulated and frequency-shifted light (orange) then bypasses a flip mirror (FM) and is incident on a detector (PDA), resulting in the reference probe pulses (a-II). (b.) Experimental setup for detecting gain lines and output probe pulses. The AOM is either scanned in frequency over $\sim$100 MHz (b-I) or pulsed as before (b-II), resulting in a gain line (b-III) or output probe pulses (b-IV), respectively. An energy level diagram for the FWM process is shown in inset b-V.


Machine learning techniques have been applied to various scientific and research fields [22–32], including using CNNs in the context of optical communications [33–37]. Additionally, deep neural networks have been shown to be useful in a variety of regression-type optimization scenarios [38–41]. Here we use a CNN to make predictions for the input probe pulses which propagate through a nonlinear dispersive medium, which often contains complex gain and absorption features. Additionally, this technique is able to predict the profile of unknown input pulses sent through a dispersive medium by using the measured output pulse profiles. This flexibility allows the developed system to be used in a variety of applications, including remote sensing of unknown materials and the optimization of optical pulse propagation through unknown media.

2. Results and discussion

The CNN contains a single two-dimensional convolutional layer with a kernel of size $[5, 5, 2]$ ($[3, 3, 2]$ for the results shown in Fig. 5), where 2 represents the two-channel input. The convolutional layer has a stride length of 1 (2 for the results shown in Fig. 5) and a rectified linear unit (ReLU) activation, and convolves the input pulse (image) of size $[50, 50, 2]$ down to a size of $[46, 46, 10]$, where 10 represents the number of feature maps. We then apply zero padding such that the dimension of the image after the convolution again becomes $[50, 50, 10]$. After this, we apply a two-dimensional max pool layer with a kernel of size $[2, 2]$ that reduces the width and height of the image to half their values, $[25, 25, 10]$. Next, we attach a fully connected layer (FCL) with 5,000 neurons (2,500 neurons for the single-channel input case) to the output of the max pooling, followed by the ReLU activation function. We then apply dropout with a rate of $50\%$ to the outputs of the FCL. Finally, we connect the output of the FCL to an output layer consisting of 2,500 neurons (300 for gain curve predictions) followed by a linear activation function. Note that the hyperparameters of the networks are manually optimized as discussed in [33]. To generate the two-channel data set for predicting input pulse profiles, we stack the FWM output probes (desired outputs) and corresponding gain lines on each other. These are then randomly split into a training set and a test set. The training set is fed into the CNN, which makes predictions for the required input probe pulses to be sent through the Rb cell. Examples of the desired outputs (FWM output probes), gains, and corresponding required input probe pulses are shown in Fig. 2, along with a schematic of the neural network architecture.
This process is repeated many times with different initialization points for the given unknown test set of output probes and gains, and the predicted required input probes are compared to the experimental input probes. Additionally, the CNN makes predictions for the input probes using only output probes as single-channel data (i.e., no gain lines). Finally, we alter the system to make predictions for gain line profiles, using output probes and input probes stacked on each other as the two-channel data set, as well as using only output probes as the single-channel set. The predicted gains are again compared to the experimental values, as shown in Figs. 4 and 5. For the latter case, we benchmark how closely the predicted dispersion profiles fit the experimental data as the training data set is varied.
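The layer-by-layer shape bookkeeping described above can be verified with a few lines of arithmetic. The following is a minimal numpy sketch (variable names are ours; the actual network was built in TensorFlow):

```python
import numpy as np

# Reshape one 2,500-point pulse trace into a 50x50 "image" and stack
# two channels (e.g. output probe + gain line) as described in the text.
trace = np.linspace(0.0, 1.0, 2500)
image = trace.reshape(50, 50)
two_channel = np.stack([image, image], axis=-1)   # shape [50, 50, 2]

def valid_conv_size(size, kernel, stride=1):
    """Output width/height of a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1

conv_size = valid_conv_size(50, 5)        # 5x5 kernel, stride 1 -> 46
padded_size = 50                          # zero padding restores 50x50
pooled_size = padded_size // 2            # 2x2 max pool -> 25
flat_features = pooled_size * pooled_size * 10   # 10 feature maps flattened
```

The flattened max-pool output then feeds the 5,000-neuron fully connected layer, whose 2,500-neuron (or 300-neuron, for gain curves) output layer matches the target trace length.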

 figure: Fig. 2.

Fig. 2. Architecture of the neural network for unknown probe prediction using gain and desired outputs as the two-channel input, or outputs only (without gain) as the single-channel input. Here the measured output probe and dispersion profiles are used to predict the input probe pulse. As discussed in the text, the scheme is easily altered to predict the dispersion profile using the input and output probe pulses as the two-channel input, or the output pulses only (without input pulses) as the single-channel input.


In order to make predictions for input probe pulses by making use of a gain curve, we use two channels of data in the convolutional networks. Here the gain, input probe pulses, and output probes after FWM each have 2,500 points spanning the time window from 9.71 $\mu$s to 12.2 $\mu$s. First we convert these 2,500 points to a corresponding image of size $50\times 50$. As a result we have a total of 64 different sets (images) of gain, output probes, and their corresponding input probes. Note that here the gain curve remains the same for all the combinations of output probes and their respective input probes. Next we randomly split them into training data consisting of 60 sets of pulses and testing data with 4 sets of pulses. Each training and testing set has an output probe stacked with a gain line so as to form the two-channel input to the network, with the corresponding input probe as the target. The two-channel data is scaled to have zero mean and unit variance before being fed into the CNN (no scaling is performed on the target probe pulses). After this, the network is trained with a learning rate of 0.008 for up to 600 epochs using stochastic batch optimization with the Adam optimizer in TensorFlow [42]. We then feed the 4 unknown sets of pulses (output probes and gain lines) to the pre-trained network to make predictions for their corresponding input probes. The predicted (green) versus experimentally measured (blue) pulses are shown in Fig. 3(a-d). Similarly, we use only the FWM output probe as single-channel data (with no gain line), with the corresponding input probe as the network target, to train the network. With the same hyperparameter settings as before, predictions made by the pre-trained network are shown by the red curves in Fig. 3(a-d).
The translucent bands, shaded green around the predicted green curves and red around the predicted red curves, represent one standard deviation from the mean value over 15 different trials. This exact system may then be used to predict input pulse profiles for given desired output profiles, by using a desired output pulse with the gain line as the two-channel input to the network.
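The data preparation steps above (stacking, the 60/4 random split, and input standardization) can be sketched as follows. This is an illustrative numpy stand-in with random placeholder data and our own variable names; the subsequent training with the Adam optimizer (learning rate 0.008, up to 600 epochs) is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder stand-in: 64 two-channel sets (output probe stacked with
# gain line), each reshaped to 50x50, plus the 2,500-point target probes.
data = rng.normal(size=(64, 50, 50, 2))
targets = rng.normal(size=(64, 2500))

# Random 60/4 train/test split, as in the text.
order = rng.permutation(64)
x_train, x_test = data[order[:60]], data[order[60:]]
y_train = targets[order[:60]]            # targets are left unscaled

# Scale the two-channel inputs to zero mean and unit variance, using
# statistics computed from the training inputs.
mu, sigma = x_train.mean(), x_train.std()
x_train = (x_train - mu) / sigma
x_test = (x_test - mu) / sigma
```

Reusing the training-set mean and standard deviation for the test inputs keeps the held-out pulses on the same scale the network saw during training.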

 figure: Fig. 3.

Fig. 3. (a-d) Input probe (pulse) predictions using (green) FWM output probes and gain profiles as two-channel training inputs, and (red) FWM output probes only (without gain) as single-channel training data.


We now turn to predicting different gain curves using input probes and their corresponding FWM output probes. Note that here the input probe remains the same for all the different combinations of gain and corresponding output probe sets. We use 27 different combinations of gain lines and FWM output probes, randomly split into training data and testing data with 25 and 2 pulse sets, respectively. The input probes and FWM output probes are stacked on each other to form the two-channel input to the network, with the corresponding gain line as the target. Note that we clip the gain curves to 300 points, corresponding to a detuning range of 3.017 GHz to 3.117 GHz, which is equal to the number of output neurons of the network; the input and FWM output probes again consist of 2,500 points. The CNN is trained with the same hyperparameter settings as described in previous paragraphs (except now with a learning rate of 0.009) and makes predictions for unknown gain lines. We find the predicted gains (green curves) are nearly identical to the experimental values (blue curves), as shown in Fig. 4. Similarly, we train the same network with only a FWM output probe (no input probes) as single-channel data and again make predictions for the unknown gain lines. We again find significant overlap between the prediction results (red curves) and the experimental data. The predicted and experimental gains, peaked at detunings of approximately 3.047 GHz and 3.071 GHz, are shown in Figs. 4(a) and 4(b), respectively. The translucent bands again represent one standard deviation from the mean value over 15 different trials. Furthermore, in the case of two-channel input to the network, the mean square loss between the unknown target gain line and the predicted gain at each epoch is shown in the inset of Fig. 4(b), which shows that the loss saturates after 200 epochs.
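The correspondence between the 300 clipped gain-line points and the 3.017–3.117 GHz detuning window is simple to make explicit. A short numpy sketch (the axis construction and peak lookup are ours, not code from the paper):

```python
import numpy as np

# 300 retained gain-line points spanning 3.017 to 3.117 GHz detuning,
# matching the network's 300 output neurons.
detuning = np.linspace(3.017, 3.117, 300)   # GHz

# Index of the output neuron nearest a given gain peak, e.g. 3.047 GHz.
peak_index = int(np.argmin(np.abs(detuning - 3.047)))
```

Each output neuron thus reports the predicted gain at one fixed detuning, with a grid spacing of roughly 0.33 MHz.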

 figure: Fig. 4.

Fig. 4. Predicting the gain curve of a non-linear medium peaked approximately at a detuning of (a) 3.047 GHz and (b) 3.071 GHz, using FWM output probes and input probes as two-channel training inputs (green), and FWM output probes only as single-channel training inputs (red). The mean square loss at each epoch is shown in the inset of (b).


Lastly, we investigate the improvement in gain predictions with respect to the number of training sets. In order to generate a prediction benchmark, we use 18 different gain curves peaked between 3.068 GHz and 3.074 GHz as the unknowns to be predicted, and vary the number of pulses used in the training data set. We use the network layers discussed above with a learning rate of 0.009. First we randomly choose 2 pulses out of the 18 as the test data (unknowns) and keep them fixed. We then randomly select sets of 2, 4, 8, and 16 pulses as training data from the remaining 16 pulses and train the networks separately on each. Finally, the pre-trained networks make predictions for the unknown gains. As expected, we find better gain predictions when using a larger number of training pulse sets, as shown in Fig. 5. Note that the gain predictions shown by the red and green curves correspond to the experimental gains shown by the black and blue curves, respectively. The unknown test gain curve predictions using training data with sets of 2, 4, 8, and 16 pulses are shown in Fig. 5(a-d).
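The benchmark's bookkeeping — a fixed 2-curve test set, with nested training subsets drawn from the remaining 16 — can be sketched with index arrays (an illustrative numpy sketch with our own names; only the selection logic is shown, not the training):

```python
import numpy as np

rng = np.random.default_rng(2)

# 18 gain curves in total; 2 are held out as a fixed test set.
all_idx = np.arange(18)
test_idx = rng.choice(all_idx, size=2, replace=False)
pool = np.setdiff1d(all_idx, test_idx)   # the remaining 16 curves

# Training subsets of increasing size, each drawn from the pool,
# each used to train a separate network.
training_subsets = {n: rng.choice(pool, size=n, replace=False)
                    for n in (2, 4, 8, 16)}
```

Keeping the test curves fixed across all subset sizes ensures that any improvement in prediction quality reflects the larger training set rather than an easier test set.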

 figure: Fig. 5.

Fig. 5. Gain curve predictions using the training data with a set of (a) 2, (b) 4, (c) 8, and (d) 16 pulses, respectively.


3. Conclusion

In conclusion, we have implemented convolutional neural networks with the ability to estimate unknown input pulses that have experienced distortion when passing through a dispersive atomic medium (nonlinear four-wave mixing in rubidium vapor), given the resultant distorted output pulses. We demonstrate that the predicted input probe pulse shapes and amplitudes match well with their experimental counterparts. In addition to straightforward classification, this method may be expanded, as in an end-to-end communication or remote sensing system, to make predictions at the receiving end for completely unknown transmitted pulses propagating through different dispersive media. Once pre-trained, the networks may also directly be used to optimize the input pulses that should be sent through a dispersive medium, given a desired output (or received) pulse. Additionally, with the same networks, we have demonstrated the successful prediction of gain lines – a measurement over a range of probe frequencies – using probe pulses with a single center frequency, thus requiring no scanning. This could considerably simplify experiments wherein it is important to characterize the approximate response of a medium to various frequency inputs, but where frequency scanning the relevant beam is difficult, time-consuming, or costly.

Funding

Office of Naval Research (N000141912374); National Science Foundation (DGE-1154145); Northrop Grumman – NG NEXT.

Acknowledgment

This research was supported in part using high performance computing (HPC) resources and services provided by Technology Services at Tulane University, New Orleans, LA.

Disclosures

The authors declare no conflicts of interest.

References

1. V. Boyer, A. M. Marino, R. C. Pooser, and P. D. Lett, “Entangled Images from Four-Wave Mixing,” Science 321(5888), 544–547 (2008). [CrossRef]  

2. J. Shi, G. Patera, M. I. Kolobov, and S. Han, “Quantum temporal imaging by four-wave mixing,” Opt. Lett. 42(16), 3121–3124 (2017). [CrossRef]  

3. C. Thiel, “Four-wave mixing and its applications,” Fac. Washington, Wash. DC (2008).

4. Y. Cai, J. Feng, H. Wang, G. Ferrini, X. Xu, J. Jing, and N. Treps, “Quantum-network generation based on four-wave mixing,” Phys. Rev. A 91(1), 013843 (2015). [CrossRef]  

5. C. F. McCormick, V. Boyer, E. Arimondo, and P. D. Lett, “Strong relative intensity squeezing by four-wave mixing in rubidium vapor,” Opt. Lett. 32(2), 178–180 (2007). [CrossRef]  

6. L. Wang, S. Lv, and J. Jing, “Quantum steering in cascaded four-wave mixing processes,” Opt. Express 25(15), 17457 (2017). [CrossRef]  

7. W. Diao, C. Cai, W. Yang, X. Song, and C. Duan, “Theoretical Aspects of Continuous Variables Quantum Teleportation Based on Phase-Sensitive Four-Wave Mixing,” Int. J. Theor. Phys. 58(1), 323–331 (2019). [CrossRef]  

8. F. Hudelist, J. Kong, C. Liu, J. Jing, Z. Y. Ou, and W. Zhang, “Quantum metrology with parametric amplifier-based photon correlation interferometers,” Nat. Commun. 5(1), 3049 (2014). [CrossRef]  

9. A. MacRae, T. Brannan, R. Achal, and A. I. Lvovsky, “Tomography of a High-Purity Narrowband Photon from a Transient Atomic Collective Excitation,” Phys. Rev. Lett. 109(3), 033601 (2012). [CrossRef]  

10. R. M. Camacho, P. K. Vudyasetu, and J. C. Howell, “Four-wave-mixing stopped light in hot atomic rubidium vapour,” Nat. Photonics 3(2), 103–106 (2009). [CrossRef]  

11. N. Corzo, A. M. Marino, K. M. Jones, and P. D. Lett, “Multi-spatial-mode single-beam quadrature squeezed states of light from four-wave mixing in hot rubidium vapor,” Opt. Express 19(22), 21358 (2011). [CrossRef]  

12. M. Turnbull, “Multi-spatial-mode quadrature squeezing from four-wave mixing in a hot atomic vapour,” Ph.D. thesis, University of Birmingham (2014).

13. L. Cao, J. Du, J. Feng, Z. Qin, A. M. Marino, M. I. Kolobov, and J. Jing, “Experimental observation of quantum correlations in four-wave mixing with a conical pump,” Opt. Lett. 42(7), 1201–1204 (2017). [CrossRef]  

14. O. Danaci, C. Rios, and R. T. Glasser, “All-optical mode conversion via spatially multimode four-wave mixing,” New J. Phys. 18(7), 073032 (2016). [CrossRef]  

15. J. D. Swaim, E. M. Knutson, O. Danaci, and R. T. Glasser, “Multi-mode four-wave mixing with a spatially-structured pump,” arXiv:1802.03412 [physics, physics:quant-ph] (2018).

16. Z. Qin, L. Cao, H. Wang, A. M. Marino, W. Zhang, and J. Jing, “Experimental Generation of Multiple Quantum Correlated Beams from Hot Rubidium Vapor,” Phys. Rev. Lett. 113(2), 023602 (2014). [CrossRef]  

17. Z. Qin, L. Cao, and J. Jing, “Experimental characterization of quantum correlated triple beams generated by cascaded four-wave mixing processes,” Appl. Phys. Lett. 106(21), 211104 (2015). [CrossRef]  

18. H. Wang, C. Fabre, and J. Jing, “Single-step fabrication of scalable multimode quantum resources using four-wave mixing with a spatially structured pump,” Phys. Rev. A 95(5), 051802 (2017). [CrossRef]  

19. S. Liu, H. Wang, and J. Jing, “Two-beam pumped cascaded four-wave-mixing process for producing multiple-beam quantum correlation,” Phys. Rev. A 97(4), 043846 (2018). [CrossRef]  

20. E. M. Knutson, J. D. Swaim, S. Wyllie, and R. T. Glasser, “Optimal mode configuration for multiple phase-matched four-wave-mixing processes,” Phys. Rev. A 98(1), 013828 (2018). [CrossRef]  

21. S. Lohani and R. T. Glasser, “Turbulence correction with artificial neural networks,” Opt. Lett. 43(11), 2611–2614 (2018). [CrossRef]  

22. J. Lv, Z. Na, X. Liu, and Z. Deng, “Machine Learning and Its Applications in Wireless Communications,” in Communications, Signal Processing, and Systems, Q. Liang, J. Mu, M. Jia, W. Wang, X. Feng, and B. Zhang, eds. (Springer, Singapore, 2019), Lecture Notes in Electrical Engineering, pp. 2429–2436.

23. R. C. Deo, “Machine Learning in Medicine,” Circulation 132(20), 1920–1930 (2015). [CrossRef]  

24. A. Karpatne, I. Ebert-Uphoff, S. Ravela, H. A. Babaie, and V. Kumar, “Machine Learning for the Geosciences: Challenges and Opportunities,” IEEE Trans. Knowl. Data Eng. 31(8), 1544–1554 (2019). [CrossRef]  

25. K. T. Butler, D. W. Davies, H. Cartwright, O. Isayev, and A. Walsh, “Machine learning for molecular and materials science,” Nature 559(7715), 547–555 (2018). [CrossRef]  

26. C. Hegde and K. E. Gray, “Use of machine learning and data analytics to increase drilling efficiency for nearby wells,” J. Nat. Gas Sci. Eng. 40, 327–335 (2017). [CrossRef]  

27. M.-A. T. Vu, T. Adalı, D. Ba, G. Buzsáki, D. Carlson, K. Heller, C. Liston, C. Rudin, V. S. Sohal, A. S. Widge, H. S. Mayberg, G. Sapiro, and K. Dzirasa, “A Shared Vision for Machine Learning in Neuroscience,” J. Neurosci. 38(7), 1601–1607 (2018). [CrossRef]  

28. A. D. Tranter, H. J. Slatyer, M. R. Hush, A. C. Leung, J. L. Everett, K. V. Paul, P. Vernaz-Gris, P. K. Lam, B. C. Buchler, and G. T. Campbell, “Multiparameter optimisation of a magneto-optical trap using deep learning,” Nat. Commun. 9(1), 4360 (2018). [CrossRef]  

29. T. Zahavy, A. Dikopoltsev, D. Moss, G. I. Haham, O. Cohen, S. Mannor, and M. Segev, “Deep learning reconstruction of ultrashort pulses,” Optica 5(5), 666–673 (2018). [CrossRef]  

30. W. Huang, Y. Mao, C. Xie, and D. Huang, “Quantum hacking of free-space continuous-variable quantum key distribution by using a machine-learning technique,” Phys. Rev. A 100(1), 012316 (2019). [CrossRef]  

31. G. R. Steinbrecher, J. P. Olson, D. Englund, and J. Carolan, “Quantum optical neural networks,” npj Quantum Inf. 5(1), 60 (2019). [CrossRef]  

32. Y. Ismail, I. Sinayskiy, and F. Petruccione, “Integrating machine learning techniques in quantum communication to characterize the quantum channel,” J. Opt. Soc. Am. B 36(3), B116–B121 (2019). [CrossRef]  

33. S. Lohani, E. M. Knutson, M. O’Donnell, S. D. Huver, and R. T. Glasser, “On the use of deep neural networks in optical communications,” Appl. Opt. 57(15), 4180–4190 (2018). [CrossRef]  

34. T. Tanimura, T. Hoshida, T. Kato, S. Watanabe, and H. Morikawa, “Convolutional Neural Network-Based Optical Performance Monitoring for Optical Transport Networks,” J. Opt. Commun. Netw. 11(1), A52–A59 (2019). [CrossRef]  

35. B. Rahmani, D. Loterie, G. Konstantinou, D. Psaltis, and C. Moser, “Multimode optical fiber transmission with a deep learning network,” Light: Sci. Appl. 7(1), 69 (2018). [CrossRef]  

36. T. Doster and A. T. Watnik, “Machine learning approach to OAM beam demultiplexing via convolutional neural networks,” Appl. Opt. 56(12), 3386–3396 (2017). [CrossRef]  

37. S. Lohani and R. T. Glasser, “Robust free space oam communications with unsupervised machine learning,” in Frontiers in Optics (Optical Society of America, 2019), pp. FTu5B–3.

38. M. M. Lotfinejad, R. Hafezi, M. Khanali, S. S. Hosseini, M. Mehrpooya, and S. Shamshirband, “A Comparative Assessment of Predicting Daily Solar Radiation Using Bat Neural Network (BNN), Generalized Regression Neural Network (GRNN), and Neuro-Fuzzy (NF) System: A Case Study,” Energies 11(5), 1188 (2018). [CrossRef]  

39. H. Ye, Q. Ren, X. Hu, T. Lin, L. Shi, G. Zhang, and X. Li, “Modeling energy-related CO2 emissions from office buildings using general regression neural network,” Resour. Conserv. Recycl. 129, 168–174 (2018). [CrossRef]  

40. Y. Xu, J. Du, L. Dai, and C. Lee, “A Regression Approach to Speech Enhancement Based on Deep Neural Networks,” IEEE/ACM Trans. Audio Speech Lang. Process. 23(1), 7–19 (2015). [CrossRef]  

41. L. K. Tan, Y. M. Liew, E. Lim, and R. A. McLaughlin, “Convolutional neural network regression for short-axis left ventricle segmentation in cardiac cine MR sequences,” Med. Image Anal. 39, 78–86 (2017). [CrossRef]  

42. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” (2015). Software available from tensorflow.org.
