Machine learning approach to OAM beam demultiplexing via convolutional neural networks

Timothy Doster; Abbie T. Watnik

doi:10.1364/AO.56.003386

1. INTRODUCTION

Free-space optical (FSO) communication is the transmission of information over a distance between a transmitter and a receiver using optical wavelengths, i.e., ultraviolet, visible, and infrared. FSO communication contrasts with fiber-based communication systems as it does not require a physical communication link and relies on the atmosphere as the transmission medium as opposed to an optical fiber. This is valuable when it is necessary to communicate line-of-sight between non-fixed locations or when established (fiber-based) communication systems have been destroyed by natural disasters or hostile actors. Though frequency division multiplexed RF communication also uses the atmosphere as its transmission medium, FSO offers several important advantages, namely, higher modulation bandwidth allowing higher information capacity [1], smaller beam divergence, which provides larger signal intensity at the receiver [2], and improved security to prevent eavesdropping due to directionality and non-penetration of physical obstacles [3].

Due to the complexity of the information that needs to be transmitted and/or the length of time allowed for transmission, it is often necessary to increase the information capacity of the data link [4]. Typically, for FSO communication, one can control the wavelength, polarization, and frequency of distinct light beams and thus multiplex together different signals; additionally, spatial and temporal methods can also be considered. Another option is to utilize orbital angular momentum (OAM) thus allowing beams with different mode numbers to be multiplexed together and transmitted over the same link [5,6].

OAM is a property of a coherent light beam that arises from the azimuthal components of linear momentum acting at the radius of the beam with a dependency of $\exp (i m θ)$ . The parameter, $m \in Z$ , is the topological charge or mode number and indicates that there is a theoretically infinite number of modes possible; due to noise, however, this is limited [7]. This creates a twisting of the light beam with a helical phase front. Bessel [8], Bessel–Gauss [9], Laguerre–Gauss [10], Hermite–Gauss [11], Ince–Gauss [12], and Mathieu–Gauss [13] are all beam types that possess OAM properties. Without the presence of turbulence, OAM beams exhibit orthogonality, which is very useful for optical FSO communication because multiplexed beams will not interfere with each other, thus allowing recovery of each mode. However, the presence of turbulence causes the mixing of information between adjacent modes, which produces channel crosstalk [14,15]. This crosstalk results in the degradation of the signal and a loss of information.

The major contribution of this paper is a new method to determine which OAM modes are active in a transmitted signal utilizing convolutional neural etworks (CNN), first discussed in [16]. This CNN-based demultiplexing method avoids costly optical solutions by relying only on an intensity image of the unique multiplexing patterns at the receiver side. We test our CNN-based technique against a traditional demultiplexing method, conjugate mode sorting, with various OAM mode sets and levels of simulated atmospheric turbulence in a laboratory setting. The CNN-based method was shown to demultiplex combinatorially multiplexed OAM modes from a fixed set with $> 99 %$ accuracy for high levels of turbulence—well exceeding the conjugate demultiplexing method. We also show that this new method is robust to added sensor noise, number of photon detections, number of pixels, unknown levels of turbulence, and training set size limitations.

This methodology has some similarity to [17,18], where intensity images and machine learning were also combined to create a long distance FSO communication system. Our proposed system, however, differs from this previous work in three regards: the type of machine learning (CNN versus self-organizing-maps), the type of OAM-carrying beam (Bessel–Gauss versus Laguerre–Gauss), and the encoding dictionary creation strategy (multiplexed OAM modes versus ± mode superpositions and relative phase differences).

In Section 2, we derive and describe the Bessel–Gauss beam, an OAM-carrying beam, which we will use in our experimentations. In Section 3, we describe a traditional OAM demultiplexing technique and our proposed CNN-based technique. In Section 4, we describe our laboratory experimental setup, including how we simulate turbulence. In Section 5, we detail our experimental collection procedure for both demultiplexing techniques then conduct several experiments to compare the two methods and assess the limits of the proposed method. Finally, in Section 6, we provide some conclusions and possible new research directions.

2. ORBITAL ANGULAR MOMENTUM

Mathematically, we can describe an electromagnetic wave as a field, $u (x, y, z; t)$ , with spatial coordinates $(x, y, z)$ and time $t$ , which follows the hyperbolic partial differential equation [19]:

\frac{\partial^{2} u}{\partial t^{2}} = c^{2} \nabla^{2} u,

where

\nabla^{2}

is the Laplacian and

c

is the speed of light. If we assume that the field variations are sinusoidal,

u (x, y, z; t) = U (x, y, z) e^{- i ω t}

, then we get the Helmholtz equation:

\nabla^{2} U + k^{2} U = 0,

where

k = ω / c = 2 π / λ

is the wavenumber and

λ

is the wavelength. If we now change to cylindrical coordinates, Eq. (2) becomes

\frac{1}{r} \frac{\partial}{\partial r} (r \frac{\partial \bar{U}}{\partial r}) + \frac{\partial^{2} \bar{U}}{\partial z^{2}} + k^{2} \bar{U} = 0 .

Using a simplification, $V (r, z) = \bar{U} (r, z) e^{- i k z}$ , and the paraxial assumption, $\partial^{2} V / \partial z^{2} = 0$ , we get the paraxial wave equation

\frac{1}{r} \frac{\partial}{\partial r} (r \frac{\partial V}{\partial r}) + 2 i k \frac{\partial V}{\partial z} = 0 .

By solving Eq. (4) in different coordinate systems with different symmetry assumptions, we can define several different beams that carry OAM, one of which is the Bessel beam.

Ideal Bessel beams are described as

u_{B (m)} (r, θ, z) = C_{B} J_{m} (β r) \exp (- i k_{z} z) \exp (i m θ),

where

C_{B}

is constant,

J_{m}

is the order

m

Bessel function, and

β

is the radial frequency,

k = \sqrt{k_{z}^{2} + β^{2}} = 2 π / λ

. Bessel beams, in their solution to Eq. (4), exhibit distance agnostic intensity distribution and thus are considered diffraction-free beams. For a true Bessel beam to be created, it would require an infinite amount of energy to maintain the diffraction-free propagation; however, a Gaussian-tapered Bessel beam, or BGB, can be created where, for a finite distance, the diffraction free property holds (pseudo-diffraction-free beam) [20].

If we assume the problem is circularly symmetric, the BGB can be realized:

u_{B G (m)} (r, θ, z) = \frac{C_{B G} w_{0}}{w (z)} J_{m} (\frac{β r}{1 + i z / z_{r}}) \times \exp [i (k - \frac{β^{2}}{2 k}) z - ζ (z) + \frac{- 1}{w^{2} (z)}] \times \exp [\frac{i k}{2 R (z)} (r^{2} + β^{2} \frac{z_{r}}{k^{2}})] \exp (i m θ),

or when

z = 0

,

u_{B G (m)} (r, θ, z = 0) = C_{B G} J_{m} (β r) \exp [- {(r / w_{0})}^{2}] \exp (i m θ),

where

C_{B G}

is a constant,

ζ (z) = \tan^{- 1} (z / z_{R})

is the Gouy phase,

w (z) = w_{0} \sqrt{1 + (z / z_{R})}

is the beam radius,

w_{0}

is the beam waist,

R (z) = z [1 + {(z_{R} / z)}^{2}]

is the radius of curvature, and

z_{R} = π w_{0}^{2} / λ

is the Rayleigh range. Optically, a BGB is produced by the superposition of Gaussian beams whose axes are uniformly distributed on a cone [8]. The angular half aperture of the cone,

θ_{C}

, is related to the radial frequency,

β

, as

β = k \sin (θ_{C})

[9]. For a fixed

θ_{C}

, as

z

increases, the superposition of the Gaussian beams will break apart and will result in a Gaussian beam. By relating the

θ_{C}

to the angular spread of a Gaussian beam, Gori et al. [9] showed that for a propagation distance

Z

,

Z = w_{0} / θ_{C}

. Using this relation, one can approximately find the radial frequency supporting a propagation distance as such:

β = k \sin (w_{0} / Z)

. We should note here that for

β = 0

, the special case of a Gaussian beam is revealed. An example of a computer simulated BGB (

β = 350

) can be seen in Fig. 1.

Fig. 1. Simulated $u_{B G (5)}$ with $β = 350$ at $z = 0$ . Phase information is represented by the hue while the energy is represented by the normalized intensity. Colorbar is in radians.

Download Full Size | PDF

In the laboratory, BGBs have been created by several methods including modifying a laser beam with a computer-generated hologram [21], a spiral phase plate [22], or cylindrical lenses [23]. Furthermore, it has been shown in [5,6,24,25] that multiplexing and demultiplexing OAM beams are possible for communication links. For ease of experimentation, we will rely on computer-generated holograms displayed on a spatial light modulator (SLM), which will be described in Section 4.C.

3. DETECTING OAM MODES

Due to the orthogonality property of OAM beams, different mode numbers can be multiplexed together or optically combined into a single beam; see Fig. 2 for an example. After propagating through the atmosphere and arriving at the receiver, this multiplexed beam must be demultiplexed to ascertain which modes are present in the signal. We first describe a popular demultiplexing technique that is known as conjugate mode sorting. We will then describe our proposed CNN-based technique.

Fig. 2. Effects of simulated multiplexing different OAM modes together for BGB ( $β = 350$ ) in numerical simulation. Left column is $m = {- 2,7}$ , middle column is $m = {5,7}$ , and right column is $m = {- 2,5, 7}$ . Colorbar is in radians.

Download Full Size | PDF

A. Conjugate Mode Sorting

Conjugate mode sorting [6,26] is a method to determine the OAM mode number of a detected beam based on its orthogonality properties. Given a transmitted OAM beam, $u_{m} (r, θ, Z)$ , we cycle through the support of the mode set, $u_{n}^{*} (r, θ)$ , where $*$ is the complex conjugate, as seen in Fig. 3 forming the product $u_{m} (r, θ, Z) u_{n}^{*} (r, θ)$ . If we detect intensity at the origin, i.e., no doughnut mode, then the transmitted signal contains OAM mode $n$ .

Fig. 3. Conjugate mode sorting for an OAM mode set of $[- 2,5, 7]$ and multiplexed signal with modes $m = - 2$ and $m = 7$ .

Download Full Size | PDF

This sorting method is dependent on having good alignment between the transmitter and the receiver; misalignment is shown to have comparable effects to turbulence in the correct determination of the OAM mode [6]. Due to the effect of turbulence, the normalized energy will not be concentrated at the origin of the correct conjugate mode—thus we have to look at the relative energy across all the modes. For the non-multiplexing case, to correctly determine which OAM mode has been transmitted, one can simply take the maximum value near the origin across the support of the mode set. For the multiplexing case, again to determine which OAM mode(s) have been transmitted, a threshold must be chosen so as to decide whether a mode is present or not in the signal.

In the laboratory, such a system could be designed using a single SLM that cycles through the various conjugate modes, assuming the transmission time was sufficient to complete the range of test modes [27]. A more complex system could also be designed where the incoming beam is tested in series with multiple volume holograms with individual channel detectors [28].

There are other traditional methods for mode sorting that we will be unable to go into detail on here, namely, counting spiral fringes [29], optical transformations [30], measuring the Doppler effect [31], dove prism interferometers [32], and self-organizing-maps [18]. See [33] for a detailed discussion of these other methods.

B. CNN-Based Mode Sorting

CNNs are a supervised machine learning algorithm that can be viewed as a function composition ( $\circ$ ) chain of $L$ (also the number of layers) alternating linear and non-linear functions:

f (x) = a_{L} \circ b_{L} \circ a_{L - 1} \circ b_{L - 1} \circ \dots \circ a_{1} \circ b_{1} (x),

where

a_{j}

is a non-linear activation function and

b_{j} (x) = W_{j} x + β_{j}

is a linear function that applies a set of weights,

W_{j}

, and biases,

β_{j}

, to an input,

x

.

CNNs contain several layers that are composed of convolutional filters that mimic the receptive field known to exist in mammalian eyes. In each convolutional layer, a collection of filter sets, each composed of 2-D filters equal to the number of input channels, are trained. These trained filters are convolved with the input to create a number of convolutional outputs or activations, dependent on the spatial size of the filters, the stride (distance between receptive fields), and padding. The activations are processed by a non-linear activation function, such as a rectified linear unit (ReLU) that allows for additional layers to further contribute to the learning task. Optionally, a max-pooling (MP) operator can be added before the activation function to reduce computational demands and add translational invariance. This translational invariance will be very useful when we conduct our laboratory experiments (see Section 4), as it removes the need for pixel-wise alignment.

After a series of convolutional layers, the notion of spatial information is abandoned and all input neurons become connected to all output neurons in a series of fully connected layers. The fully connected layers are still separated by a non-linear activation function but also contain a regularizing dropout unit to avoid overfitting. The final layer of the network represents the unique classes we wish to separate. During training, the labeled training data are passed through the network multiple times; each complete pass of the training data is known as an epoch. Once a labeled image (in reality, a series of images is processed together in a mini-batch) has been processed by the network, a loss function (e.g., softmax-multinomial-logistic) measures the error. This error is back-propagated through the network using the chain rule and the layer weights are updated using stochastic gradient descent. Once trained, the network in the testing phase will produce a probability for each input as belonging to one of the output classes. The network we have chosen to utilize is known as Alexnet [34] and is composed of 5 convolutional layers and 3 fully connected layers, see Fig. 4 and Table 1 for the network topology. Our variant of Alexnet contains approximately 21.5 million trainable weights.

Table 1. Architecture for CNN-Based Demultiplexing Method

View Table | View all tables in this article

Fig. 4. Alexnet architecture. Note that layers 1–5 are convolutional (and include max-pooling, and ReLU) and layers 6–8 are fully connected (and include dropout and ReLU). The smaller black squares represent the receptive field or convolutional filter.

Download Full Size | PDF

In our proposed CNN-based demultiplexing method, an intensity image will be taken of the received multiplexed pattern. A CNN will be trained on all possible OAM mode patterns for a transmit dictionary, i.e., all possible bit string encodings. For example, if we wanted to transmit messages of bit-length 5, we would require 5 different OAM modes and train the network to distinguish 32 ( $2^{5}$ ) different multiplexed mode patterns. Training of the network would take into account various levels of turbulence, but no adjustments due to angular misalignment (tilt) are necessary. Once an image of the OAM encoded signal is recorded, it is passed through the trained network and a probability is produced for each possible multiplexing (or simply which trained multiplexing class had the highest probability). The probabilities will also produce a measure of certainty for the received beam that could be used in error-correcting codes. For computational reasons it is better to take all the images received over a short time frame and pass them through the trained network together.

We will perform our lab experimentations with CNNs trained and tested on graphical processing units (GPUs). In a deployed system, where size, weight, and power (SWAP) are a concern, the trained networks can be transitioned to field programmable gate arrays (FPGAs) [35] or neuromorphic chips [36].

4. LABORATORY EXPERIMENT

In the following subsections we describe the laboratory setup built to test the proposed CNN-based demultiplexing method, how turbulence is simulated, and how we encoded the OAM multiplexed signal and turbulence hologram onto the SLM.

A. Equipment and Setup

Our lab setup is composed of a 633-nm 5-mW laser, two Forth Dimension Displays binary phase ferroelectric SLMs, a Dalsa GigE camera, and several standard optical tools such as mirrors, pinhole filters, and diffraction order filters. A picture of the lab equipment setup for this experiment can be seen in Fig. 5.

Fig. 5. Photo of laboratory experiment setup.

Download Full Size | PDF

The SLM is programmed with a binary phase hologram that creates the desired mode multiplexing when illuminated with a Gaussian plane wave. We also add, to varying levels, simulated turbulence to the hologram (see next subsection for details). The SLMs and camera are driven by a MATLAB program that controls the timings of the three independent devices.

The camera is aligned such that under the highest turbulence level and mode number that the beam energy all falls in a $256 \times 256$ pixel collection window (which includes a 50-pixel guard region).

B. Simulating Turbulence

Turbulence was simulated by inserting a random phase screen along the propagation path of the beam corresponding to the modified Kolmogorov turbulence model of Andrews [37]:

Ψ (κ) = 0.033 C_{n}^{2} {(κ^{2} + 1 / L_{0}^{2})}^{- 11 / 6} \exp (- κ^{2} / κ_{ℓ}^{2}) \times (1 + 1.082 (κ / κ_{ℓ}) - 0.254 {(κ / κ_{ℓ})}^{7 / 6}),

where

κ

is the spatial frequency (rad/m),

L_{0}

is the outer scale of turbulence,

ℓ_{0}

is the inner scale of turbulence,

κ_{ℓ} = κ / (3.3 ℓ_{0})

, and atmospheric turbulence strength,

C_{n}^{2}

, is the structure constant of the index refraction, a measure of the strength of the turbulence. The Fried parameter, defined as

r_{0} = {(0.423 k^{2} \sec (α) \int_{Path} C_{n}^{2} (z) d z)}^{- 3 / 5},

where

α

is the zenith angle, is a measure of the quality of the transmission through the atmosphere along the defined path. Assuming that we have a constant turbulence strength over the propagation distance and

α = 0

, we may relate

C_{n}^{2}

to the Fried parameter

r_{0}

:

r_{0} = {(0.423 k^{2} Δ z C_{n}^{2})}^{- 3 / 5} .

The phase screen $P$ is created by

P = F^{- 1} {\sqrt{Ψ} C},

where

F^{- 1}

is the inverse 2-D Fourier transform and

C

is a collection of complex Gaussian random variables of the same size as the numerical propagation grid. Subharmonics are added to Eq. (12) by the methods of Lane et al. [38] so as to closer match the theory for lower spatial frequencies. Figure 6 shows two examples of turbulence phase screens created using these methods.

Fig. 6. Example of two realizations of simulated turbulence screens created by the described methods; scale is in radians.

Download Full Size | PDF

C. Creating the Encoding Hologram

To create our OAM-encoded signal, we write a hologram to the SLM, which, after illumination by a Gaussian plane wave, will create a BGB of desired mode numbers with optional simulated turbulence. Let $M = {m_{1}, \dots, m_{t}}$ be the OAM modes we wish to encode and $P$ , from Eq. (12), be the turbulence phase screen of the desired turbulence level. For a numerical grid replicating the pixel array and pixel size of the SLM, let

S [x, y] = \exp (i P [x, y]) \sum_{j = 1}^{t} u_{B G (m_{j})} [x, y],

where

S [x, y]

is the multiplexed beam.

For efficient projection using our binary phase SLM, uniform amplitude beams with a tilt phase and the multiplexed beam phase are interfered to create an off-axis hologram, described by

H [x, y] = {| \exp (i ang (S [x, y])) + \exp (i x) |}^{2} .

To imprint on the SLM, $H$ is then binarized such that the phase value on the SLM is

\hat{H} [x, y] = {\begin{matrix} π & H [x, y] > (| H_{\max} | + | H_{\min} |) / 2 \\ 0 & otherwise \end{matrix} .

In the optical layout, we use a pinhole to pick off the appropriate term of the encoded hologram associated with the multiplexed beam. For the conjugate mode sorting method, a second SLM is present on the receive side. The process to create the hologram is the same, although instead of encoding a multiplexed OAM beam with turbulence, the complex conjugate of a single BGB with mode $m_{j}$ is displayed on the SLM. We cycle through all $j = {1, \dots, t}$ modes serially to determine which modes are present in the initial transmit beam.

5. RESULTS

We will now detail the collection procedure for our lab experiment, the training required for both our proposed CNN-based detection method and the traditional conjugate mode sorting method, and several comparisons based on turbulence level, mode set constituents, training set sizes, and image quality.

A. Collection Procedures

To test our proposed CNN-based mode sorting method, we utilize the equipment described in Section 4.A in two different configurations. In configuration 1, as seen in Fig. 7, we perform the sorting using the conjugate mode sorting method. For this configuration we use both SLMs and for every OAM-coded signal sent by the transmit side we collect, in serial, the conjugate demultiplexed images for every mode in the mode set. In configuration 2, as seen in Fig. 8, we perform the sorting using the proposed CNN-based approach. A sample of the data collected for mode set 1 without turbulence is shown in Fig. 9; a colormap is applied for visualization only. The 32 sub-images correspond to each combinatorial multiplexing.

Fig. 7. Diagram representing the conjugate mode sorting experiment; P = Pinhole and M = Mirror. First, laser light is collimated. Next, a hologram is created to represent the BGB OAM mode-encoded signal with the created random turbulence realization. This hologram is displayed on the first SLM, and after the plane wave interferes with the hologram the beam propagates in free space until reaching the second SLM. Conjugate mode holograms are displayed in serial on the second SLM and the resulting demultiplexed pattern is recorded by the camera.

Download Full Size | PDF

Fig. 8. Diagram representing the CNN-based mode sorting experiment. First, laser light is collimated. Next, a hologram is created to represent the BGB OAM mode-encoded signal with the created random turbulence realization. This hologram is displayed on the SLM, and after the plane wave interferes with the hologram the beam propagates in free space until it is recorded by the camera.

Download Full Size | PDF

Fig. 9. Example of experimental data without turbulence for mode set 1. Title for each sub-image is the bit-string (5 digit binary number) and the modes that are active (set of integers in braces). The images have been cropped and a colormap has been applied for visualization purposes.

Download Full Size | PDF

Using the method described in Section 4.B, we create three different turbulence levels, $D / r_{0} = 5$ , $D / r_{0} = 10$ , and $D / r_{0} = 15$ , where $D$ is the linear dimension of the SLM and $r_{0}$ is the Fried’s parameter, and encode this turbulence with the multiplexed hologram as discussed in Section 4.C. In Table 2, we describe the three different mode sets used in the experiment. The sets are all approximately centered around $m = 0$ but contain different spacing between adjacent modes. Increasing the spacing between modes in the encoding set diminishes the effects of crosstalk as we have a priori knowledge of all the possible transmit modes and discount the crosstalk effects on modes not in the mode set. However, increasing the spacing between modes necessitates the inclusion of modes with higher mode number. Studies have shown that there is an exponential drop-off in channel efficiency (energy detected in the correct mode versus all other modes) with relation to mode number [14,39]; this effect is partially due to higher mode numbers having a wider beam diameter and thus interfering with more turbulence field. These sets thus offer a good means to compare the two different demultiplexing techniques for both extremes of this relation.

Table 2. Mode Sets Used in Experiment

View Table | View all tables in this article

B. Training for Conjugate Mode Sorting

Training for the conjugate mode sorting method proved to be quite difficult due to the optical alignment required—a difficulty we do not encounter with our proposed CNN-based method, which we will discuss later. As stated in Section 4, we first aligned the optical system without turbulence to give proper demultiplexing for single-mode OAM-carrying beams (signals with only a single OAM mode present). With the introduction of turbulence, which introduces lateral and longitudinal shifts in the detected beams due to tip/tilt contribution in the phase, the optical techniques employed alone were not enough to provide adequate demultiplexing quality.

To give the best results possible, we used extra information to align the system, to which the CNN did not have access. First, we performed a series of image processing techniques on the recorded demultiplexing images, including applying a bad pixel mask and smoothing the images with a $3 \times 3$ median filter. For each of the 150 realizations of turbulence (same as the testing split for the CNN-based method), we algorithmically aligned the signals with only a single mode present to their correct conjugate demultiplexing mode. We then found the radius, $r \in [0,50]$ , of a masking circle centered on the alignment previously found, which defined the maximal ratio of energy inside to the energy outside. A threshold was then individually optimized for each turbulence realization so as to minimize the bit-error-ratio (BER) for all combinatorial multiplexing.

C. Training for CNN-Based Demultiplexing Method

To train the CNN-based demultiplexing method, we split the data collected into two separate sets: a training set with 850 different turbulence realizations and a testing set with 150 different turbulence realizations. The training and testing sets are completely independent of one another and do not share any turbulence realizations. The testing set for the CNN-based solution is the same turbulence realizations as those used for the conjugate mode sorting technique.

We did experiment with training the network weights from scratch, but we found slightly better results were obtained by fine-tuning preexisting weights associated with the Imagenet Classification Challenge [40]. In the fine-tuning procedure, the previously trained weights from all but the last layer are transferred to a new network. The final layer is defined to have 32 outputs, one for each unique 5-bit string, initialized as Gaussian random variables. The learning rate for the final layer also contains a learning rate multiplier of 10× that of the other layers. This procedure works well because the weights learned for classification of one type of images serve as a good initialization for the weights needed for the classification of another type of image. Since the lower layers learn the most general weights, we increase the learning rate for the final layer so it can be tuned to the specific task. In Table 3 we define the hyperparameters for training our network; these were chosen to match the literature [34]. Due to the relative ease of the training process, efforts were not made to adjust the training hyperparameters; future studies will examine what effect reducing the number of epochs will have on the accuracy of the trained network.

Table 3. Hyperparameters for CNN Training

View Table | View all tables in this article

The network training was accomplished using the Caffe [41] software package and a Nvidia GeForce GTX TITAN X GPU. Training time took approximately 30 min. Testing times average, with proper batching and preloaded network weights, 1 ms; these times can be further reduced utilizing the specialized hardware as mentioned above.

D. CNN-Based versus Conjugate Mode Sorting

The results for the CNN-based and conjugate mode sorting methods are presented in Tables 4 and 5. In Table 4 we compare the accuracy of the two demultiplexing methods for different levels of turbulence and mode sets. We define accuracy here as the percentage of signals that are demultiplexed correctly. To demultiplex a signal correctly in the CNN-based method means that the CNN correctly classified the image, i.e., the probability assigned to the class corresponding to the correct demultiplexing was greater than that of any other class. In the case of the conjugate mode demultiplexing method, to correctly demultiplex a signal, the decision threshold for each demultiplexing mode must correctly decide if the corresponding mode is present in the signal. The results presented in Table 4 indicate that the CNN-based method easily outperformed the conjugate mode method. The only comparison that was statistically close occurred at the largest mode spacing and lowest turbulence level. Unlike the conjugate mode sorting method, the CNN-based method showed very little accuracy decrease in relation to smaller mode spacing. This indicates the CNN-based method can admit a larger encoding set and can better adapt to large transmission distances where higher mode numbers cause unsatisfactory beam divergence. The CNN-based technique also showed very limited decrease in performance with increasing levels of turbulence (unlike the conjugate mode method). In Table 5, we make a similar comparison but in relation to the BER. BER is defined as the ratio of bits decoded incorrectly to the total number of bits transmitted; 0.0 indicates all bits were decoded correctly, 1.0 indicates all bits were decoded incorrectly, and 0.5 indicates the decoded system performed randomly. In this comparison, we see similar trends between the two methods. We note here that we have provided raw BER and in practice, forward error correcting codes [42], multiple-input and multiple-output (MIMO) [43], spatial diversity [44], adaptive optics [45], or low-density parity check (LDPC)-codes [28], would be used to vastly improve these results.

Table 4. Demultiplexing Accuracy

View Table | View all tables in this article

Table 5. Demultiplexing BER

View Table | View all tables in this article

E. Unknown Level of Turbulence

We can also test our CNN-based technique for unknown levels of atmospheric turbulence. First, we conduct a leave-one-out experiment with the three different levels of turbulence. In Table 6 we train a CNN with only two of the three turbulence levels for each mode set, then measure the accuracy independently for each of the turbulence levels. For example, in the first row of Table 6 we show the results for a CNN trained on mode set 1 (column 1) without turbulence level $D / r_{0} = 5$ (column 2) in the training set: in other words the training set comprised turbulence realizations of level $D / r_{0} = 10$ and $D / r_{0} = 15$ . Still in reference to row 1, column 3 shows the results for testing the trained network with turbulence realizations of level $D / r_{0} = 5$ ; similarly, columns 4 and 5 show the results for testing on $D / r_{0} = 10$ and $D / r_{0} = 15$ , respectively. Even though the network is not directly trained on the missing turbulence level, it is able to develop generalized features from the other turbulence levels that it is trained on. We see a small performance drop in the highest level of turbulence because the network now needs to generalize more; however, we see an improvement in the lower levels of turbulence as the network has additional information (from the other turbulence realizations) to develop richer features. In Table 7 we train a CNN using all three of the turbulence levels and report accuracies for testing on the individual levels and overall. We see similar results here to the other experiments with decreasing performance with increasing turbulence levels, now magnified slightly because of the need to generalize further to the additional turbulence levels.

Table 6. Demultiplexing Accuracy for CNN Trained with Only 2 of the 3 Turbulence Levels

View Table | View all tables in this article

Table 7. Demultiplexing Accuracy for CNN Trained with All 3 Turbulence Levels

View Table | View all tables in this article

The shaded cells represent those turbulence values not included in the training set.

As getting high SNR, well controlled high-resolution data becomes challenging once moving from the away from the laboratory, we will now address in four separate quality studies, the performance of our CNN-based method in relation to: number of pixels, number of training samples, number of photons, and level of sensor noise.

F. Number of Pixels

In this subsection we will look at the number of pixels contained in the recorded image. We developed our experimental collection with the Alexnet architecture in mind; thus, the system was aligned to provide a $256 \times 256$ window centered on the OAM beam intensity pattern. To allow for beam wander caused by turbulence, the beams were centered such that the largest mode had a 50-pixel guard window surrounding it before the addition of turbulence. To test the effect of a reduced number of pixels in the recorded image $I$ we will down-sample, $D_{x}$ , by a factor of $x$ in both spatial dimensions, then up-sample, $U_{x}$ , by the same factor to produce the reduced pixel image $\tilde{I}$ :

\tilde{I} = U_{x} (D_{x} (I)) .

The up- and down-sample operator is computed numerically using bicubic interpolation, and an example of the transformation can be seen in Fig. 10. The results of this experiment are shown in Table 8 for reductions of 4 ( $x = 2$ ), 16 ( $x = 4$ ), and 36 ( $x = 6$ ); for brevity we list only the results from the highest turbulence level. We notice a small decrease in demultiplexing accuracy, but given the large decrease in image quality and the relatively small difference between multiplexing patterns differing by only a bit in their bit-string representations, the CNN-based method proved adept with a limited number of pixels. These experiments were conducted without modifying the network structure or hyperparameters so results should be interpreted as lower bounds; in best practice a completely new architecture should be learned.

Table 8. Demultiplexing Accuracy with Reduced Spatial Image Sizes for $D / r_{0} = 15$

View Table | View all tables in this article

Fig. 10. Example of reduced spatial image size for the multiplexed BGB with $m = {- 7,8, 13}$ and $D / r_{0} = 5$ . From left to right, $256 \times 256,128 \times 128,64 \times 64$ , and $42 \times 42$ . Colormap and crop applied for visualization only.

Download Full Size | PDF

G. Number of Training Samples

As collecting training data can be time consuming and with a limited window to collect similar turbulence levels, we now consider the results obtainable with a smaller collection of provided training data. In this experiment, we limit the training set to sizes of 10, 50, and 100 samples while leaving the testing unchanged; for simplicity, we kept the first $n$ turbulence realization for the desired training set sizes. We show in Table 9 that when moving from 850 samples per multiplexing pattern to 100 samples, the overall accuracy suffers only about two percentage points. Further, reducing the training set to 10 samples, an 85-fold decrease in the number of samples, still produces viable results. For smaller training set sizes, one could use data augmentation schemes such as flips, rotations, crops, jitters, skews, etc., or simulated data [39], of which only the first three were employed in this study to produce better results.

Table 9. Demultiplexing Accuracy with Reduced Training Set Size for $D / r_{0} = 15$

View Table | View all tables in this article

H. Number of Photons

Sensor quality is another aspect of image quality to consider. The focal-plane array (FPA) used to image the incoming signal counts the number of photons striking each cell; due to the physical properties of the FPA or the rate that signals are transmitted, this defines another aspect of the quality of the collected image. To explore this limitation, without relying on recollecting our data, we look at different levels of quantization or bit depth for our collected data. The camera sensor originally recorded the image, $I$ , at 8 bits; we will digitally apply 2 different quantization levels, 7-bit and 6-bit, to the collected data to achieve ${\bar{I}}^{b}$ :

{\bar{I}}^{b} = Q_{b} (I),

where

Q_{b}

is a quantization operator producing a bit depth of

b

; an example of this transformation can be seen in Fig. 11. The CNN defined earlier is then retrained and tested on these different quantization levels; the results for the largest turbulence case are shown in Table 10. We notice that we get a marginal drop-off in demultiplexing accuracy with the 4-fold decrease in the level of quantization. This manageable drop-off in performance indicates that the CNN-based approach can handle increased transmission rates requiring quicker capture times and thus less photons.

Fig. 11. Example of different quantization levels for the multiplexed BGB with $m = {- 7,8, 13}$ and $D / r_{0} = 5$ . From left to right, 8-, 7-, and 6-bit. Colormap and crop applied for visualization only.

Download Full Size | PDF

Table 10. Demultiplexing Accuracy with Increased Levels of Quantization for $D / r_{0} = 15$

View Table | View all tables in this article

I. Level of Sensor Noise

Finally we considered sensor noise. Before adding white Gaussian noise, we measured the average read noise level for the camera for a baseline; we found it to be $σ = 0.21$ . For an image $I$ (note this is 8-bit), let ${\bar{I}}_{x}$ be the image produced by added white Gaussian noise, $N_{x}$ with standard deviation $x$ :

{\bar{I}}_{x} = Q_{8} (I + N_{x}) .

In Fig. 12 we show an example of this added white Gaussian noise and in Table 11 the results for the added noise experiment are shown for $σ = 10,20, 30,40$ . The $σ = 10$ is already a large amount of added sensor noise and the CNN-based method was able to provide demultiplexing accuracies close to that of the added noise-free images. Increasing the sensor noise to extreme levels, i.e., $σ = 40$ , the CNN-based method still achieved results in excess of the noise-free conjugate mode sorting method.

Table 11. Demultiplexing Accuracy with Added Sensor Noise for $D / r_{0} = 15$

View Table | View all tables in this article

Fig. 12. Example of added sensor noise for the multiplexed BGB with $m = {- 7,8, 13}$ and $D / r_{0} = 5$ . From left to right, additive white Gaussian noise=0, 10, 20, 30, and 40. Colormap and crop applied for visualization only.

Download Full Size | PDF

6. CONCLUSIONS

We have shown the validity of our proposed demultiplexing scheme that utilizes machine learning, in the form of a CNN, to distinguish the unique OAM multiplexed signals based on their intensity patterns. We have shown this methodology can easily handle different turbulence levels, outperforming the conjugate mode sorting method, and provided results for different extremes of collected image quality and size. Our developed method has the nice advantage that, given a reasonable amount of training data, the receive side requires only a small pixel-count imager to record the OAM multiplexed signal. This advantage mitigates the cost of deploying such a system as it removes the need for a complicated and expensive optical system (such as a SLM). Our solution will also theoretically be compatible with OAM multiplexing combined with other multiplexing techniques (which we would like to test in additional studies). One such example would be a wavelength and OAM multiplexed signal (as in [25]); in this scenario, a series of wavelength filters could be deployed in front of the imager or a compact multispectral camera can be used. Once the signal has been wavelength demultiplexed, the CNN-based solution would work as presented above.

This paper is only an initial exploration of this technique—further work needs to be done in selecting an optimal network structure for the demultiplexing task and tuning that structure to our particular problem constraints. In this study we choose to use the popular Alexnet architecture but there are other choices, such as VGG19 [46], GoogLeNet (Inception) [47], and Resnet [48], which have surpassed the classification accuracy results originally obtained by Alexnet. Tuning the network, e.g., pruning unnecessary nodes and layers to reduce the number of computations and storage cost for the network weights, is also important for achieving the quickest demultiplexing possible (which directly affects the communication rate). Such network tuning is also vital for developing a deployed solution utilizing neuromorphic chips. Current work is focused on developing network structures that handle multi-label inputs so that the output layer of the network can grow proportional to the size of the encoding mode set. We are also planning on studying the applicability of the CNN-based method to other OAM carrying beams, e.g., Laguerre–Gauss [10], and simulating in the lab greater transmission distances [49].

Funding

U.S. Naval Research Laboratory (NRL).

REFERENCES

1. H. Willebrand and B. S. Ghuman, Free Space Optics: Enabling Optical Connectivity in Today’s Networks (SAMS, 2002).

2. M. Toyoshima, “Trends in satellite communications and the role of optical free-space communications,” J. Opt. Netw. 4, 300–311 (2005). [CrossRef]

3. J. C. Juarez, A. Dwivedi, A. R. Hammons, S. D. Jones, V. Weerackody, and R. A. Nichols, “Free-space optical communications for next-generation military networks,” IEEE Commun. Mag. 44(11), 46–51 (2006). [CrossRef]

4. A. K. Majumdar and J. C. Ricklin, Free-Space Laser Communications: Principles and Advances (Springer, 2010), Vol. 2.

5. J. Wang, J.-Y. Yang, I. M. Fazal, N. Ahmed, Y. Yan, H. Huang, Y. Ren, Y. Yue, S. Dolinar, M. Tur, and A. E. Willner, “Terabit free-space data transmission employing orbital angular momentum multiplexing,” Nat. Photonics 6, 488–496 (2012). [CrossRef]

6. G. Gibson, J. Courtial, M. J. Padgett, M. Vasnetsov, V. Pas’ko, S. M. Barnett, and S. Franke-Arnold, “Free-space information transfer using light beams carrying orbital angular momentum,” Opt. Express 12, 5448–5456 (2004). [CrossRef]

7. B. Guan, R. P. Scott, C. Qin, N. K. Fontaine, T. Su, C. Ferrari, M. Cappuzzo, F. Klemens, B. Keller, M. Earnshaw, and S. J. B. Yoo, “Free-space coherent optical communication with orbital angular, momentum multiplexing/demultiplexing using a hybrid 3D photonic integrated circuit,” Opt. Express 22, 145–156 (2014). [CrossRef]

8. J. Durnin, J. J. Miceli Jr., and J. H. Eberly, “Diffraction-free beams,” Phys. Rev. Lett. 58, 1499–1501 (1987). [CrossRef]

9. F. Gori, G. Guattari, and C. Padovani, “Bessel–Gauss beams,” Opt. Commun. 64, 491–495 (1987). [CrossRef]

10. L. Allen, M. W. Beijersbergen, R. J. C. Spreeuw, and J. P. Woerdman, “Orbital angular momentum of light and the transformation of Laguerre–Gaussian laser modes,” Phys. Rev. A 45, 8185–8189 (1992). [CrossRef]

11. A. Siegman, “Hermite–Gaussian functions of complex argument as optical-beam eigenfunctions,” J. Opt. Soc. Am. 63, 1093–1094 (1973). [CrossRef]

12. M. A. Bandres and J. C. Gutiérrez-Vega, “Ince–Gaussian beams,” Opt. Lett. 29, 144–146 (2004). [CrossRef]

13. J. C. Gutiérrez-Vega, M. D. Iturbe-Castillo, and S. Chávez-Cerda, “Alternative formulation for invariant optical fields: Mathieu beams,” Opt. Lett. 25, 1493–1495 (2000). [CrossRef]

14. J. A. Anguita, M. A. Neifeld, and B. V. Vasic, “Turbulence-induced channel crosstalk in an orbital angular momentum-multiplexed free-space optical link,” Appl. Opt. 47, 2414–2429 (2008). [CrossRef]

15. W. Nelson, J. P. Palastro, C. C. Davis, and P. Sprangle, “Propagation of Bessel and Airy beams through atmospheric turbulence,” J. Opt. Soc. Am. A 31, 603–609 (2014). [CrossRef]

16. T. Doster and A. T. Watnik, “Measuring multiplexed OAM modes with convolutional neural networks,” in Lasers Congress (ASSL, LSC, LAC) (Optical Society of America, 2016), paper LTh3B.2.

17. M. Krenn, J. Handsteiner, M. Fink, R. Fickler, R. Ursin, M. Malik, and A. Zeilinger, “Twisted light transmission over 143 km,” Proc. Natl. Acad. Sci. USA 113, 13648–13653 (2016). [CrossRef]

18. M. Krenn, R. Fickler, M. Fink, J. Handsteiner, M. Malik, T. Scheidl, R. Ursin, and A. Zeilinger, “Communication with spatially modulated light through turbulent air across Vienna,” New J. Phys. 16, 113028 (2014). [CrossRef]

19. L. C. Andrews and R. L. Phillips, Laser Beam Propagation through Random Media (SPIE, 2005), Vol. 52.

20. R. L. Nowack, “A tale of two beams: an elementary overview of Gaussian beams and Bessel beams,” Stud. Geophys. Geod. 56, 355–372 (2012). [CrossRef]

21. N. R. Heckenberg, R. McDuff, C. P. Smith, and A. G. White, “Generation of optical phase singularities by computer-generated holograms,” Opt. Lett. 17, 221–223 (1992). [CrossRef]

22. M. W. Beijersbergen, R. P. C. Coerwinkel, M. Kristensen, and J. P. Woerdman, “Helical-wavefront laser beams produced with a spiral phaseplate,” Opt. Commun. 112, 321–327 (1994). [CrossRef]

23. M. W. Beijersbergen, L. Allen, H. van der Veen, and J. P. Woerdman, “Astigmatic laser mode converters and transfer of orbital angular momentum,” Opt. Commun. 96, 123–132 (1993). [CrossRef]

24. T. Su, R. P. Scott, S. S. Djordjevic, N. K. Fontaine, D. J. Geisler, X. Cai, and S. J. B. Yoo, “Demonstration of free space coherent optical communication using integrated silicon photonic orbital angular momentum devices,” Opt. Express 20, 9396–9402 (2012). [CrossRef]

25. H. Huang, G. Xie, Y. Yan, N. Ahmed, Y. Ren, Y. Yue, D. Rogawski, M. J. Willner, B. I. Erkmen, K. M. Birnbaum, S. J. Dolinar, M. P. J. Lavery, M. J. Padgett, M. Tur, and A. E. Willner, “100 Tbit/s free-space data link enabled by three-dimensional multiplexing of orbital angular momentum, polarization, and wavelength,” Opt. Lett. 39, 197–200 (2014). [CrossRef]

26. A. Mair, A. Vaziri, G. Weihs, and A. Zeilinger, “Entanglement of the orbital angular momentum states of photons,” Nature 412, 313–316 (2001). [CrossRef]

27. A. Forbes, A. Dudley, and M. McLaren, “Creation and detection of optical modes with spatial light modulators,” Adv. Opt. Photon. 8, 200–227 (2016). [CrossRef]

28. I. B. Djordjevic and M. Arabaci, “LDPC-coded orbital angular momentum (OAM) modulation for free-space optical communication,” Opt. Express 18, 24722–24728 (2010). [CrossRef]

29. M. S. Soskin, V. N. Gorshkov, M. V. Vasnetsov, J. T. Malos, and N. R. Heckenberg, “Topological charge and angular momentum of light beams carrying optical vortices,” Phys. Rev. A 56, 4064–4075 (1997). [CrossRef]

30. M. P. J. Lavery, G. C. G. Berkhout, J. Courtial, and M. J. Padgett, “Measurement of the light orbital angular momentum spectrum using an optical geometric transformation,” J. Opt. 13, 064006 (2011). [CrossRef]

31. M. P. J. Lavery, F. C. Speirits, S. M. Barnett, and M. J. Padgett, “Detection of a spinning object using light’s orbital angular momentum,” Science 341, 537–540 (2013). [CrossRef]

32. J. Leach, M. J. Padgett, S. M. Barnett, S. Franke-Arnold, and J. Courtial, “Measuring the orbital angular momentum of a single photon,” Phys. Rev. Lett. 88, 257901 (2002). [CrossRef]

33. D. L. Andrews and M. Babiker, eds., The Angular Momentum of Light (Cambridge University, 2012).

34. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems (2012), pp. 1097–1105.

35. C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing fpga-based accelerator design for deep convolutional neural networks,” in Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (ACM, 2015), pp. 161–170.

36. P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, and D. S. Modha, “A million spiking-neuron integrated circuit with a scalable communication network and interface,” Science 345, 668–673 (2014). [CrossRef]

37. L. C. Andrews, “An analytical model for the refractive index power spectrum and its application to optical scintillations in the atmosphere,” J. Mod. Opt. 39, 1849–1853 (1992). [CrossRef]

38. R. G. Lane, A. Glindemann, and J. C. Dainty, “Simulation of a Kolmogorov phase screen,” Waves Random Media 2, 209–224 (1992). [CrossRef]

39. T. Doster and A. T. Watnik, “Laguerre–Gauss and Bessel–Gauss beams propagation through turbulence: analysis of channel efficiency,” Appl. Opt. 55, 10239–10246 (2016). [CrossRef]

40. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis. 115, 211–252 (2015). [CrossRef]

41. Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: convolutional architecture for fast feature embedding,” in Proceedings of the 22nd ACM International Conference on Multimedia (ACM, 2014), pp. 675–678.

42. W. C. Huffman and V. Pless, Fundamentals of Error-Correcting Codes (Cambridge University, 2010).

43. D. Gesbert, M. Shafi, D. Shiu, P. J. Smith, and A. Naguib, “From theory to practice: an overview of MIMO space-time coded wireless systems,” IEEE J. Sel. Areas Commun. 21, 281–302 (2003). [CrossRef]

44. Y. Ren, Z. Wang, G. Xie, L. Li, A. J. Willner, Y. Cao, Z. Zhao, Y. Yan, N. Ahmed, N. Ashrafi, S. Ashrafi, R. Bock, M. Tur, and A. E. Willner, “Atmospheric turbulence mitigation in an OAM-based MIMO free-space optical link using spatial diversity combined with MIMO equalization,” Opt. Lett. 41, 2406–2409 (2016). [CrossRef]

45. Y. Ren, G. Xie, H. Huang, C. Bao, Y. Yan, N. Ahmed, M. P. Lavery, B. I. Erkmen, S. Dolinar, M. Tur, M. A. Neifeld, M. J. Padgett, R. W. Boyd, J. H. Shapiro, and A. E. Willner, “Adaptive optics compensation of multiple orbital angular momentum beams propagating through emulated atmospheric turbulence,” Opt. Lett. 39, 2845–2848 (2014). [CrossRef]

46. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556 (2014).

47. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” arXiv:1409.4842 (2014).

48. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv:1512.03385 (2015).

49. B. Rodenburg, M. Mirhosseini, M. Malik, O. S. Magaña-Loaiza, M. Yanakas, L. Maher, N. K. Steinhoff, G. A. Tyler, and R. W. Boyd, “Simulating thick atmospheric turbulence in the lab with application to orbital angular momentum communication,” New J. Phys. 16, 033020 (2014). [CrossRef]

Layer Name	Num. Filters	Filter Size	Filter Stride	Activation Function
Conv1	96	$11 \times 11$	4	MP( $2 \times 2$ ) + ReLU
Conv2	256	$5 \times 5$	1	MP( $2 \times 2$ ) + ReLU
Conv3	384	$3 \times 3$	1	ReLU
Conv4	384	$3 \times 3$	1	ReLU
Conv5	256	$3 \times 3$	1	MP( $2 \times 2$ ) + ReLU
FC6	4096	—	—	ReLU + Dropout
FC7	4096	—	—	ReLU + Dropout
FC8	32	—	—	—

Set #	Modes
1	{−3, −1, 2, 4, 6}
2	{−4, −1, 2, 5, 8}
3	{−7, −2, 3, 8, 13}

Parameter	Value
Iterations	4000
Step size	1000
Batch size	256
Base learning rate	0.001
Gamma	0.1
Momentum	0.9
Weight decay	0.0005

	$D / r_{0} = 5$		$D / r_{0} = 10$		$D / r_{0} = 15$
Set	Conj.	CNN	Conj.	CNN	Conj.	CNN
1	80.15	99.92	61.29	99.56	35.25	99.48
2	76.88	99.94	63.27	99.69	38.79	99.43
3	98.23	99.90	91.25	99.63	62.33	99.30

	$D / r_{0} = 5$		$D / r_{0} = 10$		$D / r_{0} = 15$
Set	Conj.	CNN	Conj.	CNN	Conj.	CNN
1	.04513	.00004	.10470	.00200	.22238	.00258
2	.05208	.00025	.09337	.00125	.18929	.00489
3	.00363	.00042	.01900	.00150	.09554	.00183

Machine learning approach to OAM beam demultiplexing via convolutional neural networks

Abstract

1. INTRODUCTION

2. ORBITAL ANGULAR MOMENTUM

3. DETECTING OAM MODES

A. Conjugate Mode Sorting

B. CNN-Based Mode Sorting

4. LABORATORY EXPERIMENT

A. Equipment and Setup

B. Simulating Turbulence

C. Creating the Encoding Hologram

5. RESULTS

A. Collection Procedures

B. Training for Conjugate Mode Sorting

C. Training for CNN-Based Demultiplexing Method

D. CNN-Based versus Conjugate Mode Sorting

E. Unknown Level of Turbulence

F. Number of Pixels

G. Number of Training Samples

H. Number of Photons

I. Level of Sensor Noise

6. CONCLUSIONS

Funding

REFERENCES

Cited By

Figures (12)

Tables (11)

Equations (18)

Applied Optics

Set	Missing	$D / r_{0} = 5$	$D / r_{0} = 10$	$D / r_{0} = 15$
1	5	100.00	99.96	99.15
	10	100.00	99.90	98.83
	15	100.00	99.59	96.25
2	5	100.00	99.85	98.56
	10	100.00	99.90	98.54
	15	100.00	99.85	98.56
3	5	99.98	99.77	98.48
	10	100.00	99.73	98.08
	15	99.98	99.77	98.48

	Number of Pixels
Set	$256^{2}$	$128^{2}$	$64^{2}$	$42^{2}$
1	99.48	99.29	97.69	96.58
2	99.43	98.04	97.28	95.01
3	99.30	97.56	97.02	94.67

	Number of Samples
Set	850	100	50	10
1	99.48	98.00	97.75	90.08
2	99.43	97.41	96.69	89.00
3	99.30	97.00	96.40	88.60

Set	8-bit	7-bit	6-bit
1	99.48	99.36	99.40
2	99.43	99.17	98.71
3	99.30	98.84	98.58

Set	$σ = 0$	$σ = 10$	$σ = 20$	$σ = 30$	$σ = 40$
1	99.48	97.23	93.02	84.52	75.02
2	99.42	95.98	89.75	81.44	76.30
3	99.30	95.46	90.5	85.42	79.16