Using a deep learning algorithm in image-based wavefront sensing: determining the optimum number of Zernike terms

Jafar Bakhtiar Shohani; Morteza Hajimahmoodzadeh; Morteza Hajimahmoodzadeh; Hamidreza Fallah; Hamidreza Fallah

doi:10.1364/OPTCON.485330

1. Introduction

Adaptive optics (AO) is a prevalent technology to correct the aberrations of distorted wavefronts caused by atmospheric turbulence. The initial methods of AO were applied using wavefront sensors including Shack-Hartmann, pyramid, wavefront curvature, etc. However, these sensors have limitations that reduce the performance of AO systems. As an example, Gratadour surveyed the limitations of the Shack-Hartmann, which is one of the most commonly used wavefront sensors [1]. The main part of a closed-loop AO system is the control unit that receives the distorted wavefront and adjusts the phase corrector in order to reach the ideal ones. Some methods have been proposed to use optimization algorithms in the control unit of AO systems. For instance, Vorontsov et al. utilized theoretically and experimentally the generalized gradient descent optimization to reach a real-time wavefront compensation [2]. Likewise, he introduced a new technique to obtain a fast convergence rate in a sensor-based AO using the decoupled stochastic parallel gradient descent algorithm. He claimed that the compensation could be done in a few iterations [3]. Yazdani et al. employed the imperialist competitive algorithm (ICA) in close-loop adaptive optics numerically, with the Strehl ratio as the cost function. They compared the convergence speed of some other optimization algorithms by the ICA [4]. As another example of using optimization algorithms in propagating and focusing light through turbulent media, Fayyaz et al. studied the application of simulated annealing for focusing light through a turbid medium [5]. Different works have also been conducted in wavefront sensing using different methods such as random mask [6,7], spatial light modulator and random sampling [8,9], phase correction of orbital angular momentum mode [10,11], and adaptive optics for wireless underwater communications [12,13].

In the recent decade, sensor-less AO has played an important role in many fields of optical engineering including medical imaging [14–18], microscopy [19–23], beam shaping [24], optical communication [25], etc. It works by directly estimating wavefront coming to the imaging system without any need to use the wavefront sensors. Therefore, only the observed images are sufficient. Nevertheless, this technique still works by iteration-based algorithms. Recently, wavefront compensation has been investigated using artificial neural networks and machine learning methods. In these methods, a large number of image data are given to a network. After extracting features and performing learning steps, which are elaborated in the following section, an intelligent network could be built that has the ability of prediction. Such a network can provide the Zernike coefficients which are the expected output. The prediction task for new images which are not included in the training dataset is usually a real-time process that is an advantage compared to iteration-based wavefront sensing. The appropriate choice for this purpose is convolutional neural network, since the network is dealing with images with some features. For sensor-less wavefront sensing, a convolutional neural network (CNN) is capable of extracting useful features during the learning process. The non-iteration-based techniques are used in AO literature in different configurations [26–30].

In this paper, a new configuration of CNN is proposed to predict the Zernike coefficients in image-based wavefront sensing. Also, by comparing the different scenarios of considering the different number of Zernike terms, the optimum number of terms is obtained and reported.

2. Theory

2.1. Wavefront generation

In AO, one of the most suitable options to describe optical wavefront is Zernike expansion. Each term in this expansion represents an aberration of a wavefront and the strength of each aberration is determined by the corresponding coefficient. Generally, a wavefront can be expanded using Zernike polynomials ${Z_i}(r,\theta )$ as below:

(1)$$W(r,\theta ) = \sum\limits_j {{a_j}{Z_j}(r,\theta )} ,$$

As light passes through the turbulent atmosphere, the wavefront changes in time and space according to the random changes in refractive index in time and space. Some models have been proposed to simulate this effect. One of the most well-known models is modified Von-Karman, whose power spectrum equation is defined as [31]:

(2)$$\Phi \,(k) = 0.033\,C_n^2\,{({k^2} + k_0^2)^{ - 11/6}}\exp ( - {k^2}/k_m^2)\,,\,\,\,\,\,0 \le k < \infty ,$$

where $C_n^2$ is the refractive index structure function, $k = \,\,|2\pi ({f_x}\hat{i} + {f_y}\hat{j})|$, ${f_x}$and ${f_y}$ are the discrete spatial frequencies in x and y directions, respectively. ${k_m} = 5.92/{l_0}$, ${k_0} = 2\pi /{L_0}$, and ${L_0}$, ${l_0}$ are the outer and inner scale of the turbulence.

To simulate the optical atmospheric turbulence, the first step is generating phase screens. This could be performed by using the Fourier transform as:

(3)$$\varphi (x,y) = \sum\limits_n {\sum\limits_m {{c_{n,m}}\exp \,[\,2\pi i\,(x\,{f_{{x_n}}} + y\,{f_{{y_m}}})\,]} } ,$$

in which, ${c_{n,m}}$ are complex numbers. Welsh claimed that the ${c_{n,m}}$ have circular gaussian statistics where the mean of both imaginary and real parts is zero and their variance is proportional to the power spectrum of turbulence as [32]:

(4)$$\left\langle {|{c_{n,m}}{|^2}} \right\rangle = \frac{{\Phi \,({f_{{x_n}}},{f_{{y_m}}})}}{{{L_x}{L_y}}}\,,$$

In this equation, $\Phi \,({f_{{x_n}}},{f_{{y_m}}})$ represents the power spectral density, and ${L_x}$, ${L_y}$ are grid sizes in the x and y directions, respectively. It should be noted that the low order modes (such as tip-tilt) could not be realized by this approach accurately. This issue can be solved by using the subharmonic method. The phase produced by this method could be expressed by [33]:

(5)$${\varphi _{LF}}(x,y) = \sum\limits_{p = 1}^N {\sum\limits_n^1 {\sum\limits_m^1 {{c_{n,m}}\,\exp \,[\,2\pi i\,\,(x{f_{p,{x_n}}} + y{f_{p,{y_m}}})\,],} } }$$

where the value of p corresponds to different grid points, and “LF” represents low frequency. This phase is added to the phase in Eq. (3) to cover the low spatial frequencies.

2.2. CNN

CNN revolutionized many areas of computer vision problems such as object recognition, tracking, face recognition, etc. A typical CNN consists of different types of layers which some of them are used for feature extraction and some of them are called dense layers which comprise a number of neurons connected to each other in some layers. Each neuron receives a set of input in which inner multiplication is performed between input and weights. The result is added to a bias term. Subsequently, it is passed through an activation function (f) yielding an output. This process is shown in Fig. (1).

Fig. 1. A neuron with some input and a calculation to get output.

Download Full Size | PDF

The connection of these layers can build an artificial neural network (ANN). Nevertheless, ANN cannot work for 2D arrays like images, efficiently. Each pixel value in images is considered as an input for the neural network which composes a huge set of inputs in total. These inputs require numerous neuron connections. This could cause many challenges in the learning process and reduce the performance of the network. Therefore, dimension reduction is an important task to do in dealing with images. CNN configuration can afford dimension reduction while it maintains the features of images. These networks are usually designed to work for more than one-dimensional arrays. Contrary to multi-layer perceptron (MLP) networks, the CNN does not change the input structure to a column, therefore maintaining the connection of neighboring pixels is necessary. This is mainly important for the feature extraction operation in CNN. The key part of the CNN is the convolution layer which accounts for the majority of calculations. Each convolution layer contains some filters, and the outputs (feature maps) are the results of the convolution between the filters and the input images. Each filter can search for a specific pattern in an image. In the network training process, we are looking for appropriate filters to extract meaningful patterns from each image. Therefore, the weights of filters are called trainable parameters during the learning process. A regular CNN consists of some feature extraction steps where the outputs are passed to some dense layers. In the convolution process, a filter slides across an image. The outcome of the pixel-wise multiplication of filter elements and pixel values is considered the new pixel value. The general equation of two-dimensional convolution is defined as:

(6)$$(I \ast f)(x,y) = \sum\limits_m {\sum\limits_n {I(x - m,y - n)\,f(m,n),} }$$

where I and f represent the image and filter, respectively. In addition to convolution, there are some pooling layers in CNN. In general, there are two types of maximum and average pooling processes. The concept behind a pooling layer is related to reducing the dimension of data. Indeed, maintaining the features in images and also reducing the size of data can empower the CNN to avoid overfitting. LeCun introduced the CNN algorithm for the first time [34].

Assessment of training a machine learning algorithm is accomplished by using an optimizer and a cost function. In this paper, adaptive moment estimation (Adam) and mean square error (MSE) are used as optimizer and cost function, respectively. Adam is a well-known optimizer that is an extension of the stochastic gradient descent algorithm [35]. Learning rate is a hyper-parameter in the optimization algorithm in which Adam has an adaptive learning rate.

One of the most common cost functions for regression-based machine learning problems is Mean Square Error (MSE) which is defined as:

(7)$$MSE = \frac{1}{n}\,\sum\limits_{i = 1}^n {({y_i} - {{\hat{y}}_i}} {)^2},$$

where n is the number of samples, and ${y_i}$, ${\hat{y}_i}$ are the actual and predicted values.

There are other machine learning algorithms such as long-short term memory (LSTM) [36], k-nearest neighbor (KNN) [37], recurrent neural network (RNN) [38], etc. that could be applied in various areas and problems. LSTM and RNN usually investigate the temporal evolution of data and KNN is usually applied to classification problems. Hence, they are not very effective for the purpose of this work. For wavefront sensing, we are dealing with images which contain a set of features. Hence, the best option is applying a method that can extract features. Among the different machine learning algorithms, CNN is the most effective option for sensor-less wavefront, because it works based on feature extraction by convolution and pooling operations.

3. Results

3.1. Data generation

In this paper, recovering the phases which are affected by the optical atmospheric turbulence is reported. To this purpose, phase screens are generated according to the modified Von-Karman atmospheric model which was explained in the previous section. Our dataset includes 18000 focused and defocused images generated to feed the CNN. The first 77 Zernike coefficients, except for the piston (n = 0), are considered as the labels for each pair of images. 80% of the generated data is dedicated to the training and 20% is considered for the test data. Moreover, 20% of the training data is designated to the validation data. The input data to the CNN network must be generated in such a way that the most suitable weights are obtained during the learning process. To reach this aim, the parameters of the CNN should be chosen appropriately. Although, there is no unique configuration to acquire the best deep learning model, we examined some trials to reach a high accuracy and generalized network. In fact, the importance of this task is that the network should give us the most appropriate answers after learning process for the test data. For this purpose, we considered variable parameters in the data generation procedure. These parameters are given in Table (1). Moreover, to build an accurate network, the amount and variety of data is important. In image-based wavefront sensing, we deal with intensity patterns. It should be emphasized that one intensity pattern in focal plane cannot guarantee the uniqueness of phase reaching to the imaging system. Therefore, we considered pairs of images, focused and defocused, as the input images to the CNN. This is shown in Fig. (2).

Fig. 2. Acquisition a pair of focused and defocused images for $C_n^2 = 9 \times {10^{ - 14}}\,{m^{ - 2/3}}$.

Download Full Size | PDF

Table 1. Parameters in the simulation

View Table | View all tables in this article

The primary step to acquire images to feed the CNN is generating many turbulent wavefronts. This is done according to the explanations in the wavefront generation section. To simulate the turbulent images, point spread function (PSF) can map these wavefronts to intensity images by [39]:

(8)$$PSF = {|{h(x,y)} |^2},$$

where $h(x,y)$ represents the impulse response which is defined by:

(9)$$h(x,y) = FT[A(x^{\prime},y^{\prime})\exp \{ i\varphi (x^{\prime},y^{\prime})\} ],$$

in which FT represents the Fourier transform, $\varphi (x^{\prime},y^{\prime})$ is the turbulent wavefront, and $A(x^{\prime},y^{\prime})$ shows the aperture function. Therefore, the convolution of PSF and an aberration-free image can generate the turbulent intensity images as:

(10)$${I_2} = {I_1} \otimes PSF,$$

where ${\otimes} $ indicates the convolution operation, and ${I_1}$ and ${I_2}$ represent the aberration-free and aberrated images at the focus, respectively. So far, one category of the input data for the CNN model could be generated. On the other hand, because of the phase diversity of turbulent wavefronts, the uniqueness of distorted phases for in-focus images cannot be promised. Thus, another category of images is needed. When a simulated wavefront with its specific Zernike terms and its corresponding in-focus image are available, the out of focus image could be simulated by adding the defocus term of the Zernike polynomial which is defined as $\,{Z_4} = \sqrt 3 (2{\rho ^2} - 1)$, to the simulated wavefront of the in-focus situation. In the simulation, it is equivalent to shift the focal plane as shown in Fig. (2). In this work, the defocus length equal to $\lambda /4$ is chosen. Therefore, we could generate thousands of two categories of images as input data to the CNN.

It should be emphasized that the Zernike coefficients corresponding to any pair of in-focus and out of focus images are needed in our work to feed the CNN model as output. On the other hand, the Zernike coefficients of any arbitrary wavefront which can be generated by any power spectrum model can be obtained. In continuous space, this can be done using the following equation:

(11)$${a_j} = \frac{{\int\limits_0^{2\pi } {\int\limits_0^1 {W(r,\theta )\,{Z_j}(r,\theta )\,r\,dr\,d\theta } } }}{{\int\limits_0^{2\pi } {\int\limits_0^1 {Z_j^2(r,\theta )\,r\,dr\,d\theta } } }}.\,\,$$

In discrete space, this could be done by using the Moore-Penrose method. To do this, we first convert Zernike polynomials up to a desired number of terms in column forms and put them together to form the Z matrix as [40]:

(12)$$Z = [{Z_1}\,\,{Z_2}\,\,\ldots \,\,{Z_n}].$$

To extract the Zernike coefficients related to any wavefront, the matrix multiplication can be used as:

(13)$$A = {({Z^T}Z)^{ - 1}}{Z^T}W,$$

where A is a one dimensional array of Zernike coefficients, ${Z^T}$ is the transpose of Z, and W is the wavefront.

3.2. Deep learning model

In color images with three RGB channels, there are usually three input channels for a typical CNN network. In our work, the images are generated in gray scale. The dimensions of the input images to the network are 128 × 128. In the first step, the images are entered into a convolution layer in which the size of each convolution filter is 11 × 11. In this layer, 64 filters are defined whose weights are updated during the learning process. The outputs of the convolution process are also called feature maps, because the features are extracted based on the filter weights. As an example, 64 feature maps corresponding to the 64 filters in first layer are shown in Fig. (3). These feature maps and filter weights are achieved after 100 epochs of learning. The extraction of large-scale or small-scale features is in association with the dimension of the convolution filters. Therefore, we considered five convolution layers with different size of filters. After passing the first layer, a max pooling layer is applied to reduce the size of the images. The output is given to the next layer which is a batch normalization layer. This layer normalizes the output of the previous layer to help the algorithm to overcome the overfitting. The output of the batch normalization is defined as:

(14)$${y_j} = \alpha \left( {\frac{{{x_j} - {x_m}}}{{\sqrt {\sigma_m^2} }}} \right) + \beta \,,$$

in which, $\alpha$ and $\beta$ are two trainable parameters that are updated during the learning process, and ${x_m}$ and $\sigma _m^2$ are mean and variance of the input to the j^th layer, respectively. The details of the CNN steps in our work is presented according to the Table (2) and the configured CNN is schematically shown in the Fig. (3).

Fig. 3. Steps of CNN including feature extraction, dense layers, and an output layer. The resulted weight maps for the first convolution layer containing 64 filters, and the feature maps related to this layer (after finishing the learning process) are shown.

Download Full Size | PDF

Table 2. Layers and parameters of the CNN configuration in this work

View Table | View all tables in this article

In Fig. (4) it can be seen that the value of cost function for the training data is decreasing continuously. Testing the weights for every epoch is done on the validation data. Although some small fluctuations are observed during updating the trainable parameters, it is decreasing as well. Generally, small fluctuations of validation can usually occur because of some reasons including inappropriate learning rate, not adjusted batch size, overfitting, etc. During learning process, the network experiences new validation data which may cause this kind of fluctuations. However, it is important that the trend of validation loss has a downward trend and fluctuations get smaller in new epochs which can be seen in Fig. (4).

Fig. 4. Training and validation loss function for 100 epochs.

Download Full Size | PDF

In supervised learning algorithms like CNN, two general sets of input and output data are given to the network. The images of the focused and defocused intensities are the 2D input data and up to 200 derived corresponding Zernike coefficients are considered as the output data (also called labels). Therefore, the cost function is optimized during the learning process based on the pairs of images and Zernike coefficients. After learning process, the trainable parameters including the filter weights, batch normalization parameters and weights of dense layers are determined. To evaluate the designed network the test data are used. It must be emphasized that the test data are not included in the training procedure. Figures 5(a) and 5(b) represent two examples of in-focus and out of focus intensity images chosen from the test dataset. These two images are given to the trained network to assess the network. The results of correcting these two images can be seen in Fig. 5(c) and 5(d).

Fig. 5. An example of the test data: (a) in-focus, and (b) out of focus images before correction. (c) in-focus, and (d) out of focus images after correction by the proposed CNN model.

Download Full Size | PDF

The valuable property of this network is the processing time to obtain Zernike coefficients which is a few milliseconds. It is straightforward to reconstruct the distorted wavefront, and the phase corrector unit in AO can be controlled to compensate for this distorted wavefront. In this work, implementation of learning process was carried out by using tensorflow package in Python 3.9 programming language with a Core-i7 CPU at 3.1 GHz and 32 GB DDR4 RAM on a personal computer. The training process for 100 epochs took 9 hours and it should be emphasized that it is done only one time. After training, the weights of the network can be saved and could be applied to any pair of turbulence-affected in-focus and out of focus images to extract the corresponding Zernike coefficients. Moreover, it should be said that the mean extraction time in our work was ∼40 ms for any pair of images. In comparison, the execution of iteration-based methods takes longer time. For instance, Wu et al. studied the genetic algorithm for wavefront compensation [41], and asserted that the execution time for the compensation is 7s approximately. As a similar work to that of ours method, Ma et al. investigated the application of CNN as a deep learning algorithm in sensor-less adaptive optics compensation numerically [30]. They could predicted the Zernike coefficients in 0.16 seconds for different strengths of turbulence based on images-based wavefront sensing up to 35 terms.

The corrected intensity patterns for both focused and defocused ones are shown in Figs. 5(c)–5(d). Also, the actual and predicted wavefronts corresponding to these intensities are presented in Fig. (6). The difference between these wavefronts, i.e. the remaining phase, is shown in Fig. (6). Moreover, the comparison of 77 Zernike coefficients, i.e. up to the order of n = 13 without piston term, for actual and predicted wavefronts is given in Fig. (7).

Fig. 6. The actual and predicted wavefronts related to the intensities in Fig. (5).

Download Full Size | PDF

Fig. 7. The actual and predicted 77 Zernike coefficients related to the intensities in Fig. (5).

Download Full Size | PDF

Wavefront error assessment can be used to check the accuracy of the CNN for all of the test data. This error is expressed by:

(15)$$\sigma _W^2 = \sum\limits_{j = 1}^n {a_j^2} \,,$$

where ${a_j}$ represents the j^th Zernike coefficient of the actual wavefront and $\sigma _W^2$ is the wavefront error. This could be calculated for the remaining wavefront after correction by:

(16)$$\sigma _{\Delta W}^2 = \sum\limits_{j = 1}^n {{{({a_j} - {{\hat{a}}_j})}^2},}$$

in which ${\hat{a}_j}$ represents the predicted Zernike coefficients and the ${\sigma _{\varDelta W}}$ shows the Zernike coefficient error which represents the deviation of remaining wavefront. If all of the deviations in a wavefront are ideally corrected, the wavefront error will be zero. However, some deviations remain after correction, hence the variance of the wavefront is non-zero practically. It is customary to write wavefront error as [42]:

(17)$${\sigma _\varphi } = \frac{{2\pi }}{\lambda }{\sigma _W}\,.$$

This error can be seen in Fig. (8), where the wavefront error is calculated for different turbulence strengths ($C_n^2$). The trend of these data points shows that the wavefronts are well corrected and it could be said that a robust network is designed.

Fig. 8. Before and after correction wavefront error for different turbulence strengths ($C^{2}_{n}$) for all of the test images in dataset.

Download Full Size | PDF

The Zernike polynomials in the usual form are denoted by two indices (n, m) as $Z_n^m(r,\theta )$, in which n is a non-negative integer number and m is defined as:

(18)$$m ={-} n, - n + 2,\ldots ,n,\,\,\textrm{where}\,\,n = 0,1,2,\ldots ,$$

The Zernike polynomials can also be expressed by one-indexed ${Z_j}(r,\theta )$ in which:

(19)$$j = \frac{{n(n + 2) + m}}{2}\,.$$

We tried to use an appropriate number of terms, so we can reconstruct wavefront with good accuracy. In this regard, we decided to choose n={5, 6, …, 13}, which in turn will result the maximum value for j={20, 27, 35, 44, 54, 65, 77, 90, 104}. These indices are given in Table (3). Using the proposed deep learning method, for the wavefront shown in Fig. (9), the reconstructed wavefronts are derived and shown in Fig. (10).

Fig. 9. A wavefront chosen from the test data in the dataset.

Download Full Size | PDF

Fig. 10. Prediction of the wavefront of Fig. (9).

Download Full Size | PDF

Table 3. Relationship between the indices of Zernike terms

View Table | View all tables in this article

As can be seen in Fig. (10), by comparing the predictions and their error values the prediction based on 20 terms is not reasonable and more terms can be considered. This process has progressed to the first 104 sentences. To answer the question: “how many terms should be considered in the procedure of prediction by the proposed deep learning model?”, evaluation of a criterion can be taken into account. In this work, the mean deviation of the residual wavefronts is reported as:

(20)$$\Delta {W_{mean}} = \,\frac{{\sqrt {{{\sum\limits_i {\sum\limits_j {({W_{i,j}^{\,\textrm{(ac)}} - W_{i,j}^{\,\textrm{(pr)}}} )} } }^2}} }}{n},$$

in which ${W^{\,\textrm{(ac)}}}$ and ${W^{\,\textrm{(pr)}}}$ are the actual and predicted wavefront matrices, respectively, and n is the number of matrix elements inside the phase mask ($r < 1$). Therefore the average residual wavefronts is calculated for different number of Zernike terms for all of the test data which is shown in Fig. (11) and presented in Table (4).

As can be seen in Fig. (11), and Table (4), the difference between the wave front deviation for the 65 and 77 terms is small. Considering 77 Zernike terms can cover more elements of the phase correction part in AO to correct the wavefront and provide the ability to better compensate the deviation, but according to the figure, by choosing 65 Zernike terms, the results of the proposed CNN is more stable, and the length of the error-bar for 65 is almost the same as 77.

Fig. 11. Average wavefront deviation and the error-bar arrows for different number of Zernike terms and for all of the test data, calculated based on the Eq. (20).

Download Full Size | PDF

Table 4. The average value of wavefront deviation for different terms

View Table | View all tables in this article

It should be mentioned that in AO, the number of corrected modes should be sufficient. As mentioned, in this work we considered a range of 20 to 104 modes. The proposed CNN showed that it is potent to consider more than 20 terms. On the other hand, considering too much Zernike terms increases the complexity of the deep learning algorithm and consequently, decreases its performance. Therefore, there is an optimum number of terms. We considered this range of terms and observed that the wavefront deviation increases after a certain number of terms that can be observed in Fig. (11).

4. Discussion

In this work, the application of machine learning to estimate the turbulent wavefront is investigated. In the past years, iterative algorithms were applied to this aim. But the challenge of real-time estimation, in addition to the wavefront sensing precision, remained a problem. Recently, methods based on artificial intelligence have been used to overcome this challenge. In this work, distorted wavefronts impacted by atmospheric turbulence were produced numerically. The modified Von-Karman power spectrum model and also the Fourier transform have been utilized to generate such wavefronts. Our dataset is comprised of 18000 images, where the parameters including atmospheric turbulence strength and inner and outer scale of turbulence were chosen variables during data generation. This could increase the potential of the CNN network to be as much generalized as possible. In the machine learning part of this work, a CNN model is designed and its parameters were adjusted. Intensity patterns from generated phase screens are considered as input data and Zernike coefficients related to each pair of images are considered as output data. After performing the learning steps, a network is built that can estimate the wavefronts instantaneously (about 40 ms). Also, all of the parameters related to the CNN model, including the number of layers, the size of the convolution filters, the size of the pooling layers, the batch normalization layers, and the number of neurons in the dense layers were adjusted. Through these steps, an efficient network is built to predict different Zernike terms for each pair of images with a high degree of accuracy. Also, the wavefront error values for 20, 27, 35, 44, 54, 65, 77, 90, and 104 number of terms were calculated and reported. Moreover, the wavefront deviation values before and after wavefront correction for 65 terms and all of the test data are reported.

Disclosures

The authors declare no conflict of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. D. Gratadour, E Gendron, and G. Rousset, “Intrinsic limitations of Shack–Hartmann wavefront sensing on an extended laser guide source,” J. Opt. Soc. Am. A 27(11), A171 (2010). [CrossRef]

2. MA Vorontsov, GW Carhart, M Cohen, and G. Cauwenberghs, “Adaptive optics based on analog parallel stochastic optimization: analysis and experimental demonstration,” J. Opt. Soc. Am. A 17(8), 1440 (2000). [CrossRef]

3. MA. Vorontsov, “Decoupled stochastic parallel gradient descent optimization for adaptive optics: integrated approach for wave-front sensor information fusion,” J. Opt. Soc. Am. A 19(2), 356 (2010). [CrossRef]

4. R Yazdani, M Hajimahmoodzadeh, and HR. Fallah, “Adaptive phase aberration correction based on imperialist competitive algorithm,” Appl. Opt. 53(1), 132 (2014). [CrossRef]

5. Z. Fayyaz, N. Mohammadian, F. Salimi, A. Fatima, M.R.R. Tabar, and M.R Avanaki, “Simulated annealing optimization in wavefront shaping controlled transmission,” Appl. Opt. 57(21), 6233–6242 (2018). [CrossRef]

6. A. Anand, G. Pedrini, W. Osten, and P. Almoro, “Wavefront sensing with random amplitude mask and phase retrieval,” Opt. Lett. 32(11), 1584–1586 (2007). [CrossRef]

7. P.F. Almoro and S.G. Hanson, “Random phase plate for wavefront sensing via phase retrieval and a volume speckle field,” Appl. Opt. 47(16), 2979–2987 (2008). [CrossRef]

8. B.Y. Wang, L. Han, Y. Yang, Q.Y. Yue, and C.S. Guo, “Wavefront sensing based on a spatial light modulator and incremental binary random sampling,” Opt. Lett. 42(3), 603–606 (2017). [CrossRef]

9. Z. Wang, G.X. Wei, X.L. Ge, H.Q. Liu, and B.Y. Wang, “High-resolution quantitative phase imaging based on a spatial light modulator and incremental binary random sampling,” Appl. Opt. 59(20), 6148–6154 (2020). [CrossRef]

10. G. Xie, Y. Ren, H. Huang, M.P. Lavery, N. Ahmed, Y. Yan, C. Bao, L. Li, Z. Zhao, Y. Cao, and M. Willner, “Phase correction for a distorted orbital angular momentum beam using a Zernike polynomials-based stochastic-parallel-gradient-descent algorithm,” Opt. Lett. 40(7), 1197–1200 (2015). [CrossRef]

11. C. Lu, Q. Tian, X. Xin, B. Liu, Q. Zhang, Y. Wang, F. Tian, L. Yang, and R. Gao, “Jointly recognizing OAM mode and compensating wavefront distortion using one convolutional neural network,” Opt. Express 28(25), 37936–37945 (2020). [CrossRef]

12. L. Zhu, X. Xin, H. Chang, X. Wang, Q. Tian, Q. Zhang, R. Gao, and B. Liu, “Security enhancement for adaptive optics aided longitudinal orbital angular momentum multiplexed underwater wireless communications,” Opt. Express 30(6), 9745–9772 (2022). [CrossRef]

13. L. Zhu, H. Yao, H. Chang, Q. Tian, Q. Zhang, X. Xin, and F.R. Yu, “Adaptive Optics for Orbital Angular Momentum-Based Internet of Underwater Things Applications,” IEEE Internet Things J. 9(23), 24281–24299 (2022). [CrossRef]

14. J Antonello, X Hao, ES Allgeyer, J Bewersdorf, J Rittscher, and MJ. Booth, “Sensorless adaptive optics for isoSTED nanoscopy,” In Adaptive Optics and Wavefront Control for Biological Systems IV2018 Feb 23 (Vol. 10502, pp. 11–16). SPIE.

15. DJ Wahl, R Ng, MJ Ju, Y Jian, and MV. Sarunic, “Sensorless adaptive optics multimodal en-face small animal retinal imaging,” Biomed. Opt. Express 10(1), 252 (2019). [CrossRef]

16. A Camino, P Zang, A Athwal, S Ni, Y Jia, D Huang, and Y. Jian, “Sensorless adaptive-optics optical coherence tomographic angiography,” Biomed. Opt. Express 11(7), 3952 (2020). [CrossRef]

17. X Wei, TT Hormel, S Pi, B Wang, and Y. Jia, “Rodent swept-source wide-field sensorless adaptive optics OCTA,” In Optical Coherence Tomography and Coherence Domain Optical Methods in Biomedicine XXVI2022 Mar 7 (p. PC1194805). SPIE.

18. D Borycki, K Liżewski, S Tomczewski, E Auksorius, P Węgrzyn, and M. Wojtkowski, “Sensorless adaptive optics and angiography in spatiotemporal optical coherence (STOC) retinal imaging,” In Ophthalmic Technologies XXXI2021 Mar 5 (Vol. 11623, p. 116230F). SPIE. [CrossRef]

19. RR Iyer, JE Sorrells, L Yang, EJ Chaney, DR Spillman, BE Tibble, CA Renteria, H Tu, M Žurauskas, M Marjanovic, and SA. Boppart, “Label-free metabolic and structural profiling of dynamic biological samples using multimodal optical microscopy with sensorless adaptive optics,” Sci. Rep. 12(1), 1–5 (2022). [CrossRef]

20. Q Hu, J Wang, J Antonello, M Hailstone, M Wincott, R Turcotte, D Gala, and MJ. Booth, “A universal framework for microscope sensorless adaptive optics: Generalized aberration representations,” APL Photonics 5(10), 100801 (2020). [CrossRef]

21. M Ren, J Chen, D Chen, and SC. Chen, “Aberration-free 3D imaging via DMD-based two-photon microscopy and sensorless adaptive optics,” Opt. Lett. 45(9), 2656 (2020). [CrossRef]

22. Y Liu and P. Kner, “Sensorless adaptive optics for light sheet microscopy,” In Adaptive Optics: Analysis, Methods & Systems2020 Jun 22 (pp. OF2B-2). Optical Society of America.

23. MJ. Booth, “A universal framework for sensorless adaptive optics in microscopes,” In Adaptive Optics and Wavefront Control for Biological Systems VII2021 Mar 5 (Vol. 11652, p. 116520B). SPIE.

24. Y Li, T Peng, W Li, H Han, and J. Ma, “Laser beam shaping based on wavefront sensorless adaptive optics with stochastic parallel gradient descent algorithm,” In 14th National Conference on Laser Technology and Optoelectronics (LTO 2019)2019 May 17 (Vol. 11170, pp. 846–851). SPIE.

25. L Rinaldi, V Michau, N Védrenne, C Petit, LM Mugnier, C Lim, J Montri, L Paillier, and M. Boutillier, “Sensorless adaptive optics for optical communications,” In Free-Space Laser Communications XXXIII2021 Mar 5 (Vol. 11678, pp. 164–169). SPIE.

26. E Durech, W Newberry, J Franke, and MV. Sarunic, “Wavefront sensor-less adaptive optics using deep reinforcement learning,” Biomed. Opt. Express 12(9), 5423 (2021). [CrossRef]

27. Q Tian, C Lu, B Liu, L Zhu, X Pan, Q Zhang, L Yang, F Tian, and X. Xin, “DNN-based aberration correction in a wavefront sensorless adaptive optics system,” Opt. Express 27(8), 10765 (2019). [CrossRef]

28. H Ke, B Xu, Z Xu, L Wen, P Yang, S Wang, and L. Dong, “Self-learning control for wavefront sensorless adaptive optics system through deep reinforcement learning,” Optik 178, 785–793 (2019). [CrossRef]

29. Y Jin, Y Zhang, L Hu, H Huang, Q Xu, X Zhu, L Huang, Y Zheng, HL Shen, W Gong, and K. Si, “Machine learning guided rapid focusing with sensor-less aberration corrections,” Opt. Express 26(23), 30162 (2018). [CrossRef]

30. H Ma, H Liu, Y Qiao, X Li, and W. Zhang, “Numerical study of adaptive optics compensation based on convolutional neural networks,” Opt. Commun. 433, 283–289 (2019). [CrossRef]

31. L.C Andrews and R. L. Phillips, Laser Beam Propagation Through Random Media, 2nd Ed., SPIE Press, Bellingham, WA (2005). [CrossRef]

32. B. M. Welsh, “A Fourier series based atmospheric phase screen generator for simulating anisoplanatic geometries and temporal evolution,” Proc. SPIE3125, pp. 327–338 (1997). [CrossRef]

33. R. G. Lane, A. Glindemann, and J. C. Dainty, “Simulation of a Kolmogorov phase screen,” Waves in Random Media 2(3), 209–224 (1992). [CrossRef]

34. Yann. LeCun, “Generalization and network design strategies,” Connectionism in perspective 19, 143–155 (1989).

35. DP Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv, arXiv:1412.6980 (2014).

36. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation 9(8), 1735–1780 (1997). [CrossRef]

37. T. Cover and P. Hart, “Nearest neighbor pattern classification,” IEEE Trans. Inf. Theory 13(1), 21–27 (1967). [CrossRef]

38. D.E. Rumelhart, G.E. Hinton, and R.J. Williams, “Learning representations by back-propagating errors,” Nature 323(6088), 533–536 (1986). [CrossRef]

39. DG. Voelz, “Computational fourier optics: a MATLAB tutorial,” PIE press, Bellingham, Washington, pp. 127 (2011).

40. J.D. Schmidt, Numerical simulation of optical wave propagation: With examples in MATLAB, pp. 71–72, (SPIE, 2010).

41. D. Wu, J. Luo, Z. Li, and Y. Shen, “A thorough study on genetic algorithms in feedback-based wavefront shaping,” J. Innovative Opt. Health Sci. 12(04), 1942004 (2019). [CrossRef]

42. M. Born and E. Wolf, Principles of Optics, 6th ed. (Pergatnon, New York,1980).

Parameter	Dimension	value
turbulence strength ( $C_{n}^{2}$ )	[m^-2/3 ]	10⁻¹⁶ $< C_{n}^{2} < 10^{- 12}$
inner scale ( $l_{0}$ )	[m]	0.001 $< l_{0} <$ 0.01
outer scale ( $L_{0}$ )	[m]	1 $< L_{0} <$ 100
wavelength ( $λ$ )	[µm]	2.2

layer	type	activation	input Size	filter size	stride	pad	kernels
1	Input	-	128 × 128 × 2	-	-	-	-
2	Convolution	ReLU	128 × 128 × 2	11 × 11	2	0	64
3	Max pooling	-	59 × 59 × 64	3 × 3	2	0	-
4	Batch Normalization	-	29 × 29 × 64	-	-	-	-
5	Convolution	ReLU	29 × 29 × 64	5 × 5	1	2	192
6	Max pooling	-	29 × 29 × 192	3 × 3	2	1	-
7	Batch Normalization	-	15 × 15 × 192	-	-	-	-
8	Convolution	ReLU	15 × 15 × 192	3 × 3	1	1	256
9	Convolution	ReLU	15 × 15 × 256	3 × 3	1	1	128
10	Convolution	ReLU	15 × 15 × 128	3 × 3	1	1	128
11	Max pooling	-	15 × 15 × 128	3 × 3	2	0	-
12	Dense		7 × 7 × 128	-	-	-	-
13	Dropout	-	6272 × 1	-	-	-	-
14	Dense		6272 × 1	-	-	-	-
15	Dropout	-	4096 × 1	-	-	-	-
16	Output	Softmax	77 × 1	-	-	-	-
Trainable parameters: 66,329,108

number of Zernike terms	20	27	35	44	54	65	77	90	104
wavefront deviation (rad)	1.51	1.36	1.31	1.29	1.28	1.27	1.27	1.32	1.33

Parameter	Dimension	value
turbulence strength ( $C_{n}^{2}$ )	[m^-2/3 ]	10⁻¹⁶ $< C_{n}^{2} < 10^{- 12}$
inner scale ( $l_{0}$ )	[m]	0.001 $< l_{0} <$ 0.01
outer scale ( $L_{0}$ )	[m]	1 $< L_{0} <$ 100
wavelength ( $λ$ )	[µm]	2.2

layer	type	activation	input Size	filter size	stride	pad	kernels
1	Input	-	128 × 128 × 2	-	-	-	-
2	Convolution	ReLU	128 × 128 × 2	11 × 11	2	0	64
3	Max pooling	-	59 × 59 × 64	3 × 3	2	0	-
4	Batch Normalization	-	29 × 29 × 64	-	-	-	-
5	Convolution	ReLU	29 × 29 × 64	5 × 5	1	2	192
6	Max pooling	-	29 × 29 × 192	3 × 3	2	1	-
7	Batch Normalization	-	15 × 15 × 192	-	-	-	-
8	Convolution	ReLU	15 × 15 × 192	3 × 3	1	1	256
9	Convolution	ReLU	15 × 15 × 256	3 × 3	1	1	128
10	Convolution	ReLU	15 × 15 × 128	3 × 3	1	1	128
11	Max pooling	-	15 × 15 × 128	3 × 3	2	0	-
12	Dense		7 × 7 × 128	-	-	-	-
13	Dropout	-	6272 × 1	-	-	-	-
14	Dense		6272 × 1	-	-	-	-
15	Dropout	-	4096 × 1	-	-	-	-
16	Output	Softmax	77 × 1	-	-	-	-
Trainable parameters: 66,329,108

Using a deep learning algorithm in image-based wavefront sensing: determining the optimum number of Zernike terms

Abstract

1. Introduction

2. Theory

2.1. Wavefront generation

2.2. CNN

3. Results

3.1. Data generation

3.2. Deep learning model

4. Discussion

Disclosures

Data availability

References

Data availability

Cited By

Figures (11)

Tables (4)

Equations (20)

Optics Continuum

Morteza Hajimahmoodzadeh	https://orcid.org/0000-0002-1915-1322
Hamidreza Fallah	https://orcid.org/0000-0003-4696-211X

n	m	j
0	0	0
1	-1	1
1	+1	2
2	-2	3
	0	4
	+2	5
…	…	…
5	…	20
…	…	…
13	…	104

n	m	j
0	0	0
1	-1	1
1	+1	2
2	-2	3
	0	4
	+2	5
…	…	…
5	…	20
…	…	…
13	…	104