Generalization of learned Fourier-based phase-diversity wavefront sensing

Zhisheng Zhou; Qiang Fu; Qiang Fu; Jingang Zhang; Jingang Zhang; Yunfeng Nie

doi:10.1364/OE.484057

1. Introduction

Phase diversity wavefront sensing (PDWS) is an image-based technique that infers the wavefront from only a few intensity measurements [1,2]. It defines a model-based error metric [3] that describes the differences between the observations and predictions, then recovers the wavefront phase by minimizing the error metric using nonlinear optimization. Gradient descent optimization algorithms, e.g., steepest descent (SD), conjugate-gradient (CG), or Limited-Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) have been extensively used to minimize the error metric [4]. However, the iterative optimization process is highly time-consuming. Moreover, because the optimization is complex and non-convex, gradient descent algorithms can easily get stuck in a local minimum or a saddle point far from the true solution. Recently, artificial neural networks have been successfully applied to image-based wavefront sensing [5–15]. Due to the one-to-one trained correspondence between input images and output phases, neural networks can directly inference the unknown wavefront, which avoids time-consuming iterative optimization and the local minimum problem. Over the past few years, neural networks have offered new insights into the image-based wavefront sensing problem. Nevertheless, there is still insufficient evidence from statistical experimental results to support that neural networks can provide accuracy comparable to model-based methods. In addition, to the best of our knowledge, very few studies have reported wavefront sensing methods that rely solely on neural networks that can be well generalized to deal with testing data that significantly differ from training data.

Combining traditional model-based methods with neural networks and fully exploiting their respective advantages is an approach to finding a better solution. Since the initial estimate significantly impacts the performance of gradient descent algorithms [16] and neural networks can quickly give a prediction close to the true solution, it is promising to employ a neural network to predict well-behaved initial estimates for the gradient descent algorithms. In prior works [17,18], a convolutional neural network (CNN) with a modified Inception v3 architecture has been trained to predict Zernike coefficients as the initial estimate for the L-BFGS algorithm. However, CNN training is costly. It demands tens of thousands of accurate training samples and requires a long training time and high computational budget. To overcome these problems, Zhou et al. proposed an approach that utilizes a back-propagation neural network using low-frequency Fourier coefficients of images as the input to generate the initial estimate for the L-BFGS algorithm [19]. The network can be trained with as few as 1000 training samples and as little as about one minute of training time. Furthermore, since the differences between the low-frequency Fourier coefficients of simulated and real images are small, a network trained with simulated data can still preserve satisfactory accuracy when dealing with real-world data. This greatly relieves the difficulty of collecting massive amounts of accurate real-world training data. Statistical results demonstrate that the method works well in terms of accuracy and processing speed.

However, the benefits of using only a few training samples come at the cost of weak generalization. The network is highly dependent on the training settings. When a network is trained with simulated images with a specific imaging object and specific optical system parameters, the network performance usually degrades when dealing with images of other imaging objects and optical system parameters. To guarantee the best performance, we have to simulate images based on the actual settings and train the network each time for a new setting. This time-consuming and laborious process significantly reduces its practical value.

In this paper, we develop a method based on the Fourier-based network by significantly improving the generalization. A network that does not depend on the imaging object is trained with the ratios of the low-frequency Fourier coefficients, instead of the coefficients themselves. In addition, an image processing procedure that removes the optical system parameter dependency is adopted to further generalize the method. The proposed method is independent of both the imaging object and optical system parameters. Therefore, a network trained with images of a specific setting can be directly applied to images with other settings without retraining or parameter adjustments. The method also inherits the advantages of the Fourier-based neural network such that only a few simulated images and a short time are enough to train the network.

2. Generalized Fourier-based PDWS

2.1 Object-independent Fourier-based neural network

An intensity image capture process can be modeled by a multiplication operation in the Fourier domain

(1) $$I_1 = OH_1,$$

where $I_1$ is the Fourier transform of the captured image, $O$ is the Fourier tranform of the imaging object and $H_1$ is the Fourier transform of the intensity point spread function (PSF). The intensity PSF is the squared magnitude of the impulse response function,

(2)$$h_1 ={\mid} \mathcal{F} \left\{ \mathcal{P} \exp{\left[j\phi\right]} \right\} \mid ^2,$$

where $\mathcal {P}$ is the binary pupil amplitude, the $\mathcal {F}$ operator denotes the 2D Fourier transform, and $\phi$ is the unknown wavefront phase that is often described by Zernike polynomials and corresponding coefficients [20]. Adding an extra known phase diversity to the imaging channel changes the intensity PSF to

(3)$$h_2 ={\mid} \mathcal{F} \left\{ \mathcal{P} \exp{\left[ j \left(\phi+\theta\right)\right]} \right\} \mid ^2,$$

where $\theta$ represents the phase diversity. Similarly, the image capture process corresponding to $h_2$ is modeled as

(4) $$I_2 = OH_2,$$

where $I_2$ and $H_2$ are the Fourier transform of the captured image and $h_2$, respectively.

We define the Fourier ratio as

(5)$$r = \frac{I_2}{I_1} = \frac{\mathcal{F} \left\{{\mid} \mathcal{F} \left\{ \mathcal{P} \exp{\left[ j \left(\phi+\theta\right)\right]} \right\} \mid ^2\right\}}{\mathcal{F} \left\{{\mid} \mathcal{F} \left\{ \mathcal{P} \exp{\left[j\phi\right]} \right\} \mid ^2\right\}}.$$

The ratio $r$ has nothing to do with the imaging object, but only depends on the unknown wavefront phase and the phase diversity. However, it could become unstable when $\mid I_1 \mid$ approaches zero. We define two alternative ratios,

(6)$$r_1 = \frac{I_1}{I_1+I_2},$$

and

(7)$$r_2 = \frac{I_2}{I_1+I_2},$$

which are more stable than the ratio $r$ while providing identical information. Note that these alternative ratios can easily be extended to PDWS with more phase diversities. For example, for a method that uses three phase diversities from two symmetrically defocused images and a focused image [21], the ratios become

(8)$$r_i = \frac{I_i}{I_1+I_2+I_3}, \quad i=1, 2, 3.$$

Inspired by the low-frequency Fourier-based method [19], we employ only the low-frequency Fourier ratios to represent diversity images. Here, the low-frequency components are those in a region around the origin in the Fourier domain. For example, the values within a small circle centered at the origin in the Fourier domain represent the low-frequency components with a radial apodized frequency determined by the circle radius. It is worth noting that using low-frequency ratios can further eliminate the possibility of the ratio denominator approaching zero, making the ratios more stable. A back-propagation neural network [22] is used to learn the mapping relationship between the low-frequency Fourier ratios and their associated aberration coefficients. Since the ratios are complex numbers, we separate the real and imaginary parts of the ratios and concatenate them to form a real-number input vector. Once the network has been trained, it can predict the unknown aberration coefficients directly from the input vector.

Because the Fourier ratios are object-independent, the object-dependency of the neural network is eliminated. The network trained with images of a specific imaging object is still useful for images of other imaging objects. A network trained with images of a point source, for example, can be applied to images of an extended source.

2.2 System-independent image processing

Another important factor that should be considered is the network dependency on optical system parameters. Two different optical systems with the same aberration coefficients will very likely generate different images. If a network is trained with the images of a specific optical system, it is unlikely that it can be applied to test images of another system. This question is primarily concerned with the generalization ability of the network.

To explain this, we generate and evaluate two image datasets by using the same aberration coefficients for two different optical system settings. Table 1 details the settings, including the optical system key parameters such as pupil diameter ($D$), pupil-imaging plane distance ($l$), wavelength ($\lambda$), camera pixel size ($\Delta x$), and image pixel resolution ($N \times N$). A circular source with a 10 $\mu m$ diameter is used as the imaging object. For each setting, a focused image and a defocused image with the defocus diversity of 1.0 $\lambda$ peak-valley (PV) value are generated.

Table 1. Optical system settings used for image generation.

View Table | View all tables in this article

We calculate and compare the low-frequency Fourier ratios $r_1$ and $r_2$ between the two systems. Figure 1(a) and (b) show the focused images of the first and second systems, respectively. It demonstrates that images acquired by different optical systems, even when subjected to the same wavefront aberration and imaging object, might be different in shape and size. This difference leads to an incompatibility between the low-frequency Fourier ratios. For example, Fig. 1(c) plots the absolute value of the low-frequency Fourier ratios $r_1$ and $r_2$ of images from both the first (solid red line, circle symbol) and second systems (solid blue line, triangle symbol). The low-frequency Fourier ratios of images from the two systems are so different that there is no doubt that a network trained with images from one system will perform poorly when applied to images from the other.

Fig. 1. Motivation of system-independent image processing. We generate simulated images with the same wavefront aberration on two different optical systems. The focused images of the two settings are shown in (a) and (b), respectively. (c) and (d) plot the absolute values of low-frequency Fourier ratios $r_1$ and $r_2$, respectively.

Download Full Size | PDF

According to the imaging theory in Fourier optics [23], the Fourier transform of the generalized pupil function gives the impulse response function

(9)$$\tilde{h} (x,y) = \frac{U}{\lambda l}\iint \mathcal{P} (\tilde{x},\tilde{y}) \exp{\left[{-}j\frac{2\pi}{\lambda l}\left(x\tilde{x}+y\tilde{y}\right)\right]}d\tilde{x}d\tilde{y},$$

where $U$ and $\lambda$ are the constant amplitude and wavelength, respectively. $l$ denotes the distance between the pupil plane and the imaging plane. $(x, y)$ and $(\tilde {x},\tilde {y})$ are coordinates in the imaging and pupil planes, respectively. When we sample $\tilde {h}$ and $\mathcal {P}$ in $M \times N$ data points and rewrite the equation in the discrete Fourier transform form, the coordinate sampling intervals can be correlated as

(10)$$\Delta \tilde{x}=\frac{\lambda l}{M \Delta x}, \Delta \tilde{y}=\frac{\lambda l}{N \Delta y}.$$

Let the pupil diameter to be

(11)$$D=K_x \Delta \tilde{x}, D=K_y \Delta \tilde{y}.$$

Combining Eq. (10) and Eq. (11), $K_x$ and $K_y$ then become

(12)$$K_x= \frac{MD \Delta x}{\lambda l}, K_y= \frac{ND \Delta y}{\lambda l},$$

where $K_x$ and $K_y$ denote the pupil diameter in Fourier frequency units.

In order to apply a network trained with images from one system to images from another, the images from both systems should meet the following conditions: (1) The Fourier point numbers $(M, N)$ are equal, so the Fourier ratios are calculated at the same Fourier points. (2) The pupil diameters in Fourier frequency units are equal, so the generalized pupils are within the same region in the frequency coordinate. (3) The effect of system magnification should be removed.

We propose a simple yet effective procedure to process images so that the images meet the above requirements. Suppose an optical system is with a pupil diameter $D_0$, wavelength $\lambda _0$, pupil-imaging plane distance $l_0$, camera pixel size $\Delta x_0$, and magnification $M_{a_0}$. Images of $N_0\times N_0$ pixels from the system are used to train a network. For images of $N \times N$ pixels from another system with a pupil diameter $D$, wavelength $\lambda$, pupil-imaging plane distance $l$, camera pixel size $\Delta x$, and magnification $M_a$, the following steps are taken. The first step is to calculate the scaling factor as

(13)$$s= \frac{\lambda_0 l_0 D \Delta xM_{a_0}}{\lambda l D_0 \Delta x_0 M_a}.$$

Secondly, we perform image scaling by enlarging ($s>1$) or shrinking ($s<1$) the images of $N \times N$ pixels to $sN \times sN$ pixels using bilinear interpolation [24]. As a third step, the image truncation is implemented by truncating the scaled images to $N_0 \times N_0$ pixels with the region of interest in the middle. Note that all images should be truncated from the same location.

With this image processing, the processed images can meet the above-mentioned requirements, and the low-frequency Fourier ratios of the processed images can be sent directly to the network. We simply verify this by using the simulated images. We choose the images created with the first setting to be the training images and perform a process on the images created with the second setting. We again calculate the low-frequency Fourier ratios $r_1$ and $r_2$. Figure 1(c) plots the calculation result (dashed yellow line, square symbol). The result shows that the low-frequency Fourier ratios $r_1$ and $r_2$ of the processed images coincide with those of the training images well.

This image processing procedure helps to enhance the generalization ability of the Fourier-based network by reducing the optical system parameter dependency, i.e., the network trained with images from a specific optical system can be applied to images from other systems. However, it is necessary to point out that this image processing procedure is not the only way to reduce system dependency. Other methods, such as calculating low-frequency Fourier ratios at desired points using discrete integration, may also work. Nevertheless, our proposed processing is more straightforward and much faster.

2.3 Phase retrieval pipeline

Combining the ideas in Sections $2.1$ and $2.2$, we propose a generalized Fourier-based method for PDWS. The method reduces the dependencies on the imaging object and optical system parameters and provides an effective way to find the solution in PDWS.

Figure 2 depicts an illustrative schematic diagram of the network training process. The simulation setting, including the imaging object, optical system parameters, phase diversities, and wavefront range, should be carefully designed. A series of aberration coefficients are randomly generated, and diversity images are simulated based on the setting and aberrations. For each aberration, the low-frequency Fourier ratios of diversity images make up an input vector, while the corresponding aberration coefficients make up an output vector. All the input-output vector pairs form a training set, and the network is trained with the training set. Figure 2 also depicts the aberration inference process schematically. The recorded diversity images are first processed according to the system-independent processing procedure. Then the low-frequency Fourier ratio vector of the processed images is calculated. The ratio vector is sent to the network as input, and the network gives out a prediction. Finally, the nonlinear solver recovers the aberration using the network prediction as the initial estimate.

Fig. 2. Schematic illustration of the working procedure of the proposed method.

Download Full Size | PDF

3. Simulation

3.1 Simulation settings

Five imaging objects are employed: a circular source with a diameter of 10 $\mu m$, a source with four small squares, an annular source with an outer diameter of 100 $\mu m$, a source with three English words, and a source with an image of trees, labeled as $c1$, $s1$, $a1$, $w1$, and $t1$, respectively. Figure 3 shows the imaging objects ($s1$, $a1$, $w1$, and $t1$). We perform a numerical simulation to test the method. Nine sets of simulation settings are utilized. Table 2 details the settings, including the imaging object and the optical system parameters. The magnification of all settings is set to be one.

Fig. 3. Imaging objects. (a) Square object. (b) Annular object. (c) Word object. (d) Tree object.

Download Full Size | PDF

Table 2. Simulation settings used for image generation.

View Table | View all tables in this article

Zernike coefficients up to the 15th term are generated using the MATLAB random number generator, and the RMS wavefront errors of the combined aberrations are randomly distributed in [0.2 $\lambda$, 0.4 $\lambda$]. A total of 3000 sets of aberration coefficients are generated, including 2000 sets for training and 1000 sets for testing. A focused image and two defocused images with defocus diversities with -1.0 $\lambda$, and 1.0 $\lambda$ PV values are simulated for each training aberration. However, considering the defocus errors, for each testing aberration, the PV values of the defocus diversities are set to be -0.9 $\lambda$, and 0.9 $\lambda$, respectively. To simulate the noise condition, an additive Gaussian noise at a signal-to-noise level of 40 dB is introduced to all simulated images. Figure 4 shows an example of the simulated focused images of all settings with the same aberration. It demonstrates that the sizes and shapes of the image distributions vary dramatically depending on the settings.

Fig. 4. An example of the simulated images of all settings with the same aberration.

Download Full Size | PDF

3.2 Simulation results

A fully connected multi-layer perceptron (MLP) that consists of an input layer, a hidden layer, and an output layer is used as the network architecture [19]. The hidden layer contains 25 neurons. The network is trained only with simulation setting A1 using the 2000 training aberrations. The low-frequency Fourier ratios of the diversity images for each training aberration make up an input vector, while the aberration coefficients make up an output vector. The low-frequency zone is defined as an area inside a square centered at the origin with a side length of three frequency units. The samples are randomly divided into training, validation, and test sets, with ratios of $70\%$, $15\%$, and $15\%$, respectively. The Levenberg-Marquardt back-propagation algorithm is used to train the network. The aberration coefficient training RMS error is $0.024\lambda$. The training time is about 7 $min$ on a laptop with an AMD Ryzen 9 5900HS Processor (3.30GHz, 48GB RAM) without GPU acceleration.

We recover the 1000 testing aberrations for all settings using the proposed method with the trained network. The test images are first processed, then the low-frequency Fourier ratios of the processed images are calculated and sent to the network to produce a prediction. The prediction is used as the initial estimate for the L-BFGS algorithm to recover the final aberration. We calculate the RMS residual error between the recovered and actual wavefronts for each testing aberration and statistically calculate the mean RMS residual error and the percentage of RMS residual errors that are less than 0.05 $\lambda$ for all testing aberrations. Table 3 shows the statistical calculation results. The results of two other methods, i.e., the classic PDWS with the L-BFGS algorithm [25] and the PDWS with the Fourier-based network [19], are also presented for comparison. Note that in the method with the Fourier-based network, we similarly use the network trained with only simulation setting A1. In the case of training conditions (setting A1), both the Fourier-based network and the proposed method perform well in accuracy and processing speed. However, in the case of different imaging objects (settings A2 and A3), the performance of the Fourier-based network decreases significantly while the proposed method preserves high accuracy, demonstrating high resistance to changes in imaging object. In the case of different optical parameters (settings B1, B2, and B3), the performance of the Fourier-based network also degrades, whereas the proposed method maintains high accuracy, demonstrating high resistance to changes in optical parameters. In the case of combined changes (settings C1, C2, and C3), the proposed method outperforms the Fourier-based network in accuracy and processing speed. The results show that the proposed method greatly enhances the generalization ability.

Table 3. Statistical results of different PDWS methods on $1000$ simulated aberrations.

View Table | View all tables in this article

4. Experiments

4.1 Experimental setup

We also build an experimental setup to test the method. Figure 5 shows a schematic diagram of the optical setup used in the experiment. A laser module (CPS635R, Thorlabs. 635 $nm$, 2.9 $mm$ beam diameter) is used as the light source. The output beam is focused by a plano-convex lens (LA1951-A, Thorlabs. $f=25.4$ $mm$), and the focused beam then is collimated by another plano-convex lens (LA1509-A, Thorlabs. $f=100$ $mm$). A piezoelectric deformable mirror (DMP40/M-P01, Thorlabs. $10$ $mm$ pupil diameter) is used for wavefront modulation. The reflective modulated light beam is split into two parts by a non-polarizing beamsplitter cube. The transmitted beam is captured directly by a Shack-Hartmann wavefront sensor (WFS40-7AR/M, Thorlabs. AR-coated $400-900$ $nm$, round microlenses, pitch 150 $\mu m$). The reflective beam is focused by an achromatic doublet (GCL-010604, Daheng Optics. $f=100$ $mm$) and recorded by a CMOS camera (acA$3088-57 um$, Basler. 12-bit grey levels, $3088 \times 2064$ pixels and 2.4 $\mu m$ pixel size). The camera is placed at the focal plane of the achromatic doublet. A ring-actuated iris diaphragm adjusts the beam aperture and a turning mirror mount adjusts the beam direction so that the beam synchronizes with the wavefront sensor. The piezoelectric deformable mirror changes the mirror surface shape and modulates the wavefront phase depending on the applied voltage pattern. The wavefront sensor provides an accurate measurement of the wavefront aberration and reconstructs the wavefront using the highest Zernike order 4 utilizing a total number of $15$ Zernike terms. By randomly but carefully selecting the applied voltage patterns, the RMS wavefront errors of the measured wavefronts are bounded in [0.2 $\lambda$, 0.4 $\lambda$]. For each measured aberration, the wavefront sensor records the Zernike coefficients as the ground truth, and the camera records the corresponding intensity image. Besides, two defocus diversities of 1.0 $\lambda$ and -1.0 $\lambda$ PV values are separately added to the aberration using the deformable mirror to generate the second and third observations. In the experiment, we change the CMOS camera (acA$2040-120 um$, Basler. 12-bit grey levels, $2048 \times 1536$ pixels and 3.45 $\mu m$ pixel size) to generate the second system setting. Besides, we change the light source to an LED (LED450L, Thorlabs. 450 $nm$, 7 $mW$ power) and add two different pinholes in the light path separately, namely a square pinhole (S100QK, Thorlabs. 100 $\mu m$ square) and a circular pinhole (P200K, Thorlabs. 200 $\mu m$ pinhole diameter), to generate the third and fourth system settings. In this case, a ground glass diffuser (DW110-120D, LBTEK. $120$ grit) is put behind the LED to produce scattering light and a bandpass filter (MBF450-10, LBTEK. 450 $nm$, 10 $nm$ FWHM bandwidth) is used to produce quasi-monochromatic light. Figure 5 also depicts these optics in the dotted boxes.

Table 4 lists the optics devices in the four settings. For each setting, a total of 1000 aberrations are measured and recorded. All recorded images are truncated to 121 $\times$ 121 pixels. An example of the images of all settings with the same aberration are also shown in Table 4. Note that due to the effect of the glass diffuser, the light fields are speckled randomly inside the pinholes, acting as extended objects. In addition, considering the residual aberration between the wavefront sensor path and the CMOS camera path, an aberration close to a blank is measured by the wavefront sensor as well as the PDWS solver, respectively, and the residual error between the two measurements is treated as the background to be subtracted in subsequent phase retrieval.

Fig. 5. The schematic illustration of the experimental setup. Only when the LED is used as the light source are the optics in the dotted boxes used.

Download Full Size | PDF

Table 4. Optics devices used in different experimental settings.

View Table | View all tables in this article

4.2 Experimental results

Table 5 statistically compares the performance for 1000 experimental aberrations of the proposed method against some other methods, i.e., the classic PDWS with the L-BFGS algorithm [25], the PDWS with the L-BFGS algorithm and multiple starting points selected randomly [26], the PDWS that combines the L-BFGS algorithm and the Fourier-based network [19], in terms of accuracy and processing speed.

Table 5. Statistical results of different PDWS methods on $1000$ experimental aberrations.

View Table | View all tables in this article

In the classic method, the zero point is the only starting point. In the method with multiple starting points, the zero point and $19$ other random points act as the starting points. The RMS wavefront error of the combined aberration is within [0.2 $\lambda$, 0.4 $\lambda$] for each random starting point. Compared with the classic method, the multiple starting point method dramatically improves accuracy but at a high cost in time increments.

In the method with the Fourier-based network, for each system setting, we train a network based on the system setting independently and use the trained network to generate initial estimates. This method achieves high accuracy comparable to the multiple starting point method while processing at a much faster speed. However, when we use the Fourier-based network trained with simulation setting A1 mentioned in Section 3.2, the results (shown in parentheses) suffer greatly in terms of accuracy and processing speed. This demonstrates again the network’s poor generalization ability.

In the proposed method, the network trained with simulation setting A1 mentioned in Section 3.2 is used. The method achieves high accuracy and fast processing speed that are comparable to the Fourier-based network. Note that simulation setting A1 is different from the experimental settings. The fact that one network trained with a different setting works well in experimental setups shows that the proposed method has a much superior generalization ability. Moreover, the image processing step has little impact on the processing time. For example, the average processing time is about $0.33$ sec for setting A, which includes $0.03$ sec for image processing, $0.06$ sec for network prediction, and $0.24$ sec for phase retrieval. The mean RMS residual errors for settings A, B, C, and D are 0.032 $\lambda$, 0.039 $\lambda$, 0.035 $\lambda$, and 0.037 $\lambda$, respectively, and $98.9\%$ of the RMS residual errors are less than 0.05 $\lambda$.

5. Discussion

The proposed method has several benefits. The training is simple, low-cost, and fast; the network is object-independent; the method is system-independent; the phase retrieval is accurate and fast. Only a single network trained with inexpensive simulated data is required, and this network then can be applied to real-world data regardless of the settings. The utility of this approach is validated by both the simulation and experimental results. However, a number of key factors deserve a thorough analysis, as discussed below.

5.1 Investigations on phase diversities and wavefront range

First, the network training accuracy is under the combined effect of two major factors: the diversities and the wavefront range. We perform a numerical simulation to evaluate the impact. We train the networks independently using simulated images under different diversities and wavefront range settings. Four different sets of diversities are selected, namely defocus diversities of $\left \{0, \lambda \right \}$, $\left \{ -\lambda, 0, \lambda \right \}$, $\left \{ -\lambda, -0.5\lambda, 0, 0.5\lambda, \lambda \right \}$, and $\left \{ -3\lambda, 0, 3\lambda \right \}$ PV values, labeled as $d_1$, $d_2$, $d_3$, and $d_4$, respectively. The wavefront errors are bound in sub-ranges as [0.2 $\lambda$, (k+0.2) $\lambda$], $k=0.1, 0.2,\ldots, 1.0$. Each network is trained under 2000 aberrations. Figure 6(a) depicts the aberration coefficient training RMS error. The result shows that the training error increases as the wavefront error range expands. However, the training error decreases as the number of diversity channels and the scale of diversities increase.

Fig. 6. Influences of the diversities and wavefront range on network training (a) and phase retrieval (b).

Download Full Size | PDF

Second, the phase retrieval performance is also greatly influenced by the diversities and wavefront range. Here, we refer to two meanings: the performance when handling the wavefront within the training range and the performance when extrapolating the prediction to an untrained, larger wavefront range. We use networks trained independently under different sub-ranges and test the networks using wavefronts from the overall range [0.2 $\lambda$, 1.2 $\lambda$]. Three different sub-ranges are selected, namely [0.2 $\lambda$, 0.4 $\lambda$], [0.2 $\lambda$, 0.8 $\lambda$], and [0.2 $\lambda$, 1.2 $\lambda$], labeled as $sr_1$, $sr_2$, and $sr_3$, respectively. The defocus diversities are $\left \{ -3\lambda, 0, 3\lambda \right \}$. Each network is trained under 2000 aberrations and is tested under 1000 aberrations to calculate the percentage of the RMS residual errors that are below 0.05 $\lambda$. Figure 6(b) shows the testing result. The result shows that the method performs well in the training range as well as a certain extrapolation range, but it fails in lots of cases when handling wavefronts that are far from the training range.

In addition, we have made preliminary attempts to improve the performance of our model in solving large-scale wavefronts. We found that increasing the number of neurons in the hidden layer, such as to 40, does not result in a statistically significant improvement in accuracy. We also tried using two hidden layers, increasing the number of training samples and the number of low-frequency Fourier coefficients to reduce the training error to some extent. However, this also leads to a noticeable increase in training time, and a trade-off must be reached between training time and accuracy. Further research is necessary to investigate the impact of different parameters on the range of wavefronts that can be solved and to optimize the parameter combination for solving large-scale wavefronts.

From the above results, the influence of the diversities and the wavefront range is complicated. The larger the wavefront range is, the less ideal the training and retrieval performance is. Besides, increasing the number of diversity channels and enlarging the scale of the diversities can improve performance. Since the method depends on the wavefront range, it is most suitable for applications where the wavefronts are moderate and the imaging settings are frequently different.

5.2 Investigation on extended scenes

In the simulation and experiment, only images of point and extended sources are used. What is the system performance when subjected to images of extended scenes? Here extended sources are objects that are spread out in space but have clear boundaries with the background, whereas extended scenes are objects that are spread out in space but do not have clear boundaries. Figures 7(a), (b), and (c) show simulated images of a point source, an extended source, and an extended scene with the same wavefront aberration, respectively. The images without aberration are also shown for comparison. Because of the aberration, the shapes and edges of the objects are distorted, as shown in Fig. 7(a) and (b). However, some distorted structures may fall outside the image and be absent in the image of an extended scene, as Fig. 7(c) shows. Since low-frequency Fourier components mainly represent the basic structures, the absence of distorted structures will inevitably affect network prediction and phase retrieval. To evaluate the effect, we apply the network trained with simulation setting A1 to images of the extended scene. The averaged RMS residual error for 1000 aberrations is 1.126 $\lambda$. We also use $20$ random starting points to retrieve the aberrations, and the averaged RMS residual error is 0.064 $\lambda$. Though the structure absence decreases the accuracy of both methods, the proposed method fails in most cases, revealing that the network trained with a point source is unsuitable for extended scenes. Then we train a network with images of the extended scene and apply the trained network to images of another extended scene (an example image is shown in Fig. 7(d)). The averaged RMS residual errors for 1000 aberrations are 0.056 $\lambda$ and 0.074 $\lambda$ for the proposed method and $20$ random starting points, respectively. The proposed method achieves accuracy higher than $20$ random starting points with much faster processing speed, demonstrating that the network trained with an extended scene can still be useful for images of other extended scenes. Therefore, for extended scene images, networks trained with extended scenes should be used to guarantee the best performance.

Fig. 7. Simulated images of different objects with the same aberration (top) and without aberration (bottom). (a) and (b) are images of a point source and an extended source. (c) and (d) show images of extended scenes.

Download Full Size | PDF

5.3 Comparison with other deep learning architectures

The proposed method aims to promote the generalization ability of the Fourier-based neural network for PDWS. It reminds us of deep learning, which claims to exhibit good generalization behaviors across various tasks [27]. Here we briefly compare the proposed network and two deep networks in terms of generalization on simulated data. The first deep network is a CNN based on the Inception v3 modified architecture [17], one of the most accurate CNNs for ImageNet classification. The second deep network is a long short-term memory (LSTM) network utilizing an object-independent image feature as input [8]. The deep networks are trained under simulation setting A1 and tested under all settings. The trained deep networks are then used to generate initial estimates for the L-BFGS algorithm. A total of 100,000 aberrations are used to train both deep networks, and 1000 aberrations are used to test the trained network for each setting. Table 6 statistically shows the aberration retrieval results. Because CNN can automatically extract features insensitive to image scaling, the trained CNN works well in settings A1, A3, B1, B2, B3, and C3, demonstrating high resistance to the setting changes. However, the performance decreases when encountering imaging objects that differ significantly from the training object (settings A2, C1, and C2). Besides, the CNN network structure is so complex that about $0.8$ sec is spent loading the network, considerably increasing the processing time. Due to the object-independent image features, the LSTM works well in settings A1, A2, and A3, demonstrating high resistance to the changes in imaging object. However, since system dependency has not been considered and removed in the feature extraction, the network performance decreases when encountering changes in optical system parameters.

Table 6. Performance comparison between deep learning methods and proposed method on the simulated aberrations.

View Table | View all tables in this article

6. Conclusion

In this paper, we develop a rapid Fourier-based neural network method with high accuracy and high generalization ability for phase diversity wavefront sensing. Using a back-propagation neural network based on the ratios of low-frequency Fourier coefficients and an image processing procedure, the method is independent of both the imaging object and optical system parameters. With only a small amount of simulated training samples and a short training time, the network can be well-trained and applied to different real-world images regardless of the imaging objects and optical systems. Combining with the L-BFGS algorithm, the method achieves successful performance with high accuracy, fast speed, and high generalization, which is validated statistically by simulation and experimental results. For four different experimental settings and 1000 aberrations with RMS wavefront errors restricted in [0.2 $\lambda$, 0.4 $\lambda$], the mean RMS residual errors are 0.032 $\lambda$, 0.039 $\lambda$, 0.035 $\lambda$, and 0.037 $\lambda$, respectively, and a total of $98.9\%$ of the RMS residual errors are less than 0.05 $\lambda$, achieving high accuracy comparable to the method with $20$ random starting points with a much faster processing speed. In addition, we investigate the effects of phase diversities and wavefront range on the performance of the proposed method. Given 1000 simulated aberrations, our method shows a better performance in terms of generalization in comparison with other methods based on deep neural networks. With the above-mentioned promising results, the developed method should be useful for various applications, such as optical system aberration measurements, optical system alignments, and wavefront sensing in atmospheric turbulence.

Funding

Beijing Municipal Natural Science Foundation (JQ22029); Informatization Plan of Chinese Academy of Sciences (CAS-WX2021-PY-0110); Shenzhen Public Technical Service Platform program (GGFW2018020618063670); Fonds Wetenschappelijk Onderzoek (1252722N); Equipment Research Program of the Chinese Academy of Sciences (Y70X25A1HY, YJKYYQ20180039).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. R. L. Kendrick, D. S. Acton, and A. Duncan, “Phase-diversity wave-front sensor for imaging systems,” Appl. Opt. 33(27), 6533–6546 (1994). [CrossRef]

2. P. M. Blanchard, D. J. Fisher, S. C. Woods, and A. H. Greenaway, “Phase-diversity wave-front sensing with a distorted diffraction grating,” Appl. Opt. 39(35), 6649–6655 (2000). [CrossRef]

3. R. G. Paxman, T. J. Schulz, and J. R. Fienup, “Joint estimation of object and aberrations by using phase diversity,” J. Opt. Soc. Am. A 9(7), 1072–1085 (1992). [CrossRef]

4. F. Li and C. Rao, “Algorithms for phase diversity wavefront sensing,” Proc. SPIE 7853, 78532D (2010). [CrossRef]

5. J. Dong, L. Valzania, A. Maillard, T.-A. Pham, S. Gigan, and M. Unser, “Phase retrieval: From computational imaging to machine learning: A tutorial,” IEEE Signal Process. Mag. 40(1), 45–57 (2023). [CrossRef]

6. H. Guo, Y. Xu, Q. Li, S. Du, D. He, Q. Wang, and Y. Huang, “Improved machine learning approach for wavefront sensing,” Sensors 19(16), 3533 (2019). [CrossRef]

7. Y. Nishizaki, M. Valdivia, R. Horisaki, K. Kitaguchi, M. Saito, J. Tanida, and E. Vera, “Deep learning wavefront sensing,” Opt. Express 27(1), 240–251 (2019). [CrossRef]

8. Q. Xin, G. Ju, C. Zhang, and S. Xu, “Object-independent image-based wavefront sensing approach using phase diversity images and deep learning,” Opt. Express 27(18), 26102–26119 (2019). [CrossRef]

9. E. Durech, W. Newberry, J. Franke, and M. V. Sarunic, “Wavefront sensor-less adaptive optics using deep reinforcement learning,” Biomed. Opt. Express 12(9), 5423–5438 (2021). [CrossRef]

10. E. Vera, F. Guzmán, and C. Weinberger, “Boosting the deep learning wavefront sensor for real-time applications,” Appl. Opt. 60(10), B119–B124 (2021). [CrossRef]

11. M. Wang, W. Guo, and X. Yuan, “Single-shot wavefront sensing with deep neural networks for free-space optical communications,” Opt. Express 29(3), 3465–3478 (2021). [CrossRef]

12. K. Wang, M. Zhang, J. Tang, L. Wang, L. Hu, X. Wu, W. Li, J. Di, G. Liu, and J. Zhao, “Deep learning wavefront sensing and aberration correction in atmospheric turbulence,” PhotoniX 2(1), 8–11 (2021). [CrossRef]

13. B. de Bruijne, G. Vdovin, and O. Soloviev, “Extended scene deep learning wavefront sensing,” J. Opt. Soc. Am. A 39(4), 621–627 (2022). [CrossRef]

14. Y. Li, D. Yue, and Y. He, “Prediction of wavefront distortion for wavefront sensorless adaptive optics based on deep learning,” Appl. Opt. 61(14), 4168–4176 (2022). [CrossRef]

15. H. F. Rajaoberison, J. S. Tang, and J. R. Fienup, “Machine learning wavefront sensing for the James Webb Space Telescope,” Proc. SPIE 12180, 154 (2022). [CrossRef]

16. S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv, arXiv:1609.04747 (2016). [CrossRef]

17. S. W. Paine and J. R. Fienup, “Machine learning for improved image-based wavefront sensing,” Opt. Lett. 43(6), 1235–1238 (2018). [CrossRef]

18. S. W. Paine and J. R. Fienup, “Machine learning for avoiding stagnation in image-based wavefront sensing,” Proc. SPIE 10980, 109800T (2019). [CrossRef]

19. Z. Zhou, J. Zhang, Q. Fu, and Y. Nie, “Phase-diversity wavefront sensing enhanced by a Fourier-based neural network,” Opt. Express 30(19), 34396–34410 (2022). [CrossRef]

20. L. N. Thibos, R. A. Applegate, J. T. Schwiegerling, and R. Webb, “Standards for reporting the optical aberrations of eyes,” J. Refract. Surg. 18(5), S652–S660 (2002). [CrossRef]

21. P. Zhang, C. Yang, Z. Xu, Z. Cao, Q. Mu, and L. Xuan, “High-accuracy wavefront sensing by phase diversity technique with bisymmetric defocuses diversity phase,” Sci. Rep. 7, 1–10 (2017). [CrossRef]

22. R. Hecht-Nielsen, “Theory of the backpropagation neural network,” in Neural Networks for Perception, (Elsevier, 1992), pp. 65–93.

23. J. W. Goodman, Introduction to Fourier Optics (Roberts and Company Publishers, 2005).

24. V. Patel and K. Mistree, “A review on different image interpolation techniques for image enhancement,” Int. J. Emerging Technol. Adv. Eng. 3, 129–133 (2013).

25. C. R. Vogel, “A limited memory BFGS method for an inverse problem in atmospheric imaging,” Methods and Applications of Inversion pp. 292–304 (2000).

26. D. B. Moore and J. R. Fienup, “Extending the capture range of phase retrieval through random starting parameters,” in Frontiers in Optics (Optica Publishing Group, 2014), paper FTu2C–2.

27. B. Neyshabur, S. Bhojanapalli, D. McAllester, and N. Srebro, “Exploring generalization in deep learning,” Adv. Neural Inf. Process. Syst.30, 1–10 (2017).

Setting	$D$ (mm)	$l$ (mm)	$λ$ (nm)	$Δ x$ $(μ m)$	$N \times N$
$# 1$	6	75	532	2.4	$121 \times 121$
$# 2$	5	100	635	3.45	$101 \times 101$

Setting	Classic method [25]			Fourier network [19]			Ours
	RMS error ( $λ$ )		Time	RMS error ( $λ$ )		Time	RMS error ( $λ$ )		Time
	Mean	$< 0.05$	(s)	Mean	$< 0.05$	(s)	Mean	$< 0.05$	(s)
A1	0.116	76.9%	0.67	0.025	99.9%	0.36	0.025	99.6%	0.36
A2	0.156	73.2%	0.71	2.512	0.2%	1.23	0.022	99.9%	0.39
A3	0.048	97.6%	0.42	2.360	0%	1.20	0.025	99.9%	0.31
B1	0.130	67.2%	1.15	0.157	73.3%	1.47	0.023	99.1%	0.65
B2	0.082	83.6%	2.29	0.058	91.1%	1.88	0.027	99.2%	1.17
B3	0.129	69.7%	1.80	0.114	74.1%	1.70	0.022	99.9%	0.85
C1	0.189	43.4%	1.10	0.726	17.9%	1.82	0.019	99.8%	0.63
C2	0.132	63%	2.17	0.107	73.5%	2.23	0.024	99.4%	1.20
C3	0.047	97.6%	1.36	0.316	63.3%	3.20	0.025	99.6%	1.06

Setting	Classic method [25]		Random points [26]		Fourier network [19]		Proposed method
Setting	RMS error	Time	RMS error	Time	RMS error	Time	RMS error	Time
	< 0.05 $λ$	(s)	< 0.05 $λ$	(s)	< 0.05 $λ$	(s)	< 0.05 $λ$	(s)
A	67.33%	0.59	99.4%	14.85	100% (50.85%)^a	0.30 (0.88)^a	99.2%	0.33
B	67.53%	0.60	99.6%	17.37	99.3% (47.95%)^a	0.32 (0.94)^a	97.6%	0.38
C	80.4%	0.54	99.9%	11.5	99.8% (87.21%)^a	0.35 (0.60)^a	99.7%	0.27
D	80.02%	0.44	100%	11.39	98% (42.06%)^a	0.41 (0.74)^a	99.1%	0.42

Setting	CNN [17]			LSTM [8]			Proposed method
	RMS error ( $λ$ )		Time	RMS error ( $λ$ )		Time	RMS error ( $λ$ )		Time
	Mean	$< 0.05$	(s)	Mean	$< 0.05$	(s)	Mean	$< 0.05$	(s)
A1	0.025	99.9%	1.12	0.025	99.7%	0.40	0.025	99.6%	0.36
A2	0.113	75.4%	1.68	0.023	99.8%	0.39	0.022	99.9%	0.39
A3	0.052	95.4%	1.44	0.025	99.8%	0.33	0.025	99.9%	0.31
B1	0.018	100%	1.42	0.049	89.7%	0.91	0.023	99.1%	0.65
B2	0.027	99.5%	2.08	0.043	93.1%	1.78	0.027	99.2%	1.17
B3	0.022	99.9%	1.84	0.075	81.4%	1.57	0.022	99.9%	0.85
C1	0.208	68.4%	2.03	0.080	82.8%	0.98	0.019	99.8%	0.63
C2	0.177	63.1%	3.90	0.077	79.5%	1.80	0.024	99.4%	1.20
C3	0.049	95.4%	2.57	0.024	99.4%	1.23	0.025	99.6%	1.06

Setting	$D$ (mm)	$l$ (mm)	$λ$ (nm)	$Δ x$ $(μ m)$	$N \times N$
$# 1$	6	75	532	2.4	$121 \times 121$
$# 2$	5	100	635	3.45	$101 \times 101$

Generalization of learned Fourier-based phase-diversity wavefront sensing

Abstract

1. Introduction

2. Generalized Fourier-based PDWS

2.1 Object-independent Fourier-based neural network

2.2 System-independent image processing

2.3 Phase retrieval pipeline

3. Simulation

3.1 Simulation settings

3.2 Simulation results

4. Experiments

4.1 Experimental setup

4.2 Experimental results

5. Discussion

5.1 Investigations on phase diversities and wavefront range

5.2 Investigation on extended scenes

5.3 Comparison with other deep learning architectures

6. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (7)

Tables (6)

Equations (13)

Optics Express