Fast optimal wavefront reconstruction for multi-conjugate adaptive optics using the Fourier domain preconditioned conjugate gradient algorithm

Curtis R. Vogel; Qiang Yang

doi:10.1364/OE.14.007487

1. Introduction

Multi-conjugate adaptive optics (MCAO) [1, 2] refers to the use of several deformable mirrors (DMs), each conjugate to a different altitude, to correct for volume atmospheric turbulence and thereby increase the corrected field of view in an astronomical adaptive optics system. By wavefront reconstruction we mean the determination of DM actuator commands, given wavefront sensor signals. In this paper we assume a linear model for sensor signals as a function of the turbulence profile, we assume DM displacements depend linearly on actuator commands, and we employ an optimal, or minimum variance, approach to wavefront reconstruction [3]. This gives rise to a pair of linear systems: one to estimate the volume turbulence profile and a second to determine the actuator commands. Solution of the first system is referred to as the estimation step, or tomography, and solution of the second is called the fitting step.

With planning for a future 30-meter class telescope (TMT) that will rely on MCAO underway [4], it is important to be able to solve both the estimation step and the fitting step very quickly. Since TMT will require tens of thousands of sensors and actuators, conventional wavefront reconstructors, which are based on direct matrix-vector multiplication, may be prohibitively expensive. This has motivated the development of alternative approaches that rely on sparse direct matrix equation solvers [5] and on iterative methods like the preconditioned conjugate gradient algorithm (PCG) [6] with multigrid preconditioning (MG-PCG) [7, 8, 9] and Fourier domain preconditioning (FD-PCG) [10].

It should be noted that these iterative methods yield approximations to the optimal wavefront reconstruction. In simulations [9, 10] both MG-PCG and FD-PCG have been shown to be very rapidly convergent, so few iterations are needed. In addition, the cost per iteration is low, so both algorithms provide essentially optimal reconstructions at relatively low cost.

This paper is a follow-up to [10], where FD-PCG was first introduced. We deal with some important issues that either were inadequately addressed or not considered at all in the original paper. These include the fitting step, the cone-coordinate transformation problem, and problems related to computational grids that do not correspond to sensor subaperture corner locations. The cone-coordinate problem is especially important; failure to address it correctly will lead to field distortions in images obtained after MCAO correction.

This paper is organized as follows. In section 2 we review basic MCAO concepts and introduce notation. Section 3 contains a review of optimal volume turbulence estimation. In subsection 3.1 we present the cone-coordinate transformation problem. In subsections 4.1 and 4.2 we review the basic ideas underlying the FD-PCG algorithm and we present 2 different implementations of the algorithm. In subsection 4.3 we describe how to deal with computational grids that are finer than the sensor subaperture grid. Section 5 contains a review of the fitting step and an outline of how to efficiently apply FD-PCG to fitting. In subsection 5.1 we demonstrate FD-PCG fitting for a simulated MCAO system for a 30-meter class telescope. We close with a summary in section 6.

2. Basic Concepts and Notation

We assume a layered atmosphere with a refractive index profile which we denote by ψ(x,z_l ), l = 1,…,n _layer, where the discrete layer heights z_l are known, and x = (x,y) denotes lateral position. Since rapid refractive index variations are concentrated in regions of high turbulence, ψ is often referred to as the atmospheric volume turbulence profile. (Note that one can obtain the $C_{n}^{2}$ profile, a standard measure of turbulence strength as a function of height z, by integrating the square of ψ.) We further assume that at any instant in time, ψ is a realization of a wide-sense stationary stochastic process characterized by a Kolmogorov power spectral density function. See [11] for details.

The phase, or wavefront aberration, is obtained by propagating light from an idealized point light source, either at infinity in the case of a natural guidestar (NGS) or at a known finite altitude in the case of a laser guidestar (LGS), through the atmosphere to the pupil plane of the telescope. We denote dependence of phase on volume turbulence by

ϕ (x, θ) = [Pψ] (x, θ),

where θ denotes direction of the guidestar and x denotes location in the pupil plane. Using a geometric optics approximation, we model the propagator P as

[Pψ] (x, θ) = \sum_{ℓ = 1}^{n_{layer}} ψ (x + z_{ℓ} θ, z_{ℓ}) .

In our model, both ψ and ϕ represent optical path differences and have units of length.
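As an illustration of Eq. (1), the following is a minimal sketch rather than the implementation used in this paper. It assumes periodic layers and shifts z_ℓ θ that are whole multiples of the grid spacing, so that each layer contribution is a circular shift. The function name `propagate` and its argument layout are hypothetical.

```python
import numpy as np

def propagate(layers, heights, theta, dx):
    """Geometric propagation: phi(x) = sum_l psi(x + z_l * theta, z_l).

    layers  : array (n_layer, ny, nx) of nodal values of psi, indexed [l, j, i]
    heights : layer altitudes z_l (meters)
    theta   : (theta_x, theta_y) guidestar direction (radians)
    dx      : lateral grid spacing (meters)
    Assumes periodic layers and shifts that are integer multiples of dx.
    """
    phi = np.zeros(layers.shape[1:])
    for psi_l, z in zip(layers, heights):
        kx = int(round(z * theta[0] / dx))  # shift in grid cells along x
        ky = int(round(z * theta[1] / dx))  # shift in grid cells along y
        # np.roll with a negative shift gives out[j, i] = psi_l[j + ky, i + kx]
        phi += np.roll(psi_l, shift=(-ky, -kx), axis=(0, 1))
    return phi
```

A non-periodic or sub-grid shift would need interpolation instead of `np.roll`; that case is not covered by this sketch.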

We assume that wavefront sensor measurements can be modelled as noisy, discrete approximations to the gradient of the phase. For example, suppressing the dependence on direction θ, the output of the (i,j)th element of a Shack-Hartmann wavefront sensor can be modelled [12] as

s_{i, j}^{x} = \frac{1}{h} [\frac{(ϕ (x_{i} + h, y_{j}) - ϕ (x_{i}, y_{j}))}{2} + \frac{(ϕ (x_{i} + h, y_{j} + h) - ϕ (x_{i}, y_{j} + h))}{2}] + η_{i, j}^{x},

where h represents the sensor subaperture width and $η_{i, j}^{x}$ and $η_{i, j}^{y}$ represent sensor noise and grid discretization effects. Thus $s_{i, j}^{x} \approx \frac{\partial ϕ}{\partial x} (x_{i}, y_{j})$ and $s_{i, j}^{y} \approx \frac{\partial ϕ}{\partial y} (x_{i}, y_{j})$ . We use the symbol Γ to represent the (discrete gradient) mapping from phase ϕ to sensor measurements s = (s^x , s^y ).
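The Fried-geometry model in Eq. (2) can be sketched as follows. This is an illustrative example only: the function name `fried_slopes` is hypothetical, noise is omitted, and the y-slope is written by symmetry with the x-slope formula above.

```python
import numpy as np

def fried_slopes(phi, h):
    """Noise-free Fried-geometry slopes from phase values at subaperture corners.

    phi : array (ny + 1, nx + 1), phi[j, i] ~ phi(x_i, y_j)
    h   : subaperture width
    Returns (sx, sy), each of shape (ny, nx).
    """
    # x-differences along the bottom and top edges of each subaperture
    dx_bottom = phi[:-1, 1:] - phi[:-1, :-1]
    dx_top = phi[1:, 1:] - phi[1:, :-1]
    sx = (dx_bottom + dx_top) / (2.0 * h)
    # y-differences along the left and right edges (by symmetry with sx)
    dy_left = phi[1:, :-1] - phi[:-1, :-1]
    dy_right = phi[1:, 1:] - phi[:-1, 1:]
    sy = (dy_left + dy_right) / (2.0 * h)
    return sx, sy
```

For a pure tilt ϕ = a x + b y the sketch returns s^x = a and s^y = b on every subaperture, which is a convenient sanity check.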

Discretization in the direction variable θ arises naturally because of the finite number of guidestars in a practical MCAO system. We impose a nodal discretization on ψ in the lateral variable x at each layer z_l and nodal discretization of ϕ in each (discrete) direction θ. Hence the operators P and Γ have discrete matrix (approximate) representations. In addition, the Kolmogorov spectrum for (discretized) volume turbulence has a block diagonal matrix representation in the Fourier domain, with one diagonal block for each layer, due to independence of the layers.

In order to efficiently implement computational techniques based on the discrete Fourier transform, it is necessary to impose equispaced rectangular grids on the lateral variable x = (x,y) at each altitude z_l. However, telescope apertures are typically circular or annular. Hence it is necessary to introduce an additional masking operator M. We take [Ms](x_i, y_j) to equal s(x_i, y_j) if (x_i, y_j) lies within the aperture and we take it to equal zero for (x_i, y_j) outside the telescope aperture. In an abuse of notation, we will also refer to the masking function whose value M(x_i, y_j) = 1 for (x_i, y_j) inside the aperture and M(x_i, y_j) = 0 for (x_i, y_j) outside the aperture.
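A masking function of this kind might be built as in the sketch below. The names `aperture_mask` and `apply_mask`, and the choice of a grid centered on the optical axis, are assumptions made for illustration.

```python
import numpy as np

def aperture_mask(n, dx, outer_radius, inner_radius=0.0):
    """0/1 mask on an n x n equispaced grid centered on the optical axis.

    Grid points with inner_radius <= r <= outer_radius get the value 1,
    so inner_radius > 0 gives an annular aperture.
    """
    coords = (np.arange(n) - (n - 1) / 2.0) * dx
    X, Y = np.meshgrid(coords, coords)
    R = np.hypot(X, Y)
    return ((R <= outer_radius) & (R >= inner_radius)).astype(float)

def apply_mask(M, s):
    """[Ms](x_i, y_j): keep s inside the aperture, set it to zero outside."""
    return M * s
```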

The composition of discretized propagation, discrete gradient (corresponding to sensing), and masking will be represented by the volume turbulence-to-sensor measurement operator

G = M Γ P .

By atmospheric turbulence tomography, or volume turbulence estimation, we mean the task of estimating the (discretized) volume turbulence profile ψ from noisy sensor measurements

s = Gψ + η .

In the implementation of the fitting step, one first propagates the light to the pupil plane from guidestars at infinity in several specified sample directions to obtain an estimated phase, ϕ _est. The sampling must be dense enough to well-represent the phase, but not so dense as to make the computations overly expensive. The estimation plus this first stage of the fitting step can together be viewed as a glorified interpolation scheme to extend the phase to directions that have not been sampled by the guidestars.

The next stage of fitting is to determine deformable mirror (DM) actuator commands in order to compensate for the estimated phase. To this end, let m(x,z_k ) denote the displacement of the DM at conjugate altitude z_k . The correction for pupil-plane phase aberrations due to the DMs can be represented as

ϕ^{DM} = Pm .

Here P is again a geometric optics propagation operator with a representation as in Eq. (1); the DM displacements are analogous to turbulence layers placed at the DM conjugate altitudes. We assume the DM displacements can be represented as

m (x, z_{k}) = [H_{k} a_{k}] (x), k = 1, \dots, n_{DM} .

The vector a _k represents the actuator commands for the kth DM, and H_k is the actuator-to-DM influence operator. We represent the combination of Eqs. (4) and (5) as

ϕ^{DM} = PH a,

where a = (a_1,…,a_{n_DM}) is the concatenation of the actuator commands. As in the estimation step, nodal discretization of the DM displacements gives rise to a discrete matrix to represent H. The second stage of fitting can then be formulated as follows: Find the actuator command vector a for which PH a best matches (in a sense that we will precisely define in section 5) the estimated phase ϕ_est.

3. Turbulence Estimation for an LGS-NGS MCAO System

To obtain sufficiently bright light sources to perform accurate volume turbulence profile estimation, it is necessary to make use of LGS beacons. Unfortunately, these beacons provide limited information about certain low-order components of the profile like tip-tilt, so additional NGS beacons are needed [5]. In this section we address technical issues that arise from a mixture of LGS and NGS beacons.

As in the previous section, we let ψ represent a nodal discretization of the volume turbulence profile. We separate wavefront sensor measurements into high-order components s_h (from the LGSs) and low-order components s_t (from the NGSs),

s = [\begin{matrix} s_{h} \\ s_{t} \end{matrix}] = [\begin{matrix} G_{h} ψ \\ G_{t} ψ \end{matrix}] + [\begin{matrix} η_{h} \\ η_{t} \end{matrix}] .

Each of G_h , G_t has form similar to G in Eq. (3). From [3], the optimal volume turbulence estimate is then given by

ψ_{est} = \underset{ψ}{\arg min} {{∥ G_{h} ψ - s_{h} ∥}_{C_{h}^{- 1}}^{2} + {∥ G_{t} ψ - s_{t} ∥}_{C_{t}^{- 1}}^{2} + {∥ ψ ∥}_{C_{ψ}^{- 1}}^{2}}

where ${∥ x ∥}_{A}^{2} \overset{def}{=} x^{T} A x$, C_h is the covariance matrix for high-order sensor noise, C_t is the covariance matrix for low-order sensor noise, and C_ψ is the covariance matrix for volume turbulence. The first two terms in (8) correspond to the pair of entries in (7), while the third is a prior, or regularization term, whose role is to incorporate prior statistical information about the volume turbulence.

By setting the gradient of the bracketed quantity in Eq. (8) equal to zero, we obtain

ψ_{est} = {(G_{h}^{T} C_{h}^{- 1} G_{h} + G_{t}^{T} C_{t}^{- 1} G_{t} + C_{ψ}^{- 1})}^{- 1} (G_{h}^{T} C_{h}^{- 1} s_{h} + G_{t}^{T} C_{t}^{- 1} s_{t})

= [\begin{matrix} C_{ψ} G_{h}^{T} & C_{ψ} G_{t}^{T} \end{matrix}] {[\begin{matrix} G_{h} C_{ψ} G_{h}^{T} + C_{h} & G_{h} C_{ψ} G_{t}^{T} \\ G_{t} C_{ψ} G_{h}^{T} & G_{t} C_{ψ} G_{t}^{T} + C_{t} \end{matrix}]}^{- 1} [\begin{matrix} s_{h} \\ s_{t} \end{matrix}]

Following [5, 10], we express the covariance matrix for high-order noise in the presence of tip-tilt uncertainty as

C_{h}^{- 1} = N_{h}^{- 1} - N_{h}^{- 1} T {(T^{T} N_{h}^{- 1} T)}^{- 1} T^{T} N_{h}^{- 1}

where N_h describes the noise within the high-order sensors, and T is a low-rank block matrix with 2n_LGS columns (n_LGS denotes the number of LGSs). The product $C_{h}^{- 1} s_h$ can be interpreted as a noise-weighted, tip-tilt-removed version of the high-order sensor signals s_h. We then rewrite Eq. (9) as

ψ_{est} = {(A_{h} - A_{lr})}^{- 1} (b_{h} + b_{t})

where

A_{h} = G_{h}^{T} N_{h}^{- 1} G_{h} + C_{ψ}^{- 1},

A_{lr} = G_{h}^{T} N_{h}^{- 1} T {(T^{T} N_{h}^{- 1} T)}^{- 1} T^{T} N_{h}^{- 1} G_{h} - G_{t}^{T} C_{t}^{- 1} G_{t},

b_{h} = G_{h}^{T} C_{h}^{- 1} s_{h}, b_{t} = G_{t}^{T} C_{t}^{- 1} s_{t} .

The two matrix products on the right hand side of Eq. (14) have rank 2n _LGS and n_t , respectively, where n_t is the total number of low-order sensor measurements. Hence we can decompose

A_{lr} = U_{1} U_{1}^{T} - U_{2} U_{2}^{T},

where U_1 has 2n_LGS columns and U_2 has n_t columns. On the other hand, the matrix A_h in Eq. (13) has full rank. The Sherman-Morrison formula [6] allows us to write

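{(A_{h} - A_{lr})}^{- 1} = {[A_{h} - (\begin{matrix} U_{1} & U_{2} \end{matrix}) (\begin{matrix} U_{1}^{T} \\ - U_{2}^{T} \end{matrix})]}^{- 1} = A_{h}^{- 1} + L, with L = (\begin{matrix} W_{1} & W_{2} \end{matrix}) {[I - (\begin{matrix} U_{1}^{T} \\ - U_{2}^{T} \end{matrix}) (\begin{matrix} W_{1} & W_{2} \end{matrix})]}^{- 1} (\begin{matrix} W_{1}^{T} \\ - W_{2}^{T} \end{matrix}),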

where

W_{1} = A_{h}^{- 1} U_{1}, W_{2} = A_{h}^{- 1} U_{2} .

Then setting b = b_h + b_t, we obtain from (12) and (17)

ψ_{est} = A_{h}^{- 1} b + Lb .

From (17) we see that the product Lb can be computed by (i) taking a few dot products to obtain $W_{1}^{T} b$ and $- W_{2}^{T} b$; (ii) applying the inverse of a small (2n_LGS + n_t) × (2n_LGS + n_t) matrix to the result of (i); and (iii) adding together scalar multiples of the columns of W_1, W_2, where the scalars come from (ii). The product $A_{h}^{- 1} b$ is computed iteratively using FD-PCG.
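The low-rank correction can be checked numerically on a small dense problem. The sketch below is illustrative only: all matrices are random stand-ins, the noise covariances are taken to be scalar multiples of the identity, the prior is replaced by the identity, and the low-rank matrix L is written out using the standard Sherman-Morrison-Woodbury identity, $(A - UV^{T})^{-1} = A^{-1} + A^{-1} U (I - V^{T} A^{-1} U)^{-1} V^{T} A^{-1}$. The variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mh, mt, nT = 12, 20, 4, 2            # unknowns, high/low-order data, tip-tilt modes
Gh = rng.standard_normal((mh, n))       # stand-in for G_h
Gt = rng.standard_normal((mt, n))       # stand-in for G_t
T = rng.standard_normal((mh, nT))       # stand-in for the tip-tilt matrix T
sig_h, sig_t = 0.5, 0.3                 # assumed scalar noise levels
Nh_inv = np.eye(mh) / sig_h**2
Ct_inv = np.eye(mt) / sig_t**2
Cpsi_inv = np.eye(n)                    # stand-in for the turbulence prior

# Eq. (11): tip-tilt-removed inverse covariance
K = np.linalg.inv(T.T @ Nh_inv @ T)
Ch_inv = Nh_inv - Nh_inv @ T @ K @ T.T @ Nh_inv

# Eqs. (13) and (16): full-rank part and the factors of A_lr
Ah = Gh.T @ Nh_inv @ Gh + Cpsi_inv
U1 = Gh.T @ Nh_inv @ T @ np.linalg.cholesky(K)   # U1 U1^T = first term of A_lr
U2 = Gt.T / sig_t                                # U2 U2^T = G_t^T C_t^{-1} G_t

sh = rng.standard_normal(mh)
st = rng.standard_normal(mt)
b = Gh.T @ Ch_inv @ sh + Gt.T @ Ct_inv @ st      # b = b_h + b_t

# W = [W1 W2] = A_h^{-1} [U1 U2]; in practice each column is a precomputed solve
W = np.linalg.solve(Ah, np.hstack([U1, U2]))
Vt = np.vstack([U1.T, -U2.T])
small = np.eye(nT + mt) - Vt @ W                 # small (2 n_LGS + n_t)-sized matrix
dots = np.concatenate([W[:, :nT].T @ b, -W[:, nT:].T @ b])   # step (i)
coeffs = np.linalg.solve(small, dots)                        # step (ii)
psi_lowrank = np.linalg.solve(Ah, b) + W @ coeffs            # A_h^{-1} b + L b

# Reference: solve the full system of Eq. (9) directly
A_full = Gh.T @ Ch_inv @ Gh + Gt.T @ Ct_inv @ Gt + Cpsi_inv
psi_direct = np.linalg.solve(A_full, b)
```

In this sketch `np.linalg.solve(Ah, b)` stands in for the FD-PCG solve; only the small dense system in step (ii) is inverted directly.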

3.1. The Cone-Coordinate Transformation

With geometric optics propagation, the contribution to the pupil-plane phase at position x due to light that has passed through a turbulence layer at altitude z_l from a guidestar at height H with orientation θ is proportional to

ψ (c x + s, z_{ℓ}) \overset{def}{=} {\tilde{ψ}}_{ℓ} (\tilde{x} + \tilde{s}) .

Here c = 1 - z_l /H and s = z_lθ are the cone compression factor and the shift, respectively, and the mapping

x \mapsto c x \overset{def}{=} \tilde{x} .

is referred to as the cone coordinate transformation.
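For a concrete feel for the compression factor, the sketch below evaluates c = 1 - z_l/H and the resulting cone-coordinate grid spacing. The helper name `cone_grid` and the example heights are hypothetical; only the formula for c comes from the text.

```python
import numpy as np

def cone_grid(n, dx, z_layer, H):
    """Cone compression factor and 1-D cone-coordinate grid for one layer.

    n       : number of grid points
    dx      : grid spacing in the pupil plane
    z_layer : layer altitude z_l
    H       : guidestar altitude
    Returns (c, x_tilde) with x_tilde_i = c * i * dx.
    """
    c = 1.0 - z_layer / H
    return c, c * dx * np.arange(n)

# Example: a layer halfway up to the guidestar has half the grid spacing
c, x_tilde = cone_grid(5, 0.5, 45e3, 90e3)
```

The spacing of `x_tilde` shrinks linearly with layer height, which is the behavior of the cone-coordinate grid described in the caption of Fig. 1.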

From Eq. (20) we see that the cone coordinate transformation allows us to replace the combination of a rescaling and a shift with a simple shift. This simplifies the representation of the high-order geometric optics propagator P_h and facilitates the FD-PCG method. Figure 1 illustrates the discrete grids that conform to cone coordinates and to standard coordinates.

Fig. 1. Illustration of cone-coordinate grid vs conventional equispaced grid. Cone-coordinate grid points in figure on the left have grid spacings that decrease with layer height. The conventional grid on the right has the same number of grid points at each layer height, but the grid spacing does not vary with layer height.

The difficulty that arises with a (nodal) cone coordinate representation of the volume turbulence profile is that the low-order propagator P_t then has a more complicated representation. To overcome this problem, we interpolate back to standard coordinates before applying P_t . In cone coordinates, Eq. (8) takes the form

{\tilde{ψ}}_{est} = \underset{\tilde{ψ}}{\arg min} {{∥ {\tilde{G}}_{h} \tilde{ψ} - s_{h} ∥}_{C_{h}^{- 1}}^{2} + {∥ G_{t} I_{c} \tilde{ψ} - s_{t} ∥}_{C_{t}^{- 1}}^{2} + {∥ \tilde{ψ} ∥}_{C_{\tilde{ψ}}^{- 1}}^{2}}

The G̃_h in the first term on the right hand side denotes the cone-coordinate representation of G_h in Eq. (7). The I_c in the second term denotes the transformation from cone coordinates back to standard coordinates. In practice all operators have discrete matrix representations and I_c is a 2-D interpolation matrix, which is very sparse. In cone coordinates, Eqs. (12)-(15) must be slightly modified, with G̃_h taking the place of G_h and G_t I_c taking the place of G_t. Given ${\tilde{ψ}}_{est}$, one computes

ψ_{est} = I_{c} {\tilde{ψ}}_{est}

to get an estimate for the volume turbulence profile in standard coordinates.

4. FD-PCG Implementation of the Estimation Step

The effectiveness of the FD-PCG algorithm is a consequence of the fact that the components of the matrix A_h in Eq. (13) nearly all have nice Fourier domain representations. In particular, the discrete gradient operator Γ corresponding to the Fried model (2) has a diagonal Fourier representation due to the Fourier shift theorem. The Fourier representer for the inverse Kolmogorov turbulence covariance matrix $C_{ψ}^{- 1}$ is block diagonal with diagonal blocks. In cone coordinates, the high-order propagator P_h has a Fourier representer which has a block decomposition with diagonal blocks, again as a consequence of the shift theorem. The mask M unfortunately does not have a compact representation, but it can be well-approximated by a scalar multiple of a diagonal matrix, as we demonstrated in [10].

We decompose the propagator P_h into blocks P_kj, where index k corresponds to LGS direction and index j corresponds to turbulence layer. Then from Eqs. (13) and (3), A_h also has a block decomposition,

{[A_{h}]}_{ij} = \sum_{k = 1}^{n_{LGS}} P_{ki}^{T} S_{k} P_{kj} + δ_{ij} B_{j}, i, j = 1, \dots, n_{layer},

with

S_{k} = Γ_{x}^{T} M N_{h}^{- 1} M Γ_{x} + Γ_{y}^{T} M N_{h}^{- 1} M Γ_{y}, k = 1, \dots, n_{LGS},

where δ_ij is the Kronecker delta, the regularization term B_j corresponds to the inverse covariance matrix for the jth layer of turbulence, and Γ_x, Γ_y are the x- and y-components of the discrete gradient Γ.

In cone coordinates, each component propagator P_kj is a simple shift operator. Hence it has a Fourier representation

P_{kj} = F^{- 1} {\hat{P}}_{kj} F,

where F represents the 2-D discrete Fourier transform on an n_x × n_x grid and P̂_kj is a simple filter (component-wise scalar multiplication) operator. We assume the components Γ_x, Γ_y of the gradient operator have analogous Fourier-domain filter representations Γ̂_x, Γ̂_y. In addition, we assume the tip-tilt removed noise covariance is scalar, $N_{h}^{- 1} = σ_{h}^{- 2} I$. Then

{[A_{h}]}_{ij} = F^{- 1} {\hat{A}}_{ij} F with {\hat{A}}_{ij} \overset{def}{=} \sum_{k} {\hat{P}}_{ki}^{*} {\hat{S}}_{k} {\hat{P}}_{kj} + δ_{ij} {\hat{B}}_{j},

where B̂_j = F B_j F^{-1}, ∗ denotes complex conjugate transpose, and

{\hat{S}}_{k} = σ_{h}^{- 2} ({\hat{Γ}}_{x}^{*} \hat{M} {\hat{Γ}}_{x} + {\hat{Γ}}_{y}^{*} \hat{M} {\hat{Γ}}_{y}), \hat{M} \overset{def}{=} F M F^{- 1},

\approx σ_{h}^{- 2} ({∣ {\hat{Γ}}_{x} ∣}^{2} + {∣ {\hat{Γ}}_{y} ∣}^{2}) \overset{def}{=} {\tilde{S}}_{k} .

As in [10], we have approximated the Fourier transformed pupil mask M̂ by a scalar multiple of the identity to obtain (29) from (28). For simplicity of presentation we have taken the scalar to be 1 in Eq. (29). We will revisit the issue of masking in Section 4.3 below.
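The role of the shift theorem can be illustrated with a short sketch. It assumes a periodic grid and whole-cell shifts, and checks that a shift applied in the spatial domain equals a component-wise filter applied in the Fourier domain, as in Eq. (26). The function names are hypothetical.

```python
import numpy as np

def shift_filter(n, kx, ky):
    """Fourier multiplier of the periodic shift u[j, i] -> u[j + ky, i + kx]."""
    f = np.fft.fftfreq(n)          # frequencies in cycles per sample
    FX, FY = np.meshgrid(f, f)     # FX varies along axis 1, FY along axis 0
    return np.exp(2j * np.pi * (kx * FX + ky * FY))

def apply_shift_fourier(u, kx, ky):
    """Apply the shift as F^{-1} P_hat F, with P_hat a component-wise filter."""
    n = u.shape[0]
    return np.fft.ifft2(shift_filter(n, kx, ky) * np.fft.fft2(u)).real

def apply_shift_spatial(u, kx, ky):
    """Apply the same shift directly on the grid."""
    return np.roll(u, shift=(-ky, -kx), axis=(0, 1))
```

Because the filter has unit modulus, the product P̂* P̂ is the identity filter, which is one reason the blocks of Â in Eq. (27) are inexpensive to form.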

4.1. Direct Implementation of FD-PCG

PCG iteration to solve A_h x = b requires storage of 4 vectors, each the size of the right-hand-side b; see [6] for details. The dominant costs at each iteration are typically from the operator multiplications h = A_h d and the preconditioner applications z = C^{-1} r.
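For reference, a textbook form of PCG iteration is sketched below. This is a generic version written for illustration, not the code used for the simulations in this paper; the function name `pcg`, the callback arguments, and the stopping tolerance are assumptions.

```python
import numpy as np

def pcg(apply_A, b, apply_Cinv, n_iter, tol=1e-12):
    """Preconditioned conjugate gradient for A x = b, A symmetric positive definite.

    apply_A    : function d -> A d           (operator multiplication h = A d)
    apply_Cinv : function r -> C^{-1} r      (preconditioner application)
    n_iter     : maximum number of iterations
    tol        : stop when ||r|| <= tol * ||b||
    """
    x = np.zeros_like(b)
    b_norm = np.sqrt(np.vdot(b, b).real)
    if b_norm == 0.0:
        return x
    r = b.copy()                      # residual b - A x for the zero initial guess
    z = apply_Cinv(r)
    d = z.copy()                      # search direction
    rz = np.vdot(r, z)
    for _ in range(n_iter):
        h = apply_A(d)
        alpha = rz / np.vdot(d, h)    # step length
        x = x + alpha * d
        r = r - alpha * h
        if np.sqrt(np.vdot(r, r).real) <= tol * b_norm:
            break
        z = apply_Cinv(r)
        rz_new = np.vdot(r, z)
        d = z + (rz_new / rz) * d     # new conjugate direction
        rz = rz_new
    return x
```

The persistent vectors are x, r, and d, plus a work vector that can be shared between h and z, in line with the storage count quoted above. With `apply_Cinv` set to the identity the routine reduces to plain conjugate gradients.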

PCG iteration can be applied directly to compute x = $A_{h}^{- 1} b$ in Eq. (19). In this case, motivated by Eq. (27) and approximation (29), we take the preconditioner to be the matrix C having block components

{[C]}_{ij} = F^{- 1} {\hat{C}}_{ij} F with {\hat{C}}_{ij} = \sum_{k = 1}^{n_{LGS}} {\hat{P}}_{ki}^{*} {\tilde{S}}_{k} {\hat{P}}_{kj} + δ_{ij} {\hat{B}}_{i}.

Let Ĉ denote the matrix comprised of blocks Ĉ_ij. We demonstrated in [10] that there exists a permutation, or reordering of rows and columns, for which Ĉ is block diagonal and the diagonal block size is quite small. (Diagonal block size will be addressed in detail in Section 4.3 below.) Hence Ĉ can be inverted directly and efficiently stored. The preconditioning step z = C^{-1} r is then implemented as follows:

(i) 2-D Fourier transforms are applied to the blocks of r (note that blocks correspond to turbulence layers; there are n_layer blocks), yielding a vector r̂.
(ii) The entries of r̂ are permuted, yielding a vector r̃.
(iii) r̃ is multiplied by the block diagonal inverse of the reordered Ĉ, yielding z̃.
(iv) The inverse permutation is applied to z̃, yielding ẑ.
(v) Inverse Fourier transforms are applied to the blocks of ẑ to obtain z.
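The five steps above can be sketched as follows for the simplest case, in which the computational grid coincides with the subaperture grid so that each diagonal block of the reordered Ĉ is n_layer × n_layer (one block per spatial frequency). The function name and array layout are hypothetical, and the reordering by spatial frequency is expressed as an axis permutation.

```python
import numpy as np

def apply_preconditioner(r, Chat_inv):
    """z = C^{-1} r via layer-wise FFTs and small per-frequency block products.

    r        : real array (n_layer, n, n), one block per turbulence layer
    Chat_inv : complex array (n, n, n_layer, n_layer); Chat_inv[fy, fx] is the
               precomputed inverse of the small diagonal block at that frequency
    """
    r_hat = np.fft.fft2(r, axes=(1, 2))                       # (i) layer-wise FFTs
    r_tilde = np.moveaxis(r_hat, 0, -1)                       # (ii) group by frequency
    z_tilde = np.einsum('yxij,yxj->yxi', Chat_inv, r_tilde)   # (iii) block products
    z_hat = np.moveaxis(z_tilde, -1, 0)                       # (iv) undo the grouping
    return np.fft.ifft2(z_hat, axes=(1, 2)).real              # (v) inverse FFTs
```

If the blocks of Ĉ are stored in an array `Chat` with the same layout, the inverse blocks could be formed once with `np.linalg.inv(Chat)`, which inverts each trailing n_layer × n_layer matrix independently. Taking the real part in step (v) assumes that Ĉ represents a real operator.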

In the direct implementation of PCG, computations h = A_h d are carried out as sparse matrix-vector multiplications using the block decomposition (24). The propagators P_kj correspond to simple shift-and-adds and need not be stored, and the matrices S_k are very sparse. We employ Ellerbroek’s biharmonic approximation to the inverse turbulence covariance [5], so the regularization matrices B_j are also very sparse. Computations z = C^{-1} r are dominated by the layer-wise 2-D Fourier transforms in steps (i) and (v).

4.2. Transformed Implementation of FD-PCG and Comparison with Direct Implementation

Alternatively, one can apply PCG to the transformed system Âx̂ = b̂, where Â has blocks given in (27) and b̂ has blocks b̂_j = F[b]_j. In this case the preconditioning step ẑ = Ĉ^{-1} r̂ is carried out as in the direct PCG approach, but the Fourier transforms in steps (i) and (v) are omitted.

To carry out the step ĥ = Âd̂, we see from (27) that Fourier domain propagations can be carried out as component-wise product, or filter operations, involving (complex-valued) scalar multiplications. Similarly, the regularization terms B̂_i give rise to filtering operations. However, we see from Eq. (28) that since the transformed pupil mask M̂ has a full matrix representation, a Fourier transform / inverse Fourier transform pair is required for multiplication by each Ŝ_k .

If n _layer = n _LGS, then the number of Fourier transforms applied during each PCG iteration is the same for the direct implementation and the transformed implementation. The cost of propagation and regularization is very similar using both approaches. However, the transformed implementation requires n _layer additional Fourier transforms to obtain b̂ from b and n _layer additional inverse Fourier transforms to obtain x from x̂. Moreover, the transformed approach requires 4 complex-valued PCG vectors, while the direct approach requires 4 real-valued vectors, which need half as much storage.

4.3. Grid Masking for High Order Sensor Subapertures

In our model for high-order sensors we assume sensor subapertures are square and have vertices (corners) which lie on the computational grid. To facilitate very accurate turbulence profile estimation, we take a computational grid that contains additional points that are not subaperture vertices. In order for the preconditioner C to closely approximate the operator A_h , we must incorporate information about this grid structure in the Fourier domain. To this end we extend the subaperture vertex grid to the entire computational domain, as shown in Fig. 2, and we define the subaperture mask

M_{S} (x) = \begin{cases} 1, & \text{if } x \text{ is a subaperture vertex}, \\ 0, & \text{otherwise}. \end{cases}

We then replace the S̃_k in Eq. (29) with

{\tilde{S}}_{k} = σ_{h}^{- 2} ({\hat{Γ}}_{x}^{*} {\hat{M}}_{S} {\hat{Γ}}_{x} + {\hat{Γ}}_{y}^{*} {\hat{M}}_{S} {\hat{Γ}}_{y}), {\hat{M}}_{S} = F M_{S} F^{- 1},

and we construct the preconditioner C from Eq. (30). As in [10], we reorder the rows and columns of the Fourier representer Ĉ according to spatial frequency, so the permuted Ĉ is block diagonal with diagonal blocks that are n_b × n_b , where n_b = n _layer(Δs/Δx)², with Δs equal to the sensor subaperture spacing and Δx equal to the spacing of the computational grid. For example, for a 6-layer atmospheric model with the grid shown in Fig. 2, Δs/Δx = 2 and the diagonal blocks of the permuted Ĉ are 24 × 24.

Fig. 2. Sensor subaperture grid for a simulated 4-meter telescope. Green circles represent points in the computational grid. Blue stars represent vertices of 1/2 meter × 1/2 meter square high-order sensor subapertures. The red circle represents the outer edge of the clear aperture, or pupil.

5. FD-PCG for the Fitting Step

Given a volume turbulence profile ψ _est, one propagates to the pupil plane to obtain an estimated phase, ϕ _est. Slightly modifying the notation introduced at the end of section 2, we model the masked DM corrections to the phase as

ϕ^{DM} = M P_{DM} H a,

where M is a pupil mask, a is the actuator command vector, H a represents DM figures at the conjugate altitudes, and P_DM represents propagation from the n_sample virtual guidestars, through “layers” that correspond to DM displacements at the conjugate altitudes, to the pupil plane. P_DM has an n_sample × n_DM block decomposition, the actuator influence operator H is block diagonal with n_DM blocks, and a can be decomposed into n_DM blocks.

The optimal actuator command vector is given by

a_{opt} = \arg \min_{a} {{∥ M P_{DM} H a - ϕ_{est} ∥}_{W}^{2} + {∥ a ∥}_{R}^{2}} = A_{fit}^{- 1} H^{T} P_{DM}^{T} M W ϕ_{est},

where

A_{fit} = H^{T} P_{DM}^{T} MWM P_{DM} H + R .

The purpose of the matrix W is to allow selective weighting in certain sampling directions, and the role of R is to stabilize the actuator commands. One can also accommodate fast tip-tilt mirrors or other low-order corrective elements. These give rise to low-rank terms and can be handled as in the estimation step using the Sherman-Morrison formula.
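Setting the gradient of the fitting cost to zero gives the normal equations A_fit a = H^T P_DM^T M W ϕ_est. The sketch below checks this on a small dense problem; all matrices are random stand-ins with hypothetical names, and a direct solve replaces FD-PCG.

```python
import numpy as np

rng = np.random.default_rng(1)
n_phase, n_act = 30, 8
PH = rng.standard_normal((n_phase, n_act))               # stand-in for P_DM H
M = np.diag((rng.random(n_phase) > 0.3).astype(float))   # stand-in 0/1 pupil mask
W = np.eye(n_phase)                                      # uniform direction weighting
R = 1e-2 * np.eye(n_act)                                 # actuator regularization
phi_est = rng.standard_normal(n_phase)                   # stand-in estimated phase

# A_fit as in the text, and the right-hand side from the zero-gradient condition
A_fit = PH.T @ M @ W @ M @ PH + R
rhs = PH.T @ M @ W @ phi_est
a_opt = np.linalg.solve(A_fit, rhs)

def cost(a):
    """Weighted fitting error plus regularization penalty."""
    res = M @ PH @ a - phi_est
    return res @ W @ res + a @ R @ a

# Half the gradient of the cost at a_opt; it should vanish
half_grad = PH.T @ M @ W @ (M @ PH @ a_opt - phi_est) + R @ a_opt
```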

The matrix A_fit takes the form of A_h in Eq. (13) provided we identify MP_DM H, W, and R in (34) with G_h, $N_{h}^{- 1}$, and $C_{ψ}^{- 1}$ in (13). In order to efficiently implement FD-PCG, components of A_fit must be well-approximated by matrices with sparse Fourier domain representations. For this reason, we select W and R to have blocks with translation invariant structure (the matrix representers are then block circulant with circulant blocks, or BCCB). The propagator P_DM should automatically be translation invariant, and many DM models assume that the blocks of H are translation invariant. As was the case in the estimation step, only the pupil mask does not have a sparse Fourier representer, but it can be approximated as before. We can again employ either of the two FD-PCG solution strategies described in Sections 4.1 and 4.2.

5.1. An Illustrative Example

We present results for the fitting step applied to the simulated MCAO system for a 30-meter telescope described in [10]. This incorporates a 6-layer Cerro Pachon model for the atmosphere, 5 laser guidestars at 90 km, 4 natural guidestars to deal with tip-tilt uncertainty, two DMs (one conjugate to ground and the second to 12 km) with a corrected field of view of 30 arcseconds, and 1/2 meter sensor subaperture and actuator spacings. For the estimation step we now use 256 × 256 computational grids at each of the 6 layers, with twice the grid resolution in [10] (1/4 meter grid spacing on a square domain with a 64 meter width). The performance of the FD-PCG algorithm for estimation is much the same as in [10], so we present no additional details here.

We propagated the estimated volume turbulence profile to the pupil plane and then carried out the rest of the fitting step using the transformed implementation of the FD-PCG algorithm; see Section 4.2. The actuator influence functions, which determine the matrix H in Eq. (5), are taken to be bilinear splines with control points that coincide with the high-order sensor subaperture vertices (see Fig. 2). The matrix H has translation invariant structure and is quite sparse. FD-PCG performance is summarized in Fig. 3.

Let a_k denote the approximation to a_opt in Eq. (33) after k FD-PCG iterations, and let $ϕ_{k}^{DM}$ denote the corresponding DM correction to the phase, obtained by substituting a_k for a in Eq. (32). By the residual phase error we mean ϕ^true - $ϕ_{k}^{DM}$, where ϕ^true is the true phase, obtained by propagating through the (simulated) true volume turbulence profile to the pupil plane.

Note that at most 3 iterations are required to reduce the residual phase error to its asymptotic level, where the difference between a _k and a _opt is negligible and produces negligible change in $ϕ_{k}^{DM}$ . Any further reduction in phase error would require adding more DMs or more DM actuators to reduce the fitting error or decreasing sensor noise to reduce the estimation error.

Fig. 3. FD-PCG performance for the fitting step. The blue line shows the on-axis residual phase error, and the red line shows phase error at the edge of the 30-arcsecond corrected field of view of the telescope.

In [10] we observed similar behavior of the RMS phase error as we varied the number of FD-PCG iterations in the estimation step. The main difference was that 6 to 8 iterations were required to achieve asymptotic error levels for the estimation step, while only 2 or 3 iterations are required to reach asymptotic error levels in the fitting step.

5.2. Cost of Estimation and Fitting for the Example

For the example above, $N_{est} \overset{def}{=} 6 \times 256^{2} \approx 390,000$ elements are required to store the nodal discretization of ψ in the estimation step. Standard implementations of PCG require 4 copies of ψ. There is some additional overhead to store the components of A_h, but by taking advantage of special structure (sparsity and translation invariance), this can be kept relatively small. However, the W_1 and W_2 matrices in the low-rank matrix L in Eq. (17) require a great deal more storage. Each column of each W_i has N_est elements, and there are 2n_LGS + n_t = 2 × 5 + 8 = 18 columns in W_1, W_2 for the simulation. Hence the storage costs in the estimation step are dominated by the low-rank terms.

Again in the estimation step, the low-rank computation Lb (which is O(N_est)) is much cheaper than the cost of computing $A_{h}^{- 1} b$ via FD-PCG iteration. The FD-PCG costs are the cost per iteration multiplied by the total number of iterations. Dominant costs per iteration are multiplications by A_h and by the inverse preconditioner C^{-1}. Whether we use the direct approach in section 4.1 or the transformed approach in section 4.2 we must apply forward and inverse discrete Fourier transforms to block vectors with N_est entries (each block is 256 × 256 in our simulation). Using the fast Fourier transform, the asymptotic cost for this is then O(N_est log N_est). There are some additional order N_est costs, e.g., from the multiplication by the relatively small blocks of Ĉ^{-1}.

While a detailed discussion of parallel implementation is beyond the scope of this paper, it should be noted that the layered structure of ψ and the block structure of the components of A_h and of Ĉ ^-1 provide ample opportunities to dramatically reduce computational costs in a parallel computing environment.

Storage requirements for the fitting step are significantly lower than for the estimation step. In the above simulation, there are only 2 DMs, so with a 256 × 256 nodal discretization of each DM, only $N_{fit} \overset{def}{=} 2 \times 256^{2} \approx 130,000$ storage elements are needed to represent the DMs. PCG again requires 4 copies, but there are no low-rank terms.

Computational costs in the fitting step are also significantly smaller than for estimation. The asymptotic costs of the FFTs at each fitting iteration are much smaller because N_fit = N_est/3 (due to 6 layers in the estimation vs 2 DMs in the fitting). In addition, the 2 or 3 FD-PCG iterations required for the fitting step are much fewer than the 6 to 8 iterations needed for estimation.
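The storage counts in this subsection follow from simple arithmetic, sketched below. The sketch assumes that N_est counts the six 256 × 256 layer grids and N_fit the two 256 × 256 DM grids; the variable names are hypothetical.

```python
# Grid and system sizes taken from the example in Section 5.1
grid = 256            # points per side of each computational grid
n_layer, n_dm = 6, 2  # turbulence layers and deformable mirrors
n_lgs, n_t = 5, 8     # laser guidestars and low-order sensor measurements

N_est = n_layer * grid**2     # unknowns in the estimation step
N_fit = n_dm * grid**2        # unknowns in the fitting step

pcg_storage_est = 4 * N_est   # four PCG vectors in the estimation step
pcg_storage_fit = 4 * N_fit   # four PCG vectors in the fitting step

n_lowrank_cols = 2 * n_lgs + n_t          # columns of W_1 and W_2 together
lowrank_storage = n_lowrank_cols * N_est  # storage for W_1 and W_2

print(N_est, N_fit, n_lowrank_cols, lowrank_storage / pcg_storage_est)
```

With these numbers the low-rank columns take 4.5 times the storage of the four PCG vectors, which is consistent with the statement that storage in the estimation step is dominated by the low-rank terms.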

6. Discussion and Conclusions

In [10] we introduced the FD-PCG algorithm to efficiently carry out the estimation, or tomography, step in minimum variance wavefront reconstruction. Advances in this paper include the following:

A demonstration that the FD-PCG algorithm can be adapted to efficiently solve the fitting step in optimal MCAO wavefront reconstruction. We show that for a simulated MCAO system for a 30-meter telescope, at most 3 FD-PCG iterations are needed for the fitting step.
Solution to the cone coordinate transformation problem. This problem arises with mixed LGS-NGS systems and can lead to field distortion if it is not handled correctly. Our solution involves a cone-coordinate-to-standard coordinate interpolation and is efficiently implemented with sparse matrix techniques.
Solution of problems related to computational grids that have higher resolution than the high-order sensor subaperture grids. These problems are solved with a subaperture mask. This leads to a somewhat more complex preconditioner, but the additional computational overhead is small relative to the cost of the 2-D Fourier transforms.
Two separate implementations of the FD-PCG algorithm. In the direct approach, PCG vectors lie in the spatial domain and are real-valued, while with the transformed approach the PCG vectors lie in the Fourier domain and are complex-valued. The direct approach requires slightly fewer Fourier transforms and slightly less storage than does the transformed approach. Perhaps the only advantage of the transformed approach is that it allows easy implementation of wave optics propagators. These have a sparse Fourier domain representation but have a convolution representation in the spatial domain.

Acknowledgments

This research was supported in part by the Air Force Office of Scientific Research through grant F49620-02-1-0297 and by a grant from the Optical Sciences Company.

References and links

1. J. M. Beckers, “Increasing the size of the isoplanatic patch with multi-conjugate adaptive optics,” in Proceedings of European Southern Observatory Conference and Workshop on Very Large Telescopes and Their Instrumentation, M. H. Ulrich, ed., Vol. 30 of ESO Conference and Workshop Proceedings (European Southern Observatory, Garching, Germany, 1988), pp. 693–703.

2. D. C. Johnston and B. M. Welsh, “Analysis of multi-conjugate adaptive optics,” J. Opt. Soc. Am. A 11, 394–408 (1994).

3. T. Fusco, J. M. Conan, G. Rousset, L. M. Mugnier, and V. Michau, “Optimal wave-front reconstruction strategies for multi-conjugate adaptive optics,” J. Opt. Soc. Am. A 18, 2527–2538 (2001).

4. R. G. Dekany, M. C. Britton, D. T. Gavel, B. L. Ellerbroek, G. Herriot, C. E. Max, and J-P. Veran, “Adaptive optics requirements definition for TMT,” in Advancements in Adaptive Optics, D. B. Calia, B. L. Ellerbroek, and R. Ragazzoni, eds., Proc. SPIE 5490, 879–890 (2004).

5. B. L. Ellerbroek, “Efficient computation of minimum-variance wave-front reconstructors with sparse matrix techniques,” J. Opt. Soc. Am. A 19, 1803–1816 (2002).

6. G. Golub and C. VanLoan, Matrix Computations, 2nd Edition, Johns Hopkins University Press, 1989.

7. L. Gilles, C. R. Vogel, and B. L. Ellerbroek, “Multigrid preconditioned conjugate-gradient method for large-scale wave-front reconstruction,” J. Opt. Soc. Am. A 19, 1817–1822 (2002).

8. L. Gilles, B. L. Ellerbroek, and C. R. Vogel, “Preconditioned conjugate gradient wave-front reconstructors for multi-conjugate adaptive optics,” Appl. Opt. 42, 5233–5250 (2003).

9. B. L. Ellerbroek, L. Gilles, and C. R. Vogel, “Numerical simulations of multi-conjugate adaptive optics wavefront reconstruction on giant telescopes,” Appl. Opt. 42, 4811–4818 (2003).

10. Q. Yang, C. R. Vogel, and B. L. Ellerbroek, “Fourier domain preconditioned conjugate gradient algorithm for atmospheric tomography,” Appl. Opt. 45, No. 21 (2006).

11. J. W. Hardy, Adaptive Optics for Astronomical Telescopes, Oxford University Press, 1998.

12. D. Fried, “Least-square fitting a wave-front distortion estimate to an array of phase-difference measurements,” J. Opt. Soc. Am. 67, 370–375 (1977).

Optics Express