Optica Publishing Group

Principles of image reconstruction in optical interferometry: tutorial

Open Access

Abstract

This paper provides a general introduction to the problem of image reconstruction from interferometric data. A simple model of the interferometric observables is given, and the issues arising from sparse Fourier data are discussed. The effects of various regularizations are described. In the proposed general framework, most existing algorithms can be understood. For an astronomer, such an understanding is crucial not only for selecting and using an algorithm but also to ensure correct interpretation of the resulting image.

© 2017 Optical Society of America

1. INTRODUCTION

Astronomical interferometers sample the so-called visibility, which is the Fourier transform of the angular brightness distribution of the observed object [1,2]. These instruments achieve unrivaled angular resolution but only provide an irregular and sparse coverage of the Fourier frequencies. Forming an image from the interferometric observables is then not a simple matter of performing an inverse Fourier transform of the data. Additional a priori assumptions about the regularity (i.e., “smoothness”) or the simplicity of the object are needed in order to interpolate the voids in the frequency coverage of the measurements. Image restoration is then an inverse problem whose solution is a compromise between fitting the available data and keeping the image as regular (or simple) as possible [3–7]. In radio astronomy, image reconstruction algorithms have a long and successful history [8–10] with renewed interest due to the emergence of the compressive sensing theory [11]. There are additional issues at optical wavelengths which make the image reconstruction much more difficult than in radio astronomy. First, the smaller number of recombined telescopes yields a much sparser frequency coverage. Second, to cancel the effects of the turbulence, nonlinear observables such as the power spectrum and the closure phase must be formed from the visibilities. With such observables, more information is lost and recovering an image becomes a more difficult nonconvex inverse problem.

Although there now exist a number of good image reconstruction algorithms dedicated to the processing of optical interferometry data [12], they are not unsupervised black boxes magically producing an image without user control. To use these algorithms successfully, astronomers should be familiar with the underlying general principles of image reconstruction from sparse data. This is of prime importance for choosing the prior constraints and setting the tuning parameters of the methods. This knowledge is also helpful for choosing between the available algorithms (most being publicly available) and their numerous options. It is also necessary to understand how interferometric observables are measured and their intrinsic limitations, some of which directly impact the image reconstruction and the uniqueness of the solution. It must be stressed that assuming the existence of a single best image for a given data set is, owing to the small number of measurements, a naïve belief. As a consequence, there is not a single best algorithm, and the observer should try different methods and settings to analyze her/his data. This, again, requires some understanding of the underlying recipes. Although we encourage the reader to dive into the vast literature of inverse problems and image reconstruction, this paper aspires to provide a didactic and yet detailed introduction to these methods. The general framework of inverse problems has already been used to formally describe many algorithms for image reconstruction from interferometric data [7,13]. A similar framework is used in this paper, but with an emphasis on the specific issues in optical interferometry and with an overview of the most recent algorithms in this context. The benefits of the most common regularization methods when applied to very sparse interferometric data have already been studied [14].
Here we present an updated list of regularization methods (e.g., the ones implementing sparsity priors) under a more general formulation which is consistent with the adopted framework.

This paper is organized as follows. Starting with a very simple description of an interferometer, we derive the kind of measurements that are provided by such an instrument. We then introduce the principles of image restoration from incomplete data. We explain how to quantitatively compare the image and the data. We detail the available regularizations and their properties. Finally, we provide a short review of the algorithms that have been developed for optical interferometry.

2. SIMPLE INTERFEROMETER

A stellar interferometer consists of two or more telescopes which sample the light wavefronts from a celestial source at spatially separated locations. These wavefront samples are interfered in some place, the recombiner, where a detector is located. Delay lines are inserted in the optical path to compensate for geometrical optical delays and to introduce small phase shifts. Figure 1 shows the layout of a typical two-telescope interferometer. For the sake of simplicity, we consider in the following the case of a monomode stellar interferometer observing an object of small angular size.

 figure: Fig. 1.

Fig. 1. Simple two-telescope stellar interferometer. The two telescopes are denoted by T1 and T2, and the corresponding delay lines are denoted by DL1 and DL2.


A. Observed Signal

What can be measured by an interferometer is determined by the intensity of the light at the recombiner for a given observed object. The object is characterized by $A^{\mathrm{obj}}(\boldsymbol\theta)$, the complex amplitude of the light collected by the telescopes of the interferometer in the direction $\boldsymbol\theta = \mathbf{s} - \mathbf{s}_0$ relative to the center of the field of view ($\mathbf{s}$ and $\mathbf{s}_0$ being unit vectors pointing in the considered direction and to the center of the field of view, respectively; see Fig. 1). For a monomode stellar interferometer [15–17], the monochromatic complex amplitude transmitted by a given telescope is a scalar quantity. The general expression of the complex amplitude transmitted by the $j$th telescope to the recombiner is therefore given by

$$A_j = \int T_j(\boldsymbol\theta)\, A^{\mathrm{obj}}(\boldsymbol\theta)\,\mathrm{d}^2\boldsymbol\theta,$$
with $T_j(\boldsymbol\theta)$ a complex amplitude transmission. The total complex amplitude at the recombiner is just the sum of the contributions of the recombined telescopes:
$$A_{\mathbb{T}} = \sum_{j\in\mathbb{T}} A_j = \int A^{\mathrm{obj}}(\boldsymbol\theta) \sum_{j\in\mathbb{T}} T_j(\boldsymbol\theta)\,\mathrm{d}^2\boldsymbol\theta,$$
where $\mathbb{T}$ is the list of recombined telescopes. At optical wavelengths, only the squared modulus of the complex amplitude averaged over a few wave periods can be measured. This quantity is known as the intensity which, for a detector placed at the recombiner, is given by
$$I_{\mathbb{T}} = \langle |A_{\mathbb{T}}|^2 \rangle = \iint \langle A^{\mathrm{obj}}(\boldsymbol\theta)\, A^{\mathrm{obj}\,*}(\boldsymbol\theta') \rangle \sum_{(j,j')\in\mathbb{T}^2} T_j(\boldsymbol\theta)\, T_{j'}^{*}(\boldsymbol\theta')\,\mathrm{d}^2\boldsymbol\theta\,\mathrm{d}^2\boldsymbol\theta',$$
where $\langle\,\cdot\,\rangle$ denotes the considered time averaging and the exponent ${}^*$ denotes complex conjugation. For an incoherent source object, averaging the product of complex amplitudes yields
$$\langle A^{\mathrm{obj}}(\boldsymbol\theta)\, A^{\mathrm{obj}\,*}(\boldsymbol\theta') \rangle = \begin{cases} \langle A^{\mathrm{obj}}(\boldsymbol\theta)\rangle\,\langle A^{\mathrm{obj}\,*}(\boldsymbol\theta')\rangle = 0 & \text{if } \boldsymbol\theta \ne \boldsymbol\theta', \\ \langle |A^{\mathrm{obj}}(\boldsymbol\theta)|^2\rangle = I^{\mathrm{obj}}(\boldsymbol\theta) & \text{if } \boldsymbol\theta = \boldsymbol\theta', \end{cases} \;=\; I^{\mathrm{obj}}(\boldsymbol\theta)\,\delta(\boldsymbol\theta - \boldsymbol\theta'),$$
where $I^{\mathrm{obj}}(\boldsymbol\theta)$ is the brightness distribution of the object. Then,
$$I_{\mathbb{T}} = \int I^{\mathrm{obj}}(\boldsymbol\theta) \sum_{(j,j')\in\mathbb{T}^2} T_j(\boldsymbol\theta)\, T_{j'}^{*}(\boldsymbol\theta)\,\mathrm{d}^2\boldsymbol\theta.$$
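The vanishing of the cross terms for an incoherent source can be checked with a small Monte Carlo sketch; the amplitudes, sample count, and variable names below are ours, chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 20000  # number of short exposures to average over

# Complex amplitudes at two distinct sky directions: fixed moduli,
# but independent uniformly distributed phases (incoherent source).
phase1 = rng.uniform(0.0, 2.0 * np.pi, M)
phase2 = rng.uniform(0.0, 2.0 * np.pi, M)
a1 = 1.5 * np.exp(1j * phase1)   # A_obj(theta)
a2 = 0.7 * np.exp(1j * phase2)   # A_obj(theta')

cross = np.mean(a1 * np.conj(a2))   # <A(theta) A*(theta')>, theta != theta'
auto = np.mean(a1 * np.conj(a1))    # <|A(theta)|^2> = I_obj(theta)

# The cross term averages toward zero while the auto term gives the
# intensity |1.5|^2 = 2.25 exactly.
print(abs(cross), auto.real)
```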

In order to further simplify this equation, we shall consider an object of small angular size compared to the primary beam [10] of the light collectors (here the telescopes coupled with their monomode filters). Then, the modulus of the amplitude transmission $T_j(\boldsymbol\theta)$ does not depend on $\boldsymbol\theta$:

$$T_j(\boldsymbol\theta) = \tau_j\, e^{\mathrm{i}\,\phi_j(\boldsymbol\theta)},$$
with $\tau_j = |T_j(\boldsymbol\theta)| \ge 0$ and $\phi_j(\boldsymbol\theta)$ a phase term. For such a small object, the intensity at the recombiner becomes
$$I_{\mathbb{T}} = \int I^{\mathrm{obj}}(\boldsymbol\theta) \Big[ \sum_{j\in\mathbb{T}} \tau_j^2 + 2 \sum_{j<j'} \tau_j\,\tau_{j'} \cos\big(\Delta\phi_{j,j'}(\boldsymbol\theta)\big) \Big]\,\mathrm{d}^2\boldsymbol\theta,$$
with $\Delta\phi_{j,j'}(\boldsymbol\theta) = \phi_{j'}(\boldsymbol\theta) - \phi_j(\boldsymbol\theta)$.

The phase difference $\Delta\phi_{j,j'}(\boldsymbol\theta)$ has three different contributions: (i) a quasi-static instrumental part due to the optics; (ii) a geometrical optical path delay (denoted by $d$ in Fig. 1); (iii) an unknown random phase due to the turbulence. For a monomode stellar interferometer, the instrumental phase does not depend on the direction of the incident wave. Provided the angular size of the object is smaller than the isoplanatic patch [18], the turbulent phase term is also independent of the direction of the incident wave. In that case, only the geometrical optical path delay depends on $\boldsymbol\theta$ (or $\mathbf{s}$). More specifically, the geometrical optical path delay of telescope $j'$ relative to telescope $j$ is

$$d = \langle \mathbf{s}, \mathbf{b}_{j,j'} \rangle,$$
with $\mathbf{b}_{j,j'} = \mathbf{r}_{j'} - \mathbf{r}_j$ the baseline, which is the difference between the respective positions $\mathbf{r}_{j'}$ and $\mathbf{r}_j$ of the two telescopes. Here $\langle\cdot,\cdot\rangle$ denotes the scalar product between two vectors. Introducing $\Delta\phi_{j,j'}(\mathbf{0})$ and $d_0 = \langle \mathbf{s}_0, \mathbf{b}_{j,j'} \rangle$, the phase difference and the geometrical delay for the center of the field of view, and since $\mathbf{s} = \mathbf{s}_0 + \boldsymbol\theta$, the phase difference between the two telescopes is
$$\Delta\phi_{j,j'}(\boldsymbol\theta) = \Delta\phi_{j,j'}(\mathbf{0}) - \frac{2\pi}{\lambda}\,(d - d_0) = \Delta\phi_{j,j'}(\mathbf{0}) - \frac{2\pi}{\lambda}\,\langle \boldsymbol\theta, \mathbf{b}_{j,j'} \rangle,$$
where $\lambda$ is the wavelength. For a small field of view, $\boldsymbol\theta$ is in the plane which is tangential to the celestial sphere at the center of the field of view. Let $\mathbf{b}^{\mathrm{proj}}_{j,j'}$ be the projection of the baseline $\mathbf{b}_{j,j'}$ on this plane and define
$$\boldsymbol\nu_{j,j'} = \mathbf{b}^{\mathrm{proj}}_{j,j'}/\lambda;$$
then,
$$\langle \boldsymbol\theta, \mathbf{b}_{j,j'} \rangle/\lambda = \langle \boldsymbol\theta, \mathbf{b}^{\mathrm{proj}}_{j,j'} \rangle/\lambda = \langle \boldsymbol\theta, \boldsymbol\nu_{j,j'} \rangle.$$

The phase difference $\Delta\phi_{j,j'}(\mathbf{0})$ for the center of the field of view accounts for an instrumental phase $\psi^{\mathrm{inst}}_{j,j'}$ and for random phase terms $\phi^{\mathrm{atm}}_{j}$ due to the atmospheric turbulence above each telescope:

$$\Delta\phi_{j,j'}(\mathbf{0}) = \psi^{\mathrm{inst}}_{j,j'} + \phi^{\mathrm{atm}}_{j'} - \phi^{\mathrm{atm}}_{j}.$$

The instrumental phase $\psi^{\mathrm{inst}}_{j,j'}$ must be calibrated and will be assumed known. The random atmospheric phases $\phi^{\mathrm{atm}}_{j}$ and $\phi^{\mathrm{atm}}_{j'}$ can also be calibrated (despite the fact that they vary much faster) if a phase reference such as a nearby star is available [19], but most of the time this is not the case for current optical interferometers. Combining the above relations yields the phase difference in the direction $\boldsymbol\theta$:

$$\Delta\phi_{j,j'}(\boldsymbol\theta) = \psi^{\mathrm{inst}}_{j,j'} + \phi^{\mathrm{atm}}_{j'} - \phi^{\mathrm{atm}}_{j} - 2\pi\,\langle \boldsymbol\theta, \boldsymbol\nu_{j,j'} \rangle.$$

For a two-telescope interferometer, say, $\mathbb{T} = (j_1, j_2)$, substituting this expression into Eq. (6) and simplifying yields

$$I_{j_1,j_2} = \big(\tau_{j_1}^2 + \tau_{j_2}^2\big)\,\tilde I^{\mathrm{obj}}(\mathbf{0}) + 2\,\tau_{j_1}\tau_{j_2}\,\mathrm{Re}\Big(\tilde I^{\mathrm{obj}}(\boldsymbol\nu_{j_1,j_2})\, e^{\mathrm{i}\,(\psi^{\mathrm{inst}}_{j_1,j_2} + \phi^{\mathrm{atm}}_{j_2} - \phi^{\mathrm{atm}}_{j_1})}\Big),$$
where $\tilde I^{\mathrm{obj}}(\boldsymbol\nu)$ is the Fourier transform of the brightness distribution of the object at the spatial frequency $\boldsymbol\nu$:
$$\tilde I^{\mathrm{obj}}(\boldsymbol\nu) = \int I^{\mathrm{obj}}(\boldsymbol\theta)\, e^{-\mathrm{i}\,2\pi\langle \boldsymbol\theta, \boldsymbol\nu \rangle}\,\mathrm{d}^2\boldsymbol\theta.$$

Equation (13) shows that, for given telescope positions relative to the observed object and during times short enough to freeze the turbulence effects, $I_{j_1,j_2}$ as a function of the instrumental phase $\psi^{\mathrm{inst}}_{j_1,j_2}$ displays a fringe pattern much like that in Young’s double slit experiment. A quantity of interest which can be extracted from this fringe pattern is the so-called coherent flux [20,21]:

$$C_{j_1,j_2} = \tau_{j_1}\tau_{j_2}\,\tilde I^{\mathrm{obj}}(\boldsymbol\nu_{j_1,j_2})\, e^{\mathrm{i}\,(\phi^{\mathrm{atm}}_{j_2} - \phi^{\mathrm{atm}}_{j_1})}.$$

This shows that a two-telescope interferometer gives access to the Fourier transform of the brightness distribution at a spatial frequency equal to the projected baseline divided by the wavelength; see Eq. (9). This ability, however, depends on whether the complex gains $\tau_j \exp(\mathrm{i}\,\phi^{\mathrm{atm}}_j)$ are known, as discussed next.
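As an illustration, the coherent flux can be evaluated for a simple two-point ("binary") object; all values below (fluxes, positions, baseline, transmissions, atmospheric phase difference) are made up for this sketch:

```python
import numpy as np

# Hypothetical two-point object: fluxes f1, f2 at angular positions
# t1, t2 (radians), observed with a projected baseline b (meters).
f1, f2 = 1.0, 0.4
t1 = np.array([0.0, 0.0])
t2 = np.array([5e-9, 0.0])           # ~1 mas separation
lam = 1.6e-6                          # wavelength (m)
b = np.array([100.0, 0.0])            # projected baseline (m)
nu = b / lam                          # spatial frequency (rad^-1)

def ft_binary(nu):
    """Fourier transform of the two-delta brightness distribution."""
    return (f1 * np.exp(-2j * np.pi * (t1 @ nu))
            + f2 * np.exp(-2j * np.pi * (t2 @ nu)))

tau1, tau2 = 0.9, 0.8                 # amplitude transmissions
dphi = 0.3                            # atmospheric phase difference (rad)
C = tau1 * tau2 * ft_binary(nu) * np.exp(1j * dphi)   # coherent flux

# The modulus of C is unaffected by the turbulent phase term:
print(abs(C), tau1 * tau2 * abs(ft_binary(nu)))
```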

B. Coverage of the Frequency Plane and Index Notation

For image reconstruction, measuring a single spatial frequency is obviously insufficient. The coverage of the frequency plane can be improved by collecting measurements obtained with more baselines (with more than two telescopes), at different wavelengths and at different times (by moving the telescopes or just because the projected baselines change due to the Earth’s rotation). In practice, however, this sampling remains very sparse (see top-left panel of Fig. 2), and not all frequencies are measured. This is one of the key issues for successful image reconstruction in this context.

 figure: Fig. 2.

Fig. 2. Top left: simulated frequency coverage sampled with a six-station Navy prototype optical interferometer (NPOI). Top right: model of LkHa-101 representing the observed object. Bottom left: dirty beam. Bottom right: dirty image. Object model and frequency coverage are from the “2004 Interferometric Beauty Contest” [22].


The notation in Eq. (9), which only indicates the two interfering telescopes, is insufficient to account for the temporal variation of the projected baselines and for the spectral dependence of the measurements. We should denote as $\boldsymbol\nu_{j_1,j_2,\ell,m}$ the spatial frequency sampled by the pair of telescopes $(j_1,j_2)$ in a spectral channel at effective wavelength $\lambda_\ell$ and at exposure time $t_m$. As interferences must be simultaneous, measured quantities should also be indexed to reflect that. For instance, $C_{j_1,j_2,\ell,m}$ denotes the coherent flux of the interferences between the telescopes $j_1$ and $j_2$ in the $\ell$th spectral channel and during the $m$th exposure. When this distinction is not necessary and to simplify the notation, we will simply denote a spatial frequency sampled by the data as $\boldsymbol\nu_k$ and the coherent flux at that frequency as $C_k$. In the following, we maintain a consistent index notation ($j$ for a telescope, $k$ for a datum or a frequency, $\ell$ for a spectral channel, $m$ for an exposure, $n$ for a pixel, etc.), so that there should be no ambiguities.

C. Interferometric Observables

When the effects of the turbulence are stable or measured in real time, the terms $\tau_j \exp(\mathrm{i}\,\phi^{\mathrm{atm}}_j)$ can be calibrated (or compensated) to directly infer $\tilde I^{\mathrm{obj}}(\boldsymbol\nu_{j_1,j_2})$ from the measured coherent flux. Otherwise, photometric channels may be used to estimate the flux transmitted by a given telescope,

$$P_j = \tau_j^2\,\tilde I^{\mathrm{obj}}(\mathbf{0}),$$
and hence, in order to remove the varying amplitude transmissions $\tau_j$, measure the quantity
$$V_{j_1,j_2} = \frac{C_{j_1,j_2}}{\sqrt{P_{j_1} P_{j_2}}} = \tilde I(\boldsymbol\nu_{j_1,j_2})\, e^{\mathrm{i}\,(\phi^{\mathrm{atm}}_{j_2} - \phi^{\mathrm{atm}}_{j_1})},$$
where $\tilde I(\boldsymbol\nu) = \tilde I^{\mathrm{obj}}(\boldsymbol\nu)/\tilde I^{\mathrm{obj}}(\mathbf{0})$ is the Fourier transform of the normalized brightness distribution of the object:
$$I(\boldsymbol\theta) = \frac{I^{\mathrm{obj}}(\boldsymbol\theta)}{\int I^{\mathrm{obj}}(\boldsymbol\theta')\,\mathrm{d}^2\boldsymbol\theta'}.$$

The observable $V_{j_1,j_2}$ in Eq. (17) still depends on an unknown phase shift $\phi^{\mathrm{atm}}_{j_2} - \phi^{\mathrm{atm}}_{j_1}$ due to the turbulence. In order to get rid of this nuisance phase shift, the power spectrum can be measured:

$$S_{j_1,j_2} = \langle |V_{j_1,j_2}|^2 \rangle = |\tilde I(\boldsymbol\nu_{j_1,j_2})|^2.$$
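A quick numerical check of why the power spectrum is useful: a random turbulent phasor destroys the phase of the averaged visibility, but leaves the squared modulus of each frame untouched. The visibility value and frame count below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
V_true = 0.6 * np.exp(1j * 1.2)      # "true" normalized visibility

# Each short exposure multiplies V_true by a random atmospheric phasor.
phi = rng.uniform(0.0, 2.0 * np.pi, 1000)   # phase difference per frame
V_obs = V_true * np.exp(1j * phi)

# Averaging the observed visibilities destroys the signal...
V_avg = V_obs.mean()
# ...but the power spectrum of each frame is immune to the turbulence,
# so it can be averaged to build up signal-to-noise.
S_avg = np.mean(np.abs(V_obs) ** 2)

print(abs(V_avg), S_avg)   # near 0 versus |V_true|^2 = 0.36
```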

The power spectrum only carries information about the modulus of the Fourier transform of the brightness distribution. Given the simultaneous coherent fluxes for the three baselines formed by three different telescopes ($j_1$, $j_2$, and $j_3$), the bispectrum [23] can be formed,

$$B_{j_1,j_2,j_3} = V_{j_1,j_2}\, V_{j_2,j_3}\, V_{j_3,j_1} = \tilde I(\boldsymbol\nu_{j_1,j_2})\, \tilde I(\boldsymbol\nu_{j_2,j_3})\, \tilde I(\boldsymbol\nu_{j_3,j_1}),$$
to get information about the phase of the Fourier transform of the brightness distribution. Some image reconstruction algorithms consider the closure phase, which is the phase of the bispectrum,
$$\beta_{j_1,j_2,j_3} = \arg(B_{j_1,j_2,j_3}) = \varphi(\boldsymbol\nu_{j_1,j_2}) + \varphi(\boldsymbol\nu_{j_2,j_3}) + \varphi(\boldsymbol\nu_{j_3,j_1}) \pmod{2\pi},$$
where $\arg : \mathbb{C} \to (-\pi, +\pi]$ yields the phase of its argument and $\varphi(\boldsymbol\nu) = \arg(\tilde I(\boldsymbol\nu))$ is the phase of the Fourier transform of the brightness distribution.
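The cancellation of the per-telescope atmospheric phases in the bispectrum can be verified in a few lines; the "true" visibilities below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary true visibilities at the three frequencies of a triangle.
V12 = 0.7 * np.exp(1j * 0.4)
V23 = 0.5 * np.exp(-1j * 1.1)
V31 = 0.9 * np.exp(1j * 0.8)

# One random atmospheric phase per telescope corrupts each visibility.
p1, p2, p3 = rng.uniform(0.0, 2.0 * np.pi, 3)
V12_obs = V12 * np.exp(1j * (p2 - p1))
V23_obs = V23 * np.exp(1j * (p3 - p2))
V31_obs = V31 * np.exp(1j * (p1 - p3))

# The telescope phases cancel pairwise in the bispectrum, so the
# observed closure phase equals the object's closure phase.
B_obs = V12_obs * V23_obs * V31_obs
beta = np.angle(B_obs)                 # observed closure phase
beta_true = np.angle(V12 * V23 * V31)  # object closure phase
print(beta, beta_true)
```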

The so-called differential visibilities [24,25] are formed from the cross-product of the coherent flux and a reference coherent flux:

$$D_{j_1,j_2,\ell,m} = \big(V^{\mathrm{ref}}_{j_1,j_2,\ell,m}\big)^{*}\, V_{j_1,j_2,\ell,m}.$$

The purpose of the reference coherent flux $V^{\mathrm{ref}}_{j_1,j_2,\ell,m}$ is to cancel most of the randomness due to the turbulent phase in the coherent flux $V_{j_1,j_2,\ell,m}$. To that end, the reference coherent flux is defined as some spectral average of the coherent flux in the other spectral channels. Introducing a matrix $\mathbf{W}^{\mathrm{ref}}$ of nonnegative weights, a general expression for the reference flux is

$$V^{\mathrm{ref}}_{j_1,j_2,\ell,m} = \sum_{\ell'} W^{\mathrm{ref}}_{\ell,\ell'}\, V_{j_1,j_2,\ell',m}.$$

In the above two equations, we used the index notation previously introduced to indicate clearly that the differential visibilities are measured from the calibrated coherent fluxes in different spectral channels, but for the same baseline and exposure. The phases of the differential visibilities are known as the differential phases and are used by some image reconstruction algorithms to recover $I(\boldsymbol\theta, \lambda)$, the multispectral brightness distribution of the object.

The power spectra, the bispectra, and the differential visibilities are all insensitive to the random time-varying phase shifts due to the turbulence and can therefore be temporally integrated to improve their signal-to-noise ratio (SNR). As a matter of fact, these observables are invariant to a translation of the object, so its absolute position cannot be recovered from such data. Note that correctly measuring these nonlinear observables involves removing biases [21,26,27] whose expressions are not shown above.
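The translation invariance is easy to check numerically: translating the object by $\boldsymbol\delta$ multiplies $\tilde I(\boldsymbol\nu)$ by $\exp(-\mathrm{i}2\pi\langle\boldsymbol\delta,\boldsymbol\nu\rangle)$, which cancels in the squared modulus and, because the three frequencies of a closure triangle sum to zero, in the bispectrum as well. A sketch with made-up visibilities and frequencies:

```python
import numpy as np

# Translation multiplies I~(nu) by exp(-i 2 pi <nu, delta>).
def shift_phasor(nu, delta):
    return np.exp(-2j * np.pi * (nu @ delta))

nu12 = np.array([30.0, 10.0])
nu23 = np.array([-12.0, 25.0])
nu31 = -(nu12 + nu23)          # frequencies of a closed triangle sum to 0

V = {  # arbitrary "true" visibilities at the three frequencies
    "12": 0.8 * np.exp(1j * 0.3),
    "23": 0.6 * np.exp(-1j * 0.9),
    "31": 0.4 * np.exp(1j * 1.5),
}
delta = np.array([0.7, -0.2])  # translation of the object

Vs = {k: V[k] * shift_phasor(n, delta)
      for k, n in zip(V, (nu12, nu23, nu31))}

S_before = abs(V["12"]) ** 2
S_after = abs(Vs["12"]) ** 2                 # power spectrum unchanged
B_before = V["12"] * V["23"] * V["31"]
B_after = Vs["12"] * Vs["23"] * Vs["31"]     # bispectrum unchanged
print(S_before, S_after, B_before, B_after)
```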

D. Complex Visibilities and Normalization

In this paper, we will adopt the same convention as in radio astronomy and use complex visibilities or just visibilities to mean the samples of the Fourier transform of the object brightness distribution at the observed frequencies [10], whether they are normalized or not. In optical interferometry, to cope with the turbulence, the observables are formed after $V_{j_1,j_2}$, defined in Eq. (17), and thus only depend on the normalized visibilities, which are the Fourier transform of the brightness distribution given in Eq. (18), and which are equal to unity at the zeroth frequency. In radio astronomy, it is often the case that the visibility at the zeroth frequency cannot be measured by the interferometric array and has to be determined by other means or imposed during the image reconstruction. To handle all these cases consistently, we will hereinafter denote by $I(\boldsymbol\theta)$ the brightness distribution of interest and explicitly introduce

$$\varrho = \int I(\boldsymbol\theta)\,\mathrm{d}^2\boldsymbol\theta = \tilde I(\mathbf{0}),$$
the total brightness, which is also the visibility at the zeroth frequency. The OI-FITS data format [28] used in optical interferometry assumes normalized visibilities and thus that $\varrho = 1$. The proposed new OI-FITS 2 format [25] adds support for coherent fluxes (among other things).

3. IMAGING FROM SPARSE FOURIER DATA

Image reconstruction from sparse data is a well-studied but difficult problem which is further complicated when dealing with nonlinear data such as the power spectra, the bispectra (or the closure phases), or the differential visibilities (or the differential phases). We first consider reconstruction from sparse measurements of the Fourier transform of the brightness distribution. This is a typical problem for radio astronomy or for optical interferometry when a phase reference [19] is used, and thus complex visibilities are available. Understanding the principles of image reconstruction [4,6] is recommended for the proper use of a given algorithm. Fortunately, most, if not all, image reconstruction methods follow similar approaches [7,12,13], which we will review here.

A. Image and Complex Visibility Models

For practical reasons, the image sought must be a parametric approximation of $I(\boldsymbol\theta)$, the (possibly normalized) object brightness distribution. The image assumed by almost all existing reconstruction algorithms can be described by a linear expansion such as

$$I(\boldsymbol\theta) \approx I^{\mathrm{lin}}(\boldsymbol\theta) = \sum_{n=1}^{N} x_n\, h_n(\boldsymbol\theta),$$
where $\{h_n : \Theta \to \mathbb{R}\}_{n=1,\dots,N}$ is a given basis of real-valued functions on the field of view $\Theta$ and the coefficients $\mathbf{x} \in \mathbb{R}^N$ are the image parameters. The total intensity of this class of model image is exactly computable:
$$\varrho = \int I(\boldsymbol\theta)\,\mathrm{d}^2\boldsymbol\theta \approx \int I^{\mathrm{lin}}(\boldsymbol\theta)\,\mathrm{d}^2\boldsymbol\theta = \langle \mathbf{c}, \mathbf{x} \rangle,$$
where $\mathbf{c} \in \mathbb{R}^N$ has components $c_n = \int h_n(\boldsymbol\theta)\,\mathrm{d}^2\boldsymbol\theta$. The Fourier transform of the model image is also exactly computable:
$$\tilde I^{\mathrm{lin}}(\boldsymbol\nu) = \sum_{n=1}^{N} x_n\, \tilde h_n(\boldsymbol\nu),$$
with $\tilde h_n(\boldsymbol\nu)$ the Fourier transform of $h_n(\boldsymbol\theta)$. For the observed spatial frequencies, the model of the visibility $V_k$ is then
$$\tilde I(\boldsymbol\nu_k) \approx (\mathbf{H}\cdot\mathbf{x})_k = \sum_{n=1}^{N} H_{k,n}\, x_n,$$
where $\mathbf{H} \in \mathbb{C}^{K\times N}$ is a linear operator whose coefficients are $H_{k,n} = \tilde h_n(\boldsymbol\nu_k)$ and $K$ is the number of sampled frequencies.

Although the linear expansion in Eq. (25) is general and can represent specific models, such as wavelet-based decompositions [12,29] or a mixture of a pixel-based image and a parametric model [30], a more typical example is

$$I(\boldsymbol\theta) \approx I^{\mathrm{grd}}(\boldsymbol\theta) = \sum_{n=1}^{N} x_n\, h(\boldsymbol\theta - \boldsymbol\theta_n),$$
which is the model in Eq. (25) with $h_n(\boldsymbol\theta) = h(\boldsymbol\theta - \boldsymbol\theta_n)$, where the $\boldsymbol\theta_n$ are angular positions forming a rectangular equispaced grid. The function $h : \Theta \to \mathbb{R}$ may be thought of as an interpolation function; it is called the clean beam in the Clean method [8], the neat beam in the Wipe method [31], a building block [32] or, more simply, the pixel shape of a conventional image representation. This family of models yields $c_n = \int h(\boldsymbol\theta)\,\mathrm{d}^2\boldsymbol\theta$, which is constant, and $H_{k,n} = \tilde h(\boldsymbol\nu_k)\,\exp(-\mathrm{i}2\pi\langle\boldsymbol\theta_n, \boldsymbol\nu_k\rangle)$.

Finally, we note that most image reconstruction algorithms assume that the image parameters are just samples of the brightness distribution,

$$x_n \approx I(\boldsymbol\theta_n) \quad (\forall n),$$
and compute the visibilities with
$$H_{k,n} = \exp\big(-\mathrm{i}2\pi\langle\boldsymbol\theta_n, \boldsymbol\nu_k\rangle\big) \quad (\forall k, n).$$

This corresponds to the choice $h_n(\boldsymbol\theta) = h(\boldsymbol\theta - \boldsymbol\theta_n) = \delta(\boldsymbol\theta - \boldsymbol\theta_n)$ in the previous models. Computing the visibilities then amounts to performing a nonuniform Fourier transform. There are fast algorithms, such as the nonuniform FFT (NUFFT) [33] or the nonequispaced FFT (NFFT) [34], to compute a good approximation of $\mathbf{H}\cdot\mathbf{x}$ by means of the fast Fourier transform (FFT). Another consequence of this choice is that $c_n = 1$ $(\forall n)$. Unless otherwise specified, we will assume this kind of image model in this paper.
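In this pixel-based model, $\mathbf{H}$ is simply a nonuniform DFT matrix. A direct, unoptimized sketch follows (grid and frequency values are ours; a real code would use a NUFFT rather than an explicit matrix):

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative sizes: a tiny 8x8 pixel grid and 5 sampled frequencies.
npix, pix = 8, 0.25          # pixels per side, pixel size (arbitrary units)
gx, gy = np.meshgrid(np.arange(npix) * pix, np.arange(npix) * pix)
theta = np.column_stack([gx.ravel(), gy.ravel()])   # theta_n, shape (N, 2)
nu = rng.uniform(-1.0, 1.0, (5, 2))                 # nu_k, shape (K, 2)

# H[k, n] = exp(-i 2 pi <theta_n, nu_k>): a nonuniform DFT matrix.
H = np.exp(-2j * np.pi * (nu @ theta.T))

x = rng.uniform(0.0, 1.0, theta.shape[0])           # pixel values x_n
vis = H @ x                                         # model visibilities

# Cross-check one frequency against the direct sum of Eq. (28).
k = 2
direct = sum(x[n] * np.exp(-2j * np.pi * (theta[n] @ nu[k]))
             for n in range(theta.shape[0]))
print(abs(vis[k] - direct))   # agreement to rounding error
```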

The size of the synthesized field of view and of the image pixels has to be chosen according to the extension of the observed object and to the resolution of the interferometer. To avoid biases and inaccurate approximations caused by the particular image model, the grid spacing $\Delta\theta$ should be well below the limit imposed by the longest (projected) baseline $b^{\mathrm{proj}}_{\max}$ between any pair of interfering telescopes:

$$\Delta\theta \le \frac{\lambda}{2\, b^{\mathrm{proj}}_{\max}}.$$

Oversampling by a factor of at least 2 is normally used; hence, the pixel size is typically given by $\Delta\theta \approx \lambda/(4\, b^{\mathrm{proj}}_{\max})$. To avoid aliasing and image truncation, the chosen field of view must be large enough, bearing in mind that the reciprocal of the width of the field of view also sets the sampling step of the spatial frequencies.
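For instance, with made-up numbers (a 130 m longest projected baseline observed at 1.6 μm), the pixel size would be chosen as follows:

```python
# Numeric illustration with invented values: longest projected baseline
# of 130 m observed at a wavelength of 1.6 um.
lam = 1.6e-6         # wavelength (m)
b_max = 130.0        # longest projected baseline (m)
mas = 4.84813681e-9  # one milliarcsecond in radians

dtheta_limit = lam / (2.0 * b_max)  # upper bound on the grid spacing
dtheta = lam / (4.0 * b_max)        # with the usual factor-2 oversampling

print(dtheta_limit / mas, dtheta / mas)  # ~1.27 mas and ~0.63 mas
```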

B. Obtaining a Dirty Image

We now consider a first attempt at image restoration from a sparse sampling of the spatial frequency plane in the simplest case, that is, assuming that complex visibilities have been measured. In this context, let $\{\boldsymbol\nu_k\}_{k=1,\dots,K}$ be the list of sampled frequencies and $\mathbf{y} \in \mathbb{C}^K$ be the data ($y_k = V_k$ is the measured complex visibility at frequency $\boldsymbol\nu_k$). According to Eq. (28), $\mathbf{y} \approx \mathbf{H}\cdot\mathbf{x}$, with $\mathbf{x} \in \mathbb{R}^N$ the image parameters and where the approximation accounts for the model errors and for the noise. In practice, there are many more pixels in the sought image than measurements ($N \gg 2K$), so the image restoration problem is ill-posed. However, the relationship between $\mathbf{y}$ and $\mathbf{x}$ is linear, and a possible solution that comes to mind is to estimate $\mathbf{x}$ by means of the Moore–Penrose pseudoinverse of $\mathbf{H}$ applied to the data $\mathbf{y}$ (hence neglecting the noise). This is equivalent to finding the minimal norm image $\hat{\mathbf{x}}$ such that $\mathbf{H}\cdot\hat{\mathbf{x}} = \mathbf{y}$, as stated by the constrained optimization problem

$$\min_{\mathbf{x}} \|\mathbf{x}\|_2 \quad \text{s.t.} \quad \mathbf{H}\cdot\mathbf{x} = \mathbf{y},$$
where $\|\mathbf{x}\|_2 = \sqrt{\langle\mathbf{x},\mathbf{x}\rangle}$ is the usual Euclidean ($\ell_2$) norm. The Lagrangian of the above problem is
$$\mathcal{L}(\mathbf{x},\mathbf{u}) = \tfrac{1}{2}\|\mathbf{x}\|_2^2 - \langle\mathbf{u},\, \mathbf{H}\cdot\mathbf{x} - \mathbf{y}\rangle = \tfrac{1}{2}\|\mathbf{x}\|_2^2 - \langle\mathbf{H}^{*}\cdot\mathbf{u},\, \mathbf{x}\rangle + \langle\mathbf{u},\mathbf{y}\rangle,$$
with $\mathbf{H}^{*}$ the adjoint of $\mathbf{H}$ and $\mathbf{u}$ the Lagrange multipliers associated with the constraints $\mathbf{H}\cdot\mathbf{x} = \mathbf{y}$. Since $\nabla_{\mathbf{x}}\mathcal{L}(\mathbf{x},\mathbf{u}) = \mathbf{x} - \mathbf{H}^{*}\cdot\mathbf{u}$, the solution takes the form $\hat{\mathbf{x}} = \mathbf{H}^{*}\cdot\hat{\mathbf{u}}$, where $\hat{\mathbf{u}}$ is such that the equality constraints hold, that is, $\mathbf{H}\cdot\mathbf{H}^{*}\cdot\hat{\mathbf{u}} = \mathbf{y}$. Considering the simple case in Eq. (31) and assuming a given frequency is only measured once, an approximation of $\mathbf{H}\cdot\mathbf{H}^{*}$ is given by
$$(\mathbf{H}\cdot\mathbf{H}^{*})_{k,k'} = \sum_{n=1}^{N} \exp\big(-\mathrm{i}2\pi\langle\boldsymbol\theta_n,\, \boldsymbol\nu_k - \boldsymbol\nu_{k'}\rangle\big) \approx N\,\delta_{k,k'},$$
because the sum is equal to $N$ when $\boldsymbol\nu_k = \boldsymbol\nu_{k'}$ and approximately equal to zero when $\boldsymbol\nu_k \ne \boldsymbol\nu_{k'}$ (which is equivalent to $k \ne k'$ under our assumptions). Thus $\hat{\mathbf{u}} \approx \mathbf{y}/N$ and
$$\hat{\mathbf{x}} \approx \frac{1}{N}\,\mathbf{H}^{*}\cdot\mathbf{y},$$
which is known as the dirty image [10]. If the object is a point-like source (situated at the center of the field of view), all complex visibilities are equal to $\varrho$, and the corresponding dirty image is called the dirty beam. Examples of a dirty beam and dirty image are shown in the lower left and right panels of Fig. 2; clearly, the generalized inverse does not provide a good solution to the image restoration problem.
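A dirty beam can be reproduced on a toy grid; the grid size, frequency list, and variable names below are ours:

```python
import numpy as np

rng = np.random.default_rng(4)

# Tiny setup: 16x16 pixel grid, 40 random frequency samples.
npix = 16
ii, jj = np.meshgrid(np.arange(npix) - npix // 2,
                     np.arange(npix) - npix // 2)
theta = np.column_stack([ii.ravel(), jj.ravel()]).astype(float)
N = theta.shape[0]
nu = rng.uniform(-0.4, 0.4, (40, 2))
H = np.exp(-2j * np.pi * (nu @ theta.T))

# Point source of unit flux at the field center: all visibilities are 1.
y = np.ones(len(nu), dtype=complex)
# Dirty beam: (1/N) H* y (here we keep only its real part).
dirty_beam = (H.conj().T @ y).real / N

# It peaks at the source position (the central pixel), but shows
# sidelobe ripples everywhere else due to the sparse sampling.
center = np.argmax(np.all(theta == 0.0, axis=1))
print(np.argmax(dirty_beam) == center)
```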

In radio astronomy, where the complex visibilities can be measured, image restoration is often treated as the deconvolution of the dirty image by a point spread function (PSF), which is the dirty beam [35,36]. This, however, assumes a stationary distribution of the noise. The deconvolution approach is therefore not applicable if one wants to account for more realistic noise statistics or if complex visibilities are not measured directly (as is the case in optical interferometry).

C. Improving the Reconstruction

The dirty image arose from an attempt to solve an ill-posed problem; by reducing the number of possible solutions, one can hope to improve the situation. Being a brightness distribution, the image must be nonnegative everywhere and may be normalized according to Eq. (24). As a consequence, the sought image should be restricted to belong to the convex set $\Omega \subset \mathbb{R}^N$ of nonnegative and normalized images,

$$\Omega = \big\{\mathbf{x} \in \mathbb{R}^N \,\big|\, \mathbf{x} \ge 0,\ \langle\mathbf{c},\mathbf{x}\rangle = \varrho\big\},$$
where $\mathbf{c}$ is defined in Eq. (26) and the relation $\mathbf{x} \ge 0$, taken component-wise, expresses the nonnegativity of the pixel values, assuming the simple image model in Eq. (30). Strict constraints, such as the image being nonnegative, are effective for image reconstruction from sparse visibilities but are still insufficient to select a single image out of all the ones which fit the data.

Selecting a single image was the purpose of minimizing the Euclidean norm in Eq. (33). By Parseval’s theorem, the Euclidean norm of the image is also that of its Fourier transform; thus the Fourier transform of the dirty image is equal to the measured visibilities at the sampled frequencies and equal to zero elsewhere. This explains the ripples and the poor quality of the dirty image. Intuitively, a more appealing solution would be an image whose Fourier spectrum interpolates the sparse data in a smoother way than that of the dirty image. In other words, a simpler or more regular image than the dirty image is to be preferred. Accounting for the strict constraints implemented by $\Omega$, we are then inclined to reformulate Eq. (33) as

$$\min_{\mathbf{x}\in\Omega} f_{\mathrm{prior}}(\mathbf{x}) \quad \text{s.t.} \quad \mathbf{H}\cdot\mathbf{x} = \mathbf{y},$$
where $\mathbf{y}$ denotes the measured complex visibilities as before and minimizing $f_{\mathrm{prior}} : \mathbb{R}^N \to \mathbb{R}$ is intended to favor a regular image in agreement with our prior beliefs. Clearly the solution will depend on the type of priors implemented by $f_{\mathrm{prior}}$, known as the regularization. For instance, taking $f_{\mathrm{prior}}(\mathbf{x}) = \|\mathbf{x}\|_2$ and $\Omega = \mathbb{R}^N$ yields the dirty image.

Before discussing suitable choices for the regularization, there are, however, other issues to consider related to the equality constraint $\mathbf{y} = \mathbf{H}\cdot\mathbf{x}$ in Eq. (36). First, visibilities may not be directly measured, and the actual data $\mathbf{y}$ may comprise nonlinear quantities, such as the power spectrum and the bispectrum (or the closure phase). A more general form, say, $\mathbf{H}(\mathbf{x},\boldsymbol\omega)$, must be introduced to model such nonlinear data (here $\boldsymbol\omega$ denotes any other parameters besides $\mathbf{x}$, like the pixel size $\Delta\theta$, which impact the model). Second, real data are corrupted by noise, and the equality constraint should be replaced by something like $\mathbf{y} \approx \mathbf{H}(\mathbf{x},\boldsymbol\omega)$. To be more specific, this can be expressed by the following direct model (also called forward model [37]) of the data:

$$\mathbf{y} = \mathbf{H}(\mathbf{x},\boldsymbol\omega) + \boldsymbol\epsilon,$$
where $\boldsymbol\epsilon$ is a random perturbation term due to the noise. Because of this random perturbation, an exact fit of the measurements is not only unexpected, but undesirable (we do not want to fit the noise). In fact, any image should be considered as acceptable provided that the corresponding model data differ from the actual data by amounts consistent with the noise level. In order to judge quantitatively whether an image $\mathbf{x}$ is statistically consistent with the measurements, some numerical criterion, say, $f_{\mathrm{data}} : \mathbb{R}^N \to \mathbb{R}$, is needed. By convention, the smaller $f_{\mathrm{data}}(\mathbf{x})$, the closer the model data are to the measurements, so $f_{\mathrm{data}}(\mathbf{x})$ can be thought of as a distance between the model and the data. In other words, and using the metric $f_{\mathrm{data}}(\mathbf{x})$, an image should be assumed to be compatible with the data whenever $f_{\mathrm{data}}(\mathbf{x})$ is below some threshold, say,
$$f_{\mathrm{data}}(\mathbf{x}) \le \eta.$$

According to this discrepancy criterion, the image reconstruction problem becomes

$$\min_{\mathbf{x}\in\Omega} f_{\mathrm{prior}}(\mathbf{x}) \quad \text{s.t.} \quad f_{\mathrm{data}}(\mathbf{x}) \le \eta,$$
which formally expresses that we search for “the most regular image which is compatible with the observations and for which the strict constraints hold.” The usual way to solve the constrained problem (Eq. (39)) is to use the associated Lagrangian
$$\mathcal{L}(\mathbf{x},\alpha) = f_{\mathrm{prior}}(\mathbf{x}) + \alpha\, f_{\mathrm{data}}(\mathbf{x}),$$
with $\alpha \ge 0$ the Lagrange multiplier related to the constraint $f_{\mathrm{data}}(\mathbf{x}) \le \eta$. Technically, the multiplier $\alpha$ must be nonnegative because the constraint is an inequality [38]. If $\alpha = 0$, the constraint has no effect (it is said to be inactive), which, in our case, means that the data are completely ignored in determining the resulting image. This is only possible if the threshold $\eta$ is sufficiently high that the inequality $f_{\mathrm{data}}(\mathbf{x}) \le \eta$ holds at the minimum of $f_{\mathrm{prior}}(\mathbf{x})$. In other words, our priors are sufficient to determine an acceptable image regardless of the data. This is not generally the case, and we want to take the data into account (the constraint must be active), which implies that $\alpha > 0$. It turns out that this also amounts to assuming that the constraint is $f_{\mathrm{data}}(\mathbf{x}) = \eta$ at the solution [38].

To summarize, image reconstruction is recast as a constrained optimization problem, where the solution $\hat{\mathbf{x}}$ is given by

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}\in\Omega} \big\{ \mathcal{L}(\mathbf{x},\alpha) = f_{\mathrm{prior}}(\mathbf{x}) + \alpha\, f_{\mathrm{data}}(\mathbf{x}) \big\},$$
with $\alpha > 0$ chosen so that $f_{\mathrm{data}}(\hat{\mathbf{x}}) = \eta$ holds. Since $\alpha > 0$, taking $\mu = 1/\alpha$ yields the following equivalent formulation:
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}\in\Omega} \big\{ f(\mathbf{x},\mu) = f_{\mathrm{data}}(\mathbf{x}) + \mu\, f_{\mathrm{prior}}(\mathbf{x}) \big\},$$
where $\mu > 0$ is tuned so that the constraint holds. The solution $\hat{\mathbf{x}}$ depends on the feasible set $\Omega$, on the penalties $f_{\mathrm{data}}$ and $f_{\mathrm{prior}}$, and on the value of a so-called hyperparameter (here $\eta$, $\alpha$, or $\mu$). The hyperparameter $\mu > 0$ (respectively, $\alpha > 0$) can be seen as a relative weight which sets the compromise between fitting the data (and the noise) and obeying the priors.

Depending on the image reconstruction algorithm being considered, one of the equivalent problems [Eqs. (39), (41), or (42)] is solved and the hyperparameter (equivalently η, α, or μ) is explicitly required or automatically tuned by the method. Algorithms mostly depend on the choice for fdata and fprior—the following section provides some hints to define these penalties.
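The role of the hyperparameter can be illustrated with a deliberately simple sketch: a quadratic (Tikhonov-like) regularization on a toy 1-D problem, solved in closed form over complex unknowns and ignoring the nonnegativity and normalization constraints of the feasible set. All sizes, values, and names are ours:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy problem: K = 6 complex visibilities of an N = 32 pixel 1-D
# "image", so the data alone cannot determine the image.
N, K = 32, 6
theta = np.arange(N, dtype=float)
nu = rng.uniform(-0.5, 0.5, K)
H = np.exp(-2j * np.pi * np.outer(nu, theta))

x_true = np.zeros(N)
x_true[10], x_true[18] = 1.0, 0.5
y = H @ x_true

def solve(mu):
    """Minimize ||H x - y||^2 + mu ||x||^2 in closed form.

    For simplicity the minimization is over complex x and ignores the
    strict constraints (nonnegativity, normalization) of Omega."""
    A = H.conj().T @ H + mu * np.eye(N)
    return np.linalg.solve(A, H.conj().T @ y)

x_weak, x_strong = solve(1e-6), solve(1e3)

def fdata(x):
    """Sum of squared residuals, playing the role of f_data."""
    return np.sum(np.abs(H @ x - y) ** 2)

# A small mu fits the data closely; a large mu shrinks the image toward
# zero (the minimum of this f_prior) at the expense of the fit.
print(fdata(x_weak), fdata(x_strong))
```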

D. Bayesian Inference

Another approach to derive a general expression for the solution of the image reconstruction problem is to consider that all information is expressed in terms of probabilities. In this Bayesian framework, the best image given the data is the maximum a posteriori (MAP) solution, which has the maximum posterior probability given the data and, possibly, some supplementary parameters $\boldsymbol\omega$:

$$\hat{\mathbf{x}}_{\mathrm{MAP}} = \arg\max_{\mathbf{x}} \Pr(\mathbf{x}\,|\,\mathbf{y},\boldsymbol\omega),$$
where $\mathbf{y}$ represents all the available data. The supplementary parameters $\boldsymbol\omega$ represent anything other than the data that influences the expression of the probabilities (the covariance of the noise, etc.). Bayes’s rule gives two equivalent expressions for the joint probability of $\mathbf{x}$ and $\mathbf{y}$ (knowing $\boldsymbol\omega$):
$$\Pr(\mathbf{x},\mathbf{y}\,|\,\boldsymbol\omega) = \Pr(\mathbf{x}\,|\,\boldsymbol\omega)\,\Pr(\mathbf{y}\,|\,\mathbf{x},\boldsymbol\omega) = \Pr(\mathbf{y}\,|\,\boldsymbol\omega)\,\Pr(\mathbf{x}\,|\,\mathbf{y},\boldsymbol\omega),$$
from which it can be deduced that
$$\hat{\mathbf{x}}_{\mathrm{MAP}} = \arg\max_{\mathbf{x}} \frac{\Pr(\mathbf{y}\,|\,\mathbf{x},\boldsymbol\omega)\,\Pr(\mathbf{x}\,|\,\boldsymbol\omega)}{\Pr(\mathbf{y}\,|\,\boldsymbol\omega)},$$
where $\Pr(\mathbf{y}\,|\,\mathbf{x},\boldsymbol\omega)$ is the likelihood of the data, $\Pr(\mathbf{x}\,|\,\boldsymbol\omega)$ is the a priori distribution of $\mathbf{x}$, and the denominator $\Pr(\mathbf{y}\,|\,\boldsymbol\omega)$ is called the evidence. Discarding the denominator (which does not depend on the unknowns $\mathbf{x}$) and taking the cologarithm of the probabilities (which converts the product into a sum and the maximum into a minimum) yields
$$\hat{\mathbf{x}}_{\mathrm{MAP}} = \arg\min_{\mathbf{x}\in\Omega} \big\{ -\log \Pr(\mathbf{y}\,|\,\mathbf{x},\boldsymbol\omega) - \log \Pr(\mathbf{x}\,|\,\boldsymbol\omega) \big\},$$
where the feasible set $\Omega = \{\mathbf{x}\in\mathbb{R}^N \,|\, \Pr(\mathbf{x}\,|\,\boldsymbol\omega) > 0\}$ is introduced so that taking the logarithm causes no problems. Minimizing the first term, $-\log\Pr(\mathbf{y}\,|\,\mathbf{x},\boldsymbol\omega)$, amounts to maximizing the likelihood of the data and thus the agreement of the model with the data. This is the same objective as minimizing $f_{\mathrm{data}}(\mathbf{x})$. Similarly, minimizing the second term, $-\log\Pr(\mathbf{x}\,|\,\boldsymbol\omega)$, enforces the agreement with the priors: exactly the purpose of $f_{\mathrm{prior}}(\mathbf{x})$. The analogy with the previous section is evident and may be formalized by taking in Eq. (42)
$$f_{\mathrm{data}}(\mathbf{x}) = -c_0 \log\Pr(\mathbf{y}\,|\,\mathbf{x},\boldsymbol\omega) + c_1,$$
$$\mu\, f_{\mathrm{prior}}(\mathbf{x}) = -c_0 \log\Pr(\mathbf{x}\,|\,\boldsymbol\omega) + c_2,$$
where the factor $c_0 > 0$ and the additive constants $c_1$ and $c_2$ are irrelevant and can be chosen so as to simplify the resulting expressions. Clearly, the hyperparameter $\mu$ is part of $\boldsymbol\omega$. Bayesian inference can be invoked to justify the heuristic approach of the previous section but also to derive expressions for the terms of the criterion to optimize.

Without any priors, the solution would be

$$\hat{\mathbf{x}}_{\mathrm{ML}} = \arg\min_{\mathbf{x}\in\Omega} f_{\mathrm{data}}(\mathbf{x}) = \arg\max_{\mathbf{x}} \Pr(\mathbf{y}\,|\,\mathbf{x},\boldsymbol\omega),$$
which is nothing else but the maximum likelihood (ML) solution. In many estimation problems the ML approach may provide an estimator with excellent properties, but for solving an ill-posed or ill-conditioned problem such as our image restoration problem, taking into account the priors is crucial, and $\hat{\mathbf{x}}_{\mathrm{MAP}}$ will be superior to $\hat{\mathbf{x}}_{\mathrm{ML}}$ [4–6,39].

4. LIKELIHOOD OF THE DATA

Following the Bayesian prescription, Eq. (46) states that the correct way to express fdata(x) is to take the cologarithm of the likelihood of the data y knowing the image parameters x and perhaps some other parameters ω. We recall here the direct model given in Eq. (37),

y=H(x,ω)+ϵ,
with H(x,ω) a parametric model and ϵ a stochastic term. If we were able to repeat the same observations (under the same conditions, with the same instrument and for the same object) many times, we could average these data so that the nuisance term ϵ vanishes in the averaged data. In other words, the deterministic part of the direct model is the expectation of the data, given the parameters x and ω:
E{y|x,ω} = H(x,ω),
where E denotes expectation. From this it immediately follows that the noise is conditionally centered in the sense that
E{ϵ|x,ω} = 0.

From these equations, it is clear that the distribution of the data y knowing x and ω is simply the distribution of the noise ϵ (also knowing x and ω) to which a given bias H(x,ω) has been added. Note that these definitions and their consequences are very general: the model H(x,ω) may account for any kind of data, and the noise ϵ may have any distribution provided it is conditionally centered as stated by Eq. (50).

A. Gaussian Noise and Linear Model

Because the noise term ϵ results from different contributions (detector noise, photon noise, etc.), its actual distribution may be quite complex. A flexible approximation which works well in practice is to assume that the noise, knowing x and ω, follows a centered Gaussian distribution. Then applying Eq. (46) with c0=2 (a usual choice) and c1 chosen to discard all additive constants yields

fdata(x) = (y − H(x,ω))^t · W · (y − H(x,ω)),
where the weighting matrix W = Cov(ϵ|x,ω)⁻¹ is the inverse of the covariance of the noise. The above expression is the usual χ² (chi-squared) of the data. There is a slight issue because, in interferometry, we may be dealing with complex-valued data. Without loss of generality, and since complex numbers are just pairs of reals, complex-valued vectors (such as y, ϵ, and H(x,ω)) can be flattened into ordinary real vectors (of doubled size) to use standard linear algebra notation and define the covariance matrix as Cov(ϵ|x,ω) = E{ϵ·ϵ^t|x,ω}. This is what is assumed in Eq. (51).
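The flattening of complex data into real vectors can be sketched as follows (toy values; a diagonal weighting matrix W is assumed, i.e., independent real and imaginary parts):

```python
import numpy as np

def flatten(z):
    """Stack the real and imaginary parts of a complex vector."""
    return np.concatenate([z.real, z.imag])

rng = np.random.default_rng(1)
y = rng.normal(size=3) + 1j * rng.normal(size=3)   # "measured" visibilities
m = rng.normal(size=3) + 1j * rng.normal(size=3)   # model visibilities H(x)
w = np.array([1.0, 2.0, 0.5])                      # per-datum weights

# Independent real/imaginary parts with variance 1/w_k => W is diagonal.
W = np.diag(np.concatenate([w, w]))
r = flatten(y) - flatten(m)                        # flattened residuals
chi2 = r @ W @ r                                   # real-algebra chi-squared

# Equivalent complex-notation form: sum_k w_k |y_k - m_k|^2
chi2_complex = np.sum(w * np.abs(y - m) ** 2)
```

Both expressions give the same χ², which is why the real flattening loses nothing.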

When the complex visibilities are directly measured, the model given by Eq. (28) is linear and the likelihood fdata(x) in Eq. (51) is quadratic (by construction) and convex because its Hessian, ∇²fdata(x) = 2 H^t·W·H, is positive semi-definite. Having fdata(x) convex in x is particularly suitable for optimization, because it generally guarantees that there exists a unique minimum, which can easily be found by means of the simplest methods, such as the gradient-based optimization algorithms described in Section 6.A [38,40]. Conversely, nonconvex functions are more difficult to minimize, especially when they are multimodal, that is to say, when they have multiple minima. The simplest example of a quadratic convex likelihood is given by

fdata(x) = Σ_k w_k |(H·x)_k − V_k|²,
where w_k ≥ 0 are statistical weights. The above expression, which has been widely used in radio astronomy [9,41,42,43], assumes that the data are independent and that the real and imaginary parts of a measured visibility, say, V_k, are independent and have the same variance, equal to the reciprocal of w_k. The properties of such a distribution are reviewed in Goodman’s book [44].

Real data may, however, have different statistics. For instance, the OI-FITS [28] exchange file format for optical interferometric data assumes that the errors of complex data (complex visibilities or bispectra) along the complex vector and perpendicular to the complex vector are independent but not necessarily of the same variance (the standard deviations of the amplitude and the phase are provided by the OI-FITS format). It has, however, been empirically observed that the errors for bispectrum data have a banana-shaped distribution [39], which is compatible with assuming that the amplitude and phase of the bispectrum are independent. Meimon et al. [45] have proposed quadratic convex approximations of the colog likelihood of complex data independent in phase and amplitude and have shown that their so-called local approximation yields the best results, notably when dealing with low SNR data. This approximation amounts to the model assumed in OI-FITS [28]:

fdata(x) = Σ_k { Re(ϵ_k e^{−i φ_k})² / Var(ρ_k) + Im(ϵ_k e^{−i φ_k})² / (ρ_k² Var(φ_k)) },
where ρ_k and φ_k are the amplitude and the phase of the kth complex datum, while ϵ = y − H(x,ω) denotes the complex residuals introduced in Eq. (37). Equation (52) with w_k = 1/Var(ρ_k) is retrieved when Var(φ_k) = Var(ρ_k)/ρ_k², a relation which is likely to hold owing to the measurement process [46].
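A numerical sketch of this local approximation (all values hypothetical): the residual ϵ_k is rotated by −φ_k so that its real part measures the error along the datum (amplitude-like) and its imaginary part the error perpendicular to it (phase-like).

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=4) + 1j * rng.normal(size=4)   # complex data y
model = data + 0.05 * (rng.normal(size=4) + 1j * rng.normal(size=4))
var_rho = np.full(4, 0.01)                            # amplitude variances
var_phi = np.full(4, 0.02)                            # phase variances (rad^2)

rho, phi = np.abs(data), np.angle(data)
eps = data - model                                    # complex residuals
rot = eps * np.exp(-1j * phi)                         # rotate onto the datum axis
f_data = np.sum(rot.real**2 / var_rho + rot.imag**2 / (rho**2 * var_phi))
```

When Var(φ_k) = Var(ρ_k)/ρ_k², this reduces exactly to the uniformly weighted complex χ², since the rotation preserves the modulus of the residual.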

B. Nonlinear Data

Dealing with measured complex visibilities is only the simplest case. In order to account for heterogeneous data consisting of the various interferometric observables previously described, the likelihood term may be written as

fdata(x) = fvis(x) + fpow(x) + fbisp(x) + fclos(x),
where the different terms are for measured complex visibilities, power spectra, bispectra, and closure phases, respectively. Such an expression assumes that the measurements taken into account by different terms are independent. The bispectra and closure phases, which are obviously not independent, should not be considered at the same time. A revised OI-FITS format has been proposed to provide, among other things, the correlations between the measurements [25] but, to our knowledge, this information is not yet used by existing image reconstruction algorithms.

For power spectrum data, most algorithms assume independent Gaussian measurements and take

fpow(x) = Σ_k w_k^pow (S_k − |Ĩ(ν_k)|²)²,
where S_k is the power spectrum measured at frequency ν_k, w_k^pow = 1/Var(S_k), and Ĩ(ν_k) = (H·x)_k is the model of the visibility. Note that there are certainly other distributions more appropriate than this one [44].

For bispectrum data, the OI-FITS data model would suggest approximating fbisp(x) following Eq. (53), but a common expression is

fbisp(x) = Σ_{k1,k2} w_{k1,k2}^bisp |B_{k1,k2} − Ĩ(ν_{k1}) Ĩ(ν_{k2}) Ĩ*(ν_{k1}+ν_{k2})|²,
where B_{k1,k2} is the measured bispectrum for frequencies ν_{k1} and ν_{k2}. This expression is consistent if, as for Eq. (52), the real and imaginary parts of B_{k1,k2} are independent Gaussian variables with the same variance, equal to the reciprocal of the weights w_{k1,k2}^bisp. However, the building-block [32] and image reconstruction software using the bispectrum (IRBis) [47] algorithms assume a different form of the weights, which attempts to compensate for the uneven sampling of the frequency plane.

For phase-only data, it is appropriate to assume a von Mises distribution [48] for the wrapped phase (see Appendix A). For closure phase data, defined in Eq. (21), this leads to

fclos(x) = 2 Σ_{k1,k2} κ_{k1,k2}^clos [1 − cos(β_{k1,k2} − φ(ν_{k1}) − φ(ν_{k2}) + φ(ν_{k1}+ν_{k2}))],
where κ_{k1,k2}^clos > 0 can be computed from the angular variance Var(β_{k1,k2}) of the closure phase using Eq. (A5) derived in Appendix A. An expression similar to Eq. (57) can be used to fit the differential phases [49].
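The practical virtue of the 1 − cos(·) form is that it handles phase wrapping automatically, unlike a naive squared difference of raw phases. A toy comparison (hypothetical values):

```python
import numpy as np

kappa = 1.0 / 0.04                 # weight, of order 1/Var(β) for small variance
beta = np.deg2rad(170.0)           # measured closure phase
model_clos = np.deg2rad(-175.0)    # model closure phase, wrapped near ±180°

# von Mises-style penalty: insensitive to 2π wrapping of the phases.
f_wrapped = 2 * kappa * (1 - np.cos(beta - model_clos))

# Naive squared difference on raw values: sees a 345° error instead of 15°.
f_naive = kappa * (beta - model_clos) ** 2
```

The measured and model closure phases are only 15° apart on the circle, which the cosine term correctly sees, whereas the raw difference is 345° and would dominate the fit.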

5. REGULARIZATIONS

While the Bayesian formalism rather straightforwardly provides a suitable definition of the likelihood term, the situation is not as clear for the regularization term, which enforces the priors. Indeed, in most realistic cases, the prior probability cannot be objectively inferred, and one has to take a more practical point of view (for instance the one introduced in Section 3.C). When selecting a specific regularization, one has to pay great attention to the following: (i) the ability of the priors to help interpolate the missing Fourier samples; and (ii) the type of bias that is imposed in the restored image. The default solution xdef, which is retrieved when there are no available data (or when μ → ∞ in Eq. (42)), may be used to determine the type of bias imposed by the regularization. The default image is given by

xdef ∈ argmin_{x∈Ω} fprior(x),
being aware that the solution of this problem may not be unique.

Existing software exploits various regularizations, and a review of their relative merits is available [14]. To avoid confusion when discussing the many possible regularizations, a simple classification is needed. First, a regularization function can be separable (i.e., the sum of univariate functions) or not. Nonseparable regularizations usually impose some sort of smoothness on the solution, while separable regularizations implement a white prior and do not impose smoothness (in the image domain). A second property to consider is whether the regularization is quadratic or not. Quadratic functions are easier to optimize, but nonquadratic ones may have better properties, such as suppression of ripples or a preference for sparse solutions.

A. Quadratic Regularizations

Although it does not impose positivity, the well-known Wiener filter yields the MAP solution given by Eq. (44) when assuming Gaussian distributions (for the prior and the likelihood) with a linear model, as in Eq. (28). Assuming a Gaussian prior leads to a quadratic prior whose general expression is

fprior(x) = ‖D·x − a‖₂²,
with D a given linear operator transforming the space of image parameters to some other space and a a given element of this latter space. Quadratic regularizations are very popular because they are easy to optimize and are quite versatile for imposing various types of prior knowledge. For instance, in many image restoration problems, the smoothness of the solution is favored by taking a=0 and D a finite difference operator whose output collects the differences between neighboring pixels. Taking D equal to the identity and a equal to zero yields classical Tikhonov regularization [50].

Without the positivity constraint, Tikhonov regularization yields the dirty image of Eq. (34). For this reason, quadratic regularizations have long been considered as being badly adapted to interferometric image reconstruction [9]. The following simple quadratic regularization has, however, proven [51] to be very effective in this context:

fprior(x) = Σ_n q_n x_n² = ‖x‖_Q²,
where the components of q ∈ R₊^N are nonnegative weights and Q = diag(q) is the diagonal matrix whose diagonal elements are given by q. This regularization amounts to taking a = 0 and D^t·D = Q = diag(q) in Eq. (59). As this regularization is separable, the default solution can be computed following Appendix B (provided q > 0):
xdef = argmin_{x∈Ω} ‖x‖_Q² = ϱ (c/q) / ⟨c, c/q⟩,
where the division in c/q is performed component-wise. Hence, the shape of the default solution is simply given by c/q. This behavior may be exploited to impose a compactness prior by having qn be an increasing function of θn, the angular distance of the nth pixel to the center of the field of view, and thus favor the concentration of the flux in the central part of the image [14,51]. Figure 3 shows that such a regularization is suitable for image reconstruction from optical interferometric data provided strict positivity is imposed. Comparing the power spectra displayed by this figure with the frequency coverage in Fig. 2, it is clear that the unmeasured frequencies have been set to zero in the unconstrained solution, while they are smoothly interpolated when the constraint is imposed. The reason for this ability is that imposing the positivity plays the role of a floating support constraint which results in a smoothing in the frequency domain. This example demonstrates how critical the positivity constraint is for interferometric image reconstruction. Note that the unconstrained solution in Fig. 3 is similar but not exactly like the dirty image in Fig. 2, because it was obtained from simulated power spectra and closure phases (not from complex visibilities).
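A sketch of such a compactness prior (the specific Gaussian form of the weights below is an assumption for illustration, not necessarily MiRA's actual choice): taking q_n ∝ exp(θ_n²/(2 s²)) makes the default image c/q, for uniform c, a centered Gaussian of chosen FWHM.

```python
import numpy as np

n = 64                                  # toy image size (n x n pixels)
pix = 0.1                               # assumed pixel scale in mas
yy, xx = np.mgrid[0:n, 0:n] - (n - 1) / 2.0
theta = pix * np.hypot(xx, yy)          # angular distance to the center (mas)

fwhm = 5.0                              # desired FWHM of the default image (mas)
s = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
q = np.exp(0.5 * (theta / s) ** 2)      # increasing with theta => compactness

def f_prior(x):
    """Separable quadratic penalty sum_n q_n x_n^2."""
    return np.sum(q * x**2)

# Flux near the center is penalized less than the same flux at the edge.
center, corner = np.zeros((n, n)), np.zeros((n, n))
center[n // 2, n // 2] = 1.0
corner[0, 0] = 1.0
```

Because q grows away from the center, moving flux outward always increases the penalty, which is exactly the compactness bias described in the text.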

Fig. 3. Importance of imposing the positivity. Top: reconstruction with the positivity constraint. Bottom: unconstrained reconstruction. Left: restored images. Right: power spectrum. All reconstructions were performed with MiRA on the simulated data of the “2004 Interferometric Beauty Contest” and with a compactness constraint such that the default solution has a fairly large FWHM of 50 mas.


The Wiener filter is usually written as a separable filter in the frequency space; then it amounts to implementing a regularization on the power spectrum of the image of the form

fprior(x) = Σ_k q̃_k |x̃_k|²,
where x̃ = F·x is the discrete Fourier transform (DFT) of x, F is the DFT operator, and q̃_k ≥ 0 are nonnegative spectral weights. Taking D = diag(q̃^{1/2})·F and a = 0 yields the general expression in Eq. (59). In the case of the Wiener filter, q̃_k is the reciprocal of the expected spectral density E{|x̃_k|²} of the image, and such a regularization has been successfully used for blind or myopic deconvolution [52]. In the context of interferometry, taking q̃_k = 0 below a chosen frequency and q̃_k = 1 above that frequency has been used to loosely impose a cutoff frequency in the restored image [31]. Quadratic regularization of the power spectrum |x̃_k|² has also been proposed [37], which results in a quartic penalty in x.

Without the strict constraints implemented by Ω, the default solutions favored by a quadratic regularization form a subset,

xdef ∈ D⁺·a + ker(D),
where ker(D) and D⁺ denote the null space and the pseudo-inverse of D. Owing to the crucial role of the constraints implemented by Ω, notably the positivity, this default solution is merely of academic interest.

B. Improved Smoothness

Even though the assumption of smoothness for an extended object is generally justified, it seems a bit counterproductive to impose this as a strong prior when we worked so hard to measure the high frequencies of the object. Moreover, imposing smoothness via a quadratic regularization yields artifacts in the form of ripples around sharp structures, such as point-like sources or straight edges. To avoid these artifacts, the penalty implemented by the regularization must be less demanding than a quadratic cost for large differences between nearby pixels. This leads to the introduction of a nonquadratic smoothness regularization given by

fprior(x) = Σ_n ζ(D_n·x)
with D_n a linear operator such that D_n·x approximates the spatial gradient ∇I(θ_n) of the image x around the position θ_n of the grid of pixels, and ζ: R² → R a nonquadratic measure of the length of its argument. Usually D_n is implemented by means of finite differences. The actual effect of this family of regularizations is determined by the function ζ. Note that quadratic smoothness is imposed if ζ(v) = ‖v‖₂², where ‖v‖₂ denotes the usual Euclidean (ℓ2) norm.

A very popular example of this family is total variation (TV) [53], which amounts to taking

ζ_TV(v) = ‖v‖₂,
which is simply the Euclidean norm of v=Dn·x (though not squared). Such a regularization promotes the sparsity of the spatial gradients of the reconstructed image (see Section 5.E), that is to say, an image where most gradients are zero. Using TV, therefore, favors piecewise flat images which produce undesirable cartoon-like artifacts (see top-left panel of Fig. 5).

Fig. 4. Reconstructions of LkHα 101 with various regularizations. (a) Original object smoothed at the resolution of the interferometer (FWHM ≈ 0.4 mas); (b) BSMEM reconstruction; (c) MiRA reconstruction with MEM regularization as in Eq. (73); (d) MiRA reconstruction with the compactness quadratic regularization given by Eq. (60); (e) MiRA reconstruction with the edge-preserving regularization implemented by Eq. (67) in Eq. (64); (f) Squeeze reconstruction with the ℓ0 norm of the à trous wavelet coefficients.


Fig. 5. Reconstructions of LkHα 101 with various regularization parameters. All images were obtained from the “2004 Interferometric Beauty Contest” simulated data by the MiRA algorithm with the same edge-preserving regularization as in Fig. 4 but for different values of the hyperparameters μ and τ. The images in the left panel have been obtained with a very small value of the threshold τ to mimic the behavior of TV regularization.


Edge-preserving smoothness [54] is able to preserve sharp structures while avoiding the cartoon effect of TV. It involves designing ζ(v) to have the following asymptotic behavior:

ζ(v) = O(‖v‖₂²)  for ‖v‖₂ ≪ τ,
ζ(v) = O(‖v‖₂)   for ‖v‖₂ ≫ τ,
where τ > 0 is some chosen threshold which sets the transition between the quadratic (ℓ2) and the linear (ℓ1) behavior of ζ(v), which is then called an ℓ2–ℓ1 norm of v. Note that using an ℓ2–ℓ1 norm requires the tuning of two hyperparameters: τ and μ. There are many possible ways to implement an ℓ2–ℓ1 norm, for instance, the hyperbolic function,
ζ_hyperbolic(v) = τ √(‖v‖₂² + τ²) − τ²,
the Huber semi-norm,
ζ_Huber(v) = (1/2)‖v‖₂²  if ‖v‖₂ ≤ τ,  τ‖v‖₂ − τ²/2  otherwise,
or the fair loss function [55] used in Wisard (weak-phase interferometric sample alternating reconstruction device) [56],
ζ_fair(v) = τ‖v‖₂ − τ² log(1 + ‖v‖₂/τ).
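The three ℓ2–ℓ1 penalties can be sketched and checked against the required asymptotic behavior (τ = 1 here is an arbitrary choice):

```python
import numpy as np

tau = 1.0  # threshold between quadratic and linear behavior (arbitrary)

def zeta_hyperbolic(v):
    r = np.linalg.norm(v)
    return tau * np.sqrt(r**2 + tau**2) - tau**2

def zeta_huber(v):
    r = np.linalg.norm(v)
    return 0.5 * r**2 if r <= tau else tau * r - tau**2 / 2

def zeta_fair(v):
    r = np.linalg.norm(v)
    return tau * r - tau**2 * np.log(1.0 + r / tau)
```

All three behave like ‖v‖₂²/2 well below τ and grow linearly (with slope τ) well above it; they differ only in how sharp the transition is.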

C. Weighted ℓp Norm

Generalizing the weighted quadratic norm of Eq. (60), simple nonquadratic regularizations can be implemented by means of ℓp norms:

fprior(x) = σ Σ_n q_n |x_n|^p,
where σ = ±1 and q ∈ R₊^N are given nonnegative weights. For this regularization to be strictly convex, the weights must all be strictly positive, p > 0 and p ≠ 1 (taking p = 0 or 1 is discussed in Section 5.E), with σ = +1 if p > 1 and σ = −1 if p < 1. Under these conditions and provided ϱ ≠ 0, the default solution is unique and, applying Eq. (B5), it can be found to be
x_n^def = ϱ sign(c_n) |c_n/q_n|^{1/(p−1)} / Σ_{n′} |c_{n′}^p/q_{n′}|^{1/(p−1)}  (∀n).

Hence, xdef ∝ (c/q)^{1/(p−1)}, where the division and the exponentiation are done component-wise, which shows that the shape of the default image is driven by the weights q. With p = 2, the default image in Eq. (61) is retrieved. Taking p = 1/2 and σ = −1 to ensure strict convexity corresponds to the square-root entropy [9] given in Eq. (75).

A notable feature of the regularization by the weighted ℓp norm in Eq. (70) with p ∈ (0,1) (and σ = −1) is that it acts as a strong barrier, preventing the parameters from approaching zero. This trick is a simple means to enforce positivity.

D. Maximum Entropy

From a strict Bayesian point of view, the logarithm of the prior probability is called the entropy, and the solution of the image restoration problem in Eq. (39) is then the image which maximizes the entropy while being compatible with the data. However, the name maximum entropy method (MEM) is usually restricted to specific forms of the regularization penalty called the negentropy because it is the opposite of the entropy and originally derived from information theory [57]. Frieden [58] was the first to apply MEMs for image restoration with the following regularization:

fprior(x) = Σ_n x_n log(x_n/r_n),
which assumes that strict normalization holds and where r ∈ Ω is a given prior image. Without the normalization constraint, the MEM regularization becomes [42]
fprior(x) = Σ_n [r_n − x_n + x_n log(x_n/r_n)].

MEMs for image reconstruction from sparse visibilities were reviewed some time ago [9] and alternative expressions for the negentropy are

fprior(x) = −Σ_n r_n log(x_n),
or
fprior(x) = −Σ_n √(r_n x_n),
which have the property of imposing strict positivity of the image (provided the prior image is also strictly positive). In view of the results shown by Fig. 3 and obtained with a simple separable quadratic regularization, it is clear that strict positivity is beneficial for interpolating the voids in the frequency coverage.

All these MEM regularizations are separable, but a nonseparable variant has been proposed by Horne [59], who introduced a floating prior defined by r=R·x with R a linear operator such that r is a smoothed version of the image x or perhaps a version of it with some symmetries imposed.

The separable MEM regularizations are all strictly convex and, like the ℓp norm with p < 1, impose the condition that the sought image be strictly positive everywhere (x > 0). Using the results of Appendix B, it can be shown that the prior image is also the default solution: xdef = r.

E. Sparsity Promoting Priors

Seeking the simplest image can be seen as seeking the image which is explained by the fewest parameters. This idea has led to the Clean algorithm [8], which attempts to fit the interferometric data with the fewest point-like sources, possibly with an extended smooth component which is added after recovering the point sources, and perhaps under a support constraint. Using the formalism of Section 3.C, the Clean approach to recover point-like sources could be approximated by

min_{x∈Ω} ‖x‖₀  s.t.  fdata(x) ≤ η,
where the pseudo-norm ‖x‖₀ simply equals the number of nonzero pixels in the image x. Minimizing the ℓ0 norm favors sparse solutions, where many parameters are exactly equal to zero. Finding the global optimum of Eq. (76) is, however, exceptionally difficult in terms of computational effort. For instance, to find a solution with P nonzero parameters out of N requires trying each possible choice, that is,
C(N,P) = N! / (P! (N−P)!)
possibilities. For an N = 32×32 pixel image, allowing P = 10 nonzero pixels yields more than 3×10²³ possibilities! Fortunately, many works, both theoretical and practical, have shown that replacing the ℓ0 norm by the ℓ1 norm is almost as effective at selecting a sparse solution, which is, in many cases, a good approximation to the sparsest solution compatible with the data [60]. The enormous advantage of using the ℓ1 norm is that it leads to convex optimization problems and, even though they are nonsmooth, there are many algorithms able to solve them exactly. These properties of the ℓ1 norm are the cornerstone of the development of so-called compressed sensing methods [11].
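The count quoted above is easy to verify:

```python
from math import comb

# Choosing which P of the N pixels are allowed to be nonzero.
N, P = 32 * 32, 10
n_choices = comb(N, P)   # exhaustive l0 search would try every one of these
```

Even for this tiny image, an exhaustive search over supports is hopeless, which is why the convex ℓ1 relaxation matters in practice.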

The sparsity principle can be extended to favor other structures besides point-like sources. To that end, it is assumed that the pixel values of the image are given by x=B·z, where z are some other parameters and the columns of B form a dictionary of structures such that a few of them ought to describe the object of interest. Then the problem to solve is

min_z ‖z‖₁  s.t.  fdata(B·z) ≤ η  and  B·z ∈ Ω,
which can be seen as an instance of the least absolute shrinkage and selection operator (Lasso) algorithm [61]. The dictionary B can be built from a basis of wavelet functions [12,29], which has proven effective for multiscale structures. Depending on the type of object, the result obtained while imposing sparsity of the wavelet coefficients can be impressive. Figure 4 shows such an example: the image reconstructed by Squeeze is almost identical to the true object shown in Fig. 2, whereas other methods fail to reproduce the fine-scale structures accurately.

Note that if x ∈ Ω, then x ≥ 0 and ‖x‖₁ = Σ_n |x_n| = Σ_n x_n = ϱ, which is a constant; thus minimizing ‖x‖₁ is not a useful regularization in this context. Wisard implements a so-called white ℓ2–ℓ1 regularization, which can be seen as a variant of the ℓp-norm regularizations.

F. Choosing the Regularization

We have seen that there are a variety of possible regularization terms fprior(x) that can be used. A comparison of some of the more popular ones is presented in Fig. 4.

In most practical cases, it is necessary to enforce positivity of the reconstructed image and to choose the weight of the regularization so as to strike the right balance between oversimplifying the image and overfitting the data. The weight can be treated as a nuisance parameter in the Bayesian formalism, as is done by the bispectrum maximum entropy method (BSMEM), which tunes μ so that fdata(x) takes its expected value, corresponding to a unit reduced χ². However, the optimal value of μ can often be determined with sufficient accuracy by visual inspection of the reconstructed images. Figure 5 clearly shows the effects of overregularization (which oversimplifies the image) and underregularization (which yields more artifacts).

A more sophisticated alternative to visual inspection for selecting the optimum value for μ is the so-called “L-curve” method. This involves computing the solution for, say, 10 values of μ, and plotting fprior(x) against fdata(x). This relation should exhibit an “L-shaped” curve [62]. For regions along the curve, where fdata changes rapidly compared with fprior, the reconstructed image is overregularized. However, if the opposite is observed, the solution is underregularized. The elbow of the L-curve gives the optimum value of μ.
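The procedure can be sketched on a toy Tikhonov problem (hypothetical sizes): solve for a grid of μ values and record the two terms; plotting fprior against fdata then traces the L-curve.

```python
import numpy as np

rng = np.random.default_rng(4)
H = rng.normal(size=(40, 20))
y = H @ rng.normal(size=20) + 0.1 * rng.normal(size=40)

mus = np.logspace(-3, 3, 10)
points = []
for mu in mus:
    # Closed-form minimizer of ||H·x - y||^2 + mu*||x||^2 for each mu.
    x = np.linalg.solve(H.T @ H + mu * np.eye(20), H.T @ y)
    points.append((np.sum((H @ x - y) ** 2), np.sum(x**2)))

f_data_vals, f_prior_vals = map(np.array, zip(*points))
```

As μ grows, the data term can only increase and the prior term can only decrease; the elbow of the resulting (fdata, fprior) curve is the value sought by the L-curve criterion.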

On the other hand, the choice of regularization function itself is a more subjective issue. If little is known about the object to be reconstructed, we recommend using several different regularizations in order to compare their effects on the solution, and hence ensure that scientific conclusions drawn from the images are robust.

In certain situations, prior knowledge about the object can be encoded in the choice of regularization. For example, astrophysical considerations tell us that a stellar disc (unless very cool) will have a sharp edge; hence, there is an objective justification for preferring regularizations—such as edge-preserving smoothness [54]—that favor sharp edges over those that do not (e.g., MEM).

6. IMAGE RECONSTRUCTION ALGORITHMS

We are now equipped to describe the image reconstruction methods that have been successful when applied to realistic optical interferometric data and which are sufficiently mature to be used with real data. In addition to coping with sparse Fourier data, these methods were specifically designed to tackle the nonlinear direct model of the data, to account for the particular noise statistics [45], and to handle the OI-FITS data format [28]. In spite of this bias toward optical wavelengths, a method which can deal with measured complex visibilities would be perfectly suitable to process radio interferometry data.

All the methods presented can be understood as an instance of an algorithm to find a solution to one of the equivalent optimization problems introduced in Section 3.C. These methods differ in the type of data which they take into account, in the expression of fdata(x), which measures the discrepancy between the data and the model visibilities, and in the assumed priors implemented via the regularization function fprior(x). The number of possibilities explains why there are so many different algorithms. The many optimization strategies exploited to solve the inverse problem also contribute to the diversity of the algorithms. As not all strategies are equivalent, before describing the image reconstruction algorithms, we give some guidelines for understanding the benefits or the shortcomings of the different optimization methods. A summary of the methods is given in Table 1.

Table 1. Summary of the Algorithms for Image Reconstruction from Optical Interferometric Data Described in Section 6

A. Optimization Strategies

We have shown that image reconstruction amounts to solving a constrained optimization problem, such as

min_{x∈Ω} { f(x) = fdata(x) + μ fprior(x) }.

Typically, four kinds of strategies are used: gradient, augmented Lagrangian, greedy, and stochastic methods. Owing to the complexity of the problem, no closed-form solution exists, and all these methods are iterative.

Gradient-based optimization algorithms update the variables in a way that is inspired by Newton’s method, which requires that the objective function f(x) be at least twice continuously differentiable, hence smooth enough. As there are too many variables to manipulate the whole Hessian of the objective function, nonlinear conjugate gradients or limited memory variants of variable metric methods have to be used [38]. Such methods require the computation of the objective function f(x) and its gradient ∇f(x). Gradient-based methods can be modified to account for simple bound constraints (to implement the positivity) by means of active sets. These methods yield a reduction of the objective function at each iteration, and their convergence is usually fast. When applied to a multimodal objective function, they are, however, only able to find a local optimum that is better than the initial solution. In the context of optical interferometry imaging, using nonlinear observables, such as the power spectrum or the closure phase, yields a multimodal objective function. A local optimum found by a gradient-based optimization method may, however, provide an acceptable solution. When only the power spectrum is available, image restoration implies phase retrieval. Projection methods such as the Gerchberg–Saxton algorithm [63] have been used to solve phase-retrieval problems, but not in the context of stellar interferometry, where there is the added issue of the sparse frequency coverage. Moreover, it has been shown that the Gerchberg–Saxton algorithm is outperformed by gradient-based methods [64].
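A minimal sketch of a bound-constrained gradient method (here plain projected gradient descent rather than the active-set machinery mentioned above; all sizes are toy choices):

```python
import numpy as np

rng = np.random.default_rng(5)
H = rng.normal(size=(50, 16))
x_true = np.abs(rng.normal(size=16))      # a nonnegative "image"
y = H @ x_true                            # noiseless synthetic data

def f(x):
    """Quadratic data term for a linear model."""
    return 0.5 * np.sum((H @ x - y) ** 2)

def grad(x):
    return H.T @ (H @ x - y)

step = 1.0 / np.linalg.norm(H, 2) ** 2    # safe step size for this quadratic
x = np.zeros(16)
for _ in range(3000):
    # Gradient step followed by projection onto the positivity constraint.
    x = np.maximum(x - step * grad(x), 0.0)
```

Because the toy problem is convex, the iterates converge to the unique constrained minimizer; on a multimodal objective the same iteration would only find a local optimum.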

When f(x) is convex but not smooth, the optimization can be carried out by an augmented Lagrangian method like the alternating direction method of multipliers (ADMM) [65], which exploits a variable splitting strategy to separate the complex optimization problem into subproblems which are easier to solve, sometimes even exactly. ADMM also provides a flexible way to impose the strict constraints implemented by Ω. ADMM has been successfully used in a number of algorithms for interferometric imaging [36,49,66,67], but it requires as many control parameters as there are splittings and constraints. These must be treated as additional hyperparameters needed to tune the algorithm.

The well-known Clean method [8,10] is an example of a greedy algorithm. Such algorithms, also called matching-pursuit methods [68], are intended to solve a data-fitting problem under a sparsity constraint (see Section 5.E) expressed by the ℓ0 norm of the variables, as in Eq. (77). Greedy methods proceed by adding (or removing) a single variable [i.e., a pixel of x or a coefficient of z if a dictionary is used as in Eq. (77)] to the current solution in order to improve the agreement with the data. This strategy may be efficient when a very small number of nonzero variables is sufficient to fit the data, but greedy methods are nevertheless only able to find a local solution which improves on the starting solution. Note that the efficiency of a greedy method strongly depends on the criterion for selecting which variable to change.

The objective of stochastic methods is to find a close approximation of the global minimum of the objective function f(x). Their name comes from the fact that they perform a random walk to explore the objective function. During this exploration, a new candidate x′[t+1] is generated by a random perturbation of the current estimate x[t], and the new candidate is either kept or rejected, depending on how much it improves the solution compared to x[t]. If there is an improvement, that is, if f(x′[t+1]) ≤ f(x[t]), the new candidate is kept; otherwise, the algorithm decides randomly whether to keep x′[t+1], with a probability decreasing with the value of f(x′[t+1]) − f(x[t]) > 0. If the new candidate is kept, then the next estimate is x[t+1] = x′[t+1]; otherwise, it is rejected and x[t+1] = x[t]. By repeating this procedure, a Markov chain Monte Carlo (MCMC) sampler is built [69,70]. In the case of simulated annealing, the probability of keeping a bad estimate is slowly reduced so that the Markov chain cools down to the global optimum of the objective function. Taking f(x) = −log Pr(x|y,ω) and without the cooling, MCMC is a means to sample the posterior probability Pr(x|y,ω). Then straightforward sample statistics can be applied to the elements of the Markov chain to get the mode (that is, the best solution found), the posterior mean, or other moments. Stochastic methods sound very appealing in the context of optical interferometry because f(x) is necessarily multimodal with nonlinear observables. Another attractive feature is the possibility of obtaining not only an image but also error bars on the pixel values. Finally, as only the value of f(x) is required, nondifferentiable and nonconvex objective functions, such as those based on the ℓ0 norm, may be considered. However, compared to local optimization methods, stochastic methods take much more time to converge to a solution, and a fair amount of experience is necessary to choose the parameters of a stochastic method correctly.
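A minimal Metropolis sketch with cooling (a toy 1-D double-well stands in for the multimodal f(x); every tuning constant here is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(6)

def f(x):
    """Double-well objective; the global minimum is near x = -1."""
    return (x**2 - 1.0) ** 2 + 0.2 * x

x = 1.0                                    # deliberately start in the wrong well
best = x
T = 1.0                                    # initial "temperature"
for _ in range(50000):
    cand = x + 0.5 * rng.normal()          # random perturbation of x[t]
    df = f(cand) - f(x)
    # Always keep improvements; keep a worse candidate with prob. exp(-df/T).
    if df <= 0.0 or rng.random() < np.exp(-df / T):
        x = cand
        if f(x) < f(best):
            best = x
    T = max(0.01, 0.9999 * T)              # slow cooling (simulated annealing)
```

Started in the wrong well, the chain can still reach the global minimum near x ≈ −1, because the early high-temperature iterations accept uphill moves that cross the barrier between the two wells.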

B. Algorithms for Gray-Image Reconstruction

We start by reviewing algorithms tailored to recovering a monochromatic image. These methods can nevertheless be used on multispectral data, either by assuming a gray object (i.e., one whose appearance does not change with wavelength) or by processing the spectral channels independently.

  • (1) The BSMEM algorithm [71,72] makes use of MEM (Section 5.D) to regularize the problem of image restoration from the measured bispectrum (hence its name). The improved BSMEM version [72] uses the Gull and Skilling entropy [see Eq. (73)], and a likelihood term with respect to the complex bispectrum, which assumes independent Gaussian noise statistics for the amplitude and phase of the measured bispectrum. The optimization engine is MemSys, which implements the strategy proposed by Skilling and Bryan [42] to solve the problem in Eq. (39) and automatically finds the most likely value for the associated Lagrange multiplier. Because it makes no attempt to convert the data into complex visibilities, a strength of BSMEM is that it can handle any type of data sparsity (such as missing closure phases). Recently, BSMEM has been updated to be usable from the unified graphical user interface (GUI) developed in the framework of Opticon [73]. BSMEM contains proprietary software, but is available for academic use subject to a no-fee license agreement (http://www.mrao.cam.ac.uk/research/optical-interferometry/bsmem-software/). The GUI is intended to be usable with any software which implements a specified common command line interface [74], and is available at the Jean-Marie Mariotti Center (JMMC, http://www.jmmc.fr/oimaging).
  • (2) The Building Block Method [32] is similar to the Clean method but designed for reconstructing images from bispectrum data obtained by means of speckle or long-baseline interferometry. The method proceeds iteratively to reduce a cost function fdatabisp(x) equal to that in Eq. (56) with weights set to a constant or to an expression motivated by Wiener filtering. The minimization of the penalty is achieved by a matching pursuit algorithm which imposes sparsity of the solution. The image is given by the building block model in Eq. (29) and, at the tth iteration, the new image I[t](θ) is obtained by adding a new building block at location θ[t] with a weight α[t] to the previous image, so as to maintain the normalization:
    I[t](θ) = (1 − α[t]) I[t−1](θ) + α[t] h(θ − θ[t]).

    The weight and location of the new building block are derived by minimizing the criterion fdatabisp with respect to these parameters. Strict positivity and support constraints can be trivially enforced by limiting the possible values of α[t] and θ[t]. To avoid superresolution artifacts, the final image is convolved with a smoothing function whose size is set according to the spatial resolution of the instrument.
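As an illustration of this normalized building-block update, here is a minimal greedy sketch on a toy 1D problem; the least-squares distance to a target image stands in for the actual bispectrum misfit, and the single-pixel blocks, grid size, and block count are illustrative assumptions:

```python
def building_block_fit(target, n_blocks=50):
    """Greedy (matching-pursuit) fit of a normalized image by building blocks.

    Each step updates the image as I <- (1 - a)*I + a*e_j, where e_j is a
    single-pixel block, choosing the pixel j and weight a in [0, 1] that
    most reduce the residual; the update preserves the normalization and
    the positivity of the image.
    """
    n = len(target)
    image = [1.0 / n] * n                         # flat, normalized start
    for _ in range(n_blocks):
        best = None
        for j in range(n):
            # Search direction e_j - I; optimal least-squares weight along it.
            d = [(1.0 if k == j else 0.0) - image[k] for k in range(n)]
            dd = sum(v * v for v in d)
            if dd == 0.0:
                continue
            a = sum(d[k] * (target[k] - image[k]) for k in range(n)) / dd
            a = min(max(a, 0.0), 1.0)             # keep positivity/normalization
            res = sum((image[k] + a * d[k] - target[k]) ** 2 for k in range(n))
            if best is None or res < best[0]:
                best = (res, j, a)
        _, j, a = best
        image = [(1 - a) * image[k] + a * (1.0 if k == j else 0.0)
                 for k in range(n)]
    return image

# Toy target: two point sources on a 16-pixel grid, total flux 1.
target = [0.0] * 16
target[3], target[11] = 0.7, 0.3
img = building_block_fit(target)
```

Note that the update preserves the total flux exactly, so no explicit renormalization step is needed.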

  • (3) The IRBis algorithm [47] is an evolution of the Building Block Method. It performs regularized image reconstruction by fitting bispectrum data while imposing positivity by means of the active-set conjugate gradient method ASA_CG [75], a nonlinear conjugate gradient method with an active set to handle the bound constraints. As this is a local optimization method, the result depends not only on the settings but also on the initial image. IRBis implements various regularization methods: simple Tikhonov [see Eq. (59)], maximum entropy [see Eq. (73)], and edge-preserving smoothness [see Eq. (64)] with the hyperbolic semi-norm given in Eq. (67). The bispectrum data considered by IRBis are synthesized from the power spectrum and closure phase data of an OI-FITS file, which provide the modulus and the phase of the synthetic bispectrum, respectively. The bispectra with one of the frequencies set to zero are also included so that there is no loss of information.
  • (4) The Macim algorithm [76], for MArkov Chain IMager, aims at maximizing the posterior probability:
    Pr(x|y,ω) ∝ exp(−(1/2) fdata(x) − (μ/2) fprior(x)).

    Macim implements MEM regularization and a specific darkness prior via a regularizer which favors large regions of dark space in between bright regions. For this latter regularization, fprior(x) is the sum of all pixels with zero flux on either side of their boundaries. Macim attempts to maximize Pr(x|y,ω) by a simulated annealing algorithm with Gibbs sampling [69], which can in principle solve the global optimization problem of maximizing Pr(x|y,ω), but convergence can be very slow, especially for objects comprising multiple components.

  • (5) The multi-telescope image reconstruction algorithm (MiRA) [77] defines the sought image as the minimum of the penalty function in Eq. (42). Minimization is done by VMLM-B [78], a limited-memory variable metric (quasi-Newton) method based on Broyden–Fletcher–Goldfarb–Shanno (BFGS) updates, with bound constraints to enforce positivity. MiRA is written in a modular way closely following the expression of the likelihood in Eq. (54): any type of data can be taken into account by providing a function that computes the corresponding penalty and its gradient. For the moment, MiRA handles complex visibility, power spectrum, and closure-phase data via penalty terms given by Eqs. (53), (55), and (57). Like BSMEM, MiRA can cope with any missing data; in particular, it can be used to restore an image given only the power spectrum (i.e., without any Fourier phase information), with at least a 180° orientation ambiguity. An implementation of MiRA in Yorick [79] is freely available (https://github.com/emmt/mira), which can be used from the command line or from the Yorick interpreter.
  • (6) The Wisard algorithm [56] recovers an image from power spectrum and closure phase data. It exploits a self-calibration approach [80,81] to recover missing Fourier phases. Given a current estimate of the image and the closure phase data, Wisard first derives missing Fourier phase information in such a way as to minimize the number of unknowns. Then, the synthesized Fourier phases are combined with the square root of the measured power spectrum to generate pseudocomplex visibility data which are fitted by the image restoration step. This step is performed by using the chosen regularization and a quadratic penalty with respect to the pseudocomplex visibility data [see Eq. (53)]. This approach gives a unique solution for the image restoration step, although overall, the global problem remains multimodal. Wisard has been implemented in interactive data language (IDL) and can also be used with GNU data language (GDL) (http://gnudatalanguage.sourceforge.net/); it was originally implemented at Office national d’études et de recherches aérospatiales (ONERA) and is now maintained by the JMMC from where it is freely available (http://www.jmmc.fr/wisard_page.htm).

MiRA and Wisard have been developed in parallel and share some common features. They use the same optimization engine [78] and means to impose positivity and normalization [51]. However, they differ in the way missing data are taken into account: Wisard explicitly solves for missing Fourier phase information, while MiRA implicitly accounts for any lack of information through the direct model of the data [51]. Both implement many different regularizers (negentropy, quadratic or edge-preserving smoothness, compactness, TV, etc.).
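The common optimization problem solved by these codes, minimizing fdata(x) + μ fprior(x) subject to positivity and normalization, can be sketched as follows. This toy uses plain projected gradient descent on an identity "instrument" with quadratic smoothness, not the actual VMLM-B engine, and all numbers are illustrative:

```python
def reconstruct(y, mu=0.5, flux=1.0, n_iter=2000, step=0.1):
    """Minimize f_data(x) + mu*f_prior(x) subject to x >= 0 and sum(x) = flux.

    f_data is a toy least-squares term (identity "instrument") and f_prior a
    quadratic smoothness penalty; positivity and normalization are enforced
    by projecting after each gradient step.
    """
    n = len(y)
    x = [flux / n] * n
    for _ in range(n_iter):
        g = [2.0 * (x[k] - y[k]) for k in range(n)]       # grad of f_data
        for k in range(n - 1):                            # grad of smoothness
            d = 2.0 * (x[k + 1] - x[k])
            g[k] -= mu * d
            g[k + 1] += mu * d
        x = [max(0.0, x[k] - step * g[k]) for k in range(n)]   # positivity
        s = sum(x)
        x = [v * flux / s for v in x]                     # normalization
    return x

# Noisy triangular profile whose total flux is 1 (one sample is negative).
y = [0.0, 0.1, 0.3, 0.35, 0.2, 0.05, -0.02, 0.02]
x = reconstruct(y)
```

The projection steps illustrate how positivity acts as a floating support: negative excursions are clipped to zero and the flux is redistributed by the renormalization.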

C. Multispectral Methods

All current interferometers provide multispectral data and recovering a multispectral image could be done in a naïve way with one of the previous algorithms by processing each spectral channel independently. Jointly processing the available data at all wavelengths is, however, much more powerful for several reasons. First, it reduces the voids in the spatial frequency coverage, since the measured frequencies are wavelength dependent [see Eq. (9)]. Second, it offers the opportunity to exploit the regularity of the specific brightness distribution of the object along the spectral dimension, which can significantly improve the quality of the restored image [66,82]. Using a monochromatic image reconstruction algorithm and an image prior which is identical for the different wavelengths, it is possible to retrieve a multispectral image of the object [83], but exploiting the full potential of multispectral imaging requires the development of specific algorithms such as the ones described below.

  • (7) The Squeeze algorithm [84] was developed recently by Fabien Baron and John Monnier at the University of Michigan, with the collaboration of Brian Kloppenborg from the University of Denver. Squeeze has a very comprehensive set of features, including both MCMC [69,70] (as in Macim) and gradient-based optimization engines, and the ability to combine geometric model fitting with model-independent imaging. Thanks to its powerful optimization strategies, Squeeze can deal with the most demanding regularizations, such as those based on a true ℓ0 norm. As a result, Squeeze has a vast choice of priors: sparsity via the ℓ0 or ℓ1 norms, either separable or applied to wavelet coefficients (various wavelet transforms are available), TV, maximum entropy, darkness, etc. Squeeze is one of the few algorithms capable of multispectral imaging. Novel capabilities such as imaging on spheroids are also being implemented. A C implementation of Squeeze is freely available (https://github.com/fabienbaron/Squeeze) and is usable from the command line. The superiority of some Squeeze results (see, e.g., Fig. 4) over other methods suggests that ℓ0 sparsity priors can be very appropriate even though they require a global optimization strategy.
  • (8) The Sparco algorithm [30] is a semi-parametric approach that has been proposed by Jacques Kluska to deal with multispectral data. The fundamental idea is to assume that the astronomical object comprises several components whose specific brightness distributions are separable functions of the angular direction and the wavelength, say, Ic(θ,λ)=Fc(λ)Ic(θ), for the cth component and for Ic(θ) normalized. Then the model visibility at wavelength λ and projected baseline b is
    V(ν,λ) = [∑c Fc(λ) Vc(ν)] / [∑c Fc(λ)],
    with ν=b/λ the frequency and Vc(ν)=I˜c(ν) the Fourier transform of Ic(θ). It is further assumed that the spectral energy distributions (SEDs) Fc(λ) follow simple power laws (which is justified for young stellar objects in the near infrared). In the case where there are two components, an unresolved star and some extended environment, the model simplifies to
    V(ν,λ) = [1 + (λ/λ0)^p ξ Venv(ν)] / [1 + (λ/λ0)^p ξ],
    where ξ = Fenv(λ0)/Fstar(λ0) is the ratio of the SED of the environment to that of the star at some reference wavelength λ0, p is the difference between the spectral indices of these two SEDs, and Venv(ν) is the normalized visibility of the environment. This model adds just two parameters (ξ and p) to a monochromatic image reconstruction method (the sought image being Ienv(θ), the normalized brightness distribution of the environment), and Sparco has been implemented in MiRA and in Squeeze.
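The two-component model above translates directly into code; a minimal sketch (the function and argument names are ours):

```python
def sparco_visibility(v_env, wavelength, lam0, p, xi):
    """SPARCO-style model: unresolved star (V = 1) plus extended environment.

    v_env      : complex normalized visibility of the environment at b/lambda
    xi         : flux ratio F_env(lam0)/F_star(lam0) at reference wavelength
    p          : difference between the spectral indices of the two SEDs
    """
    w = xi * (wavelength / lam0) ** p   # environment/star flux ratio at lambda
    return (1.0 + w * v_env) / (1.0 + w)

# At the reference wavelength with equal fluxes, V is the average of 1 and v_env.
v = sparco_visibility(0.5 + 0j, 2.2e-6, 2.2e-6, 2.0, 1.0)   # -> 0.75
```

As expected, the model tends to 1 (pure star) for ξ → 0 and to Venv at long wavelengths when the environment dominates.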
  • (9) The Painter algorithm [49] aims to recover the specific brightness distribution of the object, that is, a 3D multispectral image which is the solution of an optimization problem like the one in Eq. (42), except that the regularization is replaced by the following two terms:
    μ fprior(x) = μangl fangl(Dangl·x) + μsptrl fsptrl(Dsptrl·x).
    Dangl and Dsptrl are finite difference operators along the angular and spectral dimensions, respectively. The angular regularization is implemented by the function fangl and the operator Dangl, while the spectral regularization is implemented by the function fsptrl and the operator Dsptrl. The hyperparameters μangl ≥ 0 and μsptrl ≥ 0 set the relative importance of the regularity of the image along its angular and spectral dimensions. A variety of choices for these functions and operators are available to impose quadratic, edge-preserving, or TV smoothness. Painter solves the problem by an ADMM [65]. A Julia [85] implementation of Painter is freely available (https://github.com/andferrari/PAINTER.jl).
  • (10) The MiRA-3D algorithm [67] by Ferréol Soulez is designed for multispectral image reconstruction from optical interferometric data. The 3D multispectral image is the solution of the problem given in Eq. (42), where a joint spatiospectral prior is imposed:
    fprior(x) = ∑n,ℓ ζ(Dn,ℓ·x),
    which is similar to the one in Eq. (64), except that the indices n and ℓ, respectively, run along the spatial and spectral dimensions of the multispectral image x, and that Dn,ℓ·x is the 3D (spatiospectral) gradient of x at the position of the spaxel (θn, λℓ). The operator Dn,ℓ expands as
    Dn,ℓ = (Dn,ℓangl, α Dn,ℓsptrl),
    where Dn,ℓangl and Dn,ℓsptrl are finite difference operators along the spatial and spectral dimensions, respectively, and the hyperparameter α is tuned to adjust the relative importance of the spatial and spectral gradients. Compared to the separable regularization of Painter, the advantage of the joint spatiospectral regularization proposed by MiRA-3D is that, with ζ given by Eq. (67) or Eq. (65), it favors the synchronization of strong spectral and spatial variations. This is consistent with an astronomical object made of distinctive components with different spectra. Another original possibility offered by MiRA-3D is to reconstruct a temperature map of the object by assuming the SED of each pixel is given by Planck’s law for a black body [86].
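To see why a joint gradient favors synchronized variations, here is a small sketch of such a penalty. It assumes the common hyperbolic form √(|g|² + τ²) − τ for ζ and a toy cube of 1D "images"; both are illustrative choices rather than MiRA-3D's actual implementation:

```python
import math

def joint_spatiospectral_penalty(x, alpha=1.0, tau=1e-3):
    """Edge-preserving penalty on the joint (angular, spectral) gradient.

    x[l][n] is a toy cube: spectral channel l, 1D angular pixel n.  For each
    spaxel, the angular and (alpha-weighted) spectral forward differences are
    stacked into one gradient whose hyperbolic norm sqrt(|g|^2 + tau^2) - tau
    is summed.  Because sqrt(a^2 + b^2) <= |a| + |b|, putting strong angular
    and spectral variations at the same spaxel costs less than separating
    them, which is the "synchronization" effect described above.
    """
    n_lam, n_pix = len(x), len(x[0])
    total = 0.0
    for l in range(n_lam):
        for n in range(n_pix):
            g_angl = x[l][n + 1] - x[l][n] if n + 1 < n_pix else 0.0
            g_sptl = x[l + 1][n] - x[l][n] if l + 1 < n_lam else 0.0
            total += math.sqrt(g_angl**2 + (alpha * g_sptl)**2 + tau**2) - tau
    return total

# A cube with the same sharp angular edge in every channel: roughly 3 edges.
cube = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
p = joint_spatiospectral_penalty(cube)
```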

7. DISCUSSION

We have presented the methods for image reconstruction from interferometric data, with particular attention to the specific issues arising at optical wavelengths. The existing algorithms can be understood in a common framework where the inverse problem of image reconstruction amounts to solving a constrained optimization problem. The objective function to minimize has two terms which reflect a balance between fitting the data (via a likelihood term) and enforcing the priors (via a regularization term). This balance may be adjusted via a so-called hyperparameter. The constraints are the normalization and the positivity of the image; the latter behaves like a floating support, which is very important in helping to interpolate missing frequencies.

A range of existing algorithms for image reconstruction from optical interferometry data has been described. Most of them have been contenders in the successive “Interferometric Imaging Beauty Contests” [22,87,88–92], which have demonstrated that there is no single best algorithm and that the quality of the result also depends on the user’s ability to select the proper settings. Understanding the principles of image reconstruction is therefore mandatory for choosing a particular algorithm and its parameters. Part of the future success of optical interferometers will depend on the ability of astronomers to develop the skills needed to use the image reconstruction tools correctly. In this regard, the present paper may be useful.

Image reconstruction for interferometry is still a lively domain of research and development. New algorithms are being developed for multispectral imaging, and efforts are being devoted to making existing algorithms more user friendly [73]. Global optimization by means of stochastic methods seems a very promising approach [12] to avoid suboptimal results and to implement nonconvex sparsity constraints.

APPENDIX A: PENALTY FOR ANGULAR DATA

We consider deriving a suitable form of the likelihood penalty for angular data, like the closure phase given in Eq. (21). Consistency with the likelihood penalties for other kinds of data [see, for instance, Eq. (51)] dictates that the phase penalty be given by

f(ϕmod) = c − 2 log[Pr(ϕ|ϕmod)],
where Pr(ϕ|ϕmod) is the probability density function of the measured phase ϕ conditioned on the knowledge of the model phase ϕmod, and c is a constant which can be chosen so that f(ϕ) = 0. There is, however, no consensus on the form of the distribution Pr(ϕ|ϕmod), and image reconstruction algorithms may assume different expressions for the penalty f(ϕmod) and hence different phase distributions.

As the considered phase is the argument of a complex number, it is only defined modulo 2π. To account for this, Haniff [93] proposed defining the phase penalty as

fHaniff(ϕmod) = arc(ϕ − ϕmod)²/σϕ²,
with σϕ² some estimate of the variance of the measured phase and arc : ℝ → (−π, +π] a function which wraps its argument into the interval (−π, +π]. The above expression amounts to approximating the distribution Pr(ϕ|ϕmod) of the wrapped phase by a truncated Gaussian.
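A convenient way to implement the wrapping function arc is via atan2, which returns an angle in [−π, +π], matching the wrapped phase up to the boundary convention; a small sketch (the function names are ours):

```python
import math

def arc(phi):
    """Wrap a phase (radians) into [-pi, +pi] using atan2."""
    return math.atan2(math.sin(phi), math.cos(phi))

def f_haniff(phi, phi_mod, var):
    """Haniff's wrapped quadratic phase penalty: arc(phi - phi_mod)^2 / var."""
    return arc(phi - phi_mod) ** 2 / var
```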

It has been empirically found [49,77] that a distance based on the phasors has a better behavior with respect to numerical optimization than the penalty proposed by Haniff. This leads to the following penalty:

fphasor(ϕmod) = κ |e^(iϕmod) − e^(iϕ)|² = 2κ [1 − cos(ϕ − ϕmod)],
with κ0 a weight which has to be determined. Imposing in Eq. (A1) that f(ϕmod)=fphasor(ϕmod) (up to an additive constant) leads to
Pr(ϕ|ϕmod) ∝ e^(κ cos(ϕ − ϕmod)).

The proportionality factor can be computed so as to normalize the distribution Pr(ϕ|ϕmod) over an interval of width 2π, which yields

Pr(ϕ|ϕmod) = e^(κ cos(ϕ − ϕmod)) / [2π I0(κ)],
with In the modified Bessel function of the first kind of order n. The above distribution is known as the von Mises distribution [48]. In order to derive the value of the parameter κ, the following phase variance can be empirically estimated from the data, and its expression can be computed assuming a von Mises distribution:
σϕ² = E(|e^(iϕmod) − e^(iϕ)|²) = 2 ∫−π+π [1 − cos(ϕ − ϕmod)] Pr(ϕ|ϕmod) dϕ = 2 − 2 I1(κ)/I0(κ).

This equation gives an implicit definition of the parameter κ in the penalty defined in Eq. (A3). For a small angular variance (or equivalently a large value of κ), the following approximation holds:

κ ≈ 1/σϕ².

In addition, in the limit of small differences between the data and the model phases, the two expressions in Eqs. (A2) and (A3) are equivalent and

f(ϕmod) ≈ (ϕ − ϕmod)²/σϕ².

Note that the angular variance σϕ2 defined in Eq. (A5) is exactly twice the value of the so-called circular variance of the wrapped phase [48].
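Equation (A5) only defines κ implicitly, but it is easy to invert numerically. The sketch below evaluates I1/I0 from the power series of the modified Bessel functions and bisects on κ, falling back to the large-κ approximation κ ≈ 1/σϕ² when the variance is very small; the switching point and iteration counts are arbitrary choices:

```python
def bessel_ratio(x, terms=400):
    """I1(x)/I0(x) from the power series, with terms built recursively
    (avoids factorial overflow for moderately large x)."""
    i0, i1 = 0.0, 0.0
    t0 = 1.0            # m-th term of I0: (x/2)^(2m) / (m!)^2
    t1 = x / 2.0        # m-th term of I1: (x/2)^(2m+1) / (m! (m+1)!)
    for m in range(terms):
        i0 += t0
        i1 += t1
        t0 *= (x / 2.0) ** 2 / ((m + 1) ** 2)
        t1 *= (x / 2.0) ** 2 / ((m + 1) * (m + 2))
    return i1 / i0

def kappa_from_variance(var):
    """Invert sigma^2 = 2 - 2*I1(kappa)/I0(kappa) for kappa by bisection."""
    if var <= 2.0 - 2.0 * bessel_ratio(200.0):
        return 1.0 / var            # large-kappa regime: kappa ~ 1/sigma^2
    lo, hi = 0.0, 200.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # The variance is a decreasing function of kappa.
        if 2.0 - 2.0 * bessel_ratio(mid) > var:
            lo = mid                # variance too large: increase kappa
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For σϕ² = 0.01 the exact inversion returns a value close to, but slightly above, the approximation 1/σϕ² = 100, consistent with Eq. (A6).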

APPENDIX B: SEPARABLE REGULARIZATIONS

Separable regularizations take the form

fprior(x) = ∑n fn(xn),
where {fn : ℝ → ℝ}n=1,…,N is a family of functions, which we assume to be strictly convex.

With such a regularization function, the default solution is

xdef = arg minx∈Ω fprior(x) = arg minx∈Ω ∑n fn(xn);
when the feasible set Ω is defined as in Eq. (35), the Lagrangian of this constrained problem is
L(x; α) = fprior(x) − α⟨c, x⟩,
with α ∈ ℝ the Lagrange multiplier associated with the normalization constraint. Taking into account the nonnegativity, the Karush–Kuhn–Tucker (KKT) necessary conditions of optimality for a feasible solution are [38]
∀n, either ∂L(x; α)/∂xn = 0 and xn ≥ 0, or ∂L(x; α)/∂xn > 0 and xn = 0; and ⟨c, x⟩ = ϱ.

As ∂L(x; α)/∂xn = fn′(xn) − α cn, and since all the fn′ are continuous and bijective (which holds because the fn are strictly convex, so their derivatives are strictly monotonic), there is a unique default solution:

xn+(α) = max{0, (fn′)⁻¹(α cn)}  (∀n).

It remains to find α so that ⟨c, x+(α)⟩ = ϱ. At least for the regularizations considered in this paper, this turns out to be a simple task. For instance, in most cases, we found that

xn+(α) = χ(α/α′) xn+(α′)
for some function χ : ℝ+ → ℝ+. In other words, changing α only changes the normalization, not the shape, of x+(α). Then the default solution is simply
xdef = [ϱ/⟨c, x+(α)⟩] x+(α),
computed for any α>0.
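For a quadratic regularizer fn(xn) = xn², this recipe reduces to a few lines. The sketch below is our own illustration (the names and the choice α = 1 are arbitrary): it computes x+(α) for α = 1 and rescales, which is valid whenever changing α only rescales the solution, as is the case here.

```python
def default_solution(c, rho, fprime_inv):
    """Default image for a separable, strictly convex regularizer.

    Computes x+(alpha) for alpha = 1 (any alpha > 0 works whenever changing
    alpha only rescales the solution) and then rescales so that <c, x> = rho.
    """
    x = [max(0.0, fprime_inv(cn)) for cn in c]    # KKT solution for alpha = 1
    s = sum(cn * xn for cn, xn in zip(c, x))      # <c, x+(1)>
    return [rho * xn / s for xn in x]

# Quadratic regularizer f_n(x) = x^2, so (f_n')^{-1}(u) = u/2; with uniform
# weights c the default image is flat, as expected for a smoothness prior.
c = [1.0, 1.0, 1.0, 1.0]
xdef = default_solution(c, 1.0, lambda u: u / 2.0)
```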

Funding

Seventh Framework Programme (FP7) Ideas: (IDEAS-ERC) (FP7/2013-2016, 312430).

Acknowledgment

Éric Thiébaut wishes to thank Michel Tallon for fruitful discussions about modeling the physical behavior of an interferometer. Acknowledgements are also due to the two anonymous reviewers for their numerous comments and suggestions, which helped us to improve the paper and fix some weaknesses.

REFERENCES

1. J. Monnier, “Optical interferometry in astronomy,” Rep. Prog. Phys. 66, 789–857 (2003). [CrossRef]  

2. A. Quirrenbach, “The development of astronomical interferometry,” Exp. Astron. 26, 49–63 (2009). [CrossRef]  

3. A. Tarantola and B. Valette, “Inverse problems = quest for information,” J. Geophys. 50, 159–170 (1982).

4. D. M. Titterington, “General structure of regularization procedures in image reconstruction,” Astron. Astrophys. 144, 381–387 (1985).

5. A. Tarantola, Inverse Problem Theory (Elsevier, 1987).

6. G. Demoment, “Image reconstruction and restoration: overview of common estimation structures and problems,” IEEE Trans. Acoust. Speech Signal Process. 37, 2024–2036 (1989). [CrossRef]  

7. É. Thiébaut and J.-F. Giovannelli, “Image reconstruction in optical interferometry,” IEEE Signal Process. Mag. 27(1), 97–109 (2010). [CrossRef]  

8. J. A. Högbom, “Aperture synthesis with a non-regular distribution of interferometer baselines,” Astron. Astrophys. Suppl. 15, 417–426 (1974).

9. R. Narayan and R. Nityananda, “Maximum entropy image restoration in astronomy,” Annu. Rev. Astron. Astrophys. 24, 127–170 (1986). [CrossRef]  

10. A. R. Thompson, J. M. Moran, and G. W. Swenson Jr., Interferometry and Synthesis in Radio Astronomy, 3rd ed. (Springer, 2017).

11. E. Candes and M. Wakin, “An introduction to compressive sampling,” IEEE Signal Process. Mag. 25(2), 21–30 (2008). [CrossRef]  

12. F. Baron, “Image reconstruction in optical interferometry: an up-to-date overview,” in Astrophysics and Space Science Library, H. M. J. Boffin, G. Hussain, J.-P. Berger, and L. Schmidtobreick, eds. (2016), Vol. 439, p. 75.

13. É. Thiébaut, “Image reconstruction with optical interferometers,” New Astron. Rev. 53, 312–328 (2009). [CrossRef]  

14. S. Renard, É. Thiébaut, and F. Malbet, “Image reconstruction in optical interferometry: benchmarking the regularization,” Astron. Astrophys. 533, A64 (2011). [CrossRef]  

15. C. Froehly, “Coherence and interferometry through optical fibers,” in Scientific Importance of High Angular Resolution at Infrared and Optical Wavelengths, M. H. Ulrich and K. Kjaer, eds. (1981), pp. 285–293.

16. S. B. Shaklan and F. Roddier, “Single-mode fiber optics in a long-baseline interferometer,” Appl. Opt. 26, 2159–2163 (1987). [CrossRef]  

17. V. C. du Foresto, S. Ridgway, and J.-M. Mariotti, “Deriving object visibilities from interferograms obtained with a fiber stellar interferometer,” Astron. Astrophys. Suppl. Ser. 121, 379–392 (1997). [CrossRef]  

18. F. Roddier, J. M. Gilli, and J. Vernin, “On the isoplanatic patch size in stellar speckle interferometry,” J. Opt. 13, 63–70 (1982). [CrossRef]  

19. F. Delplancke, F. Derie, F. Paresce, A. Glindemann, F. Lévy, S. Lévêque, and S. Ménardi, “PRIMA for the VLTI—science,” Astrophys. Space Sci. 286, 99–104 (2003). [CrossRef]  

20. E. Tatulli, F. Millour, A. Chelli, G. Duvert, B. Acke, O. Hernandez Utrera, K.-H. Hofmann, S. Kraus, F. Malbet, P. Mège, R. G. Petrov, M. Vannier, G. Zins, P. Antonelli, U. Beckmann, Y. Bresson, M. Dugué, S. Gennari, L. Glück, P. Kern, S. Lagarde, E. Le Coarer, F. Lisi, K. Perraut, P. Puget, F. Rantakyrö, S. Robbe-Dubois, A. Roussel, G. Weigelt, M. Accardo, K. Agabi, E. Altariba, B. Arezki, E. Aristidi, C. Baffa, J. Behrend, T. Blöcker, S. Bonhomme, S. Busoni, F. Cassaing, J.-M. Clausse, J. Colin, C. Connot, A. Delboulbé, A. Domiciano de Souza, T. Driebe, P. Feautrier, D. Ferruzzi, T. Forveille, E. Fossat, R. Foy, D. Fraix-Burnet, A. Gallardo, E. Giani, C. Gil, A. Glentzlin, M. Heiden, M. Heininger, D. Kamm, M. Kiekebusch, D. Le Contel, J.-M. Le Contel, T. Lesourd, B. Lopez, M. Lopez, Y. Magnard, A. Marconi, G. Mars, G. Martinot-Lagarde, P. Mathias, J.-L. Monin, D. Mouillet, D. Mourard, E. Nussbaum, K. Ohnaka, J. Pacheco, C. Perrier, Y. Rabbia, S. Rebattu, F. Reynaud, A. Richichi, A. Robini, M. Sacchettini, D. Schertl, M. Schöller, W. Solscheid, A. Spang, P. Stee, P. Stefanini, M. Tallon, I. Tallon-Bosc, D. Tasso, L. Testi, F. Vakili, O. von der Lühe, J.-C. Valtier, and N. Ventura, “Interferometric data reduction with AMBER/VLTI. principle, estimators, and illustration,” Astron. Astrophys. 464, 29–42 (2007). [CrossRef]  

21. J. A. Gordon and D. F. Buscher, “Detection noise bias and variance in the power spectrum and bispectrum in optical interferometry,” Astron. Astrophys. 541, A46 (2012). [CrossRef]  

22. P. R. Lawson, W. D. Cotton, C. A. Hummel, J. D. Monnier, M. Zhao, J. S. Young, H. Thorsteinsson, S. C. Meimon, L. M. Mugnier, G. le Besnerais, É. M. Thiébaut, and P. G. Tuthill, “An interferometry imaging beauty contest,” Proc. SPIE 5491, 886–899 (2004). [CrossRef]  

23. A. W. Lohmann, G. Weigelt, and B. Wirnitzer, “Speckle masking in astronomy: triple correlation theory and applications,” Appl. Opt. 22, 4028–4037 (1983). [CrossRef]  

24. R. G. Petrov and AMBER Consortium, “Introducing the near infrared VLTI instrument AMBER to its users,” Astrophys. Space Sci. 286, 57–67 (2003). [CrossRef]  

25. G. Duvert, J. Young, and C. Hummel, “Oifits 2: the 2nd version of the data exchange standard for optical (visible/IR) interferometry,” Astron. Astrophys. 597, A8 (2017). [CrossRef]  

26. J. C. Dainty and A. H. Greenaway, “Estimation of spatial power spectra in speckle interferometry,” J. Opt. Soc. Am. 69, 786–790 (1979). [CrossRef]  

27. B. Wirnitzer, “Bispectral analysis at low light levels and astronomical speckle masking,” J. Opt. Soc. Am. A 2, 14–21 (1985). [CrossRef]  

28. T. A. Pauls, J. S. Young, W. D. Cotton, and J. D. Monnier, “A data exchange standard for optical (visible/IR) interferometry,” Publ. Astron. Soc. Pac. 117, 1255–1262 (2005). [CrossRef]  

29. J.-L. Starck, A. Bijaoui, B. Lopez, and C. Perrier, “Image reconstruction by the wavelet transform applied to aperture synthesis,” Astron. Astrophys. 283, 349–360 (1994).

30. J. Kluska, F. Malbet, J.-P. Berger, F. Baron, B. Lazareff, J.-B. Le Bouquin, J. D. Monnier, F. Soulez, and É. Thiébaut, “SPARCO: a semi-parametric approach for image reconstruction of chromatic objects,” Astron. Astrophys. 564, A80 (2014). [CrossRef]  

31. A. Lannes, E. Anterrieu, and P. Maréchal, “CLEAN and WIPE,” Astron. Astrophys. Suppl. 123, 183–198 (1997). [CrossRef]  

32. K.-H. Hofmann and G. Weigelt, “Iterative image reconstruction from the bispectrum,” Astron. Astrophys. 278, 328–339 (1993).

33. J. A. Fessler and B. P. Sutton, “Nonuniform fast Fourier transforms using min-max interpolation,” IEEE Trans. Signal Process. 51, 560–574 (2003). [CrossRef]  

34. J. Keiner, S. Kunis, and D. Potts, “Using NFFT3–a software library for various nonequispaced fast Fourier transforms,” ACM Trans. Math. Softw. 36, 1–30 (2009). [CrossRef]  

35. T. Cornwell and R. Braun, “Deconvolution,” in Synthesis Imaging in Radio Astronomy, R. A. Perley, F. R. Schwab, and A. H. Bridle, eds., Vol. 6 of Astronomical Society of the Pacific Conference Series (Astronomical Society of the Pacific, 1989), pp. 167–183.

36. J.-F. Giovannelli and A. Coulais, “Positive deconvolution for superimposed extended source and point sources,” Astron. Astrophys. 439, 401–412 (2005). [CrossRef]  

37. D. R. Gerwe, J. Dolne, P. N. Crabtree, R. B. Holmes, and B. Calef, “Image reconstruction from sparse irregular intensity interferometry measurements of Fourier magnitude,” Technical Report, DTIC Document (2013).

38. J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed. (Springer-Verlag, 2006).

39. É. Thiébaut, “Principles of image reconstruction in interferometry,” Euro. Astron. Soc. Publ. Ser. 59, 157–187 (2013).

40. S. Boyd and L. Vandenberghe, Convex Optimization, 7th ed. (Cambridge University, 2009).

41. J. Ables, “Maximum entropy spectral analysis,” Astron. Astrophys. Suppl. 15, 383–393 (1974).

42. J. Skilling and R. K. Bryan, “Maximum entropy image reconstruction: general algorithm,” Mon. Not. R. Astron. Soc. 211, 111–124 (1984). [CrossRef]  

43. T. J. Cornwell and K. F. Evans, “A simple maximum entropy deconvolution algorithm,” Astron. Astrophys. 143, 77–83 (1985).

44. J. W. Goodman, Statistical Optics, 2nd ed. (Wiley, 2015).

45. S. Meimon, L. M. Mugnier, and G. le Besnerais, “Convex approximation to the likelihood criterion for aperture synthesis imaging,” J. Opt. Soc. Am. A 22, 2348–2356 (2005). [CrossRef]  

46. F. Soulez, É. Thiebaut, M. Tallon, I. Tallon-Bosc, and P. Garcia, “Optimal a posteriori fringe tracking in optical interferometry,” Proc. SPIE 9146, 91462Y (2014). [CrossRef]  

47. K.-H. Hofmann, G. Weigelt, and D. Schertl, “An image reconstruction method (IRBis) for optical/infrared interferometry,” Astron. Astrophys. 565, A48 (2014). [CrossRef]  

48. K. Mardia, “Statistics of directional data,” J. R. Stat. Soc. Ser. B 37, 349–393 (1975).

49. A. Schutz, A. Ferrari, D. Mary, F. Soulez, É. Thiébaut, and M. Vannier, “PAINTER: a spatiospectral image reconstruction algorithm for optical interferometry,” J. Opt. Soc. Am. A 31, 2334–2345 (2014). [CrossRef]  

50. A. N. Tikhonov and V. Y. Arsenin, Solution of Ill-Posed Problems, Scripta Series in Mathematics (Winston & Sons, 1977).

51. G. le Besnerais, S. Lacour, L. M. Mugnier, É. Thiébaut, G. Perrin, and S. Meimon, “Advanced imaging methods for long-baseline optical interferometry,” IEEE J. Sel. Top. Signal Process. 2, 767–780 (2008). [CrossRef]  

52. J.-M. Conan, L. M. Mugnier, T. Fusco, V. Michau, and G. Rousset, “Myopic deconvolution of adaptive optics images by use of object and point-spread function power spectra,” Appl. Opt. 37, 4614–4622 (1998). [CrossRef]  

53. L. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D 60, 259–268 (1992). [CrossRef]  

54. P. Charbonnier, L. Blanc-Féraud, G. Aubert, and M. Barlaud, “Deterministic edge-preserving regularization in computed imaging,” IEEE Trans. Image Process. 6, 298–311 (1997). [CrossRef]  

55. W. J. J. Rey, Introduction to Robust and Quasi-Robust Statistical Methods (Springer, 1983).

56. S. Meimon, L. M. Mugnier, and G. le Besnerais, “Reconstruction method for weak-phase optical interferometry,” Opt. Lett. 30, 1809–1811 (2005). [CrossRef]  

57. E. T. Jaynes, “Prior probabilities,” IEEE Trans. Syst. Sci. Cybern. 4, 227–241 (1968). [CrossRef]  

58. B. R. Frieden, “Restoring with maximum likelihood and maximum entropy,” J. Opt. Soc. Am. 62, 511–518 (1972). [CrossRef]  

59. K. Horne, “Images of accretion discs–I: the eclipse mapping method,” Mon. Not. R. Astron. Soc. 213, 129–141 (1985). [CrossRef]  

60. D. Donoho, “For most large underdetermined systems of linear equations, the minimal ell-1 norm near-solution approximates the sparsest near-solution,” Commun. Pure Appl. Math. 59, 907–934 (2006). [CrossRef]  

61. R. Tibshirani, “Regression shrinkage and selection via the Lasso,” J. R. Stat. Soc. 58, 267–288 (1996).

62. P. C. Hansen, “Analysis of discrete ill-posed problems by means of the L-curve,” SIAM Rev. 34, 561–580 (1992). [CrossRef]  

63. R. W. Gerchberg and W. O. Saxton, “A practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik 35, 237–246 (1972).

64. J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. 21, 2758–2769 (1982). [CrossRef]  

65. S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn. 3, 1–122 (2010). [CrossRef]  

66. É. Thiébaut, F. Soulez, and L. Denis, “Exploiting spatial sparsity for multi-wavelength imaging in optical interferometry,” J. Opt. Soc. Am. A 30, 160–170 (2013). [CrossRef]  

67. F. Soulez and É. Thiébaut, “An image reconstruction framework for polychromatic interferometry,” in Improving the Performances of Current Optical Interferometers & Future Designs, L. Arnold, H. Le Coroller, and J. Surdej, eds. (Proceedings of Haute Provence Observatory Colloquium, 2013).

68. S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Trans. Signal Process. 41, 3397–3415 (1993). [CrossRef]  

69. S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6, 721–741 (1984). [CrossRef]  

70. D. J. C. MacKay, “Introduction to Monte Carlo methods,” in Learning in Graphical Models, M. I. Jordan, ed., NATO Science Series (Kluwer Academic, 1998), pp. 175–204.

71. D. F. Buscher, “Direct maximum-entropy image reconstruction from the bispectrum,” in Very High Angular Resolution Imaging, J. G. Robertson and W. J. Tango, eds., Vol. 158 of the Proceedings of the International Astronomical Union (Kluwer, 1994), pp. 91–93.

72. F. Baron and J. S. Young, “Image reconstruction at Cambridge University,” Proc. SPIE 7013, 70133X (2008). [CrossRef]  

73. J. Young, É. Thiébaut, G. Duvert, M. Vannier, P. Garcia, and G. Mella, “User-friendly imaging algorithms for interferometry,” Proc. SPIE 9907, 99073N (2016). [CrossRef]  

74. É. Thiébaut, J. Young, J. Léger, M. Tallon, G. Duvert, G. Mella, L. Bourges, and M. Vannier, “Interface to image reconstruction,” OPTICON Report (2017). https://github.com/emmt/OI-Imaging-JRA/blob/master/doc/interface/OI-Interface.pdf.

75. W. W. Hager and H. Zhang, “A new active set algorithm for box constrained optimization,” SIAM J. Optim. 17, 526–557 (2006). [CrossRef]  

76. M. Ireland, J. Monnier, and N. Thureau, “Monte-Carlo imaging for optical interferometry,” Proc. SPIE 6268, 62681T (2006). [CrossRef]  

77. É. Thiébaut, “MiRA: an effective imaging algorithm for optical interferometry,” Proc. SPIE 7013, 70131I (2008). [CrossRef]  

78. É. Thiébaut, “Optimization issues in blind deconvolution algorithms,” Proc. SPIE 4847, 174–183 (2002). [CrossRef]  

79. D. H. Munro, “Using the Yorick interpreted language,” Comput. Phys. 9, 609–615 (1995). [CrossRef]  

80. A. Readhead and P. Wilkinson, “The mapping of compact radio sources from VLBI data,” Astrophys. J. 223, 25–36 (1978). [CrossRef]  

81. T. J. Cornwell and P. N. Wilkinson, “A new method for making maps with unstable radio interferometers,” Mon. Not. R. Astron. Soc. 196, 1067–1086 (1981). [CrossRef]  

82. S. Bongard, F. Soulez, É. Thiébaut, and E. Pécontal, “3-D deconvolution of hyper-spectral astronomical data,” Mon. Not. R. Astron. Soc. 418, 258–270 (2011). [CrossRef]  

83. J.-B. le Bouquin, S. Lacour, S. Renard, E. Thiébaut, and A. Merand, “Pre-maximum spectro-imaging of the Mira star T Lep with AMBER/VLTI,” Astron. Astrophys. 496, L1–L4 (2009). [CrossRef]  

84. F. Baron, J. D. Monnier, and B. Kloppenborg, “A novel image reconstruction software for optical/infrared interferometry,” Proc. SPIE 7734, 77342I (2010). [CrossRef]  

85. J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, “Julia: a fresh approach to numerical computing,” SIAM Rev. 59, 65–98 (2017). [CrossRef]  

86. F. Soulez, É. Thiébaut, J. Kluska, M. Tallon, A. Schutz, and A. Ferrari, “Direct temperature map estimation in optical long baseline interferometry,” Proc. SPIE 9907, 99070Z (2016). [CrossRef]  

87. P. R. Lawson, W. D. Cotton, C. A. Hummel, F. Baron, J. S. Young, S. Kraus, K.-H. Hofmann, G. P. Weigelt, M. Ireland, J. D. Monnier, É. Thiébaut, S. Rengaswamy, and O. Chesneau, “The 2006 interferometry image beauty contest,” in “Advances in stellar interferometry,” Proc. SPIE 6268, 62681U (2006). [CrossRef]  

88. W. Cotton, J. Monnier, F. Baron, K.-H. Hofmann, S. Kraus, G. Weigelt, S. Rengaswamy, É. Thiébaut, P. Lawson, W. Jaffe, C. Hummel, T. Pauls, S. Henrique, P. Tuthill, and J. Young, “2008 imaging beauty contest,” Proc. SPIE 7013, 70131N (2008). [CrossRef]  

89. F. Malbet, W. Cotton, G. Duvert, P. Lawson, A. Chiavassa, J. Young, F. Baron, D. Buscher, S. Rengaswamy, B. Kloppenborg, M. Vannier, and L. Mugnier, “The 2010 interferometric imaging beauty contest,” Proc. SPIE 7734, 77342N (2010). [CrossRef]  

90. F. Baron, W. D. Cotton, P. R. Lawson, S. T. Ridgway, A. Aarnio, J. D. Monnier, K.-H. Hofmann, D. Schertl, G. Weigelt, É. Thiébaut, F. Soulez, D. Mary, F. Millour, M. Vannier, J. Young, N. M. Elias, H. R. Schmitt, and S. Rengaswamy, “The 2012 interferometric imaging beauty contest,” Proc. SPIE 8445, 84451E (2012). [CrossRef]  

91. J. D. Monnier, J.-P. Berger, J.-B. Le Bouquin, P. G. Tuthill, M. Wittkowski, R. Grellmann, A. Müller, S. Renganswany, C. Hummel, K.-H. Hofmann, D. Schertl, G. Weigelt, J. Young, D. Buscher, J. Sanchez-Bermudez, A. Alberdi, R. Schoedel, R. Köhler, F. Soulez, É. Thiébaut, J. Kluska, F. Malbet, G. Duvert, S. Kraus, B. K. Kloppenborg, F. Baron, W.-J. de Wit, T. Rivinius, and A. Merand, “The 2014 interferometric imaging beauty contest,” Proc. SPIE 9146, 91461Q (2014). [CrossRef]  

92. J. Sanchez-Bermudez, É. Thiébaut, K.-H. Hofmann, M. Heininger, D. Schertl, G. Weigelt, F. Millour, A. Schutz, A. Ferrari, M. Vannier, D. Mary, and J. Young, “The 2016 interferometric imaging beauty contest,” Proc. SPIE 9907, 99071D (2016). [CrossRef]  

93. C. Haniff, “Least-squares Fourier phase estimation from the modulo 2π bispectrum phase,” J. Opt. Soc. Am. A 8, 134–140 (1991). [CrossRef]  



Figures (5)

Fig. 1. Simple two-telescope stellar interferometer. The two telescopes are denoted by T1 and T2, and the corresponding delay lines by DL1 and DL2.
Fig. 2. Top left: simulated frequency coverage sampled with a six-station Navy Prototype Optical Interferometer (NPOI). Top right: model of LkHα 101 representing the observed object. Bottom left: dirty beam. Bottom right: dirty image. Object model and frequency coverage are from the “2004 Interferometric Beauty Contest” [22].
Fig. 3. Importance of imposing positivity. Top: reconstruction with the positivity constraint. Bottom: unconstrained reconstruction. Left: restored images. Right: power spectra. All reconstructions were performed with MiRA on the simulated data of the “2004 Interferometric Beauty Contest” and with a compactness constraint such that the default solution has a fairly large FWHM of 50 mas.
Fig. 4. Reconstructions of LkHα 101 with various regularizations. (a) Original object smoothed at the resolution of the interferometer (FWHM ≈ 0.4 mas); (b) BSMEM reconstruction; (c) MiRA reconstruction with MEM regularization as in Eq. (73); (d) MiRA reconstruction with a compactness quadratic regularization given by Eq. (60); (e) MiRA reconstruction with edge-preserving regularization implemented by Eq. (67) in Eq. (64); (f) Squeeze reconstruction with the ℓ0 norm of the à trous wavelet coefficients.
Fig. 5. Reconstructions of LkHα 101 with various regularization parameters. All images were obtained from the “2004 Interferometric Beauty Contest” simulated data by the MiRA algorithm with the same edge-preserving regularization as in Fig. 4 but for different values of the hyperparameters μ and τ. The images in the left panel were obtained with a very small value of the threshold τ to mimic the behavior of TV regularization.

Tables (1)


Table 1. Summary of the Algorithms for Image Reconstruction from Optical Interferometric Data Described in Section 6

Equations (106)


\[ A_j = \int T_j(\theta)\,A_{\mathrm{obj}}(\theta)\,\mathrm{d}^2\theta, \]
\[ A_T = \sum_{j\in T} A_j = \int A_{\mathrm{obj}}(\theta)\sum_{j\in T} T_j(\theta)\,\mathrm{d}^2\theta, \]
\[ I_T = \langle |A_T|^2\rangle = \iint \langle A_{\mathrm{obj}}(\theta)\,A_{\mathrm{obj}}^*(\theta')\rangle \sum_{(j,j')\in T^2} T_j(\theta)\,T_{j'}^*(\theta')\,\mathrm{d}^2\theta\,\mathrm{d}^2\theta', \]
\[ \langle A_{\mathrm{obj}}(\theta)\,A_{\mathrm{obj}}^*(\theta')\rangle = \begin{cases} \langle A_{\mathrm{obj}}(\theta)\rangle\,\langle A_{\mathrm{obj}}^*(\theta')\rangle = 0 & \text{if } \theta\neq\theta',\\ \langle |A_{\mathrm{obj}}(\theta)|^2\rangle = I_{\mathrm{obj}}(\theta) & \text{if } \theta=\theta', \end{cases} \;=\; I_{\mathrm{obj}}(\theta)\,\delta(\theta-\theta'), \]
\[ I_T = \int I_{\mathrm{obj}}(\theta)\sum_{(j,j')\in T^2} T_j(\theta)\,T_{j'}^*(\theta)\,\mathrm{d}^2\theta. \]
\[ T_j(\theta) = \tau_j\,\mathrm{e}^{\mathrm{i}\,\phi_j(\theta)}, \]
\[ I_T = \int I_{\mathrm{obj}}(\theta)\Big[\sum_{j\in T}\tau_j^2 + 2\sum_{j<j'}\tau_j\,\tau_{j'}\cos\big(\Delta\phi_{j,j'}(\theta)\big)\Big]\,\mathrm{d}^2\theta, \]
\[ d = \langle s,\,b_{j,j'}\rangle, \]
\[ \Delta\phi_{j,j'}(\theta) = \Delta\phi_{j,j'}(0) - \frac{2\pi}{\lambda}\,(d-d_0) = \Delta\phi_{j,j'}(0) - \frac{2\pi}{\lambda}\,\langle\theta,\,b_{j,j'}\rangle, \]
\[ \nu_{j,j'} = b^{\mathrm{proj}}_{j,j'}/\lambda; \]
\[ \langle\theta,\,b_{j,j'}\rangle/\lambda = \langle\theta,\,b^{\mathrm{proj}}_{j,j'}\rangle/\lambda = \langle\theta,\,\nu_{j,j'}\rangle. \]
\[ \Delta\phi_{j,j'}(0) = \psi^{\mathrm{inst}}_{j,j'} + \phi^{\mathrm{atm}}_{j'} - \phi^{\mathrm{atm}}_{j}. \]
\[ \Delta\phi_{j,j'}(\theta) = \psi^{\mathrm{inst}}_{j,j'} + \phi^{\mathrm{atm}}_{j'} - \phi^{\mathrm{atm}}_{j} - 2\pi\,\langle\theta,\,\nu_{j,j'}\rangle. \]
\[ I_{j_1,j_2} = (\tau_{j_1}^2+\tau_{j_2}^2)\,\tilde I_{\mathrm{obj}}(0) + 2\,\tau_{j_1}\tau_{j_2}\,\mathrm{Re}\!\left(\tilde I_{\mathrm{obj}}(\nu_{j_1,j_2})\,\mathrm{e}^{\mathrm{i}(\psi^{\mathrm{inst}}_{j_1,j_2}+\phi^{\mathrm{atm}}_{j_2}-\phi^{\mathrm{atm}}_{j_1})}\right), \]
\[ \tilde I_{\mathrm{obj}}(\nu) = \int I_{\mathrm{obj}}(\theta)\,\mathrm{e}^{-\mathrm{i}\,2\pi\langle\theta,\,\nu\rangle}\,\mathrm{d}^2\theta. \]
\[ C_{j_1,j_2} = \tau_{j_1}\tau_{j_2}\,\tilde I_{\mathrm{obj}}(\nu_{j_1,j_2})\,\mathrm{e}^{\mathrm{i}(\phi^{\mathrm{atm}}_{j_2}-\phi^{\mathrm{atm}}_{j_1})}. \]
\[ P_j = \tau_j^2\,\tilde I_{\mathrm{obj}}(0), \]
\[ V_{j_1,j_2} = \frac{C_{j_1,j_2}}{\sqrt{P_{j_1} P_{j_2}}} = \tilde I(\nu_{j_1,j_2})\,\mathrm{e}^{\mathrm{i}(\phi^{\mathrm{atm}}_{j_2}-\phi^{\mathrm{atm}}_{j_1})}, \]
\[ I(\theta) = \frac{I_{\mathrm{obj}}(\theta)}{\int I_{\mathrm{obj}}(\theta')\,\mathrm{d}^2\theta'}. \]
\[ S_{j_1,j_2} = \langle|V_{j_1,j_2}|^2\rangle = |\tilde I(\nu_{j_1,j_2})|^2. \]
\[ B_{j_1,j_2,j_3} = \langle V_{j_1,j_2}\,V_{j_2,j_3}\,V_{j_3,j_1}\rangle = \tilde I(\nu_{j_1,j_2})\,\tilde I(\nu_{j_2,j_3})\,\tilde I(\nu_{j_3,j_1}), \]
\[ \beta_{j_1,j_2,j_3} = \arg(B_{j_1,j_2,j_3}) = \varphi(\nu_{j_1,j_2}) + \varphi(\nu_{j_2,j_3}) + \varphi(\nu_{j_3,j_1}) \mod 2\pi, \]
\[ D_{j_1,j_2,\ell,m} = \big\langle (V^{\mathrm{ref}}_{j_1,j_2,\ell,m})^*\,V_{j_1,j_2,\ell,m}\big\rangle. \]
\[ V^{\mathrm{ref}}_{j_1,j_2,\ell,m} = \sum_{\ell'} W^{\mathrm{ref}}_{\ell,\ell'}\,V_{j_1,j_2,\ell',m}. \]
\[ \varrho = \int I(\theta)\,\mathrm{d}^2\theta = \tilde I(0), \]
\[ I(\theta) \approx I_{\mathrm{lin}}(\theta) = \sum_{n=1}^N x_n\,h_n(\theta), \]
\[ \varrho = \int I(\theta)\,\mathrm{d}^2\theta \approx \int I_{\mathrm{lin}}(\theta)\,\mathrm{d}^2\theta = \langle c,\,x\rangle, \]
\[ \tilde I_{\mathrm{lin}}(\nu) = \sum_{n=1}^N x_n\,\tilde h_n(\nu), \]
\[ \tilde I(\nu_k) \approx (\mathbf{H}\cdot x)_k = \sum_{n=1}^N H_{k,n}\,x_n, \]
\[ I(\theta) \approx I_{\mathrm{grd}}(\theta) = \sum_{n=1}^N x_n\,h(\theta-\theta_n), \]
\[ x_n \approx I(\theta_n)\quad(\forall n), \]
\[ H_{k,n} = \exp\big(-\mathrm{i}\,2\pi\langle\theta_n,\,\nu_k\rangle\big)\quad(\forall k,\ \forall n). \]
\[ \Delta\theta \lesssim \frac{\lambda}{2\,b^{\mathrm{proj}}_{\max}}. \]
\[ \min_x \|x\|_2 \quad\text{s.t.}\quad \mathbf{H}\cdot x = y, \]
\[ \mathcal{L}(x,u) = \tfrac12\,\|x\|_2^2 - \langle u,\,\mathbf{H}\cdot x\rangle = \tfrac12\,\|x\|_2^2 - \langle \mathbf{H}^{\mathrm{H}}\cdot u,\,x\rangle, \]
\[ (\mathbf{H}\cdot\mathbf{H}^{\mathrm{H}})_{k,k'} = \sum_{n=1}^N \exp\big(-\mathrm{i}\,2\pi\langle\theta_n,\,\nu_k-\nu_{k'}\rangle\big) \approx N\,\delta_{k,k'}, \]
\[ \hat x \approx (1/N)\,\mathbf{H}^{\mathrm{H}}\cdot y, \]
\[ \Omega = \{x\in\mathbb{R}^N \,|\, x\ge 0,\ \langle c,\,x\rangle = \varrho\}, \]
\[ \min_{x\in\Omega} f_{\mathrm{prior}}(x) \quad\text{s.t.}\quad \mathbf{H}\cdot x = y, \]
\[ y = H(x,\omega) + \epsilon, \]
\[ f_{\mathrm{data}}(x) \le \eta. \]
\[ \min_{x\in\Omega} f_{\mathrm{prior}}(x) \quad\text{s.t.}\quad f_{\mathrm{data}}(x)\le\eta, \]
\[ \mathcal{L}(x,\alpha) = f_{\mathrm{prior}}(x) + \alpha\,f_{\mathrm{data}}(x), \]
\[ \hat x = \arg\min_{x\in\Omega}\big\{\mathcal{L}(x,\alpha) = f_{\mathrm{prior}}(x) + \alpha\,f_{\mathrm{data}}(x)\big\}, \]
\[ \hat x = \arg\min_{x\in\Omega}\big\{f(x,\mu) = f_{\mathrm{data}}(x) + \mu\,f_{\mathrm{prior}}(x)\big\}, \]
\[ \hat x_{\mathrm{MAP}} = \arg\max_x \Pr(x\,|\,y,\omega), \]
\[ \Pr(x,y\,|\,\omega) = \Pr(x\,|\,\omega)\,\Pr(y\,|\,x,\omega) = \Pr(y\,|\,\omega)\,\Pr(x\,|\,y,\omega), \]
\[ \hat x_{\mathrm{MAP}} = \arg\max_x \frac{\Pr(y\,|\,x,\omega)\,\Pr(x\,|\,\omega)}{\Pr(y\,|\,\omega)}, \]
\[ \hat x_{\mathrm{MAP}} = \arg\min_{x\in\Omega}\big\{-\log\Pr(y\,|\,x,\omega) - \log\Pr(x\,|\,\omega)\big\}, \]
\[ f_{\mathrm{data}}(x) = -c_0\,\log\Pr(y\,|\,x,\omega) + c_1, \]
\[ \mu\,f_{\mathrm{prior}}(x) = -c_0\,\log\Pr(x\,|\,\omega) + c_2, \]
\[ \hat x_{\mathrm{ML}} = \arg\min_{x\in\Omega} f_{\mathrm{data}}(x) = \arg\max_x \Pr(y\,|\,x,\omega), \]
\[ y = H(x,\omega) + \epsilon, \]
\[ \mathrm{E}\{y\,|\,x,\omega\} = H(x,\omega), \]
\[ \mathrm{E}\{\epsilon\,|\,x,\omega\} = 0. \]
\[ f_{\mathrm{data}}(x) = \big(y - H(x,\omega)\big)^{\mathrm{t}}\cdot\mathbf{W}\cdot\big(y - H(x,\omega)\big), \]
\[ f_{\mathrm{data}}(x) = \sum_k w_k\,\big|(\mathbf{H}\cdot x)_k - V_k\big|^2, \]
\[ f_{\mathrm{data}}(x) = \sum_k\left\{\frac{\mathrm{Re}(\epsilon_k\,\mathrm{e}^{-\mathrm{i}\varphi_k})^2}{\mathrm{Var}(\rho_k)} + \frac{\mathrm{Im}(\epsilon_k\,\mathrm{e}^{-\mathrm{i}\varphi_k})^2}{\rho_k^2\,\mathrm{Var}(\varphi_k)}\right\}, \]
\[ f_{\mathrm{data}}(x) = f_{\mathrm{vis}}(x) + f_{\mathrm{pow}}(x) + f_{\mathrm{bisp}}(x) + f_{\mathrm{clos}}(x), \]
\[ f_{\mathrm{pow}}(x) = \sum_k w^{\mathrm{pow}}_k\,\big(S_k - |\tilde I(\nu_k)|^2\big)^2, \]
\[ f_{\mathrm{bisp}}(x) = \sum_{k_1,k_2} w^{\mathrm{bisp}}_{k_1,k_2}\,\big|B_{k_1,k_2} - \tilde I(\nu_{k_1})\,\tilde I(\nu_{k_2})\,\tilde I^*(\nu_{k_1}+\nu_{k_2})\big|^2, \]
\[ f_{\mathrm{clos}}(x) = 2\sum_{k_1,k_2}\kappa^{\mathrm{clos}}_{k_1,k_2}\big[1 - \cos\big(\beta_{k_1,k_2} - \varphi(\nu_{k_1}) - \varphi(\nu_{k_2}) + \varphi(\nu_{k_1}+\nu_{k_2})\big)\big], \]
\[ x^{\mathrm{def}} \in \arg\min_{x\in\Omega} f_{\mathrm{prior}}(x), \]
\[ f_{\mathrm{prior}}(x) = \|\mathbf{D}\cdot x - a\|_2^2, \]
\[ f_{\mathrm{prior}}(x) = \sum_n q_n\,x_n^2 = \|x\|_{\mathbf{Q}}^2, \]
\[ x^{\mathrm{def}} = \arg\min_{x\in\Omega}\|x\|_{\mathbf{Q}}^2 = \varrho\,\frac{c/q}{\langle c,\,c/q\rangle}, \]
\[ f_{\mathrm{prior}}(x) = \sum_k \tilde q_k\,|\tilde x_k|^2, \]
\[ x^{\mathrm{def}} \in \mathbf{D}^{\dagger}\cdot a + \ker(\mathbf{D}), \]
\[ f_{\mathrm{prior}}(x) = \sum_n \zeta(\mathbf{D}_n\cdot x), \]
\[ \zeta_{\mathrm{TV}}(v) = \|v\|_2, \]
\[ \begin{cases}\zeta(v) = O(\|v\|_2^2) & \text{for } \|v\|_2 \ll \tau,\\ \zeta(v) = O(\|v\|_2) & \text{for } \|v\|_2 \gg \tau,\end{cases} \]
\[ \zeta_{\mathrm{hyperbolic}}(v) = \tau\,\sqrt{\|v\|_2^2 + \tau^2} - \tau^2, \]
\[ \zeta_{\mathrm{Huber}}(v) = \begin{cases}(1/2)\,\|v\|_2^2 & \text{if } \|v\|_2\le\tau,\\ \tau\,\|v\|_2 - \tau^2/2 & \text{otherwise},\end{cases} \]
\[ \zeta_{\mathrm{fair}}(v) = \tau\,\|v\|_2 - \tau^2\,\log\big(1 + \|v\|_2/\tau\big). \]
\[ f_{\mathrm{prior}}(x) = \sum_n q_n\,|x_n|^p, \]
\[ x_n^{\mathrm{def}} = \varrho\,\frac{\mathrm{sign}(c_n)\,|c_n/q_n|^{\frac{1}{p-1}}}{\sum_{n'}\big|c_{n'}^p/q_{n'}\big|^{\frac{1}{p-1}}}\quad(\forall n). \]
\[ f_{\mathrm{prior}}(x) = \sum_n x_n\,\log(x_n/r_n), \]
\[ f_{\mathrm{prior}}(x) = \sum_n\big[r_n - x_n + x_n\,\log(x_n/r_n)\big]. \]
\[ f_{\mathrm{prior}}(x) = -\sum_n r_n\,\log(x_n), \]
\[ f_{\mathrm{prior}}(x) = -\sum_n \sqrt{r_n\,x_n}, \]
\[ \min_{x\in\Omega}\|x\|_0 \quad\text{s.t.}\quad f_{\mathrm{data}}(x)\le\eta, \]
\[ \binom{N}{P} = \frac{N!}{P!\,(N-P)!} \]
\[ \min_z\|z\|_1 \quad\text{s.t.}\quad f_{\mathrm{data}}(\mathbf{B}\cdot z)\le\eta \ \text{and}\ \mathbf{B}\cdot z\in\Omega, \]
\[ \min_{x\in\Omega}\big\{f(x) = f_{\mathrm{data}}(x) + \mu\,f_{\mathrm{prior}}(x)\big\}. \]
\[ I^{[t]}(\theta) = \big(1-\alpha^{[t]}\big)\,I^{[t-1]}(\theta) + \alpha^{[t]}\,h\big(\theta-\theta^{[t]}\big). \]
\[ \Pr(x\,|\,y,\omega) \propto \exp\Big(-\tfrac12\,f_{\mathrm{data}}(x) - \tfrac{\mu}{2}\,f_{\mathrm{prior}}(x)\Big). \]
\[ V(\nu,\lambda) = \frac{\sum_c F_c(\lambda)\,V_c(\nu)}{\sum_c F_c(\lambda)}, \]
\[ V(\nu,\lambda) = \frac{1 + (\lambda/\lambda_0)^p\,\xi\,V_{\mathrm{env}}(\nu)}{1 + (\lambda/\lambda_0)^p\,\xi}, \]
\[ \mu\,f_{\mathrm{prior}}(x) = \mu_{\mathrm{angl}}\,f_{\mathrm{angl}}(\mathbf{D}_{\mathrm{angl}}\cdot x) + \mu_{\mathrm{sptrl}}\,f_{\mathrm{sptrl}}(\mathbf{D}_{\mathrm{sptrl}}\cdot x). \]
\[ f_{\mathrm{prior}}(x) = \sum_{n,\ell}\zeta(\mathbf{D}_{n,\ell}\cdot x), \]
\[ \mathbf{D}_{n,\ell} = \begin{pmatrix}\mathbf{D}^{\mathrm{angl}}_{n,\ell}\\ \alpha\,\mathbf{D}^{\mathrm{sptrl}}_{n,\ell}\end{pmatrix}, \]
\[ f(\phi^{\mathrm{mod}}) = -c_2\,\log\big[\Pr(\phi\,|\,\phi^{\mathrm{mod}})\big], \]
\[ f_{\mathrm{Haniff}}(\phi^{\mathrm{mod}}) = \mathrm{arc}(\phi - \phi^{\mathrm{mod}})^2/\sigma_\phi^2, \]
\[ f_{\mathrm{phasor}}(\phi^{\mathrm{mod}}) = \kappa\,\big|\mathrm{e}^{\mathrm{i}\phi^{\mathrm{mod}}} - \mathrm{e}^{\mathrm{i}\phi}\big|^2 = 2\,\kappa\,\big[1 - \cos(\phi - \phi^{\mathrm{mod}})\big], \]
\[ \Pr(\phi\,|\,\phi^{\mathrm{mod}}) \propto \mathrm{e}^{\kappa\,\cos(\phi-\phi^{\mathrm{mod}})}. \]
\[ \Pr(\phi\,|\,\phi^{\mathrm{mod}}) = \frac{\mathrm{e}^{\kappa\,\cos(\phi-\phi^{\mathrm{mod}})}}{2\pi\,I_0(\kappa)}, \]
\[ \sigma_\phi^2 = \mathrm{E}\big(|\mathrm{e}^{\mathrm{i}\phi^{\mathrm{mod}}} - \mathrm{e}^{\mathrm{i}\phi}|^2\big) = 2\int_{-\pi}^{+\pi}\big[1-\cos(\phi-\phi^{\mathrm{mod}})\big]\,\Pr(\phi\,|\,\phi^{\mathrm{mod}})\,\mathrm{d}\phi = 2 - 2\,I_1(\kappa)/I_0(\kappa). \]
\[ \kappa \approx 1/\sigma_\phi^2. \]
\[ f(\phi^{\mathrm{mod}}) \approx (\phi-\phi^{\mathrm{mod}})^2/\sigma_\phi^2. \]
\[ f_{\mathrm{prior}}(x) = \sum_n f_n(x_n), \]
\[ x^{\mathrm{def}} = \arg\min_{x\in\Omega} f_{\mathrm{prior}}(x) = \arg\min_{x\in\Omega}\sum_n f_n(x_n); \]
\[ \mathcal{L}(x;\alpha) = f_{\mathrm{prior}}(x) - \alpha\,\langle c,\,x\rangle, \]
\[ \begin{cases}\forall n,\ \text{either } \partial_{x_n}\mathcal{L}(x;\alpha) = 0 \text{ and } x_n \ge 0,\\ \phantom{\forall n,\ }\text{or } \partial_{x_n}\mathcal{L}(x;\alpha) > 0 \text{ and } x_n = 0;\\ \langle c,\,x\rangle = \varrho. \end{cases} \]
\[ x_n^+(\alpha) = \max\big\{0,\ (f_n')^{-1}(\alpha\,c_n)\big\}\quad(\forall n). \]
\[ x_n^+(\alpha') = \chi(\alpha'/\alpha)\,x_n^+(\alpha) \]
\[ x^{\mathrm{def}} = \frac{\varrho}{\langle c,\,x^+(\alpha)\rangle}\,x^+(\alpha), \]
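The "dirty image" of Fig. 2 is exactly the pseudo-inverse solution \(\hat x \approx (1/N)\,\mathbf{H}^{\mathrm{H}}\cdot y\) above, with \(H_{k,n} = \exp(-\mathrm{i}\,2\pi\langle\theta_n,\nu_k\rangle)\). The following sketch (not code from the paper; the grid size, number of samples, seed, and coverage are arbitrary choices) illustrates it numerically for a point source observed with a sparse, irregular frequency coverage:

```python
import numpy as np

# Sketch of the discretized Fourier model I~(nu_k) ~ (H.x)_k with
# H[k, n] = exp(-2i*pi*<theta_n, nu_k>), and of the "dirty image"
# x_hat ~ (1/N) H^H.y obtained from a sparse frequency coverage.

rng = np.random.default_rng(0)

# Image grid: N pixels at angular positions theta_n (arbitrary angular units).
side = 16
axis = np.arange(side) - side // 2
tx, ty = np.meshgrid(axis, axis, indexing="ij")
theta = np.column_stack([tx.ravel(), ty.ravel()])   # shape (N, 2)
N = theta.shape[0]

# Sparse, irregular frequency coverage nu_k with K << N samples.
K = 60
nu = rng.uniform(-0.5, 0.5, size=(K, 2))

# Model matrix H[k, n] = exp(-2i*pi*<theta_n, nu_k>).
H = np.exp(-2j * np.pi * (nu @ theta.T))            # shape (K, N)

# True object: a single point source, giving unit-modulus visibilities.
x_true = np.zeros(N)
src = N // 2 + 3                                    # an off-center pixel
x_true[src] = 1.0
y = H @ x_true                                      # noiseless "data"

# Dirty image: x_hat ~ (1/N) H^H.y (only the real part is meaningful here).
x_dirty = (H.conj().T @ y).real / N

# The dirty image peaks at the true source position, surrounded by the
# sidelobes of the dirty beam caused by the sparse coverage.
print("peak at true source:", int(np.argmax(x_dirty)) == src)
```

With noiseless data the peak lands on the true pixel with value \(K/N\); the surrounding sidelobes are what the regularized reconstruction methods of the paper are designed to suppress.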