Generalized optimization framework for pixel super-resolution imaging in digital holography

Yunhui Gao; Liangcai Cao

doi:10.1364/OE.434449

1. Introduction

Since its invention by Gabor in the 1940s [1], holography remains an active research area in many fields and is getting actively involved in emerging topics like metasurfaces [2] and optical vortices [3]. Among all the holography-related technologies, holographic imaging has drawn great attention from the optics and signal processing communities in the past decades. Based on the optical configuration, holographic imaging techniques can be broadly categorized into two types, namely the off-axis holography and the in-line holography. Compared to the off-axis counterpart, the in-line setup is preferred for its simple and compact configuration, stability against the environment, low cost, and a potentially higher information throughput. To this day, it has found a wide range of applications in X-ray crystallography [4], biomedical imaging [5], nanoparticle detection [6], and fluid mechanics [7]. While promising, in-line digital holography techniques are faced with their own challenges.

First, there is the phase problem. Unlike off-axis holography which records the complex wavefield by interference with a known reference beam, in-line holograms are formulated by direct exposure to the diffracted field. At optical wavelength, the electromagnetic field oscillates at such high frequencies that imaging sensors can only respond to the intensity. Thus, the phase information is lost during the exposure, which is illustrated in Fig. 1. Without the phase, numerical reconstruction suffers from the so-called twin image artefact, which severely degrades the quality of the holographic image. Techniques to recover the missing phase from intensity-only measurements are referred to as phase retrieval [8]. Considering the inherent ill-posedness of the phase retrieval problem, different measurement schemes have been explored that introduce diversity into the measurement data, such as using multiple wavelengths [9–11], illumination angles [12,13], or phase masks [14,15], or capturing the diffracted field at various axial distances [16] or with a shifting probe [17]. By varying the parameters of the physical model, redundant information is recorded during the measurement, which provides recovery guarantees under proper conditions [18].

Fig. 1. Diagram of the measurement and reconstruction process for in-line digital holography.

Download Full Size | PDF

Another issue that arises is the aliasing effect due to the insufficient sampling of the pixelated sensor. This typically occurs when imaging systems have a high numerical aperture and a low magnification, such as the lensless on-chip microscopes [19]. For such systems, the resolution is limited by the sampling frequency of the sensor pixels rather than the diffraction limit given by Abbe’s criterion. As shown in Fig. 1, the wavefield incident onto the same pixel is converted into a single signal and therefore cannot be resolved. To overcome the sampling limitation, pixel super-resolution (PSR) techniques have been proposed, which aim to recover high-frequency details of the wavefield at subpixel scale. Early works tackled this problem by introducing subpixel displacements during the measurement process [20,21]. Later it has been recognized that the phase problem and the aliasing problem are in fact closely related and can be solved simultaneously using the diversity measurement schemes, which is very much similar to the classical (non-PSR) phase retrieval problem [22–25].

On the algorithmic side, reconstruction algorithms for classical phase retrieval have been extensively studied. Among them, two classes of algorithms, namely the iterative projection algorithms and the gradient descent algorithms, turned out to be very popular. The iterative projection algorithms solve the phase retrieval problem by introducing projections and related operators [26]. Examples include the error-reduction (ER) algorithm [27], difference map (DM) [28], and the relaxed averaged alternating reflections (RAAR) [29] algorithm. The gradient descent algorithms aim to minimize an objective function that favors certain solutions. Since the seminal works by Candès et al. [30], Wirtinger gradient descent algorithms have been widely adopted for holographic reconstruction [31–33]. In sharp contrast to the classical phase retrieval algorithms, PSR phase retrieval algorithms remain largely unexplored. Many previous studies were focused on designing better physical models. But the corresponding reconstruction algorithms often lie on a heuristic basis. As a result, their numerical performances, such as their convergence behavior and optimality conditions, are thus difficult to analyze.

In this work, we aim to present a generalized algorithmic framework for solving pixel-super-resolved phase retrieval problems. In particular, we derive the projection operator and the Wirtinger gradient, which provides the basic building blocks of the iterative projection algorithms and the gradient descent algorithms, respectively. As an example, the Wirtinger gradient descent algorithm for PSR phase retrieval, termed Wirtinger-PSR, is studied in detail. The algorithm is generally applicable, regardless of the physical settings.

This paper is organized as follows. In Section 2, we provide a detailed description of the sampling process and a general forward model for PSR holographic imaging. In Section 3, we derive the projection operator for iterative projection algorithms and the Wirtinger derivatives for the gradient descent algorithms in the context of PSR phase retrieval. Section 4 presents the numerical and experimental results using our proposed Wirtinger-PSR algorithm, which is followed by a brief summary in Section 5.

2. Problem formulation

2.1 Notations

In the following sections, scalars and scalar functions are denoted by italic letters. Vectors and matrices are denoted by boldface lowercase letters and uppercase letters, respectively. Bold numbers 0 and 1 denote the zero vector and all-ones vector respectively, while identity matrix is represented by I. The shape of 0, 1, and I should be determined according to the context. Cursive letters represent sets. The symbol $|\cdot| {\; _p}$ stands for p-norm. ${({\; \cdot \; } )^\textrm{T}}$, $\overline {({\; \cdot \; } )} $, and ${({\; \cdot \; } )^\ast }$ are the transpose, conjugate, and conjugate transpose operators, respectively. The hat symbol $\widehat {\; \cdot \; }$ denotes the Fourier transform. When applied to vectors, $|{\; \cdot \; } |$, ${|{\; \cdot \; } |^2}$, $\sqrt {\; \cdot \; } $, and ${\cdot} \; /\; \cdot $ should be interpreted as element-wise operators, and ${\odot} $ denotes the element-wise (Hadamard) multiplication operator for vectors. $\textrm{diag}({\; \cdot \; } )$ operator puts the entries of a vector into the diagonal of a matrix.

2.2 Modeling and analysis of the sampling process

As stated above, for lensless imaging systems, pixel size is a key limiting factor for spatial resolution. Within the time of exposure, each pixel converts the incoming photons into electron charges. Thus, the output value for each pixel is determined by the incident light energy. We assume that the relationship is linear. Otherwise, an additional calibration procedure is required before reconstruction. Consider the geometry shown in Fig. 2, where we have an ${M_1} \times {M_2}$ pixel array sensor, whose pixel size is denoted by Δ, and the output image is a matrix denoted as ${\mathbf Y} = ({{y_{ij}}} )\in {{{\mathbb R}}^{{M_1} \times {M_2}}}$. For an incident wavefield with intensity denoted by $I({\mathbf r},t)$, the output value of the pixel indexed by $(i,j)$ in the image Y is given by

(1)$${y_{ij}} \propto \int_{{\cal T}} {\int_{{{{\cal Q}}_{ij}}} {I({{\mathbf r},t} )\textrm{d}{\mathbf r}} \textrm{ d}t} \,,$$

where ${\mathbf r} = ({{r_1},{r_2}} )$ denotes the two-dimensional spatial coordinate, and t denotes time. The integral is taken over the exposure time duration $t \in {{\cal T}}$, and the active area of the pixel $(i,j)$ defined as ${{{\cal Q}}_{ij}} = \{{({{r_1},{r_2}} ){\mid }(i - 1/2)\mathrm{\Delta } \le {r_1} \le (i + 1/2)\mathrm{\Delta },(j - 1/2)\mathrm{\Delta } \le {r_2} \le (j + 1/2)\mathrm{\Delta }} \}$. If the field distribution is further assumed to remain static during the measurement, we can drop the time integral and simplify Eq. (1) as

(2)$${y_{ij}} = \int_{{{{\cal Q}}_{ij}}} {I({\mathbf r} )\textrm{d}{\mathbf r}} \,.$$

Fig. 2. Geometry of the sampling model. An example with M₁= M₂= 4 and σ = 3.

Download Full Size | PDF

Here, with a slight abuse of notation, we use ${y_{ij}}$ to denote the normalized output values, and thus the symbol ${\propto}$ is replaced by an equal sign. The normalization step is unnecessary in practice since it only introduces a constant factor to the entire field.

To quantitatively study the resolution limit, we consider the effect of different frequency components of the incident complex wavefield upon the output value of the sensor pixels. The complex amplitude of the incident wavefield $U({\mathbf r})$ is related to the intensity by

(3)$$I({\mathbf r} )= {|{U({\mathbf r} )} |^2} = U({\mathbf r} )\overline {U({\mathbf r} )} \;.$$

We follow the formulation in [19], where the sampling of the pixels is expressed as

(4)$$Y({\mathbf r} )= Y({{r_1},{r_2}} )= ({I \ast W} )({{r_1},{r_2}} )\sum\limits_{i,j} {\delta ({{r_1} - i\Delta ,{r_2} - j\Delta } )} \;,$$

where $Y({\mathbf r})$ denotes the sampled intensity, * represents the convolution operator, δ denotes the Dirac delta function, and $W({\mathbf r})$ is the rectangular window function which is expressed as $W({{r_1},{r_2}} )= \textrm{rect}({{r_1}/\mathrm{\Delta }} )\textrm{rect}({{r_2}/\mathrm{\Delta }} )$. Therefore, the whole process can be interpreted as first computing the squared modulus of the complex wavefield, then convolving the intensity with a rectangular window function, and finally sampling the result with a period of Δ along each dimension. One could easily verify that this formulation is equivalent to Eq. (2) if we take ${y_{ij}} = Y(i\mathrm{\Delta },j\mathrm{\Delta })$.

Viewing the above process in the Fourier domain offers interesting insights into the problem. Squared modulus in the spatial domain is equivalent to autocorrelation in the Fourier domain:

(5)$$\hat{I}({\mathbf \nu } )= \hat{U}({\mathbf \nu } )\ast \overline {\hat{U}({ - {\mathbf \nu }} )} \;,$$

where ${\mathbf v} = ({{v_1},{v_2}} )$ is the spatial frequency coordinate. The convolution between $I({\mathbf r})$ and $W({\mathbf r})$ corresponds to the multiplication between $\hat{I}({\mathbf v})$ and $\hat{W}({\mathbf v}) = sinc({\mathrm{\Delta }{\nu_1}} )sinc({\mathrm{\Delta }{\nu_2}} )$. Finally, sampling in the spatial domain results in convolving $\hat{I}({\mathbf v})\hat{W}({\mathbf v})$ with a comb function. Therefore, the entire process can be expressed as

(6)$$\hat{Y}({\mathbf \nu } )= \hat{Y}({{\nu_1},{\nu_2}} )= \hat{I}({{\nu_1},{\nu_2}} )\hat{W}({{\nu_1},{\nu_2}} )\ast \sum\limits_{i,j} {\delta \left( {{\nu_1} - \frac{i}{\Delta },{\nu_2} - \frac{j}{\Delta }} \right)} \;.$$

The above formulation is illustrated in Fig. 3 for the one-dimensional case. The relation between $U({\mathbf r})$ and $Y({\mathbf r})$ is nonlinear and is thus difficult to derive. From a qualitative study in the frequency domain, we can see that due to the aliasing effect, any frequency component of $\hat{I}({\mathbf v})$, except for those corresponding to the zeros of $\hat{W}({\mathbf v})$, contributes to the output image Y. Furthermore, thanks to the autocorrelation operation, any frequency component of $\hat{U}({\mathbf r})$ would tend to spread out in the intensity spectrum $\hat{I}({\mathbf v})$, and it is thus highly unlikely for certain frequency components of $U({\mathbf r})$ being totally blocked by the intensity transfer function $\hat{W}({\mathbf v})$. As a result, we can safely conclude that all frequency components of the incident complex field $U({\mathbf r})$ contributes to the output image Y. This finding further indicates that, the loss of information due to the insufficient sampling of the sensor can also be recovered via the diversity measurement strategy, which has been previously exploited for recovering the phase.

Fig. 3. One-dimensional illustration of the sampling procedure. r and ν are the spatial and Fourier domain coordinates, respectively.

Download Full Size | PDF

2.3 Forward model: discretization and vectorization

As mentioned in the previous section, discretization of the continuous intensity distribution is achieved by taking ${y_{ij}} = Y(i\mathrm{\Delta },j\mathrm{\Delta })$, forming the image Y. The properly sampled complex transmittance of the object is represented by a matrix ${\mathbf X} \in {{{\mathbb C}}^{{N_1} \times {N_2}}}$ with ${N_1} = \sigma {M_1},{N_2} = \sigma {M_2}$, where σ ≥ 1 is an integer referred to as the up-sampling ratio. Similarly, the discretized complex amplitude of the wavefield incident on the sensor is denoted by ${\mathbf U} = ({{u_{ij}}} )\in {{{\mathbb C}}^{{N_1} \times {N_2}}}$. The elements of X and U are termed subpixels, in contrast to the pixels in Y that contain them. U and Y are related by

(7)$${y_{ij}} = \sum\limits_{({p,q} )\in {{{\cal I}}_{ij}}} {{{|{{u_{pq}}} |}^2}} \;,$$

where ${{{\cal I}}_{ij}}$ denotes the set of indices $(p,q)$ of the subpixels in U that corresponds to the pixel indexed by $(i,j)$ in Y, which, under our coordinate system described above, can be further expressed as ${{{\cal I}}_{ij}} = \{ (p,q){\mid }\sigma (i - 1) + 1 \le p \le \sigma i,\sigma (j - 1) + 1 \le q \le \sigma j,p,q \in {{\mathbb Z}}\}$. In fact, Eq. (7) follows directly from the notion of energy conservation, that the incident energy within the given area ${{{\cal Q}}_{ij}}$ is the same when expressed in different forms.

For the convenience of illustration, we follow the convention in linear algebra and vectorize the two-dimensional wavefield representation as a one-dimensional vector. Therefore, the vectorized complex amplitude of the object and that of the incident wavefield are denoted by vectors ${\mathbf x} \in {{{\mathbb C}}^N}$ and ${\bf u} \in {{{\mathbb C}}^N}$, respectively, where $N = {N_1} \times {N_2}$. Similarly, the measured intensity is denoted by ${\mathbf y} \in {{{\mathbb R}}^M}$, where $M = {M_1} \times {M_2}$. The imaging system is linear with respect to complex amplitude, and thus its input x and output u are related by ${\mathbf u} = {\mathbf{Ax}}$. The matrix ${\mathbf A} \in {{{\mathbb C}}^{N \times N}}$ may represent certain physical process, such as free-space propagation, wavefront modulation, etc. The intensity y is related to u by a down-sampling matrix, denoted as ${\mathbf D} = ({{d_{ij}}} )\in {{{\mathbb R}}^{M \times N}}$, whose entries take values 0 or 1. Specifically, d_ij = 1 if and only if subpixel u_j corresponds to pixel y_i, which can also be expressed by the above set notation as $j \in {{{\cal I}}_i}$. With this, the vectorized version of Eq. (7) is given by ${\mathbf y} = {\mathbf D}|{\mathbf u}{|^2}$. Thus, as depicted in Fig. 4, the discretized and vectorized forward model can be compactly expressed as

(8)$${\mathbf D}{|{{\mathbf{Ax}}} |^2} = {\mathbf y}{\kern 1pt} .$$

Since the reconstruction from a single intensity image as given in Eq. (8) is ill-posed, diversity measurement schemes are generally adopted:

(9)$${\mathbf D}{|{{{\mathbf A}_k}{\mathbf x}} |^2} = {{\mathbf y}_k},\textrm{ }k = 1,2, \ldots ,S\,,$$

where S is the number of measurements. It is self-evident from Eq. (9) that, when taking into account the under-sampling process, the forward model is different from the classical phase retrieval with an additional down-sampling process, which is linear with respect to the intensity. It should be noted that, when we assume that the sampling of the imaging sensor is fine enough, we may have M = N and D being the identity matrix. Therefore, the PSR phase retrieval problem given by Eq. (9) can be viewed as a further generalization to the classical phase retrieval problem.

Fig. 4. Illustration of the forward model.

Download Full Size | PDF

3. Derivation of algorithms

The PSR phase retrieval problem aims to recover the missing phase and high-frequency details from low-resolution intensity measurements:

(10)$$\begin{array}{lllll} \textrm{find}\quad & {\mathbf x}\\ \textrm{s}\textrm{.t}\textrm{.}\quad & {\mathbf D}{|{{{\mathbf A}_k}{\mathbf x}} |^2} = {{\mathbf y}_k},\textrm{ }k = 1,2, \ldots ,S\,. \end{array}$$

In the following, we study two popular types of algorithms, namely the iterative projection algorithms and the gradient descent algorithms, in the context of PSR phase retrieval.

3.1 Iterative projection algorithms for PSR phase retrieval

3.1.1 PSR phase retrieval as a feasibility problem

For reconstruction purposes, the PSR phase retrieval problem described in Eq. (10) can be reformulated as a feasibility problem:

(11)$$\textrm{find }{\mathbf x} \in {{{\cal C}}_1} \cap {{{\cal C}}_2} \cap \cdots \cap {{{\cal C}}_S}.$$

That is, the desired solution x is assumed to lie in the intersection of some known constraint sets ${{{\cal C}}_1},{{{\cal C}}_2}, \ldots ,{{{\cal C}}_S}$. For the PSR phase retrieval problem in particular, ${{{\cal C}}_k}$ denotes the modulus constraint set with respect to the k^th measurement:

(12)$${{{\cal C}}_k} = \{{{\mathbf x} \in {{{\mathbb C}}^N}{{|{\,{\mathbf D}|{{{\mathbf A}_k}{\mathbf x}} |} }^2} = {{\mathbf y}_k}} \}\,.$$

Various projection algorithms have been proposed to solve problems in the form of Eq. (11). These algorithms use projections (or relaxed projections) to iteratively update the estimated complex field. Loosely speaking, the projection of a vector x onto a set ${{\cal C}}$, denoted as ${{{\cal P}}_{{\cal C}}}({\mathbf x})$, yields a vector (or a set of vectors) in ${{\cal C}}$ that is nearest (e.g. in the sense of Euclidean norm) to x, that is

(13)$${{{\cal P}}_{{\cal C}}}({\mathbf x}) = \mathop {\textrm{argmin}}\limits_{{\mathbf z} \in {{\cal C}}} {||{{\mathbf x} - {\mathbf z}} ||_2}.$$

The updating rule for the ER, DM, RAAR as well as other iterative projection algorithms can all be expressed by projection-related operators.

3.1.2 Projection operator for the undersampled modulus constraint set

In the classical phase retrieval problem, it is assumed that the incident wavefield is properly sampled. Thus, the modulus constraint sets can be expressed by Eq. (12) with D = I. The projection algorithms have been extensively studied in this case. Here we extend previous results to the under-sampling case where ${\mathbf D} \ne {\mathbf I}$.

Using the definitions of projection and the modulus constraint sets, we derived the projection operator with respect to ${{{\cal C}}_k}$ in Appendix A. If the transmittance matrix ${{\mathbf A}_k}$ is unitary, that is if we have ${\mathbf A}_k^ \ast {{\mathbf A}_k} = {\mathbf I}$, the projection of x onto ${{{\cal C}}_k}$ can be explicitly expressed as

(14)$${{{\cal P}}_{{{{\cal C}}_k}}}({\mathbf x} ) ={\mathbf A}_k^ \ast \textrm{diag}({{{\mathbf A}_k}{\mathbf x}} ){{\mathbf D}^\textrm{T}}\textrm{diag}\left( {\frac{{\mathbf 1}}{{\sqrt {{\mathbf D}{{|{{{\mathbf A}_k}{\mathbf x}} |}^2}} }}} \right)\sqrt {{{\mathbf y}_k}} \,.$$

It should be noted that, while the expression in Eq. (14) seems abstract, it can be easily translated into physical or programming languages. For example, if ${{\mathbf A}_k}$ represents a free-space propagation by a certain distance, multiplication with its conjugate transpose ${\mathbf A}_k^ \ast $ simply requires numerically propagating the wavefield by the same distance in the opposite direction. In addition, multiplication with a diagonal matrix diag(v) is equivalent to implementing an element-wise product with vector v.

While the projection operator given by Eq. (14) is generally applicable to various projection algorithms, we take the simple yet popular ER algorithm as an example for illustration. The updating rule for the ER algorithm is given by

(15)$${{\mathbf x}_{t + 1}} =\frac{1}{S}\sum\limits_{k = 1}^S {{{{\cal P}}_{{{{\cal C}}_k}}}({{{\mathbf x}_t}} )} = \frac{1}{S}\sum\limits_{k = 1}^S {{\mathbf A}_k^ \ast \textrm{diag}({{{\mathbf A}_k}{{\mathbf x}_t}} ){{\mathbf D}^\textrm{T}}\textrm{diag}\left( {\frac{{\mathbf 1}}{{\sqrt {{\mathbf D}{{|{{{\mathbf A}_k}{{\mathbf x}_t}} |}^2}} }}} \right)\sqrt {{{\mathbf y}_k}} } \,,$$

where t denotes the iteration number. As another example, Chang et el. has recently proposed using the method of alternating or cyclic projections to solve the PSR phase retrieval problem [34].

3.2 Gradient descent algorithms for PSR phase retrieval

3.2.1 PSR phase retrieval as a minimization problem

The inverse problem Eq. (10) can also be converted into a minimization problem with respect to a certain objective function. A natural choice would be the least-square formulation:

(16)$$\mathop {\textrm{min}}\limits_{\mathbf x} \left\{ {{f_I}({\mathbf x} ): = \frac{1}{{2S}}\sum\limits_{k = 1}^S {\left\Vert{{\mathbf D}{{|{{{\mathbf A}_k}{\mathbf x}} |}^2} - {{\mathbf y}_k}} \right\Vert_2^2} } \right\},$$

where ${f_I}({\mathbf x})$ calculates the squared residuals with respect to the intensity, and is thus referred to as the intensity-based objective function. For the classical phase retrieval problem, however, it has been discovered that using amplitude residuals yields better algorithmic performance than using the intensity ones [31]. Following this notion, we consider an alternative formulation:

(17)$$\mathop {\textrm{min}}\limits_{\mathbf x} \left\{ {{f_A}({\mathbf x} ): = \frac{1}{{2S}}\sum\limits_{k = 1}^S {\left\Vert{\sqrt {{\mathbf D}{{|{{{\mathbf A}_k}{\mathbf x}} |}^2}} - \sqrt {{{\mathbf y}_k}} } \right\Vert_2^2} } \right\},$$

where ${f_A}({\mathbf x})$ is called the amplitude-based objective function. Although mathematically speaking, these two objective functions have exactly the same minimizers, they may yield very different numerical performances, as is shown in Section 4.

3.2.2 Wirtinger gradients

The objective functions in problems Eqs. (16) and (17) are real-valued functions over complex-valued variables. This type of problems can be solved either by introducing the CR calculus or by replacing the complex-valued variables by real-valued counterparts. Here, we adopt the first approach and derive the Wirtinger gradients for the objective functions. Readers may refer to [35] for a brief introduction to the CR calculus. The Wirtinger gradient of the intensity-based objective function can be expressed as

(18)$$\nabla {f_I}({\mathbf x} )= \frac{1}{S}\sum\limits_{k = 1}^S {{\mathbf A}_k^ \ast \textrm{diag}({{{\mathbf A}_k}{\mathbf x}} ){{\mathbf D}^\textrm{T}}({{\mathbf D}{{|{{{\mathbf A}_k}{\mathbf x}} |}^2} - {{\mathbf y}_k}} )} {\kern 1pt} .$$

See Appendix B.1 for derivation. Similarly, the gradient of the amplitude-based objective function is derived in Appendix B.2, which is

(19)$$\nabla {f_A}({\mathbf x} )= \frac{1}{{2S}}\sum\limits_{k = 1}^S {{\mathbf A}_k^ \ast \textrm{diag}({{{\mathbf A}_k}{\mathbf x}} ){{\mathbf D}^\textrm{T}}\textrm{diag}\left( {\frac{{\mathbf 1}}{{\sqrt {{\mathbf D}{{|{{{\mathbf A}_k}{\mathbf x}} |}^2}} }}} \right)\left( {\sqrt {{\mathbf D}{{|{{{\mathbf A}_k}{\mathbf x}} |}^2}} - \sqrt {{{\mathbf y}_k}} } \right)} \,.$$

The Wirtinger gradients above provide the basic building blocks of many gradient-based optimization algorithms. In this paper, we consider the simplest gradient descent update as an example:

(20)$${{\mathbf x}_{t + 1}} = {{\mathbf x}_t} - {\mu _t}\nabla f({{{\mathbf x}_t}} ),$$

where ${\mu _t}$ denotes the stepsize for the t^th iteration, and f denotes either the amplitude-based or the intensity-based objective function. It has long been known in the classical phase retrieval problem that, the projection algorithms and the gradient descent algorithms are closely related. In Appendix B.3, we prove that for the PSR phase retrieval problem, when A_k is unitary, the ER algorithm of Eq. (15) is equivalent to a gradient descent update for the amplitude-based function with a constant step size ${\mu _t} = 2$.

3.3 Generalized optimization framework

From the above discussions, we see that the ER update is similar to the gradient descent update for the amplitude-based objective function with a step size of two. Although the ER algorithm can be extended using relaxed projections, the projection operator of Eq. (14) applies only if we assume unitarity on ${{\mathbf A}_k}$. In contrast, there are no such restrictions when deriving the Wirtinger gradients. The gradient descent formulation can be viewed as a more generalized version than ER. Therefore, we propose using the Wirtinger gradient descent algorithm as the general approach for solving the PSR phase retrieval problem, which is referred to as the Wirtinger-PSR algorithm.

Introducing classical optimization algorithms like the Wirtinger gradient descent algorithm into the context of PSR phase retrieval also brings many theoretical and empirical results from the optimization community. As an important example, we briefly introduce two commonly used updating strategies. The first one is the global gradient descent update given by Eq. (20), which calculates the gradient using the entire objective function. While this provides the steepest descending direction, finding an accurate estimation of the gradient may be a time-consuming task. This is particularly challenging for large scale optimization problems like PSR phase retrieval, where we usually have a large number of measurements and high-resolution reconstructions. To improve computation efficiency, a second updating strategy, termed the incremental update, is generally adopted. This comes naturally if one notices that the objective function in either Eq. (16) or Eq. (17) can be expressed by separated terms as

(21)$$f({\mathbf x} )= \frac{1}{S}\sum\limits_{k = 1}^S {{f_k}({\mathbf x} )} ,$$

where ${f_k}({\mathbf x})$ denotes part of the objective function that corresponds to the k^th measurement. The incremental gradient descent algorithm updates the variable x using only partial observations (e.g. a single intensity measurement), as is illustrated in Fig. 5. Within each iteration, the variable is updated several times, in contrast to the global gradient descent algorithm that updates the variable only once. Therefore, in some cases, incremental algorithms can significantly increase the convergence speed. Nevertheless, the global updating strategy may yield much more stable convergence when the measurement data are inconsistent.

Fig. 5. Flowchart of the (a) global gradient descent algorithm and (b) incremental gradient descent algorithm.

Download Full Size | PDF

Our MATLAB implementation of the Wirtinger-PSR algorithm can be found in [36], including the two choices for the objective function, and both the global and incremental updating strategies.

4. Performance analysis of PSR phase retrieval algorithms

4.1 Numerical results

As shown in Fig. 6, two virtual objects with different complex transmittance were designed for numerical studies. For each object, we introduced diversity in the measurements by recording the intensity distribution at S = 64 axial distances ranging from 1 mm to 5 mm. To simulate the insufficient sampling by the sensor pixels, we down-sampled the intensity by a factor of four, and correspondingly adopted a fourfold up-sampling σ = 4 during the reconstruction. Other parameters are listed in Fig. 6. To quantify the reconstruction quality, we use the relative error (RE) as the error metric, which is defined as

(22)$$\textrm{RE} = \frac{{{{||{{{\mathbf x}_{est}} - {{\mathbf x}_{true}}} ||}_2}}}{{{{||{{{\mathbf x}_{true}}} ||}_2}}},$$

where ${{\mathbf x}_{{est }}}$ and ${{\mathbf x}_{{true}}}$ denotes the estimated and ground truth value of x. It should be noted that, since we aim to recover x up to a global phase shift, RE should be calculated after normalizing the phase.

Fig. 6. Simulated virtual objects with complex transmittance, and simulation parameters.

Download Full Size | PDF

We first compare the performance between the global and incremental gradient descent algorithms using the amplitude-based objective function. When we have ideal measurements without any noise, the incremental gradient descent algorithm converges much faster than the global one, as indicated by Fig. 7(a). But when the measurements are corrupted by noise and thus inconsistent, it is a whole different story. We added 5% white Gaussian noise to the data and found that the incremental algorithm gets stuck somewhere with a large error, whereas the global algorithm converges to a much better solution, as shown in Fig. 7(b). These findings suggest that in practice one should combine the advantages of both algorithms to achieve better performance. For example, we may start with the incremental algorithm for faster convergence and then turn to the global algorithm for further refinement. In the following simulations, we used only the incremental algorithm for illustration, although similar results can be obtained using the global one.

Fig. 7. Convergence behavior of the incremental and global gradient descent algorithms with (a) noiseless and (b) noisy data.

Download Full Size | PDF

Next, we compared the convergence behavior using the amplitude-based and intensity-based objective functions. The results are presented in Fig. 8. Their differences are twofold. First, the amplitude-based minimization algorithm generally converges faster than the intensity-based one, although in some cases their performances are similar. Second, choosing a proper step size for the intensity-based algorithm is more difficult than that for the amplitude-based one. Recall that in Section 3, we have established the relationship between the amplitude-based gradient descent update and the ER update. The results suggest that using a constant step size of two, the objective function is generally non-increasing. In comparison, there is no default choice for the step size of the intensity-based algorithms. As a result, in our simulations, we used a fixed step size μ = 2 for the amplitude-based algorithms, whereas the step sizes for the intensity-based algorithms were selected case by case. Therefore, we conclude that the amplitude-based objective function is generally preferable in terms of numerical reconstruction. This agrees with previous findings from the classical phase retrieval problem.

Fig. 8. Convergence behavior of the incremental gradient descent algorithm when reconstructing different objects.

Download Full Size | PDF

We also tested the resolution limit under different noise levels, as is shown in Fig. 9. As the noise level increases, the algorithm converges earlier with a larger reconstruction error, and correspondingly the PSR reconstruction turns out to be noisier. Compared to the classical phase retrieval problem, PSR reconstruction is more sensitive to the measurement noise, and thus the resolution is in fact limited by the signal-to-noise ratio.

Fig. 9. (a) Low resolution (non-PSR) reconstruction of the amplitude. (b) PSR reconstruction under different noise levels.

Download Full Size | PDF

Considering the nonconvexity of the PSR phase retrieval problem, we compared different initialization strategies. We applied either a uniform or random initialization for amplitude and phase. In addition, we also tried the commonly adopted method that uses the averaged back-propagated holograms as the initial guess. The results are presented in Fig. 10. When initialized with a uniform phase and amplitude, the algorithm converges to the best result, and is competitive with the back-propagating method. Any type of random initialization may introduce additional noises to the reconstruction result.

Fig. 10. Convergence of the incremental gradient descent algorithm with different initial guess.

Download Full Size | PDF

4.2 Validation on experimental data

We tested the proposed algorithms on experimental data. The data were collected using phase modulation diversity with a phase-only spatial light modulator (SLM). The optical configuration and implementation details can be found in [25]. A Quantitative Phase Microscopy Target (Benchmark Technologies) and a positive 1951 USAF Test Target (Thorlabs, Inc.) were used as the phase and amplitude object, respectively. In total S = 64 images were captured, and we adopted a threefold up-sampling for reconstruction. Based on the discussion above, we ran the amplitude-based incremental gradient descent algorithm for one iteration and then the global algorithm for nine iterations. The combined algorithm outperforms the incremental algorithm for both phase and amplitude object reconstruction, as shown in Fig. 11.

Fig. 11. Convergence of the incremental and global gradient descent algorithms for (a) phase object and (b) amplitude object. The losses are calculated as the values of the objective function.

Download Full Size | PDF

The reconstruction results for the phase and amplitude objects are shown in Fig. 12 and Fig. 13, respectively. Compared to the low-resolution (LR) reconstruction with σ = 1, the PSR reconstruction achieves half-pitch resolution, and the recovered phase agrees with the theoretical results quite well. Furthermore, experimental results also suggest a superior performance of global algorithms when the measurement data are deprecated by modeling errors. Although the SLM had been calibrated in advance, minor phase errors were still inevitable during the experiments. The reconstruction result with the incremental algorithm suffers from severe background noise, which can be effectively suppressed using the global algorithm.

Fig. 12. Reconstruction result for the phase object using the (a) global and (b) incremental gradient descent algorithm, respectively. Below are the zoomed in details of (a). The theoretical values are calculated using the height and the refractive index of the structures, which are provided by the manufacturer.

Download Full Size | PDF

Fig. 13. Reconstruction result for the amplitude object.

Download Full Size | PDF

5. Conclusion

To summarize, we proposed a generalized optimization framework for pixel-super-resolved phase retrieval. We introduced the iterative projection algorithms and gradient descent algorithms to solve this problem. The analytical expressions for the projection operator and the Wirtinger gradient were derived, which serve as the fundamental building blocks for these two classes of algorithms. As two examples, the ER algorithm and the Wirtinger-PSR algorithm were studied and compared. We proved the equivalence of the two algorithms under proper conditions, and proposed using the Wirtinger-PSR algorithm as the general approach for solving the PSR phase retrieval problem. With both simulated and experimentally collected data, we tested and compared different updating schemes and choices of the objective function for the Wirtinger-PSR algorithm. In particular, we found that the incremental update can help speed up convergence, whereas the global update yields more stable performance under noisy measurements. Besides, our numerical studies show that the amplitude-based objective function is generally better than the intensity-based one, which is similar to the case of classical phase retrieval. The proposed framework generalizes well to various physical settings, and may provide new insights into the PSR phase retrieval problem.

Appendix A: derivation for the projection operator

In this Section, we derive the projection of x onto constraint set ${{{\cal C}}_k}$ as defined in Eq. (12). For the sake of simplicity, we omit the subscript k in the discussion below, since it applies to all $k = 1,2, \ldots ,S$. Requiring x to satisfy ${\mathbf x} \in {{\cal C}}$ is equivalent to having ${\mathbf u} = {\mathbf{Ax}}$ satisfy ${\mathbf u} \in {\tilde{{\cal C}}}$, where ${{\cal \tilde C}}$ is defined as

(23)$${{{\cal \tilde C}}} = \{{{\mathbf u} \in {{{\mathbb C}}^N}{{|{\,{\mathbf D}|{\mathbf u} |} }^2} = {\mathbf y}} \}\,.$$

While a closed-form expression for ${{{\cal P}}_{{\cal C}}}({\mathbf x})$ is not readily apparent, Lemma 1 suggests its close relationship with the projection of u onto ${{\cal \tilde C}}$, provided A is unitary. The proof for Lemma 1 can be found in [37], where the problem is formulated in the continuum. Result for the discrete case is similar, yet the proof is still presented here for completeness.

Lemma 1. Assume A is unitary, then the projection of x onto ${{\cal C}}$ is given by

(24)$${{{\cal P}}_{{\cal C}}}({\mathbf x} )= {{\mathbf A}^ \ast }{{{\cal P}}_{{{{\cal \tilde C}}}}}({{\mathbf{Ax}}} ).$$

Proof: Let ${{\mathbf z}^{ \star }}$ denote the projection of ${\mathbf u} = {\mathbf{Ax}}$ onto ${{\cal \tilde C}}$, i.e., ${{\mathbf z}^{ \star }} = {{{\cal P}}_{{{{\cal \tilde C}}}}}({\mathbf u}) = {{{\cal P}}_{{{{\cal \tilde C}}}}}({\mathbf{Ax}})$. By definition, we have

(25)$${||{{{\mathbf z}^{ \star }} - {\mathbf{Ax}}} ||_2} \le {||{{\mathbf z} - {\mathbf{Ax}}} ||_2},\;\forall {\mathbf z} \in {{{\cal \tilde C}}}.$$

Since A is unitary, we obtain that

(26)$${||{{{\mathbf A}^ \ast }{{\mathbf z}^{ \star }} - {\mathbf x}} ||_2} = {||{{{\mathbf z}^{ \star }} - {\mathbf{Ax}}} ||_2} \le {||{{\mathbf z} - {\mathbf{Ax}}} ||_2} = {||{{{\mathbf A}^ \ast }{\mathbf z} - {\mathbf x}} ||_2},\;\forall {\mathbf z} \in {{{\cal \tilde C}}}.$$

The unitarity of A also suggests that there exists a one-to-one mapping between each ${\mathbf z} \in {{{\cal \tilde C}}}$ and each ${\mathbf w} \in {{\cal C}}$. Thus, Eq. (26) is equivalent to

(27)$${||{{{\mathbf A}^ \ast }{{\mathbf z}^{ \star }} - {\mathbf x}} ||_2} \le {||{{\mathbf w} - {\mathbf x}} ||_2},\;\forall {\mathbf w} \in {{\cal C}}.$$

Therefore, we conclude that ${{\mathbf A}^ \ast }{{\mathbf z}^{ \star }} = {{\mathbf A}^ \ast }{{{\cal P}}_{{{{\cal \tilde C}}}}}({\mathbf{Ax}})$ is the projection of x onto ${{\cal C}}$. ▪

With Lemma 1, the derivation of ${{{\cal P}}_{{\cal C}}}({\mathbf x})$ can be simplified to that of ${{{\cal P}}_{{{{\cal \tilde C}}}}}({\mathbf{Ax}})$, which is given by Lemma 2 below.

Lemma 2. The projection operator of u onto ${{\cal \tilde C}}$ is given by

(28)$${{{\cal P}}_{{{{\cal \tilde C}}}}}({\mathbf u} ) =\textrm{diag}({\mathbf u} ){{\mathbf D}^\textrm{T}}\textrm{diag}\left( {\frac{{\mathbf 1}}{{\sqrt {{\mathbf D}{{|{\mathbf u} |}^2}} }}} \right)\sqrt {\mathbf y} \,.$$

Proof: Let ${{\mathbf z}^{ \star }}$ denote the projection of u onto ${{\cal \tilde C}}$, i.e., ${{\mathbf z}^{ \star }} = {{{\cal P}}_{\tilde{c}}}({\mathbf u})$. For a given u, the derivation of ${{\mathbf z}^{ \star }}$ can be formulated as a constrained minimization problem

(29)$${{\mathbf z}^{ \star }} = \mathop {\textrm{argmin}}\limits_{{\mathbf z} \in {{{\cal \tilde C}}}} {||{{\mathbf u} - {\mathbf z}} ||_2}.$$

Minimizing the l₂ norm is equivalent to minimizing its square:

(30)$$\begin{aligned} ||{{\mathbf u} - {\mathbf z}} ||_2^2 &= \sum\limits_i {\sum\limits_{j \in {{{\cal I}}_i}} {{{|{{u_j} - {z_j}} |}^2}} } \ge \sum\limits_i {\sum\limits_{j \in {{{\cal I}}_i}} {{{({|{{u_j}} |- |{{z_j}} |} )}^2}} } \\ &= \sum\limits_i {\left( {{y_i} + \sum\limits_{j \in {{{\cal I}}_i}} {{{|{{u_j}} |}^2}} - 2\sum\limits_{j \in {{{\cal I}}_i}} {|{{u_j}} |\cdot |{{z_j}} |} } \right)} \,. \end{aligned}$$

According to the triangle equality of complex modulus, the equality above holds if and only if $\arg {\mathbf u} = \arg {\mathbf z}$, i.e., the complex argument (i.e., the phase) remains unchanged [38]. The last equality is obtained because ${\mathbf z} \in {{{\cal \tilde C}}}$, that is we have, $\mathop \sum \limits_{j\in { {\cal I}}_i} \left| {u_j} \right|^2 = y_i$. Equation (30) suggests that the problem is equivalent to maximizing $\sum\limits_{j \in {{{\cal I}}_i}} {|{{u_j}} |\cdot |{{z_j}} |}$ for each pixel i:

(31)$$\sum\limits_{j \in {{{\cal I}}_i}} {|{{u_j}} |\cdot |{{z_j}} |} \le {\sqrt{\sum\limits_{j \in {{{\cal I}}_i}} {{{|{{u_j}} |}^2}} \cdot \sum\limits_{j \in {{{\cal I}}_i}} {{{|{{z_j}} |}^2}}}} = {\sqrt{{y_i} \cdot \sum\limits_{j \in {{{\cal I}}_i}} {{{|{{u_j}} |}^2}}}}.$$

The Cauchy-Schwarz inequality indicates that the equality above holds if and only if we have

(32)$$\frac{{|{{z_j}} |}}{{|{{u_j}} |}} = {c_i},\;\forall j \in {{{\cal I}}_i}\,,$$

where c_i is a constant dependent on i, which can be easily derived as

(33)$${c_i} = \sqrt {\frac{{{y_i}}}{{\sum\nolimits_{j \in {{{\cal I}}_i}} {{{|{{u_j}} |}^2}} }}} \,.$$

When the above condition is satisfied for any pixel i, we obtain the projection operator, which can be written in the vectorized form as

(34)$${{\mathbf z}^{ \star }} = {{{\cal P}}_{{{{\cal \tilde C}}}}}({\mathbf u} ) =\textrm{diag}({\mathbf u} ){{\mathbf D}^\textrm{T}}\textrm{diag}\left( {\frac{{\mathbf 1}}{{\sqrt {{\mathbf D}{{|{\mathbf u} |}^2}} }}} \right)\sqrt {\mathbf y} \,,$$

which completes the proof. ▪

Combining the results from the two lemmas, we arrive at the main theorem as follows.

Theorem 3. Assume A is unitary, then the projection of x onto ${{\cal C}}$ is given by

(35)$${{{\cal P}}_{{\cal C}}}({\mathbf x} ) ={{\mathbf A}^ \ast }\textrm{diag}({{\mathbf{Ax}}} ){{\mathbf D}^\textrm{T}}\textrm{diag}\left( {\frac{{\mathbf 1}}{{\sqrt {{\mathbf D}{{|{{\mathbf{Ax}}} |}^2}} }}} \right)\sqrt {\mathbf y} \,.$$

Appendix B: gradient calculation

B.1. Intensity-based objective function

Let ${{\boldsymbol \rho }_k} = {\mathbf D}{|{{{\mathbf u}_k}} |^2} - {{\mathbf y}_k}$ denotes the intensity residual for the k^th measurement. Then the objective function of Eq. (16) can be expressed as

(36)$${f_I}({\mathbf x} )= \frac{1}{{2S}}\sum\limits_{k = 1}^S {||{{{\mathbf {\rho } }_k}} ||_2^2} \,.$$

According to the chain rule of complex derivatives:

(37)$$\frac{{\partial {f_I}({\mathbf x} )}}{{\partial \bar{{\mathbf x}}}} = \sum\limits_{k = 1}^S {\frac{{\partial {f_I}}}{{\partial {{\mathbf {\rho } }_k}}}} \frac{{\partial {{\mathbf {\rho } }_k}}}{{\partial \bar{{\mathbf x}}}} = \frac{1}{S}\sum\limits_{k = 1}^S {{\mathbf {\rho } }_k^\textrm{T}} \frac{{\partial {{\mathbf {\rho } }_k}}}{{\partial \bar{{\mathbf x}}}} = \frac{1}{S}\sum\limits_{k = 1}^S {{\mathbf {\rho } }_k^\textrm{T}} \left( {\frac{{\partial {{\mathbf {\rho } }_k}}}{{\partial {{\mathbf u}_k}}}\frac{{\partial {{\mathbf u}_k}}}{{\partial \bar{{\mathbf x}}}} + \frac{{\partial {{\mathbf {\rho } }_k}}}{{\partial {{\bar{{\mathbf u}}}_k}}}\frac{{\partial {{\bar{{\mathbf u}}}_k}}}{{\partial \bar{{\mathbf x}}}}} \right)\,,$$

where the four partial derivative terms are calculated as

(38)$$\frac{{\partial {{\mathbf {\rho } }_k}}}{{\partial {{\mathbf u}_k}}} = {\mathbf D}\textrm{diag}({{{\bar{{\mathbf u}}}_k}} )$$

(39)$$\frac{{\partial {{\mathbf {\rho } }_k}}}{{\partial {{\bar{{\mathbf u}}}_k}}} = {\mathbf D}\textrm{diag}({{{\mathbf u}_k}} )$$

(40)$$\frac{{\partial {{\mathbf u}_k}}}{{\partial \bar{{\mathbf x}}}} = {\mathbf 0}$$

(41)$$\frac{{\partial {{\bar{{\mathbf u}}}_k}}}{{\partial \bar{{\mathbf x}}}} = {\bar{{\mathbf A}}_k}\,.$$

Substituting Eqs. (38)–(41) into Eq. (37), we obtain the derivative of f_I with respect to $\bar{{\mathbf x}}$:

(42)$$\frac{{\partial {f_I}({\mathbf x} )}}{{\partial \bar{{\mathbf x}}}} = \frac{1}{S}\sum\limits_{k = 1}^S {{\mathbf {\rho } }_k^\textrm{T}{\mathbf D}\textrm{diag}({{{\mathbf u}_k}} ){{\bar{{\mathbf A}}}_k}} .$$

The gradient is thus given by

(43)$$\nabla {f_I}({\mathbf x} )= {\left( {\frac{{\partial {f_I}({\mathbf x} )}}{{\partial {\mathbf x}}}} \right)^ \ast } = {\left( {\frac{{\partial {f_I}({\mathbf x} )}}{{\partial \bar{{\mathbf x}}}}} \right)^\textrm{T}} = \frac{1}{S}\sum\limits_{k = 1}^S {{\mathbf A}_k^ \ast \textrm{diag}({{{\mathbf u}_k}} ){{\mathbf D}^\textrm{T}}{{\mathbf {\rho } }_k}} {\kern 1pt} .$$

B.2. Amplitude-based objective function

Let ${{\boldsymbol \varepsilon }_k} = \sqrt {{\mathbf D}{{|{{{\mathbf u}_k}} |}^2}} - \sqrt {{{\mathbf y}_k}}$ denotes the amplitude residual for the k^th measurement. Then the objective function of Eq. (17) can be expressed as

(44)$${f_A}({\mathbf x} )= \frac{1}{{2S}}\sum\limits_{k = 1}^S {||{{{\mathbf \varepsilon }_k}} ||_2^2} \,.$$

According to the chain rule of complex derivatives:

(45)$$\frac{{\partial {f_A}({\mathbf x} )}}{{\partial \bar{{\mathbf x}}}} = \sum\limits_{k = 1}^S {\frac{{\partial {f_A}}}{{\partial {{\mathbf \varepsilon }_k}}}} \frac{{\partial {{\mathbf \varepsilon }_k}}}{{\partial \bar{{\mathbf x}}}} = \frac{1}{S}\sum\limits_{k = 1}^S {{\mathbf \varepsilon }_k^\textrm{T}} \frac{{\partial {{\mathbf \varepsilon }_k}}}{{\partial \bar{{\mathbf x}}}} = \frac{1}{S}\sum\limits_{k = 1}^S {{\mathbf \varepsilon }_k^\textrm{T}} \left( {\frac{{\partial {{\mathbf \varepsilon }_k}}}{{\partial {{\mathbf u}_k}}}\frac{{\partial {{\mathbf u}_k}}}{{\partial \bar{{\mathbf x}}}} + \frac{{\partial {{\mathbf \varepsilon }_k}}}{{\partial {{\bar{{\mathbf u}}}_k}}}\frac{{\partial {{\bar{{\mathbf u}}}_k}}}{{\partial \bar{{\mathbf x}}}}} \right)\,,$$

where [38]

(46)$$\frac{{\partial {{\mathbf \varepsilon }_k}}}{{\partial {{\mathbf u}_k}}} = \frac{{\partial \sqrt {{\mathbf D}{{|{{{\mathbf u}_k}} |}^2}} }}{{\partial ({{{|{{{\mathbf u}_k}} |}^2}} )}}\frac{{\partial ({{{\bar{{\mathbf u}}}_k} \odot {{\mathbf u}_k}} )}}{{\partial {{\mathbf u}_k}}} = \frac{1}{2}\textrm{diag}\left( {\frac{{\mathbf 1}}{{\sqrt {{\mathbf D}{{|{{{\mathbf u}_k}} |}^2}} }}} \right){\mathbf D}\textrm{diag}({{{\bar{{\mathbf u}}}_k}} )$$

(47)$$\frac{{\partial {{\mathbf \varepsilon }_k}}}{{\partial {{\bar{{\mathbf u}}}_k}}} = \overline {\left( {\frac{{\partial {{\mathbf \varepsilon }_k}}}{{\partial {{\mathbf u}_k}}}} \right)} = \frac{1}{2}\textrm{diag}\left( {\frac{{\mathbf 1}}{{\sqrt {{\mathbf D}{{|{{{\mathbf u}_k}} |}^2}} }}} \right){\mathbf D}\textrm{diag}({{{\mathbf u}_k}} )\,.$$

Thus, substituting Eqs. (40), (41), (46), and (47) into Eq. (45), we arrive at the derivative and the gradient:

(48)$$\frac{{\partial {f_A}({\mathbf x} )}}{{\partial \bar{{\mathbf x}}}} = \frac{1}{{2S}}\sum\limits_{k = 1}^S {{\mathbf \varepsilon }_k^\textrm{T}\textrm{diag}\left( {\frac{{\mathbf 1}}{{\sqrt {{\mathbf D}{{|{{{\mathbf u}_k}} |}^2}} }}} \right){\mathbf D}\textrm{diag}({{{\mathbf u}_k}} ){{\bar{{\mathbf A}}}_k}}$$

(49)$$\nabla {f_A}({\mathbf x} )= {\left( {\frac{{\partial {f_A}({\mathbf x} )}}{{\partial {\mathbf x}}}} \right)^ \ast } = {\left( {\frac{{\partial {f_A}({\mathbf x} )}}{{\partial \bar{{\mathbf x}}}}} \right)^\textrm{T}} = \frac{1}{{2S}}\sum\limits_{k = 1}^S {{\mathbf A}_k^ \ast \textrm{diag}({{{\mathbf u}_k}} ){{\mathbf D}^\textrm{T}}\textrm{diag}\left( {\frac{{\mathbf 1}}{{\sqrt {{\mathbf D}{{|{{{\mathbf u}_k}} |}^2}} }}} \right){{\mathbf \varepsilon }_k}} \,.$$

B.3. Equivalence between the Wirtinger gradient descent update and the ER update

If the stepsize for the amplitude-based gradient descent update of Eq. (20) is two, i.e., ${\mu _t} = 2$, using the expression of the gradient in Eq. (19), we have

(50)$$\begin{aligned} {{\mathbf x}_{t + 1}} &={{\mathbf x}_t} - 2\nabla {f_A}({{{\mathbf x}_t}} )= {{\mathbf x}_t} - \frac{1}{S}\sum\limits_{k = 1}^S {{\mathbf A}_k^ \ast \textrm{diag}({{{\mathbf u}_t}} ){{\mathbf D}^\textrm{T}}\textrm{diag}\left( {\frac{{\mathbf 1}}{{\sqrt {{\mathbf D}{{|{{{\mathbf u}_t}} |}^2}} }}} \right){{\mathbf \varepsilon }_k}} \\ &={{\mathbf x}_t} - \frac{1}{S}\sum\limits_{k = 1}^S {{\mathbf A}_k^ \ast \textrm{diag}({{{\mathbf A}_k}{{\mathbf x}_t}} ){{\mathbf D}^\textrm{T}}\textrm{diag}\left( {\frac{{\mathbf 1}}{{\sqrt {{\mathbf D}{{|{{{\mathbf A}_k}{{\mathbf x}_t}} |}^2}} }}} \right)\left( {\sqrt {{\mathbf D}{{|{{{\mathbf A}_k}{{\mathbf x}_t}} |}^2}} - \sqrt {{{\mathbf y}_k}} } \right)} \\ &={{\mathbf x}_t} + \frac{1}{S}\sum\limits_{k = 1}^S {{\mathbf A}_k^ \ast \textrm{diag}({{{\mathbf A}_k}{{\mathbf x}_t}} ){{\mathbf D}^\textrm{T}}\textrm{diag}\left( {\frac{{\mathbf 1}}{{\sqrt {{\mathbf D}{{|{{{\mathbf A}_k}{{\mathbf x}_t}} |}^2}} }}} \right)\sqrt {{{\mathbf y}_k}} } \\&\quad- \frac{1}{S}\sum\limits_{k = 1}^S {{\mathbf A}_k^ \ast \textrm{diag}({{{\mathbf A}_k}{{\mathbf x}_t}} ){{\mathbf D}^\textrm{T}}\textrm{diag}\left( {\frac{{\mathbf 1}}{{\sqrt {{\mathbf D}{{|{{{\mathbf A}_k}{{\mathbf x}_t}} |}^2}} }}} \right)\sqrt {{\mathbf D}{{|{{{\mathbf A}_k}{{\mathbf x}_t}} |}^2}} } \\ &={{\mathbf x}_t} + \frac{1}{S}\sum\limits_{k = 1}^S {{\mathbf A}_k^ \ast \textrm{diag}({{{\mathbf A}_k}{{\mathbf x}_t}} ){{\mathbf D}^\textrm{T}}\textrm{diag}\left( {\frac{{\mathbf 1}}{{\sqrt {{\mathbf D}{{|{{{\mathbf A}_k}{{\mathbf x}_t}} |}^2}} }}} \right)\sqrt {{{\mathbf y}_k}} } - \frac{1}{S}\sum\limits_{k = 1}^S {{\mathbf A}_k^ \ast {{\mathbf A}_k}{{\mathbf x}_t}} \,. \end{aligned}$$

If ${{\mathbf A}_k}$ is unitary, ${\mathbf A}_k^ \ast {{\mathbf A}_k} = {\mathbf I}$, notice that the first and the third terms are cancelled. Thus, the updating equation above is simplified to

(51)$${{\mathbf x}_{t + 1}} =\frac{1}{S}\sum\limits_{k = 1}^S {{\mathbf A}_k^ \ast \textrm{diag}({{{\mathbf A}_k}{{\mathbf x}_t}} ){{\mathbf D}^\textrm{T}}\textrm{diag}\left( {\frac{{\mathbf 1}}{{\sqrt {{\mathbf D}{{|{{{\mathbf A}_k}{{\mathbf x}_t}} |}^2}} }}} \right)\sqrt {{{\mathbf y}_k}} } \,,$$

which is equivalent to the ER update given by Eq. (15).

Funding

National Natural Science Foundation of China (61827825).

Disclosures

The authors declare no conflicts of interest.

Data availability

The code for the Wirtinger-PSR algorithm is available in Ref. [36]. Experimental data underlying the results presented in this paper may be obtained from the authors upon reasonable request.

References

1. D. Gabor, “A New Microscopic Principle,” Nature 161(4098), 777–778 (1948). [CrossRef]

2. R. Zhao, L. Huang, and Y. Wang, “Recent advances in multi-dimensional metasurfaces holographic technologies,” PhotoniX 1(1), 20 (2020). [CrossRef]

3. X. Fang, H. Yang, W. Yao, T. Wang, Y. Zhang, M. Gu, and M. Xiao, “High-dimensional orbital angular momentum multiplexing nonlinear holography,” Adv. Photonics 3, 015001 (2021). [CrossRef]

4. P. Thibault, M. Dierolf, A. Menzel, O. Bunk, C. David, and F. Pfeiffer, “High-Resolution Scanning X-ray Diffraction Microscopy,” Science 321(5887), 379–382 (2008). [CrossRef]

5. Z. Göröcs and A. Ozcan, “On-Chip Biomedical Imaging,” IEEE Rev. Biomed. Eng. 6, 29–46 (2013). [CrossRef]

6. O. Mudanyali, E. McLeod, W. Luo, A. Greenbaum, A. F. Coskun, Y. Hennequin, C. P. Allier, and A. Ozcan, “Wide-field optical detection of nanoparticles using on-chip microscopy and self-assembled nanolenses,” Nat. Photonics 7(3), 247–254 (2013). [CrossRef]

7. J. Katz and J. Sheng, “Applications of Holography in Fluid Mechanics and Particle Dynamics,” Annu. Rev. Fluid Mech. 42(1), 531–555 (2010). [CrossRef]

8. Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, “Phase Retrieval with Application to Optical Imaging: A contemporary overview,” IEEE Signal Proc. Mag. 32(3), 87–109 (2015). [CrossRef]

9. P. Bao, F. Zhang, G. Pedrini, and W. Osten, “Phase retrieval using multiple illumination wavelengths,” Opt. Lett. 33(4), 309–311 (2008). [CrossRef]

10. J. Mariën, R. Stahl, A. Lambrechts, C. V. Hoof, and A. Yurt, “Color lens-free imaging using multi-wavelength illumination based phase retrieval,” Opt. Express 28(22), 33002–33018 (2020). [CrossRef]

11. M. Sanz, J. A. Picazo-Bueno, J. García, and V. Micó, “Improved quantitative phase imaging in lensless microscopy by single-shot multi-wavelength illumination using a fast convergence algorithm,” Opt. Express 23(16), 21352–21365 (2015). [CrossRef]

12. G. Zheng, R. Horstmeyer, and C. Yang, “Wide-field, high-resolution Fourier ptychographic microscopy,” Nat. Photonics 7(9), 739–745 (2013). [CrossRef]

13. Y. Zhou, J. Wu, J. Suo, X. Han, and Q. Dai, “Single-shot lensless imaging via simultaneous multi-angle LED illumination,” Opt. Express 26(17), 21418 (2018). [CrossRef]

14. Y. Wu, M. K. Sharma, and A. Veeraraghavan, “WISH: wavefront imaging sensor with high resolution,” Light Sci Appl 8(1), 44 (2019). [CrossRef]

15. J. A. Rodrigo, T. Alieva, G. Cristóbal, and M. L. Calvo, “Wavefield imaging via iterative retrieval based on phase modulation diversity,” Opt. Express 19(19), 18621–18635 (2011). [CrossRef]

16. G. Pedrini, W. Osten, and Y. Zhang, “Wave-front reconstruction from a sequence of interferograms recorded at different planes,” Opt. Lett. 30(8), 833–835 (2005). [CrossRef]

17. H. M. L. Faulkner and J. M. Rodenburg, “Movable Aperture Lensless Transmission Microscopy: A Novel Phase Retrieval Algorithm,” Phys. Rev. Lett. 93(2), 023903 (2004). [CrossRef]

18. P. Grohs, S. Koppensteiner, and M. Rathmair, “Phase Retrieval: Uniqueness and Stability,” SIAM Rev. 62(2), 301–350 (2020). [CrossRef]

19. J. Zhang, J. Sun, Q. Chen, and C. Zuo, “Resolution Analysis in a Lens-Free On-Chip Digital Holographic Microscope,” IEEE Trans. Comput. Imaging 6, 697–710 (2020). [CrossRef]

20. W. Bishara, T.-W. Su, A. F. Coskun, and A. Ozcan, “Lensfree on-chip microscopy over a wide field-of-view using pixel super-resolution,” Opt. Express 18(11), 11181–11191 (2010). [CrossRef]

21. A. Greenbaum and A. Ozcan, “Maskless imaging of dense samples using pixel super-resolution based multi-height lensfree on-chip microscopy,” Opt. Express 20(3), 3129–3143 (2012). [CrossRef]

22. W. Luo, Y. Zhang, Z. Göröcs, A. Feizi, and A. Ozcan, “Propagation phasor approach for holographic image reconstruction,” Sci. Rep. 6(1), 22738 (2016). [CrossRef]

23. W. Luo, Y. Zhang, A. Feizi, Z. Göröcs, and A. Ozcan, “Pixel super-resolution using wavelength scanning,” Light Sci Appl 5(4), e16060 (2016). [CrossRef]

24. J. Zhang, J. Sun, Q. Chen, J. Li, and C. Zuo, “Adaptive pixel-super-resolved lensfree in-line digital holography for wide-field on-chip microscopy,” Sci. Rep. 7(1), 11777 (2017). [CrossRef]

25. Y. Gao and L. Cao, “High-fidelity pixel-super-resolved complex field reconstruction via adaptive smoothing,” Opt. Lett. 45(24), 6807–6810 (2020). [CrossRef]

26. S. Marchesini, “Invited Article: A unified evaluation of iterative projection algorithms for phase retrieval,” Rev. Sci. Instrum. 78(1), 011301 (2007). [CrossRef]

27. J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. 21(15), 2758–2769 (1982). [CrossRef]

28. V. Elser, “Phase retrieval by iterated projections,” J. Opt. Soc. Am. A 20(1), 40–55 (2003). [CrossRef]

29. D. R. Luke, “Relaxed averaged alternating reflections for diffraction imaging,” Inverse Probl. 21(1), 37–50 (2005). [CrossRef]

30. E. J. Candès, X. Li, and M. Soltanolkotabi, “Phase Retrieval via Wirtinger Flow: Theory and Algorithms,” IEEE Trans. Inform. Theory 61(4), 1985–2007 (2015). [CrossRef]

31. L. Yeh, J. Dong, J. Zhong, L. Tian, M. Chen, G. Tang, M. Soltanolkotabi, and L. Waller, “Experimental robustness of Fourier ptychography phase retrieval algorithms,” Opt. Express 23(26), 33214–33240 (2015). [CrossRef]

32. L. Bian, J. Suo, G. Zheng, K. Guo, F. Chen, and Q. Dai, “Fourier ptychographic reconstruction using Wirtinger flow optimization,” Opt. Express 23(4), 4856–4866 (2015). [CrossRef]

33. S. Kandel, S. Maddali, M. Allain, S. O. Hruszkewycz, C. Jacobsen, and Y. S. G. Nashed, “Using automatic differentiation as a general framework for ptychographic reconstruction,” Opt. Express 27(13), 18653–18672 (2019). [CrossRef]

34. X. Chang, L. Bian, S. Jiang, G. Zheng, and J. Zhang, “Plug-and-play optimization for pixel super-resolution phase retrieval,” arXiv:2105.14746 (2021).

35. K. Kreutz-Delgado, “The Complex Gradient Operator and the CR-Calculus,” arXiv:0906.4835 (2009).

36. THUHoloLab, “Pixel-super-resolution-phase-retrieval,” Github (2021), https://github.com/THUHoloLab/pixel-super-resolution-phase-retrieval.

37. D. R. Luke, J. V. Burke, and R. G. Lyon, “Optical Wavefront Reconstruction: Theory and Numerical Methods,” SIAM Rev. 44(2), 169–224 (2002). [CrossRef]

38. The derivation assumes that the complex-valued incident field u does not have any zero entries. Otherwise, the projection is multivalued and the amplitude-type Wirtinger gradient is not well-defined. Nevertheless, this issue can be circumvented by assigning a certain phase value (typically zero phase) for the zero entries.

Generalized optimization framework for pixel super-resolution imaging in digital holography

Abstract

1. Introduction

2. Problem formulation

2.1 Notations

2.2 Modeling and analysis of the sampling process

2.3 Forward model: discretization and vectorization

3. Derivation of algorithms

3.1 Iterative projection algorithms for PSR phase retrieval

3.1.1 PSR phase retrieval as a feasibility problem

3.1.2 Projection operator for the undersampled modulus constraint set

3.2 Gradient descent algorithms for PSR phase retrieval

3.2.1 PSR phase retrieval as a minimization problem

3.2.2 Wirtinger gradients

3.3 Generalized optimization framework

4. Performance analysis of PSR phase retrieval algorithms

4.1 Numerical results

4.2 Validation on experimental data

5. Conclusion

Appendix A: derivation for the projection operator

Appendix B: gradient calculation

B.1. Intensity-based objective function

B.2. Amplitude-based objective function

B.3. Equivalence between the Wirtinger gradient descent update and the ER update

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (13)

Equations (51)

Optics Express