
New ℓ2 − ℓ0 algorithm for single-molecule localization microscopy


Abstract

Among the many super-resolution techniques for microscopy, single-molecule localization microscopy methods are widely used. This technique raises the difficult question of precisely localizing fluorophores from a blurred, under-resolved, and noisy acquisition. In this work, we focus on the grid-based approach in the context of a high density of fluorophores, formalized by an ℓ2 least-squares term and a sparsity term modeled with the ℓ0 pseudo-norm. We consider both the constrained formulation and the penalized formulation. Based on recent results, we reformulate the ℓ0 pseudo-norm as a convex minimization problem by introducing an auxiliary variable. An exact biconvex reformulation of the ℓ2 − ℓ0 constrained and penalized problems is proposed, together with a minimization algorithm. The algorithms, named CoBic (Constrained Biconvex) and PeBic (Penalized Biconvex), are applied to the problem of single-molecule localization microscopy, and we compare the results with other recently proposed methods.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Single-molecule localization microscopy (SMLM) is an acquisition method that makes it possible to obtain images with a higher resolution than the diffraction limit. Ernst Abbe first derived the diffraction equation in 1873, which gives the resolution limit of a microscope in the lateral plane. The limit is given as

$$d_{min}=\frac{\lambda}{2\,NA}$$
where $\lambda$ is the wavelength of the light and $NA$ is the numerical aperture of the optical system [1]. This limit is around 200 nm in the lateral plane of the microscope, which is not sufficient when observing small biological structures such as proteins and viruses. An electron microscope has a resolution down to 0.2 nm, but the samples must be pretreated and fixed, so in-vivo imaging (imaging of living structures) is impossible. Among the many super-resolution techniques for microscopy (STED, SIM, etc.; see [2]), SMLM is widely used. A resolution of 20 nm was reported when SMLM was first introduced [3–5], and it can thus be used to observe fine structures.

SMLM uses photoactivatable fluorophores. This allows one to activate a sparse subsample of the fluorophores instead of illuminating all of them at the same time, as in standard fluorescence microscopy. The probability that two fluorophores lie within the same Point Spread Function (PSF) is low when only a few fluorophores are activated (low-density images), and precise localization of each one is therefore possible. However, obtaining an image of the whole sample requires acquiring a large number of low-density images, which takes time, and the temporal resolution of the image may be inadequate. High-density acquisitions reduce the total acquisition time and increase temporal resolution. However, the fluorophores may be too close to each other, and thus efficient localization algorithms are needed.

The ISBI 2013 [6] and 2016 [7] SMLM challenges addressed this problem and compared the localization performed by different algorithms. There are many approaches to solve the problem. Some algorithms localize spots in the observed image and fit specific shapes (templates, PSF) to these spots [8]. Other algorithms compare the spot to a pre-defined Gaussian function [9]. More sophisticated algorithms test the fit between the spot and a sum of different Gaussians [10]. Another approach consists of modeling the localization problem as an inverse problem with a sparsity term. These methods can be grouped into two classes: grid methods and grid-less methods. The grid methods require a fine grid to obtain sufficient precision, but at the cost of computational time. Grid-less algorithms do not have this problem, but their observation model is non-linear, whereas with a grid the observation model is linear. Examples of the inverse-problem approach are [11,12].

In this article, we focus on inverse-problem grid-based approaches. We work with this formulation of the problem mainly because of its versatility: inverse problems with a sparsity term have many applications beyond SMLM, such as geophysics, source separation, and variable selection, among others.

2. Direct and inverse problem

2.1 The image formation model

Let $d\in \mathbb {R}^{M \times M}$ be an image acquisition. We suppose that only a small number of fluorophores have contributed to this image. We want to localize the fluorophores on a finer grid $x\in \mathbb {R}^{ML \times ML}$, where $L$ is the refinement factor. The goal is to reconstruct $x$ from $d$. To do so, we need to model the acquisition process. The fluorophores are observed through an optical system, and thus we observe diffraction discs instead of the exact positions of the fluorophores. This is modeled by a convolution with the Point Spread Function (PSF) of the microscope. We suppose the PSF to be a Gaussian kernel:

$$PSF(z,y)= I\exp\left(-\frac{z^{2}+y^{2}}{2\sigma^{2}}\right)$$
where $I$ is a normalization factor. Furthermore, a sensor captures the observation with a resolution lower than that of the fine grid. We model this as an operator that sums blocks of $L\times L$ pixels. The result is an observation of size $M \times M$. Finally, this observation is affected by noise $\eta$, which is assumed to be a mix of Poisson noise and Gaussian noise. We simplify the noise assumptions and consider only Gaussian noise.

The model can be written as

$$d=Ax+\eta$$
where $A\in \mathbb {R}^{M^{2}\times (ML)^{2}}$ is the matrix that performs a convolution and downsampling.
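To make the model concrete, a minimal sketch of the forward operator is given below. It assumes an isotropic Gaussian PSF applied with `scipy.ndimage.gaussian_filter` and an $L\times L$ pixel-binning step; the grid size, PSF width and intensities are illustrative values, not those of the datasets used later.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def forward_model(x, L, sigma, rng=None):
    """Apply the operator A to a fine-grid image x (ML x ML): convolution with a
    Gaussian PSF, then summation of L x L pixel blocks, plus optional Gaussian noise."""
    blurred = gaussian_filter(x, sigma=sigma, mode="constant")       # PSF convolution
    ML = x.shape[0]
    M = ML // L
    binned = blurred.reshape(M, L, M, L).sum(axis=(1, 3))            # sensor pixel binning
    if rng is not None:
        binned = binned + rng.normal(scale=1.0, size=binned.shape)   # additive noise eta
    return binned

# Illustrative use: two point sources on a 256 x 256 fine grid, observed on 64 x 64.
rng = np.random.default_rng(0)
x = np.zeros((256, 256))
x[100, 120] = 500.0
x[101, 125] = 450.0
d = forward_model(x, L=4, sigma=4.4, rng=rng)   # sigma given in fine-grid pixels (illustrative)
```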

2.2 Formulation of the inverse problem

With the assumption of Gaussian noise, we can write the recovery of $x$ as

$$\underset{x}{\arg \min }\|A x-d\|_{2}^{2}.$$
The $\ell _2$ norm is defined in Notations at the end of section 2. However, $A$ is a matrix with more columns than rows, and thus the problem is underdetermined and ill-posed. A priori knowledge of $x$ is needed to correctly localize the fluorophores. The first hypothesis is that only a few fluorophores are excited and emit light. Thus the solution should be sparse. We can use the $\ell _0$ pseudo-norm to enforce sparsity. This will be, by abuse of terminology, referred to as the $\ell _0$-norm. It is defined as
$$\|x\|_0=\#\{x_i , i=1,\ldots (ML)^{2} : x_i\neq 0\}$$
The second hypothesis is that the fluorophores emit light and we wish to reconstruct the intensity, which is positive. We therefore add that the solution should be non-negative. We can thus search for a solution $\hat {x}$ as
$$\hat{x} \in \arg \min _{x} G_{k}(x):=\frac{1}{2}\|A x-d\|_{2}^{2}+\iota_{\cdot \geq 0}(x) \text { s.t. }\|x\|_{0} \leq k$$
$$\hat{x} \in \underset{x}{\arg \min } G_{\ell_{0}}(x):=\frac{1}{2}\|A x-d\|_{2}^{2}+\lambda\|x\|_{0}+\iota_{\cdot \geq 0}(x)$$
where $\iota _{\cdot \in X}$ is the indicator function defined in Notations, and $\iota _{\cdot \geq 0}(x)$ enforces the positivity constraint. We refer to problem (2) as the constrained optimization problem, and to problem (3) as the regularized problem. The problems differ in the regularization. The constrained problem reconstructs at most $k$ non-zero components. The regularized problem reconstructs a sparse solution, and the number of non-zero elements depends on the regularization parameter $\lambda$ and the acquisition $d$. In an ideal case, where the fluorophores are sufficiently separated, one fluorophore would correspond to one non-zero component in $x$. Thus the parameter $k$ represents the maximum number of fluorophores to reconstruct for each acquisition. In practice, however, this is not the case, since we work mainly with high-density acquisitions, and multiple fluorophores can be situated in one pixel, even on the fine grid. Thus the parameter $k$ will represent the number of distinguishable fluorophores in the reconstruction.

2.3 State-of-the-art of sparse optimization

The two problems (2) and (3) are neither continuous nor convex, and they are known to be NP-hard due to the combinatorial nature of the $\ell _0$ norm. Due to numerous applications in domains such as geophysics, variable selection and machine learning (see, among others, [13,14]), they have been studied extensively. We present below some methods to solve minimization problems with an $\ell _0$ norm.

Greedy algorithms Greedy algorithms are widely used in sparse optimization. These algorithms add one component to the signal $x$ at each iteration. The criterion for choosing a component depends on the algorithm. One of the simplest and least costly greedy algorithms, the Matching Pursuit (MP) algorithm [15], adds, at each iteration $s$, the component that most reduces the residual $R$, defined as $R=d-Ax^{(s)}$. More sophisticated algorithms have been developed. Greedy sparse simplex [16] or Single Best Replacement (SBR) [17] can, in contrast to MP, both add and remove components.
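As an illustration of the greedy principle, a minimal Matching Pursuit sketch is shown below. It treats $A$ as an explicit matrix and stops after a fixed number of iterations, a simplification of the stopping rules used by the algorithms cited above.

```python
import numpy as np

def matching_pursuit(A, d, k):
    """Minimal Matching Pursuit: at each iteration, update the component of x
    that most reduces the residual R = d - A x."""
    x = np.zeros(A.shape[1])
    residual = d.astype(float).copy()
    col_norms_sq = np.sum(A ** 2, axis=0)
    for _ in range(k):
        correlations = A.T @ residual
        i = int(np.argmax(np.abs(correlations)))     # most correlated column
        step = correlations[i] / col_norms_sq[i]     # least-squares step on that column
        x[i] += step
        residual -= step * A[:, i]
    return x
```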

Relaxations The non-convexity is due to the $\ell _0$ norm. The convex $\ell _1$ norm could replace the non-convex $\ell _0$ norm, as it is known to promote sparsity; the $\ell _2-\ell _0$ problem then becomes an $\ell _2-\ell _1$ problem. This approach is called a convex relaxation since a convex term replaces the non-convex term. The solution of the original problem is, however, not always the same as that of the convex relaxed problem; they coincide only under restrictive conditions on the matrix $A$ [18], which are not satisfied here. Non-smooth, non-convex, but continuous relaxations were primarily introduced to avoid the difference between $\|x\|_0$ and $\|x\|_1$ when $x$ contains large values. Among them we find the Nonnegative Garrote, the Log-Sum penalty, Capped-$\ell _1$, and the Continuous Exact $\ell _0$ (CEL0) penalty, which is an exact relaxation of problem (3), to mention a few. The relaxed problems are still non-convex, and the convergence of the algorithms to a global minimum is not ensured. A unified view of the non-convex continuous relaxations for the penalized problem is given in [19].

Mathematical program with equilibrium constraint A more recent method of solving a sparse optimization problem is to introduce auxiliary variables that mimic the behavior of the $\ell _0$ norm. This requires a constraint between the primary variable and the auxiliary one. The problem hence becomes a mathematical program with equilibrium constraint, and among the approaches we find Mixed Integer reformulations [20] (designed for small-dimension problems) and Boolean relaxation [21]. Articles such as [22–25] use reformulations of the $\ell _0$ norm similar to ours.

Contribution: The aim of this paper is to present two new techniques for SMLM. We start in section 3 by introducing a reformulation of the $\ell _0$ norm: we rewrite the norm as a convex minimization problem by introducing an auxiliary variable. Thus, we can reformulate (2) and (3) as a mathematical program with equilibrium constraint (MPEC). Based on this reformulation of the problem, we define a Lagrangian cost function $G_\rho$. The function $G_\rho : \mathbb {R}^{N}\times \mathbb {R}^{N}\rightarrow \mathbb {R}$ is biconvex. In section 4, we propose a Constrained Biconvex (CoBic) algorithm and a Penalized Biconvex (PeBic) algorithm to minimize the new objective function. The algorithms are based on existing, well-known algorithms and are thus easy to implement. In section 5, we test the algorithms on the problem of SMLM and compare them to the state-of-the-art methods Deep-STORM [26] and IRL1-CEL0 [27], as well as the well-known IHT algorithms [28]. The comparison done on the real dataset confirms the efficiency of CoBic and PeBic. Section 7 gives the mathematical justification of the algorithm.

Note that the reformulation of the $\ell _0$ norm was presented in [25], and slight variations have been previously presented in [22–25]. Our work differs from theirs: [22] and [23] did not work on the minimization problems (2) and (3), and [24,25] study the minimization of the sum of a Lipschitz-continuous data term and a sparsity constraint. In this paper, the data term is the squared norm, which is not Lipschitz continuous.

Notations:

  • The $\ell _2$ norm of a vector $x$ is denoted $\|\cdot \|_2$ and defined as $\sqrt {\sum _i x_i^{2}}$.
  • To simplify the notation we set $\|\cdot \|=\|\cdot \|_2$.
  • The $\ell _1$ norm of a vector $x$ is denoted $\|\cdot \|_1$ and defined as $\sum _i |x_i|$.
  • $A$ is a matrix in $\mathbb {R}^{P \times N}$, $P<N$. In this paper $P=M\times M$ and $N=ML\times ML$.
  • $A^{T}$ is the transpose of $A$.
  • If not stated otherwise, the vector $x\in \mathbb {R}^{N}$.
  • The observed signal $d\in \mathbb {R}^{P}$.
  • The function
    $$\|x\|_0=\#\{x_i , i=1,\ldots N : x_i\neq 0\}$$
    where $\#$ denotes the cardinality of the set, will be, by abuse of terminology, referred to as the $\ell _0$-norm.
  • For a matrix $A \in \mathbb{R}^{P \times N}$, the singular value decomposition (SVD) of $A$ is denoted $A=U_A\Sigma (A)V_A^{*}$.
  • For a matrix $A \in \mathbb{R}^{P \times N}$, we denote by $\|A\|$ the spectral norm of $A$, defined as
    $$\|A\|= \sigma(A)$$
    where $\sigma (A)$ is the largest singular value of $A$.
  • The indicator function $\iota _{x\in X}$ is defined for $X\subset \mathbb {R}^{N}$ as
    $$\iota_{x\in X}(x)=\begin{cases} +\infty \textrm{ if } x\notin X\\ 0 \textrm{ if } x\in X \end{cases}$$
  • $-\textbf {1}\leq u \leq \textbf {1}$ is a component-wise notation, i.e., $\forall \, \, i, \, -1\leq u_i\leq 1$.
  • $|x|\in \mathbb {R}^{N}$ is a vector containing the absolute value of each component of the vector $x$.

3. Exact reformulation

In this section, we focus on a reformulation of the $\ell _0$ norm. This formulation was first introduced in [25], where the $\ell _0$ norm is rewritten as a convex minimization problem by adding an auxiliary variable. We can write the $\ell _0$ norm of any $x \in \mathbb {R}^{N}$ as

$$\|x\|_0 = \min_{\substack{-\textbf{1}\leq u \leq \textbf{1}\\ u \in \mathbb{R}^{N} }} \|u\|_1 \textrm{ s.t } \|x\|_1= <u,x>.$$
Even though the introduction of the auxiliary variable $u$ increases the dimension of the problem, the non-convex and non-continuous $\ell _0$ norm can now be written as a convex and continuous minimization problem. In this paper, we study the $\ell _2-\ell _0$ penalized and constrained problems using this reformulation of the $\ell _0$ norm. We also add a non-negativity constraint to the $x$ variable, as it is usually used as an a priori in imaging problems. We obtain a unified notation for the constrained and penalized forms
$$\min_{x,u} \frac{1}{2}\|Ax-d\|^{2}+ I(u) +\iota_{\cdot\geq 0}(x) \textrm{ s.t. } \|x\|_1=<x,u>$$
where $I(u)$ is, in the case of the constrained problem (2):
$$I(u)=\begin{cases} 0 \textrm{ if }\|u\|_1 \leq k \textrm{ and } \forall \, \, i, \, -1\leq u_i\leq 1\\ \infty \textrm{ otherwise } \end{cases}$$
and for the penalized problem (3):
$$I(u)=\begin{cases} \lambda \|u\|_1 \textrm{ if } \forall \, \, i, \, -1\leq u_i\leq 1\\ \infty \textrm{ otherwise. } \end{cases}$$
We denote $\mathcal {S}= \{(x,u); \|x\|_1=<x,u>\}$, and we define the functional $G$ as
$$G(x,u)=\frac{1}{2}\|Ax-d\|^{2}+ I(u)+\iota_{\cdot\geq 0}(x)+\iota_{\cdot \in \mathcal{S}}(x,u)$$
The functional (8) is continuous and biconvex with respect to $(x,u)$: the minimization of (8) with respect to $x$ while $u$ is fixed is convex, and conversely. However, globally, it is still non-convex due to the equality constraint. We can relax this constraint by introducing a penalty term, $\rho (\|x\|_1-<x,u>)$, which is based on the method of Lagrange Multipliers.

We define the Lagrangian cost function $G_\rho (x,u): \mathbb {R}^{N}\times \mathbb {R}^{N} \rightarrow \mathbb {R}$ as

$$G_\rho (x,u) = \frac{1}{2}\|Ax-d\|^{2}+I(u) +\iota_{\cdot\geq 0}(x) +\rho(\|x\|_1-<x,u>).$$
In this paper, we are interested in exact reformulation methods. This means that any minimizer of (9) must also be a minimizer of (8). We show in this paper that for $\rho >\sigma (A)\|d\|$, with $\sigma (A)$ the largest singular value of $A$, $G_\rho (x,u)$ is exact (see Section 7).
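As a sanity check of the reformulation, note that the optimal auxiliary variable has the closed form $u_i=\textrm{sign}(x_i)$ on the support of $x$ and $u_i=0$ elsewhere (see Lemma 11 in Section 7). The short sketch below, with an arbitrary test vector, verifies numerically that the constraint holds and that the minimum value of $\|u\|_1$ equals $\|x\|_0$.

```python
import numpy as np

def optimal_u(x):
    """Closed-form minimizer of ||u||_1 s.t. ||x||_1 = <u, x>, -1 <= u <= 1:
    u_i = sign(x_i) on the support of x, 0 elsewhere."""
    return np.sign(x)

x = np.array([0.0, 2.5, 0.0, -1.3, 0.7])
u = optimal_u(x)
assert np.isclose(np.abs(x).sum(), np.dot(u, x))         # constraint ||x||_1 = <u, x>
assert np.isclose(np.abs(u).sum(), np.count_nonzero(x))  # ||u||_1 = ||x||_0
```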

4. A minimization algorithm

The minimization algorithm is presented in this section. We refer to the constrained biconvex algorithm by CoBic and to the penalized biconvex algorithm by PeBic.

The main body of the algorithm depends on two particularities of $G_\rho$: $G_\rho$ is convex when $\rho =0$, and the non-convexity of $G_\rho$ is due to the coupling term $<x,u>$. These two properties inspire the algorithm for minimizing $G_\rho (x,u)$. The minimization is initialized with a small $\rho ^{(0)}$ and $(x^{(0)},u^{(0)})$ equal to the zero vector. We minimize $G_{\rho ^{(0)}}$ and denote the result $(x^{(1)},u^{(1)})$. The penalty parameter $\rho$ increases at each iteration. At a given iteration $p$, we minimize $G_{\rho ^{(p)}}$ starting from $(x^{(p)},u^{(p)})$, the solution of $\arg \min G_{\rho ^{(p-1)}}$. This method will hopefully give a proper initialization for the final minimization, which is performed once $\rho >\sigma (A)\|d\|$. The second attractive property of the functional $G_\rho$ is its biconvexity; minimization by blocks is therefore suitable. With this in mind, and following [25], we propose the following algorithm.

Algorithm 1. CoBic / PeBic: alternating minimization of $G_\rho$ with an increasing penalty parameter $\rho$.

The Proximal Alternating Minimization algorithm (PAM) [29] minimizes (10). The algorithm ensures convergence to a critical point, and thus Algorithm 1 converges to a critical point.
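A schematic sketch of this outer structure is given below. The function `pam_minimize` is a placeholder for the inner PAM iterations detailed below, and its exact signature is schematic; the growth factor for $\rho$ and the stopping rule are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

def outer_loop(A, d, pam_minimize, rho0=1.0, growth=10.0):
    """Outer structure of Algorithm 1 (sketch): minimize G_rho by PAM, warm-starting
    from the previous solution while gradually increasing the penalty parameter rho."""
    N = A.shape[1]
    x, u = np.zeros(N), np.zeros(N)
    rho_exact = np.linalg.norm(A, 2) * np.linalg.norm(d)   # exactness threshold sigma(A)*||d||
    rho = rho0
    while True:
        x, u = pam_minimize(A, d, x, u, rho)               # inner PAM minimization of G_rho
        if rho > rho_exact:                                 # final minimization is exact
            return x, u
        rho = min(growth * rho, 1.1 * rho_exact)            # increase rho for the next round
```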

The PAM minimizes functions of the form

$$L(x,u)=f(x)+g(u)+Q(x,u)$$
In our case, we have $f(x)=\frac {1}{2}\|Ax-d\|^{2}+\rho \|x\|_1+ \iota _{\cdot \geq 0}(x)$, $g(u)=I(u)$ and $Q(x,u)=-\rho <x,u>$. PAM has the following outline
$$\left\{\begin{array}{l} {\text { Repeat }} \\ {x^{(s+1)} \in \arg \min _{x}\left\{G_{\rho}\left(x, u^{(s)}\right)+\frac{1}{2 c^{(s)}}\left\|x-x^{(s)}\right\|_{2}^{2}\right\}} \\ {u^{(s+1)} \in \arg \min _{u}\left\{G_{\rho}\left(x^{(s+1)}, u\right)+\frac{1}{2 b^{(s)}}\left\|u-u^{(s)}\right\|_{2}^{2}\right\}} \\ {\text { Until convergence }} \end{array}\right.$$
$c^{(s)}$ and $b^{(s)}$ add strict convexity to each block, and $c^{(s)},b^{(s)}$ are bounded from below and above. In this work, we fix $c^{(s)} =b^{(s)}= 10^{4}$. In the following subsections we develop minimization schemes for (13) in the case of the constrained ($I(u)$ defined by (6)) and, respectively, the penalized ($I(u)$ defined by (7)) problem. Recall that $\arg \min G_{\rho }$ is defined as
$${\mathop{\arg\ \min}\limits_{x,u}} \frac{1}{2}\|Ax-d\|^{2}+I(u)+\rho(\|x\|_1-<x,u>)+\iota_{\cdot\geq 0}(x)$$
where $I(u)$ is defined by (6) or (7). Note that ADMM [30] is not suitable, as that algorithm assumes a linear relation between the variables $x$ and $u$.
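A sketch of the inner PAM iterations is given below; the two block sub-solvers are passed as arguments and correspond to the schemes developed in Sections 4.1 and 4.2, and the fixed iteration count replaces the "until convergence" test for simplicity.

```python
def pam_minimize(A, d, x, u, rho, x_block_solver, u_block_solver,
                 c=1e4, b=1e4, n_iter=50):
    """PAM iterations (13): alternate the proximally regularized minimizations
    in x (Section 4.1) and in u (Section 4.2)."""
    for _ in range(n_iter):
        x = x_block_solver(A, d, x, u, rho, c)   # x-update with u fixed
        u = u_block_solver(x, u, rho, b)         # u-update with x fixed
    return x, u
```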

4.1 The minimization with respect to $x$.

The minimization with respect to $x$ using PAM is

$$x^{(s+1)}\in{\mathop{\arg \textrm{min}}\limits_{x\in \mathbb{R}^{N}}} \frac{1}{2}\|Ax-d\|^{2} + \rho(\|x\|_1 - <x,u^{(s)}>) +\frac{1}{2c^{(s)}}\|x-x^{(s)}\|_2^{2}+\iota_{\cdot\geq 0}(x)$$
which can be rewritten as
$$x^{(s+1)}\in{\mathop{\arg \textrm{min}}\limits_{x\in \mathbb{R}^{N}}} \frac{1}{2}\|Ax-d\|^{2} + \frac{1}{2c^{(s)}}\|x-(x^{(s)}+\rho c^{(s)}u^{(s)})\|^{2} + \rho\|x\|_1+\iota_{\cdot\geq 0}(x)$$
We apply the FISTA algorithm [31] to solve the above problem. This algorithm is designed to work with functionals of the form $F(x)=f(x)+g(x)$, where $f$ is a smooth convex function with a Lipschitz continuous gradient (of Lipschitz constant $L(f)$) and $g$ is a continuous convex function, possibly non-smooth. In our case we have
\begin{align}f(x)&=\frac{1}{2}\|Ax-d\|^{2} + \frac{1}{2c^{(s)}}\|x-(x^{(s)}+\rho c^{(s)}u^{(s)})\|^{2}\end{align}
\begin{align}g(x)&=\rho\|x\|_1 +\iota_{\cdot\geq 0}(x)\end{align}
which satisfy the conditions required to apply FISTA. The proximal operator of $g(x)$ is the soft thresholding with positivity constraint
$$(\textrm{prox}_{\frac{g}{L(f)}}(x))_i=\begin{cases} x_i-\frac{\rho}{L(f)} \textrm{ if } x_i>\frac{\rho}{L(f)}\\ 0 \textrm{ if } x_i \leq \frac{\rho}{L(f)} \end{cases}$$
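A sketch of this proximal operator and of the resulting FISTA iterations for the $x$-subproblem is given below; the Lipschitz constant is taken as $\|A\|^{2}+1/c^{(s)}$ and the fixed iteration count is an illustrative choice rather than the authors' MATLAB implementation.

```python
import numpy as np

def prox_nonneg_soft(v, tau):
    """Soft thresholding with positivity constraint: prox of tau*||.||_1 + i_{>=0}."""
    return np.maximum(v - tau, 0.0)

def x_update_fista(A, d, x_s, u_s, rho, c, n_iter=200):
    """FISTA on the x-subproblem of Section 4.1: smooth part
    f(x) = 0.5*||Ax - d||^2 + 1/(2c)*||x - (x_s + rho*c*u_s)||^2,
    non-smooth part g(x) = rho*||x||_1 + i_{>=0}."""
    v = x_s + rho * c * u_s
    Lip = np.linalg.norm(A, 2) ** 2 + 1.0 / c        # Lipschitz constant of grad f
    x = x_s.copy()
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - d) + (y - v) / c
        x_new = prox_nonneg_soft(y - grad / Lip, rho / Lip)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t ** 2))
        y = x_new + (t - 1.0) / t_new * (x_new - x)  # momentum step
        x, t = x_new, t_new
    return x
```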

4.2 The minimization with respect to $u$

We study how to find a solution to the convex minimization problem

$$u^{(s+1)}={\mathop{\arg \textrm{min}}\limits_{u\in \mathbb{R}^{N}}} \frac{1}{2b^{(s)}}\|u-u^{(s)}\|_2^{2}-\rho<x^{(s+1)},u> +I(u)$$
The above problem can be written as
$$ u^{(s+1)}=\underset{u \in \mathbb{R}^{N}}{\arg \min } \frac{1}{2 b^{(s)}}\left\|u-\left(u^{(s)}+\rho b^{(s)} x^{(s+1)}\right)\right\|^{2}+I(u) $$
and to simplify, we denote $z=u^{(s)}+\rho b^{(s)} x^{(s+1)}$. In the next two paragraphs, we study the above problem for $I(u)$ defined by (6) or (7).

Firstly, we work with the constrained biconvex formulation of $G_\rho$, CoBic. $I(u)$ is thus defined by (6). The minimization problem (17) can be written as

$$ u^{(s+1)}=\underset{u \in \mathbb{R}^{N}}{\arg \min } \frac{1}{2}\|u-z\|^{2} \text { s.t. }\|u\|_{1} \leq k \text { and } \forall i,-1 \leq u_{i} \leq 1 $$
The unconstrained minimizer of $\frac{1}{2}\|u-z\|^{2}$ is $u=z$, and since the constraints are invariant with respect to the signs of the components, we can write $u^{(s+1)}=sign(z) \arg \min _{u} \frac{1}{2}\|u-|z|\|^{2}$. Furthermore, since $\|\cdot \|_1$ is invariant with respect to the sign, we can write the minimization problem as
$$ \left|u^{(s+1)}\right|=\underset{u \in \mathbb{R}^{N}}{\arg \min } \frac{1}{2}\|u-|z|\|^{2} \text { s.t. }\|u\|_{1} \leq k \text { and } \forall i, 0 \leq u_{i} \leq 1 $$
and then $u^{(s+1)}=sign(z)|u^{(s+1)}|$. The above minimization problem is a variant of the well-known knapsack problem and can be solved using a classical minimization scheme such as [32]:
$$\begin{aligned} |u^{(s+1)}|= {\mathop{\arg \textrm{min}}\limits_{u\in \mathbb{R}^{N}}} &\frac{1}{2}<u,u>-<u,|z|> \\ &\textrm{ s.t. } \left(\sum_i u_i \right)\leq k\\ & \textrm{ and } \forall \, \, i, \, 0\leq u_i\leq 1 \end{aligned}$$
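One way to solve this knapsack-type projection is a bisection on the Lagrange multiplier of the constraint $\sum_i u_i \leq k$, as sketched below; this is only an illustrative alternative to the scheme of [32].

```python
import numpy as np

def project_box_l1(z_abs, k, tol=1e-10, max_iter=100):
    """Projection of |z| onto {u : 0 <= u_i <= 1, sum(u) <= k},
    via bisection on the multiplier tau of the sum constraint."""
    u = np.clip(z_abs, 0.0, 1.0)
    if u.sum() <= k:                       # sum constraint inactive: plain box projection
        return u
    lo, hi = 0.0, z_abs.max()              # the multiplier tau lies in [0, max |z_i|]
    for _ in range(max_iter):
        tau = 0.5 * (lo + hi)
        u = np.clip(z_abs - tau, 0.0, 1.0)
        s = u.sum()
        if abs(s - k) < tol:
            break
        if s > k:
            lo = tau                       # sum too large: increase tau
        else:
            hi = tau                       # sum too small: decrease tau
    return u

def u_update_constrained(z, k):
    """u^{(s+1)} = sign(z) * argmin 0.5*||u - |z|||^2 s.t. ||u||_1 <= k, 0 <= u <= 1."""
    return np.sign(z) * project_box_l1(np.abs(z), k)
```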
Secondly, we work with the penalized formulation of $G_\rho$, PeBic, with $I(u)$ on the penalized form (7). We write the problem as
$$ u^{(s+1)}=\underset{u \in \mathbb{R}^{N}}{\arg \min } \lambda\|u\|_{1}+\frac{1}{2 b^{(s)}}\|u-z\|^{2}+\iota_{-1 \leq\cdot \leq 1}(u) $$
The solution is reached for
$$(u^{(s+1)})_i= \begin{cases} 1 \textrm{ if } z_i\in [1+\lambda b^{(s)}, \infty)\\ z_i-\lambda b^{(s)} \textrm{ if } z_i\in (\lambda b^{(s)},1+\lambda b^{(s)})\\ 0 \textrm{ if } z_i\in\lambda b^{(s)}[-1,1]\\ z_i+\lambda b^{(s)} \textrm{ if } z_i\in (-1-\lambda b^{(s)},-\lambda b^{(s)}) \\ -1 \textrm{ if } z_i\in (-\infty,-1-\lambda b^{(s)} ] \end{cases}.$$
The proof is given in Section 7.
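The closed form above amounts to soft thresholding $z$ by $\lambda b^{(s)}$ followed by clipping to $[-1,1]$; a minimal sketch:

```python
import numpy as np

def u_update_penalized(z, lam, b):
    """Closed-form penalized u-update: soft-threshold z by lam*b, then clip to [-1, 1]."""
    soft = np.sign(z) * np.maximum(np.abs(z) - lam * b, 0.0)
    return np.clip(soft, -1.0, 1.0)
```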

5. Numerical experiments

In this section, we perform numerical experiments with CoBic and PeBic and compare them to other grid-based localization algorithms. They are compared on two datasets from the ISBI 2013 SMLM challenge [6]. A more recent challenge was launched in 2016; we decided to use the 2013 challenge as its data is denser (220 fluorophores per acquisition in 2013 compared to 12 in the 2016 challenge). The algorithms are coded in MATLAB 2015 and run on a Linux computer with an Intel Core i7-3920XM CPU, except Deep-STORM, which was run on a Linux computer with an Intel Xeon E5-2687W v3 CPU. The code for PeBic and CoBic, as well as an example, can be found at: https://github.com/abechens/CoBic-and-PeBic-SMLM/

CoBic and PeBic are compared with four algorithms: IRL1-CEL0 [27], Deep-STORM [26], as well as the standard Iterative Hard Thresholding (IHT) [28] applied to the constrained and penalized forms. IRL1-CEL0 is an algorithm that uses IRL1 [33] to minimize an exact relaxation [11] of the penalized formulation (3). Deep-STORM is an algorithm that uses deep learning to localize the fluorophores. We use the public code of Deep-STORM [34]. Note that a relaxation in which the $\ell _0$ norm in (3) is replaced by the $\ell _1$ norm is possible; however, this relaxation is not exact, and a full comparison is not done. Table 3 in the Appendix shows the Jaccard index for this reconstruction. All the algorithms, except Deep-STORM, are based on the constrained or penalized problem. IRL1-CEL0, penalized IHT, and PeBic have the trade-off parameter $\lambda$ to choose. It is not possible to select this parameter beforehand, and substantial trial and error is needed to obtain a satisfying result. CoBic and the constrained IHT algorithm use the sparsity parameter $k$. This parameter ensures that the algorithm does not reconstruct more than $k$ non-zero elements in the solution. The parameter $k$ may be easier to use, principally for two reasons. The user may have an idea of the upper bound on the number of excited fluorophores in each acquisition, and if the algorithm reconstructs too many fluorophores, $k$ can easily be reduced; with $\lambda$, in contrast, the user does not know by how much to change it to obtain the desired result. Deep-STORM is a deep-learning algorithm. To train the algorithm, the user needs to create proper simulated images that represent the dataset. This requires an idea of the density, knowledge of the PSF, as well as an estimation of the noise level.

5.1 Results of the simulated dataset

The first dataset contains simulated acquisitions, which makes it possible to quantitatively evaluate the reconstruction, while the second contains real acquisitions. Figure 1 (a), (b) and (c) show three of the 361 acquisitions of the simulated dataset. We apply the localization algorithms to each image of the acquisition dataset. The sum of the localization results of the 361 acquisitions yields the super-resolution image. We use the Jaccard index in order to perform the numerical evaluation of the reconstructions. The Jaccard index evaluates the localization of the reconstructed fluorophores (see [6]) and is defined as the ratio between the number of correctly reconstructed (CR) fluorophores and the sum of the CR, false negative (FN) and false positive (FP) fluorophores. A perfect reconstruction yields an index of 1, and the lower the index, the poorer the reconstruction. Furthermore, the Jaccard index includes an error tolerance in its calculation.

Fig. 1. Top: (a) $1^{\textrm {st}}$, (b) $200^{\textrm {th}}$ and (c) $361^{\textrm {st}}$ frame of the simulated high-density data. Bottom: (d) Ground truth and (e) the sum of all acquisitions.

$$Jac=\frac{CR}{CR+FP+FN}.$$
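A simplified sketch of this evaluation is shown below. It pairs each ground-truth fluorophore with the nearest unmatched detection within the tolerance radius (a greedy matching), which approximates but is not identical to the challenge's official assessment tool.

```python
import numpy as np

def jaccard_index(gt, det, tol):
    """Greedy matching of detections to ground-truth positions within a tolerance
    radius, then Jac = CR / (CR + FP + FN)."""
    det = [np.asarray(q, dtype=float) for q in det]
    used = np.zeros(len(det), dtype=bool)
    correct = 0
    for p in gt:
        dists = [np.linalg.norm(np.asarray(p, dtype=float) - q) if not used[j] else np.inf
                 for j, q in enumerate(det)]
        if dists and np.min(dists) <= tol:
            used[int(np.argmin(dists))] = True
            correct += 1                    # correctly reconstructed (CR)
    fn = len(gt) - correct                  # missed ground-truth fluorophores
    fp = len(det) - correct                 # detections without a ground-truth match
    return correct / (correct + fp + fn)

# Illustrative positions in nm with a 100 nm tolerance: one CR, one FP, one FN -> 1/3.
print(jaccard_index([(50.0, 50.0), (300.0, 120.0)],
                    [(60.0, 45.0), (900.0, 900.0)], tol=100.0))
```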

The ISBI simulated dataset represents eight tubes of 30 nm diameter. The acquisitions are captured with a $64\times 64$ pixel sensor where each pixel is of size $100 \times 100$ nm$^{2}$. The Point Spread Function (PSF) is modeled by a Gaussian function whose Full Width at Half Maximum (FWHM) is 258.21 nm. In total, there are 81 049 fluorophores across the 361 images.

We localize the fluorophores on a fine grid of $256\times 256$ pixels, where each pixel is of size $25 \times 25$ nm$^{2}$. Mathematically, this is equivalent to reconstructing $x\in \mathbb {R}^{ML\times ML}$ from an acquisition $d\in \mathbb {R}^{M\times M}$, where $M=64$ and $L=4$. Note that $L$ could be larger, but this introduces more local minima, and the results might be worse; see Table 4 in the Appendix for a small comparison. The center of the pixel is used to estimate the precise position of the fluorophore.
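For instance, a non-zero pixel of the reconstructed fine grid can be converted to a position estimate by taking the center of that pixel, as in the small sketch below (25 nm pixel size as above; the (column, row) ordering of the output is an arbitrary convention).

```python
import numpy as np

def pixel_centers_nm(x_fine, pixel_size_nm=25.0):
    """Return the positions (in nm) of the centers of the non-zero fine-grid pixels."""
    rows, cols = np.nonzero(x_fine)
    return (np.stack([cols, rows], axis=1) + 0.5) * pixel_size_nm
```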

We set $k$ equal to the average number of fluorophores per acquisition, which is 220, known from the ground truth. In order to visualize the reconstruction, we normalize the image after summing all the reconstructions. Thus bright points indicate strong intensity, and dark spots indicate low intensity.

We set $\rho ^{(0)}=1$ for PeBic and CoBic. Note that a smaller $\rho ^{(0)}$ could be chosen, but this implies a longer computational time without improving the results.

CoBic reconstructs 99 non-zero pixels on average. Even though 220 fluorophores emit light on average, several of these may fall within the same $25 \times 25$ nm$^{2}$ square, which counts as only one pixel on the grid. This may be one reason why our algorithm reconstructs only 99 non-zero pixels. In addition, it may be that the proposed algorithm converges to a critical point and not a global minimum.

First, we adjust the parameters of the other algorithms such that they also reconstruct 99 non-zero pixels on average, in order to properly compare the Jaccard index. We initialize the IHT and IRL1-CEL0 algorithms by applying the adjoint $A^{T}$ of the operator $A$ to the acquisition, as this gives the best result.

The ground truth and the sum of the 361 acquisitions can be observed in Fig. 1(d) and Fig. 1(e), and the results of the reconstructions are presented in Fig. 2. The two IHT algorithms do not manage to distinguish between two tubes when they are close (see the red box in Fig. 2), in contrast to the other algorithms. CoBic and PeBic perform quite well, but not as well as the state-of-the-art IRL1-CEL0 and Deep-STORM. The Deep-STORM algorithm seems to reconstruct the fluorophores best visually.

Fig. 2. Reconstructed images from the simulated ISBI dataset, 99 non-zero pixels on average. Top: From left to right: (a) CoBic, (b) Constrained IHT and (c) Deep-STORM. Bottom: From left to right: (d) PeBic, (e) Penalized IHT and (f) IRL1-CEL0.

The Jaccard index is shown in Table 1 for reconstructions of on average 90, 99 and 142 non-zero pixels. The last case is optimized for IRL1-CEL0. The case of 90 non-zero pixels demonstrates the performance of the algorithms with a choice of $k$ that is optimal neither for IRL1-CEL0 nor for CoBic and PeBic. We observe the low Jaccard index of the constrained IHT algorithm compared to CoBic. At a tolerance of $150$ nm and above, CoBic performs best with 99 non-zero pixels, but IRL1-CEL0 performs best overall. The Deep-STORM algorithm reconstructs images with an average of 44 264 non-zero pixels. Due to this high number of non-zero pixels, the calculation of the Jaccard index is too demanding, but the index would be close to 0. Most of the non-zero pixels have a low intensity, with higher intensity on the tubulins, which is why we observe a good reconstruction in Fig. 2. We could fix a threshold and set to zero all pixels with an intensity below it. However, this would not be fair to the other methods, as the same operation could be performed on them. Further note that CoBic cannot reconstruct more than 99 non-zero pixels with the given initialization, so there is no value for CoBic in the 142 non-zero pixels case of Table 1.


Table 1. The Jaccard index obtained for a reconstruction of 90, 99 and 142 non-zero pixels on average.

Table 2 shows the average computational time for one image acquisition from the simulated dataset. Deep-STORM is fast and outperforms the other methods in speed. The other algorithms have not been optimized with respect to speed, and could possibly be accelerated by parallel computing and GPU computing.


Table 2. Average reconstruction time for one image acquisition for IHT Constrained, IHT Penalized, IRL1-CEL0, CoBic, PeBic and Deep-STORM.

However, the calibration time, i.e., the time needed to find the best parameters, is something that cannot be measured by a computer. The advantage of IHT (penalized and constrained), IRL1-CEL0, PeBic and CoBic is that we can fine-tune the parameters by testing them on a few images. This is not possible with Deep-STORM, as each change of parameters requires a different training set, which must be simulated, after which the deep neural network must be retrained. The total training time is around 2 hours, in contrast to a maximum of 15 minutes if we test the parameters of another algorithm on 7-10 images. Furthermore, the constrained IHT and CoBic have $k$ as the main parameter. This parameter is quite easy to choose and to adjust after testing. The $\lambda$ of the penalized formulations is trickier to regulate, as it is not possible to know by how much to change it to obtain the desired result.

5.2 Results of the real dataset

We compare the algorithms on a high-density dataset of tubulins provided by the 2013 ISBI SMLM challenge. The dataset contains 500 acquisitions of $128\times 128$ pixels, and each pixel is of size $100 \times 100$ nm$^{2}$. The FWHM has been estimated to be 351.8 nm in [35] by averaging many fitted PSFs on observed single molecules in the given dataset. We localize the fluorophores on a $512\times 512$ pixel grid. Each pixel is of size $25 \times 25$ nm$^{2}$.

We do not have any prior knowledge of the solution, and after trial and error, we set $k=140$ for CoBic. We choose the parameters of the other algorithms such that their reconstructions look best visually. Figure 3 presents the reconstructions. The results are consistent with those obtained on the simulated dataset. The IHT algorithms do not reconstruct as well as the other algorithms, and the penalized version is worse than the constrained version. The reconstructions obtained by the other algorithms are equivalent, with the Deep-STORM algorithm slightly better.

Fig. 3. Reconstructed images from the real ISBI dataset. Top: From left to right: (a) CoBic, (b) Constrained IHT and (c) Deep-STORM. Bottom: From left to right: (d) PeBic, (e) Penalized IHT and (f) IRL1-CEL0.

6. Conclusion

We have presented a reformulation of the $\ell _2-\ell _0$ constrained and penalized problems. We propose CoBic and PeBic, two algorithms to minimize these problems, and apply them to SMLM. The algorithms are easy to implement, as each step can be decomposed into well-studied problems. We compare the algorithms to the well-known IHT algorithm in constrained and penalized form, as well as to the state-of-the-art relaxation CEL0. Furthermore, we compare them to a deep-learning method. CoBic and PeBic outperform the IHT algorithms visually and numerically. Compared to the IRL1-CEL0 and Deep-STORM methods, our methods are less precise. However, the parameter of CoBic may be more natural for a biologist to choose, as $k$ represents the maximum number of non-zero pixels, which can be translated into the number of distinguishable fluorophores.

Even though CoBic reconstructs with high precision on the simulated dataset, we can observe that the constraint is not saturated. This may indicate that the algorithm has converged to a critical point which is not the global minimum. As a perspective, we are interested in how to avoid these unsatisfactory critical points, which may be saddle points. Furthermore, it seems interesting to further investigate the reformulation of the $\ell _0$-norm and to combine it with other data-fitting terms. The acquisitions from single-molecule localization microscopy have a low photon count, and Poisson noise is dominant. The CEL0 penalization cannot be used with a Poisson data-fitting term, but this may be possible with the proposed reformulation.

7. Mathematical justification of the algorithm

In this section we present the theoretical foundation of our work. Theorems 1 and 2 show that minimizing (9) is equivalent, in terms of minimizers, to minimizing (8), provided $\rho$ is large enough.

This theorem differs from [24, Corollary 3.2] as their $\rho$ may be arbitrarily large in contrast to this work where $\rho$ can be calculated precisely. Furthermore, they work with a slightly different reformulation of the $\ell _0$-norm and not explicitly with the problem (2) since they assume their loss-function to be locally Lipschitzian.

First, we state some important notations that have not been previously introduced.

  • The subgradient of a convex function $f$ at a point $x$ is the set of vectors $v$ such that
    $$\forall z\in\,dom(f)\quad f(z)\geq f(x)+v^{T}(z-x)$$
  • The normal cone $N_C(x_0)$ of a convex set $C$ at $x_0 \in C$ is defined as
    $$N_C(x_0)=\{\eta \in \mathbb{R}^{n},<\eta,x-x_0>\leq 0\quad \forall x \in C\}.$$

Theorem 1 (Constrained form). Assume that $\rho >\sigma (A)\|d\|_2$, and $A$ is full rank. Let $G_\rho$ and $G$ be defined respectively in (9) and (8) with $I(u)$ defined by (6). We have:

  • 1. If $(x_\rho ,u_\rho )$ is a local (respectively global) minimizer of $G_\rho$, then $(x_\rho ,u_\rho )$ is a local (respectively global) minimizer of $G$.
  • 2. If $(\hat {x},\hat {u})$ is a global minimizer of $G$, then $(\hat {x},\hat {u})$ is a global minimizer of $G_\rho$.

Theorem 2 (Penalized form). Assume that $\rho >\sigma (A)\|d\|_2$, and $A$ is full rank. Let $G_\rho$ and $G$ be defined respectively in (9) and (8) with $I(u)$ defined in (7). We have:

  • 1. If $(x_\rho ,u_\rho )$ is a local (respectively global) minimizer of $G_\rho$, then we can construct $(x_\rho ,\tilde {u}_\rho )$ which is a local (respectively global) minimizer of $G$.
  • 2. If $(\hat {x},\hat {u})$ is a global minimizer of $G$, then $(\hat {x},\hat {u})$ is a global minimizer of $G_\rho$.

Some lemmas are needed to prove Theorem 1. We start with presenting and proving the lemmas, and the proofs of the theorems are presented at the end.

Lemma 1. Let $B\in \mathbb {R}^{N\times l}$ be a semi-orthogonal matrix, that is, a non-square matrix composed of orthonormal columns. Then, $B^{T}B$ is the identity matrix in $\mathbb {R}^{l\times l}$.

Lemma 2. Let $A\in \mathbb {R}^{P\times N}$, let $a_i$ denote the $i$th column of $A$. Defining $\omega$ to be a set of indices, $\omega \subseteq \{1,\dots ,N\}$. Let the restriction of $A$ to the columns indexed by the elements of $\omega$ be denoted as $A_\omega =(a_{\omega [1]},\dots ,a_{\omega [\#\omega ]})\in \mathbb {R}^{P\times \#\omega }$. Then $\|A_\omega \|\leq \|A\|$.

Proof. $A_\omega$ can be written as the product of the matrix $A$ and a matrix $B$. We define the vector $e_i\in \mathbb {R}^{N}$, the unit vector which has zeros everywhere except at the $i$-th place. The matrix $B \in \mathbb {R}^{N \times \# \omega }$ can be constructed with $e_i\, \forall i \in \omega$. The matrix $B$ is thus a semi-orthogonal matrix. The spectral norm of the matrix $B$ is 1, as $B^{T}B$ is the identity matrix (from Lemma 1). We bound $\|A_\omega\|$ from above as

$$\|A_\omega\|=\|AB\|\leq \|A\|\|B\|=\|A\|$$

Lemma 3. [Pshenichnyi-Rockafellar lemma][36, Theorem 2.9.1] Assume $g$ is a proper lower semi-continuous convex function. Let $C$ be a convex set, such that $int (C) \cap dom (g)\neq \emptyset$. Then

$$ \hat{x}=\underset{x \in C}{\arg \min }\; g(x) \Leftrightarrow \textbf{0} \in \partial g(\hat{x})+N_C(\hat{x}) $$
where $N_C$ is the normal cone of the convex set $C$.

Lemma 4. Given the minimization problem

$$ \underset{x}{\arg \min } \frac{1}{2}\|A x-d\|^{2}+<w,|x|> $$
where $A\in \mathbb {R}^{P \times N}$ is a full rank matrix and $w$ a non-negative vector. $|x|$ is a vector which contains the absolute value of each component of $x$. Let $\hat {x}$ be a solution of problem (19). Then $\|A\hat {x}-d\|_2$ is bounded independently of $w$ and
$$\|A\hat{x}-d\|\leq \|d\|$$

Proof. Let $\hat {x}$ be the solution of $$ \arg \min _{x} \frac{1}{2}\|A x-d\|^{2}+<w,|x|>,$$ then we have $\forall x\in \mathbb {R}^{N}$

$$\frac{1}{2}\|A\hat{x}-d\|^{2}+<w,|\hat{x}|>\leq\frac{1}{2}\|Ax-d\|^{2}+<w,|x|>.$$
In particular, by choosing $x=0$ we have:
$$\frac{1}{2}\|A\hat{x}-d\|^{2}+<w,|\hat{x}|>\leq \frac{1}{2}\|d\|^{2}.$$
Since $w$ is a non-negative vector, the term $<w,|\hat {x}|>$ is always non-negative; therefore we have
$$\frac{1}{2}\|A\hat{x}-d\|^{2}\leq \frac{1}{2}\|d\|^{2}$$
and so
$$\|A\hat{x}-d\|\leq \|d\|.$$

Lemma 5. Let $f(x)=\frac {1}{2}\|Ax-d\|_2^{2}+<w,|x|>+\iota _{\cdot \geq 0}(x)$, where $A$ is a full rank matrix and $w$ a non-negative vector. We have the following result: if $w_i> \sigma (A)\|d\|_2$, then the optimal solution of the following optimization problem:

$$ \hat{x}=\underset{x}{\arg \min }\;f(x) $$
is achieved with $\hat {x}_i=0$.

Proof. We start by proving that $\sigma (A)\|d\|_2 \geq \left |\left (A^{T}(A\hat {x}-d)\right )_{i}\right |$. Remark that Lemma 4 is valid for problem (23), from which we have

$$\begin{aligned} \sigma(A)\|d\|_2 & \geq \sigma(A)\|A\hat{x}-d\|_2 \\ &\geq \|A^{T}\| \|A\hat{x}-d\|_2 \\ &\geq \|A^{T}(A\hat{x}-d)\|_2 \\ &\geq \|A^{T}(A\hat{x}-d)\|_\infty \\ &\geq |\left(A^{T}(A\hat{x}-d)\right)_i| \quad \forall i\in \{1,\dots,N\} \end{aligned}$$
Then, by choosing, for all $i\in [1..N]$, $w_{i}>\sigma (A)\|d\|_2$, we are sure that $w_{i}> \left |\left (A^{T}(A\hat {x}-d)\right )_{i}\right |$.

From the Pshenichnyi-Rockafellar lemma, a necessary and sufficient condition for $\hat {x}$ to be a minimizer of $f$ on $C$ is that

$$\textbf{0}\in \partial f(\hat{x})+N_C(\hat{x})$$
In our case, $C$ is $\mathbb {R}^{N}_+$ and $f(x)=\frac {1}{2}\|Ax-d\|^{2}+<w,|x|>$. We have that $\partial f(x)=\partial (\frac {1}{2}\|Ax-d\|^{2}) + \partial (<w,|x|>)$, as $f(x)$ is a sum of two convex functions whose domains have a non-empty intersection (see [37, Corollary 16.38]).

The optimal condition is therefore

$$\textbf{0}\in A^{T}(A\hat{x}-d)+\partial <w,|\hat{x}|> + N_{\mathbb{R}^{N}_+}(\hat{x})$$
where
$$(\partial <w,|\hat{x}|>)_i \begin{cases} =w_i \textrm{ if } \hat{x}_i>0\\ =-w_i \textrm{ if } \hat{x}_i<0\\ \in [-w_i,w_i] \textrm{ if } \hat{x}_i=0 \end{cases}$$
and
$$(N_{\mathbb{R}^{N}_+}(\hat{x}))_i\begin{cases} =0 \textrm{ if } \hat{x}_i>0\\ \in (-\infty,0] \textrm{ if } \hat{x}_i=0 \end{cases}$$
For $\hat {x}_{i}$ we have the following optimal condition
$$-A^{T}(A\hat{x}-d)_{i} \begin{cases} =w_{i} \textrm{ if } \hat{x}_{i}>0\\ \in [-w_{i},w_{i}] + \,\, (-\infty,0] \textrm{ if } \hat{x}_{i}=0 \end{cases}$$
If $w_i>\sigma (A)\|d\|_2$, then $|A^{T}(A\hat {x}-d)_{i}|<w_{i}$ and $\hat {x}_{i}$ cannot be strictly positive; furthermore, $\hat {x}_{i}$ cannot be strictly negative since we work in the non-negative space. Therefore $\hat {x}_{i}=0$.

Lemma 6. Let $(x_\rho ,u_\rho )$ be a local minimizer of $G_\rho$ defined in (9), with $I$ in the constrained form, that is, defined as in (6). Let $G_{x_{\rho }}(u)= \frac {1}{2}\|Ax_{\rho }-d\|^{2}+ I(u) +\rho (\|x_{\rho }\|_1-<x_{\rho },u>)$. We denote by $O$ the set of indices of the $k$ largest values of $|(x_{\rho })_i|$, $i=1,\dots,N$, and define $Q\triangleq \{i|(x_{\rho })_i>0\}$ and $S\triangleq \{j|(x_{\rho })_j<0\}$. Moreover, define $D\triangleq O \cap Q$, $L\triangleq O\cap S$ and $W\triangleq \{1,2\cdots ,N\}\backslash \{D\cup L\}$. If $\#(D\cup L)=k$, that is, $\|x_\rho \|_0\geq k$, then the minimum of $G_{x_\rho }(u)$ is reached with $u_{\rho }$ such that

$$(u_{\rho})_i\begin{cases} =1 \textrm{ if } i\in D\\ =-1 \textrm{ if } i\in L\\ =0 \textrm{ if } i\in W\\ \end{cases}$$
If $\#(D\cup L)<k$, that is, $\|x_\rho \|_0<k$, then
$$(u_{\rho})_i\begin{cases} =1 \textrm{ if } i\in D\\ =-1 \textrm{ if } i\in L\\ \in [-1,1]\textrm{ if } i\in W\\ \end{cases}$$
such that $\sum _{i\in W} |u_i|\leq k-\#(D\cup L)$.

Proof. The minimization of $G_{x_\rho }(u)$ can be viewed as the problem of minimizing $-<x_{\rho },u> + \iota _{-1\leq \cdot \leq 1 }(u) + \iota _{\|\cdot \|_1\leq k}(u)$, using the definition of $I(u)$. The result follows directly.

Lemma 7. Let $\rho >\sigma (A)\|d\|_2$. Let $(x_{\rho },u_{\rho })$ be a local or global minimizer of $G_\rho (x,u):=\frac {1}{2}\|Ax-d\|^{2}+ I(u) +\rho (\|x\|_1-<x,u>)$ with $I(u)$ defined as in (6) or (7). Let $\omega = \{i\in \{1,\dots ,N\}; (u_{\rho })_i=0\}$. Then $(x_{\rho })_i=0\, \forall i \in \omega$.

Proof. Let $J$ denote the set of indices: $J=\{1,\dots , N\} \backslash \omega$. If $(x_{\rho },u_{\rho })$ is a local or global minimizer of $G_\rho$ then $\forall (x,u)\in \mathcal {N}((x_{\rho },u_{\rho }),\gamma )$, where $\mathcal {N}((x_{\rho },u_{\rho }),\gamma )$ denotes a neighborhood of $(x_{\rho },u_{\rho })$ of size $\gamma$, we have

$$\begin{aligned} \frac{1}{2}\|Ax_{\rho}-d\|^{2}+\iota_{\cdot\geq 0}(x_{\rho})+I(u_{\rho})+&\rho(\|x_{\rho}\|_1-<x_{\rho},u_{\rho}>)\leq\\ &\frac{1}{2}\|Ax-d\|^{2}+\iota_{\cdot\geq 0}(x)+I(u)+\rho(\|x\|_1-<x,u>) \end{aligned}$$
By choosing $u=u_{\rho }$ and $x=\tilde {x}$ with $\tilde {x}_J=(x_{\rho })_J$ and $\tilde {x}_\omega =x_\omega$, with $(x_\omega ,(u_\rho )_\omega ) \in \mathcal {N}(((x_{\rho })_\omega ,(u_{\rho })_\omega ),\gamma )$, we have
$$\frac{1}{2}\|Ax_{\rho}-d\|^{2}+\iota_{\cdot\geq 0}(x_{\rho})+\rho\|(x_{\rho})_\omega\|_1\leq \frac{1}{2}\|A\tilde{x}-d\|^{2}+\iota_{\cdot\geq 0}(\tilde{x})+\rho\|x_\omega\|_1$$
We want to show that $(x_\rho )_\omega$ is zero. We have
$$\begin{aligned} \|Ax-d\|^{2}&=\|Ax\|^{2}+\|d\|^{2}-2<Ax,d> \\ &=\sum_i (Ax)_i^{2}+\|d\|^{2}-2\sum_i x_i(A^{T}d)_i\\ &=\sum_i\left[(\sum_{j\in J}A_{ij}x_j)^{2}+(\sum_{j\in \omega}A_{ij}x_j)^{2}\right]+\|d\|^{2}-\\ &\quad\quad 2\left[\sum_{i\in J}x_i(A^{T}d)_i+\sum_{i\in \omega}x_i(A^{T}d)_i \right] \end{aligned}$$
Using the above decomposition simplifies (26), and we have $\forall \,\,\, x_\omega$:
$$\begin{aligned} \frac{1}{2}\sum_i\left(\sum_{j\in \omega} A_{ij}(x_{\rho})_j\right)^{2}-&\sum_{i\in \omega} (x_{\rho})_i(A^{T}d)_i + \rho\|(x_{\rho})_\omega\|_1 + \iota_{\cdot\geq 0}(x_{\rho}) \leq\\ &\frac{1}{2}\sum_i\left(\sum_{j\in \omega} A_{ij}x_j\right)^{2}-\sum_{i\in \omega} x_i(A^{T}d)_i + \rho\|x_\omega\|_1+ \iota_{\cdot\geq 0}(x_{\omega}) \end{aligned}$$
Thus $(x_{\rho })_\omega$ is a solution of
$$ \underset{x_{\omega}}{\arg \min } \frac{1}{2} \sum_{i}\left(\sum_{j \in \omega} A_{i j} x_{j}\right)^{2}-\sum_{i \in \omega} x_{i}\left(A^{T} d\right)_{i}+\rho\left\|x_{\omega}\right\|_{1}+\iota_{\cdot \geq 0}\left(x_{\omega}\right) $$
or, equivalently solution of
$$ \underset{x_{\omega}}{\arg \min } \frac{1}{2}\left\|A_{\omega} x_{\omega}-d\right\|^{2}+\rho\left\|x_{\omega}\right\|_{1}+\iota_{\cdot \geq 0}\left(x_{\omega}\right) $$
where $A_{\omega }$ is the $P\times \#\omega$ submatrix of $A$ composed of the columns of $A$ indexed by $\omega$. With Lemma 2, we have that $\sigma (A)\geq \sigma (A_{\omega })$, and if $\rho >\sigma (A)\|d\|_2$ we can apply Lemma 5 with $w$ a vector whose components are all equal to $\rho$. We conclude that $(x_{\rho })_\omega =0$.

Lemma 8. If $\rho >\sigma (A)\|d\|_2$, let $(x_\rho ,u_\rho )$ be a local or global minimizer of

$$ \underset{x, u}{\arg \min } \frac{1}{2}\|A x-d\|^{2}+\iota_{\cdot \geq 0}(x)+\rho\left(\|x\|_{1}-<x, u>\right)+I(u) $$
with $I(u)$ defined as in (6), that is, the constrained form. Then $\|x_{\rho }\|_1-<x_{\rho },u_{\rho }>=0$.

Proof. From Lemma 6, we have that $(u_{\rho })_i(x_{\rho })_i=|(x_{\rho })_i| \forall \,\, i \in J$, and $(u_{\rho })_i=0 \, \forall i \in \omega$. It suffices to prove $(x_{\rho })_i=0\, \forall i \in \omega$. For that we use Lemma 7, and conclude that $(x_{\rho })_\omega =0$.

With the above lemmas we can prove Theorem 1

Proof. We start by proving the first part of the theorem. Let $(x_\rho ,u_\rho )$ be a local minimizer of $G_\rho$, with $I(u)$ on the constrained form, that is, defined as in (6). Let $\mathcal {S}= \{(x,u); \|x\|_1=<x,u>\}$. If $\rho >\sigma (A)\|d\|_2$ then, from Lemma 8,

$$(x_\rho,u_\rho) \textrm{ verifies } \|x_\rho\|_1=<x_\rho,u_\rho>.$$
Furthermore, from the definition of a minimizer, we have
$$G_\rho (x_\rho,u_\rho) \leq G_\rho(x,u)\, \, \forall (x,u)\in \mathcal{N}((x_{\rho},u_{\rho}),\gamma)$$
and so we have
$$G_\rho (x_\rho,u_\rho) \leq G_\rho(x,u) \, \, \forall (x,u)\in \mathcal{N}((x_{\rho},u_{\rho}),\gamma)\cap \mathcal{S}$$
Since $\forall (x,u) \in \mathcal {S}$, $G_\rho (x,u)=G(x,u)$, we have
$$G (x_\rho,u_\rho)\leq G(x,u) \, \, \forall (x,u)\in \mathcal{N}((x_{\rho},u_{\rho}),\gamma)\cap \mathcal{S}$$
By the definition, $(x_\rho ,u_\rho )$ is also a local minimizer of $G$.

Now we prove part 2 of Theorem 1.

Let $(\hat {x},\hat {u})$ be a global minimizer of $G$. We necessarily have $\|\hat {x}\|_1=<\hat {x},\hat {u}>$. First, we show that

$$G_\rho(\hat{x},\hat{u})\leq \min G_\rho(x,u).$$
This can be shown by contradiction. Assume the opposite, and denote $(x_\rho ,u_\rho )$ a global minimizer of $G_\rho$. We then have
$$G_\rho(\hat{x},\hat{u})> \min G_\rho(x,u)=G_\rho(x_\rho,u_\rho)$$
Lemma 8 shows that $\|x_\rho \|_1=<x_\rho ,u_\rho >$, so $G_\rho (x_\rho ,u_\rho )=G(x_\rho ,u_\rho )$ and we have
$$G(\hat{x},\hat{u})=G_\rho(\hat{x},\hat{u})> \min G_\rho(x,u)=G_\rho(x_\rho,u_\rho) =G(x_\rho,u_\rho)$$
and more precisely, $G(\hat {x},\hat {u})>G(x_\rho ,u_\rho )$ which is not possible, since $(\hat {x},\hat {u})$ is a global minimizer of $G$.

We therefore have shown that $G_\rho (\hat {x},\hat {u})\leq \min G_\rho (x,u)$, and we have

$$G_\rho(\hat{x},\hat{u})\leq \min G_\rho (x,u)\leq G_\rho(x,u)\quad \forall (x,u)$$
$(\hat {x},\hat {u})$ is thus a global minimizer of $G_\rho$.

Lemma 9. Let $(x_\rho ,u_\rho )$ be a local minimizer of $G_\rho$ defined in (9), with $I$ on the penalized form defined as in (7). Let $G_{x_{\rho }}(u)= \frac {1}{2}\|Ax_{\rho }-d\|^{2}+ I(u) +\rho (\|x_{\rho }\|_1-<x_{\rho },u>)$. The minimum of $G_{x_\rho }(u)$ will be reached with a $u_{\rho }$ such that

$$(u_{\rho})_i\begin{cases} =1 \textrm{ iff } (x_{\rho})_i\in [\frac{\lambda}{\rho}, + \infty)\\ =0 \textrm{ iff } (x_{\rho})_i\in \frac{\lambda}{\rho}[-1,1]\\ =-1 \textrm{ iff } (x_{\rho})_i\in (-\infty,-\frac{\lambda}{\rho}]\\ \in (0,1) \textrm{ iff } (x_{\rho})_i =\frac{\lambda}{\rho} \\ \in (-1,0) \textrm{ iff } (x_{\rho})_i =-\frac{\lambda}{\rho} \end{cases}$$

Proof. The proof of the necessary condition:

We start by writing the optimal conditions of $G_{x_\rho }(u)$.

$$\begin{aligned} \textbf{0}&\in -\rho x_{\rho} + N_{-1\leq \cdot \leq 1}(u_{\rho}) + \begin{cases} \lambda \;{ if }\; (u_{\rho})_i>0\\ -\lambda \;{ if }\; (u_{\rho})_i<0\\ [-\lambda ,\lambda ] \;{ if }\; (u_{\rho})_i=0 \end{cases} \end{aligned}$$
We split the study of (31) in five cases.
  • If $(u_{\rho })_i=1$
    $$0\in -\rho (x_{\rho})_i + N_{-1\leq \cdot \leq 1}((u_{\rho})_i)+ \lambda \Leftrightarrow (x_{\rho})_i\in \frac{[0,\infty)+ \lambda}{\rho}$$
    Thus, $(u_{\rho })_i=1\Rightarrow (x_{\rho })_i\in [\frac {\lambda }{\rho },+\infty )$
  • If $0<(u_{\rho })_i<1$
    $$0\in -\rho (x_{\rho})_i + N_{-1\leq \cdot \leq 1}((u_{\rho})_i)+ \lambda \Leftrightarrow (x_{\rho})_i= \frac{\lambda}{\rho}$$
    Thus $0<(u_{\rho })_i<1 \Rightarrow (x_{\rho })_i= \frac {\lambda }{\rho }$
  • If $(u_{\rho })_i=0$
    $$0\in -\rho (x_{\rho})_i + N_{-1\leq \cdot \leq 1}((u_{\rho})_i)+ [-\lambda,\lambda] \Leftrightarrow (x_{\rho})_i\in \frac{\lambda}{\rho}[-1,1]$$
    Thus $(u_{\rho })_i=0 \Rightarrow (x_{\rho })_i\in \frac {\lambda }{\rho }[-1,1]$
  • If $-1<(u_{\rho })_i<0$
    $$0\in -\rho (x_{\rho})_i + N_{-1\leq \cdot \leq 1}((u_{\rho})_i)- \lambda \Leftrightarrow (x_{\rho})_i=- \frac{\lambda}{\rho}$$
    Thus $-1<(u_{\rho })_i<0\Rightarrow (x_{\rho })_i= -\frac {\lambda }{\rho }$
  • If $(u_{\rho })_i=-1$
    $$0\in -\rho (x_{\rho})_i + N_{-1\leq \cdot \leq 1}((u_{\rho})_i)- \lambda \Leftrightarrow (x_{\rho})_i\in \frac{(-\infty,0]- \lambda}{\rho}$$
    Thus, $(u_{\rho })_i=-1\Rightarrow (x_{\rho })_i\in (-\infty , -\frac {\lambda }{\rho }]$
The proof of the sufficient condition:

We can prove the reverse statement. Rewrite $(x_{\rho })_i=\frac {\beta }{\rho }$, for some $\beta \in \mathbb {R}$. We have then, from the optimal conditions (31) that

$$\begin{aligned} \textbf{0}&\in -\rho \frac{\beta}{\rho} + N_{-1\leq \cdot \leq 1}(u_{\rho}) + \begin{cases} \lambda \textrm{ if } (u_{\rho})_i>0\\ -\lambda \textrm{ if } (u_{\rho})_i<0\\ [-\lambda ,\lambda ] \textrm{ if } (u_{\rho})_i=0 \end{cases} \end{aligned}$$
\begin{align}0&\in [-\beta + \lambda,+\infty) \textrm{ if }(u_{\rho})_i=1\end{align}
\begin{align}0&\in -\beta + \lambda \textrm{ if }0<(u_{\rho})_i<1\end{align}
\begin{align}0&\in [-\lambda-\beta, \lambda - \beta] \textrm{ if }(u_{\rho})_i=0\end{align}
\begin{align}0&\in -\beta - \lambda \textrm{ if }-1<(u_{\rho})_i<0\end{align}
\begin{align}0&\in (-\infty,-(\beta + \lambda )] \textrm{ if }(u_{\rho})_i=-1\end{align}
Assuming $\beta >\lambda$, then only (33) is possible. If $\beta =\lambda$, then (33), (34) and (35) are possible. If $0\leq \beta <\lambda$, then only (35) is possible. If $-\lambda < \beta <0$, then only (35) is possible. If $\beta =-\lambda$, then (35), (36) and (37) are possible. If $\beta <-\lambda$, then only (37) is possible.

This finishes the proof.

Lemma 10. Let $(x_\rho ,u_\rho )$ be a local or a global minimizer of $G_\rho$ for the penalized form ($I(u)$ defined by (7)). If $\rho >\sigma (A)\|d\|_2$, then $\forall \, i$ such that $(u_{\rho })_i=0$ we have $(x_{\rho })_i =0$.

Proof. From Lemma 9, we have that $(u_{\rho })_i=0$ iff $(x_{\rho })_i\in (-\frac {\lambda }{\rho },\frac {\lambda }{\rho })$. We denote $\omega$ the set of indices where $u_{\rho }=0$, and we can apply Lemma 7, and conclude that $(x_{\rho })_{\omega }=0$.

Remark 1. If $\rho >\sigma (A)\|d\|_2$, note that the cost function $G_\rho$ with minimizers $(x_{\rho },u_{\rho })$ is constant on $|(x_{\rho })_i|=\frac {\lambda }{\rho }$ and $|(u_{\rho })_i|\in [0, 1]$.

Remark 2. In the case of the penalized form, the minimizers ($x_{\rho },u_{\rho }$) of $G_\rho$ with $\rho >\sigma (A)\|d\|_2$ may be such that $<x_{\rho },u_{\rho }> \neq \|x_{\rho }\|_1$. This may only happen if $|(x_{\rho })_i|=\frac {\lambda }{\rho }$.

Remark 3. If $\rho >\sigma (A)\|d\|_2$, then from Remark 1, from a minimizer $(x_{\rho },u_{\rho })$ of $G_\rho$, we can construct a minimizer $(x_{\rho },\tilde {u}_\rho )$ of $G_\rho$ such that $\|x_{\rho }\|_1=<x_{\rho },\tilde {u}_\rho >$. This can be done by denoting $Z$ the set of indices such that $0<|(u_{\rho })_i|<1$. If $Z$ is non-empty, we have $<x_\rho ,u_\rho >\neq \|x_\rho \|_1$. From Remark 2, $|(x_{\rho })_i|=\frac {\lambda }{\rho }\, \forall i \in Z$. Take $(\tilde {u}_{\rho})_i = sign((x_{\rho})_i) \,\,\, \forall i \in Z$ and $(\tilde {u}_{\rho})_i = (u_{\rho })_i \, \forall i \notin Z$; then $<x_\rho ,\tilde {u}_\rho >=\|x_\rho \|_1$. Furthermore, $(x_\rho ,\tilde {u}_\rho )$ is a minimizer of $G_\rho$ according to Remark 1 and the fact that $G_\rho (x_\rho ,u)$ is convex with respect to $u$.

With Lemma 10 and the above remarks, we can prove Theorem 2.

Proof. We start by proving the first part of the theorem. Let $(x_\rho ,u_\rho )$ be a local or global minimizer of $G_\rho$, with $I(u)$ in the penalized form, that is, defined as in (7). Let $\mathcal {S}$ denote the set where $\|x\|_1=<x,u>$. If $\rho >\sigma (A)\|d\|_2$ then, from Remark 3, we can construct $(x_\rho ,\tilde {u}_\rho )$ such that

$$(x_\rho,\tilde{u}_\rho) \textrm{ verifies } \|x_\rho\|_1=<x_\rho,\tilde{u}_\rho>.$$
Furthermore, from the definition of a minimizer, we have
$$G_\rho (x_\rho,\tilde{u}_\rho)\leq G_\rho(x,u) \, \, \forall (x,u)\in \mathcal{N}((x_{\rho},\tilde{u}_\rho),\gamma)$$
and so we get
$$G_\rho (x_\rho,\tilde{u}_\rho)\leq G_\rho(x,u) \, \, \forall (x,u)\in \mathcal{N}((x_{\rho},\tilde{u}_\rho),\gamma)\cap \mathcal{S}$$
Since $\forall (x,u) \in \mathcal {S}$, $G_\rho (x,u)=G(x,u)$, we obtain
$$G (x_\rho,\tilde{u}_\rho) \leq G(x,u)\, \, \forall (x,u)\in \mathcal{N}((x_{\rho},\tilde{u}_\rho),\gamma)\cap \mathcal{S}$$
Then, $(x_\rho ,\tilde {u}_\rho )$ is also a local minimizer of $G$.

The second part of Theorem 2 can be proved as in the proof of Theorem 1.

Lemma 11. [25, Lemma 1] For any $x\in \mathbb {R}^{N}$

$$\|x\|_0 = \min_{-\textbf{1}\leq u \leq \textbf{{1}}} \|u\|_1 \textrm{ s.t } \|x\|_1= <u,x>$$

Proof. We consider first the problem

$$\min_{-\textbf{{1}}\leq u \leq \textbf{{1}}} \|u\|_1 \textit{ s.t. } |x_i|=u_ix_i \,\,\,\, \forall i$$
The equality constraint $|x_i|=u_ix_i$ and $-1\leq u_i\leq 1$ yields
$$\hat{u}_i\begin{cases} =1 \textrm{ if } x_i>0\\ =-1 \textrm{ if } x_i<0\\ \in [-1,1] \textrm{ if } x_i=0 \end{cases}$$
To minimize $\|u\|_1$, we take $\hat{u}_i=0$ whenever $x_i=0$, so that the minimum value is $\#\{i : x_i\neq 0\}=\|x\|_0$. Finally, since $-1\leq u_i\leq 1$ implies $u_ix_i\leq |x_i|$ for each $i$, the constraint $\|x\|_1=<u,x>$ is equivalent to $|x_i|=u_ix_i\,\forall i$, so the problem above has the same minimizers and the same optimal value as that of the lemma, which concludes the proof.

Proposition 1. The solution $u^{(s+1)}$ of

$$ \underset{u}{\arg \min } \lambda\|u\|_{1}+\frac{1}{2 b^{(s)}}\|u-z\|^{2}+\iota_{-1 \leq \cdot \leq 1}(u) $$
is reached for
$$(u^{(s+1)})_i= \begin{cases} 1 \textrm{ if } z_i\in [1+\lambda b^{(s)}, \infty)\\ z_i-\lambda b^{(s)} \textrm{ if } z_i\in (\lambda b^{(s)},1+\lambda b^{(s)})\\ 0 \textrm{ if } z_i\in\lambda b^{(s)}[-1,1]\\ z_i+\lambda b^{(s)} \textrm{ if } z_i\in (-1-\lambda b^{(s)},-\lambda b^{(s)}) \\ -1 \textrm{ if } z_i\in (-\infty,-1-\lambda b^{(s)} ] \end{cases}$$

Proof. A closed form expression can be found by calculating the subgradient for the problem (42) with respect to $u$. The subgradient of the box constraint $\iota _{-1\leq \cdot \leq 1}$ is 0 if $|u_i|< 1$, $[0,\infty )$ if $u_i=1$ and $(-\infty ,0]$ if $u_i=-1$. We obtain the following optimal conditions:

$$0 \in \begin{cases} \lambda + [0,\infty)+ \frac{1}{b^{(s)}}(u^{(s+1)}_i-z_i) \textrm{ if } u^{(s+1)}_i=1\\ \lambda +\frac{1}{b^{(s)}}(u^{(s+1)}_i-z_i) \textrm{ if } 1>u^{(s+1)}_i>0\\ \lambda[-1,1] -\frac{1}{b^{(s)}}(z_i) \textrm{ if } u^{(s+1)}_i=0\\ -\lambda +\frac{1}{b^{(s)}}(u^{(s+1)}_i-z_i) \textrm{ if } -1<u^{(s+1)}_i<0\\ -\lambda + (-\infty,0]+ \frac{1}{b^{(s)}}(u^{(s+1)}_i-z_i) \textrm{ if } u^{(s+1)}_i=-1\\ \end{cases}$$
and the optimal solution $u^{(s+1)}$ is
$$(u^{(s+1)})_i= \begin{cases} 1 \textrm{ if } z_i\in [1+\lambda b^{(s)}, \infty)\\ z_i-\lambda b^{(s)} \textrm{ if } z_i\in (\lambda b^{(s)},1+\lambda b^{(s)})\\ 0 \textrm{ if } z_i\in\lambda b^{(s)}[-1,1]\\ z_i+\lambda b^{(s)} \textrm{ if } z_i\in (-1-\lambda b^{(s)},-\lambda b^{(s)}) \\ -1 \textrm{ if } z_i\in (-\infty,-1-\lambda b^{(s)} ] \end{cases}$$
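A minimal sketch of this update (NumPy-based; the names u_update, lam and b are illustrative and not from the paper's code) implements the closed form as soft-thresholding by $\lambda b^{(s)}$ followed by clipping to $[-1,1]$, and checks one coordinate against a grid search:

```python
import numpy as np

def u_update(z, lam, b):
    """Closed form of Proposition 1: minimize lam*||u||_1 + ||u - z||^2/(2b)
    over the box -1 <= u <= 1, i.e. soft-threshold by lam*b, then clip to [-1, 1]."""
    return np.clip(np.sign(z) * np.maximum(np.abs(z) - lam * b, 0.0), -1.0, 1.0)

# Rough numerical check on a single coordinate.
z, lam, b = 0.7, 0.3, 1.2
grid = np.linspace(-1.0, 1.0, 200001)
obj = lam * np.abs(grid) + (grid - z) ** 2 / (2 * b)
assert abs(u_update(np.array([z]), lam, b)[0] - grid[obj.argmin()]) < 1e-3
```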

Appendix


Table 3. Jaccard index for the $\ell _1$ relaxation with a reconstruction of 99 non-zero pixels.


Table 4. Jaccard index for CoBic with L=8 and L=4 for acquisitions 1, 200 and 361, with 99 non-zero pixels. Note that the index is higher when considering only these samples.

Funding

Ministère de l'Enseignement supérieur, de la Recherche et de l'Innovation; Agence Nationale de la Recherche (ANR-19-P3IA-0002).

Disclosures

The authors declare no conflicts of interest.

References

1. A. Lipson, S. G. Lipson, and H. Lipson, Optical Physics, 4th ed. (Cambridge University Press, 2010).

2. A. Mishin and K. Lukyanov, “Live-cell super-resolution fluorescence microscopy,” Biochemistry (Moscow) 84(S1), 19–31 (2019). [CrossRef]  

3. S. T. Hess, T. P. Girirajan, and M. D. Mason, “Ultra-high resolution imaging by fluorescence photoactivation localization microscopy,” Biophys. J. 91(11), 4258–4272 (2006). [CrossRef]  

4. E. Betzig, G. H. Patterson, R. Sougrat, O. W. Lindwasser, S. Olenych, J. S. Bonifacino, M. W. Davidson, J. Lippincott-Schwartz, and H. F. Hess, “Imaging intracellular fluorescent proteins at nanometer resolution,” Science 313(5793), 1642–1645 (2006). [CrossRef]  

5. M. J. Rust, M. Bates, and X. Zhuang, “Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (storm),” Nat. Methods 3(10), 793–796 (2006). [CrossRef]  

6. D. Sage, H. Kirshner, T. Pengo, N. Stuurman, J. Min, S. Manley, and M. Unser, “Quantitative evaluation of software packages for single-molecule localization microscopy,” Nat. Methods 12(8), 717–724 (2015). [CrossRef]  

7. D. Sage, T.-A. Pham, H. Babcock, T. Lukes, T. Pengo, J. Chao, R. Velmurugan, A. Herbert, A. Agrawal, S. Colabrese, A. Wheeler, A. Archetti, B. Rieger, R. Ober, G. M. Hagen, J.-B. Sibarita, J. Ries, R. Henriques, M. Unser, and S. Holden, “Super-resolution fight club: A broad assessment of 2d & 3d single-molecule localization microscopy software,” Nat. Methods 16(5), 387–395 (2019). [CrossRef]  

8. T. Takeshima, T. Takahashi, J. Yamashita, Y. Okada, and S. Watanabe, “A multi-emitter fitting algorithm for potential live cell super-resolution imaging over a wide range of molecular densities,” J. Microsc. 271(3), 266–281 (2018). [CrossRef]  

9. R. Henriques, M. Lelek, E. F. Fornasiero, F. Valtorta, C. Zimmer, and M. M. Mhlanga, “Quickpalm: 3d real-time photoactivation nanoscopy image processing in imagej,” Nat. Methods 7(5), 339–340 (2010). [CrossRef]  

10. H. Babcock, Y. M. Sigal, and X. Zhuang, “A high-density 3d localization algorithm for stochastic optical reconstruction microscopy,” Opt Nanoscopy 1(1), 6 (2012). [CrossRef]  

11. E. Soubies, L. Blanc-Féraud, and G. Aubert, “A Continuous Exact ℓ0 Penalty (CEL0) for Least Squares Regularized Problem,” SIAM J. Imaging Sci. 8(3), 1607–1639 (2015). [CrossRef]  

12. J. Min, C. Vonesch, H. Kirshner, L. Carlini, N. Olivier, S. Holden, S. Manley, J. C. Ye, and M. Unser, “Falcon: fast and unbiased reconstruction of high-density super-resolution microscopy data,” Sci. Rep. 4(1), 4577 (2015). [CrossRef]  

13. J.-J. Cao, Y.-F. Wang, and C.-C. Yang, “Seismic data restoration based on compressive sensing using regularization and zero-norm sparse optimization,” Chin. J. Geophys. 55(2), 239–251 (2012). [CrossRef]  

14. F. Nie, H. Huang, X. Cai, and C. H. Ding, “Efficient and robust feature selection via joint ℓ2,1-norms minimization,” in Advances in Neural Information Processing Systems (2010), pp. 1813–1821.

15. S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Trans. Signal Process. 41(12), 3397–3415 (1993). [CrossRef]  

16. A. Beck and Y. C. Eldar, “Sparsity constrained nonlinear optimization: Optimality conditions and algorithms,” SIAM J. Optim. 23(3), 1480–1509 (2013). [CrossRef]  

17. C. Soussen, J. Idier, D. Brie, and J. Duan, “From bernoulli–gaussian deconvolution to sparse signal restoration,” IEEE Trans. Signal Process. 59(10), 4572–4584 (2011). [CrossRef]  

18. E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory 52(2), 489–509 (2006). [CrossRef]  

19. E. Soubies, L. Blanc-Féraud, and G. Aubert, “A unified view of exact continuous penalties for ℓ2 − ℓ0 minimization,” SIAM J. Optim. 27(3), 2034–2060 (2017). [CrossRef]  

20. S. Bourguignon, J. Ninin, H. Carfantan, and M. Mongeau, “Exact sparse approximation problems via mixed-integer programming: Formulations and computational performance,” IEEE Trans. Signal Process. 64(6), 1405–1419 (2016). [CrossRef]  

21. M. Pilanci, M. J. Wainwright, and L. El Ghaoui, “Sparse learning via boolean relaxations,” Math. Program. 151(1), 63–87 (2015). [CrossRef]  

22. S. Bi, X. Liu, and S. Pan, “Exact penalty decomposition method for zero-norm minimization based on mpec formulation,” SIAM J. Sci. Comput. 36(4), A1451–A1477 (2014). [CrossRef]  

23. A. d’Aspremont, “A semidefinite representation for some minimum cardinality problems,” in 42nd IEEE International Conference on Decision and Control (IEEE Cat. No. 03CH37475), vol. 5 (IEEE, 2003), pp. 4985–4990.

24. Y. Liu, S. Bi, and S. Pan, “Equivalent lipschitz surrogates for zero-norm and rank optimization problems,” J. Glob. Optim. 72(4), 679–704 (2018). [CrossRef]  

25. G. Yuan and B. Ghanem, “Sparsity Constrained Minimization via Mathematical Programming with Equilibrium Constraints,” arXiv:1608.04430 (2016).

26. E. Nehme, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Deep-STORM: super-resolution single-molecule microscopy by deep learning,” Optica 5(4), 458–464 (2018). [CrossRef]  

27. S. Gazagnes, E. Soubies, and L. Blanc-Féraud, “High density molecule localization for super-resolution microscopy using CEL0 based sparse approximation,” in 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017) (2017), pp. 28–31.

28. P. L. Combettes and V. R. Wajs, “Signal recovery by proximal forward-backward splitting,” Multiscale Model. Simul. 4(4), 1168–1200 (2005). [CrossRef]  

29. H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, “Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality,” Math. Oper. Res. 35(2), 438–457 (2010). [CrossRef]  

30. S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn. 3(1), 1–122 (2010). [CrossRef]  

31. A. Beck and M. Teboulle, “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems,” SIAM J. Imaging Sci. 2(1), 183–202 (2009). [CrossRef]  

32. S. M. Stefanov, “Convex quadratic minimization subject to a linear constraint and box constraints,” Appl. Math. Res. eXpress 2004(1), 17–42 (2004). [CrossRef]  

33. P. Ochs, A. Dosovitskiy, T. Brox, and T. Pock, “On iteratively reweighted algorithms for nonsmooth nonconvex optimization in computer vision,” SIAM J. Imaging Sci. 8(1), 331–372 (2015). [CrossRef]  

34. Y. Shechtman, “Software nano-bio-optics lab,” https://nanobiooptics.net.technion.ac.il/software/ (2018). [Online; accessed 2018-10-09].

35. M. Chahid, “Echantillonnage compressif appliqué à la microscopie de fluorescence et à la microscopie de super résolution,” Ph.D. thesis, Bordeaux (2014).

36. C. Zalinescu, Convex Analysis in General Vector Spaces (World Scientific, 2002).

37. H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, vol. 408 (Springer, 2011).
