Optica Publishing Group

Computing a projection operator onto the null space of a linear imaging operator: tutorial

Open Access

Abstract

Many imaging systems can be approximately described by a linear operator that maps an object property to a collection of discrete measurements. However, even in the absence of measurement noise, such operators are generally “blind” to certain components of the object, and hence information is lost in the imaging process. Mathematically, this is explained by the fact that the imaging operator can possess a null space. All objects in the null space, by definition, are mapped to a collection of identically zero measurements and are hence invisible to the imaging system. As such, characterizing the null space of an imaging operator is of fundamental importance when comparing and/or designing imaging systems. A characterization of the null space can also facilitate the design of regularization strategies for image reconstruction methods. Characterizing the null space via an associated projection operator is, in general, a computationally demanding task. In this tutorial, computational procedures for establishing projection operators that map an object to the null space of a discrete-to-discrete imaging operator are surveyed. A new machine-learning-based approach that employs a linear autoencoder is also presented. The procedures are demonstrated by use of biomedical imaging examples, and their computational complexities and memory requirements are compared.

© 2022 Optical Society of America

1. INTRODUCTION

Digital imaging systems are described by operators that map one or more object properties to a collection of discrete measurements. Such operators will be referred to as imaging operators and, in many applications, can be approximated as linear operators. As such, concepts from operator theory can be applied to analyze and understand fundamental properties of an imaging system. For example, the existence, uniqueness, and stability of inverse problems associated with image formation can be established by analyzing the associated imaging operator via an operator-theoretic approach [1–5]. In particular, the issue of uniqueness is central to the topic of this tutorial paper.

Even in the contrived situation where measurement noise is absent, imaging operators are generally “blind” to certain components of the object, which reflects the fact that information is lost in the imaging process. Mathematically, this is explained by the fact that the imaging operator can possess a null space, which is also sometimes referred to as a kernel [2,4,6]. By definition, all objects in the null space are mapped to a collection of identically zero measurements and are hence invisible to the imaging system. In fact, a digital imaging system is properly described as a continuous-to-discrete (CD) mapping that inherently possesses an infinite dimensional null space [3]. In practice, a finite-dimensional representation of an object is sometimes introduced and an approximate discrete-to-discrete (DD) imaging operator is employed to describe a digital imaging system [7–9]. In such cases, the DD imaging operator will have a null space when the number of independent measurements is less than the dimension of the object representation.

Characterizing the null space of an imaging operator is of fundamental importance to the analysis of imaging systems. A characterization of an imaging operator’s null space can reveal the nature of the object features that an imaging system can or cannot measure [10]. This can facilitate the design and/or comparison of imaging systems and data-acquisition designs [11–15]. Additionally, a characterization of the null space can facilitate the design of regularization strategies for image reconstruction methods [16–22]. For example, one such method called “nullspace shuttles” exploits the structure of the null space to enable explicit estimation of the null space component of the object while preserving the fidelity of the overall solution to the measured data to a specified degree [21]. A modern manifestation of this idea that involves deep learning has also been developed [19]. Further, the ability to characterize the null space of an imaging operator is essential for understanding and illustrating the effect of a Bayesian prior imposed by a reconstruction method and the formation of false structures or so-called image hallucinations [23].

The singular value decomposition (SVD) is a classic method for operator decomposition and yields a characterization of the null space of an imaging operator [24,25]. Specifically, the singular vectors that reside in the domain of the imaging operator and correspond to zero-valued singular values span the null space. From these singular vectors, a projection operator, which can map any object in the domain of the imaging operator to the null space, can be readily formed. As a result, the establishment of an orthogonal null space projection operator is, in a practical sense, equivalent to characterizing the null space. While the SVD has been employed to analyze certain imaging systems, its application has been severely limited due to the challenging nature of computing the SVD for even modestly sized problems.

In this tutorial, computational procedures for projecting an object to the null space of a discrete-to-discrete linear imaging operator are surveyed. These include an implicit optimization-based method for projecting a single object to the null space [26] and the randomized SVD procedure [27] for establishing an explicit null space projection operator. A new learning-based approach that employs a linear autoencoder-based neural network to establish an explicit null space projection operator [28] is also presented. The procedures are demonstrated by use of biomedical imaging examples, and their computational complexities and memory requirements are compared.

2. BACKGROUND

A. Imaging Operators as Mappings between Hilbert Spaces

A linear imaging system can be described as

$${\textbf{g}} = {\textbf{Hf}},$$
where ${\textbf{f}} \in {\mathbb U}$, ${\textbf{g}} \in \mathbb V$, and ${\textbf{H}}:{\mathbb U} \to \mathbb V$ is a mapping between Hilbert spaces ${\mathbb U}$ and $\mathbb V$. In this discussion, measurement noise is neglected. Although digital imaging systems are inherently described by continuous-to-discrete (CD) mappings, they are often approximated by discrete-to-discrete (DD) imaging models that correspond to mappings between finite-dimensional Euclidean spaces. In this case, ${\mathbb U} = {\mathbb E}^{N}$, $\mathbb V = {\mathbb E}^{M}$, and ${\textbf{H}} \in {\mathbb E}^{{M \times N}}$ represents a DD imaging operator that maps ${\mathbb E}^{N} \to {\mathbb E}^{M}$. In the methods described later, it will be assumed that a DD imaging model is a sufficiently accurate representation of the true CD imaging model that describes a digital imaging system, and the impact of model error will be neglected.

Let ${f_{1,n}}$ denote the $n$ th component of the vector ${{\textbf{f}}_1^{}}$ and let ${\textbf{f}}_1^{\, \dagger} $ denote the conjugate transpose of ${{\textbf{f}}_1}$. The inner product for the Euclidean space ${\mathbb U}$, denoted as ${({{\textbf{f}}_1},{{\textbf{f}}_2})_{{\mathbb U}}}$, is given by

$${({{\textbf{f}}_1},{{\textbf{f}}_2})_{{\mathbb U}}} \equiv {\textbf{f}}_1^{\,\dagger} {{\textbf{f}}_2^{}} \equiv \sum\limits_{n = 1}^N f_{1,n}^ * {f_{2,n}^{}},$$
for any ${{\textbf{f}}_1},{{\textbf{f}}_2} \in {\mathbb U}$. Above, the superscript $*$ denotes the complex conjugate of a complex-valued scalar.

The inner product for $\mathbb V$ will be defined similarly and denoted as ${({{\textbf{g}}_1},{{\textbf{g}}_2})_{\mathbb V}}$, for any ${{\textbf{g}}_1},{{\textbf{g}}_2} \in \mathbb V$. The adjoint of ${\textbf{H}}$ is denoted as ${{\textbf{H}}^\dagger}$ and is defined as ${({\textbf{g}},{\textbf{Hf}})_{\mathbb V}} = ({{\textbf{H}}^\dagger}{\textbf{g}},{\textbf{f}}{)_{{\mathbb U}}}$ for arbitrary ${\textbf{f}}$ and ${\textbf{g}}$.

The topic of null space projection operators is central to this tutorial and will be introduced below. A generic projection operator that maps an object ${\textbf{f}}$ to a subspace of ${\mathbb U}$ is a linear operator ${\textbf{P}}:{\mathbb U} \to {\mathbb U}$ such that ${{\textbf{P}}^2} = {\textbf{P}}$ [3] and implements an orthogonal projection when ${({{\textbf{f}}_1},{\textbf{P}}{{\textbf{f}}_2})_{{\mathbb U}}} = ({\textbf{P}}{{\textbf{f}}_1},{{\textbf{f}}_2}{)_{{\mathbb U}}}$ for any choice of ${{\textbf{f}}_1}$, ${{\textbf{f}}_2} \in {\mathbb U}$.
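These two defining properties can be checked numerically. The following is a minimal NumPy sketch (the matrix sizes and random seed are illustrative assumptions, not from the text) in which a matrix ${\textbf{W}}$ with orthonormal columns generates an orthogonal projector ${\textbf{P}} = {\textbf{W}}{{\textbf{W}}^\dagger}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Any W with orthonormal columns defines an orthogonal projector P = W W^†
# onto the subspace spanned by its columns (sizes chosen arbitrarily).
W, _ = np.linalg.qr(rng.standard_normal((6, 2)))
P = W @ W.conj().T

# Idempotence: P^2 = P
assert np.allclose(P @ P, P)
# Self-adjointness, equivalent to (f1, P f2) = (P f1, f2) for all f1, f2
assert np.allclose(P, P.conj().T)
```

An idempotent matrix that is not self-adjoint would instead implement an oblique projection.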

B. Null and Measurable Subspaces

It is natural and useful to consider two subspaces of the space ${\mathbb U}$ that represents the domain of ${\textbf{H}}$: the null space $N({\textbf{H}})$ and measurable space ${N_ \bot}({\textbf{H}})$. The null space of ${\textbf{H}}$ is defined as

$$N({\textbf{H}}) = \{{\textbf{f}} \in {\mathbb U}|{\textbf{Hf}} = {\textbf{0}}\} $$
and is considered trivial if it contains only the zero element [3]. As mentioned, the null space is of fundamental importance to the analysis of imaging systems. By definition, any component of an object that resides in the null space is invisible to an associated imaging system, even in the absence of measurement noise. However, when the null space can be characterized, the imaging system design can be optimized so that object features that reside in the null space are unlikely to have an impact on the utility of the measured images for specific inferences. Accordingly, there is an important need for methods that can characterize the null spaces of imaging operators.

By contrast, the measurable space ${N_ \bot}({\textbf{H}})$ is the orthogonal complement of the null space and contains the component of the object that an imaging system can measure. Thus, any object ${\textbf{f}}$ can be uniquely expressed as

$${\textbf{f}} = {{\textbf{f}}_{\rm{meas}}} + {{\textbf{f}}_{\rm{null}}},$$
where ${{\textbf{f}}_{\rm{meas}}} \in {N_ \bot}({\textbf{H}})$, ${{\textbf{f}}_{\rm{null}}} \in N({\textbf{H}})$, and ${({{\textbf{f}}_{\rm{meas}}},{{\textbf{f}}_{\rm{null}}})_{{\mathbb U}}} = 0$. A DD operator has a nontrivial null space when ${\textbf{H}}$ is not full-rank, indicating exact inversion of Eq. (1) is impossible.

C. Singular Value Decomposition (SVD)

The SVD is a useful tool for analyzing linear operators and is a generalization of spectral decomposition for Hermitian operators [3,29,30]. The right singular vectors $\{{{\textbf{u}}_n}\}$ are the $N$ eigenvectors of ${{\textbf{H}}^\dagger}{\textbf{H}}$ defined as ${{\textbf{H}}^\dagger}{\textbf{H}}{{\textbf{u}}_n} = {\mu _n}{{\textbf{u}}_n}$. Correspondingly, the left singular vectors $\{{{\textbf{v}}_m}\}$ are the $M$ eigenvectors of ${\textbf{H}}{{\textbf{H}}^\dagger}$ defined as ${\textbf{H}}{{\textbf{H}}^\dagger}{{\textbf{v}}_m} = {\mu _m}{{\textbf{v}}_m}$. The $N$ singular vectors $\{{{\textbf{u}}_n}\}$ form an orthonormal basis for ${\mathbb U}$ and the $M$ singular vectors $\{{{\textbf{v}}_m}\}$ form an orthonormal basis for $\mathbb V$. The singular vectors are related such that ${\textbf{H}}{{\textbf{u}}_n} = \sqrt {{\mu _n}} {{\textbf{v}}_n}$, where $\sqrt {{\mu _n}}$ is the associated scalar known as a singular value. The SVD takes the form

$${\textbf{H}} = \sum\limits_{n = 1}^N \sqrt {{\mu _n^{}}} {{\textbf{v}}_n^{}}{\textbf{u}}_n^ \dagger ,$$
where ${{\textbf{v}}_n^{}}{\textbf{u}}_n^\dagger$ denotes an outer product. The rank of ${\textbf{H}}$ is denoted as $R$ and corresponds to the number of nonzero singular values. When the imaging operator is rank deficient $(R \lt N)$, the upper summation limit in Eq. (5) can be changed from $N$ to $R$. For a DD imaging operator, the dimensions of the null and measurable spaces are related as $K + R = N$, where $K \equiv {\rm{Dim}}(N({\textbf{H}}))$ and $R \equiv {\rm{Dim}}({N_ \bot}({\textbf{H}}))$. It is sometimes convenient to represent the SVD as a matrix product. Assuming the singular vectors are column vectors, this takes the form
$${\textbf{H}} = {\textbf{V}}\Sigma {{\textbf{U}}^\dagger},$$
where ${\textbf{V}} = [{{\textbf{v}}_1},{{\textbf{v}}_2},\ldots,{{\textbf{v}}_R}]$, $\Sigma = {\rm{diag}}([\sqrt {{\mu _1}} ,\sqrt {{\mu _2}} ,\ldots,\sqrt {{\mu _R}}\, ])$, and ${\textbf{U}} = [{{\textbf{u}}_1},{{\textbf{u}}_2},\ldots,{{\textbf{u}}_R}]$.
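As a concrete illustration, the matrix form of the SVD can be computed with a standard linear algebra library. The NumPy sketch below uses a hypothetical rank-deficient operator (sizes and seed are assumptions) and notes the mapping between NumPy's naming convention and the notation of Eq. (6):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, R = 4, 6, 3                 # fewer measurements than unknowns
H = rng.standard_normal((M, R)) @ rng.standard_normal((R, N))  # rank-R operator

# NumPy returns H = U_np @ diag(s) @ Vh_np; in the notation of Eq. (6),
# V corresponds to U_np (range side) and U to Vh_np^† (domain side).
V, s, Uh = np.linalg.svd(H, full_matrices=False)

rank = int(np.sum(s > 1e-10 * s[0]))
assert rank == R                  # hence Dim(N(H)) = K = N - R = 3
assert np.allclose(V @ np.diag(s) @ Uh, H)
```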

D. Projecting onto the Null and Measurable Space

Projection operators can be defined for the null and measurable spaces. The null space projection operator is a mapping

$${{\textbf{P}}_{{\rm{null}}}}:D({\textbf{H}}) \to N({\textbf{H}}),$$
where $D({\textbf{H}})$ represents the domain of ${\textbf{H}}$. Similarly, the corresponding projection operator for the measurable space is a mapping
$${{\textbf{P}}_{{\rm{meas}}}}:D({\textbf{H}}) \to {N_ \bot}({\textbf{H}}).$$

The null and measurement projection operators are related as

$${{\textbf{P}}_{{\rm{null}}}} = {{\textbf{I}}_N} - {{\textbf{P}}_{{\rm{meas}}}}.$$

Two separate descriptions of projection operators for the null and measurable spaces based on the SVD are presented below. A projection operator for the measurement space can be computed from the singular vectors ${{\textbf{u}}_n}$ associated with nonzero singular values as

$${{\textbf{P}}_{{\rm{meas}}}} = \sum\limits_{n = 1}^R {{\textbf{u}}_n^{}}{\textbf{u}}_n^\dagger .$$

Similarly, the null space projection operator can be computed as

$${{\textbf{P}}_{{\rm{null}}}} = \sum\limits_{n = R + 1}^N {{\textbf{u}}_n^{}}{\textbf{u}}_n^ \dagger .$$

The second method for computing the null space projection operator employs the Moore–Penrose (MP) pseudoinverse, represented as ${{\textbf{H}}^ +}$. The MP pseudoinverse is a generalization of the inverse that can be applied to estimate ${\textbf{f}}$ in inverse problems where ${\textbf{H}}$ is rank-deficient. It can be computed as

$${{\textbf{H}}^ +} = \sum\limits_{n = 1}^R \frac{1}{{\sqrt {{\mu _n}}}}{{\textbf{u}}_n^{} }{\textbf{v}}_n^ \dagger .$$

In terms of ${{\textbf{H}}^ +}$, the measurement and null space projection operators can be computed as ${{\textbf{P}}_{{\rm{meas}}}} = {{\textbf{H}}^ +}{\textbf{H}}$ and ${{\textbf{P}}_{{\rm{null}}}} = {{\textbf{I}}_N} - {{\textbf{H}}^ +}{\textbf{H}}$, where ${{\textbf{I}}_N}$ represents the identity matrix of size $N$.
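Both routes to the projection operators agree, which can be verified on a small example. The NumPy sketch below (a hypothetical rank-deficient matrix; sizes and seed are assumptions) forms ${{\textbf{P}}_{{\rm{meas}}}}$ from the pseudoinverse and from the singular vectors, and confirms that the null component of an object is invisible to ${\textbf{H}}$:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, R = 4, 6, 3
H = rng.standard_normal((M, R)) @ rng.standard_normal((R, N))

# Moore-Penrose route: P_meas = H^+ H and P_null = I_N - H^+ H
P_meas = np.linalg.pinv(H) @ H
P_null = np.eye(N) - P_meas

# SVD route (Eq. (10)): outer products of domain singular vectors u_n
_, s, Uh = np.linalg.svd(H)
U = Uh.conj().T
r = int(np.sum(s > 1e-10 * s[0]))
P_meas_svd = U[:, :r] @ U[:, :r].conj().T
assert np.allclose(P_meas, P_meas_svd)

# The null-space component of any object is mapped to zero measurements
f = rng.standard_normal(N)
assert np.allclose(H @ (P_null @ f), 0)
```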

Note that, in practice, the SVD is infeasible to compute for large-scale imaging operators, as it requires the storage of ${M^2}$ and ${N^2}$ values for the singular vectors and ${\cal O}(M{N^2} + {N^3})$ floating point operations (FLOPs) [29].

3. COMPUTATIONAL PROCEDURES FOR PROJECTING AN OBJECT ONTO THE NULL SPACE

In this section, three methods for computing the null space projection operator of a linear imaging operator are surveyed. All three methods circumvent computational limitations associated with performing SVD on large imaging operators. While linear algebra routines to compute the SVD require ${\textbf{H}}$ to be explicitly represented as a matrix [31], the methods surveyed here allow for the actions of ${\textbf{H}}$ and ${{\textbf{H}}^\dagger}$ to be computed on-the-fly through matrix-vector and the corresponding adjoint-matrix-vector products. The computational cost of such methods will be represented in terms of the number of matrix-vector/adjoint-matrix-vector products (MVPs) and dense linear algebra FLOPs necessary to construct the orthogonal projector ${{\textbf{P}}_{{\rm{null}}}}$. Additionally, the space complexity of such methods, defined as the amount of memory required to solve a computational problem as a function of its inputs [32], will be compared.

A. Wilson–Barrett Method

As mentioned in Section 2.D, the null space projection operator can be formulated using the MP pseudoinverse. However, explicitly forming the projection operator from the pseudoinverse is not computationally feasible for large systems, as ${{\textbf{H}}^ +}{\textbf{H}}$ is intractable. Rather than explicitly forming the MP pseudoinverse, an alternative approach is to estimate the product $({{\textbf{I}}_N} - {{\textbf{H}}^ +}{\textbf{H}}){\textbf{f}}$ through iterative algorithms. These algorithms project to the null space implicitly; further, while they can be utilized to assemble an explicit projection operator, the primary strength of these iterative methods is to project a particular object onto the null space.

One class of algorithms for estimating the null components of objects for large-scale linear systems is based upon stationary iterative methods for image reconstruction [24]. Stationary iterative methods compute a least-squares solution of the linear system in Eq. (1) by finding a fixed point of the linear transformation

$${{\textbf{f}}^ {\;(t)}} = {\textbf{A}}{{\textbf{f}}^ {\;(t - 1)}} + {\textbf{b}},$$
for a given initial guess ${{\textbf{f}}^ {\;(0)}}$. Above, the iteration operator ${\textbf{A}}$ and forcing term ${\textbf{b}}$, whose implementation is specific to the stationary method of choice, involve the imaging operator ${\textbf{H}}$ and the data ${\textbf{g}}$ but do not depend on the iteration count $t$. Furthermore, these algorithms allow for large-scale imaging operators to be considered, as applying the iteration operator ${\textbf{A}}$ to a vector ${{\textbf{f}}^ {\;(t - 1)}}$ only requires access to matrix-vector and adjoint-matrix-vector products with the imaging operator ${\textbf{H}}$. The key observation is that some of these methods, such as the Landweber iteration [33] and the simultaneous algebraic reconstruction technique [10,34], converge even when the imaging operator is not full rank. In such cases, assuming that the data ${\textbf{g}}$ are noiseless, it is possible to show (cf. [34,35]) that the sequence of ${{\textbf{f}}^ {\;(t)}}$ converges to a fixed point ${{\textbf{f}}^{\;\star}}$ such that
$${{\textbf{f}}^{\;\star}} = \mathop{\arg\!\min}\limits_{{\textbf{f}} \in {\mathbb U}} \parallel {\textbf{f}} - {{\textbf{f}}^{(0)}}{\parallel ^2}\quad {\rm{subject}}\;{\rm{to:}}\quad {\textbf{Hf}} = {\textbf{g}}.$$

Note that these methods only update the measurable component of ${{\textbf{f}}^{(t)}}$; the null space component is left unaltered from the initial guess ${{\textbf{f}}^{(0)}}$. Due to the orthogonality of the decomposition in Eq. (4), the null and measurable space components of an object ${\textbf{f}}$ are solutions of Eq. (14) with ${{\textbf{f}}^{(0)}} = {\textbf{f}},{\textbf{g}} = {\textbf{0}}$ and ${{\textbf{f}}^{(0)}} = {\textbf{0}},{\textbf{g}} = {\textbf{Hf}}$, respectively.


Algorithm 1. Wilson–Barrett (Landweber) method for computing ${{\textbf{f}}_{{\rm{null}}}}$

The use of a stationary iterative method to compute the null component of an object was first discussed by Wilson and Barrett [10]. This method, referred to as the Wilson–Barrett (WB) method hereafter, is based on the Landweber iteration and is summarized in Algorithm 1. The algorithm performs fixed point iterations of the form in Eq. (13) with iteration operator ${\textbf{A}} = {\textbf{I}} - \alpha {{\textbf{H}}^\dagger}{\textbf{H}}$ and right-hand side ${\textbf{b}} = {\textbf{0}}$. Here, the step size $\alpha$ must satisfy $0 \lt \alpha \lt 2\mu _1^{- 1}$, where ${\mu _1}$ is the largest eigenvalue of ${{\textbf{H}}^\dagger}{\textbf{H}}$. The algorithm stops at iteration $t$ if $\parallel {{\textbf{r}}^{(t)}}\parallel = \parallel {\textbf{H}}{{\textbf{r}}^{(t)}}\parallel \le \varepsilon$ for a given tolerance $\varepsilon \ge 0$.

Let $\kappa = \sqrt {\frac{{{\mu _1}}}{{{\mu _R}}}}$ denote the spectral condition number of ${\textbf{H}}$. In terms of computational complexity, ${\cal O}({\kappa ^2})$ iterations are necessary to achieve convergence [36], and each iteration requires a matrix-vector and adjoint-matrix-vector product. Memory requirements are modest, as only two vectors are required to be stored: one of size $N$ and the other of size $M$.
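A minimal dense implementation of the WB iteration might look as follows. This is a NumPy sketch: the helper `wb_null_component`, the operator sizes, and the synthetic spectrum are all hypothetical, and a matrix-free operator could be substituted for the explicit matrix `H`:

```python
import numpy as np

def wb_null_component(H, f, alpha=None, tol=1e-10, max_iter=10_000):
    """Wilson-Barrett (Landweber) estimate of the null component of f.

    Iterates r <- (I - alpha H^† H) r starting from r = f, which leaves the
    null component untouched while damping the measurable component to zero.
    """
    if alpha is None:
        alpha = 1.0 / np.linalg.norm(H, 2) ** 2   # 1 / mu_1
    r = f.copy()
    for _ in range(max_iter):
        Hr = H @ r
        if np.linalg.norm(Hr) <= tol:             # stopping rule ||H r|| <= eps
            break
        r = r - alpha * (H.conj().T @ Hr)
    return r

# Synthetic rank-3 operator with known singular values (kappa = 3)
rng = np.random.default_rng(3)
M, N, R = 4, 6, 3
Vr, _ = np.linalg.qr(rng.standard_normal((M, R)))
Ur, _ = np.linalg.qr(rng.standard_normal((N, R)))
H = Vr @ np.diag([3.0, 2.0, 1.0]) @ Ur.conj().T

f = rng.standard_normal(N)
f_null = wb_null_component(H, f)
P_null = np.eye(N) - np.linalg.pinv(H) @ H        # reference projector
assert np.linalg.norm(H @ f_null) < 1e-8          # f_null lies in N(H)
assert np.allclose(f_null, P_null @ f, atol=1e-6)
```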

Although the WB method was established to estimate the null component for a single object, an explicit projection operator can be formed by applying the algorithm to a set of objects. To construct an orthogonal basis of the null space, the WB method can be applied to $K + C$ objects, where $C$ is a small constant ($C = 5 \sim 20$) representing the degree of oversampling needed to ensure with high probability that a complete basis of the null space components is contained in the set of input objects [37]. These objects can be randomly sampled from an i.i.d. Gaussian distribution. The outputs of the WB method for these objects form a basis for $N({\textbf{H}})$ after removal of linear dependencies. This basis can then be made orthonormal by the use of a thin (or economy size) QR decomposition [29], which requires ${\cal O}(N{(K + C)^2})$ FLOPs. Here, the thin QR decomposition of a matrix ${\textbf{A}} \in {\mathbb E}^{{N \times K}}$ ($K \lt N$) is given by ${\textbf{A}} = {\textbf{QR}}$, where ${\textbf{Q}} \in {\mathbb E}^{{N \times K}}$ is a matrix with orthonormal columns and ${\textbf{R}} \in {\mathbb E}^{{K \times K}}$ is an upper triangular matrix.
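The orthonormalization step can be sketched as follows. In this NumPy illustration (all sizes hypothetical), the null components of the probe objects are generated with an exact reference projector rather than the WB iteration, purely for brevity; the WB outputs would play the same role:

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, R = 4, 6, 3
H = rng.standard_normal((M, R)) @ rng.standard_normal((R, N))
K, C = N - R, 5                          # null-space dimension and oversampling

# Null components of K + C random Gaussian probes (stand-in for WB outputs)
P_null_ref = np.eye(N) - np.linalg.pinv(H) @ H
A = P_null_ref @ rng.standard_normal((N, K + C))

# Thin QR; discarding columns with negligible diagonal entries of R removes
# the linear dependencies, leaving an orthonormal basis of N(H)
Q, Rfac = np.linalg.qr(A)
keep = np.abs(np.diag(Rfac)) > 1e-10 * np.abs(Rfac[0, 0])
Qk = Q[:, keep]

P_null = Qk @ Qk.conj().T
assert Qk.shape[1] == K
assert np.allclose(P_null, P_null_ref)
```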

It is also worth noting that, for highly rank-deficient imaging operators, that is, if $R \lt K$, it is computationally cheaper to construct a basis of ${N_ \bot}({\textbf{H}})$ by means of a modified version of Algorithm 1 and define ${{\textbf{P}}_{{\rm{null}}}}$ using Eqs. (9) and (10). In summary, the computational cost of explicitly constructing ${{\textbf{P}}_{{\rm{null}}}}$ with the Wilson–Barrett method is ${\cal O}((\min (K,R) + C){\kappa ^2})$ MVPs and ${\cal O}(N{(\min (K,R) + C)^2})$ FLOPs. The space complexity is ${\cal O}((\min (K,R) + C)N)$.

B. Randomized Singular Value Decomposition

Although computing the SVD of an imaging operator is often computationally intractable, it may be feasible to extract a small number of dominant singular values and vectors. This can be achieved by applying the so-called truncated SVD [29]. Efficient algorithms for computing truncated SVD are based on Lanczos iteration [38–40] and randomized linear algebra [27]. In particular, the randomized SVD (rSVD) presented in this section has demonstrated several advantages (in terms of computational cost and ease of implementation) compared with Lanczos iteration while still providing an accurate estimate of singular values and singular vectors with high probability. The method can be extended to extract a set of vectors that span the measurable subspace ${N_ \bot}({\textbf{H}})$ of an imaging operator, but the overall efficiency of the algorithm is dependent upon the rank $R$ of the considered matrix. Note that an rSVD does not compute the null space projection operator directly. Instead, this method constructs the basis vectors for ${N_ \bot}({\textbf{H}})$, from which the null space projection operator can be formed by using Eq. (9).

For ease of presentation, the method described below assumes the rank of ${\textbf{H}}$ is known. However, if the rank is not known, the range extension method [27] can be applied to iteratively construct a basis of ${N_ \bot}({\textbf{H}})$. The rSVD method is based on matrix sketching [27], which seeks to find a low-rank approximation of ${\textbf{H}}$ by computing its action on a set of randomly chosen objects. This method can be summarized in three steps: (1) find a matrix with orthonormal columns, ${\textbf{Q}} \in {\mathbb E}^{{M \times R}}$, such that

$${\textbf{H}} \approx {\textbf{Q}}{{\textbf{Q}}^\dagger}{\textbf{H}},$$

(2) construct the small matrix ${\textbf{B}} = {{\textbf{Q}}^\dagger}{\textbf{H}}$ and compute its thin SVD decomposition, and (3) multiply the left singular vectors of ${\textbf{B}}$ by ${\textbf{Q}}$ to obtain the truncated SVD of ${\textbf{H}}$.

A matrix ${\textbf{Q}}$ with orthonormal columns that satisfies Eq. (15) can be found by constructing an orthogonal basis of ${\textbf{Y}} = {\textbf{H}}\boldsymbol\Omega$, where $\boldsymbol\Omega \in {\mathbb E}^{{N \times S}}$ is a dense matrix whose columns are sampled from a multivariate independent identically distributed (i.i.d.) Gaussian distribution [27] and $S = R + C$. Here, $C = 5 \sim 10$ represents a small oversampling factor necessary to achieve an accurate approximation in Eq. (15) with high probability.

In addition to using a larger oversampling factor $C$, accuracy can also be increased by applying the following power iteration scheme:

$${\textbf{Y}} = ({\textbf{H}}{{\textbf{H}}^\dagger}{)^P}{\textbf{H}}{\boldsymbol\Omega} \quad {\rm{where}}\,\,{P} \ge {{0}}\,\,{\rm{is}}\,\,{\rm{a}}\,\,{\rm{small}}\,\,{\rm{integer}},$$
leading to the so-called accuracy-enhanced rSVD [41].

A sample algorithm for computing the rSVD is provided in Algorithm 2. Here, $\texttt{qr}(\,)$ and $\texttt{svd}(\,)$ refer to the thin (also referred to as economy-size) QR and SVD factorizations in [29]. The thin SVD only computes ${\min}(N,M)$ singular vectors and values, resulting in significant computational savings when the rank $R$ of the matrix is not known.


Algorithm 2. Randomized Singular Value Decomposition

In terms of computational complexity, Algorithm 2 requires ${\cal O}(R + C)$ MVPs to construct ${\textbf{Y}}$, ${\cal O}(M{(R + C)^2})$ FLOPs to compute ${\textbf{Q}}$, and ${\cal O}(N{(R + C)^2})$ FLOPs to compute the thin SVD of ${\textbf{B}}$. In terms of space required, the most costly steps are the products ${\textbf{H}}\boldsymbol\Omega$ and ${{\textbf{H}}^\dagger}{\textbf{Q}}$, which give this algorithm a space complexity of ${\cal O}(\max (N,M)(R + C))$.
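The three steps above can be sketched compactly in NumPy. Here `randomized_range` is a hypothetical helper, and the operator, sizes, oversampling, and number of power iterations are illustrative assumptions:

```python
import numpy as np

def randomized_range(H, S, P=1, seed=None):
    """Accuracy-enhanced randomized range finder: returns Q with orthonormal
    columns such that H ≈ Q Q^† H, using the power iteration of Eq. (16)."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((H.shape[1], S))   # Gaussian sketching matrix
    Y = H @ Omega
    for _ in range(P):                             # Y = (H H^†)^P H Omega
        Y = H @ (H.conj().T @ Y)
    Q, _ = np.linalg.qr(Y)
    return Q

rng = np.random.default_rng(5)
M, N, R = 40, 60, 10
H = rng.standard_normal((M, R)) @ rng.standard_normal((R, N))

# Steps (1)-(3): range finder, thin SVD of the small matrix B, lift by Q
Q = randomized_range(H, S=R + 5, P=2, seed=7)
B = Q.conj().T @ H
Vb, s, Uh = np.linalg.svd(B, full_matrices=False)
V = Q @ Vb                                  # left singular vectors of H

# Domain singular vectors spanning N_perp(H) give P_meas (Eq. (10)),
# and hence P_null via Eq. (9)
U = Uh.conj().T[:, :R]
P_null = np.eye(N) - U @ U.conj().T
assert np.linalg.norm(H @ P_null, 2) < 1e-8 * s[0]
assert np.allclose(V @ np.diag(s) @ Uh, H)
```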

C. Learning-Based Approaches

An alternative approach to establishing a null space projection operator that has been recently proposed is to consider a learning-based framework [28]. In learning-based methods, the problem is framed as the minimization of an empirical risk function and is solved using stochastic optimization algorithms. Mathematically, the problem of establishing a null space projection operator can be formulated as finding a rank $K$ orthogonal projection matrix ${\textbf{P}}$ such that

$${\textbf{H}} {\textbf{P}} {\textbf{f}} = {\textbf{0}}\quad \forall \;{\textbf{f}} \in {\mathbb E}^{N}.$$

To identify an operator ${\textbf{P}}$ that satisfies Eq. (17), a learning approach that employs an autoencoder can be formulated as described below.

Autoencoders (AEs) are artificial neural networks that were originally developed for data compression tasks [42–44]. An AE has two components: an encoder and a decoder. The encoder transforms the input data to a lower-dimensional embedding space. The decoder transforms the embedded data back into the input space. The AE is trained to learn a reduced dimensionality approximation of the input, called the embedding, which minimizes lost information. Given ${N_T}$ training pairs of inputs $\{{{\textbf{f}}_l}\} _{l = 1}^{{N_T}}$ and outputs $\{{{{\hat {\textbf f}}}_l}\} _{l = 1}^{{N_T}}$, the AE can be trained to minimize the mean square error loss function

$${L_{\textit{AE}}}({{\textbf{W}}_E},{{\textbf{W}}_D}) = \frac{1}{{{N_T}}}\sum\limits_{l = 1}^{{N_T}} \left\| {{{\textbf{W}}_D^{}}{\textbf{W}}_E^\dagger {{\textbf{f}}_l} - {{{{\hat {\textbf f}}}}_l}} \right\|_2^2,$$
where ${{\textbf{W}}_E},{{\textbf{W}}_D} \in {\mathbb E}^{{N \times A}}$ represent the encoder and decoder, respectively, and $A$ denotes the size of the embedding. When the weights of an AE are tied, ${{\textbf{W}}_E} = {{\textbf{W}}_D}$ [45].

A framework for learning the projection operator (LPO) using a tied-weight linear AE was presented in [28]. Specifically, the null space projection operator is represented as ${{\textbf{P}}_{{\rm{null}}}} = {\textbf{W}}{{\textbf{W}}^\dagger}$, for a properly constructed encoder ${\textbf{W}} \in {\mathbb E}^{{N \times K}}$ with orthogonal columns. As with the WB and rSVD methods described above, explicit access to the entries of ${\textbf{H}}$ is not required. Instead, only the ability to compute the action of ${\textbf{H}}$ and ${{\textbf{H}}^\dagger}$ is needed. Note that direct application of Eq. (18) to learn the null space projection operator is not feasible, since it would require one to precompute the null space components ${{{\hat {\textbf f}}}_l}$ of a large ensemble of objects. Instead, recalling that ${{{\hat {\textbf f}}}_l} \in N({\textbf{H}})$ implies ${\textbf{H}}{{{\hat {\textbf f}}}_l} = {\textbf{0}}$, a self-supervised learning method is obtained.

The parameters of the autoencoder are trained by solving the constrained minimization problem

$$\mathop {\min}\limits_{{\textbf{W}} \in St(N,K)} \frac{1}{{{N_T}}}\sum\limits_{l = 1}^{{N_T}} ||{\textbf{HW}}{{\textbf{W}}^\dagger}{{\textbf{f}}_l}||_2^2,$$
where $\{{{\textbf{f}}_l}\} _{l = 1}^{{N_T}}$ are randomly sampled i.i.d. Gaussian vectors spanning ${\mathbb U}$ and the Stiefel manifold, $St(N,K)$, is the set of matrices in ${\mathbb E}^{{N \times K}}$ with orthonormal columns [46].

In general, there are two approaches for imposing the orthogonality constraint when solving Eq. (19): soft and hard. In the soft approach, the orthogonality constraint is relaxed by adding a nonconvex penalization term [45,47]. However, care must be taken to ensure that the violation of the orthogonality constraint is negligible. The hard approach guarantees orthogonality of ${\textbf{W}}$ by projecting the current iterate onto the Stiefel manifold at each step of the optimization [48–50].

In the studies presented below, the hard approach for imposing orthogonality constraints was implemented to solve Eq. (19). Optimization on the Stiefel manifold was performed using an accelerated Cayley stochastic gradient descent method [48], which is described in Algorithm 3. At the $t$th iteration of the algorithm, a stochastic approximation ${{\cal J}_t}$ of the cost functional in Eq. (19) is defined as

$${{\cal J}_t}({\textbf{W}}) = \frac{1}{{{N_B}}}\sum\limits_{i = 1}^{{N_B}} \parallel {\textbf{HW}}{{\textbf{W}}^\dagger}{{\textbf{f}}_i}\parallel _2^2,$$
where ${N_B}$ is the batch size, and a new batch of objects $\{{{\textbf{f}}_i}\} _{i = 1}^{{N_B}}$ is randomly sampled at each iteration [28]. The heavy ball acceleration method [51] was used to define the update direction as
$${{\textbf{M}}_ +} = \beta {{\textbf{M}}^{(t)}} - \alpha {{\textbf{G}}^{(t)}},$$
where ${{\textbf{G}}^{(t)}}$ is the gradient of ${{\cal J}_t}({\textbf{W}})$ at the current point ${{\textbf{W}}^{(t)}}$, ${{\textbf{M}}^{(t)}}$ is the momentum term (${{\textbf{M}}^{(0)}} = {\textbf{0}}$), and $\beta \lt 1$ is the damping term. Note that ${{\textbf{G}}^{(t)}}$ can be efficiently computed through automatic differentiation.

Algorithm 3. Accelerated Cayley Stochastic Gradient Descent


Algorithm 4. Cayley Transform Projection on the Stiefel Manifold

Algorithm 4 is then used to project the update direction onto the tangent space to the Stiefel manifold and to compute the next iterate ${{\textbf{W}}^{(t + 1)}} \in St(N,K)$. It can be shown [48] that the projection of ${{\textbf{M}}_ +}$ on the tangent space at point ${{\textbf{W}}^{(t)}}$ is given by

$$\begin{split}{{\textbf{M}}^{(t + 1)}} = & \,\, {{\textbf{A}}^{(t)}} {{\textbf{W}}^{(t)}},\quad {\rm{with}}\\ {{\textbf{A}}^{(t)}} = & \,\, 2 {\rm{skew}}\left({{\textbf{M}}_ +}{{\left({{{\textbf{W}}^{(t)}}} \right)}^\dagger} -\right. \\ &\left. \frac{1}{2}{{\textbf{W}}^{(t)}}{{\left({{{\textbf{W}}^{(t)}}} \right)}^\dagger}{{\textbf{M}}^{(t)}}{{\left({{{\textbf{W}}^{(t)}}} \right)}^\dagger} \right),\end{split}$$
where ${\rm{skew}}({\textbf{B}}) = \frac{1}{2}({\textbf{B}} - {{\textbf{B}}^\dagger})$ is the skew-Hermitian part of the matrix ${\textbf{B}}$. The next iterate ${{\textbf{W}}^{(t + 1)}} \in St(N,K)$ is then computed as
$${{\textbf{W}}^{(t + 1)}} = {{\textbf{Q}}^{(t)}}(\alpha){{\textbf{W}}^{(t)}},$$
where $\alpha \gt 0$ is the learning rate and ${{\textbf{Q}}^{(t)}}(\alpha) \in \mathbb R^{{N \times N}}$ is an orthonormal matrix stemming from the Cayley transform of $\frac{\alpha}{2}{{\textbf{A}}^{(t)}}$ [52]. Specifically, ${{\textbf{Q}}^{(t)}}(\alpha)$ is defined as
$$\begin{split}{{\textbf{Q}}^{(t)}}(\alpha) &= {\left({{{\textbf{I}}_N} - \frac{\alpha}{2}{{\textbf{A}}^{(t)}}} \right)^{- 1}}\left({{{\textbf{I}}_N} + \frac{\alpha}{2}{{\textbf{A}}^{(t)}}} \right) \\&\approx \sum\limits_{s = 0}^S {\left({\frac{\alpha}{2}{{\textbf{A}}^{(t)}}} \right)^s}\left({{{\textbf{I}}_N} + \frac{\alpha}{2}{{\textbf{A}}^{(t)}}} \right),\end{split}$$
where a truncated Neumann series expansion is used to approximate the inverse of ${{\textbf{I}}_N} - \frac{\alpha}{2}{{\textbf{A}}^{(t)}}$. In practice, truncating the Neumann series after a few terms ($S \le 4$) can be sufficient to ensure orthogonality of ${{\textbf{W}}^{(t + 1)}}$ up to machine precision [48]. Note that, for computational efficiency in the evaluation of Eqs. (22) and (23), ${{\textbf{A}}^{(t)}}$ is not explicitly computed or stored; instead, the action of ${{\textbf{A}}^{(t)}}$ is computed on-the-fly through matrix-vector products.
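The two update steps of Eqs. (22) and (23) can be sketched together in NumPy as follows. The function names are our own; `apply_A` is assumed to compute the action of ${{\textbf{A}}^{(t)}}$ on a matrix through matrix-vector products, so that ${{\textbf{A}}^{(t)}}$ is never formed explicitly, and for real-valued operators the conjugate transpose reduces to the ordinary transpose:

```python
import numpy as np

def skew(B):
    """Skew-Hermitian part of B: (B - B^dagger) / 2."""
    return 0.5 * (B - B.conj().T)

def tangent_projection(W, M_plus, M):
    """Eq. (22): project the update direction onto the tangent space of the
    Stiefel manifold at W; returns the projected momentum A W and A itself."""
    Wd = W.conj().T
    A = 2.0 * skew(M_plus @ Wd - 0.5 * W @ (Wd @ M) @ Wd)
    return A @ W, A

def cayley_update(W, apply_A, alpha, S=4):
    """Eq. (23): W_next = Q(alpha) W, with
    Q(alpha) = (I - (alpha/2) A)^{-1} (I + (alpha/2) A),
    approximating the inverse by a truncated Neumann series of order S."""
    a = 0.5 * alpha
    Y = W + a * apply_A(W)          # (I + a A) W
    out = Y.copy()
    term = Y
    for _ in range(S):
        term = a * apply_A(term)    # accumulate (a A)^s (I + a A) W
        out = out + term
    return out
```

Because ${{\textbf{A}}^{(t)}}$ is skew-Hermitian, the exact Cayley transform is orthogonal, so the iterates remain on the Stiefel manifold up to the truncation error of the series.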

Finally, Algorithm 3 stops at iteration $t$ if ${r^{(t)}} = \frac{1}{{{N_V}}}\sum\nolimits_{j = 1}^{{N_V}} \parallel {\textbf{H}}{{\textbf{W}}^{(t)}}{({{{\textbf{W}}^{(t)}}})^\dagger}{{\textbf{f}}_j}\parallel _2^2 \le \varepsilon$ for a given tolerance $\varepsilon \ge 0$. Here, ${N_V}$ denotes the size of the validation data set $\{{{\textbf{f}}_j}\} _{j = 1}^{{N_V}}$. Since evaluating the stopping criterion requires additional applications of the imaging operator, it is computed only every ${N_E}$ iterations to increase computational efficiency. The above approach can be applied to solve for the null space projection operator directly. However, if $K \gt N/2$, it is more computationally efficient to construct the projection operator onto the measurable space and define ${{\textbf{P}}_{{\rm{null}}}}$ according to Eq. (9). In this case, ${\textbf{W}} \in {\mathbb R}^{N \times R}$, and the cost functional in Eq. (20) is replaced by

$${{\cal J}_t}({\textbf{W}}) = \frac{1}{{{N_B}}}\sum\limits_{i = 1}^{{N_B}} \parallel {\textbf{H}}({{\textbf{I}}_N} - {\textbf{W}}{{\textbf{W}}^\dagger}){{\textbf{f}}_i}\parallel _2^2.$$

Each iteration of the algorithm requires ${\cal O}({N_B})$ MVPs and ${\cal O}(N{A^2})$ linear algebra FLOPs, and has a space complexity of ${\cal O}(NA)$, where $A = \min (R,K)$. It is important to note that the number of iterations necessary to achieve an accurate estimate of ${{\textbf{P}}_{{\rm{null}}}}$ depends on the condition number of the imaging operator, the batch size ${N_B}$, and the choice of the training set.
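For reference, the action of the implied null space projector and the batch cost in Eq. (24) can be evaluated without ever forming the projector matrix. In this sketch, `apply_H` stands for the on-the-fly action of the imaging operator; the function names are our own:

```python
import numpy as np

def null_component(W, F):
    """(I - W W^dagger) F, applied column-wise without forming the projector."""
    return F - W @ (W.conj().T @ F)

def batch_cost(W, apply_H, F):
    """J(W) = (1/N_B) sum_i || H (I - W W^dagger) f_i ||_2^2,
    with the batch objects f_i stored as the columns of F."""
    residual = apply_H(null_component(W, F))
    return np.sum(np.abs(residual) ** 2) / F.shape[1]
```

When the columns of `W` form an orthonormal basis of the measurable space, the cost vanishes up to round-off, which is the condition the optimization drives toward.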

4. NUMERICAL STUDIES

Numerical studies were conducted to assess the accuracy, computational cost, and memory consumption of the WB, randomized singular value decomposition (rSVD), and learned projection operator (LPO) methods discussed above. Computational costs were evaluated in terms of the number of evaluations of the imaging operator and its adjoint (MVPs) as well as the execution time (wall time) of the algorithm. In particular, since the MVPs involving the imaging operator and its adjoint account for the majority of the floating point operations in the surveyed methods, the number of MVPs provides a reliable measure of computational cost that is independent of the specific implementation choices and hardware used. To evaluate each method, the discrete Radon transform (RT) and circular Radon transform (CRT) are considered in Section 4.A. Additional results comparing the accuracy and computational efficiency of the rSVD and LPO methods can be found in [28], where a Fourier-based imaging operator is considered. Following the results of this study, the null space components of the RT and CRT are compared under limited angular and view sampling conditions in Section 4.B.

For the numerical studies in Section 4.A, two data sets were used to evaluate the accuracy of the methods applied to a set of benchmark problems involving the RT and CRT imaging operators. The data set used for the RT benchmark problems consists of clinical 2D x-ray CT axial slices of human chests [53,54]. The other data set, used for the CRT benchmark problems, consists of 2D cross-sectional slices of impedance maps extracted from 3D anatomically realistic acoustic numerical breast phantoms [55]. Objects from both data sets were normalized between 0 and 1.

Algorithmic parameters for the three methods used in the numerical studies are shown in Table 1. The tolerance $\varepsilon$ for the WB and LPO stopping criteria was chosen based on the solution obtained through rSVD. To train the LPO method, a single object ${{\textbf{f}}_i}$ was drawn from a zero-mean Gaussian distribution with an identity covariance matrix at each iteration. Objects in the validation set were randomly selected from the two data sets described above. The stopping criterion was evaluated at the end of each epoch, i.e., every ${N_E}$ iterations.

Table 1. Algorithmic Parameters Used in the Numerical Studies

All numerical studies were conducted on a system with an Intel Xeon E5-2620v4 central processing unit (CPU), with eight cores (16 threads) clocked at 2.1 GHz and 256 GB of DDR4 memory. Graphics processing units (GPUs) were not used in these studies. All methods were implemented using the Python programming language. The NumPy (version 1.19.2) [56] and SciPy (version 1.4.1) [57] libraries were employed in the implementation of the WB and rSVD methods. The TensorFlow (version 2.2.0) [58] library was used in the implementation of the LPO method. A coordinate list format [59] representation of the imaging operators was constructed using AIR Tools II (version 1.0) [60] and loaded into Python as SciPy and TensorFlow sparse operators. To estimate memory usage, a low-overhead Python profiler, memory_profiler [61], was used. Execution time was measured as wall time while utilizing all 16 CPU threads.

A. Radon Transform and Circular Radon Transform-Based Imaging

1. Imaging Operators

Discrete RT and CRT operators were employed to model a stylized x-ray transmission imaging system and a stylized ultrasound reflectivity imaging system, respectively [62,63]. The number of pixels in the object is denoted as $N = {n_x}{n_y}$, with ${n_x}$ and ${n_y}$ representing the number of pixels in the $x$ and $y$ directions, respectively (${n_x} = {n_y}$ in all studies). The RT imaging operator was implemented using Siddon’s ray tracing approach [64]. This method assumes that the object is piecewise constant on each pixel and computes the exact integral along the ray path based on the length of the intersection between the ray and each pixel. The CRT imaging operator was implemented in a similar manner; however, the straight ray paths were replaced with circular paths. The total number of tomographic views considered for the RT was 30, with $\textit{round}(\sqrt 2 {n_x})$ rays per view. For the CRT operator, the total number of tomographic views considered was 64, with $\textit{round}(\sqrt 2 {n_x})$ concentric circles per view. The numerical rank of ${\textbf{H}}$ was determined by visual inspection of the spectrum of ${{\textbf{H}}^\dagger}{\textbf{H}}$, which was computed using the sparse eigensolver ${\texttt{eigs}}$ in MATLAB [65]. In particular, the rank $R$ was found by plotting the eigenvalues $\{{\mu _n}\}$ of ${{\textbf{H}}^\dagger}{\textbf{H}}$ on a logarithmic scale and selecting the smallest index $R$ such that ${\mu _R} \ll {\mu _1}$ and ${\mu _R}/{\mu _{R + 1}} \gg 1$.
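The rank-selection step can be mimicked in Python with SciPy's sparse eigensolver in place of MATLAB's `eigs`. The sketch below implements only the spectral-gap part of the heuristic (the smallest $R$ with ${\mu _R}/{\mu _{R + 1}} \gg 1$), with an illustrative gap threshold rather than the visual inspection used in the paper:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def estimate_rank(H, k, gap_tol=1e6):
    """Estimate the numerical rank of H from the k largest eigenvalues of
    H^dagger H, computed matrix-free, by locating a large spectral gap."""
    M, N = H.shape
    HtH = LinearOperator((N, N), matvec=lambda x: H.T @ (H @ x))
    mu = eigsh(HtH, k=k, which='LM', return_eigenvectors=False)
    mu = np.sort(np.abs(mu))[::-1]           # descending eigenvalues
    for R in range(1, len(mu)):
        if mu[R - 1] / max(mu[R], np.finfo(float).tiny) > gap_tol:
            return R
    return len(mu)                           # no clear gap found
```

Because `eigsh` accepts a `LinearOperator`, only the action of ${\textbf{H}}$ and its adjoint is needed, in keeping with the matrix-free spirit of the surveyed methods.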

Table 2. Definition of Benchmark Problems: Name of the Benchmark Problem, Number of Tomographic Views, Size of the Object ($N$), Size of the Data ($M$), Rank of the Imaging Operator ($R$), and Spectral Condition Number ($\kappa$)

Fig. 1. Estimation of the null space component for a single image. Object (a) corresponds to RT-large and object (b) corresponds to CRT-large in Table 2. From left to right of each panel: original object (a), (b), null space component obtained by applying the WB method (c), (d) and the LPO method (e), (f). The scales of ${\textbf{f}}$ and ${{\textbf{f}}_{{\rm{null}}}}$ are [0, 1] and [$-0.35, 0.35$], respectively.

Four benchmark problems were considered in the numerical experiments: RT-small, RT-large, CRT-small, CRT-large. The object size, number of views, size, rank, and condition number of the imaging operator ${\textbf{H}}$ for each benchmark problem are summarized in Table 2. The implementation of the WB and LPO methods was executed successfully for all benchmark problems. However, the implementation of the rSVD method aborted for the two large-scale tests (RT-large, CRT-large) due to an out-of-memory (OOM) error. Figure 1 shows examples of null space components computed using the WB and LPO methods applied to the RT-large (top) and CRT-large (bottom) benchmark problems.

Table 3. Accuracy ${e_L}$ of the Null Space Projector for the Benchmark Problems in Table 2a

Table 4. Computational Cost of Determining the Null Space Projection Operator for Each Benchmark Problem in Table 2a

2. Accuracy Evaluation

No analytical expression of the null-space projection operator for the discrete RT and CRT exists. For this reason, the norm of the projection residual ${\textbf{r}} = {\textbf{H}}{{\textbf{P}}_{{\rm{null}}}}{\textbf{f}}$ is used here to define a quantitative measure of accuracy for the WB, rSVD, and LPO methods. Specifically, given an ensemble ${\{{{\textbf{f}}_l}\} _l}$ of $L = 100$ objects from the 2D acoustic numerical breast phantoms data set, an empirical quantitative measure of accuracy ${e_L}$ is defined as

$${e_L} = \frac{1}{L}\sum\limits_{l = 1}^L \frac{1}{N}\parallel {\textbf{H}}{{\textbf{P}}_{{\rm{null}}}}{{\textbf{f}}_l}\parallel _2^2.$$

Note that, in the absence of any numerical errors, ${e_L} = 0$.
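In code, the metric of Eq. (25) is a short reduction over the object ensemble. In this sketch, `apply_Pnull` can wrap either an explicit projector matrix or its matrix-free action; the function name is our own:

```python
import numpy as np

def accuracy_eL(apply_H, apply_Pnull, objects):
    """e_L = (1/L) sum_l (1/N) || H P_null f_l ||_2^2 over an object ensemble."""
    L = len(objects)
    N = objects[0].size
    total = sum(np.sum(np.abs(apply_H(apply_Pnull(f))) ** 2) for f in objects)
    return total / (L * N)
```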

Table 3 reports the accuracy ${e_L}$ achieved by the three methods on the four benchmark problems. The stopping criterion used for the WB and LPO methods is defined at the beginning of Section 4. With the exception of rSVD, whose execution failed for the two large-scale problems (“RT-large” and “CRT-large”) due to memory constraints, all methods achieved high accuracy.

3. Computational Performance Evaluation

The three methods were also evaluated in terms of number of applications of the imaging operator and its adjoint (MVPs), wall time, and peak memory usage. As discussed in Section 4, the number of iterations of the WB and LPO methods was chosen to obtain the same accuracy as that achieved by rSVD on the small-scale problems.

The results for this study are provided in Table 4. For the WB method, two values are shown for each metric. WB-single represents the cost (MVPs, wall time, or peak memory usage) of applying the WB method to compute the null-space component of a single object. WB-projection represents an estimate of the cost of computing an explicit null-space projection operator by applying the WB method to $\min (K,R)$ objects, as detailed in Section 3.A. For all benchmark problems, the WB method required the largest number of MVPs and the most computational time but consumed the least memory to form an explicit null-space projection operator. The rSVD method required the smallest number of MVPs and the least wall time for the small-scale benchmark problems but aborted on the large-scale benchmark problems due to an OOM error. Finally, for all benchmark problems, the LPO method showed a significant speed-up (in both MVPs and wall time) with respect to the WB method, while requiring about four times as much memory.

B. Visualization of Null-Space Components of RT and CRT Imaging Operators under Different Sampling Conditions

To visualize the impact of limited angular sampling on the imaging operators’ null space, two studies were conducted. In the first study, the spacing between each tomographic view was fixed at $\Delta \phi = \frac{\pi}{{120}}$, and the angular scanning range was varied from 0 to $2\pi$. In the second study, both limited-angle and limited-view sampling conditions were considered simultaneously. The number of views for both the CRT and RT was set to 60, and the angular scanning range $\phi$ was varied from 0 to $2\pi$. An object from the 2D acoustic numerical breast phantoms and USCT measurement data set [55] was used in these studies. The WB method was chosen for these studies due to its superior time and space complexity when computing the null space component of a single object.

Fig. 2. Comparison of the null space component of a single object for the case of the RT (second column) and CRT (third column). The object is displayed in the first column, along with an indication of the tomographic scanning range, which is different for each row. The angular spacing between tomographic views is fixed. A symmetric logarithm was applied to the scale of the colorbar to enhance contrast.

Fig. 3. Comparison of the null space component of a single object for the case of the RT (second column) and CRT (third column). The object is displayed in the first column, along with an indication of the tomographic scanning range, which is different for each row. The number of tomographic views for each case was fixed at 60, which were evenly distributed across the different angular scanning ranges. A symmetric logarithm was applied to the scale of the colorbar to enhance contrast.

The results for the case with fixed angular spacing are shown in Fig. 2. When the angular scanning range $\phi$ is less than $\pi$ for the RT case or less than $3\pi /2$ for the CRT case, the null space components contain certain structures in the regions that are outside the convex hull of the measurement geometry. This is consistent with microlocal analysis results demonstrating that the boundaries of an object can be stably reconstructed only when $\phi \ge \pi$ and $\phi \ge 3\pi /2$ for the RT and CRT, respectively [66–68].

The results for the case where the number of views is fixed are shown in Fig. 3. Although the number of views is fixed for all examples, it is clear that the scanning range can have a large impact on the null space component. For example, in the RT case, the magnitude of the null space component increases as the angular range is increased from $\pi$ to $2\pi$ due to the overlap in parallel beams. It is important to note that this does not necessarily imply that these imaging operators are inferior to those with an angular range of $\pi$; if one were to perform image reconstruction, the reconstructed images would be more robust to noise due to the overlap in parallel beams [62].

5. DISCUSSION AND CONCLUSION

This tutorial surveys three computational methods to construct an explicit null space projection operator for a linear discrete-to-discrete imaging operator: the Wilson–Barrett (WB), randomized singular value decomposition (rSVD), and learned projection operator (LPO) methods. The key feature of all three methods is that the imaging operator need not be stored explicitly as a matrix. Instead, only the ability to compute the actions of the imaging operator and its adjoint on-the-fly through matrix-vector products is required. The computational and memory requirements of the methods were analyzed. Additionally, numerical studies were conducted that involved the Radon transform and circular Radon transform imaging operators.

Table 5. Summary of Symbols and Notation Used in This Tutorial Paper

The results of our studies confirmed that all three methods can accurately compute the null space component of an object if sufficient computer memory is available. When the null space component of a single object is needed, WB requires the lowest amount of wall time and consumes the least amount of memory. However, when an explicit null space projection operator is desired, the rSVD and LPO methods significantly outperform the WB method in both number of imaging operator evaluations and wall time. Although computationally more expensive, the LPO method requires significantly less memory than rSVD. This allows LPO to address larger scale problems.

In this tutorial, the rank of the imaging operator was assumed to be known. In practice, this assumption may not hold. In such cases, a desirable feature of the WB method is its ability to compute the null space component of a single object without requiring prior knowledge of the rank of the imaging operator. Generalizations of rSVD to the case of unknown rank, such as the range extension algorithm, are also well known [27]. For additional information on computing the numerical rank of an imaging operator, the reader is encouraged to consult [69–71].

Several topics for future investigation still remain. A limitation of the rSVD and LPO methods surveyed here is that they explicitly construct a basis of the null or measurable subspace. Although WB can circumvent this issue by representing the projection operator implicitly through an iterative algorithm, the computational cost of WB is prohibitive when the null space component of a large number of objects is desired. Future studies should aim at circumventing these limitations by fully leveraging the flexibility of the LPO method. By framing the construction of the null space projection operator as a learning problem, a wide variety of advanced neural network architectures can be employed. Such extensions hold the promise of enabling computationally efficient evaluation of null space components for large ensembles of objects without relying on an explicit computation of a basis for the null or measurable subspace.

APPENDIX A: NOTATION

Table 5 summarizes the symbols and notation used in this tutorial.

Funding

National Institutes of Health (EB020604, EB023045, EB028652, NS102213).

Disclosures

The author declares no conflicts of interest.

Data availability

The Python code implementing the surveyed methods as well as a few usage examples is publicly available under GPL 3.0 license. It can be downloaded from [72]. Information regarding the availability of the chest x-ray and acoustic breast phantoms data sets used in the numerical studies can be found in [53,55].

REFERENCES

1. F. Natterer, The Mathematics of Computerized Tomography (SIAM, 2001).

2. M. Bertero and P. Boccacci, Introduction to Inverse Problems in Imaging (CRC Press, 2020).

3. H. H. Barrett and K. J. Myers, Foundations of Image Science (Wiley, 2013).

4. C. L. Epstein, Introduction to the Mathematics of Medical Imaging (SIAM, 2007).

5. M. A. Anastasio and R. W. Schoonover, “Basic principles of inverse problems for optical scientists,” in Encyclopedia of Applied Physics (2003), pp. 1–24.

6. A. K. Louis and W. Törnig, “Ghosts in tomography–the null space of the Radon transform,” Math. Methods Appl. Sci. 3, 1–10 (1981).

7. R. M. Lewitt, “Multidimensional digital image representations using generalized Kaiser–Bessel window functions,” J. Opt. Soc. Am. A 7, 1834–1846 (1990).

8. K. M. Hanson and G. W. Wecksung, “Local basis-function approach to computed tomography,” Appl. Opt. 24, 4028–4039 (1985).

9. K. Wang, R. W. Schoonover, R. Su, A. Oraevsky, and M. A. Anastasio, “Discrete imaging models for three-dimensional optoacoustic tomography using radially symmetric expansion functions,” IEEE Trans. Med. Imaging 33, 1180–1193 (2014).

10. H. H. Barrett, J. Aarsvold, and T. Roney, “Null functions and eigenfunctions: tools for the analysis of imaging systems,” Prog. Clin. Biol. Res. 363, 211–226 (1991).

11. J. Aarsvold and H. Barrett, “Symmetries of single-slice multiple-pinhole tomographs,” in IEEE Nuclear Science Symposium (IEEE, 1996), Vol. 3, pp. 1673–1677.

12. G. L. Zeng and G. T. Gullberg, “Null-space function estimation for the interior problem,” Phys. Med. Biol. 57, 1873–1877 (2012).

13. A. K. Jha, H. H. Barrett, E. C. Frey, E. Clarkson, L. Caucci, and M. A. Kupinski, “Singular value decomposition for photon-processing nuclear imaging systems and applications for reconstruction and computing null functions,” Phys. Med. Biol. 60, 7359–7385 (2015).

14. Y. Ding, L. Caucci, and H. H. Barrett, “Null functions in three-dimensional imaging of alpha and beta particles,” Sci. Rep. 7, 15807 (2017).

15. C. G. Graff and E. Y. Sidky, “Compressive sensing in medical imaging,” Appl. Opt. 54, C23–C44 (2015).

16. B. Smith, “Null-space smoothing of tomographic images using TV norm minimization,” in IEEE Nuclear Science Symposium, Medical Imaging Conference and Room-Temperature Semiconductor Detector Workshop (NSS/MIC/RTSD) (IEEE, 2016), pp. 1–4.

17. B. N. Hahn, “Null space and resolution in dynamic computerized tomography,” Inverse Probl. 32, 025006 (2016).

18. B. Kelly, T. P. Matthews, and M. A. Anastasio, “Deep learning-guided image reconstruction from incomplete data,” arXiv preprint arXiv:1709.00584 (2017).

19. J. Schwab, S. Antholzer, and M. Haltmeier, “Deep null space learning for inverse problems: convergence analysis and rates,” Inverse Probl. 35, 025008 (2019).

20. P. S. Rowbotham and R. G. Pratt, “Improved inversion through use of the null space,” Geophysics 62, 869–883 (1997).

21. M. M. Deal and G. Nolet, “Nullspace shuttles,” Geophys. J. Int. 124, 372–380 (1996).

22. G. Ongie, A. Jalal, C. A. Metzler, R. G. Baraniuk, A. G. Dimakis, and R. Willett, “Deep learning techniques for inverse problems in imaging,” IEEE J. Sel. Areas Inf. Theory 1, 39–56 (2020).

23. S. Bhadra, V. A. Kelkar, F. J. Brooks, and M. A. Anastasio, “On hallucinations in tomographic image reconstruction,” IEEE Trans. Med. Imaging 40, 3249–3260 (2021).

24. R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. V. der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd ed. (SIAM, 1994).

25. V. Klema and A. Laub, “The singular value decomposition: its computation and some applications,” IEEE Trans. Autom. Control 25, 164–176 (1980).

26. D. W. Wilson and H. H. Barrett, “Decomposition of images and objects into measurement and null components,” Opt. Express 2, 254–260 (1998).

27. N. Halko, P.-G. Martinsson, and J. Tropp, “Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions,” SIAM Rev. 53, 217–288 (2011).

28. J. Kuo, J. Granstedt, U. Villa, and M. A. Anastasio, “Learning a projection operator onto the null space of a linear imaging operator,” Proc. SPIE 11595, 115953X (2021).

29. G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed. (Johns Hopkins University, 2013).

30. G. Strang, Introduction to Linear Algebra (Wellesley-Cambridge, 1993), Vol. 3.

31. E. Anderson, Z. Bai, C. Bischof, L. S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users’ Guide (SIAM, 1999).

32. W. Kuo and M. J. Zuo, Optimal Reliability Modeling: Principles and Applications (Wiley, 2003).

33. L. Landweber, “An iteration formula for Fredholm integral equations of the first kind,” Am. J. Math. 73, 615–624 (1951).

34. M. Jiang and G. Wang, “Convergence of the simultaneous algebraic reconstruction technique (SART),” IEEE Trans. Image Process. 12, 957–961 (2003).

35. A. Kirsch, An Introduction to the Mathematical Theory of Inverse Problems (Springer, 2011), Vol. 120.

36. J. Wang and Y. Zheng, “On the convergence of generalized simultaneous iterative reconstruction algorithms,” IEEE Trans. Image Process. 16, 1–6 (2006).

37. P.-G. Martinsson, V. Rokhlin, and M. Tygert, “A randomized algorithm for the approximation of matrices,” Technical Report 1361 (Department of Computer Science, 2006).

38. Y. Saad, Numerical Methods for Large Eigenvalue Problems, Revised ed. (SIAM, 2011).

39. R. B. Lehoucq, D. C. Sorensen, and C. Yang, ARPACK Users’ Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods (SIAM, 1998).

40. J. Baglama and L. Reichel, “Augmented implicitly restarted Lanczos bidiagonalization methods,” SIAM J. Sci. Comput. 27, 19–42 (2005).

41. P.-G. Martinsson, “Randomized methods for matrix computations,” The Math. Data 25, 187–231 (2019).

42. P. Baldi, “Autoencoders, unsupervised learning, and deep architectures,” in ICML Workshop on Unsupervised and Transfer Learning (2012), pp. 37–49.

43. J. Li, T. Luong, and D. Jurafsky, “A hierarchical neural autoencoder for paragraphs and documents,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (Association for Computational Linguistics, 2015), pp. 1106–1115.

44. P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in 25th International Conference on Machine Learning (Association for Computing Machinery, New York, New York, USA, 2008), pp. 1096–1103.

45. N. Bansal, X. Chen, and Z. Wang, “Can we gain more from orthogonality regularizations in training deep CNNs?” in 32nd International Conference on Neural Information Processing Systems (Curran Associates, Red Hook, New York, USA, 2018), pp. 4266–4276.

46. I. M. James, “Note on Stiefel Manifolds I,” Bull. London Math. Soc. 2, 199–203 (1970).

47. N. Xiao, X. Liu, and Y. Yuan, “A class of smooth exact penalty function methods for optimization problems with orthogonality constraints,” Optimization Methods & Software (2020), pp. 1–37.

48. J. Li, F. Li, and S. Todorovic, “Efficient Riemannian optimization on the Stiefel manifold via the Cayley transform,” in International Conference on Learning Representations (2020).

49. Z. Wen and W. Yin, “A feasible method for optimization with orthogonality constraints,” Math. Program. 142, 397–434 (2013).

50. Y. Nishimori, “Learning algorithm for independent component analysis by geodesic flows on orthogonal group,” in International Joint Conference on Neural Networks (Cat. No. 99CH36339) (1999), Vol. 2, pp. 933–938.

51. B. T. Polyak, “Some methods of speeding up the convergence of iteration methods,” USSR Comput. Math. Math. Phys. 4, 1–17 (1964).

52. Y. Nishimori and S. Akaho, “Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold,” Neurocomputing 67, 106–135 (2005).

53. X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, “ChestX-ray8: hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017).

54. K. Yan, X. Wang, L. Lu, and R. M. Summers, “DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning,” J. Med. Imaging 5, 036501 (2018).

55. F. Li, U. Villa, S. Park, and M. A. Anastasio, “3-D stochastic numerical breast phantoms for enabling virtual imaging trials of ultrasound computed tomography,” IEEE Trans. Ultrason. Ferroelectr. Freq. Control 69, 135–146 (2022).

56. C. R. Harris, K. J. Millman, S. J. van der Walt, et al., “Array programming with NumPy,” Nature 585, 357–362 (2020).

57. P. Virtanen, R. Gommers, T. E. Oliphant, et al., “SciPy 1.0: fundamental algorithms for scientific computing in Python,” Nat. Methods 17, 261–272 (2020).

58. M. Abadi, A. Agarwal, P. Barham, et al., “TensorFlow: large-scale machine learning on heterogeneous systems,” 2015, https://tensorflow.org.

59. J. R. Gilbert, C. Moler, and R. Schreiber, “Sparse matrices in MATLAB: design and implementation,” SIAM J. Matrix Anal. Appl. 13, 333–356 (1992).

60. P. C. Hansen and J. S. Jørgensen, “AIR Tools II: algebraic iterative reconstruction methods, improved implementation,” Numer. Algorithms 79, 107–137 (2018).

61. F. Pedregosa, P. Gervais, T. Forbes, et al., “pythonprofilers/memory_profiler: monitor memory usage of Python code”.

62. A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging (IEEE, 1988).

63. G. Ambartsoumian and P. Kuchment, “A range description for the planar circular Radon transform,” SIAM J. Math. Anal. 38, 681–692 (2006).

64. R. L. Siddon, “Fast calculation of the exact radiological path for a three-dimensional CT array,” Med. Phys. 12, 252–255 (1985).

65. MATLAB, version 9.8.0 (R2020a) (The MathWorks, 2020).

66. J. Frikel and E. T. Quinto, “Artifacts in incomplete data tomography with applications to photoacoustic tomography and sonar,” SIAM J. Appl. Math. 75, 703–725 (2015).

67. X. Pan, Y. Zou, and M. A. Anastasio, “Data redundancy and reduced-scan reconstruction in reflectivity tomography,” IEEE Trans. Image Process. 12, 784–795 (2003).

68. V. P. Krishnan and E. T. Quinto, “Microlocal analysis in tomography,” in Handbook of Mathematical Methods in Imaging (2015), Vol. 1, p. 3.

69. S. Ubaru and Y. Saad, “Fast methods for estimating the numerical rank of large matrices,” in International Conference on Machine Learning (PMLR) (2016), pp. 468–477.

70. J. W. Demmel, Applied Numerical Linear Algebra (SIAM, 1997).

71. T. F. Chan, “Rank revealing QR factorizations,” Linear Algebra Appl. 88, 67–82 (1987).

72. “Null space projection operators,” 2022, https://github.com/comp-imaging-sci/null-space-projection-operators.

Data availability

The Python code implementing the surveyed methods as well as a few usage examples is publicly available under GPL 3.0 license. It can be downloaded from [72]. Information regarding the availability of the chest x-ray and acoustic breast phantoms data sets used in the numerical studies can be found in [53,55].

72. “Null space projection operators,” 2022, https://github.com/comp-imaging-sci/null-space-projection-operators.

53. X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, “ChestX-ray8: hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017).

55. F. Li, U. Villa, S. Park, and M. A. Anastasio, “3-D stochastic numerical breast phantoms for enabling virtual imaging trials of ultrasound computed tomography,” IEEE Trans. Ultrason. Ferroelectr. Freq. Control. 69, 135–146 (2022). [CrossRef]  

Cited By

Optica participates in Crossref's Cited-By Linking service. Citing articles from Optica Publishing Group journals and other participating publishers are listed here.

Alert me when this article is cited.


Figures (3)

Fig. 1.
Fig. 1. Estimation of the null space component for a single image. Object (a) corresponds to RT-large and object (b) corresponds to CRT-large in Table 2. From left to right of each panel: original object (a), (b), null space component obtained by applying the WB method (c), (d) and the LPO method (e), (f). The scales of ${\textbf{f}}$ and ${{\textbf{f}}_{{\rm{null}}}}$ are [0, 1] and [$-0.35, 0.35$], respectively.
Fig. 2.
Fig. 2. Comparison of the null space component of a single object for the case of the RT (second column) and CRT (third column). The object is displayed in the first column, along with an indication of the tomographic scanning range, which is different for each row. The angular spacing between tomographic views is fixed. A symmetric logarithm was applied to the scale of the colorbar to enhance contrast.
Fig. 3.
Fig. 3. Comparison of the null space component of a single object for the case of the RT (second column) and CRT (third column). The object is displayed in the first column, along with an indication of the tomographic scanning range, which is different for each row. The number of tomographic views for each case was fixed at 60, which were evenly distributed across the different angular scanning ranges. A symmetric logarithm was applied to the scale of the colorbar to enhance contrast.

Tables (9)

Algorithm 1. Wilson–Barrett (Landweber) method for computing ${{\textbf{f}}_{{\rm{null}}}}$

Algorithm 2. Randomized Singular Value Decomposition

Algorithm 3. Accelerated Cayley Stochastic Gradient Descent

Algorithm 4. Cayley Transform Projection on the Stiefel Manifold

Table 1. Algorithmic Parameters Used in the Numerical Studies

Table 2. Definition of Benchmark Problems: Name of the Benchmark Problem, Number of Tomographic Views, Size of the Object ($N$), Size of the Data ($M$), Rank of the Imaging Operator ($R$), and Spectral Condition Number ($\kappa$)

Table 3. Accuracy ${e_L}$ of the Null Space Projector for the Benchmark Problems in Table 2a

Table 4. Computational Cost of Determining the Null Space Projection Operator for Each Benchmark Problem in Table 2a

Table 5. Summary of Symbols and Notation Used in This Tutorial Paper

Equations (26)

$${\textbf{g}} = {\textbf{Hf}},$$
$$({{\textbf{f}}_1},{{\textbf{f}}_2})_{{\mathbb U}} \equiv {\textbf{f}}_1^{\dagger} {{\textbf{f}}_2} \equiv \sum\limits_{n = 1}^N f_{1,n}^{*} f_{2,n},$$
$$N({\textbf{H}}) = \{{\textbf{f}} \in {\mathbb U}|{\textbf{Hf}} = {\textbf{0}}\} $$
$${\textbf{f}} = {{\textbf{f}}_{\rm{meas}}} + {{\textbf{f}}_{\rm{null}}},$$
$${\textbf{H}} = \sum\limits_{n = 1}^N \sqrt{\mu_n}\, {{\textbf{v}}_n}{\textbf{u}}_n^{\dagger},$$
$${\textbf{H}} = {\textbf{V}}\Sigma {{\textbf{U}}^\dagger},$$
$${{\textbf{P}}_{{\rm{null}}}}:D({\textbf{H}}) \to N({\textbf{H}}),$$
$${{\textbf{P}}_{{\rm{meas}}}}:D({\textbf{H}}) \to {N_ \bot}({\textbf{H}}).$$
$${{\textbf{P}}_{{\rm{null}}}} = {{\textbf{I}}_N} - {{\textbf{P}}_{{\rm{meas}}}}.$$
$${{\textbf{P}}_{{\rm{meas}}}} = \sum\limits_{n = 1}^R {{\textbf{u}}_n}{\textbf{u}}_n^{\dagger}.$$
$${{\textbf{P}}_{{\rm{null}}}} = \sum\limits_{n = R + 1}^N {{\textbf{u}}_n}{\textbf{u}}_n^{\dagger}.$$
$${{\textbf{H}}^{+}} = \sum\limits_{n = 1}^R \frac{1}{\sqrt{\mu_n}}\,{{\textbf{u}}_n}{\textbf{v}}_n^{\dagger}.$$
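Taken together, the SVD relations above give a direct recipe for computing ${\textbf{P}}_{\rm{meas}}$, ${\textbf{P}}_{\rm{null}}$, and ${\textbf{H}}^{+}$ for small problems. The following NumPy sketch illustrates this under the assumption of a small, hypothetical rank-deficient operator (not one of the paper's benchmark operators); note that `numpy.linalg.svd` returns $({\textbf{V}}, \Sigma, {\textbf{U}}^{\dagger})$ in the paper's notation.

```python
import numpy as np

# Hypothetical small imaging operator H (M x N) of rank R, built as a
# product of two thin random factors. Illustrative only.
rng = np.random.default_rng(0)
M, N, R = 4, 6, 3
H = rng.standard_normal((M, R)) @ rng.standard_normal((R, N))

# Full SVD: H = V Sigma U^dagger. Columns of U span object space.
V, s, Uh = np.linalg.svd(H, full_matrices=True)
U = Uh.conj().T
rank = int(np.sum(s > 1e-10 * s[0]))   # numerical rank (here, R)

# Measurable-space and null-space projectors.
P_meas = U[:, :rank] @ U[:, :rank].conj().T
P_null = np.eye(N) - P_meas

# Moore-Penrose pseudoinverse from the first `rank` singular triples.
H_pinv = U[:, :rank] @ np.diag(1.0 / s[:rank]) @ V[:, :rank].conj().T

# The null space component of any object is invisible to H.
f = rng.standard_normal(N)
assert np.allclose(H @ (P_null @ f), 0.0)
assert np.allclose(H_pinv, np.linalg.pinv(H))
```

For the large-scale operators considered in the tutorial, forming the full SVD this way is infeasible, which motivates the iterative and randomized methods surveyed in the text.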
$${{\textbf{f}}^ {\;(t)}} = {\textbf{A}}{{\textbf{f}}^ {\;(t - 1)}} + {\textbf{b}},$$
$$\begin{split}{{\textbf{f}}^{\;\star}} = \mathop{\arg\!\min}\limits_{{\textbf{f}} \in {\mathbb U}} &\parallel {\textbf{f}} - {{\textbf{f}}^{(0)}}{\parallel ^2}\\ {\rm{subject}}\;{\rm{to:}}&\quad {\textbf{Hf}} = {\textbf{g}}.\end{split}$$
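One concrete instance of the linear iterative scheme ${\textbf{f}}^{(t)} = {\textbf{A}}{\textbf{f}}^{(t-1)} + {\textbf{b}}$ is a Landweber-type iteration in the spirit of Algorithm 1: with ${\textbf{A}} = {\textbf{I}}_N - \lambda {\textbf{H}}^{\dagger}{\textbf{H}}$ and ${\textbf{b}} = {\textbf{0}}$, the component of ${\textbf{f}}$ in the measurable subspace decays geometrically, leaving ${\textbf{f}}_{\rm{null}}$. A minimal NumPy sketch, assuming a toy operator with a known, well-conditioned spectrum so convergence is fast (the dimensions and singular values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, R = 5, 8, 4

# Build a toy H = V Sigma U^dagger with known singular values so the
# iteration converges quickly. Illustrative only.
V, _ = np.linalg.qr(rng.standard_normal((M, R)))
U, _ = np.linalg.qr(rng.standard_normal((N, R)))
s = np.array([1.0, 0.8, 0.5, 0.3])
H = V @ np.diag(s) @ U.T

# Landweber-type iteration f <- (I - lam H^T H) f. The component along
# each measurable singular vector u_n shrinks by (1 - lam*mu_n) per step;
# convergence requires 0 < lam < 2/mu_max.
lam = 1.0 / np.max(s) ** 2
f = rng.standard_normal(N)
f_null = f.copy()
for _ in range(300):
    f_null -= lam * (H.T @ (H @ f_null))

# Compare against the exact SVD-based null space projector.
P_null = np.eye(N) - U @ U.T
assert np.allclose(f_null, P_null @ f)
```

The per-iteration cost is one application of ${\textbf{H}}$ and one of ${\textbf{H}}^{\dagger}$, which is what makes this approach attractive when ${\textbf{H}}$ is only available as a matrix-free operator.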
$${\textbf{H}} \approx {\textbf{Q}}{{\textbf{Q}}^\dagger}{\textbf{H}},$$
$${\textbf{Y}} = ({\textbf{H}}{{\textbf{H}}^\dagger})^{P}\,{\textbf{H}}{\boldsymbol\Omega},\quad {\rm{where}}\ P \ge 0\ {\rm{is\ a\ small\ integer}},$$
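The randomized range finder at the heart of Algorithm 2 can be sketched in a few lines of NumPy: draw a random test matrix ${\boldsymbol\Omega}$, form ${\textbf{Y}}$ as above (the $P$ power iterations sharpen the spectrum when singular values decay slowly), and orthonormalize. All dimensions below are hypothetical, chosen so that $K$ exceeds the exact rank:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, R = 40, 60, 10
H = rng.standard_normal((M, R)) @ rng.standard_normal((R, N))  # rank R

# Randomized range finder: Y = (H H^T)^P H Omega, then orthonormalize.
# K = R + oversampling columns; P power iterations improve accuracy
# when the singular values of H decay slowly.
K, P = R + 5, 2
Omega = rng.standard_normal((N, K))
Y = H @ Omega
for _ in range(P):
    Y = H @ (H.T @ Y)
Q, _ = np.linalg.qr(Y)

# Since K exceeds the exact rank here, Q Q^T H reproduces H to
# numerical precision, i.e., H ~= Q Q^dagger H as in the text.
assert np.allclose(Q @ (Q.T @ H), H)
```

In a practical implementation the products with ${\textbf{H}}$ and ${\textbf{H}}^{\dagger}$ would be matrix-free operator applications, and a small SVD of ${\textbf{Q}}^{\dagger}{\textbf{H}}$ would then yield the leading singular triples.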
$${\textbf{H}} {\textbf{P}} {\textbf{f}} = {\textbf{0}}\quad \forall \;{\textbf{f}} \in {\mathbb E}^{N}.$$
$${L_{\textit{AE}}}({{\textbf{W}}_E},{{\textbf{W}}_D}) = \frac{1}{{N_T}}\sum\limits_{l = 1}^{{N_T}} \left\| {{\textbf{W}}_D}{\textbf{W}}_E^{\dagger} {{\textbf{f}}_l} - {{\hat{\textbf{f}}}_l} \right\|_2^2,$$
$$\mathop {\min}\limits_{{\textbf{W}} \in St(N,K)} \frac{1}{{{N_T}}}\sum\limits_{l = 1}^{{N_T}} ||{\textbf{HW}}{{\textbf{W}}^\dagger}{{\textbf{f}}_l}||_2^2,$$
$${{\cal J}_t}({\textbf{W}}) = \frac{1}{{{N_B}}}\sum\limits_{i = 1}^{{N_B}} \parallel {\textbf{HW}}{{\textbf{W}}^\dagger}{{\textbf{f}}_i}\parallel _2^2,$$
$${{\textbf{M}}_ +} = \beta {{\textbf{M}}^{(t)}} - \alpha {{\textbf{G}}^{(t)}},$$
$$\begin{split}{{\textbf{M}}^{(t + 1)}} = & \,\, {{\textbf{A}}^{(t)}} {{\textbf{W}}^{(t)}},\quad {\rm{with}}\\ {{\textbf{A}}^{(t)}} = & \,\, 2 {\rm{skew}}\left({{\textbf{M}}_ +}{{\left({{{\textbf{W}}^{(t)}}} \right)}^\dagger} -\right. \\ &\left. \frac{1}{2}{{\textbf{W}}^{(t)}}{{\left({{{\textbf{W}}^{(t)}}} \right)}^\dagger}{{\textbf{M}}^{(t)}}{{\left({{{\textbf{W}}^{(t)}}} \right)}^\dagger} \right),\end{split}$$
$${{\textbf{W}}^{(t + 1)}} = {{\textbf{Q}}^{(t)}}(\alpha){{\textbf{W}}^{(t)}},$$
$$\begin{split}{{\textbf{Q}}^{(t)}}(\alpha) &= {\left({{{\textbf{I}}_N} - \frac{\alpha}{2}{{\textbf{A}}^{(t)}}} \right)^{- 1}}\left({{{\textbf{I}}_N} + \frac{\alpha}{2}{{\textbf{A}}^{(t)}}} \right) \\&\approx \sum\limits_{s = 0}^S {\left({\frac{\alpha}{2}{{\textbf{A}}^{(t)}}} \right)^s}\left({{{\textbf{I}}_N} + \frac{\alpha}{2}{{\textbf{A}}^{(t)}}} \right),\end{split}$$
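The update ${\textbf{W}}^{(t+1)} = {\textbf{Q}}^{(t)}(\alpha){\textbf{W}}^{(t)}$ stays on the Stiefel manifold because the Cayley transform of a skew-symmetric matrix is orthogonal. A small NumPy check of this property (using the exact Cayley transform rather than the truncated series approximation in the equation above; dimensions are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 8, 3

# Start from an orthonormal frame W on the Stiefel manifold St(N, K).
W, _ = np.linalg.qr(rng.standard_normal((N, K)))

# Any skew-symmetric A gives an orthogonal Cayley transform
# Q(alpha) = (I - alpha/2 A)^{-1} (I + alpha/2 A),
# so the update W <- Q(alpha) W preserves W^dagger W = I exactly.
B = rng.standard_normal((N, N))
A = B - B.T                         # skew-symmetric: A^T = -A
alpha = 0.1
I = np.eye(N)
Q = np.linalg.solve(I - 0.5 * alpha * A, I + 0.5 * alpha * A)
W_next = Q @ W

assert np.allclose(Q.T @ Q, I)                    # Q is orthogonal
assert np.allclose(W_next.T @ W_next, np.eye(K))  # still on St(N, K)
```

The truncated Neumann series in the equation above trades a little of this exactness for a cheaper, inverse-free update, which matters when $N$ is large.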
$${{\cal J}_t}({\textbf{W}}) = \frac{1}{{{N_B}}}\sum\limits_{i = 1}^{{N_B}} \parallel {\textbf{H}}({{\textbf{I}}_N} - {\textbf{W}}{{\textbf{W}}^\dagger}){{\textbf{f}}_i}\parallel _2^2.$$
$${e_L} = \frac{1}{L}\sum\limits_{l = 1}^L \frac{1}{N}\parallel {\textbf{H}}{{\textbf{P}}_{{\rm{null}}}}{{\textbf{f}}_l}\parallel _2^2.$$
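The accuracy metric ${e_L}$ averages, over $L$ test objects, the residual energy of ${\textbf{H}}$ applied to each object's estimated null space component; for an accurate projector it should be near zero. A minimal NumPy sketch, assuming a small hypothetical operator and an exact SVD-based projector as the reference:

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, L = 6, 10, 20
H = rng.standard_normal((M, 5)) @ rng.standard_normal((5, N))  # rank 5

# Exact null space projector from the SVD (feasible at this toy scale).
U = np.linalg.svd(H)[2].conj().T
P_null = np.eye(N) - U[:, :5] @ U[:, :5].T

# e_L = (1/L) sum_l (1/N) || H P_null f_l ||_2^2 over L test objects,
# stored here as the columns of F.
F = rng.standard_normal((N, L))
e_L = np.mean(np.sum((H @ (P_null @ F)) ** 2, axis=0)) / N
assert e_L < 1e-15   # essentially zero for the exact projector
```

For an approximate projector (e.g., one learned from data), ${e_L}$ quantifies how much measurable-subspace energy leaks through, which is how the benchmark comparisons in Table 3 can be read.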