
Method for optimizing channelized quadratic observers for binary classification of large-dimensional image datasets

Open Access

Abstract

We present a new method for computing optimized channels for channelized quadratic observers (CQO) that is feasible for high-dimensional image data. The method for calculating channels is applicable in general and optimal for Gaussian distributed image data. Gradient-based algorithms for determining the channels are presented for five different information-based figures of merit (FOMs). Analytic solutions for the optimum channels for each of the five FOMs are derived for the case of equal mean data for both classes. The optimum channels for three of the FOMs under the equal mean condition are shown to be the same. This result is critical since some of the FOMs are much easier to compute. Implementing the CQO requires a set of channels and the first- and second-order statistics of channelized image data from both classes. The dimensionality reduction from M measurements to L channels is a critical advantage of CQO since estimating image statistics from channelized data requires smaller sample sizes and inverting a smaller covariance matrix is easier. In a simulation study we compare the performance of ideal and Hotelling observers to CQO. The optimal CQO channels are calculated using both eigenanalysis and a new gradient-based algorithm for maximizing Jeffrey’s divergence (J). Optimal channel selection without eigenanalysis makes the J-CQO on large-dimensional image data feasible.

© 2015 Optical Society of America

1. INTRODUCTION

Our work is motivated by a challenge that is common in many imaging applications: sorting image data between two classes of objects (e.g., signal present and signal absent) when linear classifiers do not perform well enough for the application. An optimal quadratic classifier requires either a training set of images from each class or prior knowledge of the first- and second-order statistics of the image data from each class. The first-order statistics are the average images from each class and the second-order statistics are the covariance matrices from each class. If a training set of images is available the first- and second-order sample statistics can be used. Optimal quadratic classifiers are difficult to compute in imaging applications because of the large number of measurements made by most imaging systems. A single image can contain a few million elements and the number of elements in the covariance matrix is equal to the square of this number. When working with the covariance matrix, storing it can be challenging, inverting it can be impractical, and accurately estimating it from finite training data can even be impossible. Our work addresses this big data problem by using a quadratic classifier on images that have been reduced in size by a linear transformation; we will refer to this as a channelized quadratic observer (CQO). This approach demands answering the following question: which linear transform is best for computing a quadratic classifier for a given imaging application? To address this question we have developed a new method for optimizing CQOs for binary classification of large-dimensional image datasets.

To introduce the detection method, begin by considering the relationship between an image and an object as

g=Hf+n.

Here, g is an M×1 vector of measurements made by an imaging system that is represented as a continuous-to-discrete operator H; the measurements of the continuous object f are corrupted by measurement noise n. We will consider post-processing signal detection. That is to say, the forward imaging model H is fixed and can even be unknown since only the statistics of the image data will be used. We are interested in linear combinations of the image data of the form

v=Tg,
where T is an L×M matrix and compression is achieved since L<M. Using terminology from the perception literature, each row of T is referred to as a channel [1]. In this paper, v will be called a channelized image or a channelized data vector. Mathematical observers for detection or classification tasks operate on image data g, or preferably channelized image data v, to form a scalar-valued decision variable. Objective assessment of image quality [2] quantifies the ability of an observer to use image data for performing the scientific task of interest, e.g., detection, classification, or estimation. Channelized data are preferable for mathematical observers since calculating a decision variable usually involves the estimation of parametric likelihoods [3]. In the channelized representation this estimation can be much more accurate given common constraints on finite training data [4,5]. Computational needs are lower because the inverse of a covariance matrix, required for likelihood evaluations, is now L×L instead of M×M.
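As an illustration of this sample-size advantage, the following is a minimal NumPy sketch (not from the paper; the function name and array layout are assumptions) that channelizes a stack of training images and estimates the channelized first- and second-order statistics:

```python
import numpy as np

def channelized_stats(T, images):
    """images: (num_samples, M) array of training images; T: (L, M) channel matrix."""
    v = images @ T.T                 # channelized data, shape (num_samples, L)
    v_bar = v.mean(axis=0)           # first-order statistics (length-L mean)
    C = np.cov(v, rowvar=False)      # second-order statistics (L x L covariance)
    return v_bar, C
```

Only an L×L covariance matrix is estimated and inverted downstream, rather than an M×M one.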

In this work we will present gradient-based optimization methods for finding the solution to T (once L is selected) that maximizes detection task performance of the ideal observer (i.e., the likelihood ratio) given Gaussian statistics on the channel outputs, v, for both classes. We will consider the first- and second-order statistics of each class to be different, in general, which leads to a quadratic relationship between the likelihood ratio and the image data; we call this a quadratic observer. When the second-order statistics are equal the ideal observer is linear and the optimal solution for T is the Hotelling observer (i.e., a prewhitened match filter). This equal covariance assumption is valid when the two classes differ by the addition of a signal that is weak enough, relative to other sources of variability, so that it does not affect the covariance matrix. When the means are equal but the covariances are different we show a new result: that the same optimal T solution is achieved using optimization with respect to the Bhattacharyya distance, Jeffrey’s divergence, and the area under the curve (AUC). This equal mean assumption is valid in ultrasound imaging [6–8] and in many texture discrimination tasks.

The next section is devoted to a review of related work. Assumptions and notation are established in Section 3. Then we show an analytic gradient, with respect to the linear channels, for the following: Section 4) Kullback–Leibler (KL) divergence [9]; Section 5) the symmetrized KL divergence (also referred to as Jeffrey’s divergence (J) in information theory [10]); Section 6) the Bhattacharyya distance [11] (also called G(0) in [12]); and Section 7) the area under the ideal-observer receiver operating characteristic (ROC) curve, also known as the AUC [13,14]. We will see by the end of these Sections that the J and G(0) metrics are maximized at the same set of channels that maximizes the AUC when there is no signal in the mean. This results in an important surrogate figure of merit (SFOM) [15] for imaging applications since J is much easier to compute than the AUC.

In Section 8 we focus on the specific case of no signal in the mean and compute the Hessian of J for L linear channels. We use this Hessian to show that there are no more than L+1 local maxima in this case, one of which is the global maximum. This is useful information when using a gradient-based algorithm to find the global maximum.

In Section 9 the results of a simulation study are presented. Here, the AUC is computed for the ideal and Hotelling observers and compared to CQO performance. The CQO is computed in two different ways: 1) an iterative algorithm based on a gradient search to maximize J, and 2) an observer introduced in Section 5 that is based on an eigenanalysis of the covariance matrices and is optimal when there is no signal in the mean. The similar task performance between the J-based CQO (J-CQO) and the much less tractable Eigen-CQO indicates strong potential for the J-CQO method in more realistic imaging applications.

2. RELATED WORK

A. Channels in Imaging

For medical imaging applications image channels have been used to approximate Hotelling observers with channelized Hotelling observers (CHO) [16], both of which are linear observers. Channels have also been used to approximate the ideal observer, which is not necessarily linear in the image data [17,18]. These channelized ideal observers (CIOs) have been explored for both standard channels used for CHOs and channels derived from the imaging system operator [19,20].

In computer vision the interpretation of a single image is decomposed into subimage detection and classification tasks that are customized to the desired data product. Here, an image channel can be related to the original image data by both linear and nonlinear transforms; even the channel outputs themselves can be combined to further reduce the original data to features. A succinct review of different types of image channels and feature selection methods in this community is provided in [21].

Across imaging applications, the selection of channels can be motivated by maximizing the performance of the channelized observer; the channels are then referred to as efficient channels [2]. On the other hand anthropomorphic channels are designed to approximate human observer performance and are often based on properties of the human visual system [22–24]. The kernel trick used for support vector machines (SVM) employs a nonlinear function of the data and a linear discriminant on that function’s output [25]. SVM seeks nonlinear channels, either efficient or anthropomorphic depending on the application, for a linear observer. This work concentrates on efficient linear channels for quadratic observers.

B. Eigenanalysis for Compression

In this paper, we consider the task of detection among two classes and study the eigenanalysis of the two covariance matrices K1 and K2. When these covariance matrices are unequal the data are called heteroscedastic. Covariance matrix eigenanalysis is rarely practical for modern imaging systems since an image is comprised of several million elements. We will show a new and computationally feasible approach for optimizing channel selection, which we denote J-CQO, and compare its detection task performance with an eigenanalysis approach, which we denote Eigen-CQO. We will also show that Eigen-CQO is optimal when the data are heteroscedastic and the mean images are equal. In [26] Fukunaga and Koontz were the first to suggest covariance matrix eigendecomposition for detection and classification tasks; this approach is called the Fukunaga and Koontz transform (FKT). The FKT uses a matrix P to transform the data so that P(K1+K2)P^† equals the identity matrix. This equality guarantees that both covariance matrices of the transformed data will have the same eigenvectors. Furthermore, the sum of the two eigenspectra, when eigenvalues associated with the same eigenvector are added, is equal to one. Consequently, the transformation by P makes the strongest eigenvectors of one class the weakest eigenvectors of the other. In [27] the statistical properties of the FKT are studied and it is reported that, with certain eigenspectrum assumptions, the FKT is the optimal low-rank approximation to the optimal classifier for zero mean, heteroscedastic, and normally distributed data. FKT is widely used in pattern recognition; its adaptation is called a tuned basis function (TBF) in [28] for finding an optimal solution to T̃ to use in the quadratic test statistic, g^†T̃g.

In Section 5, we will show that with the normality assumptions used in this paper, and when there is no signal in the mean, an optimal channel matrix for the J FOM can be constructed by using L eigenvectors of K_2^{−1}K_1 for the channels, with corresponding eigenvalues κ_l. These eigenvectors are chosen to have the L largest values of κ_l + κ_l^{−1} and we call this observer the Eigen-CQO. Using these same assumptions on the statistics of the data, it was shown in [29] that the Eigen-CQO maximizes the Bhattacharyya distance between two classes. Multiple discriminant analysis (MDA), described in [30], is similar to Eigen-CQO but instead of K1 and K2 an intraclass and an extraclass scatter matrix are estimated from finite data; the imaging task is to distinguish among more than two classes. MDA is achieved when the scatter matrices are estimated from N image samples, the FKT eigenspectrum is calculated from a QR decomposition on an N×M matrix, and the final classification decision is formed using a 1-nearest neighbor (1-NN) algorithm.

C. Information Metrics in Imaging

An SFOM is a quantity that correlates with ideal-observer task performance, as measured by the AUC or some other task-based figure of merit (FOM), but that is easier to compute [31]. Statistical distances and divergences used in information theory can be powerful SFOMs in imaging if they are proven, analytically or empirically, to correlate with task performance. In [32] it is recognized that J is an alternative FOM to linear discriminant analysis (LDA), which is needed when second-order statistics are unequal among the classes. However, instead of optimizing J directly an upper-bound that is quadratic in the channels is used. Once a channel solution is obtained the classification decision is formed using a 1-NN algorithm and accuracy is reported, as opposed to the channelized ideal observer and ROC analysis used in this work. The KL-divergence and Chernoff distances were used in [33] to quantify the effect of compression when the matrix T is populated with random entries and the statistics are described by uncorrelated normal distributions. These results describe upper- and lower-bounds for the J divergence, which are valid with high probability for random channel matrices. In this paper, we are interested in the exceptions to these bounds, i.e., the solutions to T that produce large values of J and are unlikely samples from a random distribution.

D. Relation to Compressive Sensing

In the unrealistic limit of infinite computational resources, and perfect knowledge of the image statistics, efficient channels for mathematical observers are not necessary. As pointed out in [34] “Conventional sensing systems typically first acquire data in an uncompressed form (e.g., individual pixels in an image) and then perform compression for storage or transmission. In contrast, compressive sensing (CS) researchers would like to acquire data in an already compressed form, reducing the quantity of data that need be measured in the first place.” This work on channelized observers is a post-processing method that is complementary to CS. Given the large popularity of this area of research we will remark on the relationship to CS for the sake of clarity and context.

For CS we expand the object function as a finite series in terms of basis functions

f(r) = Σ_{n=0}^{N} f_n ψ_n(r).
The imaging equation [see Eq. (1)] is then replaced with
g=Hf+n.

The M×N matrix H is called the sensing matrix or system matrix. The astonishing breakthrough of CS was based on optimal signal recovery by selecting H to take advantage of sparsity constraints on f [35]. This leads to asymptotic relationships between residuals of the recovered signal and stochastic models for H, f, and n [36]. However, even in some of the early CS studies, it was recognized that improved signal recovery could be achieved when knowledge of the underlying signal of interest was used to select H [37]. Research in this field has expanded to include evaluating candidate solutions to H with respect to information-based metrics from communications theory, such as Shannon information [38]. It was pointed out in [39] that an information-theoretic metric is particularly attractive as it allows an upper-bound on the task-specific performance of an imager, independent of the algorithm used to extract the relevant information. Although this might be advantageous for hardware design, for post-processing of the image data an algorithm, i.e., mathematical observer, is required. Metrics related to estimation, classification, and detection tasks have also been investigated for CS applications; these metrics are then a function of both H and the mathematical observer [40]. To compare multiple solutions for H a constraint must be used, such as equivalent number of photon counts, so that the noise statistics are part of the trade-off with the measurement scheme and the compression ratio. Increasing the quantity of measurements of f naturally transmits more information about the object, but this increase in the dimensionality of H is penalized by a larger measurement noise contribution. The extreme case of a single measurement of f integrates all available photons and suffers minimal measurement deviation due to noise. The optimal solution for H will depend upon the FOM, the statistics of f, and the statistics of n. By contrast, this work concentrates exclusively on post-processing for detection; a test statistic is calculated from the image data via Eq. (9). We seek the optimal solution for T, which depends upon the chosen FOM for the detection task and the statistics of g. An important distinction between these two problems is that in post-processing T is applied to Hf and to n. Thus, the noise statistics of the channelized image data depends on the channel matrix T.

3. FORMULATION OF THE PROBLEM

As a first step, we will introduce some notation and describe the problem that we are considering. All vectors are column vectors unless otherwise specified. The M-dimensional vector g will represent the input to the classifier, which will classify this vector as either belonging to the population corresponding to the probability density function (PDF) pr1(g) or the population corresponding to the PDF pr2(g). This vector may be a direct image, a reconstructed image, or the raw data being produced by an imaging system.

The ideal classifier uses the log-likelihood ratio

λ_I(g) = ln[pr_1(g)] − ln[pr_2(g)],
as a decision variable and compares the result to a threshold. If the decision variable is above the threshold, then the data vector is assigned to pr1(g), and otherwise it is assigned to pr2(g). This observer maximizes the AUC, as well as other task-based FOMs [2,14]. In imaging applications the dimension M of the vector g is very large. We will assume for i=1,2 that pri(g) is a PDF with mean g¯i and covariance matrix Ki. This creates two problems when we try to implement the ideal observer for the classification task. The first problem is computational; even for Gaussian PDFs we will have to invert two M×M covariance matrices Ki, which may not be feasible if, for example, the input images contain millions of pixels. The second problem is that, if we are estimating the image statistics from data, which is often the case, we will need a very large number of samples to get reliable estimates. For example, in the Gaussian case, the number of samples needs to be at least M to get invertible estimates of the Ki, and typically needs to be an order of magnitude greater to get reliable estimates. This provides a motivation for trying to reduce the dimension of the data vector before implementing the ideal observer.

The data reduction will be implemented by an L×M dimensional matrix T via the equation v=Tg. We refer to the vector v as the channelized data vector and the rows of T as the channels for the data reduction. The number L is the dimension of the channelized data and satisfies L ≪ M. We will always assume that T is a full rank matrix so that the channels are linearly independent. We will assume that the PDFs for the channelized data for the two populations are given by normal distributions

pr_i(v) = [(2π)^L det(T K_i T^†)]^{−1/2} × exp[−(1/2)(v − T ḡ_i)^†(T K_i T^†)^{−1}(v − T ḡ_i)],
for i=1,2. The channelized data may be at least approximately Gaussian distributed even if the original data are not. In practice, almost all useful channel matrices are fully populated. Therefore, if the original data components are sufficiently independent from each other, then we can argue from the central limit theorem that the channel outputs will be approximately Gaussian distributed for almost all channel matrices [41]. In this case, we would expect the methods developed in this work to produce channel matrices that are very near optimal in terms of the FOMs discussed below.

Given a fixed set of L channels the optimal classifier will compute the channelized log-likelihood

λ(T, g) = ln[pr_1(Tg)] − ln[pr_2(Tg)],
and compare this decision variable to a threshold. For computational purposes we define, for i=1,2
Q_i(v) = (v − T ḡ_i)^†(T K_i T^†)^{−1}(v − T ḡ_i).

Then we have

λ_CQO(T, g) = (1/2)[Q_2(Tg) − Q_1(Tg) − ln det(T K_1 T^†) + ln det(T K_2 T^†)],
where the subscript denotes the CQO. The last two terms are usually dropped, since they do not depend on the data. We will not be doing this because we want to vary the channel matrix to search for an optimal set of channels for a given L and therefore we need an unmodified likelihood ratio. All of the FOMs we will be considering depend on moments of λ_CQO(T, g), which we will be referring to as λ(v).
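A hedged sketch of how λ_CQO(T, g) could be evaluated in practice is given below; it assumes the class means and covariances (or sample estimates of them) are available, and the function and variable names are illustrative rather than the authors’ code:

```python
import numpy as np

def lambda_cqo(T, g, gbar1, gbar2, K1, K2):
    """Channelized quadratic test statistic for a single image g."""
    C1 = T @ K1 @ T.T                      # L x L channelized covariances
    C2 = T @ K2 @ T.T
    v = T @ g
    d1 = v - T @ gbar1
    d2 = v - T @ gbar2
    Q1 = d1 @ np.linalg.solve(C1, d1)
    Q2 = d2 @ np.linalg.solve(C2, d2)
    # keep the log-determinant terms, as in the text, so the statistic is
    # well defined while the channel matrix itself is being varied
    _, logdet1 = np.linalg.slogdet(C1)
    _, logdet2 = np.linalg.slogdet(C2)
    return 0.5 * (Q2 - Q1 - logdet1 + logdet2)
```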

For any function f(v) we will use the notation ⟨f(v)⟩_i for the expectation of f(v) in each class,

⟨f(v)⟩_i = ∫_{R^L} f(v) pr_i(v) d^L v.

The following FOMs will be examined and optimal channels will be determined for each. The FOMs are listed in increasing order of the difficulty of the channel optimization calculations. The first two FOMs are Kullback–Leibler (KL) divergences,

F_1(T) = ⟨λ(v)⟩_1 = D_KL(pr_1 ‖ pr_2),
F_2(T) = −⟨λ(v)⟩_2 = D_KL(pr_2 ‖ pr_1);
we will refer to these FOMs as KL1 and KL2, respectively. (Note that, from here onward, pri will refer to the corresponding PDF for the channelized data.) These two FOMs are measures of the difference between two PDFs and are used extensively in information theory. The third FOM is the symmetrized KL divergence, commonly referred to as J,
F_3(T) = ⟨λ(v)⟩_1 − ⟨λ(v)⟩_2.

This symmetric measure of the difference between two PDFs has been related to the ideal-observer AUC in applications where ḡ_1 = ḡ_2, such as ultrasound imaging [6–8]. This will be an important special case in our development, below. The fourth FOM is called G(0) and has also been related to ideal-observer detectability [12,42]. This FOM is given by

F_4(T) = −4 ln⟨exp[−(1/2)λ(v)]⟩_1 = −4 ln⟨exp[(1/2)λ(v)]⟩_2 = 4 D_B(pr_1, pr_2).

In this expression DB(pr1,pr2) is the Bhattacharyya distance between the two PDFs, which also has applications in information theory. The fifth and final FOM we will consider for optimizing the channel matrix is the AUC for the channelized ideal observer. It is not obvious, but it can be shown [12] that this FOM can be written as

F_5(T) = 1 − (1/4π) ∫_{−∞}^{∞} |⟨exp[(1/2 + iα)λ(v)]⟩_2|² dα/(α² + 1/4).

This is not the usual expression for the ideal-observer AUC, but it will be useful for our calculations. The optimization problems we are considering may now be stated.

Optimization Problem j: With the normality assumption above, and for a given value of L, what channel matrix T will maximize Fj(T)?

A proper FOM for channel optimization should satisfy

F(MT)=F(T)
for any invertible L×L matrix M. This is because the channels for MT are all linear combinations of the channels for T, and vice-versa. We will see below that the FOMs Fj(T) defined above all satisfy this requirement. This condition implies that the FOM can be defined as a function on the Grassmannian manifold Gr(L,M) [43]. Since this is a compact manifold and all the FOMs defined above are continuous functions, we know that there is a maximum for each Fj(T). Methods have been developed for optimization on Grassmannians that use coordinate systems on the manifold itself [44]. However, the calculations described below do not follow this approach, but instead simply compute the gradient of Fj(T) with respect to T and set it equal to zero. In a sense, we may regard the entries of the matrix T as a set of homogeneous coordinates for the corresponding point on Gr(L,M) and we have found that, for the functions that we are optimizing, these coordinates are useful.

In all that follows we will define the signal in the mean as s = ḡ_1 − ḡ_2. If K_1 = K_2 = K, then the ideal linear observer is equivalent to one that computes the test statistic

λ_H(g) = (K^{−1}s)^† g,
and compares to a threshold. In the perception literature this is called the Hotelling observer and in other applications is called a prewhitened matched filter or the Fisher discriminant. This test statistic is a sufficient statistic for the detection task when the data are Gaussian and the covariances are equal. A single channel directly reduces the data to a scalar, i.e., L=1. A set of channels will be optimal, in this case, if some linear combination of them is equal to the Hotelling template K^{−1}s. In other words, if there is an L-dimensional vector c such that c^†T = (K^{−1}s)^†. Note that if T is replaced by MT as above, then we can replace c with (M^†)^{−1}c and this condition will still be satisfied. Thus, optimality is a property of a point on Gr(L,M) and does not depend on the specific matrix T used to represent that point. Having disposed of this special case, we will now assume for the remainder of this paper that K_1 ≠ K_2 so that the ideal observer is a quadratic observer.
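For reference, a minimal sketch of the Hotelling test statistic for the equal-covariance case described above (the names are illustrative assumptions, not the authors’ code):

```python
import numpy as np

def lambda_hotelling(g, gbar1, gbar2, K):
    """Prewhitened matched filter for equal class covariances K1 = K2 = K."""
    s = gbar1 - gbar2                  # signal in the mean
    template = np.linalg.solve(K, s)   # Hotelling template K^{-1} s
    return template @ g
```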

4. KL1 AND KL2

First, we compute the FOM, a computation presented in the Appendices. To describe the end result we introduce notation for the covariance matrices of the channelized data for each data class

C_i(T) = T K_i T^†.

Then we define the matrix function

R_12(T) = C_2^{−1}(T) C_1(T),
where the inverse exists since T is full rank. Next, we define the scalar function
H_2(T) = s^† T^† C_2^{−1}(T) T s.

The KL1 FOM can now be written as

2F_1(T) = −L + tr[R_12(T)] + H_2(T) − ln det[R_12(T)].

The second term in this KL divergence is dominated by the largest eigenvalues of R12(T). The third term is a Hotelling trace term in the channelized data space for the second-hypothesis covariance. The last term is dominated by the smallest eigenvalues of R12(T) if we assume that they are near zero. Now, we wish to maximize this expectation over all possible channel matrices of a given dimension. To do this we will compute the gradient F1(T) of F1(T) and set it equal to zero. This gradient is an L×M matrix-valued function of T and is defined by the relation

(d/dt) F_1(T + tE)|_{t=0} = tr[E^† ∇F_1(T)],
which holds for all L×M matrices E. We will use the notation Δ_1(T) = ∇F_1(T). From Appendix A we then have
Δ_1(T) = C_2^{−1}(T) T (K_1 + ss^†)[I − T^† C_2^{−1}(T) T K_2] − C_2^{−1}(T) T K_2 + C_1^{−1}(T) T K_1.

At present we do not have an analytic solution to Δ1(T)=0 unless s=0, which is a special case that we will examine below. The simplest iterative algorithm to try is

T^{(n+1)} = T^{(n)} + ε Δ_1(T^{(n)}),
where the iterative process ends at n = ñ for some stopping criterion such as T^{(ñ)} ≈ T^{(ñ+1)}. The decision variable for this CQO is then calculated by
λ_CQO(T^{(ñ)}, g).

We have successfully used this type of algorithm for the J-FOM; see the Simulations section for these results. Two interesting properties of the gradient, which follow from the invariance property of the FOM, are Δ_1(T)T^† = 0 and Δ_1(MT) = (M^†)^{−1}Δ_1(T). Note that the second property implies that the iteration depends explicitly on the starting matrix T^{(0)}. If a different matrix MT^{(0)} that represents the same subspace is used, then the iteration follows a different path on the Grassmannian Gr(L,M). This has not proved to be a problem in implementation so far, but an algorithm that does not have this property is

T^{(n+1)} = T^{(n)}[I + ε (T^{(n)})^† Δ_1(T^{(n)})].

This algorithm follows the same path in Gr(L,M) no matter which channel matrix is used to represent the initial subspace.
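The two update rules above can be condensed into a short sketch. The step size, stopping rule, and names are assumptions for illustration; grad stands for whichever gradient Δ_j(T) is being used:

```python
import numpy as np

def optimize_channels(grad, T0, eps=1e-3, tol=1e-6, max_iter=5000):
    """Fixed-step gradient ascent on the channel matrix (additive update)."""
    T = T0.copy()
    for _ in range(max_iter):
        T_new = T + eps * grad(T)
        if np.linalg.norm(T_new - T) < tol:   # stop when the channels stabilize
            return T_new
        T = T_new
    return T

def optimize_channels_invariant(grad, T0, eps=1e-3, n_iter=5000):
    """Subspace-invariant update T <- T[I + eps T^T Delta(T)]."""
    T = T0.copy()
    for _ in range(n_iter):
        T = T + eps * (T @ T.T) @ grad(T)     # equals T[I + eps T^T Delta(T)]
    return T
```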

When there is no signal in the mean (s = 0), an analytic solution to the gradient equation Δ_1(T) = 0 is available.

Proposition 1: With the normality assumptions used in this paper, and when there is no signal in the mean, an optimal channel matrix for the KL1 FOM F_1(T) can be constructed by using L eigenvectors of K_2^{−1}K_1 for the channels, with corresponding eigenvalues κ_l. These eigenvectors are chosen to have the L largest values of κ_l − ln κ_l. If this channel matrix is denoted T_eig then the decision variable for this Eigen-CQO is calculated by

λCQO(Teig,g).
Proof: With no signal in the mean the gradient equation can now be written as
[R_12(T) + I](T K_2 T^†)^{−1} T K_2 = [R_12(T) + I](T K_1 T^†)^{−1} T K_1.

The eigenvalues of R12(T) are the same as the eigenvalues of

C_2^{1/2}(T) R_12(T) C_2^{−1/2}(T) = C_2^{−1/2}(T) C_1(T) C_2^{−1/2}(T),
which is symmetric and positive definite. These eigenvalues are therefore real and positive. This implies that R12(T)+I is invertible and may be removed from the gradient equation. After doing this, taking the adjoint, and performing some additional matrix algebra we have
K_2^{−1}K_1 T^† = T^† R_12(T).

Now let M be an invertible L×L matrix. If the channel matrix T produces the maximum value for F1(T), then the matrix MT does also. Setting the gradient equal to zero for this matrix gives

K_2^{−1}K_1 T^† M^† = T^† M^†[(M^†)^{−1} R_12(T) M^†],
which is indeed equivalent to the original gradient equation for T. Since, as noted above, R12(T) is related to a symmetric positive definite matrix by a similarity transformation, it is diagonalizable and, as we said before, the eigenvalues are all positive real numbers. If we choose M to diagonalize R12(T), then we have
K_2^{−1}K_1 T^† M^† = T^† M^† Λ(T),
where Λ(T) is a diagonal matrix. Therefore, the columns of (MT)^† are eigenvectors of K_2^{−1}K_1 and the numbers on the main diagonal of Λ(T) are eigenvalues of K_2^{−1}K_1. Now, we will replace T with MT in F_1(T) and find that, when the gradient is zero at T, we have
2F_1(T) = 2F_1(MT) = Σ_l (κ_l − ln κ_l) − L,
where the sum is over the eigenvalues that appear in Λ(T). Therefore, to maximize F_1(T) we should choose the rows of T to be the L eigenvectors t_l of K_2^{−1}K_1 that have the L largest values for κ_l − ln κ_l. ▪

We will find similar answers for other FOMs, the only difference being which eigenvectors to choose. Note that the analytic solution depends on inverting K_2 and finding the eigenvectors for K_2^{−1}K_1. Since these are both M×M matrices, this may be computationally difficult. On the other hand, the iterative algorithms only require inverting C_1(T) and C_2(T), which are L×L matrices. Thus, even when the signal in the mean is zero, we may want to use an iterative algorithm for practical reasons. There is a problem with local maxima for the iterative algorithms presented above and this will be discussed in detail for the J FOM, below.
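When the eigenanalysis is affordable, the analytic construction can be sketched as follows, assuming K_2 is positive definite so that SciPy’s generalized symmetric eigensolver applies; the ranking function is κ − ln κ for KL1 and, as shown in later sections, κ + 1/κ for the J, G(0), and AUC FOMs (names and defaults are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import eigh

def eigen_channels(K1, K2, L, score=lambda k: k - np.log(k)):
    """Rows of the channel matrix = eigenvectors of the pencil K1 t = kappa K2 t
    whose eigenvalues maximize the chosen ranking function."""
    kappa, t = eigh(K1, K2)                 # generalized eigenproblem; kappa > 0
    idx = np.argsort(score(kappa))[::-1][:L]
    return t[:, idx].T                      # L x M channel matrix
```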

With obvious notational changes, the results for the KL2 FOM can be written down from the KL1 FOM by interchanging the subscripts 1 ↔ 2. We then have

2F_2(T) = −L + tr[R_21(T)] + H_1(T) − ln det[R_21(T)].

The gradient of F2(T) is given by

Δ_2(T) = C_1^{−1}(T) T (K_2 + ss^†)[I − T^† C_1^{−1}(T) T K_1] − C_1^{−1}(T) T K_1 + C_2^{−1}(T) T K_2.

We can use this in iterative algorithms, as we did for the KL1 FOM. For no signal in the mean, there is again an analytic solution to the optimization problem.

Proposition 2: With the normality assumptions used in this paper, and when there is no signal in the mean, an optimal channel matrix for the KL2 FOM F_2(T) can be constructed by using L eigenvectors of K_2^{−1}K_1 for the channels, with corresponding eigenvalues κ_l. These eigenvectors are chosen to have the L largest values of κ_l^{−1} + ln κ_l.

Proof: We only need to note that K_1^{−1}K_2 = (K_2^{−1}K_1)^{−1} and repeat the proof of Proposition 1 with the index change 1 ↔ 2. ▪

Note that the optimal KL1 and KL2 channel matrices both draw from the eigenvectors of K_2^{−1}K_1 but may choose different channels due to the different emphasis on using the large as opposed to the small eigenvalues of this matrix.

5. J

Since F3(T)=F1(T)+F2(T) the symmetrized KL divergence is given by

2F_3(T) = −2L + tr[R_12(T)] + H_2(T) + tr[R_21(T)] + H_1(T).

The gradient of this FOM is

Δ_3(T) = C_1^{−1}(T) T (K_2 + ss^†)[I − T^† C_1^{−1}(T) T K_1] + C_2^{−1}(T) T (K_1 + ss^†)[I − T^† C_2^{−1}(T) T K_2].

This gradient can be put into either of the iterative algorithms and is the one used in the simulations, for reasons to be discussed below. When there is no signal in the mean the equation Δ3(T)=0 can again be solved analytically.
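Before turning to that analytic solution, here is a hedged NumPy sketch of Δ_3(T) written directly from the expression above; it is organized so that only L×L systems are solved and no M×M product is formed, which is what keeps the J-CQO optimization feasible for large images (the function name and grouping are assumptions):

```python
import numpy as np

def grad_J(T, K1, K2, s):
    """Gradient Delta_3(T) of the J FOM for Gaussian channel outputs."""
    C1 = T @ K1 @ T.T                                   # L x L channelized covariances
    C2 = T @ K2 @ T.T
    Ts = T @ s
    A1 = np.linalg.solve(C1, T @ K2 + np.outer(Ts, s))  # C1^{-1} T (K2 + s s^T)
    B1 = np.linalg.solve(C1, T @ K1)                    # C1^{-1} T K1
    A2 = np.linalg.solve(C2, T @ K1 + np.outer(Ts, s))  # C2^{-1} T (K1 + s s^T)
    B2 = np.linalg.solve(C2, T @ K2)                    # C2^{-1} T K2
    # Delta_3 = A1[I - T^T B1] + A2[I - T^T B2], grouped to avoid M x M products
    return (A1 - (A1 @ T.T) @ B1) + (A2 - (A2 @ T.T) @ B2)
```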

Proposition 3: With the normality assumptions used in this paper, and when there is no signal in the mean, an optimal channel matrix for the J FOM F_3(T) can be constructed by using L eigenvectors of K_2^{−1}K_1 for the channels, with corresponding eigenvalues κ_l. These eigenvectors are chosen to have the L largest values of κ_l + κ_l^{−1}.

Proof: With no signal in the mean the equation Δ3(T)=0 can be written as

[R_12(T) − R_21(T)](T K_1 T^†)^{−1} T K_1 = [R_12(T) − R_21(T)](T K_2 T^†)^{−1} T K_2.

We will assume that the matrix in the square brackets is invertible and check this assumption later. This leads to the familiar equation

K_2^{−1}K_1 T^† = T^† R_12(T).

As in Proposition 1, this leads to the result that an optimal channel matrix may be constructed from L eigenvectors of K_2^{−1}K_1 with corresponding eigenvalues κ_l. The value of the J FOM at this matrix is

F_3(T) = (1/2) Σ_l (κ_l + 1/κ_l) − L.

This expression leads to the last statement in the proposition.

Now we need to go back and check the invertibility assumption for the matrix R_12(T) − R_21(T). For positive κ, the minimum value for κ + κ^{−1} occurs at κ = 1. If the corresponding eigenvector ends up in the channel matrix T, then R_12(T) − R_21(T) will not be invertible, since this eigenvector would be a null vector of this matrix. However, there would be no point in including such a channel since it would not help to distinguish the two hypotheses. We can see this by writing the eigenvector equation for the channels as K_1 t_l = κ_l K_2 t_l. We can use the properties of symmetric matrices to show that these eigenvectors may be chosen to satisfy the dual orthogonality properties,

t_l^† K_2 t_{l′} = δ_{ll′},    t_l^† K_1 t_{l′} = κ_l δ_{ll′}.

Thus, the optimal channelized vector components are independent under both hypotheses, since they are Gaussian distributed in both cases. Also, if κ_l = 1, then the corresponding component has the same variance under both hypotheses. Thus, that component will not appear in the log-likelihood, and we do not need to use it. The end result is that we can remove this row from the channel matrix T without any loss of information and reduce the number of channels by one. If, in the process of choosing eigenvectors that maximize κ_l + κ_l^{−1}, we are forced to choose one with κ_l = 1, then we do not need L channels at all. In this case there would be no loss of information in using L − 1 channels. Of course, there is also no harm in including such a channel and, in this sense, the statement of the proposition still holds. ▪

Note that the set of eigenvectors we choose from for optimizing J is the same as for optimizing KL1 and KL2 but, again, we may make different choices for the L eigenvectors to use for the channel matrix T. We will see that the last two FOMs lead to the same choice of the L eigenvectors for the optimal channel matrix as the J FOM.

6. G(0)

The expectation that appears in the G(0) FOM can be computed after some effort. We know that F_4(T) = −4 ln I(T), where

I(T) = ∫_{R^L} [pr_1(v) pr_2(v)]^{1/2} d^L v.

We need to complete the square on the exponent that appears in the integrand. First, we define a symmetric matrix C(T) by 2C^{−1}(T) = C_1^{−1}(T) + C_2^{−1}(T) and the vector c(T) by 2c(T) = C(T)[C_1^{−1}(T) T ḡ_1 + C_2^{−1}(T) T ḡ_2]. Next, we define the function H(T) by

4H(T) = 2c^†(T) C^{−1}(T) c(T) − ḡ_1^† T^† C_1^{−1}(T) T ḡ_1 − ḡ_2^† T^† C_2^{−1}(T) T ḡ_2.

Then we have the following relation,

[v − c(T)]^† C^{−1}(T)[v − c(T)] − 2H(T) = (1/2)(v − T ḡ_1)^† C_1^{−1}(T)(v − T ḡ_1) + (1/2)(v − T ḡ_2)^† C_2^{−1}(T)(v − T ḡ_2).

After performing the integral we have an expression for the G(0) FOM:

F_4(T) = −2 ln det C(T) + ln det C_1(T) + ln det C_2(T) − 4H(T).

Taking the gradient of this expression is very complicated and we will not pursue it further. However, when there is no signal in the mean H(T) = 0 and the gradient calculation simplifies considerably. This equal mean case was considered in [29] and includes a proof that the optimal channel matrix for the G(0) FOM F_4(T) can be constructed by using L eigenvectors of K_2^{−1}K_1 for the channels, with corresponding eigenvalues κ_l. These eigenvectors are chosen to have the L largest values of κ_l + κ_l^{−1}.

This means that a channel matrix that is optimal for the G(0) FOM is also optimal for the J FOM, and vice versa. This happens even though these two SFOMs are clearly different. Both of these FOMs have been put forward as good approximations to the square of the ideal-observer detectability dA(T) [12], which is defined by the relation

F_5(T) = 1/2 + (1/2) erf[(1/2) d_A(T)],
where erf[·] is the error function. It remains to be seen which of these two FOMs gives a better approximation to this detectability.

7. AUC

For the AUC FOM we will only consider the case where there is no signal in the mean. The result in this case is familiar by now.

Proposition 4: With the normality assumptions used in this paper, and when there is no signal in the mean, an optimal channel matrix for the AUC FOM F_5(T) can be constructed by using L eigenvectors of K_2^{−1}K_1 for the channels, with corresponding eigenvalues κ_l. These eigenvectors are chosen to have the L largest values of κ_l + κ_l^{−1}.

Proof: We will assume that g¯1=g¯2=0. This involves no loss of generality since we could subtract the known mean channelized data vector from the channelized data vector as a preprocessing step that would not affect the channelized ideal observer (CIO) AUC. As noted above, the CIO AUC can be written as

F_5(T) = 1 − (1/4π) ∫_{−∞}^{∞} |⟨exp[(1/2 + iα)λ(v)]⟩_2|² dα/(α² + 1/4).

The expectation factors into two terms, one from the Gaussian normalization factors and one from the expectation integral

⟨Λ^{1/2+iα}(v)⟩_2 = N(α, T) I(α, T).

The normalization component is

N(α, T) = [(2π)^L det C_1(T)]^{−(1/2)(1/2+iα)} × [(2π)^L det C_2(T)]^{−(1/2)(1/2−iα)}.

The square magnitude of this factor

|N(α, T)|² = (2π)^{−L}[det C_1(T)]^{−1/2}[det C_2(T)]^{−1/2}
does not depend on α and therefore factors out of the α integral. The integral factor I(α,T) is given by
I(α, T) = ∫_{R^L} exp{−(1/4) v^†[C_1^{−1}(T) + C_2^{−1}(T)]v − (iα/2) v^†[C_1^{−1}(T) − C_2^{−1}(T)]v} d^L v.

Now we replace T with MT. This does not change the CIO AUC since the variable v in the integral will be replaced by u=Mv and the Jacobian factor from the change of variable in I(α,T) will cancel out with a |det(M)| factor from N(α,T). We may choose M to diagonalize both channel covariance matrices with MC_2(T)M^† = I and MC_1(T)M^† = D(T). The diagonal matrix D(T) has the eigenvalues of R_12(T) along the diagonal. These diagonal entries are also the eigenvalues of C_2^{−1/2}(T)C_1(T)C_2^{−1/2}(T), which is symmetric and positive definite. Therefore, the diagonal entries of D(T) are real and positive. We will call these numbers μ_l for l = 1,…,L. Now we have for the normalization component

|N(α, MT)|² = (2π)^{−L} Π_{l=1}^{L} μ_l^{−1/2}.

We may also write this equation as

|N(α, MT)|² = (2π)^{−L}[det R_12(T)]^{−1/2}.

For the integral factor we have

I(α, MT) = ∫_{R^L} exp[−(1/4) Σ_{l=1}^{L}(1/μ_l + 1) u_l² − (iα/2) Σ_{l=1}^{L}(1/μ_l − 1) u_l²] d^L u.

This is a product of one dimensional integrals and can be computed analytically with the result

I(α, MT) = π^{L/2} Π_{l=1}^{L} [(1/4)(1/μ_l + 1) + (iα/2)(1/μ_l − 1)]^{−1/2}.

We may also write this equation as

I(α, MT) = (2π)^{L/2} {det[(1/2 + iα) R_21(T) + (1/2 − iα) I]}^{−1/2},
since R_21(T) = R_12^{−1}(T). Combining the results so far we have
|⟨Λ^{1/2+iα}(v)⟩_2|² = [P(α, T)]^{−1/2}
with P(α, T) = det 𝐏(α, T), where the matrix 𝐏(α, T) is given by
𝐏(α, T) = [(1/2 + iα) R_12(T) + (1/2 − iα) I] × [(1/2 + iα) R_21(T) + (1/2 − iα) I].

This matrix simplifies to

𝐏(α, T) = (1/4 + α²)[R_12(T) + R_21(T)] + 2(1/4 − α²) I.

We now have

F_5(T) = 1 − (1/4π) ∫_{−∞}^{∞} [P(α, T)]^{−1/2} dα/(α² + 1/4),
which we can also write as
F_5(T) = 1 − (1/4π) ∫_{−∞}^{∞} Π_{l=1}^{L} [(1/4 + α²)(μ_l + 1/μ_l) + 2(1/4 − α²)]^{−1/2} dα/(α² + 1/4).

To compute the gradient of F5(T) we need the gradient of P(α,T). This can be found in the Appendices. The end result of setting Δ5(T)=0 is the familiar equation

K_2^{−1}K_1 T^† = T^† R_12(T).

We may conclude, as before, that the μ_l are eigenvalues of K_2^{−1}K_1 and that we may take the rows of the channel matrix T to be the corresponding eigenvectors.

We will replace μl with κl now and pick the κl to minimize the integral expression

∫_{−∞}^{∞} Π_{l=1}^{L} [(1/4 + α²)(κ_l + 1/κ_l) + 2(1/4 − α²)]^{−1/2} dα/(α² + 1/4).

The corresponding eigenvectors of K21K1 will give the rows of the optimum channel matrix. This integral expression can also be written as

∫_{−∞}^{∞} Π_{l=1}^{L} [(1/4)(κ_l + 1)²/κ_l + α²(κ_l − 1)²/κ_l]^{−1/2} dα/(α² + 1/4).

Now, define

y_l = 4κ_l/(κ_l + 1)²,
and write the integral expression we want to minimize by our choice of the κl as
∫_{−∞}^{∞} Π_{l=1}^{L} [y_l/(1 + 4α²(1 − y_l))]^{1/2} dα/(α² + 1/4).

Now, there are two facts we can use to solve our problem. A little algebra shows that 0 < y_l ≤ 1 since all of the κ_l are positive. Within this range the function

f(y) = y/[1 + 4α²(1 − y)]
is monotonically increasing for any α. Therefore, we should choose the eigenvalues κ_l that produce the L minimum values for y_l. But
1/y_l = (1/4)(κ_l + 1/κ_l + 2).
Therefore, we may conclude that to maximize F_5(T) we should choose the rows of T to be the eigenvectors of K_2^{−1}K_1 whose corresponding eigenvalues κ_l produce the L largest values for κ_l + κ_l^{−1}. If we go back to Appendix B and examine the matrices U_1(T) and U_2(T) for this channel matrix we find that they are diagonal and invertible as long as κ_l ≠ 1. If any of these eigenvalues are unity, then this situation can be dealt with as in Proposition 3. ▪

The optimal channel matrix for the AUC FOM is the same optimal channel matrix we arrived at for the J and G(0) FOMs. Thus, these quantities can be regarded as surrogate FOMs for the CIO AUC in the sense that they are optimized at the same channel matrix that optimizes the AUC FOM, which is the preferred FOM since it is directly related to task performance. On the other hand the J and G(0) FOMs are also easier to compute than the AUC FOM. The J FOM, in particular, is much easier to deal with than the AUC FOM and, since it leads to the same optimal channel matrix, we have used the J FOM in our simulations. This leads to an examination of the possibility of local maxima for the J FOM, which we examine in the next section.
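As an aside, the AUC FOM can itself be evaluated numerically from the eigenvalues κ_l of R_12(T) using the product-form integral derived in the proof above; a small sketch (illustrative names, SciPy quadrature) is:

```python
import numpy as np
from scipy.integrate import quad

def auc_from_eigenvalues(kappa):
    """Channelized ideal-observer AUC F5 from the eigenvalues of R_12(T)."""
    kappa = np.asarray(kappa, dtype=float)
    def integrand(alpha):
        terms = (0.25 + alpha**2) * (kappa + 1.0 / kappa) + 2.0 * (0.25 - alpha**2)
        return np.prod(terms ** -0.5) / (alpha**2 + 0.25)
    val, _ = quad(integrand, -np.inf, np.inf)
    return 1.0 - val / (4.0 * np.pi)
```

For a single channel with κ = 1 the integrand reduces to 1/(α² + 1/4) and the sketch returns an AUC of 0.5, as expected when the two classes are indistinguishable.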

8. LOCAL MAXIMA FOR J WITH NO SIGNAL IN THE MEAN

To determine the local maxima we need the Hessian for F3(T). This Hessian evaluated at T is a linear operator D(T) from the space of L×M matrices to itself. If E is an arbitrary L×M matrix, then D(T) is determined uniquely as the operator that satisfies

(d²/dt²) F_3(T + tE)|_{t=0} = tr[E^† D(T) E]
for all E. In this notation D(T)E is the L×M matrix that results from the linear operator D(T) acting on E. Specifically, we are interested in the eigenvalues of the Hessian at the points where the gradient of F3(T) is zero.

Proposition 5: With the normality assumptions used in this paper, and when there is no signal in the mean, any local maximum for the J FOM F_3(T) can be constructed by using L eigenvectors of K_2^{−1}K_1 for the channels, with corresponding eigenvalues κ_l. For an integer N with 0 ≤ N ≤ L these eigenvectors must be chosen to have the N largest values of κ_l that are also >1, and the L − N smallest values of κ_l that are also <1. Thus, barring degeneracies, there are no more than L+1 local maxima, one of which is the global maximum. ▪

Proof: We will fix the channel matrix T and the perturbation matrix E and write C_i(T + tE) = C_i(t) and R_ij(T + tE) = R_ij(t). Then, dropping an additive constant and an overall factor of 1/2 that do not affect the analysis, we have F_3(t) = tr[R_12(t)] + tr[R_21(t)] with the relations C_2(t)R_12(t) = C_1(t) and C_1(t)R_21(t) = C_2(t). The basic ingredients of the calculation are the following equations:

C_j(0) = T K_j T^†,
C_j′(0) = E K_j T^† + T K_j E^†,
C_j″(0) = 2E K_j E^†,
R_ij′(0) = C_j^{−1}(0)[C_i′(0) − C_j′(0) R_ij(0)],
R_ij″(0) = C_j^{−1}(0)[C_i″(0) − C_j″(0) R_ij(0) − 2C_j′(0) R_ij′(0)],
F_3″(0) = tr[R_12″(0)] + tr[R_21″(0)].

With these equations we can compute tr[E^† D(T) E] for any channel matrix T and perturbation matrix E.

When we are at a point where the gradient is zero, then, as noted above, we may assume that K_1 T^† = K_2 T^† Λ where Λ is a diagonal matrix. We also may assume the dual orthogonality conditions [see Eq. (41)] on the rows of T, which we can write as C_1(0) = Λ and C_2(0) = I.

To proceed further it is useful to define the matrix T_0 as the (M−L)×M matrix whose rows form the set of eigenvectors of K_2^{−1}K_1 that do not appear in the given channel matrix T. We also define Λ_0 as the diagonal matrix with the corresponding eigenvalues κ_{0m} on the diagonal. Now we have the orthogonality relations T_0 K_j T^† = 0 and T K_j T_0^† = 0. We may also assume that these eigenvectors satisfy T_0 K_1 T_0^† = Λ_0 and T_0 K_2 T_0^† = I. We may then write a decomposition of the perturbation matrix E = BT + B_0 T_0, where B is L×L and B_0 is L×(M−L). Then R_12(0) = Λ and R_21(0) = Λ^{−1} and

C_1′(0) = BΛ + ΛB^†,
C_2′(0) = B + B^†,
C_1″(0) = 2BΛB^† + 2B_0 Λ_0 B_0^†,
C_2″(0) = 2BB^† + 2B_0 B_0^†.

Placing these in the formula above for F_3″(0) we find that all of the terms with B in them cancel, as they should since a perturbation of the form E = BT would give F_3(t) = F_3(0) for all t in an interval about 0. Therefore, we may restrict our perturbations to those of the form E = B_0 T_0, since these are the directions that move us away from the point corresponding to T on the Grassmann manifold Gr(L,M). The final result is

tr[E^† D(T) E] = 2 tr[(Λ^{−1} − Λ)(B_0 B_0^† − Λ^{−1} B_0 Λ_0 B_0^†)].

Using the orthogonality relations of the eigenvectors we can also write this as

tr[E^† D(T) E] = 2 tr[(B_0 T_0)^†(Λ^{−1} − Λ)(B_0 − Λ^{−1} B_0 Λ_0) T_0 K_2].

Therefore, the Hessian operator at a point on Gr(L,M) corresponding to a channel matrix T where the gradient of F3(T) is zero is given by

D(T) B_0 T_0 = (Λ^{−1} − Λ)(B_0 − Λ^{−1} B_0 Λ_0) T_0 K_2.

We want to know the eigenvalues of this operator.

First, we will show that, without loss of generality, we may assume that K_2 is the identity matrix. If this is not the case define a new data vector by g′ = K_2^{−1/2} g. Then, in the obvious notation, we have new covariance matrices K_2′ = I and K_1′ = K_2^{−1/2} K_1 K_2^{−1/2}. If T is a channel matrix, define T′ = T K_2^{1/2}. Then we have for channel outputs v = T′g′ = Tg. It may also be easily verified that

F_3′(T′) = tr[(T′K_2′T′^†)^{−1}(T′K_1′T′^†)] + tr[(T′K_1′T′^†)^{−1}(T′K_2′T′^†)] = F_3(T),
which we would expect since the channel outputs are the same. Therefore, T′ is a local maximum for F_3′(T′) if and only if T is a local maximum for F_3(T). Also note that the eigenvalues of K_2^{−1}K_1 are the same as those for (K_2′)^{−1}K_1′, although the eigenvectors are different.

Dropping the primes we will now assume that K_2 = I. Let us define an L×(M−L) matrix B_0^{pq} by [B_0^{pq}]_{lm} = δ_{pl} δ_{qm}. We then have

D(T) B_0^{pq} T_0 = (1/κ_p − κ_p)(1 − κ_{0q}/κ_p) B_0^{pq} T_0.

We have a local maximum if and only if all of the eigenvalues of D(T) are negative. Therefore, the necessary and sufficient conditions for a local maximum are:

  • 1) For all κp<1 we have κ0q>κp for all q
  • 2) For all κp>1 we have κ0q<κp for all q.

Therefore, the κ_p at a local maximum must consist of the N largest eigenvalues of K_2^{−1}K_1 that are also >1, and the L − N smallest eigenvalues that are also <1, for some N with 0 ≤ N ≤ L. Thus, barring degeneracies, there are no more than L+1 local maxima, one of which is the global maximum. ▪

In principle running a gradient-based algorithm repeatedly from different starting points would eventually locate all of the local maxima, and thus be able to determine the global maximum. The corresponding channel matrix will also provide a global maximum for the G(0) and AUC FOMs.
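A hedged sketch of this multi-start strategy, reusing the grad_J and optimize_channels sketches given earlier together with an explicit evaluation of the J FOM (names are illustrative assumptions), is:

```python
import numpy as np

def j_value(T, K1, K2, s):
    """F_3(T), Jeffrey's divergence for Gaussian channel outputs."""
    C1, C2, Ts = T @ K1 @ T.T, T @ K2 @ T.T, T @ s
    tr12 = np.trace(np.linalg.solve(C2, C1))
    tr21 = np.trace(np.linalg.solve(C1, C2))
    H1 = Ts @ np.linalg.solve(C1, Ts)
    H2 = Ts @ np.linalg.solve(C2, Ts)
    return 0.5 * (tr12 + tr21 + H1 + H2) - T.shape[0]

def best_of_restarts(K1, K2, s, L, n_starts=10, seed=0):
    """Run the gradient ascent from several random starts and keep the best J."""
    rng = np.random.default_rng(seed)
    M = K1.shape[0]
    best_T, best_J = None, -np.inf
    for _ in range(n_starts):
        T0 = rng.standard_normal((L, M))
        T = optimize_channels(lambda X: grad_J(X, K1, K2, s), T0)
        J = j_value(T, K1, K2, s)
        if J > best_J:
            best_T, best_J = T, J
    return best_T, best_J
```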

9. SIMULATIONS

In this section, the AUC of the J-CQO calculated from a gradient-based search [see Eq. (25)] is compared to three conventional observers: ideal [see Eq. (5)], Hotelling [see Eq. (17)], and Eigen-CQO [see Eq. (27)] under several simulated imaging conditions, which correspond to various differences in the first- and second-order statistics of the two classes. The two classes of images are simulated on a 50×50 pixel grid. Here, M = 50² = 2500, and this number is selected to accommodate the computational time required for the ideal, Hotelling, and Eigen-CQO observers, which all require an M×M matrix inverse. By contrast, the J-CQO only requires inversion of L×L matrices, where L is the number of channels.

The covariance matrix of each class is chosen to be circulant with a Gaussian kernel and parameterized by a correlation length as in

[K_i]_{n,m} = (b̄/SNR)² × exp{−[((n_x − m_x) mod 50)² + ((n_y − m_y) mod 50)²]/(2σ_i²)},
where σ_i is the correlation length of the ith class, n and m are two-dimensional pixel indices, the SNR is set to 100, and the average value of the background b̄ is 25 throughout the simulation study. The mean of the first class is constant across the image plane, i.e., all elements of ḡ_1 equal b̄. The correlation length of the second class is always σ_2 = 0.3 pixels. This relatively simple model of the covariance matrices is chosen to provide an analytic solution for the Eigen-CQO decision variable.
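A sketch of this covariance model is given below; it interprets the mod-50 differences as periodic (wrap-around) pixel separations, and the function and parameter names are assumptions for illustration:

```python
import numpy as np

def circulant_gaussian_cov(n, sigma, b_bar=25.0, snr=100.0):
    """M x M covariance (M = n^2) with a Gaussian kernel on a periodic n x n grid."""
    idx = np.arange(n)
    d = np.abs(idx[:, None] - idx[None, :])
    d = np.minimum(d, n - d)                 # wrap-around separations along one axis
    dx2 = (d**2)[:, None, :, None]           # (n, 1, n, 1)
    dy2 = (d**2)[None, :, None, :]           # (1, n, 1, n)
    K = (b_bar / snr) ** 2 * np.exp(-(dx2 + dy2) / (2.0 * sigma**2))
    return K.reshape(n * n, n * n)
```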

The observers’ performance is compared for different choices of g¯2 and σ1. The case of signal present in the mean is simulated by

ḡ_2 = ḡ_1 + A_s s,
where the support of s is one pixel in the middle of the image. The signal strength is expressed as contrast, which is defined as (A_s + b̄)/b̄; a contrast of one indicates the signal is absent. Differences in the covariance matrices of the two classes are expressed as the difference in their correlation lengths, Δσ = σ_1 − σ_2. Example images from each class are displayed side by side in Fig. 1. In Fig. 1(a) Δσ = 0.2 pixels and in Fig. 1(b) Δσ = 0.5 pixels.


Fig. 1. Top image is a sample from the first class and the bottom image is a sample from the second class. The images look more similar in (a) since Δσ is smaller compared to (b). Movies for (a) and (b) are generated by repeatedly sampling from the ensemble (Media 1 and Media 2). (a) Δσ = 0.2 pixels. (b) Δσ = 0.5 pixels.


The AUC is estimated from the percent correct of 1,000 independent sample images [2]. Five independent trials of the AUC estimates are calculated and the mean and standard deviation are displayed as error plots in Figs. 2, 4, and 5. The kth sample image from the ith class is computed from

g_i^k = ḡ_i + K_i^{1/2} w,
where w is a zero mean image of uncorrelated white noise of unit variance.
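A minimal sketch of this sample generator, using a symmetric matrix square root of K_i (names are illustrative assumptions), is:

```python
import numpy as np

def sample_images(gbar, K, num_samples, seed=1):
    """Draw correlated Gaussian images g = gbar + K^{1/2} w."""
    rng = np.random.default_rng(seed)
    w_vals, w_vecs = np.linalg.eigh(K)                         # K = V diag(w) V^T
    K_half = w_vecs @ np.diag(np.sqrt(np.clip(w_vals, 0, None))) @ w_vecs.T
    w = rng.standard_normal((num_samples, K.shape[0]))         # white-noise images
    return gbar[None, :] + w @ K_half.T
```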


Fig. 2. AUC comparison for (a) equal and (b) unequal mean images for the two classes. The AUC is calculated at various values of Δσ, the correlation length difference between the covariance matrices of the two classes. The number of channels is constant at L=25 and the compression ratio M/L=100. (a) Signal absent in the mean, (b) signal present in the mean.



Fig. 3. Eigenspectrum of K_2^{−1}K_1 for six values of Δσ. The highest 25 values of κ + 1/κ are marked in red (in the top left part of each curve) to denote the Eigen-CQO selections. The number of channels is constant at L=25. Δσ = 0.1 pixels (a), 0.15 pixels (b), 0.2 pixels (c), 0.25 pixels (d), 0.30 pixels (e), 0.35 pixels (f).



Fig. 4. Observer comparison for unequal mean images and varying the magnitude of the mean difference. The number of channels and second-order statistics are held constant; L=25 and Δσ=0.15 pixels.



Fig. 5. Observer comparison for (a) equal and (b) unequal mean images under both classes and varying the number of channels. The second-order statistics are constant; Δσ=0.15 pixels. (a) Signal absent in the mean, (b) signal present in the mean.


In Fig. 2 the AUC is calculated as a function of correlation length differences between the two classes while the number of channels, L, remains constant. This is repeated for both signal absent and present in the image means; see Figs. 2(a) and 2(b), respectively. In Fig. 2(a) all observers achieve an AUC of 0.5, as expected, when Δσ = 0 and ḡ_1 = ḡ_2, since there is no difference between the two classes in this case. The AUC of the ideal observer saturates to 1.0 when a relatively small correlation length difference is introduced. The Hotelling template is zero when the means of the two classes are equal so its AUC remains at 0.5 and is invariant to the correlation length difference. As Δσ increases the performance of both the J-CQO and Eigen-CQO increases. The Eigen-CQO yields better performance, as predicted by Proposition 4, but in this simulation the difference is less than about 0.02 in AUC and the performance gap decreases as Δσ increases. Both observers saturate to an AUC of 1.0 when Δσ is sufficiently large.

As shown in Fig. 2(b) the Eigen-CQO is not necessarily better than the J-CQO when the image means are different. Here, the AUC is evaluated over the same range of Δσ but the AUC is not 0.5 at Δσ = 0 since ḡ_1 ≠ ḡ_2. At Δσ = 0 all observers have approximately the same AUC of 0.68 except the Eigen-CQO, which is close to 0.50. This is expected since the Eigen-CQO uses only second-order statistics of the image, which are identical in this case. The Hotelling observer AUC is not expected to change as a function of Δσ since it only uses first-order statistics of the images (i.e., an affine transform) to calculate a decision variable. The change in Hotelling AUC in Fig. 2(b) is due to sample statistics from estimating the AUC using 5 independent trials of 1,000 images from each class. As expected the ideal observer’s AUC is equal to or greater than the AUC for the signal absent case since this task is easier. Again, the Eigen-CQO and J-CQO performance both increase with Δσ but when the signal is present in the mean the J-CQO performance is better for the more difficult tasks. In addition to this performance increase a critical advantage of J-CQO over Eigen-CQO is tractability of the calculation for high-dimensional images. Here, images are simulated on only a 50×50 grid to allow reasonable calculation times of the Eigen-CQO. For larger images calculating the channels for CQO from the gradient-based method would remain feasible while the M×M matrix inverse required for the Eigen-CQO would become infeasible.

To understand the computation of the Eigen-CQO, Fig. 3 presents the eigenvalues of K_2^{−1}K_1 for several values of Δσ. Recalling Eq. (27), the channels for Eigen-CQO are computed by selecting the eigenvectors that correspond to the largest values of κ + 1/κ. Comparing the eigenspectrum in Fig. 3 for six Δσ values reveals that as the difference between the two classes increases the abundance of small eigenvalues that contribute to the κ + 1/κ rank-ordering also increases. If the number of channels were increased eventually larger eigenvalues would also be included in the κ + 1/κ rank-ordering. These trends are specific to the circulant Gaussian covariance matrices used in this example [see Eq. (86)]. Note that according to Eq. (41) each κ_l is the ratio between the lth channel output variances under each of the two classes. If these variances are equal then the channel output offers no discrimination when ḡ_1 = ḡ_2. The maximum discrimination occurs when this ratio is either large or small compared to one. In theory different combinations of eigenvalues from the high and low end of the eigenspectrum result in 26 (i.e., L+1) local maxima, as discussed in Section 8. In this simulation study independent random samples were used as the initial solution in the iterative algorithm (see Fig. 6 for both mean equal and unequal cases) and resulted in different values of J, which confirms the presence of local maxima.


Fig. 6. J versus iteration number in Eq. (24) for (a) equal and (b) unequal means. Gradient term in Eq. (24) versus iteration number for (c) equal and (d) unequal means. The second-order statistics are constant; Δσ=0.25 pixels. Discrepancies for different runs (using independent random samples for the initial solution) of the iterative gradient-based search algorithm indicate the presence of local maxima. (a), (c) Signal absent in the mean. (b), (d) Signal present in the mean.


To investigate the change in observer performance as a function of signal strength, Fig. 4 shows the AUC of every observer as the signal contrast changes. The number of channels for the J-CQO and the Eigen-CQO is held constant. Per our definition of signal contrast, 1.0 is equivalent to signal absent, hence the Hotelling AUC is 0.5 at this point. However, at this point the J-CQO AUC is about 0.81 since Δσ is held constant at 0.15 pixels. The J-CQO AUC is higher than Hotelling, both increase with signal contrast, and their values become increasingly similar as the contrast is increased. The AUC for both observers eventually saturates to 1.0 as the contrast is increased. The Eigen-CQO is constant with respect to signal contrast (within fluctuations due to sample statistics) since it uses only second-order statistics to calculate a decision variable and the second-order statistics are not changing. Another critical advantage of the J-CQO over the Eigen-CQO is that both first- and second-order statistics of the images are combined to calculate a decision variable.

Figure 5 is an investigation into the effect of number of channels on Eigen-CQO and J-CQO for both (a) signal absent and (b) present in the mean. The number of channels ranges from 1 to 50, which corresponds to compression ratios of 2500 and 50, respectively. As expected, the ideal observer’s AUC is saturated to 1.0 and the Hotelling AUC is 0.50 when the signal is absent in the mean. In Fig. 5(a) for the signal absent case both the Eigen-CQO and J-CQO AUC increase with number of channels, the rate of this increase is approximately the same for both, and the Eigen-CQO performs slightly better as predicted in Proposition 4. A very weak signal contrast of 1.01 is used in Fig. 5(b) for the signal present case. Here, the J-CQO’s AUC is higher than the Eigen-CQO over the selected range of number of channels. As the number of channels increases the Eigen-CQO and J-CQO performance become increasingly similar. Since the J-CQO implementation requires selecting the number of channels prior to image processing the sensitivity of performance to number of channels will be an important quantity to determine experimentally when using the J-CQO in a new imaging application.

The channels of the J-CQO are calculated using a nonlinear iterative algorithm with an additive update term in Eq. (24). For both signal present and absent in the mean, Figs. 6(c) and 6(d) show the magnitude of the gradient versus iteration number, multiplied by the scalar step-size factor ϵ that is used to scale the additive update. The calculation was repeated three times with independent random initial values for the channel matrix entries. Smooth convergence toward a negligible gradient is observed for all three runs. Figures 6(a) and 6(b) show the scalar-valued quantity to be maximized (i.e., J) as a function of iteration number, also for signal absent and present and three independent runs each. All three runs converge quickly over this range of iterations, but to slightly different values of J. For the case of signal absent in the mean [Fig. 6(a)] the three trial runs of J-CQO are compared to the Eigen-CQO value of J which, as expected, is greater.
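The experiment summarized in Fig. 6 amounts to running the additive update of Eq. (24) from several random starting points and recording J and the scaled gradient magnitude at every iteration. A minimal sketch of that loop is given below (Python/NumPy); grad_J and eval_J are placeholders for routines that return the gradient of J with respect to T and the value of J, and epsilon, n_iters, and n_restarts are assumed tuning choices, not values from the paper.

import numpy as np

def run_random_restarts(grad_J, eval_J, M, L, epsilon=1e-3,
                        n_iters=200, n_restarts=3, seed=0):
    rng = np.random.default_rng(seed)
    histories = []
    for _ in range(n_restarts):
        T = rng.standard_normal((L, M))        # independent random initial channels
        J_trace, grad_trace = [], []
        for _ in range(n_iters):
            G = grad_J(T)                      # gradient of J with respect to T
            T = T + epsilon * G                # additive update of Eq. (24)
            J_trace.append(eval_J(T))          # cf. Figs. 6(a), 6(b)
            grad_trace.append(np.linalg.norm(epsilon * G))  # cf. Figs. 6(c), 6(d)
        histories.append((T, J_trace, grad_trace))
    return histories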

Figure 7(a) is the eigenspectrum of R12 for the equal-means case. Here, L=25 and the 25 eigenvalues for each independent run are so nearly equal that they are plotted on top of one another in the figure. As shown in Eq. (40), the value of (1/2)(κl + 1/κl) − 1 is the contribution that the channel corresponding to the eigenvalue κl makes to the J FOM. Therefore, the plot of (κl + 1/κl) versus κl can be interpreted as a rank-ordering of the importance of each eigenvector of R12 with respect to the J FOM. It is interesting to note from Eq. (40) that if one channel is added with a value of (κl + 1/κl) equal to 2, then the value of J would not change by adding that channel. In Fig. 7(a) none of these values is equal to 2, and therefore deleting any of the channels would decrease the value of J.
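Rearranging the equal-means expression for J [Eq. (40)] per channel makes this rank-ordering explicit:
\[
J = \frac{1}{2}\sum_{l=1}^{L}\Big(\kappa_l + \frac{1}{\kappa_l}\Big) - L
  = \sum_{l=1}^{L}\Big[\frac{1}{2}\Big(\kappa_l + \frac{1}{\kappa_l}\Big) - 1\Big],
\]
so a channel whose output variances are equal under the two classes (κl = 1, i.e., κl + 1/κl = 2) contributes nothing to J, and every κl ≠ 1 makes a strictly positive contribution.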

Fig. 7. Eigenspectrum of R12 [see Eq. (19)] for (a) equal and (b) unequal means, with Δσ=0.25 pixels and L=25. Discrepancies for different runs (using independent random samples as the initial solution) of the iterative gradient-based search algorithm indicate the presence of local maxima. When a signal in the mean is introduced, an eigenvalue close to one appears in the eigenspectrum. The corresponding channel is the matrix-vector product of T† and the corresponding eigenvector. This channel is shown in (c) to demonstrate that the signal in the mean affects the J-CQO channels. (a) Signal absent in the mean and (b) signal present in the mean. (c) The corresponding channel.

Figure 7(b) is the eigenspectrum of R12 for the unequal-means case. When the two classes have different mean values the eigenspectrum changes, as compared to the equal-means case, most notably by a single eigenvalue that appears close to one. The disagreement among the independent runs is most notable in the selection of this eigenvalue closest to one. The corresponding channel is the matrix-vector product of T† and the eigenvector. For one of the independent runs this channel is given in Fig. 7(c) to show the dependence of the J-CQO channels on the difference in the means between the two classes, which is isolated in a single pixel at the center of the image.

In Fig. 8 the top row displays the channel solutions (i.e., the rows of the matrix T) from the J-CQO iterative algorithm for three values of the correlation length difference between the two classes with equal means. These solutions are nonunique since T and MT give the same value of J [see Eq. (16)]. To allow visual comparisons between different Δσ, the bottom row of Fig. 8 shows the eigenvectors of R12 [see Eq. (19)] back projected by T. This corresponds to choosing the matrix M that diagonalizes R12; therefore, the channel matrices in the top row produce the same value of J as the corresponding channel matrices in the bottom row. The Eigen-CQO channels, which are optimal for the equal-means case, are expected to be spatial frequency channels since the matrix K1⁻¹K2 is stationary in this example [see Eq. (86)]. The appearance of the J-CQO channels in the bottom row of Fig. 8 is encouraging since they exhibit obvious frequency and orientation characteristics.

Fig. 8. Top row displays the channels (i.e., rows of the matrix T) for the J-CQO iterative algorithm for three values of the correlation length difference between the two classes with equal means. L=25 and each row of the channel matrix is displayed as a single image in a 5×5 tiling. The bottom row shows the eigenvectors of R12 [see Eq. (19)] back projected by T. (a) J-CQO T (Δσ=0.2 pixels), (b) J-CQO T (Δσ=0.5 pixels), (c) J-CQO T (Δσ=1 pixel), (d) backprojection of R12 eigenvectors by T (Δσ=0.2 pixels), (e) backprojection of R12 eigenvectors by T (Δσ=0.5 pixels), (f) backprojection of R12 eigenvectors by T (Δσ=1 pixel).

10. CONCLUSIONS

In this paper, we have shown a promising new method for computing optimized channels for quadratic observers utilizing high-dimensional image data. We have assumed Gaussian statistics for the channelized image data, but the method for calculating channels is applicable in general. Gradient-based algorithms for determining the channels have been presented for five different information-based FOMs: KL1, KL2, J, G(0), and AUC. Analytic solutions for the optimum channels for each of the five FOMs have been derived for the case of equal mean data for both classes. The optimum channels for the J, G(0), and AUC FOMs under the equal mean condition are the same. This result is critical for implementing J as a surrogate FOM for AUC since the J FOM is much easier to compute. Since the algorithm is gradient-based, local maxima are an important consideration. For the J FOM, with the equal mean condition, we have shown that there are no more than L+1 local maxima, where L is the number of channels. In principle, all of the local maxima of the equal-mean J FOM could be located by using multiple random starting locations in the algorithm. In a simulation study we have calculated the performance of ideal and Hotelling observers and compared these to CQO performance. Optimal channel selection for the CQO has been calculated using two different methods: eigenanalysis and an iterative gradient-based algorithm. The gradient-based algorithm is an important new contribution since it makes optimal CQO implementation possible without inverting an M×M matrix for eigenanalysis.

Implementing the CQO in a new imaging application requires a set of channels and the first- and second-order statistics of the channelized image data from both classes. A common technique is to estimate the channelized first- and second-order statistics from finite training data for both classes. Here, the dimensionality reduction from M original measurements to L channels is a critical advantage of the J-CQO method since estimating first- and second-order statistics from uncompressed image data would require larger sample sizes. Another important advantage of the CQO is that we only need to invert two L×L matrices instead of two M×M matrices. In this simulation study we have shown excellent CQO performance for compression ratios (M/L) on the order of 100. In practice, this compression ratio will depend upon the difficulty of the task and the performance requirements of the observer.

The J-CQO algorithm has the advantages of the CQO plus the property that the channels can be estimated using image samples from both classes. This can make binary classification possible in an exploratory imaging application where the relationship between the signal to be detected and the image data is not well understood. In such an application the J-CQO channel solution gives insight into which measurements are most useful for the detection task. The iterative solution for the J-CQO channels [see Eqs. (38) and (24)] does not require estimates of the image covariances K1 and K2 as an intermediate step. At each step in the iteration, training images from each class are channelized (i.e., compressed) using the candidate channel matrix solution. The sample means and sample covariance matrices for the channel outputs are computed. The sample cross-covariance matrices between the channel outputs and the original data are also computed (these are the matrices TK1 and TK2). The sample channelized means, sample cross-covariance matrices, and inverses of the sample channelized covariance matrices are then used to calculate the gradient of J with respect to the channel matrix T. As long as the number of independent training images from each class exceeds the number of channels, the sample channelized covariance matrices will be full rank and therefore invertible. The novelty of this approach is that a prior description of the signal is not required. Instead, training data from each class can be used to estimate the channels.
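To make the preceding recipe concrete, the following minimal sketch (Python/NumPy; variable names are ours) performs one gradient evaluation from training images, assuming the gradient of J with respect to T takes the Δ3 form quoted in the main text [see Eqs. (24) and (38)]. The bracketed M×M factor is applied implicitly so that only L×L matrices are ever inverted.

import numpy as np

def grad_J_from_samples(T, G1, G2):
    # T: (L, M) candidate channel matrix.
    # G1, G2: (N1, M) and (N2, M) arrays of training images (one image per row)
    # from classes 1 and 2. Returns a sample-based estimate of the gradient of J.
    gbar1, gbar2 = G1.mean(axis=0), G2.mean(axis=0)
    s = gbar1 - gbar2                                  # sample mean-difference image
    V1, V2 = G1 @ T.T, G2 @ T.T                        # channelized training data (N_i, L)
    C1 = np.cov(V1, rowvar=False)                      # L x L sample channel covariances
    C2 = np.cov(V2, rowvar=False)
    # Sample cross-covariances between channel outputs and raw data: estimates of T K_i.
    TK1 = (V1 - V1.mean(0)).T @ (G1 - gbar1) / (len(G1) - 1)
    TK2 = (V2 - V2.mean(0)).T @ (G2 - gbar2) / (len(G2) - 1)
    Ts = T @ s
    A1 = np.linalg.solve(C2, TK1 + np.outer(Ts, s))    # C2^{-1} T (K1 + s s'), L x M
    A2 = np.linalg.solve(C1, TK2 + np.outer(Ts, s))    # C1^{-1} T (K2 + s s'), L x M
    # Apply the factor [I - T' C_i^{-1} T K_i] without forming any M x M matrix.
    term1 = A1 - (A1 @ T.T) @ np.linalg.solve(C2, TK2)
    term2 = A2 - (A2 @ T.T) @ np.linalg.solve(C1, TK1)
    return term1 + term2                               # estimate of Delta_3, shape (L, M)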

We are currently working on analytic results for the case of nonequal means. Our simulation results showed that the J-CQO performed better than the Eigen-CQO for the case of nonequal means; see Figs. 2(b) and 5(b). In this situation, the results in Fig. 7(c) indicate that, to achieve optimal performance, Hotelling-type channels that are not eigenvectors of K2⁻¹K1 are necessary. We are also developing analytic results and algorithms for optimizing channelized observer performance for estimation tasks; in this case, we use the Fisher information as a FOM. To meet the demands of a wider variety of imaging applications we will extend these CQO methods to classification tasks with more than two classes. Results in [45] indicate that human visual adaptation is nonlinear; this motivates testing the predictive power of J-CQO performance as a surrogate for human performance in texture discrimination.

APPENDIX A: KL1 AND KL2

To compute F1(T), first we find

\[
\langle Q_1(\mathbf{v})\rangle_1 = \mathrm{tr}\big[\mathbf{C}_1^{-1}(\mathbf{T})\,\big\langle(\mathbf{v}-\mathbf{T}\bar{\mathbf{g}}_1)(\mathbf{v}-\mathbf{T}\bar{\mathbf{g}}_1)^{\dagger}\big\rangle_1\big] = L.
\]

For the other expectation that we need, we have, after some computation,

\[
\langle Q_2(\mathbf{v})\rangle_1 = \mathrm{tr}\{\mathbf{C}_2^{-1}(\mathbf{T})[\mathbf{C}_1(\mathbf{T})+\mathbf{T}\mathbf{s}\mathbf{s}^{\dagger}\mathbf{T}^{\dagger}]\}.
\]
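Combining these two expectations with the definition of λ_CQO(T, g) in the main text, the log-determinant terms enter as
\[
2\langle\lambda(\mathbf{v})\rangle_1
= \langle Q_2(\mathbf{v})\rangle_1 - \langle Q_1(\mathbf{v})\rangle_1
- \ln\det[\mathbf{C}_1(\mathbf{T})] + \ln\det[\mathbf{C}_2(\mathbf{T})].
\]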

Using properties of the trace we then have

\[
2\langle\lambda(\mathbf{v})\rangle_1 = -L + \mathrm{tr}[\mathbf{C}_2^{-1}(\mathbf{T})\mathbf{C}_1(\mathbf{T})] + \mathbf{s}^{\dagger}\mathbf{T}^{\dagger}\mathbf{C}_2^{-1}(\mathbf{T})\mathbf{T}\mathbf{s} - \ln\det[\mathbf{C}_2^{-1}(\mathbf{T})\mathbf{C}_1(\mathbf{T})].
\]

This leads to the expression in the main text,

\[
2F_1(\mathbf{T}) = -L + \mathrm{tr}[\mathbf{R}_{12}(\mathbf{T})] + H_2(\mathbf{T}) - \ln\det[\mathbf{R}_{12}(\mathbf{T})].
\]

To compute the gradient ∇F1(T) we start with

\[
D_1(\mathbf{T}) = \frac{d}{dt}\,\mathrm{tr}[\mathbf{R}_{12}(\mathbf{T}+t\mathbf{E})]\Big|_{t=0}.
\]

After some computation we find that D1(T) is given by

\[
2\,\mathrm{tr}\{\mathbf{E}^{\dagger}[-\mathbf{R}_{12}(\mathbf{T})\mathbf{C}_2^{-1}(\mathbf{T})\mathbf{T}\mathbf{K}_2 + \mathbf{C}_2^{-1}(\mathbf{T})\mathbf{T}\mathbf{K}_1]\}.
\]

Next, we have

\[
D_2(\mathbf{T}) = \frac{d}{dt}\ln\det[\mathbf{R}_{12}(\mathbf{T}+t\mathbf{E})]\Big|_{t=0}.
\]

A useful fact here is that

\[
\frac{d}{dt}\ln\det[\mathbf{R}_{12}(\mathbf{T}+t\mathbf{E})] = \mathrm{tr}\Big[\mathbf{R}_{12}^{-1}(\mathbf{T}+t\mathbf{E})\,\frac{d}{dt}\mathbf{R}_{12}(\mathbf{T}+t\mathbf{E})\Big].
\]

After some computation we find that D2(T) is given by

\[
2\,\mathrm{tr}\{\mathbf{E}^{\dagger}[-\mathbf{C}_2^{-1}(\mathbf{T})\mathbf{T}\mathbf{K}_2 + \mathbf{C}_1^{-1}(\mathbf{T})\mathbf{T}\mathbf{K}_1]\}.
\]

Finally, we have

\[
D_3(\mathbf{T}) = \frac{d}{dt}H_2(\mathbf{T}+t\mathbf{E})\Big|_{t=0},
\]
which eventually simplifies to
\[
2\,\mathrm{tr}\{\mathbf{E}^{\dagger}[-\mathbf{C}_2^{-1}(\mathbf{T})\mathbf{T}\mathbf{s}\mathbf{s}^{\dagger}\mathbf{T}^{\dagger}\mathbf{C}_2^{-1}(\mathbf{T})\mathbf{T}\mathbf{K}_2 + \mathbf{C}_2^{-1}(\mathbf{T})\mathbf{T}\mathbf{s}\mathbf{s}^{\dagger}]\}.
\]

The gradient ∇F1(T) is therefore

\[
\mathbf{C}_2^{-1}(\mathbf{T})\mathbf{T}(\mathbf{K}_1+\mathbf{s}\mathbf{s}^{\dagger})[\mathbf{I}-\mathbf{T}^{\dagger}\mathbf{C}_2^{-1}(\mathbf{T})\mathbf{T}\mathbf{K}_2] + \mathbf{C}_2^{-1}(\mathbf{T})\mathbf{T}\mathbf{K}_2 - \mathbf{C}_1^{-1}(\mathbf{T})\mathbf{T}\mathbf{K}_1.
\]
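A practical way to validate gradient expressions of this kind is to compare them against a finite-difference approximation on a small test problem. The sketch below (Python/NumPy; function names are ours) evaluates F1 from the closed form above and differentiates it numerically; it is a check under the stated Gaussian assumptions, not part of the paper's algorithm.

import numpy as np

def F1(T, K1, K2, s):
    # Channelized KL divergence: 2*F1 = -L + tr(R12) + H2 - ln det(R12).
    C1, C2 = T @ K1 @ T.T, T @ K2 @ T.T
    R12 = np.linalg.solve(C2, C1)
    H2 = (T @ s) @ np.linalg.solve(C2, T @ s)
    L = T.shape[0]
    _, logdet = np.linalg.slogdet(R12)
    return 0.5 * (-L + np.trace(R12) + H2 - logdet)

def finite_difference_grad(T, K1, K2, s, h=1e-6):
    # Central-difference estimate of dF1/dT, entry by entry; useful for checking
    # an analytic gradient such as the expression above on small test problems.
    G = np.zeros_like(T)
    for i in range(T.shape[0]):
        for j in range(T.shape[1]):
            Tp, Tm = T.copy(), T.copy()
            Tp[i, j] += h
            Tm[i, j] -= h
            G[i, j] = (F1(Tp, K1, K2, s) - F1(Tm, K1, K2, s)) / (2 * h)
    return G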

APPENDIX B: AUC

We calculate the following derivatives first:

\[
\frac{d}{dt}\mathbf{R}_{12}(\mathbf{T}+t\mathbf{E})\Big|_{t=0} = \mathbf{C}_2^{-1}(\mathbf{T})\big[-(\mathbf{E}\mathbf{K}_2\mathbf{T}^{\dagger}+\mathbf{T}\mathbf{K}_2\mathbf{E}^{\dagger})\mathbf{R}_{12}(\mathbf{T}) + (\mathbf{E}\mathbf{K}_1\mathbf{T}^{\dagger}+\mathbf{T}\mathbf{K}_1\mathbf{E}^{\dagger})\big],
\]
and
\[
\frac{d}{dt}\mathbf{R}_{21}(\mathbf{T}+t\mathbf{E})\Big|_{t=0} = \mathbf{C}_1^{-1}(\mathbf{T})\big[-(\mathbf{E}\mathbf{K}_1\mathbf{T}^{\dagger}+\mathbf{T}\mathbf{K}_1\mathbf{E}^{\dagger})\mathbf{R}_{21}(\mathbf{T}) + (\mathbf{E}\mathbf{K}_2\mathbf{T}^{\dagger}+\mathbf{T}\mathbf{K}_2\mathbf{E}^{\dagger})\big].
\]

The calculation of the required gradient is now straightforward, if a bit tedious. We define

\[
\mathbf{Q}_1(\alpha,\mathbf{T}) = \mathbf{P}^{-1}(\alpha,\mathbf{T})\mathbf{C}_1^{-1}(\mathbf{T}) = [\mathbf{C}_1(\mathbf{T})\mathbf{P}(\alpha,\mathbf{T})]^{-1}
\]
and
\[
\mathbf{Q}_2(\alpha,\mathbf{T}) = \mathbf{P}^{-1}(\alpha,\mathbf{T})\mathbf{C}_2^{-1}(\mathbf{T}) = [\mathbf{C}_2(\mathbf{T})\mathbf{P}(\alpha,\mathbf{T})]^{-1}.
\]

These are symmetric matrices. We also define

\[
\mathbf{S}_1(\alpha,\mathbf{T}) = \mathbf{R}_{21}(\mathbf{T})\mathbf{Q}_1(\alpha,\mathbf{T}) = [\mathbf{C}_1(\mathbf{T})\mathbf{P}(\alpha,\mathbf{T})\mathbf{R}_{12}(\mathbf{T})]^{-1}
\]
and
\[
\mathbf{S}_2(\alpha,\mathbf{T}) = \mathbf{R}_{12}(\mathbf{T})\mathbf{Q}_2(\alpha,\mathbf{T}) = [\mathbf{C}_2(\mathbf{T})\mathbf{P}(\alpha,\mathbf{T})\mathbf{R}_{21}(\mathbf{T})]^{-1}.
\]

These are also symmetric matrices. Then, the gradient of P(α,T) is P(α,T) times the matrix function

\[
2\Big(\tfrac{1}{4}+\alpha^{2}\Big)\{[\mathbf{Q}_2(\alpha,\mathbf{T})-\mathbf{S}_1(\alpha,\mathbf{T})]\mathbf{T}\mathbf{K}_1 + [\mathbf{Q}_1(\alpha,\mathbf{T})-\mathbf{S}_2(\alpha,\mathbf{T})]\mathbf{T}\mathbf{K}_2\}.
\]

Now define

\[
\mathbf{U}_1(\mathbf{T}) = \int[\mathbf{P}(\alpha,\mathbf{T})]^{-\frac{1}{2}}[\mathbf{Q}_2(\alpha,\mathbf{T})-\mathbf{S}_1(\alpha,\mathbf{T})]\,d\alpha,
\]
and
\[
\mathbf{U}_2(\mathbf{T}) = \int[\mathbf{P}(\alpha,\mathbf{T})]^{-\frac{1}{2}}[\mathbf{Q}_1(\alpha,\mathbf{T})-\mathbf{S}_2(\alpha,\mathbf{T})]\,d\alpha.
\]

Then, setting the gradient of AUC(T) equal to zero is equivalent to

\[
\mathbf{U}_1(\mathbf{T})\mathbf{T}\mathbf{K}_1 = \mathbf{U}_2(\mathbf{T})\mathbf{T}\mathbf{K}_2.
\]

We will assume that U1(T) is invertible and check this assumption in the proof of Proposition 4. Then we have the familiar equation

\[
\mathbf{K}_2^{-1}\mathbf{K}_1\mathbf{T}^{\dagger} = \mathbf{T}^{\dagger}\mathbf{U}_2(\mathbf{T})\mathbf{U}_1^{-1}(\mathbf{T}).
\]

This means that

\[
\mathbf{U}_2(\mathbf{T})\mathbf{U}_1^{-1}(\mathbf{T}) = \mathbf{R}_{12}(\mathbf{T}).
\]

ACKNOWLEDGMENTS

This material is based upon work supported by the National Science Foundation under CHE-1313892 and the National Institutes of Health under R01-EB000803 and P41-EB002035. The authors would like to thank Harrison Barrett and Luca Caucci for their valuable feedback. The comments by our anonymous reviewers improved this manuscript and we appreciate their work.

REFERENCES

1. D. J. Field, “Relations between the statistics of natural images and the response properties of cortical cells,” J. Opt. Soc. Am. A 4, 2379–2394 (1987). [CrossRef]  

2. H. Barrett and K. Myers, Foundations of Image Science (Wiley, 2013).

3. C. K. Abbey, H. H. Barrett, and M. P. Eckstein, “Practical issues and methodology in assessment of image quality using model observers,” Proc. SPIE 3032, 182–194 (1997).

4. M. Kupinski, E. Clarkson, and J. Hesterman, “Bias in Hotelling observer performance computed from finite data,” Proc. SPIE 6515, 65150S (2007).

5. B. D. Gallas, “Variance of the channelized-hotelling observer from a finite number of trainers and testers,” Proc. SPIE 5034, 100–111 (2003).

6. N. Nguyen, C. Abbey, and M. Insana, “Objective assessment of sonographic quality I: Task information,” IEEE Trans. Med. Imaging 32, 683–690 (2013). [CrossRef]  

7. C. Abbey, R. Zemp, J. Liu, K. Lindfors, and M. Insana, “Observer efficiency in discrimination tasks simulating malignant and benign breast lesions imaged with ultrasound,” IEEE Trans. Med. Imaging 25, 198–209 (2006). [CrossRef]  

8. N. Nguyen, C. Abbey, and M. Insana, “Objective assessment of sonographic quality II: Acquisition information spectrum,” IEEE Trans. Med. Imaging 32, 691–698 (2013). [CrossRef]  

9. S. Kullback and R. Leibler, “On information and sufficiency,” Ann. Math. Statistics 22, 79–86 (1951). [CrossRef]  

10. H. Jeffreys, “An invariant form for the prior probability in estimation problems,” Proc. R. Soc. London A 186, 453–461 (1946). [CrossRef]  

11. A. Bhattacharyya, “On a measure of divergence between two statistical populations defined by their probability distributions,” Bulletin Cal. Math. Soc. 35, 99–109 (1943).

12. H. Barrett, C. Abbey, and E. Clarkson, “Objective assessment of image quality. III. ROC metrics, ideal observers, and likelihood-generating functions,” J. Opt. Soc. Am. A 15, 1520–1535 (1998). [CrossRef]  

13. W. Peterson, T. Birdsall, and W. Fox, “The theory of signal detectability,” Trans. IRE Prof. Group Inf. Theory 4, 171–212 (1954). [CrossRef]  

14. C. E. Metz, “Receiver operating characteristic analysis: a tool for the quantitative evaluation of observer performance and imaging systems,” J. Am. Coll. Radiol. 3, 413–422 (2006). [CrossRef]  

15. E. Clarkson, “Asymptotic ideal observers and surrogate figures of merit for signal detection with list-mode data,” J. Opt. Soc. Am. A 29, 2204–2216 (2012). [CrossRef]  

16. K. J. Myers and H. H. Barrett, “Addition of a channel mechanism to the ideal-observer model,” J. Opt. Soc. Am. A 4, 2447–2457 (1987).

17. S. Park, M. A. Kupinski, E. Clarkson, and H. Barrett, “Efficient channels for the ideal observer,” Proc. SPIE 5372, 12–21 (2004).

18. S. Park, H. Barrett, E. Clarkson, M. Kupinski, and K. Myers, “Channelized-ideal observer using Laguerre-Gauss channels in detection tasks involving non-Gaussian distributed lumpy backgrounds and a Gaussian signal,” J. Opt. Soc. Am. A 24, B136–B150 (2007).

19. J. Witten, S. Park, and K. Myers, “Partial least squares: a method to estimate efficient channels for the ideal observers,” IEEE Trans. Med. Imaging 29, 1050–1058 (2010). [CrossRef]  

20. S. Park, J. Witten, and K. Myers, “Singular vectors of a linear imaging system as efficient channels for the Bayesian ideal observer,” IEEE Trans. Med. Imaging 28, 657–668 (2009). [CrossRef]  

21. P. Dollár, Z. Tu, P. Perona, and S. Belongie, “Integral channel features,” BMVC 2, 5 (2009).

22. H. Barrett, J. Yao, J. Rolland, and K. Myers, “Model observers for assessment of image quality,” Proc. Natl. Acad. Sci. USA 90, 9758–9765 (1993). [CrossRef]  

23. J. Brankov, Y. Yang, L. Wei, I. El-Naqa, and M. Wernick, “Learning a channelized observer for image quality assessment,” IEEE Trans. Med. Imaging 28, 991–999 (2009).

24. D. J. Field, “What is the goal of sensory coding?” Neural Computation 6, 559–601 (1994). [CrossRef]  

25. B. Schölkopf, A. Smola, and K.-R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Computation 10, 1299–1319 (1998). [CrossRef]  

26. K. Fukunaga and W. L. Koontz, “Application of the Karhunen-Loeve expansion to feature selection and ordering,” IEEE Trans. Comput. C-19, 311–318 (1970). [CrossRef]

27. X. Huo, “A statistical analysis of Fukunaga-Koontz transform,” IEEE Signal Process. Lett. 11, 123–126 (2004). [CrossRef]  

28. A. Mahalanobis, R. R. Muise, S. R. Stanfill, and A. Van Nevel, “Design and application of quadratic correlation filters for target detection,” IEEE Trans. Aerosp. Electron. Syst. 40, 837–850 (2004). [CrossRef]  

29. K. Fukunaga, Introduction to Statistical Pattern Recognition (Academic, 1990).

30. S. Zhang and T. Sim, “Discriminant subspace analysis: A Fukunaga-Koontz approach,” IEEE Trans. Pattern Anal. Mach. Intell. 29, 1732–1745 (2007). [CrossRef]

31. E. Clarkson and F. Shen, “Fisher information and surrogate figures of merit for the task-based assessment of image quality,” J. Opt. Soc. Am. A 27, 2313–2326 (2010). [CrossRef]  

32. F. De la Torre and T. Kanade, “Multimodal oriented discriminant analysis,” in Proceedings of the 22nd International Conference on Machine Learning (ACM, 2005), pp. 177–184.

33. T. Wimalajeewa, H. Chen, and P. K. Varshney, “Performance limits of compressive sensing-based signal classification,” IEEE Trans. Signal Process. 60, 2758–2770 (2012). [CrossRef]  

34. W. R. Carson, M. Chen, M. R. Rodrigues, R. Calderbank, and L. Carin, “Communications-inspired projection design with application to compressive sensing,” SIAM J. Imag. Sci. 5, 1185–1212 (2012). [CrossRef]  

35. E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory 52, 489–509 (2006). [CrossRef]  

36. D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory 52, 1289–1306 (2006). [CrossRef]  

37. S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Trans. Signal Process. 56, 2346–2356 (2008). [CrossRef]  

38. M. A. Neifeld, A. Ashok, and P. K. Baheti, “Task-specific information for imaging system analysis,” J. Opt. Soc. Am. A 24, B25–B41 (2007). [CrossRef]  

39. A. Ashok, P. K. Baheti, and M. A. Neifeld, “Compressive imaging system design using task-specific information,” Appl. Opt. 47, 4457–4471 (2008). [CrossRef]  

40. M. A. Davenport, M. B. Wakin, and R. G. Baraniuk, “Detection and estimation with compressive measurements,” Technical Report TREE 0610 (Department of ECE, Rice University, 2006).

41. A. DasGupta, Asymptotic Theory of Statistics and Probability (Springer, 2008).

42. E. Clarkson and H. Barrett, “Approximations to ideal-observer performance on signal-detection tasks,” Appl. Opt. 39, 1783–1793 (2000). [CrossRef]  

43. D. Husemoller, Fibre Bundles (Springer, 1994).

44. A. Edelman, T. A. Arias, and S. T. Smith, “The geometry of algorithms with orthogonality constraints,” SIAM J. Matrix Anal. Appl. 20, 303–353 (1998). [CrossRef]  

45. S. M. Smirnakis, M. J. Berry, D. K. Warland, W. Bialek, and M. Meister, “Adaptation of retinal processing to image contrast and spatial scale,” Nature 386, 69–73 (1997). [CrossRef]  

Supplementary Material (2)

Media 1: MP4 (3090 KB)     
Media 2: MP4 (3076 KB)     
