Convolutional neural networks for whole slide image superresolution

Lopamudra Mukherjee; Adib Keikhosravi; Dat Bui; Kevin W. Eliceiri

doi:10.1364/BOE.9.005368

1. Introduction

Whole slide imaging (WSI) or virtual microscopy is a type of imaging modality which is used to convert animal or human pathology tissue slides to digital images for teaching, research or clinical applications. This method is popular due to education and clinical demands [1–3]. Although modern whole slide scanners can now scan tissue slides with high resolution in a relatively short period of time, significant challenges, including high cost of equipment and data storage, still remain unsolved [4]. However, WSI can have numerous advantages for pathologists. The ability to send and store slides digitally leads to convenient access, regardless of location of the pathologist, which in turn results in faster ways to get second opinions, digital conferences, and decentralized primary diagnostic reviews. Digital storage also allows integration of digital slides into the patient’s electronic profile as well as easy access to archived slides [5,6]. Despite these advantages, data storage and communication remain major drawbacks in high resolution digital pathology [4], in addition to the prohibitive cost of high resolution scanners. One potential way to address these issues is to use LR images from low magnification slide scanners. Such devices are widely available, easy to use, relatively cheap and can also quickly produce images with smaller storage requirements. However, LR images can increase the chance of misdiagnosis and false treatment if used as the primary source by a pathologist. For example, cancer grading normally requires identifying tumor cells based on size and morphology assessments [7], which can be easily distorted in low magnification images. Addressing these concerns requires a way to improve the resolution of the images on-the-fly, without substantial increase in storage and computational requirements.

The general goal of the extraction of high resolution features from low resolution data is known in computer vision research as super-resolution (SR). SR is a widely researched framework which aims at constructing a high resolution (HR) image given only a (or a set of) low resolution (LR) image(s) as input. It is often applicable in scenarios where such HR images are otherwise unavailable but may be needed for downstream processing. However, solving the super-resolution problem is challenging in practice. This is because of the ill-posed nature of the problem, given that there is generally no unique solution for a given LR image: a large number of different HR images, when downsampled can give rise to the same LR image. This issue is especially evident at higher magnification ratios. While there is no one solution that works for all SR problem domains, this issue is typically mitigated by constraining the solution space by strong domain specific a priori information. The SR problem occurs in a number of different scenarios, such as image enhancement, analyzing range images, face recognition, as well as medical/biological applications [8–12]. One such area in which super-resolution problem is naturally applicable is microscopic imaging, where insights into biological functions depend on the ability of observing the cellular dynamics, but is sometimes limited by the temporal resolution of acquisition devices. Note that most existing techniques for image SR are designed for natural image based applications, where image are acquired using digital cameras. These methods make use of image features, such as transformed exemplars [13], textures [14] and other high-level features. However, it is often difficult to obtain such high-level cues from low resolution whole slides images, making it hard to use available off-the-shelf solvers directly to improve their resolution. Next we describe the main focus of the paper, which is to show how the SR problem can be adapted to can address this important challenge in this context of whole slide imaging.

Our Contribution: Here, we investigate whether SR techniques can be used to generate high resolution images using only low resolution WSI images as an input. It is desired that these generated high resolution images should match well with an image acquired using a high-quality, expensive scanner in terms of quality and also be potentially useful for diagnostic purposes. To this end, we develop a convolutional neural network framework (CNN) with the following functionality. After the model has been trained, it enables generating at test time, a high-resolution image corresponding to a low resolution image which has been provided as an input. CNNs have been used in the past for the SR problem [15,16], since they can learn complex transformations across the image domains. But we find that such methods do not work directly for our application, and often result in an overtly smooth output image. This problem is explained by a recent paper [17], which show that in such CNN frameworks, the single output tend to look like a “smoothed” average of all the potential images that could be generated. Such smoothed images can lead to poor quality of segmentation, which is needed for identifying the different tissue types for further diagnosis. SI images as input, which can match an images acquired using high-quality, expensive scanners in terms of quality and use for diagnostic purposes. To this end, we develop a convolutional neural network framework (CNN), which when trained, outputs a reconstructed high-resolution image, given only a low resolution image as input. CNNs have been used in the past for the SR problem [15,16], since they can learn complex transformations across the image domains. But we find that such methods do not work directly for our application.

In this paper, we propose a CNN-KNN framework for the super-resolution, that is specifically designed for the slide imaging and address the issues mentioned above in the following ways: 1) We design a customized CNN framework, whose architecture is designed specifically to optimize the outputs for such microscopic images, including smaller filter sizes and increasing the number of filters. For datasets like ours with smaller sample sizes, it has been shown that filters with fewer weights often works well. The Scattering transform idea by Mallat’s group takes this premise to the extreme [18]. The rationale of using smaller filter sizes was motivated and informed by this body of work. In the same vein, the reasoning behind increasing the number of filters, is to account for the complexity of transformation. It is well known that filters in the neural net is used to learn features are most relevant for the purposes of the learning problem. Increasing the number of filters allows the network to learn a wider range of features, hence it can model more complex changes in input-output characteristic of each layer. Among a large set of experiments we conducted, the proposed architecture seemed to yield the most consistent, reliable and reproducible results. We believe that it was important to choose a setting that is robust across characteristics that are often different across different labs such as sample sizes and this serves as an important design consideration in the architecture of the proposed network. 2) In training the model, we incorporate other forms of optimization objectives beyond mean square error (MSE), which is most commonly used to measure similarity between HR ground truth image and its reconstruction from the LR observation. In particular, we include metrics that capture human perception of image quality and can lead to quality reconstructions. 3) In addition, we enhance the output from the CNN, by capturing fine-grained details though a nearest-neighbor search over a dictionary. Such an approach is useful in restoring the image specific high-frequency components and results in a much higher quality of reconstruction. Results of two different cell-lines show our method outperforms other approaches for SR reconstructions, both qualitatively and quantitatively, generating images which match the HR images in quality and can be used for subsequent end-user applications. We describe our architecture of CNN and KNN in Section 3. But first, we briefly review the prior work related to image super-resolution in the next section.

2. Related work

Here we review the existing literature on single image super resolution techniques. Unsupervised interpolation-based methods were among the first approaches for this problem which includes linear, bicubic or Lanczos filtering [19]. These can be very fast, but usually yield solutions which are blurry and corrupted with aliasing artifacts for natural images. Edge based interpolation is another popular method for this problem [20, 21]. However, the most efficient approaches for SR problem are learning-based where a correspondence/mapping function is learnt between LR and HR images. One family of approaches under this umbrella are sparsity-based techniques. Sparse coding is an effective mechanism that assumes any natural image can be sparsely represented as a combination of dictionary elements, which can be learnt through a training process [22,23]. Other important works in this regard include Glasner [11] who exploited patch redundancies across scales within the image to drive the reconstruction, Huang [13] who extended self dictionaries to further allow for small transformations and shape variations and [24] who proposed a convolutional sparse coding approach that improves consistency by processing the whole image rather than overlapping patches. In addition, Zhang [25] proposed a multi-scale dictionary to capture redundancies of similar image patches at different scales. Another line of algorithms is neighborhood based embedding approaches which up sample a LR image patch by finding similar training patches in a low dimensional manifold and combining their corresponding high-resolution patches for reconstruction [26]. This was later improved by [12] who formulated a more general map of example pairs using kernel ridge regression. The regression problem can also be solved various different ways as well [27].

Image representations derived from deep networks have recently also shown promise for SR problems. Stacked collaborative local auto-encoders are used [28] to construct the LR image layer by layer. [29] suggested a method for SR based on an extension of the predictive convolutional sparse coding framework. A multiple layer convolutional neural network (CNN), similar to our model, inspired by sparse-coding methods is proposed in [15,16,30]. Chen [31] proposed to use multi-stage trainable nonlinear reaction diffusion (TNRD) as an alternative to CNN where the weights and the nonlinearity is trainable. Wang [8] trained a cascaded sparse coding network from end to end inspired by LISTA (Learning iterative shrinkage and thresholding algorithm) [32] to fully exploit the natural sparsity of images. Recently, [14] proposed a method for automated texture synthesis in reconstructed images by using a perceptual loss focusing on creating realistic textures. Several recent ideas have involved reducing the training complexity of the learning models using approaches such as Laplacian Pyramids [33], removing unnecessary components of CNN [34] and addressing the mutual dependencies of low and high resolution images using Deep Back-Projection Networks [35]. In addition, Generative adversarial networks (GAN) have also been used for the problem of single image super-resolution, these include [36–39]. Other deep network based models for image super-resolution problem includes [40–43].

Fig. 1 Architecture of our proposed convolutional neural network for image superresolution.

Download Full Size | PDF

3. Methods

Here, we discuss our main model for obtaining high-resolution images. But first, we briefly outline the problem setting. Let H and L denote the high and low resolution image sets respectively. For training/learning, we assume that the corresponding high resolution image H_i for each low resolution image L_i is available. We extract patches from the low resolution image L_i and represent each patch as a high-dimensional vector (where the vector for jth patch is reffered to as l_ji). The goal of the training process is to learn a non-linear function f, when applied to l_ji, transforms it into a high-resolution reconstructed patch r_ji. That is r_ji = f(l_ji). Then, we aggregate all such patches to form the reconstructed high resolution image R. The objective driving the training process often minimizes some metric of difference between R and H. Most CNN [15,16,30] based models employ a number of convolutional layers to learn the complex mapping between the imaging domains. But there are some salient properties of our data which makes it hard to apply these existing approaches directly to our problem. First, these models are trained by upscaling the LR input image to the size of HR image, before it is passed through the CNN layers, which often leads to higher computational times to train the model. Secondly, these methods are not designed to handle cases where the LR image may be from a different modality, such as in our case, where the complexity of the transformation is much greater. Also, most CNN based methods use mean square error(MSE) as the metric to evaluate the similarity of H and R, that is $MSE (H, R) = \sum_{i} {‖ H_{i} - R_{i} ‖}_{2}^{2}$ , where R_i is the reconstuction of the ith image. Such a loss function is easy to minimize, but it correlates poorly with human perception of image quality and as a result, the resultant images are sometimes blurry and/or lacking the high-frequency components of the original images. We address this issue by adding image saliency based terms in the objective. In addition, a nearest neighbor based procedure is employed to restore the image specific details which may have been lost in the convolutional process. We describe the network next.

3.1. Convolutional neural network design

In this section, we describe the internal architecture of our convolution neural network, see Figure 1.

Feature extraction layer: The first step in the convolution process is to extract features from the low resolution input images. Note that for most feature extraction methods such as Haar, DCT etc, the key problem can be posed as the task of learning a function f̂, which takes as input the low resolution images and outputs the learned features f̂(L_i). Therefore the feature extraction process can be learned as a layer of the convolutional neural network, which constitutes the first layer of our network. This can be expressed as

Y_{1} = σ (θ_{1} \times L + b_{1})

where L is the entire corpus of low resolution images and θ₁ and b₁ represent the weights and biases of the first layer. The weights are composed of n₁ = 64 convolutions on each image patch, with each convolution filter being of size 2 × 2. Therefore this layer has 64 filters, each of size 2 × 2. The bias vector is of size b_i ∈ R^n₁. We keep filter sizes small at this level, so as it extract more fine grained features from each patch. The σ(x) function implements a ReLU function, which can be written as σ(x) = max(0, x).

Feature mapping layer The second layer is similar to the previous layer except the filter sizes are set to 1 × 1. The number of filters are still set to 64. The purpose of this layer is to obtain a weighted sum pool of features across various feature-maps of the previous layer. The output of this layer is referred to as Y₂.

Intermediate convolutional layers: The feature extraction layer is followed by three convolutional layers. In this setting, we assume that for the ith layer (i ∈ {3, 4, 5}), the previous layer output is given Y_i−1, which is then served as input to the ith layer. The convolutional filter functions in these intermediate layers can be written as follows:

Y_{i} = σ (θ_{i} \times Y_{i - 1} + b_{i}) i \in 3, 4, 5

where θ_i and b_i represent the weights and biases of the ith layer. Each of the weights θ_i is composed of n_i filters of size n_i−1 × f_i × f_i. We set n_i = 2⁸⁻ⁱ. This makes n₃ = 32 and the number of filter decreases by a factor of 2, with each subsequent layer. We observe this has computational advantages, without noticeable decay in reconstruction performance. The filter sizes f_i are set to {3, 2, 1} for each of the three layers respectively. This is akin to first applying the non-linear mapping to 3 × 3 patch of the feature map and then progressively reducing the size to 1. This structure is inspired by hierarchal CNN models, as described in [44].

Subpixel layer: The purpose of the final (6th) layer is to increase the resolution of the LR image to convert it to a HR image from the learnt LR feature maps. For this, we use a subpixel layer similar to the one proposed in [45]. The advantage of using such sub-pixel layer is that other previous layers operate on the reduced LR image, which reduce the computational and memory complexity substantially.

The upscaling of the LR image to the size of the HR image is implemented as a convolution with a filter θ_sub whose stride is $\frac{1}{r}$ (r is the resolution ratio between the HR and LR images). Let the size of the filter θ_sub be f_sub. A convolution with stride of $\frac{1}{r}$ in the LR space with a filter θ_sub (weight spacing $\frac{1}{r}$ ) would activate different parts of θ_sub for the convolution. The weights that fall between the pixels will not be activated. The patterns are activated at periodic intervals of mod(x, r) and mod(y, r) where x, y are the pixel position in HR space. Alternatively, this can be implemented as a filter θ₆, whose size is n₅ × r² × f₆ × f₆, given that $f_{6} = \frac{f_{sub}}{r}$ and mod(f_sub, r) = 0. This can be written as

Y_{6} = γ (θ_{6} \times Y_{5} + b_{6})

where γ is periodic shuffling operator which rearranges r² channels of the output to the size of the HR image (See [46] for the detailed reasoning).

3.2. Training and loss function

The objective function, based on which the CNN is trained, is crucial in determining the quality of the high resolution reconstructions. Most SR systems minimize the pixel-wise mean squared error (MSE) between the HR and the reconstructed image, which while easy to optimize, often correlates poorly with human perception of image quality. This is because MSE estimator returns the average of a number of possible solutions, which does not perform well for high-dimensional data [14]. The paper by [47] shows that two very different reconstructions of the same image, can have the same MSE error and reconstructions based on MSE alone has been shown to be blurry and/or lack high frequency components of the original image [14,48].

To address this issue, we train our CNN using linear combination function of Multi-scale structured similarity (MSSIM) in addition to mean square error between the reconstructed image (R) and the high resolution image (H). We briefly describe this objective next. In particular we choose the MSSIM, since it better calibrated to capture perceptual metrics of image quality. Also, its pixel-wise gradient has a simple analytical form and is inexpensive to compute and therefore can be easily incorporated in gradient descent based back-propagation. MSSIM is the multi-scale extension of structured similarity (SIM), which is defined based on the following parameters. Let x and y be two patches of equal size from the two images H and R being compared. Assume μ_x (μ_y) denote the mean, $σ_{x}^{2} (σ_{y}^{2})$ denote the variance of the patch x(y) respectively, and σ_xy denote their covariance. Therefore, the SIM function can be defined as:

SIM (x, y) = I {(x, y)}^{α} C {(x, y)}^{β} S {(x . y)}^{γ}

were

I (x, y) = \frac{(2 μ_{x} μ_{y} + c_{1})}{(μ_{x}^{2} + μ_{y}^{2} + c_{1})}

is the luminance based comparison,

C (x, y) = \frac{(2 σ_{x} σ_{y} + c_{2})}{(σ_{x}^{2} + σ_{y}^{2} + c_{2})}

is a measure of contrast difference and

S (x, y) = \frac{σ_{x y} + c_{3}}{σ_{x} σ_{y} + c_{3}}

is the measure of structural differences between the two images. c_i for i = {1, 2, 3} are small values added for numerical stability and the α, β and γ are the relative exponent weights in the combination. The structured similarity between the images H and R is averaged over all corresponding patches x and y. This single-scale measure assumes a fixed image sampling density and viewing distance, and may only be appropriate for certain range of image scales. To make it more broadly applicable, a variant of SIM, called the multi-scale structured similarity (MSSIM) has been proposed. Here, the input x and y are iteratively downsampled by a factor of 2 with a low-pass filter, (with scale 1 denoting the original scale). The contrast and structural components of SIM are calcuated at all scales (denoted by C_p and S_p for scale p). The luminance component is applied only at the highest scale(say P). The multi-scale structured similarity function can be written as

MSSIM (x, y) = I_{P} {(x, y)}^{α} \prod_{p = 1}^{P} C_{p} {(x, y)}^{β_{p}} S_{p} {(x . y)}^{γ_{p}}

In our case, all the weights in the exponents are kept the same. We compute the MSSIM using 4 different scales, and use window sizes of 4 × 4 to calculate the metrics across both images.

Our loss function can be written as follows:

L (H, R) = ρ MSE (H, R) + (ρ - 1) MSSIM (H, R)

where ρ is between 0 and 1. Since both terms in the objective is differentiable, we can train the neural network using gradient descent, adopting standard back propagation methods.

3.3. Nearest neighbor enhancement

Our goal here is to enhance the output of CNN, by producing a high-quality image which retains the finer details of the original HR image. As mentioned earlier, it has been observed by [17] and others, that deep learning frameworks including CNNs suffer from the problem of having difficulty in interpreting the results, therefore is hard to know precisely how the synthesized output was generated. Furthermore CNNs in particular, produce outputs which look like a ”smoothed” average of all the potential HR images. This issue manifests in the quality of images produced by any CNN method. One way to avoid this issue is to introduce a post processing framework that retrieves some of the image specific details, which may have been lost in the convolution process. To do this, we introduce a K-Nearest Neighbor based image enhancement framework similar to the concept proposed in [17]. We describe this idea next.

To do this, we adopt a dictionary based approach for KNN learning. We use a small dataset (about 50) of images for training. We assume that the corresponding high resolution image H_i for each low resolution image L_i is available in the training data. We scale each L_i to the size of corresponding H_i and extract patches from both types of images. For simplicity, we refer to l_ji as the jth patch (represented as a vector) of L_i (similar notation is used for h_ji). We extract first and second order moments for each low-resolution patch, which is appended to l_ji. This creates a collection of corresponding patches for both high and low resolution images, which we call dictionaries D_h and D_l (whose columns are h_ji and l_ji respectively) for high and low resolution patches. We also maintain the mean and variance for each HR patch in h_ji.

We use a simple bi-level K-NN approach which works by matching a query to the training set and returning corresponding outputs in two nested levels. Given a new test image l, let the corresponding output of the CNN be r. We scale up l to the size of r and extract patches from both (similar to the way dictionaries are created). For a patch l_j ∈ l, we look up k₁ closest patches in the low-resolution dictionary D_l. Once the closest matches in the low resolution dictionary has been determined, we create a sub-dictionary by choosing only those corresponding patches from D_h, for which matches have been found in D_l. Next we do another KNN search, this time using r_j as query and search the sub dictionary for k₂ < k₁ nearest neighbors. The high resolution patches found in this search are averaged to create a template patch $h_{j}^{t}$ . r_j is then enhanced so that its mean and variance match the template patch ( $h_{j}^{t}$ ), to generate a new patch r̂_j. All such patches r̂_j are aggregated to produce the reconstructed image r̂.

We use patch size of size 3 × 3 for the nearest neighbor search. The implementation is done in Matlab using CUDA’s fast KNN libraries and parallel programming toolbox, which makes this bilevel KNN implementation efficient in practice. An example of the CNN+KNN output compared to the CNN output is shown in Figure 2. The images obtained from the CNN+KNN pipeline are sharper in quality and captures the minute details such as tissue characteristics better than simply the CNN output. We also find the resultant images have smaller reconstruction error, with the average PSNR increasing by 2 units.

Fig. 2 Results of effect of KNN applied to the CNN outputs. Row 1 shows the original images, Row 2 shows a small region of the corresponding images in Row 1, zoomed in.

Download Full Size | PDF

4. Experiments

We performed experiments to evaluate our SR approach on two large tissue microarray (TMA) datasets, a Breast TMA dataset consisting of 202 images [49], and a Kidney TMA dataset with 129 images [50]. TMAs are a popular histopathology format as they can provide many separate patient tissue cores in array fashion in order to allow multiplex histological analysis. For each dataset, we train the dictionary on a subset of the HR images and then use it to reconstruct the learned HR image from the LR images. We compare our method with 6 other approaches: bicubic interpolation, which is a standard baseline, the patch based sparse coding approach (ScR) in [23, 51, 52], the deep learning approach (CSCN) [8, 53], the convolutional neural network based framework (FSRCNN) [16], a CNN model which uses a subpixel layer (ESCNN), a sparse coding based Dictionary learning method implemented using deep learning (SCDL) [54] and a GAN based implementation of SR (SRGAN) [55,56]. All methods were trained with the same training batch of images. We evaluate the following aspects in our experiments: 1) how well the obtained reconstruction matches the high resolution image 2) how the resultant segmentation quality is affected when using the reconstructed images 3) how the model parameters affect the quality of reconstruction and finally 4) analysis of the running time of our model. But before that, we briefly discuss the acquisition setup.

4.1. Materials and methods

Human tissue microarrays

A human renal cell carcinoma tissue microarray (TMA) block was constructed by the Translational Research Initiatives in Pathology (TRIP) lab at the University of Wisconsin-Madison (UW-Madison). A section of 5um thickness was cut from the TMA block containing 600um diameter tissue TMA cores. The section was then placed on a glass slide, stained with standard hematoxylin and eosin (H&E), and mounted under a 1.5 glass coverslip. Different tissue cores were from different patients. This TMA slide was originally prepared for another study [50]. A tissue microarray, which contains tumor tissue cores from 207 breast cancer patients, was used for the analysis. Five samples were excluded because of staining issues or sample folding on the slide. So the total of 202 images were selected for this study. This TMA has been previously made in our lab and used by Conklin, and full detail can be found at [49].

Imaging systems

High resolution images were acquired and digitalized at 20× using an Aperio CS2 Digital Pathology Scanner (Leica Biosystems) [57], with 4 pixels per micron, and low resolution images were acquired and digitized using PathScan Enabler IV [58] with .29 pixels per micron.

4.2. Reconstruction quality

Comparison with state of art

We evaluate the reconstruction quality of the obtained images by our approach by evaluating it relative to HR ground truth image and calculating eight different metrics: 1) root mean square error (RMSE), 2) signal to noise ratio (SNR), 3) structured similarity (SSIM) and 4) Mutual Information (MI) 5) Multiscale Structured Similarity (MSSIM) 6) Information Fidelity Criteria(IFC) [59] 7) Noise quality measure(NQM) [60] and 8) Weighted peak signal-to-noise ratio (WSNR) [61]. RMSE should be as low as possible, whereas SNR, SSIM (1 being the maximum), MSSIM (1 being the maximum) and the remaining metrics, should be high for good reconstruction. We use the same metrics of evaluation for the other 6 methods. Also note that SNR and RMSE are correlated measures. For these experiments, we set the resolution difference to a factor of 2. The results are shown in Table 1 for breast images and Table 2 for kidney images. We can see that our methods outperforms the other algorithms on most of the metrics used. This is especially true for the SIM and MSSIM measures, which are known to have high correlation with human perceptual scores. Qualitative results of reconstruction of our method is shown in Figure 3 for breast cells and Figure 4 for kidney cells. Results of reconstructions by other methods is shown in Figure 5. These results show that comparable methods such as SCDL, ESCNN and SRGAN, where the reconstruction looks most similar (in Rows 2 and 4) to the high resolution image have a higher values for MSSIM, compared to SNR. Note that our MSSIM values are the best among all the all the other comparable method on both datasets.

Table 1. Quantitative results from reconstructed Breast images.

View Table | View all tables in this article

Table 2. Quantitative results from reconstructed Kidney images.

View Table | View all tables in this article

Fig. 3 Results of reconstruction: Columns 1 and 3 show high and low resolution images and Column 2 shows the reconstructed image. Row 3 shows a small ROI of the breast images from Row 2 at full view.

Download Full Size | PDF

Fig. 4 Results of reconstruction: Columns 1 and 3 show high and low resolution images and Column 2 shows the reconstructed image. Row 3 shows a small ROI of the kidney images from Row 2 at full view.

Download Full Size | PDF

Fig. 5 Results of reconstruction of a breast image (row 1 and row 2) and a kidney image (row 3 and row 4) from other methods: Row 1 and 3 shows ScR, CSCN and FSRCNN, Row 2 and 4 shows ESCNN, SCDL and SRGAN, our results on this image in shown in Figure 3 and Figure 4.

Download Full Size | PDF

Reconstruction as a function of frequency

The above metrics computes the quality of reconstruction as a single scalar value. However, it has been observed that substantial amount of information may be lost whenever the characteristics of a high-dimensional image are summarized by a single scalar value. In order to see the performances of the reconstruction algorithms wrt to spatial frequencies, we use the ESP algorithm proposed by [47] which outputs the Fourier radial Error Spectrum Plot, which provides us a glimpse of how the reconstruction error varies according to different spatial frequency components. The results are shown in Figure 6 for a randomly chosen Kidney image. We see that our algorithm reconstructs the image much better at the lower frequencies, whereas at higher frequencies all methods perform similarly. This makes intuitive sense, since all methods are able to capture the high-frequency components in the reconstruction such as edges, our method outperforms other methods in capturing the subtle variations in the images which often correspond to changes in tissue density of other biological characteristics.

Fig. 6 Reconstruction error as a function of frequency.

Download Full Size | PDF

4.3. Quality of segmentation

Pathological diagnosis largely depends on nuclei localization and shape analysis. We used a simple color segmentation method to segment the nuclei using K-means clustering to segment the image into four different classes based on pixel values in Lab color space [62]. Following this, we use the Hadamard product of each class with the gray level image of the original bright-field image, computed average of pixel intensities in each class, and assigned the lowest value to the cell nuclei. To evaluate our results, we compare the segmentation of the reconstructed images with the results from HR images (ground truth) for 50 samples from the breast and kidney from each group by computing the misclassification error, which calculates the percentage of pixels misclassified. We compare our algorithm with the other methods used in the previous section. Results show that number of pixels misclassified from images generated using our method, is in most cases better than the other methods compared. Qualitative results of the segmentation masks (using blue lines as boundaries) are shown in Figure 7.

Fig. 7 Results of segmentation: Cols 1 & 3 show the segmentation mask on the high resolution image for a breast and kidney image respectively, Cols 2 & 4 shows the segmentation masks for corresponding reconstructions.

Download Full Size | PDF

Table 3. Misclassification error from segmentation.

View Table | View all tables in this article

4.4. Model parameters

Here, we study how model parameters affect the reconstruction output. First we study the variation of CNN model parameters and its effect on the reconstructed image, by systematically varying the filter sizes, number of filters and number of layers. These results are obtained before applying the nearest neighbor enhancement, to avoid confounds related to the mixed effects of the two stages of the reconstruction process (CNN and KNN). These results help justify the different choices made in the network design. Next, we also see the performance of our algorithm relative to resolution variation. We discuss these issues next.

Filter Size

In order to study the network sensitivity to filter sizes, we conducted number of experiments with different size of filters. Note that as mentioned earlier, for our experiments, we use filter sizes of 2, 1, 3, 2 and 1 for each of first five consecutive layers, this is denoted as 2-1-3-2-1. In addition, we varied the filter sizes both at the input layers 5-2-3-2-1 and the output layers 2-1-3-5-2 and measured the psnr in each case. Finally, we also ran experiments in the setting where both the input and output layer filter sizes have been increased 10-5-6-10-5. The results are shown in Figure 8. As we can see, that the reconstruction error increases slightly when the filter sizes at the input side is changed and more so, when they are increased on the output side. We see a significant increase in the error, when the filter sizes are increased to 10-5-6-10-5. This shows that for this application, small filter sizes work best for a good reconstruction.

Fig. 8 Reconstruction as a function of filter size.

Download Full Size | PDF

Fig. 9 Reconstruction as a function of filter size.

Download Full Size | PDF

Number of Filters

Here we study the performance of our algorithm wrt increasing the number of filters. Recall that in our network design, number of filters at each layer are assigned as follows: n₁ and n₂ are set to 64, and n₃, n₄ and n₅ are set to 32, 16 and 8 respectively. Let us denote this as {64, 32, 16, 8}. In addition to this, we also run experiments in two other settings of filters numbers: a) {64, 64, 64, 64}, where the filter numbers are kept same and b) {128, 64, 32, 16}, where the number of filters at each layer are doubled compared to our experimental setting, Figure 9 . We see that in our case, keeping the number of filters fixed at all layers, or doubling the number of filter have a negligible effect on the quality of reconstruction. This was observed by Dong as well [15], where increasing the number of filter had a marginal effect on corresponding psnr values. However, larger filter numbers contribute to an increase in the computational time. Considering this, we see that filter numbers {64, 32, 16, 8} are a good choice for our model.

Number of Layers

It is generally believed that increasing the depth of CNN, by adding more layers leads to an improvement in performance of the learning framework. However for image super resolution problem in particular, Dong [15] observed that “..the effectiveness of simple deeper structures for a super-resolution is not apparent” for super-resolution applications as it is for image classification tasks. We evaluate the effectiveness of depth of the network wrt to reconstruction performance. For this, we apply additional convolutional layers after the super-pixel upscaling layer. Therefore, the output of our network is passed as an input to additional layers (identical to convolutional layers in our network), which are stacked at the output. The results are shown in Figure 10. Similar to Dong, we do not observe significant changes in the quality of reconstruction, as a function of increase in the number of layers.

Fig. 10 Reconstruction as a function of number of layers.

Download Full Size | PDF

Reconstruction as a function of resolution: We performed experiments to study the performance of our algorithm relative to resolution variation. Using a randomly selected subset of 20 images from the kidney dataset, we resized the images such that the resolution difference is a factor of {2, 3, 4} respectively. The training was done separately for each setting. We compared the mean psnr values in each case, see Table 4. The psnr is best when the resolution difference is the lowest, and degrades as the resolution increases, which is expected. However we noticed that the change in reconstruction error is not significant, indicating our method still performs well at higher resolutions.

Table 4. Reconstruction error (measured as psnr) as a function of resolution variation.

View Table | View all tables in this article

4.5. Running time

Here, we briefly discuss the computational issues related to our model. Since each prior method we discussed earlier, has been implemented under different platform and libraries, it is not possible to do a meaningful and fair assessment of the runtime comparison of these methods. Therefore, we report on the computational times of our method only.

We implemented our model in TensorFlow using Python, which has inherent GPU utilization capability. We used a workstation with an AMD processor with a 2.4 GHz CPU, 16 Gb RAM and NVIDIA GPU Quadro K2200 graphics card. All our experiments have been performed using GPU, which shows significant performance gains compared to CPU runtime. The training time of our models depends on various factors such as dataset volume, network size, learning rate, batch size and number of training epochs. To report running times for training, we fix the network size to {64, 32, 16, 8}, learning rate to 10⁻³, dataset volume to 100 images, batchsize to 10 and number of training epochs to 10⁵. We vary the data set volume by training the network for different resolutions {2, 3, 4} and obtain the training time for each setting. The results are shown in Figure 11.

Fig. 11 Runtime as a function of resolution.

Download Full Size | PDF

The results show running time of our method has almost a linear dependence of the resolution factor, since all images go through the same number of convolutions. The time to generate a new high resolution image once the network is trained takes 2 – 3 minutes. The test-time speed of our model can be further accelerated by approximating or simplifying the trained networks with possible slight degradation in performance.

5. Future directions

This paper provides an interesting way to utilize low-resolution images, produced by slide scanners in end-user diagnostic applications. Until now, this was not feasible due to the lack of discriminatory features of tissue types, which are not obserable from low-resolution images. In addition, this work also leads to several interesting ideas which we will pursue as future work. We discuss these briefly next.

Besides the slide scanner images and the high-resolution images (20×) we have also acquired two intermediate resolutions as well. This gives us a sequence of resolutions for each image. This data is one of a kind and gives rise to a unique problem for super-resolution methodologies where the data has sequence structure. Note that the closest related work to this is Multi-Frame SR [63] applied to video reconstructions, which make use of motions of video frames. In our case, not only do we not have such motion information to help the reconstruction, the number of frames are far fewer than traditional video sequences.
One of our future goals is also generalize our technique so that it can learn the mapping between any two modalities. Particularly, we will adapt our technique to generate another modality such as phase contrast images, given the high-resolution images as input. This would provide a way to automatically generate specific modalities, without the need for actual acquisition procedure.
Our last goal is aimed at making our model scalable to large datasets and accelerating the CNN model to produce high-resolution images with a reduced computational and memory footprint. To do this, we will adopt recent developments in deep learning which show that one can substantially improve the running time of Deep CNNs by approximating by linear filters and other related ideas [64].

6. Conclusion

This paper provides an efficient way to utilize LR slide scanner images for more fine-grained pathological diagnosis by generating high quality reconstructed images which are similar in performance to images from expensive scanners. Experiments show promising results when compared against state-of-the-art methods on a number of test images. This approach is not only more cost effective over currently used approaches but may also open up new opportunities in histopathology research and clinical application, due to ease of use and quick scanning speed of low-resolution scanners over their high-resolution counterparts.

Funding

UW Laboratory for Optical and Computational Instrumentation; the Morgridge Institute for Research; Standup to Cancer; NIH (R01CA199996); NSF CGV (1219016).

Acknowledgments

We thank Dr. Sara Best and Dr. Andreas Friedl of the University of Wisconsin-Madison for the use of the renal and breast cancer TMAs.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

References

1. R. S. Weinstein, M. R. Descour, C. Liang, G. Barker, K. M. Scott, L. Richter, E. A. Krupinski, A. K. Bhattacharyya, J. R. Davis, A. Graham, M. Rennels, W. Russum, J. Goodall, P. Zhou, A. Olszak, B. Williams, J. Wyant, and P. Bartels, “An array microscope for ultrarapid virtual slide processing and telepathology. design, fabrication, and validation study,” Hum. Pathol. 35, 1303–1314 (2004). [CrossRef]

2. D. C. Wilbur, K. Madi, R. B. Colvin, L. M. Duncan, W. C. Faquin, J. A. Ferry, M. P. Frosch, S. L. Houser, R. L. Kradin, Gregory Y. Lauwers, D. Louis, E. Mark, M. Mino-Kenudson, J. Misdraji, G. Nielsen, M. Pitman, A. Rosenberg, R. Smith, A. Sohani, J. Stone, R. Tambouret, C. Wu, R. H. Young, A. Zembowicz, and W. Klietmann, “Whole-slide imaging digital pathology as a platform for teleconsultation: a pilot study using paired subspecialist correlations,” Arch. Pathology & Laboratory Medicine 133, 1949–1953 (2009).

3. L. Pantanowitz, M. Hornish, and R. A. Goulart, “The impact of digital imaging in the field of cytopathology,” Cytojournal 6, 6 (2009). [CrossRef] [PubMed]

4. L. Pantanowitz, P. N. Valenstein, A. J. Evans, K. J. Kaplan, J. D. Pfeifer, D. C. Wilbur, L. C. Collins, and T. J. Colgan, “Review of the current state of whole slide imaging in pathology,” J. Pathol. Informatics 2, 36 (2011). [CrossRef]

5. J. R. Gilbertson, J. Ho, L. Anthony, D. M. Jukic, Y. Yagi, and A. V. Parwani, “Primary histologic diagnosis using automated whole slide imaging: a validation study,” BMC Clin. Pathol. 6, 4 (2006). [CrossRef] [PubMed]

6. D. C. Wilbur, “Digital cytology: current state of the art and prospects for the future,” Acta Cytol. 55, 227–238 (2011). [CrossRef] [PubMed]

7. “Tumor grade,” https://www.cancer.gov/about-cancer/diagnosis-staging/prognosis/tumor-grade-fact-sheet.

8. Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, “Deep networks for image super-resolution with sparse prior,” in “2015 IEEE ICCV,” (2015), pp. 370–378.

9. M. Zheng, J. Bu, C. Chen, C. Wang, L. Zhang, G. Qiu, and D. Cai, “Graph regularized sparse coding for image representation,” IEEE TIP 20, 1327–1336 (2011).

10. J. Yang, Z. Lin, and S. Cohen, “Fast image super-resolution based on in-place example regression,” in “CVPR,” (2013), pp. 1059–1066.

11. D. Glasner, S. Bagon, and M. Irani, “Super-resolution from a single image,” in “IEEE ICCV,” (IEEE, 2009), pp. 349–356.

12. K. I. Kim and Y. Kwon, “Single-image super-resolution using sparse regression and natural image prior,” IEEE PAMI 32, 1127–1133 (2010). [CrossRef]

13. J.-B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-exemplars,” in “Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,” (2015), pp. 5197–5206.

14. M. S. Sajjadi, B. Scholkopf, and M. Hirsch, “Enhancenet: Single image super-resolution through automated texture synthesis,” arXiv preprint arXiv:1612.07919 (2016).

15. C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Transactions on Pattern Analysis Mach. intelligence 38, 295–307 (2016). [CrossRef]

16. C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in “ECCV,” (Springer, 2016), pp. 391–407.

17. A. Bansal, Y. Sheikh, and D. Ramanan, “Pixelnn: Example-based image synthesis,” arXiv preprint arXiv:1708.05349 (2017).

18. T. Angles and S. Mallat, “Generative networks as inverse problems with scattering transforms,” (2018).

19. C. E. Duchon, “Lanczos filtering in one and two dimensions,” J. Appl. Meteorol. 18, 1016–1022 (1979). [CrossRef]

20. J. Allebach and P. W. Wong, “Edge-directed interpolation,” in “ICIP,”, vol. 3 (IEEE, 1996), vol. 3, pp. 707–710.

21. X. Li and M. T. Orchard, “New edge-directed interpolation,” IEEE TIP 10, 1521–1527 (2001).

22. W. Dong, L. Zhang, G. Shi, and X. Wu, “Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization,” IEEE Transactions on Image Process. 20, 1838–1857 (2011). [CrossRef]

23. J. Yang, J. Wright, T. Huang, and Y. Ma, “Image super-resolution as sparse representation of raw image patches,” in “CVPR,” (IEEE, 2008), pp. 1–8.

24. S. Gu, W. Zuo, Q. Xie, D. Meng, X. Feng, and L. Zhang, “Convolutional sparse coding for image super-resolution,” in “IEEE ICCV,” (2015), pp. 1823–1831.

25. K. Zhang, X. Gao, D. Tao, and X. Li, “Multi-scale dictionary for single image super-resolution,” in “CVPR,” (IEEE, 2012), pp. 1114–1121.

26. R. Timofte, V. De Smet, and L. Van Gool, “Anchored neighborhood regression for fast example-based super-resolution,” in “IEEE ICCV,” (2013), pp. 1920–1927.

27. H. He and W.-C. Siu, “Single image super-resolution using gaussian process regression,” in “Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on,” (IEEE, 2011), pp. 449–456.

28. Z. Cui, H. Chang, S. Shan, B. Zhong, and X. Chen, “Deep network cascade for image super-resolution,” in “European Conference on Computer Vision,” (Springer, 2014), pp. 49–64.

29. C. Osendorfer, H. Soyer, and P. Van Der Smagt, “Image super-resolution with fast approximate convolutional sparse coding,” in “International Conference on Neural Information Processing,” (Springer, 2014), pp. 250–257.

30. C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in “ECCV,” (Springer, 2014), pp. 184–199.

31. Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE Transactions on Pattern Analysis Mach. Intell. 39, 1256–1272 (2017). [CrossRef]

32. K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in “ICML,” (2010), pp. 399–406.

33. W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, “Fast and accurate image super-resolution with deep laplacian pyramid networks,” arXiv preprint arXiv:1710.01992 (2017).

34. B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, “Enhanced deep residual networks for single image super-resolution,” in “The IEEE conference on computer vision and pattern recognition (CVPR) workshops,”, vol. 1 (2017), vol. 1, p. 4.

35. M. Haris, G. Shakhnarovich, and N. Ukita, “Deep backprojection networks for super-resolution,” in “Conference on Computer Vision and Pattern Recognition,” (2018).

36. C. Ledig, L. Theis, F. Huszar, J. Caballero, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” CoRR (2016).

37. B. Wu, H. Duan, Z. Liu, and G. Sun, “Srpgan: Perceptual generative adversarial network for single image super resolution,” CoRRabs/1712.05927 (2017).

38. J. Li, Z. Lu, G. Zeng, R. Gan, and H. Zha, “Similarity-aware patchwork assembly for depth image super-resolution,” in “Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,” (2014), pp. 3374–3381.

39. J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in “European Conference on Computer Vision,” (Springer, 2016), pp. 694–711.

40. J. Kim, J. Kwon Lee, and K. Mu Lee, “Accurate image super-resolution using very deep convolutional networks,” in “Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,” (2016), pp. 1646–1654.

41. J. Kim, J. Kwon Lee, and K. Mu Lee, “Deeply-recursive convolutional network for image super-resolution,” in “Proceedings of the IEEE conference on computer vision and pattern recognition,” (2016), pp. 1637–1645.

42. R. Timofte, R. Rothe, and L. Van Gool, “Seven ways to improve example-based single image super resolution,” in “Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on,” (IEEE, 2016), pp. 1865–1873.

43. S. Schulter, C. Leistner, and H. Bischof, “Fast and accurate image upscaling with super-resolution forests,” in “Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,” (2015), pp. 3791–3799.

44. J.-H. Jacobsen, E. Oyallon, S. Mallat, and A. W. Smeulders, “Multiscale hierarchical convolutional networks,” arXiv preprint arXiv:1703.04140 (2017).

45. W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in “Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,” (2016), pp. 1874–1883.

46. W. Shi, J. Caballero, L. Theis, F. Huszar, A. Aitken, C. Ledig, and Z. Wang, “Is the deconvolution layer the same as a convolutional layer?” arXiv preprint arXiv:1609.07009 (2016).

47. T. H. Kim and J. P. Haldar, “The Fourier radial error spectrum plot: A more nuanced quantitative evaluation of image reconstruction quality,” in “Proceedings of the International conference on Biomedical Imaging(ISBI),” (2018).

48. J. Snell, K. Ridgeway, R. Liao, B. D. Roads, M. C. Mozer, and R. S. Zemel, “Learning to generate images with perceptual similarity metrics,” in “Image Processing (ICIP), 2017 IEEE International Conference on,” (IEEE, 2017), pp. 4277–4281.

49. M. W. Conklin, J. C. Eickhoff, K. M. Riching, C. A. Pehlke, K. W. Eliceiri, P. P. Provenzano, A. Friedl, and P. J. Keely, “Aligned collagen is a prognostic signature for survival in human breast carcinoma,” Am. J. Pathol. 178, 1221–1232 (2011). [CrossRef] [PubMed]

50. S. Best, Y. Liu, A. Keikhosravi, C. Drifka, K. Woo, G. Mehta, M. Altwegg, T. Thimm, M. Houlihan, J. Bredfeldt, E. Abel, W. Huang, and K. Eliceiri, “Collagen organization of renal cell carcinoma differs between low and high grade tumors,” Urol. (submitted) (2018).

51. J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE TIP 19, 2861–2873 (2010).

52. J. Yang, Z. Wang, Z. Lin, S. Cohen, and T. Huang, “Coupled dictionary training for image super-resolution,” IEEE Transactions on Image Process. 21, 3467–3478 (2012). [CrossRef]

53. D. Liu, Z. Wang, B. Wen, J. Yang, W. Han, and T. S. Huang, “Robust single image super-resolution via deep networks with sparse prior,” IEEE Transactions on Image Process. 25, 3194–3207 (2016). [CrossRef]

54. L. Mukherjee, A. Keikhosravi, and K. Eliceiri, “Neighborhood regularized image superresolution for applications to microscopic imaging,” in “Proceedings of the International conference on Biomedical Imaging(ISBI),” (2018).

55. C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” in “CVPR,” (2017), vol. 2, p. 4.

56. H. Dong, A. Supratak, L. Mai, F. Liu, A. Oehmichen, S. Yu, and Y. Guo, “TensorLayer: A Versatile Library for Efficient Deep Learning Development,” ACM Multimed. (2017).

57. “Leica biosystems,” http://www.leicabiosystems.com/digital-pathology/aperio-digital-pathology-slide-scanners/products/aperio-cs2/.

58. M. Instrum., “Pathscan enabler iv, digital pathology slide scanner,”.

59. H. R. Sheikh, A. C. Bovik, and G. de Veciana, “An information fidelity criterion for image quality assessment using natural scene statistics,” IEEE Transactions on Image Process. 14, 2117–2128 (2005). [CrossRef]

60. N. Damera-Venkata, T. D. Kite, W. S. Geisler, B. L. Evans, and A. C. Bovik, “Image quality assessment based on a degradation model,” IEEE Transactions on Image Process. 9, 636–650 (2000). [CrossRef]

61. N. S. Bunker, “Optimization of weighted signal-to-noise ratio for a digital video encoder,” (1996). US Patent 5,525,984.

62. Y. Liu, A. Keikhosravi, G. S. Mehta, C. R. Drifka, and K. W. Eliceiri, “Methods for quantifying fibrillar collagen alignment,” in “Fibrosis, Methods in Molecular Biology,” (Springer, 2017), pp. 429–451.

63. Y. Huang, W. Wang, and L. Wang, “Bidirectional recurrent convolutional networks for multi-frame super-resolution,” in “Advances in Neural Information Processing Systems 28,” (2015), pp. 235–243.

64. X. Zhang, J. Zou, K. He, and J. Sun, “Accelerating very deep convolutional networks for classification and detection,” IEEE Transactions on Pattern Analysis Mach. Intell. 38, 1943–1955 (2016). [CrossRef]

Breast TMA

Method	RMSE	SNR	SSIM	MI	MSSIM	IFC	NQM	WSNR
Ours	18.39	22.99	0.7968	0.2784	0.94	2.86	19.40	23.73
Bicubic	34.94	17.39	0.48	0.20	.77	1.09	9.05	16.97
ScR	32.32	18.28	0.60	0.23	0.82	0.98	8.44	17.59
CSCN	18.82	22.87	0.66	0.16	.65	.32	3.6	18.48
FSRCNN	20.11	22.26	0.65	0.15	0.68	0.40	3.13	16.51
ESCNN	26.63	19.74	0.71	0.21	0.85	1.65	10.39	20.0
SCDL	16.69	23.92	0.69	0.23	0.85	1.27	9.31	20.71
SRGAN	29.76	18.74	0.73	.19	0.81	1.22	9.70	19.50

Kidney TMA

Method	RMSE	SNR	SSIM	MI	MSSIM	IFC	NQM	WSNR
Ours	20.48	21.96	.75	0.21	0.93	1.99	13.41	30.05
Bicubic	36.66	16.39	0.49	0.16	0.71	.60	1.07	17.46
ScR	32.32	18.28	0.60	0.19	0.78	0.84	5.26	19.31
CSCN	31.55	18.20	.47	0.18	0.66	0.61	5.36	17.30
FSRCNN	35.86	18.43	.58	0.15	0.64	0.25	4.94	20.40
ESCNN	33.53	17.69	0.62	0.17	0.80	0.90	4.24	19.75
SCDL	22.34	21.18	.63	0.20	0.87	1.93	8.05	24.36
SRGAN	29.11	18.92	0.72	.15	0.81	0.87	7.028	23.31

Dataset	Ours	ESCNN	FSRCNN	SCN	ScR	SCDL	SRGAN

Breast	.19	.23	.34	.35	.33	.22	.20
Kidney	.17	.17	.20	.25	.25	.21	.19

Dataset	2	3	4
Kidney	21.96	20.36	19.93
Breast	22.99	22.10	21.95

Breast TMA

Method	RMSE	SNR	SSIM	MI	MSSIM	IFC	NQM	WSNR
Ours	18.39	22.99	0.7968	0.2784	0.94	2.86	19.40	23.73
Bicubic	34.94	17.39	0.48	0.20	.77	1.09	9.05	16.97
ScR	32.32	18.28	0.60	0.23	0.82	0.98	8.44	17.59
CSCN	18.82	22.87	0.66	0.16	.65	.32	3.6	18.48
FSRCNN	20.11	22.26	0.65	0.15	0.68	0.40	3.13	16.51
ESCNN	26.63	19.74	0.71	0.21	0.85	1.65	10.39	20.0
SCDL	16.69	23.92	0.69	0.23	0.85	1.27	9.31	20.71
SRGAN	29.76	18.74	0.73	.19	0.81	1.22	9.70	19.50

Convolutional neural networks for whole slide image superresolution

Abstract

1. Introduction

2. Related work

3. Methods

3.1. Convolutional neural network design

3.2. Training and loss function

3.3. Nearest neighbor enhancement

4. Experiments

4.1. Materials and methods

Human tissue microarrays

Imaging systems

4.2. Reconstruction quality

Comparison with state of art

Reconstruction as a function of frequency

4.3. Quality of segmentation

4.4. Model parameters

Filter Size

Number of Filters

Number of Layers

4.5. Running time

5. Future directions

6. Conclusion

Funding

Acknowledgments

Disclosures

References

Cited By

Figures (11)

Tables (4)

Equations (6)

Biomedical Optics Express