
Agile wide-field imaging with selective high resolution

Open Access

Abstract

Wide-field and high-resolution (HR) imaging is essential for various applications such as aviation reconnaissance, topographic mapping, and safety monitoring. Existing techniques require a large-scale detector array to capture HR images of the whole field, resulting in high complexity and heavy cost. In this work, we report an agile wide-field imaging framework with selective high resolution that requires only two detectors. It builds on the statistical sparsity prior of natural scenes: the important targets occupy only small regions of interest (ROI), rather than the whole field. Under this assumption, we use a short-focal camera to image the wide field at a certain low resolution and a long-focal camera to acquire HR images of the ROI. To automatically locate the ROI in the wide field in real time, we propose an efficient deep-learning-based multiscale registration method that is robust and blind to the large setting differences (focal length, white balance, etc.) between the two cameras. Using the registered location, the long-focal camera mounted on a gimbal enables real-time tracking of the ROI for continuous HR imaging. We demonstrated the imaging framework by building a proof-of-concept setup weighing only 1181 grams, and mounted it on an unmanned aerial vehicle for air-to-ground monitoring. Experiments show that the setup maintains a 120° wide field of view (FOV) with a selective 0.45 mrad instantaneous FOV.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Wide-field and high-resolution (HR) imaging is essential for various applications such as aviation reconnaissance, topographic mapping, and safety monitoring [1–4]. Unfortunately, the field of view (FOV) and resolution of a camera limit each other, due to the fixed system throughput [5]. For cameras with a long focal length, the resolution is high but the field of view is narrow, and vice versa. The existing solution to this contradiction is to build a camera array system and stitch the HR images of the individual cameras together to extend the field of view [6–11]. Such hardware is complex, with high cost and heavy electrical complexity, and the data volume is so large that it is difficult to transmit and process in real time. In addition, it is hard to adjust the focal lengths of all the cameras simultaneously on demand.

In practical applications, however, the regions of interest (ROI) occupy only a small part of the wide FOV, which is known as the statistical sparsity prior in the spatial domain [12]. Taking aerial reconnaissance as an example, as shown in Fig. 1, the important targets (cars) are much smaller than the entire field of view. In other words, there is no need to acquire HR information of all the subfields, which is laborious. Instead, HR imaging of only the key ROI provides sufficient information for most applications.


Fig. 1. The reported agile wide-field imaging framework with selective high resolution. It consists of a reference camera that acquires wide-field images, and a dynamic camera that automatically and adaptively tracks regions of interest and acquires high-resolution images. The entire system weighs only 1181 grams, with a 120$^{\circ }$ wide field of view (FOV) and a selective 0.45 mrad instantaneous FOV.


Based on the above observation, we report an agile wide-field imaging framework with selective high resolution, as shown in Figs. 1 and 2. The system requires only two cameras: a short-focal camera that images the wide field for reference, and a long-focal camera mounted on a gimbal [13] that acquires HR images of the ROI. To automatically locate the ROI in the wide field, we propose an efficient deep-learning-based multiscale registration method that enables real-time registration of the HR image with the wide-field low-resolution (LR) image. The technique is robust and blind to the large setting differences (focal length, white balance, etc.) between the two cameras, whereas conventional registration algorithms [7,8,14,15] fail when different devices maintain different unknown imaging parameters. With this multiscale registration strategy and negative feedback, the long-focal camera mounted on the gimbal enables real-time tracking of the ROI for continuous HR imaging, and the system is robust to platform vibration and fast target movement.

In summary, the main innovations of the reported agile imaging framework include:

  • The system is agile, with only two cameras compared to conventional large-scale detector arrays. Together with a gimbal, it enables continuous HR imaging of the ROI in a wide field. The multiscale imaging strategy also ensures strong robustness to platform vibration and fast target movement.
  • We report an efficient multiscale registration technique to automatically locate the ROI in the wide field. By fusing multiscale convolution features into multiscale feature descriptors, the technique is robust and blind to the different imaging parameters of the two cameras and maintains real-time efficiency (0.1 s/frame).
  • We built a prototype setup that integrates a JAI GO-5000 camera and a Foxtech Seeker10 camera for imaging, and an NVIDIA TX2 processor for real-time processing and control. The setup weighs only 1181 grams, making it applicable to load-limited platforms such as an unmanned aerial vehicle (UAV). Experiments validate its 120° wide FOV with a selective 0.45 mrad instantaneous FOV.
The rest of this article is organized as follows. The details of the reported imaging framework are presented in Section 2. The experimental results are shown in Section 3. In Section 4, we conclude this work with further discussions.

2. Method

2.1 Agile imaging framework

The reported agile wide-field and high-resolution imaging system is shown in Fig. 1. It consists of a reference camera with a short focal length and a dynamic camera with a long focal length. The reference camera is fixed and acquires wide-field but low-resolution images. The dynamic camera is mounted on a gimbal, which is rotated to track the ROI and acquire its HR images.


Fig. 2. The workflow of the reported wide-field and high-resolution imaging technique. The reference camera takes wide-field images $I_x$ with a certain low resolution, and the dynamic camera acquires HR images $I_y$ of the ROI. A deep-learning-based algorithm registers the different-scale images in order to find the location relationship between the fields of view of the two cameras. The algorithm first extracts multi-scale feature descriptors from the outputs of several intermediate network layers, and then calculates the feature point distance matrix based on these feature descriptors. Under a bidirectional feature point matching strategy, high-accuracy feature point matching pairs are obtained, which produce the homography transformation matrix between the two input images. The coordinate $Y$ of $I_y$ in $I_x$ is calculated using the homography transformation matrix and input to the gimbal to update the field of view. Benefiting from this negative feedback mechanism, the entire system enables continuous wide-field and HR imaging.


To realize robust tracking, the location of the narrow-field HR image (ROI) $I_y$ within the wide-field image $I_x$ must be identified in real time, which is realized using the multiscale registration technique introduced below. Then, we convert the location of the ROI in the reference camera into the coordinate $Y$ of the dynamic HR camera. The coordinate conversion relationship between the static reference camera and the dynamic HR camera is calibrated in advance. Afterward, the dynamic HR camera is rotated to the coordinate $Y$ to take HR images. The HR images are further matched to the wide-field LR images to verify whether the coordinates of the dynamic HR camera rotation are correct. By iteratively updating the coordinate in a negative feedback manner, as shown in Fig. 2, the coordinate conversion relationship between the cameras is adaptively corrected, and continuous tracking and HR imaging of the ROI is realized.
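As a concrete illustration of this negative-feedback loop, the Python sketch below outlines one possible control flow. The camera and gimbal interfaces (grab_wide_frame, grab_hr_frame, rotate_to), the pre-calibrated pixel_to_gimbal mapping, the feedback gain, and the use of the HR image centre as the pointing check are hypothetical placeholders; the paper does not specify these implementation details.

import numpy as np

def track_roi(reference_cam, dynamic_cam, gimbal, register_hr_to_wide,
              pixel_to_gimbal, roi_xy, gain=0.5):
    # roi_xy: ROI centre in reference-image pixels.
    # register_hr_to_wide(I_y, I_x) is assumed to return the homography H
    # that maps the HR image I_y into the wide-field image I_x (Sec. 2.2).
    correction = np.zeros(2)                        # accumulated pointing correction (pixels)
    while True:
        I_x = reference_cam.grab_wide_frame()       # wide-field LR image
        aim = np.asarray(roi_xy, float) + correction
        gimbal.rotate_to(*pixel_to_gimbal(*aim))    # rotate the HR camera toward the ROI
        I_y = dynamic_cam.grab_hr_frame()           # narrow-field HR image

        H = register_hr_to_wide(I_y, I_x)           # blind multiscale registration
        h, w = I_y.shape[:2]
        c = H @ np.array([w / 2.0, h / 2.0, 1.0])   # where the HR view actually landed in I_x
        c = c[:2] / c[2]

        # Negative feedback: shift the next aim point by the observed pointing error,
        # which adaptively corrects the camera-to-gimbal conversion over time.
        correction += gain * (np.asarray(roi_xy, float) - c)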

2.2 Blind multiscale registration

To locate the ROI in the wide field for continuous tracking and HR imaging, we propose an efficient multiscale registration algorithm that registers $I_x$ and $I_y$ acquired by the two cameras. Due to the large focal-length difference between the two cameras, there is a large scale difference between $I_x$ and $I_y$. Moreover, this focal-length (i.e., scale) difference is usually unknown in practical applications. The two cameras may also maintain different white balance settings, resulting in different spectra of the two images. In such cases, conventional image registration methods fail to find correspondences between the two images.

Considering that the agile imaging system requires real-time efficiency in practical applications, we employ deep learning for blind multiscale registration. Briefly, the feature points and corresponding feature descriptors of the two images are first detected and constructed using a neural network. Then, they are matched in a bidirectional manner to produce a high-accuracy homography transformation matrix $H$. Finally, the input image $I_y$ is aligned with $I_x$, and the location relationship is obtained. The detailed steps are summarized in Fig. 2.

2.2.1 Construct feature descriptors

To efficiently extract multi-scale feature maps from the input images, we designed a multi-scale feature extraction network, whose detailed structure is shown in Fig. 2. It has a total of 18 network layers, consisting of 9 convolutional layers, 6 max-pooling layers, and 3 fully connected layers. The convolutional layers employ 3$\times$3 convolution kernels, and the pooling layers are of size 2$\times$2. Each of the first six convolutional blocks halves the feature map size and doubles the number of channels. We trained the multi-scale feature extraction network on the CIFAR-10 classification dataset to a classification accuracy of 93%. In this way, the network obtains a strong feature extraction capability.
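Since the text does not fully specify the layer arrangement, the following PyTorch sketch shows one plausible realization of the backbone: six convolutional blocks containing nine 3$\times$3 convolutional layers and six 2$\times$2 max-pooling layers in total, followed by three fully connected layers for the CIFAR-10 pre-training objective. The channel widths (16 to 512), the 1/1/1/2/2/2 split of convolutions across blocks, and the global average pooling before the classifier are our assumptions, chosen so that, for a 448$\times$448 input, the fourth, fifth, and sixth pooling outputs have 128, 256, and 512 channels on 28$\times$28, 14$\times$14, and 7$\times$7 grids, matching the $F1$, $F2$, and $F3$ descriptors introduced below.

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, n_conv):
    # n_conv 3x3 convolutions followed by one 2x2 max-pooling layer,
    # which halves the spatial size; channels double from block to block.
    layers = []
    for i in range(n_conv):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class MultiScaleNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        chs, n_convs = [16, 32, 64, 128, 256, 512], [1, 1, 1, 2, 2, 2]  # assumed widths/split
        blocks, in_ch = [], 3
        for out_ch, n in zip(chs, n_convs):
            blocks.append(conv_block(in_ch, out_ch, n))
            in_ch = out_ch
        self.blocks = nn.ModuleList(blocks)
        # Three fully connected layers, used only for CIFAR-10 pre-training.
        # (CIFAR-10 images would need to be upsampled to survive six poolings;
        # this detail is not given in the paper.)
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(512, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 128), nn.ReLU(inplace=True),
            nn.Linear(128, num_classes))

    def forward(self, x, return_features=False):
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)                      # output of each pooling layer
        if return_features:
            return feats[3], feats[4], feats[5]  # 4th/5th/6th pooling outputs
        return self.classifier(x)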

Based on the visualization results of convolutional layers [16] and a large number of comparative experiments on the outputs of different layers, we chose the outputs of three of the pooling layers (the fourth, fifth, and sixth) to construct feature descriptors. These layers respond to a set of common patterns and cover receptive fields of different sizes [17], which makes the descriptors robust to the large unknown scale difference between the two input images. The lower layers are not selected because their outputs capture low-level, image-specific details that are not suitable for describing general features.

As a result, the network generates a feature point for each 16$\times$16 pixel block of the input image, and we use the intermediate-layer outputs of the convolutional neural network to generate the feature descriptors, namely $F1$, $F2$, and $F3$ from the fourth, fifth, and sixth pooling layers, respectively. Each $F1$ feature descriptor corresponds to 1 feature point, each $F2$ feature descriptor covers 4 feature points, and each $F3$ feature descriptor covers 16 feature points. In our implementation, the input image is resized to 448$\times$448 for high efficiency. Consequently, a total of 784 128-dimensional $F1$ descriptors, 196 256-dimensional $F2$ descriptors, and 49 512-dimensional $F3$ descriptors are generated.
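A minimal sketch of assembling these descriptor sets from the network above is given below. The bilinear resizing to 448$\times$448 and the row-major flattening of each spatial grid into a list of descriptors are our assumptions.

import torch
import torch.nn.functional as F

def build_descriptors(net, image):
    # image: float tensor of shape (3, H, W), values in [0, 1].
    x = F.interpolate(image.unsqueeze(0), size=(448, 448),
                      mode='bilinear', align_corners=False)
    with torch.no_grad():
        f4, f5, f6 = net(x, return_features=True)  # (1,128,28,28), (1,256,14,14), (1,512,7,7)

    def to_desc(fmap):
        c = fmap.shape[1]
        return fmap.squeeze(0).reshape(c, -1).T     # (number of points, channels)

    F1 = to_desc(f4)   # 784 x 128: one descriptor per 16x16 pixel block
    F2 = to_desc(f5)   # 196 x 256: each covers a 2x2 group of F1 blocks
    F3 = to_desc(f6)   #  49 x 512: each covers a 4x4 group of F1 blocks
    return F1, F2, F3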

2.2.2 Calculate the distance matrix of feature points

The feature distance between two feature points $x$ and $y$ is a weighted sum of three metrics as

$$d(x,y)=2d_1(x,y)+\sqrt{2}d_2(x,y)+d_3(x,y),$$
where the weights balance the different scales of the feature descriptors. Each distance component $d_i$ is defined as the Euclidean distance between the corresponding feature descriptors $D_i(x)$ and $D_i(y)$ as
$$d_i(x,y)=Euclidean(D_i(x),D_i(y)).$$
The $F1$ feature descriptor matrix has size 784$\times$128, and its feature distance matrix has size 784$\times$784. To improve calculation efficiency, we introduce the Harris corner detection technique [18] to produce the coordinates of corner points. The corner coordinates are then converted into a 784$\times$1 indicator vector that marks whether each feature point is a corner point, and only the corner points are used to calculate the distances between the feature points of the two images. With the same strategy, the distance matrices of the $F2$ (196$\times$196) and $F3$ (49$\times$49) feature descriptors are generated. Finally, the three feature distance matrices are fused into a single characteristic distance matrix (784$\times$784) following Eq. (1).
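The sketch below illustrates one way to assemble the fused distance matrix of Eq. (1) and the Harris-corner indicator. Broadcasting the coarser $F2$ and $F3$ distances onto the 784-point grid and thresholding the maximum Harris response inside each 16$\times$16 block are our interpretation of the description above, and the cv2.cornerHarris parameters are illustrative values; the descriptors are assumed to be NumPy arrays flattened row-major over their 28$\times$28, 14$\times$14, and 7$\times$7 grids.

import numpy as np
import cv2

def euclid(A, B):
    # Pairwise Euclidean distances between the rows of A (n, d) and B (m, d).
    return np.sqrt(np.maximum(
        (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T, 0))

def fused_distance(F1x, F2x, F3x, F1y, F2y, F3y, grid=28):
    # Fuse the three descriptor distances into one 784x784 matrix (Eq. (1)).
    d1 = euclid(F1x, F1y)                      # 784 x 784
    d2 = euclid(F2x, F2y)                      # 196 x 196
    d3 = euclid(F3x, F3y)                      #  49 x  49
    idx = np.arange(grid * grid)
    r, c = idx // grid, idx % grid
    to2 = (r // 2) * (grid // 2) + (c // 2)    # F1 position -> covering F2 position
    to3 = (r // 4) * (grid // 4) + (c // 4)    # F1 position -> covering F3 position
    return 2 * d1 + np.sqrt(2) * d2[np.ix_(to2, to2)] + d3[np.ix_(to3, to3)]

def harris_mask(image_bgr, grid=28, thresh=0.01):
    # Boolean indicator (length 784) of feature points whose 16x16 block
    # contains a Harris corner; only these points take part in the matching.
    gray = cv2.cvtColor(cv2.resize(image_bgr, (448, 448)), cv2.COLOR_BGR2GRAY)
    R = cv2.cornerHarris(np.float32(gray), 2, 3, 0.04)
    blocks = R.reshape(grid, 16, grid, 16).max(axis=(1, 3))
    return (blocks > thresh * R.max()).ravel()

Rows and columns of the fused matrix that correspond to non-corner points can then be discarded before matching.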

2.2.3 Bidirectional feature matching and registration

To increase feature matching accuracy and improve robustness, we employ a bidirectional matching strategy to search for the feature correspondences between $I_x$ and $I_y$. As a demonstration, the one-way feature point matching algorithm is presented in Algorithm 1. Denoting two feature points as $x$ and $y$ and defining a matching threshold $\theta$, we select the 128 feature point pairs with the highest matching accuracy. Under this strategy, two sets of feature point pairs are obtained by bidirectional feature point matching, and we take the intersection of these two sets as the final set of matched feature point pairs.

The matched feature point pairs are then used to calculate the homography transformation matrix [19] between the two input images. The homography is a 3$\times$3 matrix calculated in a least-squares scheme. We employ the RANSAC algorithm [20] to eliminate poor feature point matches and obtain an accurate registration result. With the homography matrix, the pixels in one image can be mapped to the corresponding pixels in the other image, and we finally obtain the coordinate $Y$ for the dynamic camera.
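A compact sketch of the bidirectional matching and homography estimation is given below, assuming D is the fused distance matrix with rows indexing the feature points of $I_x$ and columns those of $I_y$, pts_x and pts_y hold the pixel coordinates of the corresponding feature-point centres, $\theta$ is the matching threshold, and the 5-pixel RANSAC reprojection threshold is an assumed value.

import numpy as np
import cv2

def one_way_matches(D, theta, k=128):
    # For every row, keep its nearest column if the distance is below theta,
    # then retain the k pairs with the smallest distances.
    j = D.argmin(axis=1)
    d = D[np.arange(D.shape[0]), j]
    pairs = sorted(((d[i], i, j[i]) for i in range(D.shape[0]) if d[i] < theta))
    return {(i, jj) for _, i, jj in pairs[:k]}

def bidirectional_register(D, pts_x, pts_y, theta):
    fwd = one_way_matches(D, theta)                          # I_x -> I_y
    bwd = {(i, j) for j, i in one_way_matches(D.T, theta)}   # I_y -> I_x, flipped
    pairs = np.array(sorted(fwd & bwd))                      # keep mutual matches only
    if len(pairs) < 4:
        return None                                          # too few pairs for a homography
    src = np.float32(pts_y[pairs[:, 1]])                     # points in the HR image I_y
    dst = np.float32(pts_x[pairs[:, 0]])                     # points in the wide-field image I_x
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)     # RANSAC rejects poor matches
    return H                                                 # maps I_y pixels into I_x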

3. Simulation and experiment

Next, we implemented both simulations and experiments to validate the effectiveness of the reported agile imaging framework. We first ran the reported algorithm on the public SUIRD [21] and OSCD [22] datasets to test its registration accuracy. The SUIRD dataset includes 60 pairs of images, which contain horizontal and vertical viewpoint changes and their mixtures, with small overlap and image distortion. The OSCD dataset contains 24 pairs of satellite remote sensing images of the same locations taken at different times. As a comparison, we also tested state-of-the-art image registration techniques, including SIFT [23], ORB [24], AKAZE [25], TFeat [26], HardNet [27], and SuperPoint [28]. The first three are traditional feature-matching-based registration techniques, and the latter three are based on deep learning. Among them, TFeat and HardNet are both based on Siamese networks that generate accurate feature descriptors for image registration, while SuperPoint is trained in a self-supervised way to extract feature points and calculate descriptors simultaneously. We also built a proof-of-concept prototype and applied it in both ground and air-to-ground monitoring experiments. The details are introduced as follows.

3.1 Multiscale image registration accuracy

Considering that the two cameras in the reported agile imaging framework maintain different focal lengths, we synthesized a multiscale copy of each image in both datasets, with scale differences of 16, 64, and 256, respectively. The multiscale copies are input to the registration algorithms to validate their effectiveness for blind multiscale registration.

Because feature matching is the most important step in the image registration process, we quantify registration performance using feature point matching accuracy. Denoting the number of correct feature matching pairs as $TP$ and the number of wrong feature matching pairs as $FP$, we define $TPR=TP/(TP+FP)$ as the quantitative metric of feature matching accuracy. The quantitative results are shown in Table 1, with exemplar visual results of multiscale feature point matching shown in Fig. 3.


Table 1. Feature matching accuracy and running time of different techniques for different scale differences.


Fig. 3. Exemplar feature matching results of different techniques, with the scale difference between two input images being 64. The yellow lines mark the correct matching pairs, and the blue ones mark the wrong matching pairs.


From the results in Table 1, we can see that the reported algorithm is superior to the other techniques, and the performance advantage is more obvious when the scale difference between the two input images is larger. The TFeat method obtains the lowest accuracy, which is due to its small network scale and shallow depth. Although it is faster than the HardNet and SuperPoint methods, it cannot extract accurate feature descriptors from the input image blocks, especially when the scales of the input image blocks are different, and it even fails when the scale difference is 256. Among the three traditional feature-matching-based methods, ORB works better when the scale difference is small. However, when the scale differs greatly, SIFT maintains greater adaptability than the other two methods. When the scale difference between the cameras reaches a certain level (such as 256), all the conventional techniques fail to find correct feature matching pairs, while the reported technique still works well with much higher registration accuracy. The visual results in Fig. 3 clearly show the wrongly matched feature pairs (denoted by blue lines).

We note that the running time of our algorithm changes little under different scale differences. This is because the network first resizes the input image to 448$\times$448 and then divides it into a 28$\times$28 grid of image patches, each of which generates a feature point. In this way, the number of feature points generated by our algorithm does not change with the input image size. Even when the resolution of the input image is high, the algorithm can still quickly generate feature points and efficiently perform feature point matching.


Fig. 4. Exemplar feature matching results under different white balance parameters of the two input images, with the scale difference fixed to 16. The image pairs $\#1$ and $\#2$ are from the OSCD dataset and were acquired at different times. The image pairs $\#3$ and $\#4$ are from the SUIRD dataset and simulate the situation of a large white balance difference. The yellow lines mark the correct matching pairs, and the blue ones mark the wrong matching pairs.


3.2 Adaptability to different white balance parameters

In practical applications, two individual cameras may maintain different white balance parameters, which results in spectral differences between the acquired images. To test the adaptability of the algorithms to different white balance parameters, we synthesized image pairs with different spectra by color channel separation and applied the above algorithms to obtain their feature matching accuracy. The quantitative results are shown in Table 2, with exemplar visualization results shown in Fig. 4. The performance comparison is similar to that in the previous section: the reported algorithm outperforms the others with higher feature matching accuracy and running efficiency. Among the three traditional methods, AKAZE benefits from its nonlinear scale space, which retains more image detail when the spectrum of the input image changes.


Table 2. Feature matching accuracy and running time of different techniques for different spectra under different scales.

3.3 Experiments

To validate the effectiveness of the reported agile imaging framework in practical applications, we built a prototype system as shown in Fig. 1. The reference camera is a JAI GO-5000 with an LM6HC lens (6 mm focal length) that obtains wide-field images with a 120° angle of view. The dynamic camera is a Foxtech Seeker10 enabling two-axis rotation, which obtains high-resolution images with a 0.45 mrad instantaneous FOV (the focal length reaches up to 49 mm). The two cameras are connected to an NVIDIA TX2 processing platform running JetPack 3.3, which controls camera rotation and image acquisition. The hardware elements are integrated into a 3D-printed shell, and the total weight of the entire system is 1181 grams.

The first experiment was implemented on our campus. We randomly selected a scene and chose four ROI: car license plates, trash cans, street lights, and flower beds. We used the static reference camera to take wide-field reference images and the dynamic camera to acquire real-time HR images of the ROI, as shown in Fig. 5. The close-ups of the wide-field image are blurred due to limited resolution, and the system effectively improves the resolution of these ROI. The total data amount is much less than that of a conventional large-scale camera array, relieving storage, transmission, and processing pressure.


Fig. 5. Ground experiment results using the prototype system. We employed the prototype to capture images of our campus. The results validate that the setup effectively captures wide-field images with selective high resolution on the regions of interest.


We also conducted an air-to-ground monitoring experiment as a practical application, mounting the prototype setup on a DJI M300 RTK drone. The system was equipped with a 4500 mAh 11.1 V and a 3000 mAh 18.5 V lithium battery. The former provides the 12 V DC power required by the two cameras through a voltage regulator module, and the latter offers the 19 V DC power required by the TX2 development platform. The system enables real-time imaging and processing of wide-field HR images during the flight of the drone (10 fps). The drone flew along a railway, and no matter how the drone shook, HR images of the target railway were always obtained, as shown in Fig. 6. The static reference camera takes wide-field low-resolution pictures, and the dynamic high-resolution camera performs continuous real-time tracking and high-resolution imaging of the railway area. This experiment validates the system's applicability to platforms with a limited payload.


Fig. 6. Air-to-ground monitoring experiment results using the prototype setup mounted on a drone. We selected a railway as the ROI. The reference camera takes wide-field LR images, and the dynamic camera enables continuous HR imaging of the railway using the reported multiscale image registration technique.


4. Conclusion

In this work, we reported a novel wide-field and HR imaging architecture based on the sparse characteristics of natural scenes. Compared with conventional gigapixel imaging systems, we reduce the amount of redundant collected data from the gigapixel level to the megapixel level, while keeping almost the same 120$^{\circ }$ wide field of view (FOV) and a selective 0.45 mrad instantaneous FOV. The system maintains low cost with only two cameras, and its weight is only 1181 grams. To automatically locate the ROI in the wide field for continuous and robust HR imaging, we proposed an efficient image registration algorithm based on hierarchical convolution features, which overcomes the limitation of existing image registration algorithms that cannot handle large scale and white balance differences between the two input images. We conducted a series of simulations to validate the superiority of our algorithm, which achieves higher feature matching accuracy than the state of the art in cases of both large scale difference and large white balance difference. Both ground and drone experiments validate the effectiveness of the reported imaging framework and demonstrate its great potential in practical applications, especially for those with limited weight and computation budgets.

Funding

National Key Research and Development Program of China (2020YFB0505601); National Natural Science Foundation of China (61971045, 61991451, 62088101); Fundamental Research Funds for the Central Universities (3052019024).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

The training data of the multi-scale feature extraction network comes from Ref. [29], and the data used in the feature point matching experiment can be found in Ref. [21,22].

References

1. S. Romeo, L. Di Matteo, D. S. Kieffer, G. Tosi, A. Stoppini, and F. Radicioni, “The use of gigapixel photogrammetry for the understanding of landslide processes in alpine terrain,” Geosciences 9(2), 99–100 (2019). [CrossRef]  

2. N. M. Law, O. Fors, J. Ratzloff, H. Corbett, D. del Ser, and P. Wulfken, “The Evryscope: design and performance of the first full-sky gigapixel-scale telescope,” in Ground-based and Airborne Telescopes VI, vol. 9906 H. J. Hall, R. Gilmozzi, and H. K. Marshall, eds., International Society for Optics and Photonics (SPIE, 2016), pp. 589–594.

3. C. Eschmann, C. M. Kuo, C. H. Kuo, and C. Boller, “Unmanned aircraft systems for remote building inspection and monitoring,” in Proceedings of the 6th European Workshop on Structural Health Monitoring, Dresden, Germany, vol. 36 (2012), pp. 13–14.

4. E. Heymsfield and M. L. Kuss, “Implementing gigapixel technology in highway bridge inspections,” J. Perform. Constr. Facil. 29(3), 04014074 (2015). [CrossRef]  

5. G. Zheng, R. Horstmeyer, and C. Yang, “Wide-field, high-resolution Fourier ptychographic microscopy,” Nat. Photonics 7(9), 739–745 (2013). [CrossRef]  

6. D. J. Brady, M. E. Gehm, R. A. Stack, D. L. Marks, D. S. Kittle, D. R. Golish, E. Vera, and S. D. Feller, “Multiscale gigapixel photography,” Nature 486(7403), 386–389 (2012). [CrossRef]  

7. X. Yuan, L. Fang, Q. Dai, D. J. Brady, and Y. Liu, “Multiscale gigapixel video: A cross resolution image matching and warping approach,” in 2017 IEEE International Conference on Computational Photography (ICCP), (IEEE, 2017), pp. 1–9.

8. O. S. Cossairt, D. Miau, and S. K. Nayar, “Gigapixel computational imaging,” in 2011 IEEE international conference on computational photography (ICCP), (IEEE, 2011), pp. 1–8.

9. J. Kopf, M. Uyttendaele, O. Deussen, and M. F. Cohen, “Capturing and viewing gigapixel images,” ACM Trans. Graph 26(3), 93 (2007). [CrossRef]  

10. O. S. Cossairt, D. Miau, and S. K. Nayar, “Camera systems and methods for gigapixel computational imaging,” (2016).

11. D. Golish, E. Vera, K. Kelly, Q. Gong, P. Jansen, J. Hughes, D. Kittle, D. Brady, and M. Gehm, “Development of a scalable image formation pipeline for multiscale gigapixel photography,” Opt. Express 20(20), 22048–22062 (2012). [CrossRef]  

12. Y. Ren, C. Zhu, and S. Xiao, “Small object detection in optical remote sensing images via modified faster R-CNN,” Appl. Sci. 8(5), 813–814 (2018). [CrossRef]  

13. G. J. Knowles, M. Mulvihill, K. Uchino, and B. Shea, “Solid state gimbal system”, (2008).

14. S. Philip, B. Summa, J. Tierny, P.-T. Bremer, and V. Pascucci, “Distributed seams for gigapixel panoramas,” IEEE Trans. Visual. Comput. Graphics 21(3), 350–362 (2014). [CrossRef]  

15. H. S. Son, D. L. Marks, E. Tremblay, J. E. Ford, J. Hahn, R. A. Stack, A. Johnson, P. McLaughlin, J. M. Shaw, J. Kim, and D. J Brady, “A multiscale, wide field, gigapixel camera,” in Computational Optical Sensing and Imaging, (Optical Society of America, 2011), pp. 22–23.

16. M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in European Conference on Computer Vision(ECCV), (2014), pp. 818–833.

17. A. Coates and A. Y. Ng, “Selecting Receptive Fields in Deep Networks,” Adv. Neural Inform. Process. Syst. pp. 2528–2536 (2011).

18. C. G. Harris and M. Stephens, “A combined corner and edge detector,” in Alvey Vision Conference, vol. 15 (Citeseer, 1988), pp. 10–5244.

19. E. Dubrofsky, “Homography estimation,” Master’s thesis, University of British Columbia, Vancouver (2009).

20. M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Commun. ACM 24(6), 381–395 (1981). [CrossRef]  

21. yyangynu, “Small UAV image registration dataset,” https://github.com/yyangynu/SUIRD.

22. R. C. Daudt, “Onera satellite change detection dataset,” https://rcdaudt.github.io/oscd/.

23. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision 60(2), 91–110 (2004). [CrossRef]  

24. E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: An efficient alternative to SIFT or SURF,” in International Conference on Computer Vision (ICCV), (2011), pp. 2564–2571.

25. P. F. Alcantarilla, J. Nuevo, and A. Bartoli, “Fast explicit diffusion for accelerated features in nonlinear scale spaces,” in British Machine Vision Conference (BMVC), (2013).

26. V. Balntas, E. Riba, D. Ponsa, and K. Mikolajczyk, “Learning local feature descriptors with triplets and shallow convolutional neural networks,” in British Machine Vision Conference (BMVC), (2016), pp. 3–4.

27. A. Mishchuk, D. Mishkin, F. Radenovic, and J. Matas, “Working hard to know your neighbor’s margins: Local descriptor learning loss,” in Advances in Neural Information Processing Systems (NIPS), (2017), pp. 4826–4837.

28. D. DeTone, T. Malisiewicz, and A. Rabinovich, “Superpoint: Self-supervised interest point detection and description,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops, (2018), pp. 224–236.

29. A. Krizhevsky, “The cifar-10 dataset,” https://www.cs.toronto.edu/kriz/cifar.html.
