
Deep learning for the design of 3D chiral plasmonic metasurfaces

Open Access

Abstract

Chiral plasmonic metasurfaces are promising for enlarging the chiral signals of biomolecules and improving the sensitivity of bio-sensing. However, the design process of chiral plasmonic nanostructures is time consuming. Deep learning has been playing a key role in the design of photonic devices with high time efficiency and good design performance. This paper proposes a deep neural network (DNN) to achieve forward prediction and inverse design for 3D chiral plasmonic metasurfaces, and further improves the training speed and performance via transfer learning. Once the DNNs are trained using part of the sampled data from the parameter space, circular dichroism (CD) spectra can be predicted within milliseconds (about 3.9 ms for the forward network and 5.6 ms for the inverse network) with high prediction accuracy. The inverse design is optimized by taking more spectral information into account and extracting the critical features with one-dimensional convolutional kernels. Via transfer learning, the network trained for one handedness can accelerate training and improve performance with small datasets for the opposite handedness. The proposed approach is instructive for the design process of chiral plasmonic metasurfaces and could find applications in efficiently exploring versatile complex nanophotonic devices.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Chirality universally exists in nature and is inherently essential for life because most molecular building blocks, such as amino acids, proteins, and DNA, are chiral [1]. Chiral objects interact differently with impinging circularly polarized light of opposite handedness, leading to circular dichroism (CD), in other words, distinctive absorption of left-handed (LCP) and right-handed circularly polarized (RCP) light. CD can be monitored by a commercial CD spectrometer and is regarded as an essential indicator for evaluating the configurations of chiral substances as well as their concentrations in solution [2]. However, natural chiral media usually exhibit weak CD signals, hindered by their slight chiral asymmetry and small electromagnetic interaction volume. It is thus crucial to enhance or magnify their chiroptical responses. It is also attractive to control and tailor the interaction between circularly polarized light and chiral media [3–5]. The issues mentioned above can be addressed by coupling these naturally chiral molecules to artificially engineered plasmonic nanostructures under resonant excitation [6–8].

Due to their ability to boost and control optical responses nearly at will, chiral plasmonic metasurfaces, which consist of sub-wavelength metallic meta-atoms and can mimic the features of chiral molecules and even enlarge their response, have attracted significant attention. To date, chiral plasmonic systems with different shapes, including G-shaped [9], gammadion [10], helical [11], and twisted-arc [12] geometries, have been reported. A trial employing a chiral plasmonic platform to enhance the CD of chiral molecules was reported in [7], which opens the possibility of using near-infrared light to detect molecules with intrinsic CD in the ultraviolet region. A typical chiral plasmonic metasurface, consisting of bilayer corner-stacked orthogonally arranged metallic nanorods, possesses a considerable chiroptical response [13,14]. Moreover, its chiroptical properties can be well interpreted by the so-called Born-Kuhn model, which treats the free-electron oscillation as the motion of coupled springs. This analytical model may help in better understanding the light-matter interaction when using 3D metallic meta-atoms to sense chiral molecules. However, despite the well-known physics of chiral metasurfaces, it is still time-consuming and even hard to find proper structural parameters for satisfactory chiroptical performance, which hinders further applications.

The emergence of artificial intelligence (AI) has provided an efficient tool for designing metasurface structures [15], and heuristic algorithms have been investigated [16–19]. Machine learning (ML), a subfield of AI, has proved excellent in many areas, such as image recognition [20], natural language processing [21], and wireless communication [22,23]. Compared to conventional numerical simulation with commercial software, which takes a long time, ML can build a model from a collected dataset and predict results within milliseconds at high accuracy. Various kinds of neural networks have already been applied to design photonic devices, ranging from multilayer perceptrons (MLP) [24–30] to deep neural networks (DNN) [31–34] and convolutional neural networks (CNN) [35–42]. Moreover, a tandem network has been applied to design silicon nanostructures for accurate colour design [43]. Auto-encoders [44] and generative adversarial networks (GAN) [45,46] have been used to generate new structures from a random vector. Several studies have revealed that ML is also helpful in designing chiral metasurfaces. Spectrum forward prediction and inverse design of 3D chiral plasmonic metamaterials were achieved by building a multitask deep learning model [47]. A deep-learning-based on-demand design model has also been proposed for chiral metamaterials [48]. Deep learning models have likewise been used to efficiently predict the CD response of higher-order diffracted beams of 2D chiral metamaterials [49,50], and a data enhancement algorithm was proposed to comprehensively investigate the CD properties in the higher-order diffracted patterns of 2D chiral metamaterials [51]. A few investigations have focused on the inverse design problem empowered by transfer learning. Transfer learning was first applied to migrate knowledge between physical scenarios in [52], and a new inverse design paradigm for metasurfaces via transfer learning was proposed in [53]. However, DNN-based inverse design models of chiral plasmonic metasurfaces suffer from low accuracy because capturing the relationship between CD response and structure is challenging for most DL models and sometimes needs the help of an auxiliary network [48] or the extra calculation of other loss functions [47], which adds to the model complexity. Also, generating the training dataset via traditional simulation methods is time- and resource-intensive. This inefficient collection of data hinders the study of optical chirality of chiral plasmonic metasurfaces because the deep learning training process requires a large amount of data. Therefore, further exploration of deep learning algorithms in terms of architectural richness, adaptability, and universality is still highly desired. In addition, chiral metasurfaces provide an intriguing platform to test the validity of the transfer learning method, considering the similar physics but distinctive properties of the two enantiomers. It is also beneficial to use transfer learning to accelerate the training speed while reducing the amount of training data needed in designing chiral meta-atoms.

In this paper, we build DNNs as an efficient tool to predict the CD spectra of our 3D Born-Kuhn type chiral metasurfaces and to generate the geometry parameters corresponding to an input spectrum. In the inverse design process, our work considers additional discrete spectral data (i.e., the transmission spectra) to address the non-uniqueness problem: many combinations of different geometric parameters can generate similar CD spectra, so the inverse design network can hardly converge. Using the transmission spectra and CD spectra together makes the network more straightforward and the training process easier, because a pre-trained forward network is no longer necessary, whereas it is a must for the tandem network [43]. We also explore the feature extraction ability of CNNs in our work and compare the results of different network structures. Besides, transfer learning is adopted to accelerate the training speed of the network and to reduce the size of the training datasets while maintaining excellent performance.

Our model includes two parts. The forward prediction deep neural network (FDNN) comprises fully connected layers; the inverse design deep neural network (IDNN) utilizes convolutional layers to extract the essential features and ends with fully connected layers. Transfer learning is applied to facilitate building the model used to design the other enantiomer with opposite handedness.

2. Metasurface structure and simulation method

The designed plasmonic metasurfaces are periodic structures with unit cells consisting of corner-stacked orthogonal gold nanorods arranged in $C_4$ symmetry, as shown in Fig. 1. The $C_4$ symmetry of the structures avoids linear birefringence, so that unwanted polarization conversion does not appear. The LCP and RCP light impinges on the periodic arrays, and the corresponding transmittances $T_{LCP}$ and $T_{RCP}$ are numerically calculated. The CD is defined as $CD = T_{LCP} - T_{RCP}$.
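As a minimal illustration of this definition, the sketch below computes a CD spectrum from two transmittance arrays; the arrays here are random placeholders standing in for FDTD-simulated data on the spectral grid used later by the networks.

```python
import numpy as np

# Hypothetical stand-ins for simulated transmittance spectra sampled at
# 500 wavelength points between 700 nm and 1400 nm (cf. Sections 3-4).
wavelengths = np.linspace(700, 1400, 500)  # wavelength in nm
T_LCP = np.random.rand(500)                # transmittance under LCP light
T_RCP = np.random.rand(500)                # transmittance under RCP light

CD = T_LCP - T_RCP                         # CD = T_LCP - T_RCP, pointwise
```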


Fig. 1. Schematic diagram of the 3D chiral plasmonic metasurfaces: (a) the periodic structure, unit cell, and circular dichroism (CD) definition; (b) the left-handed and right-handed enantiomers. LCP and RCP denote left- and right-handed circularly polarized light. Structure parameters are also shown (P: period, G: gap, D: distance, L: length, W: width).


There are five parameters that determine the detailed structures of the left-handed (LH) and right-handed (RH) chiral metasurfaces: the width $W$ of the nanorods, the length $L$ of the nanorods, the distance $D$ between the two gold layers (top of the lower layer to bottom of the upper layer), the period $P$ of the unit cell, and the gap $G$ between adjacent nanorods along the period direction. The height of the nanorods is fixed at 40 nm to simplify the dataset collection process. To guarantee the manufacturability of the structure, we pre-defined the ranges of the dimensional parameters, as shown in Table 1. A dielectric spacer layer with a refractive index of 1.3 covers the gold nanorods. The dielectric constant of the gold rods is taken from the measurements of Johnson and Christy [54]. The incident beam is perpendicular to the meta-structures. To generate the dataset, we used the finite-difference time-domain (FDTD) method to perform the numerical simulations.


Table 1. Structural Parameters Range

3. Deep learning for forward prediction

The deep learning network architecture used to achieve the forward prediction task is shown in Fig. 2. The model includes an input layer, hidden layers with batch normalization, and an output layer. The input layer has five neurons corresponding to the five dimensional parameters of our chiral metasurface, and the output layer has five hundred neurons representing the discrete points of the CD spectrum. Each of the hidden layers has four hundred neurons, and the activation function is the Leaky Rectified Linear Unit (Leaky-ReLU) [55]. Forward prediction is essentially a regression problem, and thus the loss function is chosen to be the mean-square error (MSE) loss, defined as follows.

$$MSE = \frac{1}{N}\sum\limits_{i = 1}^{N} {({y_i} - {\bar y_i})^{2}}$$
where $N$ is the batch size, $y_i$ is the target label of the training data, and $\bar y_i$ is the CD value predicted by the neural network.
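A minimal PyTorch sketch of this forward network is given below; it assumes the layer ordering (linear, batch normalization, Leaky-ReLU) and the default Leaky-ReLU slope, neither of which is specified in the text.

```python
import torch
import torch.nn as nn

class FDNN(nn.Module):
    """Forward prediction network: 5 structural parameters -> 500 CD points."""
    def __init__(self):
        super().__init__()
        layers, in_dim = [], 5
        for _ in range(4):                      # four hidden layers (Fig. 2)
            layers += [nn.Linear(in_dim, 400),  # 400 neurons per hidden layer
                       nn.BatchNorm1d(400),     # batch normalization, Eq. (2)
                       nn.LeakyReLU()]          # Leaky-ReLU activation [55]
            in_dim = 400
        layers.append(nn.Linear(400, 500))      # 500 discrete CD spectral points
        self.net = nn.Sequential(*layers)

    def forward(self, params):                  # params: (batch, 5)
        return self.net(params)

model = FDNN()
criterion = nn.MSELoss()                        # MSE loss, Eq. (1)
```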


Fig. 2. The architecture of the forward prediction network (a deep neural network with four hidden layers)


Twenty-three thousand data samples, each including the dimensional parameters, the transmission spectra, and the corresponding CD spectrum, are collected. We split the dataset into three parts: the training, validation, and testing data. The training data are used to train the neural network to fit a nonlinear model, and the validation data are used to test the model’s performance. To eliminate effects of the different length scales of the dimensional parameters on the trained model, such as internal covariate shift, batch normalization [56] is adopted, which obeys the following relation.

$$\hat{x}_i = \gamma \frac{{x_i} - {\mu _b}}{\sqrt {\delta _b^{2} + \varepsilon}} + \beta$$
where $x_i$ is the $i$-th input, $\hat{x}_i$ is the normalized output, $\mu_b$ is the mean value of a batch, $\delta_b^{2}$ is the variance of the batch, and $\varepsilon$ is a small value that guarantees the denominator is non-zero. $\gamma$ and $\beta$ are learnable parameters that scale and shift the normalized input.
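As a quick numeric check of Eq. (2), the snippet below compares PyTorch's BatchNorm1d in training mode (where $\gamma = 1$ and $\beta = 0$ at initialization) against the manual computation.

```python
import torch

x = torch.randn(128, 5)                         # a batch of 5 input parameters
bn = torch.nn.BatchNorm1d(5, eps=1e-5)          # gamma=1, beta=0 at init
y = bn(x)                                       # training-mode normalization

mu = x.mean(dim=0)                              # per-feature batch mean
var = x.var(dim=0, unbiased=False)              # per-feature batch variance
y_manual = (x - mu) / torch.sqrt(var + 1e-5)    # Eq. (2) with gamma=1, beta=0
print(torch.allclose(y, y_manual, atol=1e-5))   # expected: True
```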

We first choose the LH enantiomer to verify the performance of the forward prediction model. The MSE loss on the training dataset finally reaches 0.0002, as shown in Fig. 3(a). The training loss indicates that the neural network can learn the underlying relationship in the training set. To validate the generality of the trained model, the validation dataset, which the network has never seen but which has the same distribution as the training dataset, is fed to the network and its MSE loss is calculated. The validation loss is below 0.00016 after 1900 epochs, even smaller than for the training dataset, as shown in Fig. 3(b). To further test the model, a group of dimensional parameters is taken as the input of the neural network, and the CD spectrum is obtained accordingly. With the forward network, it takes only about 3.9 ms to predict a CD spectrum. The CD spectra predicted by the neural network and the ground-truth labels for two cases are shown in Fig. 4(a) and (b). The blue dots are the predicted CD spectrum, and the red curve is the ground-truth CD spectrum simulated by FDTD. The prediction of the network is very close to the FDTD result when the CD spectrum does not contain very sharp spectral profiles, as shown in Fig. 4(a). Nevertheless, the network cannot perfectly predict the CD spectrum when it includes the Rayleigh anomaly [57] caused by grating diffraction, marked by the dashed vertical line in Fig. 4(b). The reason is that learning the relationship between the structure and such a sharp feature is more difficult for the network, so the prediction does not coincide with the label perfectly.


Fig. 3. (a) The MSE loss of the training dataset, the blue curve is the mean MSE value of batches in every epoch, and the background shadow is the variation range of the MSE within these batches in the corresponding epoch. (b) The MSE loss of the validation dataset. Insets are local magnification.



Fig. 4. The CD spectrum predicted by DNN. (a) The structure parameters fed to the network are (D: 20 nm, L: 150 nm, W: 50 nm, G: 40 nm, P: 420 nm) (b) The structure parameters fed to the network are (D: 20 nm, L: 200 nm, W: 90 nm, G: 20 nm, P: 580 nm)


4. Deep learning for inverse design

On the other hand, for the broad interests of structure design and fabrication research, a standard tool is still needed to achieve the inverse design of chiral metasurfaces. In this case, CD spectra with wavelengths ranging from 700 nm to 1400 nm are fed into the DNN. Usually, we expect a specific structure, defined by $D$, $L$, $W$, $G$, and $P$, that realizes the target CD spectrum. If we can input the target CD spectrum together with the corresponding transmittance spectra and directly output the corresponding geometric parameters, many practical applications can be carried out more easily. We first design a fully connected DNN that takes the CD spectrum as the input layer and the dimensional parameters of the LH chiral metasurface as the output layer. The architecture of this inverse design network is the same as that of the forward prediction network. Unfortunately, the training result is poor. The reason is twofold. First, the key features contained in the CD spectrum alone are few. Because the CD is calculated from the transmittances of LCP and RCP light as $CD = T_{LCP} - T_{RCP}$, we can utilize these transmission data to enrich the features of our training dataset. Second, the neural network is not deep enough to accomplish this complex and nonlinear task, yet too many fully connected layers would cause the gradient-vanishing problem and increase the model complexity.

In order to extract the key features of the dataset with a deeper DNN, a convolutional neural network is adopted. As the standard 2-D convolution kernel used in the image processing field cannot be directly applied to 1-D input, there are two possible solutions. One is to sort and arrange the 1-D input data as a 2-D graph, as explained in [22]. This method is not suitable for our problem because it changes the positional relations in our data, and it is unnecessary to convert the 1-D spectral data to a 2-D image. We adopt the other method, namely using the 1-D convolutional kernel commonly employed in the speech recognition field in place of the 2-D kernel. The proposed 1D-CNN inverse design model is shown in Fig. 5. The 1D-CNN model includes ten layers: one input layer, three convolution layers, five fully connected (FC) layers, and one output layer. The predesigned filter is constructed from the three convolution layers and effectively captures the key features and characteristics of the input data. In each Conv1-D layer, K stands for the kernel size and C for the number of channels. Each Conv1-D layer is followed by a ReLU activation function, a max-pooling layer (Pool), and a batch normalization (BN) layer. A ReLU and a BN layer follow each fully connected layer. The CD spectra and the transmittance spectra under LCP and RCP illumination are treated as the channels of the input, so the input data of the neural network have an array size of $500 \times 1 \times 3$. Thanks to the data dimensionality reduction performed by the convolution kernels in the predesigned filter structure, the input information can be extracted at a small network scale. The dimension of each convolution kernel can be designed to yield low computational complexity while maintaining good prediction performance. After the predesigned filter, the FC layers are used to integrate the features. The final output layer consists of five neurons with a linear activation corresponding to the geometry parameters of the chiral metasurface. As shown in Fig. 6, the predesigned filter transforms the input vector into a shorter feature vector. This is accomplished by convolving the input data with local convolution kernels and adding bias parameters to generate the corresponding local features.
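The sketch below illustrates this architecture in PyTorch under stated assumptions: the input shape (3 channels of 500 points) and the layer sequence follow the text, while the kernel sizes, channel counts, pooling factors, and FC widths are illustrative placeholders, since the paper's exact values appear only in Fig. 5.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, k):
    """One Conv1-D stage: convolution -> ReLU -> max pooling -> BN."""
    return nn.Sequential(nn.Conv1d(c_in, c_out, kernel_size=k, padding=k // 2),
                         nn.ReLU(),
                         nn.MaxPool1d(2),
                         nn.BatchNorm1d(c_out))

class InverseCNN(nn.Module):
    """Inverse design network: 3x500 spectra -> 5 structural parameters."""
    def __init__(self):
        super().__init__()
        # Predesigned filter: three Conv1-D layers (hypothetical C and K values).
        self.features = nn.Sequential(conv_block(3, 16, 7),
                                      conv_block(16, 32, 5),
                                      conv_block(32, 64, 3))
        # Five FC layers (each followed by ReLU and BN) plus a linear output.
        self.fc = nn.Sequential(
            nn.Flatten(),                       # 64 channels x 62 points
            nn.Linear(64 * 62, 512), nn.ReLU(), nn.BatchNorm1d(512),
            nn.Linear(512, 256), nn.ReLU(), nn.BatchNorm1d(256),
            nn.Linear(256, 128), nn.ReLU(), nn.BatchNorm1d(128),
            nn.Linear(128, 64), nn.ReLU(), nn.BatchNorm1d(64),
            nn.Linear(64, 32), nn.ReLU(), nn.BatchNorm1d(32),
            nn.Linear(32, 5))                   # D, L, W, G, P

    def forward(self, spectra):                 # spectra: (batch, 3, 500)
        return self.fc(self.features(spectra))
```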


Fig. 5. The CNN architecture for inverse design



Fig. 6. 1-D convolution diagram


We generate 23000 sets of LH training data with assigned dimensional parameters ($D$, $L$, $W$, $G$, $P$). The whole dataset is divided into three subsets, as in the forward prediction task: training, validation, and testing. The mean squared error (MSE) is the performance indicator of the DNN, serving as the loss function during training and as the error function during validation. The loss function is minimized by gradient descent using the Adam optimization algorithm [58], with a learning rate of 0.0001 and a batch size of 128. We compare the 1D-CNN, the Tandem-NN, the FC-DNN, and the FC-DNN without transmittance spectra as input (FC-DNN/CD). In the FC-DNN, the convolutional layers are replaced by three linear layers with 2000, 1000, and 480 nodes, respectively; the ReLU and BN layers are maintained after each linear layer, and the node numbers are chosen to keep the FC-DNN balanced. In the Tandem-NN, the output of the inverse network is connected to the pre-trained forward network to guarantee uniqueness, and the ReLU, Pool, and BN layers are maintained after each convolution layer. The total number of trainable parameters of the FC-DNN is 6,248,445, whereas that of the 1D-CNN is only 759,437. The training and validation losses are shown in Fig. 7(a) and (b), respectively.
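A training-loop sketch under these settings (Adam, learning rate 0.0001, batch size 128, MSE loss) is shown below; the tensors are random placeholders standing in for the 23000 simulated spectrum-parameter pairs, and `InverseCNN` refers to the illustrative sketch above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 3-channel spectra (CD, T_LCP, T_RCP) and 5 parameters.
spectra = torch.randn(23000, 3, 500)
params = torch.rand(23000, 5)                 # normalized D, L, W, G, P
loader = DataLoader(TensorDataset(spectra, params), batch_size=128, shuffle=True)

model = InverseCNN()                          # from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.MSELoss()

for epoch in range(2000):                     # 2000 epochs, cf. Fig. 8(b)
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)         # MSE between predicted and true
        loss.backward()
        optimizer.step()
```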


Fig. 7. (a) The training loss of the 1D-CNN, FC-DNN, Tandem-NN, and FC-DNN with fewer features. The designed chiral metasurface is LH, (b) The validation loss of the 1D-CNN, FC-DNN, Tandem-NN, and FC-DNN with fewer features. The designed chiral metasurface is LH. Insets are local magnification.


We can see that the additional features from the transmittance spectra greatly influence the performance of the network. The 1D-CNN outperforms the Tandem-NN and the FC-DNN without transmittance spectra. The 1D-CNN is also slightly better than the FC-DNN, while using far fewer trainable parameters. One thousand groups of spectra from the test dataset are fed into the 1D-CNN, FC-DNN, Tandem-NN, and the FC-DNN without transmittance spectra, and the average prediction errors of the structure parameters relative to the true values are calculated. As shown in Fig. 8, the errors of the 1D-CNN and FC-DNN are tiny, and the training time of the 1D-CNN is only half that of the FC-DNN. Note that all four networks produce geometrical errors that are acceptable compared with actual fabrication deviations on the order of 10-20 nm for the conventional electron-beam lithography technique. Nevertheless, the error comparison between the different networks may be an important hint for applications requiring precise manufacturing and sensitive sensing.
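The parameter counts quoted above can be reproduced with the standard PyTorch idiom below; applied to the illustrative `InverseCNN` sketch it will of course give a different number, since the paper's exact layer sizes are not published.

```python
def count_trainable(model):
    """Total number of trainable parameters in a PyTorch module."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(count_trainable(InverseCNN()))   # illustrative sketch, not 759,437
```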


Fig. 8. (a) The average prediction error of 1D-CNN, FC-DNN, Tandem-NN, and the FC-DNN without transmittance spectra (FC-DNN/CD); one thousand results were collected. (b) The training time for 2000 epochs of 1D-CNN, FC-DNN, Tandem-NN, and FC-DNN without transmittance spectra (FC-DNN/CD).


Figure 9(a) shows a CD profile serving as the target spectrum (red dotted curve, denoted as Input), obtained from an FDTD simulation with specific geometric parameters, shown in Fig. 9(c) (true values, purple bars). Upon feeding the CD spectrum into the inverse network, the 1D-CNN takes about 5.9 ms to retrieve the geometric parameters in Fig. 9(c) (orange bars). Although the retrieved geometrical values are clearly very close to the ground truth, we further verify the inverse design by inputting these retrieved geometric parameters into the FDTD simulation and comparing the resulting CD response [blue curve in Fig. 9(a)] with the target, so as to confirm the success of the inverse network. Figures 9(b) and 9(d) were generated following the same procedure. Generally, there is good agreement between the target spectra and those from the retrieved dimensional parameters, although slight differences can be observed in the spectra, consistent with the minor differences between the retrieved and ground-truth geometric parameters.


Fig. 9. The results of the 1D-CNN. (a) Comparison between the CD spectra from retrieved (denoted as 1D-CNN) and true (denoted as Input) geometrical parameters. (b) Comparison between the CD spectra from retrieved (denoted as 1D-CNN) and true (denoted as Input) geometrical parameters; different from (a), the input CD spectrum includes a sharp feature arising from the Rayleigh anomaly. (c) Comparison between the retrieved parameters by 1D-CNN (D: 30 nm, L: 181 nm, W: 30 nm, G: 41 nm, P: 503 nm, orange bar) and the true parameters (D: 30 nm, L: 180 nm, W: 30 nm, G: 40 nm, P: 500 nm, purple bar). (d) Comparison between the retrieved parameters by 1D-CNN (D: 40 nm, L: 200 nm, W: 40 nm, G: 68 nm, P: 608 nm, orange bar) and the true parameters (D: 40 nm, L: 200 nm, W: 40 nm, G: 70 nm, P: 610 nm, purple bar).


5. Transfer learning for the design of chiral metasurfaces with opposite handedness

Transfer learning is adopted to train neural networks for similar chiral metasurfaces; it can improve network performance and speed up the training process even when the available datasets are small. Transfer learning is illustrated in Fig. 10. We use the LH and RH chiral metasurfaces to demonstrate its performance. After the neural network for LH is trained, the parameters of this network (source model) are transferred to the network for RH (target model) as the initialization of the target model. It should be noted that the source and target neural networks must have the same structure, including the number of hidden layers and the number of neurons. Since there are both forward prediction and inverse design networks, transfer between the forward networks and transfer between the inverse networks are investigated separately.


Fig. 10. The illustration of the transfer learning.


The forward prediction network is a fully connected structure, so the number of transferred parameters significantly affects the performance of the target network. There are four hidden fully connected layers, named FC 1 to FC 4, as shown in Fig. 2. The parameters of the source model can be totally or partly transferred to the target model. By freezing specific layers and fine-tuning the parameters of the remaining layers of the target network, we can investigate the influence of the number of transferred layers. Table 2 and Table 3 show the relationship between the performance of the network (represented by the loss) and the frozen layers. We find that the network achieves its best performance when only the first hidden layer FC 1 is frozen, the same as when fine-tuning all the parameters transferred from the source network; see Table 2. For the forward network in particular, a good performance can also be obtained by transferring the model for LH to the network for RH, freezing all layers, and then adding one fully connected layer before the output layer, in which case the loss finally reaches 0.00011. With transfer learning, we can quickly train an excellent model with half of the original data; hence, it takes only half the time to train the network to the same effect compared to the no-transfer situation. A minimal code sketch of this transfer step is given after Table 3.


Table 2. Relationship Between Loss Value and the Number of Frozen Layers


Table 3. Relationship Between Loss Value and the Frozen Layer When Freezing a Single Layer
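The sketch below illustrates this transfer step, assuming the `FDNN` class from the forward-prediction sketch in Section 3 and PyTorch's standard state-dict copying and layer-freezing idioms; the paper does not specify its actual implementation.

```python
import torch

source = FDNN()   # stands in for the forward network trained on LH data
target = FDNN()   # same architecture, to be trained on RH data
target.load_state_dict(source.state_dict())       # transfer all parameters

for p in target.net[0].parameters():              # net[0] is FC 1 in the sketch
    p.requires_grad = False                       # freeze FC 1, fine-tune the rest

optimizer = torch.optim.Adam(
    (p for p in target.parameters() if p.requires_grad), lr=1e-4)
```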

Then we freeze the first layer of the target network and fine-tune the parameters of the remaining layers. Figure 11(a) and (b) show the training and validation loss of the network for RH with (and without) parameters transferred from the network for LH. Compared with the training process without transfer learning, the loss of the network with transferred parameters starts from a smaller value and, more importantly, decreases quickly to a smaller final value, indicating improved performance. Besides, in our case, the training dataset for RH is only half the size of the dataset for LH.


Fig. 11. (a) The training loss of the forward network for RH with and without transfer parameters from the network for LH, (b) The validation loss of the forward network for RH with and without transfer parameters from the network for LH. Insets are local magnification.


The inverse design network includes two parts, i.e., the convolutional layers and the fully connected layers. The convolutional layers extract the key features from the input data. Owing to the weight-sharing property of convolution, after transferring the parameters from the source network, we can freeze the CNN parameters as non-trainable and train only the parameters of the fully connected layers. Figure 12(a) shows the convergence curve of the 1D-CNN model during the RH training process. The training loss of the network with transfer learning converges faster than that of the network without transfer learning, and the final loss is also smaller. In addition, Fig. 12(b) indicates that the network with transfer learning performs better than the network without transfer learning in terms of the MSE loss on an unseen dataset.
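A corresponding sketch for the inverse network, again assuming the illustrative `InverseCNN` class from Section 4, freezes the convolutional feature extractor and optimizes only the fully connected layers:

```python
import torch

source_inv = InverseCNN()   # stands in for the inverse network trained on LH data
target_inv = InverseCNN()   # same architecture, to be trained on RH data
target_inv.load_state_dict(source_inv.state_dict())   # transfer all parameters

for p in target_inv.features.parameters():            # all Conv1-D blocks
    p.requires_grad = False                            # frozen, non-trainable

optimizer = torch.optim.Adam(target_inv.fc.parameters(), lr=1e-4)
```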


Fig. 12. (a) The training loss of the inverse network for RH with and without transfer learning. (b) The validation loss of the inverse network for RH with and without transfer learning. Insets are local magnification.


6. Conclusion

In conclusion, we have demonstrated a DNN methodology, including forward prediction and inverse design models, for the design of Born-Kuhn archetypical 3D chiral metasurfaces. Given the structure parameters, the forward prediction model uses a fully connected DNN to predict the CD spectra. The inverse design model uses a 1D-CNN to directly generate the structural parameters for a given target spectrum. We additionally explore transfer learning to further improve training performance with smaller datasets and less time. Our forward model can predict the CD responses of chiral metasurfaces in an ultrafast (about 3.9 ms), highly efficient, and accurate manner, which dramatically reduces the computational resources spent on numerically solving the electromagnetic equations governing optical chirality and turns the problem into a data-driven one. The inverse network can output appropriate structural parameters within 5.6 ms given the target spectra. Transfer learning can accelerate the training and reduce the required training dataset while guaranteeing better performance than the no-transfer condition. The insights gained from this study may assist the intelligent design of other chiral metasurfaces and nanophotonic devices.

Funding

National Key Research and Development Program of China (2018YFB2201803); National Natural Science Foundation of China (61625104, 61705015, 61905018); Beijing Nova Program of Science and Technology (Z191100001119110); Fundamental Research Funds for the Central Universities; Fund of State Key Laboratory of Information Photonics and Optical Communications (Beijing University of Posts and Telecommunications) of China (IPOC2020ZT08, IPOC2021ZR02).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data Availability

Data presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. X. T. Kong, L. V. Besteiro, Z. M. Wang, and A. O. Govorov, “Plasmonic chirality and circular dichroism in bioassembled and nonbiological systems: Theoretical background and recent progress,” Adv. Mater. 32(41), 1801790 (2020). [CrossRef]  

2. M. Hentschel, M. Schaferling, X. Y. Duan, H. Giessen, and N. Liu, “Chiral plasmonics,” Sci. Adv. 3(5), 12 (2017). [CrossRef]  

3. M. J. Huttunen, G. Bautista, M. Decker, S. Linden, M. Wegener, and M. Kauranen, “Nonlinear chiral imaging of subwavelength-sized twisted-cross gold nanodimers invited,” Opt. Mater. Express 1(1), 46–56 (2011). [CrossRef]  

4. X. A. Wang and Z. Y. Tang, “Circular dichroism studies on plasmonic nanostructures,” Small 13(1), 1601115 (2017). [CrossRef]  

5. A. Cecconello, L. V. Besteiro, A. O. Govorov, and I. Willner, “Chiroplasmonic dna-based nanostructures,” Nat. Rev. Mater. 2(9), 17039 (2017). [CrossRef]  

6. J. M. Slocik, A. O. Govorov, and R. R. Naik, “Plasmonic circular dichroism of peptide-functionalized gold nanoparticles,” Nano Lett. 11(2), 701–705 (2011). [CrossRef]  

7. A. O. Govorov, Z. Y. Fan, P. Hernandez, J. M. Slocik, and R. R. Naik, “Theory of circular dichroism of nanomaterials comprising chiral molecules and nanocrystals: Plasmon enhancement, dipole interactions, and dielectric effects,” Nano Lett. 10(4), 1374–1382 (2010). [CrossRef]  

8. B. M. Maoz, Y. Chaikin, A. B. Tesler, O. Bar Elli, Z. Y. Fan, A. O. Govorov, and G. Markovich, “Amplification of chiroptical activity of chiral biomolecules by surface plasmons,” Nano Lett. 13(3), 1203–1209 (2013). [CrossRef]  

9. V. K. Valev, A. V. Silhanek, N. Verellen, W. Gillijns, P. Van Dorpe, O. A. Aktsipetrov, G. A. E. Vandenbosch, V. V. Moshchalkov, and T. Verbiest, “Asymmetric optical second-harmonic generation from chiral g-shaped gold nanostructures,” Phys. Rev. Lett. 104(12), 127401 (2010). [CrossRef]  

10. S. M. Chen, F. Zeuner, M. Weismann, B. Reineke, G. X. Li, V. K. Valev, K. W. Cheah, N. C. Panoiu, T. Zentgraf, and S. Zhang, “Giant nonlinear optical activity of achiral origin in planar metasurfaces with quadratic and cubic nonlinearities,” Adv. Mater. 28(15), 2992–2999 (2016). [CrossRef]  

11. M. Esposito, V. Tasco, M. Cuscuna, F. Todisco, A. Benedetti, I. Tarantini, M. De Giorgi, D. Sanvitto, and A. Passaseo, “Nanoscale 3d chiral plasmonic helices with circular dichroism at visible frequencies,” ACS Photonics 2(1), 105–114 (2015). [CrossRef]  

12. Y. H. Cui, L. Kang, S. F. Lan, S. Rodrigues, and W. S. Cai, “Giant chiral optical response from a twisted-arc metamaterial,” Nano Lett. 14(2), 1021–1025 (2014). [CrossRef]  

13. X. H. Yin, M. Schaferling, B. Metzger, and H. Giessen, “Interpreting chiral nanophotonic spectra: The plasmonic born-kuhn model,” Nano Lett. 13(12), 6238–6243 (2013). [CrossRef]  

14. L. Gui, M. Hentschel, J. Defrance, J. Krauth, T. Weiss, and H. Giessen, “Nonlinear born-kuhn analog for chiral plasmonics,” ACS Photonics 6(12), 3306–3314 (2019). [CrossRef]  

15. X. Luo, M. Pu, Y. Guo, X. Li, and X. Ma, “Electromagnetic architectures: Structures, properties, functions and their intrinsic relationships in subwavelength optics and electromagnetics,” Adv. Photonics 2(10), 2100023 (2021). [CrossRef]  

16. Z. H. Liu, X. H. Liu, Z. Y. Xiao, C. C. Lu, H. Q. Wang, Y. Wu, X. Y. Hu, Y. C. Liu, H. Y. Zhang, and X. D. Zhang, “Integrated nanophotonic wavelength router based on an intelligent algorithm,” Optica 6(10), 1367–1373 (2019). [CrossRef]  

17. C. C. Lu, Z. H. Liu, Y. Wu, Z. Y. Xiao, D. Y. Yu, H. Y. Zhang, C. Y. Wang, X. Y. Hu, Y. C. Liu, X. H. Liu, and X. D. Zhang, “Nanophotonic polarization routers based on an intelligent algorithm,” Adv. Opt. Mater. 8, 9 (2020).

18. P. H. Fu, S. C. Lo, P. C. Tsai, K. L. Lee, and P. K. Wei, “Optimization for gold nanostructure-based surface plasmon biosensors using a microgenetic algorithm,” ACS Photonics 5(6), 2320–2327 (2018). [CrossRef]  

19. J. C. C. Mak, C. Sideris, J. Jeong, A. Hajimiri, and J. K. S. Poon, “Binary particle swarm optimized 2 x 2 power splitters in a standard foundry silicon photonic platform,” Opt. Lett. 41(16), 3868–3871 (2016). [CrossRef]  

20. K. Yun, A. Huyen, and T. Lu, “Deep neural networks for pattern recognition,” arXiv:1809.09645 (2018).

21. J. Hirschberg and C. D. Manning, “Advances in natural language processing,” Science 349(6245), 261–266 (2015). [CrossRef]  

22. X. Hu, Z. Liu, X. Yu, Y. Zhao, W. Chen, B. Hu, X. Du, X. Li, M. Helaoui, W. Wang, and F. M. Ghannouchi, “Convolutional neural network for behavioral modeling and predistortion of wideband power amplifiers,” IEEE Trans. Neural Netw. Learn. Syst., 1–15 (2021).

23. X. Liao, X. Hu, Z. Liu, S. Ma, L. Xu, X. Li, W. Wang, and F. M. Ghannouchi, “Distributed intelligence: A verification for multi-agent drl-based multibeam satellite resource allocation,” IEEE Commun. Lett. 24(12), 2785–2789 (2020). [CrossRef]  

24. Y. S. Chen, J. F. Zhu, Y. N. Xie, N. X. Feng, and Q. H. Liu, “Smart inverse design of graphene-based photonic metamaterials by an adaptive artificial neural network,” Nanoscale 11(19), 9749–9755 (2019). [CrossRef]  

25. A. M. Hammond and R. M. Camacho, “Designing integrated photonic devices using artificial neural networks,” Opt. Express 27(21), 29620–29638 (2019). [CrossRef]  

26. S. Inampudi and H. Mosallaei, “Neural network based design of metagratings,” Appl. Phys. Lett. 112(24), 241102 (2018). [CrossRef]  

27. D. J. Liu, Y. X. Tan, E. Khoram, and Z. F. Yu, “Training deep neural networks for the inverse design of nanophotonic structures,” ACS Photonics 5(4), 1365–1369 (2018). [CrossRef]  

28. Y. Long, J. Ren, Y. Li, and H. Chen, “Inverse design of photonic topological state via machine learning,” Appl. Phys. Lett. 114(18), 181105 (2019). [CrossRef]  

29. J. Peurifoy, Y. C. Shen, L. Jing, Y. Yang, F. Cano-Renteria, B. G. DeLacy, J. D. Joannopoulos, M. Tegmark, and M. Soljacic, “Nanophotonic particle simulation and inverse design using artificial neural networks,” Sci. Adv. 4(6), 7 (2018). [CrossRef]  

30. T. Zhang, J. Wang, Q. Liu, J. Z. Zhou, J. Dai, X. Han, Y. Zhou, and K. Xu, “Efficient spectrum prediction and inverse design for plasmonic waveguide systems based on artificial neural networks,” Photonics Res. 7(3), 368–380 (2019). [CrossRef]  

31. I. Malkiel, M. Mrejen, A. Nagler, U. Arieli, L. Wolf, and H. Suchowski, “Plasmonic nanostructure design and characterization via deep learning,” Light Sci. Appl. 7(1), 60 (2018). [CrossRef]  

32. S. So, J. Mun, and J. Rho, “Simultaneous inverse design of materials and structures via deep learning: Demonstration of dipole resonance engineering using core-shell nanoparticles,” Acs Appl. Mater. Interfaces 11(27), 24264–24268 (2019). [CrossRef]  

33. M. H. Tahersima, K. Kojima, T. Koike-Akino, D. Jha, B. N. Wang, C. W. Lin, and K. Parsons, “Deep neural network inverse design of integrated photonic power splitters,” Sci. Rep. 9(1), 1368 (2019). [CrossRef]  

34. R. Unni, K. Yao, and Y. Zheng, “Deep convolutional mixture density network for inverse design of layered photonic structures,” ACS Photonics 7(10), 2703–2712 (2020). [CrossRef]  

35. S. An, B. Zheng, M. Y. Shalaginov, H. Tang, H. Li, L. Zhou, J. Ding, A. M. Agarwal, C. Rivero-Baleine, M. Kang, K. A. Richardson, T. Gu, J. Hu, C. Fowler, and H. Zhang, “Deep learning modeling approach for metasurfaces with high degrees of freedom,” Opt. Express 28(21), 31932–31942 (2020). [CrossRef]  

36. X. Han, Z. Fan, Z. Liu, C. Li, and L. J. Guo, “Inverse design of metasurface optical filters using deep neural network with high degrees of freedom,” InfoMat 3(4), 432–442 (2020). [CrossRef]  

37. K. Kojima, M. H. Tahersima, T. Koike-Akino, D. K. Jha, Y. Tang, Y. Wang, and K. Parsons, “Deep neural networks for inverse design of nanophotonic devices,” J. Light. Technol. 39(4), 1010–1019 (2021). [CrossRef]  

38. Y. Li, Y. Xu, M. Jiang, B. Li, T. Han, C. Chi, F. Lin, B. Shen, X. Zhu, L. Lai, and Z. Fang, “Self-learning perfect optical chirality via a deep neural network,” Phys. Rev. Lett. 123(21), 213902 (2019). [CrossRef]  

39. R. Lin, Y. Zhai, C. Xiong, and X. Li, “Inverse design of plasmonic metasurfaces by convolutional neural network,” Opt. Lett. 45(6), 1362–1365 (2020). [CrossRef]  

40. I. Sajedian, J. Kim, and J. Rho, “Finding the optical properties of plasmonic structures by image processing using a combination of convolutional neural networks and recurrent neural networks,” Microsystems Nanoeng. 5(1), 27 (2019). [CrossRef]  

41. J. Ma, Y. J. Huang, M. B. Pu, D. Xu, J. Luo, Y. H. Guo, and X. A. Luo, “Inverse design of broadband metasurface absorber based on convolutional autoencoder network and inverse design network,” J. Phys. D-Applied Phys. 53(46), 464002 (2020). [CrossRef]  

42. M. H. Liao, S. S. Zheng, S. X. Pan, D. J. Lu, W. Q. He, G. H. Situ, and X. Peng, “Deep-learning-based ciphertext-only attack on optical double random phase encryption,” Opto-Electron. Adv. 4(5), 200016 (2021). [CrossRef]  

43. L. Gao, X. Z. Li, D. J. Liu, L. H. Wang, and Z. F. Yu, “A bidirectional deep neural network for accurate silicon color design,” Adv. Mater. 31(51), 1905467 (2019). [CrossRef]  

44. Z. A. Kudyshev, A. V. Kildishev, V. M. Shalaev, and A. Boltasseva, “Machine-learning-assisted metasurface design for high-efficiency thermal emitter optimization,” Appl. Phys. Rev. 7(2), 021407 (2020). [CrossRef]  

45. Z. C. Liu, D. Y. Zhu, S. P. Rodrigues, K. T. Lee, and W. S. Cai, “Generative model for the inverse design of metasurfaces,” Nano Lett. 18(10), 6570–6576 (2018). [CrossRef]  

46. W. Ma, F. Cheng, Y. H. Xu, Q. L. Wen, and Y. M. Liu, “Probabilistic representation and inverse design of metamaterials based on a deep generative model with semi-supervised learning strategy,” Adv. Mater. 31(35), 1901111 (2019). [CrossRef]  

47. E. Ashalley, K. Acheampong, L. V. Besteiro, P. Yu, A. Neogi, A. O. Govorov, and Z. M. Wang, “Multitask deep-learning-based design of chiral plasmonic metamaterials,” Photonics Res. 8(7), 1213 (2020). [CrossRef]  

48. W. Ma, F. Cheng, and Y. M. Liu, “Deep-learning-enabled on-demand design of chiral metamaterials,” ACS Nano 12(6), 6326–6334 (2018). [CrossRef]  

49. Z. Tao, J. Zhang, J. You, H. Hao, H. Ouyang, Q. Yan, S. Du, Z. Zhao, Q. Yang, X. Zheng, and T. Jiang, “Exploiting deep learning network in optical chirality tuning and manipulation of diffractive chiral metamaterials,” Nanophotonics 9(9), 2945–2956 (2020). [CrossRef]  

50. Z. L. Tao, J. You, J. Zhang, X. Zheng, H. Z. Liu, and T. Jiang, “Optical circular dichroism engineering in chiral metamaterials utilizing a deep learning network,” Opt. Lett. 45(6), 1403–1406 (2020). [CrossRef]  

51. S. Y. Du, J. You, J. Zhang, Z. L. Tao, H. Hao, Y. H. Tang, X. Zheng, and T. Jiang, “Expedited circular dichroism prediction and engineering in two-dimensional diffractive chiral metamaterials leveraging a powerful model-agnostic data enhancement algorithm,” Nanophotonics 10(3), 1155–1168 (2021). [CrossRef]  

52. Y. R. Qu, L. Jing, Y. C. Shen, M. Qiu, and M. Soljacic, “Migrating knowledge between physical scenarios based on artificial neural networks,” ACS Photonics 6(5), 1168–1174 (2019). [CrossRef]  

53. R. Zhu, T. Qiu, J. Wang, S. Sui, C. Hao, T. Liu, Y. Li, M. Feng, A. Zhang, C. W. Qiu, and S. Qu, “Phase-to-pattern inverse design paradigm for fast realization of functional metasurfaces via transfer learning,” Nat. Commun. 12(1), 2974 (2021). [CrossRef]  

54. P. B. Johnson and R. W. Christy, “Optical constants of the noble metals,” Phys. Rev. B 6(12), 4370–4379 (1972). [CrossRef]  

55. B. Xu, N. Wang, T. Chen, and M. Li, “Empirical evaluation of rectified activations in convolutional network,” arXiv:1505.00853 (2015).

56. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv:1502.03167 (2015).

57. V. G. Kravets, F. Schedin, and A. N. Grigorenko, “Extremely narrow plasmon resonances based on diffraction coupling of localized plasmons in arrays of metallic nanoparticles,” Phys. Rev. Lett. 101(8), 087403 (2008). [CrossRef]  

58. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980 (2017).
