Artificial neural networks used to retrieve effective properties of metamaterials

Open Access

Abstract

We propose using deep neural networks for the fast retrieval of effective properties of metamaterials based on their angular-dependent reflection and transmission spectra from thin slabs. While we noticed that non-uniqueness is an issue for a successful application, we propose as a solution an automatic algorithm to subdivide the entire parameter space. Then, in each sub-space, the mapping between the optical response (complex reflection and transmission coefficients) and the corresponding material parameters (dielectric permittivity and permeability) is unique. We show that we can easily train one neural network per sub-space. For the final parameter retrieval, the predictions from the different sub-networks are compared, and the one with the smallest error yields the desired effective properties. Our approach allows a significant reduction in run-time compared to more traditional least-squares fitting. Using deep neural networks to retrieve effective properties of metamaterials is a significant showcase for the application of AI technology to nanophotonic problems. Once trained, the nets can be applied to retrieve the properties of a large number of different metamaterials.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

With the advent of metamaterials, the optical responses available for applications have been dramatically expanded, and they are no longer limited to those offered by natural materials [1,2]. Metamaterials are (mostly) periodic structures with (deeply) subwavelength unit cells. These disparate length scales allow one to spatially average the optical response of individual unit cells and to let the metamaterial act as if it were homogeneous and characterized by a set of effective material properties. Such treatment puts metamaterials on an equal footing with natural materials and widens the range of materials at our disposal when designing applications [3,4]. Examples of newly available optical properties are double-negative (negative-index) media [5–7], hyperbolic metamaterials (exhibiting an indefinite dispersion relation) [8,9], and many more [10–14].

An important part of metamaterial design is to obtain the effective material parameters. For some selected structures, this homogenization can be done analytically, as is the case for multilayers [15–17], wire media [18–20], or even resonator-based structures [21,22]. There are also techniques based on obtaining the effective material parameters by field averaging, which do require numerical calculations to obtain the fields in the unit cell, but the homogenization itself can be done analytically [23–27]. In many other cases, however, analytical approaches are not available. In these instances, computational parameter retrieval approaches are required, where the optical response of a representative system is calculated numerically (or even measured experimentally) at first. Then, constitutive relations that shall describe the metamaterial at the effective level are put forward, and the explicit material parameters in these constitutive relations are adjusted such that the predicted optical response from the homogenized material matches that of the actual material [28,29].

Although many approaches exist for the parameter retrieval [30–33], perhaps the most practical is to exploit the complex reflection and transmission coefficients from a thin slab of the metamaterial. These coefficients can be extracted from numerical simulations of the structure, but in principle, they are also retrievable from experiments. It is considered a practical approach as metamaterials are very often available as thin films. In limited cases, the effective parameters can be retrieved analytically, e.g., by inverting the reflection and transmission coefficients of an isotropic medium at normal incidence. In general, however, this is not possible, and thus numerical non-linear least-squares fitting must be used. Such is the case for more advanced constitutive relations, e.g., for anisotropic and/or non-local media [28].

Moreover, the assignment of material parameters while restricting the consideration to a given constitutive relation is necessarily always an approximation of the actual response. There is no guarantee that the actual metamaterial obeys entirely the postulated constitutive relations. Indeed, very often the period is not much smaller than the wavelength but only smaller, leaving a trace thereof on the optical response. This effect of spatial dispersion can never be ruled out exactly. If it has a dominating impact, then even for a postulated isotropic material (e.g., spheres arranged on a cubic grid) the effective properties retrieved at normal incidence can only reproduce the optical response at normal incidence but fail at oblique incidence. As a consequence, even for simple examples, one seeks material parameters that explain the entire response in an optimal sense. This implies the application of a non-linear least-squares fitting procedure even for rather simple materials. Such a least-squares fitting procedure, however, is rather tedious and resource-consuming. Therefore, alternative approaches for the actual retrieval are currently needed.

Here, we suggest using artificial neural networks (ANNs) for this purpose. In the past decade, aided by widely available computational power, the use of ANNs has undergone explosive growth. Following the initial breakthrough in image recognition problems, they have found use in a wide range of scientific disciplines, including nano-optics [34–38]. Indeed, the use of ANNs has also been discussed in the context of the design and retrieval of effective properties of metamaterial structures [39,40]. The application of ANNs to the parameter retrieval is particularly appealing. First, the problem of retrieving effective properties appears in an identical setting many times, as many different geometrical structures can be considered to form the basic motif of a unit cell. The effort of training a neural network is therefore worth spending, as the speed-up in the final application is expected to be tremendous. Moreover, the generation of data to train ANNs is rather simple, as no full-wave simulations of the structured material are necessary (which would be a very time- and resource-consuming task). Only reflection and transmission coefficients predicted while assuming a given constitutive relation for the homogeneous material are needed as input data to train the nets. This can be expensive for advanced constitutive relations, but in general the data is generated conveniently using Fresnel-type expressions. However, it must be pointed out that ANN-based parameter retrieval is limited to the parameter range of the training dataset. Thus, some prior knowledge about the expected parameter range is usually needed. Otherwise, the parameter range for which the ANN is trained simply has to be sufficiently large to accommodate all possible material parameters at the effective level.

While working out the details of the application of ANNs to the parameter retrieval problem, we initially noticed severe problems. The straightforward approach of training ANNs to predict effective medium parameters from reflection and transmission data fails due to non-uniqueness present in the underlying physical problem. That is a recurrent problem in the field of artificial intelligence. In the context of least-squares fitting, this manifests as additional local minima, leading to sub-optimal solutions for the fitting procedure. A straightforward workaround is to run the fitting procedure several times with different (random) initial guesses. This is usually enough to find the most optimal effective parameters. However, in this approach large amounts of computational time are spent evaluating reflection and transmission coefficients by brute force. The issue is more pronounced in the case of retrieving parameters for more complex constitutive relations (e.g., non-local media [28,41–47]), for which the numerical costs of evaluating the optical response can be significantly higher. Instead, ANNs promise a solution where part of these computational costs is shifted off-line into the training phase. During the parameter retrieval, a trained ANN will predict effective parameters from the reflection and transmission response with just a single evaluation of the network. To overcome the problem of non-uniqueness of the underlying physical problem, we propose to divide the whole parameter space into subspaces, such that in each subspace the mapping between material parameters and optical response is unique. Then, an ANN can be easily trained for each subspace, and later, during the retrieval, the effective parameters can be retrieved by evaluating a given optical response on all ANNs and choosing the most optimal result. Interestingly, there have been earlier reports of using multiple ANNs for improving parameter retrieval in ellipsometry applications [48], albeit without the treatment of the non-uniqueness aspect. Furthermore, we discovered that purely ANN-based parameter retrieval requires high computational costs to reach errors comparable to the least-squares approach. However, by using the ANN prediction as an initial guess for the least-squares optimizer, we were able to obtain results equivalent to the purely least-squares-based approach at a significantly reduced numerical cost.

As a model problem, we seek to find effective material parameters that reproduce a given complex-valued angular reflection and transmission spectrum for TE-polarized illumination. As the target geometry, we consider a homogeneous slab of thickness $h$, illustrated in Fig. 1(a). Although our approach is general and can easily be extended to cover models with more parameters, we will consider here an isotropic electromagnetic response from the slab, characterized by a dielectric permittivity $\varepsilon$ and a magnetic permeability $\mu$. The reflection and transmission coefficients for the slab can be calculated analytically using Fresnel formulas.
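As a concrete illustration, the following is a minimal sketch (not the authors' code) of this forward model: the standard Airy/Fresnel expressions for the complex reflection and transmission coefficients of a homogeneous slab in vacuum under TE-polarized illumination. The function name `slab_rt_te` and the exact angular sampling are our assumptions.

```python
import numpy as np

def slab_rt_te(eps, mu, h, lam0, kx):
    """Complex reflection/transmission of a slab in vacuum, TE polarization."""
    k0 = 2 * np.pi / lam0
    kz1 = np.sqrt(k0**2 - kx**2 + 0j)             # normal wavevector in vacuum
    kz2 = np.sqrt(eps * mu * k0**2 - kx**2 + 0j)  # normal wavevector in slab
    r12 = (mu * kz1 - kz2) / (mu * kz1 + kz2)     # TE single-interface coefficient
    phase = np.exp(1j * kz2 * h)                  # single-pass propagation factor
    denom = 1 - r12**2 * phase**2
    r = r12 * (1 - phase**2) / denom              # slab reflection (Airy summation)
    t = (1 - r12**2) * phase / denom              # slab transmission
    return r, t

# Eight angles between 0 and pi/2 (grazing incidence excluded for numerical
# safety; the paper does not specify the sampling beyond "eight values").
theta = np.linspace(0, np.pi / 2, 8, endpoint=False)
kx = (2 * np.pi / 1570e-9) * np.sin(theta)
r, t = slab_rt_te(4.5 + 0j, 4.5 + 0j, 400e-9, 1570e-9, kx)
```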


Fig. 1. (a) Geometry of the problem: we seek an equivalent homogeneous slab that reproduces, for a plane wave illumination, reflection and transmission properties of a metamaterial structure (cut shown in the left). (b) Overview of the ANN architecture used: we have a number of feed-forward neural networks, which output predicted parameters for the homogeneous slab. Each of the networks itself is trained to work for a different sub-set of the problem.


Figure 1(b) illustrates the chosen ANN architecture for retrieving material parameters, which is implemented as a feed-forward fully connected neural network. The inputs are the real and imaginary parts of the angular reflection and transmission spectrum. We discretize the angular range from $0$ to $\pi /2$ with eight values; thus, the network has $32+2$ input values in total. The two additional inputs are the wavelength $\lambda _0$ and the thickness of the slab $h$. The outputs are the real and imaginary parts of $\varepsilon$ and $\mu$. One evaluation of the ANN gives the predicted effective parameters for a single wavelength. For obtaining effective parameters over multiple wavelengths, the ANN must be evaluated for each wavelength separately, with the corresponding angular reflection and transmission spectra given as inputs (along with the wavelength $\lambda _0$ and slab thickness $h$).

We found that networks with five hidden layers of 100 units each (44 506 degrees of freedom in total) performed best for our purposes. We use $\mathrm {tanh}$ as the nonlinear activation function for the hidden layers. The networks are implemented using the TensorFlow framework and trained using the Adam optimizer with learning rate scheduling. As elaborated later, the accuracy of our method is mostly determined by the parameter space subdivision. As long as sufficiently large ANNs were used, we did not notice any significant benefit from tuning the hyperparameters (hidden layer sizes, activation functions, etc.).
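In Keras notation, such a network could be set up as sketched below; this is our reconstruction of the described architecture, and the learning-rate schedule shown is only an assumed example (the paper does not specify its exact form).

```python
import tensorflow as tf

def build_retrieval_net(n_angles=8):
    # Re/Im of r and t at each angle, plus lambda_0 and slab thickness h
    n_inputs = 4 * n_angles + 2
    model = tf.keras.Sequential(
        [tf.keras.Input(shape=(n_inputs,))]
        + [tf.keras.layers.Dense(100, activation="tanh") for _ in range(5)]
        + [tf.keras.layers.Dense(4)]  # Re/Im of eps and Re/Im of mu
    )
    # Adam with an (assumed) exponential learning-rate decay schedule
    schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-3, decay_steps=10_000, decay_rate=0.9)
    model.compile(optimizer=tf.keras.optimizers.Adam(schedule), loss="mse")
    return model
```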

2. Parameter space subdivision

When trying to train ANNs to perform the parameter retrieval task, it quickly became evident that the ANN is unable to learn the desired functionality: the training procedure stagnates, leaving the resulting ANN with large errors in the predicted parameters. This can be seen in the later results, e.g., in Fig. 4(a). The underlying issue that prevents a straightforward training of the parameter retrieval problem is the (near) non-uniqueness of the underlying dataset. We illustrate this by considering a slab with a thickness of $h=400~\mathrm {nm}$ (and the case of $\lambda _0=1570~\mathrm {nm}$) and mapping the squared difference between various spectra and a reference spectrum

$$\delta_\mathrm{mse}(\varepsilon,\mu) = \sum_{k_x} \left|r(k_x;\varepsilon,\mu) - r(k_x;\varepsilon_{\mathrm{ref}},\mu_{\mathrm{ref}})\right|^{2} + \left|t(k_x;\varepsilon,\mu) - t(k_x;\varepsilon_{\mathrm{ref}},\mu_{\mathrm{ref}})\right|^{2} \,.$$
As an example, we consider a material with $\varepsilon =4.5$ and $\mu =4.5$ as the reference structure. As shown in Fig. 2(a), several minima can be identified, corresponding to nearly identical spectra. The spectra corresponding to the multiple minima are plotted in Fig. 2(b). Even though the highlighted spectra are not identical, the differences are small enough to make it impossible for the neural network training to make any progress. For contrast, we also show an example where the mean squared difference to the reference is quite large.
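Reusing the hypothetical `slab_rt_te` sketch from above, a map such as the one in Fig. 2(a) could be computed along the following lines (a sketch; the grid resolution is chosen arbitrarily):

```python
import numpy as np

theta = np.linspace(0, np.pi / 2, 8, endpoint=False)
kx = (2 * np.pi / 1570e-9) * np.sin(theta)

def delta_mse(eps, mu, eps_ref, mu_ref, h, lam0, kx):
    # squared spectral difference of Eq. (1), summed over the sampled angles
    r, t = slab_rt_te(eps, mu, h, lam0, kx)
    r0, t0 = slab_rt_te(eps_ref, mu_ref, h, lam0, kx)
    return np.sum(np.abs(r - r0)**2 + np.abs(t - t0)**2)

eps_grid = np.linspace(-10, 10, 201)
mu_grid = np.linspace(-10, 10, 201)
dmap = np.array([[delta_mse(e, m, 4.5, 4.5, 400e-9, 1570e-9, kx)
                  for e in eps_grid] for m in mu_grid])
```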


Fig. 2. (a) Map of mean squared difference of reflection and transmission spectra relative to the point marked in red ($\varepsilon =4.5$, $\mu =4.5$). The colored markers indicate local minima, where spectra have small relative error compared to the reference. For comparison, the black marker shows a case with large difference to the reference case. (b) Angular dependent reflection and transmission spectra for the marked cases, with solid (dashed) lines indicating the real (imaginary) part. The squared difference to the reference is shown in the bottom panel.


But the structure of this issue also hints at a way to mitigate it. In order for ANNs to learn the mapping from the optical response to the corresponding material parameters, we need to ensure uniqueness of this mapping. We take the approach of subdividing the parameter space into subspaces so that for each of those subspaces, the uniqueness requirement holds, and one ANN can be trained for each of the subspaces. Later, during the retrieval process, we evaluate a given optical response using all ANNs and pick the effective parameters that yield the closest response to the given input.
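A sketch of this selection step, under the assumptions introduced above (the hypothetical `slab_rt_te` forward model and Keras-style sub-networks), could read:

```python
import numpy as np

def retrieve(models, x, h, lam0, kx, r_in, t_in):
    """Evaluate all sub-networks and keep the prediction whose recomputed
    spectrum is closest to the input spectrum."""
    best, best_err = None, np.inf
    for net in models:
        p = net.predict(x[None, :], verbose=0)[0]  # [Re eps, Im eps, Re mu, Im mu]
        eps, mu = p[0] + 1j * p[1], p[2] + 1j * p[3]
        r, t = slab_rt_te(eps, mu, h, lam0, kx)    # forward model from above
        err = np.sum(np.abs(r - r_in)**2 + np.abs(t - t_in)**2)
        if err < best_err:
            best, best_err = (eps, mu), err
    return best, best_err
```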

First, we generate a large number of samples from the initial parameter space, i.e., we sample $\varepsilon$, $\mu$, $h$ and $\lambda _0$ uniformly from given parameter ranges. We then use a recursive algorithm for subdividing the samples into subsets. The primary goal is to ensure that similar spectra are produced by similar material parameters. For that, we use a similarity measure based on the mean squared error $\delta _{\mathrm {mse}}$ [Eq. (1)]. We have illustrated one step of the algorithm in Fig. 3, showing the first step of the splitting for an example dataset based on Fig. 2, that is $h=400~\mathrm {nm}$, $\lambda _0=1570~\mathrm {nm}$ and $\varepsilon$ and $\mu$ varying in the range $[-10, 10]$. Each iteration of the algorithm starts with a set of samples (the parent dataset) to be divided, from which we select a reference point ($\varepsilon _{\mathrm {ref}}$, $\mu _{\mathrm {ref}}$) for calculating the similarity measure. Based on the similarity measure and some fixed threshold ($\delta _0$), we separate the samples [Fig. 3(a)] into two subsets: a “similar” set [Fig. 3(b)], with $\delta _{\mathrm {mse}} \leq \delta _0$, and a “dissimilar” set [Fig. 3(c)], with $\delta _{\mathrm {mse}} > \delta _0$. The reference parameters are chosen from the input parameter set to ensure a split into “similar” and “dissimilar” datasets as equal as possible. As it stands, the mapping between the material parameters and the optical response in the “similar” set is clearly non-unique, so the next step is to divide the “similar” set into separate disconnected sets; we describe our approach below. After separating the “similar” set into disconnected subsets [Fig. 3(d)], the parameter space subdivision is recursively applied to the “dissimilar” set, starting from finding a new reference point and again performing a split into “similar” and “dissimilar” sets. The initial dataset is chosen to be large enough so that by the end we have (at least) 10 000 samples in each subset. For training, we only save 10 000 samples per subset.
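In condensed form, our reading of this recursion can be sketched as follows (a sketch, not the authors' code: `pick_reference`, `spectra_dist`, and `split_connected` are hypothetical helpers standing in for the reference selection, for Eq. (1), and for the clustering step described further down):

```python
import numpy as np

def subdivide(samples, spectra, delta0=0.03, min_size=10_000):
    """Recursively split (samples, spectra) arrays into unique subsets."""
    subsets = []
    while len(samples) > min_size:
        ref = pick_reference(samples, spectra, delta0)  # balances the split
        d = spectra_dist(spectra, spectra[ref])         # Eq. (1) vs. reference
        similar = d <= delta0
        # the "similar" set may still be disconnected in parameter space,
        # so it is further split into connected components
        subsets += split_connected(samples[similar])
        samples, spectra = samples[~similar], spectra[~similar]
    subsets.append(samples)  # remainder forms the final subset
    return subsets
```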


Fig. 3. Parameter space splitting, illustrated by random sampling from corresponding parameter spaces. (a) Input set for an iteration of the parameter splitting algorithm. (b) “Similar” set. (c) “Dissimilar” set. (d) Splitting of “similar” set into connected subsets.


One challenge was to choose a proper similarity threshold $\delta _0$ for the selection of similar spectra that would work robustly for a wide range of material parameters. We found that for our parameters, $\delta _0 = 0.03$ gave good results. We also saw that using lossless spectra, that is $\Im \varepsilon , \Im \mu = 0$, resulted in a much improved subdivision. Note that this is only done for the similarity comparison; in the actual training process, the spectra are calculated with losses included.

To separate out disconnected sets in the “similar” set, we first use 2000 samples from the “similar” parameter space and start iterating over all pairs of points ($p_a$, $p_b$) ordered by distance ($|p_a-p_b|$), as described in Algorithm 1. At the end, we have allocated the 2000 samples to a few disconnected groups. These reference points can then be used to classify the rest of the samples in the parameter space, by finding the closest reference point to a particular sample and assigning that sample to the subset associated with the reference point. This can be done efficiently using the kd-tree functions in SciPy. In principle, the described algorithm could run over the whole “similar” set, but since the method scales quadratically with the number of points, it is beneficial to use a smaller representative set.

There are many standard clustering algorithms available for separating out disconnected sets [49–51]. However, we found it difficult to obtain good control over the clustering using the standard approaches. During the recursive subdivision of the dataset, the subsets have different characteristics, and it was difficult to find suitable parameters that would give good results for all the subsets. In our case, we rely on a dynamic distance cutoff based on the current distance and the growth rate of the inter-pair distance ($\alpha \cdot \mathrm {distance}$), where we found $\alpha =1.03$ to yield the best results. It must be noted that we did not carry out a comprehensive study, so the question of how to best cluster the disconnected sets likely depends on the underlying effective medium model and the dataset.
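One plausible implementation of this pair-distance clustering (an assumption on our part; we do not reproduce Algorithm 1 verbatim) is single-linkage merging with a union-find structure, terminated at the first relative jump larger than $\alpha$ in the sorted inter-pair distances, followed by the kd-tree assignment of the remaining samples:

```python
import numpy as np
from itertools import combinations
from scipy.spatial import cKDTree

def cluster_connected(points, alpha=1.03):
    """Group a small representative sample into disconnected clusters."""
    n = len(points)
    parent = list(range(n))
    def find(i):                      # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    pairs = sorted(combinations(range(n), 2),
                   key=lambda ab: np.linalg.norm(points[ab[0]] - points[ab[1]]))
    prev = None
    for a, b in pairs:
        d = np.linalg.norm(points[a] - points[b])
        if prev is not None and d > alpha * prev:
            break                     # distance jump: remaining pairs bridge gaps
        prev = d
        parent[find(a)] = find(b)     # merge the two groups
    return np.array([find(i) for i in range(n)])

def classify_rest(ref_points, ref_labels, samples):
    # assign each remaining sample to the group of its nearest reference point
    _, idx = cKDTree(ref_points).query(samples)
    return ref_labels[idx]
```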

Importantly, the algorithm here only considers the spectra (for the similarity calculation) and the to-be-retrieved material parameters (for finding connected sets); any other parameters are ignored during the splitting phase. This means that this approach can straightforwardly accommodate training a retrieval network that works for multiple wavelengths $\lambda _0$ and slab thicknesses $h$.

For the cases described below, the data generation took up to 15 minutes for larger datasets.

3. Training parameter retrieval networks

We start by considering a homogeneous slab with a thickness of $h=150~\mathrm {nm}$ and an incidence wavelength in the range $\lambda _0 \in [1164, 2167]~\mathrm {nm}$. In a first data set, we train the networks to retrieve a rather wide range of material parameters. Specifically, in what we call set #1, the network is trained to retrieve properties within a parameter space of $\Re \,\varepsilon ,\Re \mu \in [-10,10]$ and $\Im \,\varepsilon ,\Im \mu \in [-2,2]$. For this parameter space, we ran the splitting algorithm with various numbers of recursive splitting steps to produce the training dataset. We then trained a set of parameter retrieval sub-networks, one for each parameter sub-space. Training each network took around 10 minutes on a single CPU core. Each of these sub-networks produces a guess for the retrieved parameters, and in combining them, the prediction with the smallest error in the retrieved spectrum is chosen. Comparing the training progress of a single network covering the full parameter space against splitting it into 16 sub-spaces [Fig. 4(a)] shows that, indeed, the single network is unable to learn the mapping from the optical response to the material parameters. With parameter space splitting, the individual networks exhibit a much improved final validation loss.


Fig. 4. (a) Validation loss as a function of training epochs for networks trained for set #1 for different number of subdivisions. (b) Histogram of error of predicted material parameters ($\varepsilon$, $\mu$) for various number of problem subdivisions, for networks trained on set #1. Shaded area indicates error above $1.0$, which shall be considered as a misprediction of the samples. (c) Median error of predicted parameters as function of number of networks used for retrieval. (d) Fraction of mispredicted samples as function of number of networks used.


For validating the combined parameter retrieval networks, we generated a test data set of 5000 samples drawn from the full parameter space. Figure 4(b) shows the histogram of errors of the retrieved parameters for this 5000-sample validation set. One sees that dividing the parameter space between a number of ANNs pushes down the parameter retrieval error. Figure 4(c) shows the median prediction error over the validation samples as a function of the number of sub-networks considered.

In the context of parameter retrieval, an interesting quantity to consider is the fraction of mispredicted results, here taken as those samples where the parameter prediction error is above $1.0$ [see Fig. 4(b)], as these correspond to cases where the predicted parameters are significantly off and the resulting predictions would be difficult to refine further. Figure 4(d) shows the fraction of mispredicted results as a function of the number of sub-networks in the parameter space splitting. This again reinforces that the parameter space splitting approach quickly reduces the fraction of mispredicted parameters (arising from the difficulties of training on a dataset with nearly non-unique samples). It is important to note that here the key factor for improved results is better data (avoiding non-uniqueness).

Of course, the performance of this approach strongly depends on the size of the parameter space that the network shall learn. Figures 4(c) and 4(d) also include results of running the same procedure with initial parameters in a reduced parameter range of $\Re \varepsilon ,\Re \mu \in [-5,5]$, $\Im \varepsilon ,\Im \mu \in [-2, 2]$ (we call this set #2). We see that satisfactory performance is reached with fewer sub-spaces than before. Also, the results show that for larger parameter spaces, the parameter retrieval performance stagnates, so that including more sub-spaces does not improve the performance further. This can be attributed to the fact that it is difficult to train the individual prediction networks to be accurate enough, given the similarity of the near-identical spectra for different parameter values. As we show later in Section 5 [Fig. 9(j)], using the top three predictions from the ANNs (instead of just choosing the best one) improves the performance markedly.

4. Comparison with basic practical examples

To give a more practical example, we apply the method to a parameter retrieval problem for a periodic 2D array of spherical particles characterized by an isotropic electric dipolar response [illustrated in Fig. 5(a)]. The particles are characterized at a generic level by a Lorentzian-type dispersion centered around $k_0^{(p)} = 6.3\,\mathrm {\mu m}^{-1}$, with an oscillator strength of $6 \sqrt {2} \pi c_0$ and a damping of $0.1 k_0^{(p)} c_0$ [52,53]. Using the T-matrix approach [54], we calculated the reflection and transmission spectra for these structures. As an illustration, the simulated reflection coefficient for an array with a period of $a=150~\mathrm {nm}$ is shown in Figs. 5(b) and 5(c). The resonant response around $k_0^{(p)}$ is clearly visible. At oblique incidence, we notice a shift in the resonance frequency due to the coupling of the particles to their nearest neighbors. This dataset is used to compare the different ANNs against least-squares fitting.


Fig. 5. (a) Sketch of the system considered: the unit cell contains a particle characterized by an electric dipolar polarizability $\alpha$, and the system is periodic in $x$ and $y$ with a period $a$. Thus, the thickness of this metamaterial slab equals the unit cell size $a$. (b,c) Calculated reflection coefficient for an array with a period of $a=150~\mathrm {nm}$ as a function of the frequency and the angle of incidence (expressed in terms of the tangential wave vector component): magnitude (b) and phase (c). The white line always shows the magnitude (phase) at normal incidence ($k_x=0$).


Baseline results [solid black lines in Figs. 6(a) and 6(b)] are given by a least-squares approach. Ten different random initial guesses were used for each input to obtain a robust fit. For comparison, we first consider ANNs trained on dataset #1. For this dataset, we noted that it was sufficient to run the recursive dataset generation algorithm for 30 iterations, resulting in 62 sub-sets. As seen from Figs. 6(a) and 6(b) (the blue dots), purely ANN-based parameter retrieval can result in slightly inaccurate effective parameters, especially so for the magnetic permeability $\mu$.


Fig. 6. (a,b) Comparison of fitting results for $\varepsilon$ (a) and $\mu$ (b) with various approaches. Note that results for sets #1 and #2 match almost perfectly after the refinement, so the orange dots are covered by the green dots. (c) Number of function evaluations of analytical formula. (d) Time taken for retrieval per one frequency (excluding ANN training time). ANNs #1 and #2 refer to “wide” (set #1) and “narrow” (set #2) datasets.


To improve the retrieval accuracy, we also consider an approach where the output from the ANNs is used as an initial guess for the least-squares solver, helping to eliminate these small inaccuracies [orange and green dots in Figs. 6(a) and 6(b)]. Comparing the number of function evaluations [Fig. 6(c)] shows that the conventional least-squares-based approach requires almost an order of magnitude more evaluations of the analytical expressions for the reflection and transmission coefficients of the effective medium model. For the ANN-based retrieval, there is one evaluation of the analytical reflection and transmission coefficients per subspace (one ANN), in order to calculate the error between the input spectra and the spectra computed from the predicted parameters. Note that the costs of training are not considered here. The underlying analytical model here is relatively cheap to evaluate, so the run-time difference is less pronounced [Fig. 6(d)], although the ANN-based approach is around two times faster. The speed-up would be more important, for example, when considering computationally more expensive models to describe the effective medium, e.g., when including higher-order terms to account for spatial dispersion [28]. Notably, adding a least-squares refinement step after the ANN prediction does not come with a large computational cost, as the initial guess from the ANNs is close enough to allow for quick convergence.
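Under the assumptions of the earlier sketches (the hypothetical `slab_rt_te` forward model), this refinement step could look as follows, seeding `scipy.optimize.least_squares` with the ANN output:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(p, h, lam0, kx, r_in, t_in):
    # p = [Re eps, Im eps, Re mu, Im mu]; stack real/imag spectral mismatches
    eps, mu = p[0] + 1j * p[1], p[2] + 1j * p[3]
    r, t = slab_rt_te(eps, mu, h, lam0, kx)
    dr, dt = r - r_in, t - t_in
    return np.concatenate([dr.real, dr.imag, dt.real, dt.imag])

def refine(ann_guess, h, lam0, kx, r_in, t_in):
    # the ANN prediction is close to the optimum, so only a few
    # least-squares iterations are typically needed
    sol = least_squares(residuals, ann_guess, args=(h, lam0, kx, r_in, t_in))
    return sol.x[0] + 1j * sol.x[1], sol.x[2] + 1j * sol.x[3]
```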

As discussed above [Figs. 4(c) and 4(d)], by focusing on a narrower parameter space, the number of networks required can be reduced. Figure 6 also includes results for ANNs trained on the narrower parameter range (dataset #2), showing that, indeed, only 15 iterations of the parameter space splitting were required for good fitting results, yielding 22 sub-sets. After the least-squares refinement step, the retrieved parameters match the corresponding results for set #1 but have been obtained at roughly 50% of the computational cost.

To show the generality of the approach, we also applied the method to the retrieval of anisotropic effective material parameters. For this, we consider instead a rectangular unit cell of electric dipolar polarizabilities to obtain an anisotropic response. Following the approach described above, we generated datasets and trained ANNs for fitting the reflection and transmission data from the anisotropic lattice. Figure 7 compares the retrieval results from the ANNs (with and without least-squares refinement) against the least-squares reference. The underlying effective medium model is now more complex, so we used a reduced parameter space ($\Re \varepsilon _x,\, \Re \varepsilon _z,\, \Re \mu \in [-5,5]$ and $\Im \varepsilon _x,\, \Im \varepsilon _z,\, \Im \mu \in [-2,2]$), for which the parameter space subdivision yields 64 subspaces. Compared to the isotropic case, the fitting is more complicated for both approaches (the ANN and also the least-squares fit), especially in the larger $k_0$ regime (evidenced by a significant increase in function evaluations during the least-squares fitting). For the ANN-based approach, the least-squares refinement is important for obtaining good parameters, although again the numerical costs of this refinement step are comparatively low.


Fig. 7. Fitting results for the anisotropic dataset, showing $\varepsilon _x$ (a), $\varepsilon _z$ (b), $\mu$ (c) and a number of function evaluations required during retrieval (d). In (a,b) the dashed line shows least-square result from the other permittivity tensor component for reference. In (a,b,c) the least-squares reference is shown with solid black line.


5. Including more parameters in the retrieval

For a more demanding example, we apply our approach to an extended dataset, where both the wavelength $\lambda _0$ and the slab thickness $h$ were varied and included as inputs to the parameter retrieval networks. In the case of the periodic dipole problem, varying the thickness of the homogenized slab corresponds to calculating reflection and transmission spectra from dipole arrays of different periodicities (unit cell sizes). To illustrate the role of modifying the thickness, we plot in Fig. 8 the calculated reflection and transmission spectra (only normal incidence shown in the plot), along with selected effective medium parameters obtained using least-squares fitting. For small unit cell sizes (i.e., slab thicknesses), the coupling between the dipoles has a strong influence on the optical response of the dipole array. For larger unit cell sizes, the dilution of the response can be seen. As before, ten random initial guesses have been used for the least-squares fitting. Looking at the retrieved parameters in Fig. 8, it can be seen that for a few discrete input spectra the least-squares fit has not found the best fit. This could be improved by using more initial guesses, but for the purposes of this comparison, the results are suitable as-is. For the bounds in the least-squares fitting, the same values as for training set #1 have been used. However, as seen in Fig. 8(c), the residual jumps up for the strongly resonant case ($h$ near 100 nm), as the optimal retrieved parameters lie outside of the given bounds. Increasing the bounds, however, would require increasing the number of fitting trials, slowing down the process.


Fig. 8. Amplitude (a) and phase (b) for reflection coefficient at normal incidence from the electric dipolar array as a function of the frequency and as a function of the periodicity that corresponds here to the thickness of the slab $h$. (c) Residual from least-squares fitting. (d,e) Real part of the permittivity (d) and permeability (e) as retrieved from a least-squares fitting.


For comparison, we again consider training ANNs for two different parameter ranges: the “wide” (set #1) and “narrow” (set #2) datasets defined earlier. Figure 9(a) shows the relative fitting error of the ANNs compared to the least-squares reference, as a function of the number of parameter space splitting iterations used for generating the ANN training data. The figure shows results both before and after a final least-squares refinement step. Again, one sees that without the additional least-squares refinement, the ANN output is slightly worse than the reference least-squares fit. However, given sufficient splitting of the parameter space, the least-squares refinement brings the error down to the reference. In Fig. 9(b), we show the total time taken for three selected cases from set #2 (25, 30, 40 split iterations) and one from set #1 (30 split iterations). We note that the refinement stage requires only a small number of additional iterations per fit, and it brings the relative error to levels comparable with the least-squares reference, at much smaller computational cost (excluding training time).


Fig. 9. (a) Final mean relative error as a function of the number of splitting iterations. Solid lines (dashed lines) indicate the error before (after) the final refinement step. The dots mark the eight selected results for the detailed comparison shown in (c-j). The dotted line indicates the MSE for the least-squares reference. (b) Total retrieval time (i.e., excluding training time) taken for fitting with different ANNs, with the least-squares reference shown for comparison. Black bars on top indicate the time taken for the least-squares refinement step. The numbers indicate the number of splitting iterations and the corresponding number of sub-networks (in parentheses). (c-j) Relative error maps before (top row) and after (bottom row) the least-squares refinement step. Green shading indicates the region where the ANN-based retrieval outperforms the least-squares reference.


We illustrate the results by comparing the fitting residual from the ANN-based approach to the least-squares reference as a function of the frequency $k_0$ and the slab thickness in Figs. 9(c)–9(f) (before refinement) and Figs. 9(g)–9(j) (after refinement). Here we see that the number of parameter space splits considered for the parameter retrieval can be reduced. However, below a critical number of splittings, the inaccuracy of the ANN prediction becomes so large that the least-squares refinement fails to find the proper minimum. In the shown example, this happens at approximately 25 splitting iterations. We also see that the least-squares refinement step is essential to bring the fitting residual to levels comparable with pure least-squares fitting: before refinement [Figs. 9(c)–9(f)], the fitting residual (mean squared error between predicted and input spectra) from the ANNs can be more than two orders of magnitude worse than the fitting residual from least-squares fitting, whereas after refinement [Figs. 9(g)–9(j)] the two are comparable.

It is worth noting that the ANN-based retrieval outperforms the least-squares fitting for the strongly coupled case, where the optimal effective parameters lie outside the prescribed bounds. For the least-squares fitting, the bounds are strict, and therefore for those cases the retrieved parameters are off. In contrast, for the ANN-based retrieval, the dataset bounds are only relevant for the generation of the training data, and as the results in Fig. 9 show, the ANNs have reasonably good extrapolation capabilities in this case. These extrapolation capabilities are incidental and unlikely to be generally useful in our method. However, there might be circumstances where such features are interesting for further studies.

Note that for the results in Fig. 9(j), we refine the three closest guesses (instead of just one) from the ANNs and then select the one with the lowest error. The refinement step then takes more time, but comparable results are obtained with a smaller total number of networks (55 instead of the 94 that would otherwise be necessary for comparable results).
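Continuing the earlier sketches (and relying on the hypothetical `slab_rt_te` and `refine` defined above), this top-three variant could be expressed as:

```python
import numpy as np

def retrieve_top3(models, x, h, lam0, kx, r_in, t_in):
    """Refine the three best ANN guesses and keep the best refined fit."""
    guesses = []
    for net in models:
        p = net.predict(x[None, :], verbose=0)[0]
        eps, mu = p[0] + 1j * p[1], p[2] + 1j * p[3]
        r, t = slab_rt_te(eps, mu, h, lam0, kx)
        err = np.sum(np.abs(r - r_in)**2 + np.abs(t - t_in)**2)
        guesses.append((err, p))
    best = None
    for _, p in sorted(guesses, key=lambda g: g[0])[:3]:
        eps, mu = refine(p, h, lam0, kx, r_in, t_in)
        r, t = slab_rt_te(eps, mu, h, lam0, kx)
        err = np.sum(np.abs(r - r_in)**2 + np.abs(t - t_in)**2)
        if best is None or err < best[0]:
            best = (err, eps, mu)
    return best[1], best[2]
```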

6. Conclusion

In conclusion, we have successfully demonstrated the applicability of artificial neural networks to the problem of parameter retrieval in metamaterials. The networks learn a structure-property relation and can, after training, predict the effective material parameters that are the degrees of freedom in a given constitutive relation used to describe the metamaterial at the effective level. As input, a network requires the complex reflection and transmission coefficients from a thin film of the metamaterial, along with its thickness and the wavelength of the illumination. We have considered here a given polarization and described the metamaterial as a local and isotropic material characterized by a dispersive permittivity and permeability.

We have shown that by subdividing the problem, we can lift the non-uniqueness that otherwise hampers the parameter retrieval approach. In this approach, we end up with a number of ANNs, each performing the parameter retrieval on some sub-set of the whole parameter space. Afterwards, the predictions from all sub-nets are compared, and the most optimal response is finally chosen. Although this necessitates evaluating all the ANNs for finding the corresponding effective material parameters, it still outperforms the conventional least-squares fitting approach for the parameter retrieval problem.

We wish to briefly reiterate the benefits of using ANNs for the problem of parameter retrieval. In particular, most, if not to say nearly all, of the computational effort is contained in the training of the networks. Once a network has learned a specific structure-property relation, it can be applied to retrieve the properties of many disparate materials without further modifications. It can be applied to metamaterials but also to natural materials. It constitutes, therefore, an essential tool in the research of photonic materials.

While only considered here for some rather basic constitutive relations, the work can be extended to a much wider range of materials. It can consider anisotropic or bi-anisotropic materials. Also, materials described by nonlocal constitutive relations can be envisioned. In terms of a critical assessment, it should be stressed that the networks are mostly limited to the finite extent of the range of the training data. While we have seen some extrapolation capabilities of the networks, these are limited in scope and cannot be exploited in a consistent way. While this is not a disadvantage per se, it has to be kept in mind. Some prior knowledge needs to be at hand concerning the range of effective properties that can be expected for a given material. Thus, the proposed method is suited for cases where there are a priori assumptions on the retrieved parameters, and such retrievals are to be carried out repeatedly. Then, the training costs of the ANNs are amortized over the repeated parameter retrievals. On the other hand, if the expected bounds for the effective parameter values are very wide or there are only a small number of parameter retrievals to be carried out, then the proposed method offers no benefits. Moreover, it turned out to be advantageous to refine the results from the networks in a very last step with a final least-squares fitting. But that is really just necessary to optimize the results in their final details and does not compromise the overall suitability of our approach.

Funding

Carl-Zeiss-Stiftung; Deutsche Forschungsgemeinschaft (390761711); Deutsche Forschungsgemeinschaft (258734477).

Acknowledgment

We acknowledge support by the German Research Foundation within the Excellence Cluster 3D Matter Made to Order (EXC 2082/1, project number 390761711) and within the SFB 1173 (Project-ID No. 258734477) and by the Carl Zeiss Foundation. Early results of this work were presented at the 14th International Congress on Artificial Materials for Novel Wave Phenomena - Metamaterials 2020.

Disclosures

The authors declare no conflicts of interest.

Data availability

Code and data are available in Ref. [55].

References

1. S. A. Tretyakov, “A personal view on the origins and developments of the metamaterial concept,” J. Opt. 19(1), 013002 (2017). [CrossRef]  

2. A. K. Iyer, A. Alù, and A. Epstein, “Metamaterials and metasurfaces–historical context, recent advances, and future directions,” IEEE Trans. Antennas Propag. 68(3), 1223–1231 (2020). [CrossRef]  

3. W. Cai and V. Shalaev, Optical Metamaterials (Springer New York, 2010).

4. A. Sihvola, “Metamaterials in electromagnetics,” Metamaterials 1(1), 2–11 (2007). [CrossRef]  

5. V. M. Shalaev, “Optical negative-index metamaterials,” Nat. Photonics 1(1), 41–48 (2007). [CrossRef]  

6. J. Valentine, S. Zhang, T. Zentgraf, E. Ulin-Avila, D. A. Genov, G. Bartal, and X. Zhang, “Three-dimensional optical metamaterial with a negative refractive index,” Nature 455(7211), 376–379 (2008). [CrossRef]  

7. A. C. Atre, A. García-Etxarri, H. Alaeian, and J. A. Dionne, “A Broadband Negative Index Metamaterial at Optical Frequencies,” Adv. Opt. Mater. 1(4), 327–333 (2013). [CrossRef]  

8. A. Poddubny, I. Iorsh, P. Belov, and Y. Kivshar, “Hyperbolic metamaterials,” Nat. Photonics 7(12), 948–957 (2013). [CrossRef]  

9. P. Shekhar, J. Atkinson, and Z. Jacob, “Hyperbolic metamaterials: fundamentals and applications,” Nano Convergence 1(1), 14 (2014). [CrossRef]  

10. P. Moitra, Y. Yang, Z. Anderson, I. I. Kravchenko, D. P. Briggs, and J. Valentine, “Realization of an all-dielectric zero-index optical metamaterial,” Nat. Photonics 7(10), 791–795 (2013). [CrossRef]  

11. F. Capolino, Theory and phenomena of metamaterials (CRC press, 2017).

12. F. Capolino, Applications of metamaterials (CRC press, 2017).

13. T. J. Cui, D. R. Smith, and R. Liu, Metamaterials (Springer, 2010).

14. M. G. Silveirinha, A. Alù, and N. Engheta, “Parallel-plate metamaterials for cloaking structures,” Phys. Rev. E 75(3), 036603 (2007). [CrossRef]  

15. V. Agranovich and V. Kravtsov, “Notes on crystal optics of superlattices,” Solid State Commun. 55(1), 85–90 (1985). [CrossRef]  

16. I. Tsukerman, A. N. M. S. Hossain, and Y. D. Chong, “Homogenization of layered media: Intrinsic and extrinsic symmetry breaking,” (2020).

17. A. V. Chebykin, A. A. Orlov, A. V. Vozianova, S. I. Maslovski, Y. S. Kivshar, and P. A. Belov, “Nonlocal effective medium model for multilayered metal-dielectric metamaterials,” Phys. Rev. B 84(11), 115438 (2011). [CrossRef]  

18. S. I. Maslovski, S. A. Tretyakov, and P. A. Belov, “Wire media with negative effective permittivity: A quasi-static model,” Microw. Opt. Technol. Lett. 35(1), 47–51 (2002). [CrossRef]  

19. M. G. Silveirinha, “Nonlocal homogenization model for a periodic array of ɛ-negative rods,” Phys. Rev. E 73(4), 046612 (2006). [CrossRef]  

20. T. Geng, S. Zhuang, J. Gao, and X. Yang, “Nonlocal effective medium approximation for metallic nanorod metamaterials,” Phys. Rev. B 91(24), 245128 (2015). [CrossRef]  

21. R. Marques, F. Mesa, J. Martel, and F. Medina, “Comparative analysis of edge- and broadside-coupled split ring resonators for metamaterial design - Theory and experiments,” IEEE Trans. Antennas Propag. 51(10), 2572–2581 (2003). [CrossRef]  

22. S. A. Ramakrishna, “Physics of negative refractive index materials,” Rep. Prog. Phys. 68(2), 449–521 (2005). [CrossRef]  

23. D. R. Smith and J. B. Pendry, “Homogenization of metamaterials by field averaging,” J. Opt. Soc. Am. B 23(3), 391–403 (2006). [CrossRef]  

24. A. Andryieuski, S. Ha, A. A. Sukhorukov, Y. S. Kivshar, and A. V. Lavrinenko, “Bloch-mode analysis for retrieving effective parameters of metamaterials,” Phys. Rev. B 86(3), 035127 (2012). [CrossRef]  

25. M. G. Silveirinha, “Metamaterial homogenization approach with application to the characterization of microstructured composites with negative parameters,” Phys. Rev. B 75(11), 115104 (2007). [CrossRef]  

26. S. Lannebère, T. A. Morgado, and M. G. Silveirinha, “First principles homogenization of periodic metamaterials and application to wire media,” arXiv preprint arXiv:2002.06271 (2020).

27. D. R. Smith, “Analytic expressions for the constitutive parameters of magnetoelectric metamaterials,” Phys. Rev. E 81(3), 036605 (2010). [CrossRef]  

28. K. Mnasri, A. Khrabustovskyi, M. Plum, and C. Rockstuhl, “Retrieving effective material parameters of metamaterials characterized by nonlocal constitutive relations,” Phys. Rev. B 99(3), 035442 (2019). [CrossRef]  

29. X.-X. Liu, D. A. Powell, and A. Alù, “Correcting the Fabry-Perot artifacts in metamaterial retrieval procedures,” Phys. Rev. B 84(23), 235106 (2011). [CrossRef]  

30. A. F. Mota, A. Martins, J. Weiner, F. L. Teixeira, and B.-H. V. Borges, “Constitutive parameter retrieval for uniaxial metamaterials with spatial dispersion,” Phys. Rev. B 94(11), 115410 (2016). [CrossRef]  

31. P. Grahn, A. Shevchenko, and M. Kaivola, “Theoretical description of bifacial optical nanomaterials,” Opt. Express 21(20), 23471 (2013). [CrossRef]  

32. P. Grahn, A. Shevchenko, and M. Kaivola, “Interferometric description of optical metamaterials,” New J. Phys. 15(11), 113044 (2013). [CrossRef]  

33. Q. Flamant, D. Torrent, S. Gomez-Graña, A. N. Grigorenko, V. G. Kravets, P. Barois, V. Ponsinet, and A. Baron, “Direct retrieval method of the effective permittivity and permeability of bulk semi-infinite metamaterials by variable-angle spectroscopic ellipsometry,” OSA Continuum 2(5), 1762 (2019). [CrossRef]

34. R. S. Hegde, “Deep learning: a new tool for photonic nanostructure design,” Nanoscale Adv. 2(3), 1007–1023 (2020). [CrossRef]  

35. W. Ma, Z. Liu, Z. A. Kudyshev, A. Boltasseva, W. Cai, and Y. Liu, “Deep learning for the design of photonic structures,” Nat. Photonics 15(2), 77–90 (2021). [CrossRef]  

36. S. So, T. Badloe, J. Noh, J. Bravo-Abad, and J. Rho, “Deep learning enabled inverse design in nanophotonics,” Nanophotonics 9(5), 1041–1057 (2020). [CrossRef]  

37. P. R. Wiecha, A. Arbouet, C. Girard, and O. L. Muskens, “Deep learning in nano-photonics: inverse design and beyond,” arXiv:2011.12603 (2020).

38. O. Khatib, S. Ren, J. Malof, and W. J. Padilla, “Deep Learning the Electromagnetic Properties of Metamaterials—A Comprehensive Review,” Adv. Funct. Mater. 31(31), 2101748 (2021). [CrossRef]  

39. S. Huang, Z. Cao, H. Yang, Z. Shen, and X. Ding, “An electromagnetic parameter retrieval method based on deep learning,” J. Appl. Phys. 127(22), 224902 (2020). [CrossRef]  

40. C. F. L. Vasconcelos, S. L. Rêgo, and R. M. S. Cruz, “The Use of Artificial Neural Network in the Design of Metamaterials,” in Intelligent Data Engineering and Automated Learning - IDEAL 2012, vol. 7435, D. Hutchison, T. Kanade, J. Kittler, J. M. Kleinberg, F. Mattern, J. C. Mitchell, M. Naor, O. Nierstrasz, C. Pandu Rangan, B. Steffen, M. Sudan, D. Terzopoulos, D. Tygar, M. Y. Vardi, G. Weikum, H. Yin, J. A. F. Costa, and G. Barreto, eds. (Springer Berlin Heidelberg, Berlin, Heidelberg, 2012), pp. 532–539. Series Title: Lecture Notes in Computer Science.

41. K. Mnasri, A. Khrabustovskyi, C. Stohrer, M. Plum, and C. Rockstuhl, “Beyond local effective material properties for metamaterials,” Phys. Rev. B 97(7), 075439 (2018). [CrossRef]  

42. S. Raza, S. I. Bozhevolnyi, M. Wubs, and N. Asger Mortensen, “Nonlocal optical response in metallic nanostructures,” J. Phys.: Condens. Matter 27(18), 183204 (2015). [CrossRef]  

43. P. Kinsler, “An introduction to spatial dispersion: revisiting the basic concepts,” arXiv preprint arXiv:1904.11957 (2019).

44. A. Shevchenko, P. Grahn, V. Kivijärvi, M. Nyman, and M. Kaivola, “Spatially dispersive functional optical metamaterials,” J. Nanophotonics 9(1), 093097 (2015). [CrossRef]  

45. D. Torrent, “Strong spatial dispersion in time-modulated dielectric media,” Phys. Rev. B 102(21), 214202 (2020). [CrossRef]  

46. D. Iakushev and S. Lopez-Aguayo, “Nonlocal effect on transverse-magnetic photonic properties of periodic dielectric-metal stacks,” J. Opt. 20(10), 105101 (2018). [CrossRef]  

47. A. Ciattoni and C. Rizza, “Nonlocal homogenization theory in metamaterials: Effective electromagnetic spatial dispersion and artificial chirality,” Phys. Rev. B 91(18), 184207 (2015). [CrossRef]  

48. F. Urban III, D. Barton, and N. Boudani, “Extremely fast ellipsometry solutions using cascaded neural networks alone,” Thin Solid Films 332(1-2), 50–55 (1998). [CrossRef]  

49. D. Xu and Y. Tian, “A Comprehensive Survey of Clustering Algorithms,” Ann. Data. Sci. 2(2), 165–193 (2015). [CrossRef]

50. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the Second ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (1996), pp. 226–231.

51. M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander, “OPTICS: ordering points to identify the clustering structure,” SIGMOD Rec. 28(2), 49–60 (1999). [CrossRef]  

52. A. Rahimzadegan, R. Alaee, C. Rockstuhl, and R. W. Boyd, “Minimalist Mie coefficient model,” Opt. Express 28(11), 16511 (2020). [CrossRef]  

53. R. Venkitakrishnan, T. Höß, T. Repän, F. Z. Goffi, M. Plum, and C. Rockstuhl, “Lower limits for the homogenization of periodic metamaterials made from electric dipolar scatterers,” Phys. Rev. B 103(19), 195425 (2021). [CrossRef]  

54. D. Beutel, A. Groner, T. Höß, C. Rockstuhl, and I. Fernandez-Corbaton, “Efficient Simulation of Bi-periodic, Layered Structures with the T-Matrix Method,” in 2020 Fourteenth International Congress on Artificial Materials for Novel Wave Phenomena (Metamaterials), (IEEE, 2020), pp. 110–112.

55. T. Repän, “ANN parameter retrieval,” zenodo (2021), https://doi.org/10.5281/zenodo.5235447.
