Abstract
Optical structures can serve as low-power, high-capacity alternatives to electronic processors for more efficient neuromorphic computing, but can suffer from large footprints and weak scalability. In this work, properly phased time-perturbed microrings side-coupled to a waveguide are utilized to realize a compact processor for linear transformations. We build up a synthetic frequency dimension to provide sufficient degrees of freedom, where the linear time-varying structures enable the linear intermixing and transformation of frequency-multiplexed data. Moreover, the non-reciprocal and asymmetric flow of data in the forward and backward modes, due to phasing of the perturbations, helps to build up another synthetic dimension and to avoid physically repeating the processing elements, thus enabling a much more compact and scalable linear processor.
© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
1. Introduction
Neural networks are mathematical models widely used in machine learning problems. A machine learns when it finds a suitable element of a hypothesis class in a reasonable number of steps by considering given samples of input-output mappings such that it can approximate later samples well with a high probability [1]. Neural networks (NN) are one such hypothesis class that can approximate a wide range of concepts when they have enough tuned parameters, due to the universal approximation theorem [2]. Gradient descent is typically utilized to tune the NN parameters in order to approach the true function [3]. Thus far, fascinating tasks such as speech recognition, image classification, online translation [3], and human-like decision making [4] have been accomplished by NNs.
To exploit such a mathematical model, we need a low-power, high-capacity, flexible, and scalable processor to perform the computations as accurately and as fast as possible [5]. Nowadays, electronic processors meet these needs but suffer from high power consumption due to both data transfer between memory and CPU and computations inside the chip [5,6]. This limitation worsens year after year as neural networks grow in number of parameters to achieve high accuracy for more complicated tasks, while memory and processor technologies do not develop at the same pace [7]. This has triggered a considerable amount of research to overcome such challenges [8]. Aside from software-level tricks or hardware architecture design approaches using FPGA or ASIC, new approaches try to perform computations near memory, inside memory, or even at the sensor level to overcome the memory-wall and Moore’s law limitations in power consumption [5].
Light, as a low-loss, high-capacity carrier of information, can be an excellent alternative in the quest for developing low-power high-capacity processors [5,6,9–13]. Optical interconnects between memory and processor have been explored to reduce the power consumed by data transfer, as in the near-memory-processing approach [14–16]. Moreover, optical structures can act as in-memory processors whose processing parameters are embedded in the physical properties of the medium, such that the light field signal is processed as it propagates through the structure [11]. Some well-known optical functionalities such as filtering [17–20], coupling [21–25], or diffraction [26–30], together with intrinsic properties of light fields such as superposition, can resemble multiply-and-accumulate (MAC) computations and may be exploited to realize linear transformations [27,31,32]. Nonlinear optics, with all its limitations, complexities, and implementation difficulties, may also help us design a fully optical processor for neuromorphic computations [26,33–35].
Most optical neural networks proposed in recent years have succeeded in reducing power consumption and improving throughput, but typically suffer from either a large footprint or weak scalability [13]. Some studies have used multiplexing techniques to attain a compact structure, but these techniques are not sufficient: they only shrink the signal-carrying components (e.g. waveguides), while the processing components are still repeated in space, resulting in a large footprint. For example, although wavelength-division multiplexing (WDM) and mode-division multiplexing (MDM) techniques were used in [17,36], the overall footprint was not reduced, as the processing elements, i.e. microring drop-filters, had to be repeated in space in order to provide the degrees of freedom needed to conduct arbitrary computations.
Other computational considerations may force us to repeat the processing element. For example, it has been shown that repeating the processing element in specific types of analog optical computing, such as diffractive layers, improves the structure’s computational abilities and facilitates the training phase (inverse design), i.e. the layered structure performs better and trains faster than a monolayer structure with the same degrees of freedom [37,38]. As a result, repeating the processing element seems unavoidable, both to provide sufficient degrees of freedom and to enhance the structure computationally.
A promising new approach to address footprint and scalability is to use time-varying optical structures along with time-division multiplexing (TDM) and WDM techniques. Reference [39] has built up a synthetic space in time and Refs. [40,41] have used phase modulation together with time-lenses; however, using delay lines in the former and a dispersive medium in the latter has led to an overall large footprint in all cases. Time-varying structures combined with WDM techniques have also been explored. Reference [42] has used acousto-optics to perturb cavities and Ref. [43] has introduced a scheme based on microring resonators perturbed by electro-optic modulation, both to realize frequency conversion; however, the former does not provide sufficient degrees of freedom for a complicated task, and the latter must either use very large on-chip delay lines with an 8-mm$^2$ footprint as resonators or provide ultra-high-frequency electro-optic modulation, which is not yet easily feasible or economical.
In this paper, we propose an architecture utilizing a series of time-perturbed microrings coupled to a waveguide, with WDM signals flowing in both directions in the guide (Fig. 1). These time-perturbed rings are discrete analogues of the diffractive layers in frequency synthetic approaches [44,45]. They provide the necessary degrees of freedom by manipulating perturbation characteristics in time, but unlike [43], our rings rely only on the closely spaced shadow frequencies of the time-perturbed system (dynamic modes generated around the input frequency due to periodic modulation, e.g. temporal Bloch harmonics), rather than the widely spaced resonance modes of the ring. Thus our proposal requires neither ultra-high modulation frequencies nor large-footprint microrings, as was the case in [43].
Moreover, time-perturbed microrings with proper phasing of the modulation enable non-reciprocity and asymmetric forward/backward transmission. This allows us to overload processing units on the same ring in the forward and backward modes, without adding new physical elements and without crosstalk between the two directions. This novel utilization of both forward and backward modes of the processing elements yields the benefits of layered linear processors, such as easier training and better accuracy with fewer epochs. Hence, each microring is also repeated in synthetic space, as shown in Fig. 2.
2. Proposed structure and methods
The structure is made up of a single-mode waveguide side-coupled to a series of individual microrings, as shown schematically in Fig. 1. The single-mode waveguide carries information on frequency channels separated by $\mathrm{\Omega }$ around the carrier frequency (${\omega _0}$), which is set at one resonance frequency of the microring. Thus, the input signal can be written as $u(t )= \mathop \sum \nolimits_n {u_n}{e^{jn\mathrm{\Omega }t + j{\omega _0}t}}$, where the input information is placed on the complex Fourier coefficients ${u_n}$ and n indexes the frequency channels around the main resonance frequency. Guided modes of the waveguide evanescently side-couple to the resonance modes of the microrings, which can be analyzed using temporal coupled-mode theory [46,47]. The signal moves along the waveguide in the forward mode and is processed as it passes through the microrings. Finally, an on-chip compact reflector [48] or two cascaded microrings [49] can reflect the forward-mode signal into the backward mode, so that the signal passes through the microrings again and is processed differently. The output is then extracted from the backward-mode frequencies.
The time-perturbed microring intermixes the frequency components of the incoming field before sending it back out to the waveguide and thereby actualizes a linear transformation (Fig. 3). We perturb the microrings by a periodic modulation with fundamental frequency $\mathrm{\Omega }$, equal to the frequency spacing of the WDM signal. As a result, one can write the perturbation signal in terms of its harmonic tones as $\mathop \sum \nolimits_{l = 1}^m {A_l}\cos ({l\mathrm{\Omega }t + {\theta_l}} )= \mathop \sum \nolimits_{l ={-} m}^m {\delta _l}{e^{j{\theta _l}}}{e^{jl\mathrm{\Omega }t}}$, where ${A_l} = 2{\delta _l} = 2{\delta _{ - l}}$ and ${\theta _l}$ are the amplitude and relative phase of each harmonic, respectively, and l indexes the harmonic tones. The maximum frequency of the perturbation signal is less than the distance between two resonance frequencies. We assume that data are placed on frequencies spaced by $\mathrm{\Omega }$ from each other. The inward signal is ${s_ + }(t )= \mathop \sum \nolimits_n s_n^ + {e^{jn\mathrm{\Omega }t + j{\omega _0}t}}$, leading to the outward signal vector $[{s_n^ - } ]$ (see the Appendix):
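The channel encoding described above can be illustrated with a minimal NumPy sketch. It works with the baseband envelope (the carrier $e^{j{\omega _0}t}$ is factored out) and recovers the coefficients ${u_n}$ by projection over one modulation period. The function names `encode` and `decode`, the channel count, and all numerical values are illustrative assumptions, not taken from this work.

```python
import numpy as np

def encode(u, n_idx, Omega, t):
    """Baseband envelope sum_n u_n exp(j n Omega t); the carrier exp(j w0 t) is factored out."""
    return np.exp(1j * Omega * np.outer(t, n_idx)) @ u

def decode(envelope, n_idx, Omega, t):
    """Recover u_n by projecting onto exp(j n Omega t) over one full period."""
    return np.exp(-1j * Omega * np.outer(n_idx, t)) @ envelope / t.size

Omega = 2 * np.pi * 1.0                        # channel spacing, arbitrary units (hypothetical)
n_idx = np.arange(-2, 3)                       # five channels, n = -2..2
u = np.array([0.5, 1j, -1.0, 0.25 - 0.5j, 2.0])  # complex data on the channels
t = np.arange(16) / 16 * (2 * np.pi / Omega)   # 16 uniform samples over one period

u_rec = decode(encode(u, n_idx, Omega, t), n_idx, Omega, t)
```

Because the harmonics are orthogonal over one period, `u_rec` matches `u` up to floating-point error as long as the number of samples exceeds the span of channel indices.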
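One form consistent with the definitions that follow, written under a standard temporal coupled-mode convention (the sign and normalization choices are assumptions here), is:

$$[{s_n^ - }] = {\mathbf W}[{s_n^ + }],\qquad {\mathbf W} = {\mathbf I} - \sqrt {2{\mathbf \Gamma }} \,{[{j({{\mathbf \Omega } - {\mathbf \Delta }}) + {\mathbf \Gamma }}]^{ - 1}}\sqrt {2{\mathbf \Gamma }}$$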
Here, ${\mathbf \Omega } = \textrm{diag}({n\mathrm{\Omega }:n ={-} \infty :\infty } )$, ${\mathbf \Delta } = \textrm{toeplitz}({0,\,{\delta_1}{e^{j{\theta_1}}},{\delta_2}{e^{j{\theta_2}}}, \ldots } )$, and ${\mathbf \Gamma } = \textrm{diag}(\ldots ,{\gamma_{ - \mathrm{\Omega }}}, {\gamma_0},\,{\gamma_\mathrm{\Omega }}, \ldots )$ where ${\gamma _{l\mathrm{\Omega }}}$ is the coupling rate corresponding to the frequency channel ${\omega _0} + l\mathrm{\Omega }$. The matrix ${\mathbf W}$ denotes the linear transformation which couples each frequency channel of the input to the output. This matrix can approximate arbitrary unitary matrices with high fidelity if enough degrees of freedom are provided. To provide enough degrees of freedom one must add harmonics in the perturbation or repeat the computation by adding new microrings, or mathematically apply other ${\mathbf W}$s sequentially. The obtained linear transformation would be unitary due to assuming no loss and the conservation of energy. Although unitary transformations have their application in signal processing and neural networks [50–52], one can embed non-unitary operations inside a unitary transformation [53].
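As a minimal numerical sketch of these definitions, the code below assembles truncated versions of ${\mathbf \Omega }$, ${\mathbf \Delta }$, and ${\mathbf \Gamma }$ and evaluates ${\mathbf W} = {\mathbf I} - \sqrt {2{\mathbf \Gamma }} [j({\mathbf \Omega } - {\mathbf \Delta }) + {\mathbf \Gamma }]^{-1}\sqrt {2{\mathbf \Gamma }}$. This closed form, the truncation to $2N+1$ channels, the helper name `ring_matrix`, and the parameter values are assumptions made for illustration; the sign convention in particular may differ from the one used in the Appendix.

```python
import numpy as np

def ring_matrix(delta, theta, gamma, Omega, N):
    """Transfer matrix of one time-perturbed ring on channels n = -N..N (assumed form).

    delta[l-1], theta[l-1]: amplitude and phase of perturbation tone l (l = 1, 2, ...).
    gamma: per-channel coupling rates, length 2N+1.
    """
    n = np.arange(-N, N + 1)
    size = n.size
    Om = np.diag(n * Omega).astype(complex)
    # Hermitian Toeplitz matrix: entry (k, k-l) holds delta_l * exp(j*theta_l)
    Delta = np.zeros((size, size), dtype=complex)
    for l, (d, th) in enumerate(zip(delta, theta), start=1):
        c = d * np.exp(1j * th)
        Delta += c * np.eye(size, k=-l) + np.conj(c) * np.eye(size, k=l)
    Gamma = np.diag(np.asarray(gamma, dtype=float))
    K = np.sqrt(2 * Gamma)                    # elementwise sqrt of a diagonal matrix
    A = 1j * (Om - Delta) + Gamma
    return np.eye(size) - K @ np.linalg.solve(A, K)

# Hypothetical example: 5 channels, two tones, uniform bandwidth 2*gamma = 4*Omega
W = ring_matrix([0.8, 0.3], [0.4, -1.1], np.full(5, 2.0), Omega=1.0, N=2)
unitarity_error = np.max(np.abs(W.conj().T @ W - np.eye(5)))

# Without perturbation the ring is a pure all-pass filter: W is diagonal
W_static = ring_matrix([0.0], [0.0], np.full(5, 2.0), Omega=1.0, N=2)
```

Under these assumptions the matrix stays unitary for any real amplitudes and phases, because ${\mathbf \Omega } - {\mathbf \Delta }$ is Hermitian; the perturbation only redistributes power among channels.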
The amplitudes and phases of the perturbation implicitly determine the entries of ${\mathbf W}$. Gradient descent, together with an appropriate criterion, can tune these entries to approximate a desired matrix. The following relation introduces a criterion to measure the resemblance of two matrices:
where $\|U\| = \langle U,U \rangle$ and $\langle U,V \rangle = \mathop \sum \nolimits_i \mathop \sum \nolimits_j {u_{ij}}v_{ij}^\ast $. It can be easily seen that V and U are the same up to a phase shift when ${{\cal F}} = 1$. With this criterion defined, automatic differentiation computes the gradients needed for gradient descent. In this work we used JAX [54] as the reverse-mode automatic differentiation framework to calculate the gradient, and the Adam method as the optimizer [55].
3. Results
A series of time-perturbed microrings coupled to a waveguide can approximate an arbitrary unitary matrix. We first examined this by providing enough degrees of freedom. Three five-tone perturbed microrings have 29 degrees of freedom, enough to approximate a 5 × 5 matrix. Figure 4 shows an arbitrary unitary matrix approximated by three five-tone perturbed microrings with high fidelity.
Adding new microrings brings us another useful feature, viz. non-reciprocity and asymmetric transmission enabled by applying time-periodic perturbations of different phases and different amplitudes at several points in space [56–59]. This allows for the realization of two different linear transformations for the forward and backward modes. Figure 5 shows a two-ring structure that exhibits non-reciprocal behavior. We tested this with two 5-tone perturbed rings, which demonstrate asymmetry together with non-reciprocity (Fig. 5). One can optimize the forward and backward transformations by redefining the criterion properly. Figure 6 shows two different unitary matrices, one approximated for the forward mode and one for the backward mode.
With neural network applications in mind, we also tested our design on the well-known handwritten digit recognition task [60,61], a machine learning benchmark commonly used as a sanity check. Because it can be learned by a linear model to some extent, it is suitable for examining our proposed linear processor. The dataset consists of 1800 samples of 8 × 8 images, of which 1440 were used as the training set and the rest as the test set. Input data sit on frequency channels around the static resonance frequency with a gap of $\mathrm{\Omega }$ between adjacent channels (Fig. 7). A specific range of the spectrum is attributed to each digit, as shown in Fig. 8. A digit is considered recognized if more power is concentrated within the spectral range associated with that digit than within the others (Fig. 8).
A structure with 40 rings with $2{\gamma _{l\mathrm{\Omega }}} = 20\mathrm{\Omega }$ and 81 harmonics achieved the expected results and recognized digits with 80 percent accuracy (Fig. 9(a)), using only the forward transmission mode of the waveguide. Additionally, one can send the signal back into the waveguide and process it using both the forward and backward linear transformations provided by the phased time-varying (TV) microrings. The reflection may be easily achieved with an on-chip compact reflector [48] or two cascaded microrings [49], both of which can couple the forward-mode signal to the backward mode. The accuracy of the 40-ring structure with $2{\gamma _{l\mathrm{\Omega }}} = 30\mathrm{\Omega }$ improved when backward modes were exploited, yielding 90 percent accuracy with only 64 harmonics for each ring (Fig. 9(b)). A linear mathematical model (y = Wx) with no bias and without any optical constraint on the entries of the matrix would reach 86 and 93 percent accuracy for the train and test datasets, respectively, which is comparable with our results.
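A normalized-overlap form consistent with the definitions that follow (the exact normalization is an assumption here) is:

$${{\cal F}} = \frac{{{{|{\langle U,V \rangle } |}^2}}}{{\|U\|\,\|V\|}}$$

By the Cauchy–Schwarz inequality this quantity is at most 1, with equality only when V is a scalar multiple of U, which for unitary matrices means a global phase factor.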
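The cascade and the resemblance criterion can be sketched numerically as follows. The single-ring matrix uses an assumed coupled-mode form, ${\mathbf W} = {\mathbf I} - \sqrt {2{\mathbf \Gamma }} [j({\mathbf \Omega } - {\mathbf \Delta }) + {\mathbf \Gamma }]^{-1}\sqrt {2{\mathbf \Gamma }}$, and the fidelity uses an assumed normalization, $|\langle U,V \rangle|^2/(\|U\|\,\|V\|)$. The helper names, the 7-channel truncation, and the random parameters are hypothetical. The sketch only evaluates the criterion; it does not perform the JAX/Adam optimization used in this work.

```python
import numpy as np

def ring_matrix(delta, theta, gamma, Omega, N):
    """Assumed single-ring transfer matrix on channels n = -N..N."""
    size = 2 * N + 1
    Om = np.diag(np.arange(-N, N + 1) * Omega).astype(complex)
    Delta = np.zeros((size, size), dtype=complex)
    for l, (d, th) in enumerate(zip(delta, theta), start=1):
        c = d * np.exp(1j * th)
        Delta += c * np.eye(size, k=-l) + np.conj(c) * np.eye(size, k=l)
    Gamma = np.diag(np.asarray(gamma, dtype=float))
    K = np.sqrt(2 * Gamma)
    return np.eye(size) - K @ np.linalg.solve(1j * (Om - Delta) + Gamma, K)

def inner(U, V):
    """<U, V> = sum_ij u_ij * conj(v_ij)."""
    return np.sum(U * np.conj(V))

def fidelity(U, V):
    """Normalized overlap; equals 1 when V = c * U for a scalar c (assumed normalization)."""
    return np.abs(inner(U, V)) ** 2 / (inner(U, U).real * inner(V, V).real)

rng = np.random.default_rng(0)
N, Omega, n_rings, n_tones = 3, 1.0, 3, 5      # hypothetical sizes
gamma = np.full(2 * N + 1, 2.0)

# Rings act one after another, so their matrices multiply in sequence
W_total = np.eye(2 * N + 1, dtype=complex)
for _ in range(n_rings):
    delta = rng.uniform(0.0, 1.0, n_tones)
    theta = rng.uniform(-np.pi, np.pi, n_tones)
    W_total = ring_matrix(delta, theta, gamma, Omega, N) @ W_total

f_phase = fidelity(W_total, np.exp(0.7j) * W_total)   # same matrix up to a global phase
f_identity = fidelity(W_total, np.eye(2 * N + 1))     # compared against the identity
```

In an optimization loop, `1 - fidelity(W_total, target)` would serve as the loss, with the tone amplitudes and phases as the trainable parameters.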
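A toy model can illustrate why phasing matters. It assumes, as a simplification not taken from this work, that each ring applies the same matrix in both propagation directions and that the propagation phase between rings is negligible, so that the forward pass is ${{\mathbf W}_2}{{\mathbf W}_1}$ and the backward pass is ${{\mathbf W}_1}{{\mathbf W}_2}$. The single-ring form, the helper name `ring_matrix`, the single-tone perturbation, and the numerical values are likewise hypothetical.

```python
import numpy as np

def ring_matrix(delta, theta, gamma, Omega, N):
    """Assumed single-ring transfer matrix on channels n = -N..N."""
    size = 2 * N + 1
    Om = np.diag(np.arange(-N, N + 1) * Omega).astype(complex)
    Delta = np.zeros((size, size), dtype=complex)
    for l, (d, th) in enumerate(zip(delta, theta), start=1):
        c = d * np.exp(1j * th)
        Delta += c * np.eye(size, k=-l) + np.conj(c) * np.eye(size, k=l)
    Gamma = np.diag(np.asarray(gamma, dtype=float))
    K = np.sqrt(2 * Gamma)
    return np.eye(size) - K @ np.linalg.solve(1j * (Om - Delta) + Gamma, K)

N, Omega = 2, 1.0
gamma = np.full(2 * N + 1, 2.0)

# Two rings with the same modulation amplitude but a quadrature phase offset
W1 = ring_matrix([0.8], [0.0], gamma, Omega, N)
W2 = ring_matrix([0.8], [np.pi / 2], gamma, Omega, N)

forward = W2 @ W1      # signal meets ring 1 first, then ring 2
backward = W1 @ W2     # reflected signal meets ring 2 first, then ring 1

asymmetric = not np.allclose(forward, backward)
# In this toy picture a reciprocal structure would satisfy backward == forward.T
non_reciprocal = not np.allclose(backward, forward.T)
```

With equal phases on both rings the two matrices coincide and the order of the passes no longer matters, which is why the relative phase is the relevant design knob in this picture.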
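The readout rule can be sketched as a band-power comparison. The band layout below (ten contiguous, equal-width bands over the first 60 of 64 channels) is a hypothetical stand-in for the assignment shown in Fig. 8, and the function name `classify` is illustrative.

```python
import numpy as np

def classify(spectrum, n_classes=10):
    """Return the class whose spectral band holds the most power, plus all band powers.

    Assumes contiguous equal-width bands; leftover channels at the end are ignored.
    """
    power = np.abs(np.asarray(spectrum)) ** 2
    width = power.size // n_classes
    band_power = power[: width * n_classes].reshape(n_classes, width).sum(axis=1)
    return int(np.argmax(band_power)), band_power

# Hypothetical output spectrum on 64 channels with most power in channel 19
spectrum = np.zeros(64, dtype=complex)
spectrum[19] = 1 + 1j      # falls in band 3 (channels 18..23)
spectrum[2] = 0.5          # weaker component in band 0
digit, band_power = classify(spectrum)
```

Here `digit` is 3, since band 3 collects a power of 2.0 against 0.25 in band 0.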
4. Discussion
We build up a synthetic space made up of frequency and forward/backward modes, so the signal diffracts in frequency space and moves along the forward and backward modes. Phased time-perturbed microrings behave non-reciprocally, acting differently on the forward and backward modes. As a result, the forward signal encounters the same microrings differently when it propagates back through the structure, and each microring is thereby repeated in the synthetic dimension without adding any new elements.
It should be noted that this proposal better utilizes all available degrees of freedom by exploiting both forward and backward modes. This is analogous to the behavior of multi-layered diffractive elements, which outperform mono-layer designs in computing and in training with the same degrees of freedom [37]. Utilizing non-reciprocal and asymmetric transmission of forward and backward modes to mimic a layered structure is another important novelty of our work.
Time-varying mechanisms can make our design reprogrammable. Moreover, our proposal is realizable with state-of-the-art technology. To achieve a result like that reported in the previous section, one can place data on frequency channels with a frequency spacing of Ω = 640 MHz and employ microrings with 18.7 GHz bandwidth and 40 GHz maximum modulation frequency, which is achievable with compact microrings (e.g. with a footprint of nearly 200 µm$^2$) using current technology [62] and may lead to a high footprint efficiency of 1 PMAC/s/mm$^2$. This is in stark contrast to [43], where rather unrealistically high values were needed for the ring size and modulation frequency.
Because they are based on time perturbation, the microrings can be reprogrammed by changing the phase and the amplitude of each tone of the perturbation signal. The perturbation can be realized by electro-optic modulation of the refractive index. The electro-optic mechanism can also tune the characteristic parameters of the ring. If the resonance frequencies of the rings are not the same, the electro-optic mechanism may fine-tune the main frequency with a proper DC bias voltage. Moreover, this mechanism together with thermal processes can be used to prevent thermal instability [63,64].
The dimensionality of our structure is determined by the number of frequency channels that it can support, which is limited by the bandwidth of the resonators on the one hand and the bandwidth of the electro-optic modulation on the other. A compact resonator whose bandwidth is high relative to the gap between frequency channels worsens the resolution of the structure in the spectral domain and thus reduces the computational accuracy and capability of the processor. Moreover, the maximum frequency of the electro-optic modulation limits the exploitable width of the spectrum. However, the current structure can handle these considerations to some extent. The electro-optic mechanism together with thermal processes can shift the resonance frequency far away from its original value. In this way, the entire structure may support a wider range of signals in the spectrum. As a result, our proposal uses the spectrum efficiently, which brings high-dimensional processing without necessitating a significant increase in footprint.
Another advantage of using microrings is that similar processing can be attained simultaneously around other resonance frequencies of the microrings, thus enabling batch processing. Although design parameters for each resonant mode (e.g. the coupling rate) may be different, this can be handled by proper multi-resonant-mode design in order to attain nearly the same accuracy for all modes.
Suitable exploitation of the spectrum and its feasibility make our proposal more advantageous than previous proposals based on WDM and frequency coupling. Reference [43] takes advantage of frequency coupling by perturbing microrings but, unlike ours, treats resonance frequencies separated by the free spectral range (FSR) as data channels, which either requires large rings or leads to a large FSR. As a result, it neither exploits the spectrum efficiently nor offers features such as batch processing. Furthermore, our proposal takes advantage of forward and backward modes to achieve better computation. Another work [42] has used acousto-optically perturbed cavities but has not used different harmonics to provide more degrees of freedom and, hence, richer computations. Moreover, in our work one can tune the resonance frequency of the resonators by thermal and electro-optic mechanisms, and batch processing is possible due to the ring structure, whereas these may be harder to achieve in other proposals.
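As a rough consistency check on these figures, assuming they refer to the configuration of Section 3 with 64 harmonics per ring and $2{\gamma _{l\mathrm{\Omega }}} = 30\mathrm{\Omega }$ (an assumption made here), the highest modulation tone and the ring bandwidth follow from the channel spacing as:

$$m\mathrm{\Omega } = 64 \times 640\;\textrm{MHz} \approx 41\;\textrm{GHz},\qquad 2\gamma = 30\mathrm{\Omega } = 30 \times 640\;\textrm{MHz} = 19.2\;\textrm{GHz}$$

Both values are close to the quoted 40 GHz maximum modulation frequency and 18.7 GHz bandwidth.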
5. Conclusion
In this paper we proposed a scheme to achieve a compact linear processor for neuromorphic processing, compliant with the current integrated optics fabrication processes. The proposal benefits from time-perturbations introduced in a series of micro-ring resonators side-coupled to a main waveguide bearing WDM signals. The time-varying microrings enable frequency domain linear transformations between the signals. Additionally, proper phasing of perturbations between the rings enabled different processing paths in the forward and backward flow of signals thanks to the achieved non-reciprocity.
The scheme has the potential to exploit different resonant frequencies of the rings simultaneously and thus enables batch processing. Tunability of the ring characteristics, the resonance frequency in particular, may also help in processing with wider bandwidth and makes the scheme more scalable. Ring resonators also have further features which may be exploited in future research. A ring resonator deformed into a racetrack is mode-sensitive [65], which opens a door to MDM data processing and may help improve processing capacity. In the future, this versatile optical linear processing scheme can be combined with existing or new nonlinear activation functions in a variety of ways to pave the way for a very compact, scalable on-chip optical deep neural network processor.
Appendix
A microring is an integrated passive photonic component made up of a waveguide loop. The light field inside the ring evolves as follows [66]:
The decay rate $\gamma = {\gamma _0} + {\gamma _1}$ can be due to the intrinsic loss (${\gamma _0}$) or the coupling of the light from the ring to the guide (${\gamma _1}$); here we assume no intrinsic loss. If a waveguide is brought close to the ring, the ring modes couple to the guided modes, so that light leaks out of or into the ring, as described by temporal coupled-mode theory [46]:
where ${s_ + }$ denotes the inward field and $\mu $ is the coupling rate, which in the case of a lossless microring is related to the decay rate by $\mu = \sqrt {2{\gamma _1}} $. The following equation relates the input field and the field of the ring to the output:
A perturbed microring couples different frequencies of the input field to each other:
We assume that signals are placed on frequencies spaced by $\mathrm{\Omega }$ from each other. We expand the input signal as ${s_ + }(t )= \mathop \sum \nolimits_n s_n^ + {e^{jn\mathrm{\Omega }t + j{\omega _0}t}}$, where $s_n^ + $ is a complex-valued parameter, the perturbation signal as $\omega (t )= \; \mathop \sum \nolimits_{l = 1}^m {A_l}\cos ({l\mathrm{\Omega }t + {\theta_l}} )= \mathop \sum \nolimits_{l ={-} m}^m {\delta _l}{e^{j{\theta _l}}}{e^{jl\mathrm{\Omega }t}}$, where ${\delta _0} = 0$, ${\theta _l} ={-} {\theta _{ - l}}$, and ${A_l} = 2{\delta _l} = 2{\delta _{ - l}}$, and the microring field as $a(t )= {e^{j{\omega _0}t}}\mathop \sum \nolimits_k {a_k}{e^{jk\mathrm{\Omega }t}}$. Here l indexes the harmonic tones of the perturbation and $n,k$ index the frequency channels around the main resonance frequency. Substituting into Eq. (8), we have:
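In a common form of this theory (the sign convention is an assumption here), the field $a(t)$ of an unperturbed ring driven by the guide obeys:

$$\frac{{da}}{{dt}} = ({j{\omega _0} - \gamma } )a + \mu {s_ + }$$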
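As a sketch of this step, assume the perturbed coupled-mode equation takes the common form $\frac{{da}}{{dt}} = [{j({{\omega _0} + \omega (t )} )- \gamma } ]a + \mu {s_ + }$ (a convention assumed here). Matching the coefficients of ${e^{j({{\omega _0} + k\mathrm{\Omega }} )t}}$ on both sides then gives, for each channel k:

$$({jk\mathrm{\Omega } + \gamma } ){a_k} - j\mathop \sum \nolimits_{l ={-} m}^m {\delta _l}{e^{j{\theta _l}}}{a_{k - l}} = \mu s_k^ + $$

Each tone l therefore couples channel k to channel k − l, which is the frequency intermixing exploited by the processor.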
Or in the matrix form:
According to the relation between perturbation parameters:
So we have:
Now one can find the outward field:
Here, we have assumed that the coupling rate $\gamma $ is equal over the entire range of the spectrum, but more generally one can account for a different coupling rate for each frequency channel. Then Eq. (9) will lead to:
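With the output relation ${s_ - } = {s_ + } - \mu a$ (one common convention, assumed here) and the per-channel coupling rates collected in the diagonal matrix ${\mathbf \Gamma }$, a form consistent with the definitions of Section 2 would be:

$$[{s_n^ - }] = {\mathbf W}[{s_n^ + }],\qquad {\mathbf W} = {\mathbf I} - \sqrt {2{\mathbf \Gamma }} \,{[{j({{\mathbf \Omega } - {\mathbf \Delta }}) + {\mathbf \Gamma }}]^{ - 1}}\sqrt {2{\mathbf \Gamma }}$$

When all coupling rates equal $\gamma $, this reduces to ${\mathbf W} = {\mathbf I} - 2\gamma {[{j({{\mathbf \Omega } - {\mathbf \Delta }}) + \gamma {\mathbf I}}]^{ - 1}}$.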
Disclosures
The authors declare no conflicts of interest.
Data availability
Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.
References
1. M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning (MIT Press, 2018).
2. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016), Vol. 1.
3. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015).
4. V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, and G. Ostrovski, “Human-level control through deep reinforcement learning,” Nature 518(7540), 529–533 (2015).
5. V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, “Efficient processing of deep neural networks,” Synthesis Lectures on Computer Architecture 15(2), 1–341 (2020).
6. P. Stark, F. Horst, R. Dangel, J. Weiss, and B. J. Offrein, “Opportunities for integrated photonic neural networks,” Nanophotonics 9(13), 4221–4232 (2020).
7. X. Xu, Y. Ding, S. X. Hu, M. Niemier, J. Cong, Y. Hu, and Y. Shi, “Scaling for edge inference of deep neural networks,” Nat. Electron. 1(4), 216–222 (2018).
8. C. D. Schuman, T. E. Potok, R. M. Patton, J. D. Birdwell, M. E. Dean, G. S. Rose, and J. S. Plank, “A survey of neuromorphic computing and neural networks in hardware,” arXiv preprint arXiv:1705.06963 (2017).
9. T. F. de Lima, A. N. Tait, A. Mehrabian, M. A. Nahmias, C. Huang, H.-T. Peng, B. A. Marquez, M. Miscuglio, T. El-Ghazawi, and V. J. Sorger, “Primer on silicon neuromorphic photonic processors: architecture and compiler,” Nanophotonics 9(13), 4055–4073 (2020).
10. T. F. De Lima, H.-T. Peng, A. N. Tait, M. A. Nahmias, H. B. Miller, B. J. Shastri, and P. R. Prucnal, “Machine learning with neuromorphic photonics,” J. Lightwave Technol. 37(5), 1515–1534 (2019).
11. M. A. Nahmias, T. F. De Lima, A. N. Tait, H.-T. Peng, B. J. Shastri, and P. R. Prucnal, “Photonic multiply-accumulate operations for neural networks,” IEEE J. Sel. Top. Quantum Electron. 26(1), 1–18 (2020).
12. B. J. Shastri, A. N. Tait, T. F. de Lima, M. A. Nahmias, H.-T. Peng, and P. R. Prucnal, “Principles of neuromorphic photonics,” arXiv preprint arXiv:1801.00016 (2017).
13. A. R. Totović, G. Dabos, N. Passalis, A. Tefas, and N. Pleros, “Femtojoule per MAC neuromorphic photonics: an energy and technology roadmap,” IEEE J. Sel. Top. Quantum Electron. 26(5), 1–15 (2020).
14. L. Bernstein, A. Sludds, R. Hamerly, V. Sze, J. Emer, and D. Englund, “Freely scalable and reconfigurable optical hardware for deep learning,” arXiv preprint arXiv:2006.13926 (2020).
15. R. Hamerly, L. Bernstein, A. Sludds, M. Soljačić, and D. Englund, “Large-scale optical neural networks based on photoelectric multiplication,” Phys. Rev. X 9, 021032 (2019).
16. A. Sludds, “Attojoule scale computation of large optical neural networks,” (Massachusetts Institute of Technology, 2019).
17. A. N. Tait, T. F. De Lima, M. A. Nahmias, H. B. Miller, H.-T. Peng, B. J. Shastri, and P. R. Prucnal, “Silicon photonic modulator neuron,” Phys. Rev. Appl. 11(6), 064043 (2019).
18. A. N. Tait, T. F. De Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Rep. 7(1), 1–10 (2017).
19. A. N. Tait, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Broadcast and weight: an integrated network for scalable photonic spike processing,” J. Lightwave Technol. 32(21), 4029–4041 (2014).
20. X. Xu, M. Tan, B. Corcoran, J. Wu, T. G. Nguyen, A. Boes, S. T. Chu, B. E. Little, R. Morandotti, and A. Mitchell, “Photonic perceptron based on a kerr microcomb for high-speed, scalable, optical neural networks,” Laser Photonics Rev. 14(10), 2000070 (2020).
21. Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, and D. Englund, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11(7), 441–446 (2017).
22. H. Bagherian, S. Skirlo, Y. Shen, H. Meng, V. Ceperic, and M. Soljacic, “On-chip optical convolutional neural networks,” arXiv preprint arXiv:1808.03303 (2018).
23. N. C. Harris, J. Carolan, D. Bunandar, M. Prabhu, M. Hochberg, T. Baehr-Jones, M. L. Fanto, A. M. Smith, C. C. Tison, and P. M. Alsing, “Linear programmable nanophotonic processors,” Optica 5(12), 1623–1631 (2018).
24. H. Zhang, M. Gu, X. D. Jiang, J. Thompson, H. Cai, S. Paesani, R. Santagati, A. Laing, Y. Zhang, M. H. Yung, Y. Z. Shi, F. K. Muhammad, G. Q. Lo, X. S. Luo, B. Dong, D. L. Kwong, L. C. Kwek, and A. Q. Liu, “An optical neural chip for implementing complex-valued neural network,” Nat. Commun. 12(1), 457 (2021).
25. C. Wu, H. Yu, S. Lee, R. Peng, I. Takeuchi, and M. Li, “Programmable phase-change metasurfaces on waveguides for multimode photonic convolutional neural network,” Nat. Commun. 12(1), 96 (2021).
26. T. Yan, J. Wu, T. Zhou, H. Xie, F. Xu, J. Fan, L. Fang, X. Lin, and Q. Dai, “Fourier-space diffractive deep neural network,” Phys. Rev. Lett. 123(2), 023901 (2019).
27. X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science 361(6406), 1004–1008 (2018).
28. J. Chang, V. Sitzmann, X. Dun, W. Heidrich, and G. Wetzstein, “Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification,” Sci. Rep. 8, 1–10 (2018).
29. S. Zarei and A. Khavasi, “Inverse design of on-chip thermally tunable varifocal metalens based on silicon metalines,” IEEE Access 9, 73453–73466 (2021).
30. S. Zarei, M.-r. Marzban, and A. Khavasi, “Integrated photonic neural network based on silicon metalines,” Opt. Express 28(24), 36668–36684 (2020).
31. J. Feldmann, N. Youngblood, M. Karpov, H. Gehring, X. Li, M. L. Gallo, X. Fu, A. Lukashchuk, A. Raja, and J. Liu, “Parallel convolution processing using an integrated photonic tensor core,” arXiv preprint arXiv:2002.00281 (2020).
32. G. Mourgias-Alexandris, A. Totović, A. Tsakyridis, N. Passalis, K. Vyrsokinos, A. Tefas, and N. Pleros, “Neuromorphic photonics with coherent linear neurons using dual-IQ modulation cells,” J. Lightwave Technol. 38(4), 811–819 (2020).
33. J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran, and W. H. P. Pernice, “All-optical spiking neurosynaptic networks with self-learning capabilities,” Nature 569(7755), 208–214 (2019).
34. Y. Zuo, B. Li, Y. Zhao, Y. Jiang, Y.-C. Chen, P. Chen, G.-B. Jo, J. Liu, and S. Du, “All-optical neural network with nonlinear activation functions,” Optica 6(9), 1132–1137 (2019).
35. G. Mourgias-Alexandris, A. Tsakyridis, N. Passalis, A. Tefas, K. Vyrsokinos, and N. Pleros, “An all-optical neuron with sigmoid activation function,” Opt. Express 27(7), 9620–9630 (2019).
36. E. Gordon, “Mode division multiplexing (MDM) weight bank design for use in photonic neural networks,” arXiv preprint arXiv:1810.07583 (2018).
37. O. Kulce, D. Mengu, Y. Rivenson, and A. Ozcan, “All-optical information-processing capacity of diffractive surfaces,” Light: Sci. Appl. 10(1), 1–17 (2021).
38. D. Mengu, Y. Luo, Y. Rivenson, and A. Ozcan, “Analysis of diffractive optical neural networks and their integration with electronic neural networks,” IEEE J. Sel. Top. Quantum Electron. 26(1), 1–14 (2020).
39. B. Peng, S. Yan, D. Cheng, D. Yu, Z. Liu, V. V. Yakovlev, L. Yuan, and X. Chen, “Novel optical neural network architecture with the temporal synthetic dimension,” arXiv preprint arXiv:2101.08439 (2021).
40. M. Li, Z. Lin, and X. Meng, “Temporal optical neurons for serial deep learning,” in 2021 IEEE Photonics Society Summer Topicals Meeting Series (SUM) (IEEE, 2021), 1–2.
41. L. Zhang, C. Li, J. He, Y. Liu, J. Zhao, H. Guo, L. Zhu, M. Zhou, K. Zhu, and C. Liu, “Optical Machine Learning Using Time-Lens Deep Neural NetWorks,” in Photonics, (Multidisciplinary Digital Publishing Institute, 2021), 78.
42. H. Zhao, B. Li, H. Li, and M. Li, “Scaling optical computing in synthetic frequency dimension using integrated cavity acousto-optics,” arXiv preprint arXiv:2106.08494 (2021).
43. S. Buddhiraju, A. Dutt, M. Minkov, I. A. Williamson, and S. Fan, “Arbitrary linear transformations for photons in the frequency synthetic dimension,” Nat. Commun. 12(1), 2401 (2021).
44. L. Ding, C. Qin, F. Zhou, L. Yang, W. Li, F. Luo, J. Dong, B. Wang, and P. Lu, “Efficient spectrum reshaping with photonic gauge potentials in resonantly modulated fiber-loop circuits,” Phys. Rev. Appl. 12(2), 024027 (2019).
45. C. Qin, F. Zhou, Y. Peng, D. Sounas, X. Zhu, B. Wang, J. Dong, X. Zhang, A. Alù, and P. Lu, “Spectrum control through discrete frequency diffraction in the presence of photonic gauge potentials,” Phys. Rev. Lett. 120(13), 133901 (2018).
46. S. Fan, W. Suh, and J. D. Joannopoulos, “Temporal coupled-mode theory for the Fano resonance in optical resonators,” J. Opt. Soc. Am. A 20(3), 569–572 (2003).
47. M. Minkov, Y. Shi, and S. Fan, “Exact solution to the steady-state dynamics of a periodically modulated resonator,” APL Photonics 2(7), 076101 (2017).
48. T. Wang, H. Guo, H. Chen, J. Yang, and H. Jia, “Ultra-compact reflective mode converter based on a silicon subwavelength structure,” Appl. Opt. 59(9), 2754–2758 (2020).
49. I. Chremmos and N. Uzunoglu, “Reflective properties of double-ring resonator system coupled to a waveguide,” IEEE Photonics Technol. Lett. 17(10), 2110–2112 (2005).
50. H.-Y. Chang and K. L. Wang, “Deep Convolutional Neural Networks with Unitary Weights,” arXiv preprint arXiv:2102.11855 (2021).
51. M. Arjovsky, A. Shah, and Y. Bengio, “Unitary evolution recurrent neural networks,” in International Conference on Machine Learning (PMLR, 2016), 1120–1128.
52. M. Schuld, I. Sinayskiy, and F. Petruccione, “The quest for a quantum neural network,” Quantum Inf. Process. 13(11), 2567–2586 (2014).
53. N. Tischler, C. Rockstuhl, and K. Słowik, “Quantum optical realization of arbitrary linear transformations allowing for loss and gain,” Phys. Rev. X 8, 021017 (2018).
54. J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang, JAX: composable transformations of Python + NumPy programs, 2018.
55. R. Frostig, M. J. Johnson, and C. Leary, “Compiling machine learning programs via high-level tracing,” Systems for Machine Learning, Stanford, California (2018).
56. A. Zarif, K. Mehrany, M. Memarian, and H. Heydarian, “Optical isolation enabled by two time-modulated point perturbations in a ring resonator,” Opt. Express 28(11), 16805–16821 (2020).
57. C. Caloz, A. Alu, S. Tretyakov, D. Sounas, K. Achouri, and Z.-L. Deck-Léger, “Electromagnetic nonreciprocity,” Phys. Rev. Appl. 10(4), 047001 (2018).
58. M. Chegnizadeh, M. Memarian, and K. Mehrany, “Non-reciprocity using quadrature-phase time-varying slab resonators,” J. Opt. Soc. Am. B 37(1), 88–97 (2020).
59. M. Chegnizadeh, K. Mehrany, and M. Memarian, “General solution to wave propagation in media undergoing arbitrary transient or periodic temporal variations of permittivity,” J. Opt. Soc. Am. B 35(11), 2923–2932 (2018).
60. D. Dua and C. Graff, “UCI Machine Learning Repository,” (2017).
61. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research 12, 2825–2830 (2011).
62. X. Xiao, X. Li, H. Xu, Y. Hu, K. Xiong, Z. Li, T. Chu, J. Yu, and Y. Yu, “44-Gb/s silicon microring modulators based on zigzag PN junctions,” IEEE Photonics Technol. Lett. 24(19), 1712–1714 (2012).
63. K. Padmaraju and K. Bergman, “Resolving the thermal challenges for silicon microring resonator devices,” Nanophotonics 3(4-5), 269–281 (2014).
64. K. Padmaraju, J. Chan, L. Chen, M. Lipson, and K. Bergman, “Thermal stabilization of a microring modulator using feedback control,” Opt. Express 20(27), 27999–28008 (2012).
65. L.-W. Luo, N. Ophir, C. P. Chen, L. H. Gabrielli, C. B. Poitras, K. Bergman, and M. Lipson, “WDM-compatible mode-division multiplexing on a silicon chip,” Nat. Commun. 5, 1–7 (2014).
66. V. Van, Optical Microring Resonators: Theory, Techniques, and Applications (CRC Press, 2016).