Optica Publishing Group

Multiplexable all-optical nonlinear activator for optical computing

Open Access

Abstract

As an alternative to electronic neural networks, optical neural networks (ONNs) offer significant advantages in energy consumption and computing speed. Although optical hardware platforms could realize neural network algorithms more efficiently than traditional hardware, the lack of optical nonlinearity limits the development of ONNs. Here, we propose and experimentally demonstrate an all-optical nonlinear activator based on stimulated Brillouin scattering (SBS). Utilizing the exceptional carrier dynamics of SBS, our activator supports two types of nonlinear functions: saturable absorption and rectified linear unit (ReLU) models. Moreover, the proposed activator exhibits a large dynamic response bandwidth (∼11.24 GHz), a low nonlinear threshold (∼2.29 mW), high stability, and wavelength-division-multiplexing capability. These features are potentially advantageous for the physical realization of optical nonlinearities. As a proof of concept, we verify the performance of the proposed activator as an ONN nonlinear mapping unit via numerical simulations. The simulations show that our approach achieves performance comparable to that of the activation functions commonly used in computers. The proposed approach provides support for the realization of all-optical neural networks.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Progress in intelligent hardware has accelerated the growth of artificial neural networks (ANNs) [1–3]. Optical neural networks (ONNs) are considered promising candidates for next-generation high-performance hardware processors because of their inherent advantages over traditional electronic neural networks [4–6]. Although the potential of ONNs for linear operations has been validated, the lack of optical nonlinearity remains an open challenge [7–10]. In fact, linear optical matrix operations alone cannot satisfy the computational requirements of real physical problems [7–10]. Therefore, efficient optical nonlinear hardware needs to be developed to support the realization of all-optical neural networks (AONNs).

Several schemes have been proposed to implement physical optical nonlinear activation functions (NAFs). For example, Zuo et al. utilized the light-induced quantum interference effect among atomic transitions to achieve nonlinear manipulation [7], an approach that is difficult to combine with existing computing chips. Yang et al. proposed an all-optical nonlinear activator based on a two-dimensional (2D) material [9]. However, the limited lifetime and weak nonlinearity of 2D materials restrict their practical application in ONNs. Nguyen et al. demonstrated a reprogrammable electro-optic NAF [11], which requires efficient optical-electrical-optical conversion and fails to meet the requirements of high-speed optical computing.

Here, we propose an all-optical nonlinear activator based on stimulated Brillouin scattering (SBS). It consists of simple passive components that can be directly combined with existing computing chips. The proposed activator utilizes the ultrafast carrier dynamics of SBS to achieve excellent dynamic and static transmission characteristics. As a proof of concept, two machine learning tasks are constructed to examine the capability of our activator as a nonlinear unit. The simulations show that the performance of the experimentally measured activation model is comparable to that of NAFs commonly used in electronic neural networks. Our work, combining machine learning with optics and physics, opens a new front in the ongoing effort to advance optical ANN theory and hardware.

2. Implementing optical nonlinear activator

Artificial neurons, the basic building units of ANNs, mainly perform linear and nonlinear operations [12–14], as shown in Fig. 1(a). Essentially, an ANN is a network structure composed of multiple neurons. Correspondingly, the connections between pairs of artificial neurons can be represented as matrix-vector operations [15–17]. Physically, photonic integrated circuits, with their interconnectivity and linearity advantages, offer a suitable hardware platform for realizing high-performance ANNs [9]. As shown in Fig. 1(b), each layer of an ONN is composed of an optical interference unit (OIU) that implements matrix multiplication and an optical nonlinear unit (ONU) that executes nonlinear activation [8]. Herein, the OIU consists of an array of Mach-Zehnder interferometers, in which beam splitters and phase shifters achieve unitary matrix transforms through interference between different paths of coherent input light [18–20]. The ONU consists of our fabricated nonlinear activator, which relies on the optical response of the device. In each layer, data propagate through a linear combination followed by the NAF to generate the output optical signal. The enlarged inset of Fig. 1(b) displays the structure of the SBS-based activator (SBSBA), which consists of a fixed-length single-mode fiber (SMF) and an optical circulator (OCI). This configuration significantly enhances the nonlinear optical properties via the interaction of light with acoustic waves in the medium [21–23].

Fig. 1. (a) Schematic diagram of the structure of the neuron [13]. (b) General ONN architecture consisting of optical interference and nonlinear units [8].

Our nonlinear activator is fabricated from simple passive components and is a typical three-port device with one input port and two output ports, as shown in Fig. 1(b). SBS is a representative optical nonlinear process that describes the interaction of pump, acoustic, and Stokes waves [21–23]. Under strong pumping, the electrostriction effect induces a periodic modulation of the refractive index of the medium, resulting in mutual gain of the acoustic and scattered waves. This positive feedback gives the scattered light an exponential gain, producing a backward Stokes wave whose frequency is shifted downward relative to the pump light [21–23]. This energy conversion process makes the physical realization of optical nonlinearity possible. Note that the fiber geometry confines the scattered light to the forward and backward directions, corresponding to forward SBS (FSBS) and backward SBS (BSBS), respectively [23–25]. However, the Stokes wave mainly propagates in the backward direction because forward Brillouin scattering is weak in optical fibers [23–25].

Based on coupled-mode theory, the coupled steady-state equations of Brillouin scattering under continuous-wave pumping can be written as [24–26],

$$\left\{ \begin{array}{l} \frac{dI_{FSBS}}{dI_{in}} = -\alpha I_{FSBS} - \frac{g_B I_{FSBS} I_{BSBS}}{A_{eff}},\\ \frac{dI_{BSBS}}{dI_{in}} = \alpha I_{BSBS} - \frac{g_B I_{FSBS} I_{BSBS}}{A_{eff}}, \end{array} \right. \tag{1}$$
where ${\textrm{I}_{\textrm{FSBS}}}$ and ${\textrm{I}_{\textrm{BSBS}}}$ represent the intensities of the pump and Stokes waves in the fiber, ${\textrm{I}_{\textrm{in}}}$ is the input intensity, ${\textrm{g}_\textrm{B}}$ and $\mathrm{\alpha }$ denote the gain and attenuation coefficients, and ${\textrm{A}_{\textrm{eff}}}$ is the effective cross-sectional area of the fiber. Note that Eq. (1) has no exact analytical solution when fiber and pump losses are considered [24–26]. Therefore, the nonlinear properties of SBS are described numerically.

Theoretically, the nonlinear effects in optical fibers depend on the third-order susceptibility [23]. For low-intensity injection, spontaneous Brillouin scattering dominates and the corresponding stimulated scattering is minimal [23]. Once the injected power exceeds a certain threshold, SBS is generated in proportion to the injected energy [23]. Therefore, the backward-monitored output power as a function of the launched power ${\textrm{I}_{\textrm{in}}}$ can be expressed as [23–26],

$$I_{BSBS} = \left\{ \begin{array}{ll} 0, & I_{in} \le u,\\ S \times I_{in}, & I_{in} > u, \end{array} \right. \tag{2}$$
where $\textrm{S}$ and $\textrm{u}$ are the slope and the translation (threshold) obtained from the experimental measurements, respectively.

Furthermore, since energy and momentum must be conserved during the scattering process, FSBS exhibits transmission behavior opposite to that of BSBS [23]. Through SBS, the pump energy is transferred to the BSBS wave, resulting in a saturated FSBS output [23]. Thus, the relationship between ${\textrm{I}_{\textrm{in}}}$ and the forward output power can be expressed as [23–26],

$$I_{FSBS} = 1 - As \times \exp \left( -\frac{I_{in}}{Is} \right) - ANs, \tag{3}$$
where $\textrm{As}$ and $\textrm{ANs}$ represent the saturable and non-saturable absorption, respectively. $\textrm{Is}$ indicates the saturation intensity, defined as the intensity required in a steady state to reduce the absorption to half of its unbleached value [27].

For the proposed activator, different transmission properties can be observed in distinct directions. Finally, the nonlinear model of our activator can be described by Eqs. (2) and (3).
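The two transfer models of Eqs. (2) and (3) can be sketched numerically as follows. This is a minimal illustration rather than the authors' fitting code: only the 2.29-mW threshold is taken from the measurements reported in this paper, while the slope `slope` and the parameters `a_s`, `a_ns`, and `i_sat` are hypothetical placeholder values chosen for demonstration.

```python
import numpy as np

def bsbs_activation(i_in, slope=0.44, threshold=2.29):
    """Backward (BSBS) model of Eq. (2): 0 up to the threshold u, then S * I_in.

    threshold is in mW; slope is an assumed, illustrative value.
    """
    i_in = np.asarray(i_in, dtype=float)
    return np.where(i_in > threshold, slope * i_in, 0.0)

def fsbs_activation(i_in, a_s=0.8, a_ns=0.1, i_sat=1.3):
    """Forward (FSBS) model of Eq. (3): 1 - As * exp(-I_in / Is) - ANs.

    a_s, a_ns and i_sat are assumed, illustrative values.
    """
    i_in = np.asarray(i_in, dtype=float)
    return 1.0 - a_s * np.exp(-i_in / i_sat) - a_ns

# Evaluate both models over a few input powers (mW).
powers_mw = np.array([0.0, 1.0, 2.29, 5.0, 11.42])
print("BSBS:", bsbs_activation(powers_mw))
print("FSBS:", fsbs_activation(powers_mw))
```

Note that Eq. (2), implemented as written, jumps from 0 to S × u at the threshold, whereas the forward model rises smoothly and saturates at 1 − ANs.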

3. Experiments and discussions

3.1 Dynamic characteristic of the SBSBA

A simple experimental setup is established to analyze the dynamic capability of our nonlinear activator, as shown in Fig. 2. Continuous-wave (CW) light with a power of 15.86 dBm and a wavelength of 1553.32 nm is emitted by a narrow-linewidth laser (NLL) and coupled into a 40-GHz intensity modulator (IM) through a polarization controller (PC). An arbitrary waveform generator (AWG) generates the desired baseband signal. The modulated signal is then launched into the SBSBA, and out-of-band noise is filtered by a tunable optical filter (TOF). For the proposed SBSBA, a 20.5-km SMF is used as the nonlinear medium, and an OCI is applied to control the transmission direction. After passing through a 30-GHz photodetector (PD), the demodulated signal is monitored using an oscilloscope (OSC) and an electrical spectrum analyzer (ESA).
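Because the launch power is quoted in dBm while the thresholds discussed later are given in mW, a short unit conversion may help when comparing the two. The helper below is a generic sketch (the function name is ours, not part of the experiment); it uses the standard definition 0 dBm = 1 mW and gives roughly 38.5 mW for the 15.86-dBm source.

```python
def dbm_to_mw(power_dbm):
    """Convert optical power from dBm to milliwatts (0 dBm = 1 mW)."""
    return 10.0 ** (power_dbm / 10.0)

# Laser output power quoted in the text.
launch_mw = dbm_to_mw(15.86)
print(f"{launch_mw:.1f} mW")
```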

Fig. 2. (a) Schematic diagram of the dynamic experiment. Points a, b, c, and d represent the corresponding monitoring points. Herein, the two directions (i.e., forward and backward) correspond to the FSBS and BSBS processes, respectively. (b) Picture of the experimental setup.

To analyze SBS, the output of the AWG is first disconnected. We measure the spectrum of the BSBS at point c, as shown in Fig. 3(a). The Stokes wave is shifted by ∼0.0899 nm toward longer wavelength (i.e., downward in frequency) with respect to the pump source, corresponding to a Brillouin frequency shift (${\mathrm{\nu }_\textrm{B}}$) of about 11.24 GHz, as shown in Fig. 3(b). For ease of observation, the output signal is normalized. Considering the transfer characteristics of SBS, it is reasonable to suspect that the bandwidth of the transmitted signal is limited by ${\mathrm{\nu }_\textrm{B}}$. The output features of the FSBS could be measured similarly, but this is unnecessary because FSBS exhibits transmission capabilities similar to those of BSBS except for the difference in signal intensity. The SBS phenomenon after loading the baseband signal is also analyzed. Here, a 3.5-GHz signal generated by the AWG is modulated onto the optical carrier. It should be emphasized that, to obtain equivalent energy transfer efficiency, the optical intensities of the carrier and sidebands need to be kept at a similar level. Figure 3(c) illustrates the waveforms of the baseband and demodulated signals over three cycles. The signal is not distorted after transmission through the activator. Figure 3(d) shows the energy transfer process of the modulated signal. Under SBS, each sideband of the signal experiences a frequency shift of ${\mathrm{\nu }_\textrm{B}}$, which might affect the transmission bandwidth of our device.
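The link between the measured wavelength shift and the Brillouin frequency shift can be checked with the small-shift relation Δν ≈ cΔλ/λ². The sketch below applies it to the quoted values (1553.32 nm and 0.0899 nm); the function name is ours, and the result of roughly 11.2 GHz agrees with the ∼11.24 GHz read from the electrical spectrum to within the rounding of the quoted wavelengths.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def wavelength_shift_to_ghz(center_nm, shift_nm):
    """Frequency shift (GHz) for a small wavelength shift around center_nm.

    Uses the first-order relation delta_nu = c * delta_lambda / lambda**2.
    """
    center_m = center_nm * 1e-9
    shift_m = shift_nm * 1e-9
    return C * shift_m / center_m**2 / 1e9

delta_nu_ghz = wavelength_shift_to_ghz(1553.32, 0.0899)
print(f"Brillouin shift: {delta_nu_ghz:.2f} GHz")
```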

Fig. 3. BSBS transmission characteristics. (a) The output optical spectrum of BSBS. Herein, the wavelengths of the pump and scattered waves are 1553.32 nm and 1553.41 nm, respectively. (b) The frequency spectrum of the Brillouin frequency shift. (c) Time-domain waveforms of the baseband (point a) and demodulated (point d) signals. Herein, the signal period is 0.2857 ns, corresponding to a frequency of 3.5 GHz. (d) Spectra with (point c) and without (point b) the nonlinear activator. The modulated signal clearly undergoes SBS.

Further, the dynamic transmission bandwidth of the proposed activator is discussed. To do this, we measured the spectrum of the signal at different signal bandwidths, as shown in Fig. 4. In conjunction with the energy transfer properties of SBS (see Fig. 3(d)), it can be observed that the transmission bandwidth of the SBSBA depends on the modulation format. Specifically, for double-sideband (DSB) modulation, our device operates without distortion up to a repetition frequency of about 5.12 GHz, as shown in Fig. 4(a). For single-sideband (SSB) modulation, the signal is not distorted when the data bandwidth is lower than ∼11.24 GHz, as shown in Fig. 4(b). The transmission bandwidth of SSB is thus roughly twice that of DSB, which supports the previous inference. Furthermore, compared with the previously reported 100-kHz repetition frequency [28], the performance of our activator is improved by orders of magnitude. These results reveal that our activator supports dynamic data transmission.

Fig. 4. Output spectrum of demodulated signals under different modulation formats. (a) DSB-based demodulated signal spectrum. (b) SSB-based demodulated signal spectrum. The signal does not generate new frequency components within the transmission bandwidth range.

3.2 Static characteristic of the SBSBA

In the following, we analyze the static characteristics of the fabricated activator. An experimental scheme based on wavelength division multiplexing is constructed to measure the nonlinear output of our device, as shown in Fig. 5. The CW light from two NLLs (NLL1 and NLL2) is transmitted via variable optical attenuators (VOA1 and VOA2) to a 50:50 optical coupler (OCO), and the coupler output is used as the pump for our device. The emission wavelengths of NLL1 and NLL2 are 1530.02 and 1553.32 nm, respectively. The VOAs are used to tune the optical power injected into the OCO. Meanwhile, an optical isolator (ISO) is placed between the OCO and the activator to prevent the backward Stokes signal from entering the optical source and interfering with its single-mode operation. The proposed activator consists of an optical circulator (OCI) and a nonlinear medium. Here, the 20.5-km SMF is used as the nonlinear medium, while the OCI is used to route optical signals traveling in opposite directions. Optical signals at different wavelengths are separated by a wavelength division multiplexer (WDM) and sent to an optical power meter (PM) for measurement. In this way, the different nonlinear responses of the activator can be obtained.

Fig. 5. (a) Schematic diagram of experimental measurement of WDM-based nonlinear output. Here, Mi (i = 1, 2, 3, 4) represents the nonlinear model for different directions and different wavelength conditions. (b) Picture of the experimental setup.

Figure 6 illustrates the measured nonlinear output under the WDM structure. Figures 6(a) and (b) show the backward Stokes optical power versus pump optical power. At low pump powers, the backscattered power is dominated by spontaneous scattering, resulting in a low output power of ∼0.00 mW. Once the incident optical power exceeds 2.29 mW, the spontaneous scattering is transformed into stimulated scattering, so that the output power increases approximately linearly with the pump power. Figures 6(c) and (d) describe the output characteristics of the forward-transmitted optical power. Under the action of SBS, the output power rises sharply at a certain threshold intensity (∼1.30 mW) and remains almost constant at higher power. Furthermore, no crosstalk is observed in the output of our device under multiwavelength conditions. Thus, WDM-based nonlinear activator structures can be constructed to explore more flexible applications.

Fig. 6. WDM-based nonlinear output. (a) and (c) represent the relationship between backscattered and forward-transmitted power and pump power at a wavelength of 1530.02 nm. Here, the pump signal at 1553.32 nm is fixed at different injection powers. (b) and (d) show the backward and forward transmission characteristics at 1553.32 nm, with the 1530.02-nm pump signal fixed at different powers.

To assess the stability of our device, the average of multiple measurements is used as the experimental data. Figure 7 shows the measured output characteristics at 1553.32 nm, indicating good agreement between theory and experiment. It can be seen that: (I) Under the action of SBS, the pump power is transferred to the backward Stokes wave and the forward wave saturates. In this process, a larger share of the pump energy is transferred to the backward Stokes wave than to the forward wave. For example, at an input power of 11.42 mW, the energy transfer efficiencies (ETE) of BSBS and FSBS are 43.82% and 13.31%, respectively. (II) Our activator exhibits strong robustness and high stability owing to the minimal perturbation. The fine perturbation is shown in Fig. 7(a), where the standard deviation (SD) is $2.9 \times 10^{-3}$. (III) The power-dependent nonlinear mapping model depends on the transmission direction. The transmission behavior of the BSBS can be described by a rectified linear function (see Fig. 7(a)) [29], while the FSBS corresponds to typical saturable absorption (see Fig. 7(b)) [30].
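The quoted efficiencies can be turned back into output powers for a quick consistency check. The sketch below assumes that ETE is defined as the ratio of output power to input pump power; that definition is our assumption, since the text does not state it explicitly.

```python
def output_power_mw(input_mw, ete_percent):
    """Output power implied by an energy transfer efficiency.

    Assumes ETE = output power / input power (an assumption, see lead-in).
    """
    return input_mw * ete_percent / 100.0

p_in_mw = 11.42                                   # input power quoted in the text
p_bsbs_mw = output_power_mw(p_in_mw, 43.82)       # backward Stokes output
p_fsbs_mw = output_power_mw(p_in_mw, 13.31)       # forward output
remaining_percent = 100.0 - 43.82 - 13.31         # share not reaching either port

print(f"BSBS: {p_bsbs_mw:.2f} mW, FSBS: {p_fsbs_mw:.2f} mW, "
      f"other: {remaining_percent:.2f} %")
```

Under this assumption the backward port carries about 5.0 mW and the forward port about 1.5 mW at 11.42 mW input.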

Fig. 7. Realization of optical nonlinear activation models. (a) The nonlinear mapping of the backward Stokes wave at 1553.32 nm. Herein, the blue curve and red dots are the Eq. (2)-based theoretical analysis and the experimental verification, respectively. (b) The forward transmission characteristics at 1553.32 nm. Here, the fitted data (blue line) are simulated according to Eq. (3). Furthermore, the local zoomed views in Fig. 7 show the fine perturbation at an input power of 11.42 mW, where the corresponding SDs of (a) and (b) are $2.92 \times 10^{-3}$ and $3.37 \times 10^{-3}$, respectively. The small perturbations reveal the stability of our device.

More details are summarized in Tables 1 and 2. The experimental results show that: (I) For the various wavelengths, the transmission characteristics of FSBS and BSBS are similar except for the disparity in signal strength. (II) The nonlinear threshold and the energy transfer efficiency increase with the injected wavelength, which provides guidance for pump source selection. (III) The SDs of the various models are on the order of $10^{-3}$, indicating the stability of our activator. (IV) For the WDM structure, the pump power increases after the signal powers at different wavelengths are superimposed, and the nonlinear threshold at the corresponding wavelengths is greatly reduced. Specifically, SBS is excited when the pump power exceeds 2.29 mW (see Fig. 7(a)), which is smaller than the ∼5 mW reported in a previous study [21]. These results demonstrate the advantages of our device.

Table 1. Backward transmission characteristic parameters

Table 2. Forward transmission characteristic parameters

So far, the optical characteristics of the prepared SBSBA, including its dynamic and static properties, have been validated. Our activator exhibits outstanding nonlinear output characteristics, high stability, WDM behavior, wide dynamic range, compatibility, and dual-port output. Furthermore, SBS also offers ultrafast dynamics, occurring on nanosecond timescales [24]. These features make our device a promising candidate for achieving nonlinear manipulation. Of course, there is still room for improvement in our device, mainly in the following aspects. (I) Large transmission loss. In this work, the nonlinear medium used by the SBSBA is a 20.5-km SMF, which implies a loss of approximately 6 dB. In fact, a medium with a higher dispersion coefficient (e.g., photonic crystal fibers [31] or highly nonlinear fibers [23]) can be used instead of SMF, thus achieving the desired nonlinear operation over a shorter distance, and a shorter length means a lower loss. (II) Intensity limitation. For physical systems, the output power is inevitably limited after passing through the nonlinear activator [24]. Therefore, a trade-off between transmission efficiency and nonlinear mapping is unavoidable. Interestingly, our nonlinear activator exhibits a complementary advantage. As shown in Fig. 7, the intensity of the FSBS is constrained to 1.56 mW at saturation, while the BSBS shows a linearly increasing tendency. This feature supports the flexible application of the SBSBA. (III) Integrability. Although the fabricated SBSBA can be directly connected to existing computing chips, the integration of discrete devices remains limited. Thus, it is necessary to explore an integrable solution for the SBSBA to further improve device performance. In fact, the induction and inhibition of SBS in chip-scale devices have been demonstrated [32–34]. As a result, an on-chip SBSBA can be implemented using similar techniques [32–34]. Regrettably, owing to the limits of our current experimental conditions, this work must be carried out in the next stage. In the following, we explore the application of the proposed activation model to ONNs.

4. Case study

4.1 Hard parameter sharing

In this section, representative multi-class classification tasks are introduced to measure the performance of the experiment-based nonlinear activation models. Herein, two different datasets, the MNIST handwritten digit classification and the more complex Fashion-MNIST classification [6], are employed to benchmark machine learning algorithms. Each consists of a training set of 60,000 examples and a test set of 10,000 examples [6]. Furthermore, each example is a 28 × 28 grayscale image associated with a label from 10 classes [6]. We then discuss how ONNs with ONUs compare with state-of-the-art ANNs. To do this, simple fully connected ANNs with a single 784-neuron hidden layer are constructed to execute the classification tasks, as shown in Fig. 8(a). During the training process, stochastic gradient descent and backpropagation are used to train the learnable parameters for optimal performance [35]. Meanwhile, the mean squared error is used as the loss function to quantify the divergence between the predicted and the true outputs [35].
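The training setup described above (one 784-neuron hidden layer, mean squared error, gradient descent with backpropagation) can be sketched in NumPy as follows. This is an illustrative sketch under several assumptions that are not taken from the paper: the weight initialization, the learning rate, the absence of bias terms and of an output activation, a plain ReLU as a stand-in for the measured NAF, and a synthetic random batch in place of MNIST data.

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HIDDEN, N_OUT = 784, 784, 10  # 28x28 inputs, one hidden layer, 10 classes

def init_params():
    # Small Gaussian initial weights (an assumption; the text does not specify).
    return {
        "w1": rng.normal(0.0, 0.01, size=(N_IN, N_HIDDEN)),
        "w2": rng.normal(0.0, 0.01, size=(N_HIDDEN, N_OUT)),
    }

def naf(x):
    # Stand-in activation: plain ReLU, not the measured device response.
    return np.maximum(x, 0.0)

def naf_grad(x):
    # Derivative of the stand-in ReLU with respect to its input.
    return (x > 0.0).astype(x.dtype)

def forward(params, x):
    z1 = x @ params["w1"]          # linear (matrix) operation
    a1 = naf(z1)                   # nonlinear activation
    return z1, a1, a1 @ params["w2"]

def mse(y_hat, y):
    return float(np.mean((y_hat - y) ** 2))

def sgd_step(params, x, y, lr=0.1):
    """One gradient-descent update on a mini-batch; returns the pre-update loss."""
    z1, a1, y_hat = forward(params, x)
    grad_y = 2.0 * (y_hat - y) / y.size              # d(MSE)/d(y_hat)
    grad_w2 = a1.T @ grad_y
    grad_z1 = (grad_y @ params["w2"].T) * naf_grad(z1)
    grad_w1 = x.T @ grad_z1
    params["w1"] -= lr * grad_w1
    params["w2"] -= lr * grad_w2
    return mse(y_hat, y)

# Synthetic stand-in batch: random "images" with random one-hot labels.
params = init_params()
x_batch = rng.random((32, N_IN))
y_batch = np.eye(N_OUT)[rng.integers(0, N_OUT, size=32)]
losses = [sgd_step(params, x_batch, y_batch) for _ in range(5)]
print(losses)
```

To evaluate a device-derived activation instead, `naf` and `naf_grad` would be replaced by a fitted model and its derivative.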

Fig. 8. (a) Fully connected network architecture for classification tasks. Here, the network has a single hidden layer of 784 neurons. (b) Parameter-sharing-based structure. x and y represent the input data and output label of task 1. m and n denote the input data and output label of task 2. w1 and w2 indicate the weight matrices corresponding to NAF1 and NAF2. h and z denote the preprocessing parameters and their corresponding results.

To take full advantage of our activator, WDM-based ANNs are created, as shown in Fig. 8(b). Each neuron receives a series of input data x, computes their weighted sum using a weight matrix w, and outputs the result obtained by applying a NAF [29]. Finally, the learnable parameter w is trained through an optimization algorithm. To realize parameter sharing, we do the following: (I) For the fixed network architecture (see Fig. 8(a)), NAF1 and NAF2 are selected as NAFs to train on the MNIST and Fashion-MNIST data to obtain the corresponding w1 and w2. It should be emphasized that, for distinct NAFs, the training processes are independent of each other. (II) From the parameters w1 and w2, the preprocessing parameters h can be deduced. Then, we perform preprocessing operations on the input data m to obtain new data z, as shown in Fig. 8(b). (III) After the above steps, x and z can be modulated onto different wavelengths and delivered through the ONNs to obtain the corresponding outputs. In this process, two different results can be obtained simultaneously under the fixed parameter w1, thus enabling parameter sharing, as shown in Fig. 8(b). Note that all operations in the above process act on matrices.

Figure 9 compares the simulated performance of the actual networks (experiment-based NAFs) and the benchmark networks (existing NAFs). More details are summarized in Table 3. It can be seen that: (I) The performance of ANNs is significantly improved by the NAF. For example, the M1-based classifier achieves an accuracy of 97.43%, far exceeding the 86.41% of the linear system, as shown in Fig. 9(a). (II) The performance of our activator is comparable to that of classical activation functions. Specifically, for the MNIST dataset, the recognition accuracies of M1, M3, Sigmoid, and ReLU are 97.43%, 97.42%, 97.78%, and 97.48%, respectively, as shown in Fig. 9(a). (III) For the WDM-based nonlinear model, the convergence speed at the same output port is almost the same during the training process. For example, after 20 iterations, M1 and M3 reach optimal performance, as shown in Fig. 9(a). These results illustrate the feasibility of our device as an ONU.

Fig. 9. Learning curves for different NAFs. (a) MNIST and (b) Fashion-MNIST dataset. For comparison, the classification performance of the existing activation functions is also validated.

Table 3. Classification accuracy under different NAFs

4.2 Siamese neural networks

Similarity is an important concept in computer science, and Siamese neural networks (SNNs) show great potential for discovering similarities between two comparable things [36–38]. An SNN is a special architecture that contains two identically configured subnetworks [39–41]. Since each subnetwork provides an identical mapping of inputs to latent features, the output of an SNN is a measure of the difference between two latent features [39–41]. Inspired by this, a fully connected SNN is built to measure the similarity of two inputs. Each subnetwork is constructed from a simple feedforward neural network structure with two hidden layers, as shown in Fig. 10. Meanwhile, the Olivetti Research Ltd. Database of Faces, created by AT&T Laboratories Cambridge, is employed as the training and testing sets [42]. This dataset contains 400 images of 40 distinct subjects. Each image has 92 × 112 pixels, with 256 grey levels per pixel [42]. After processing, 37 subjects are randomly selected as training data and the remaining 3 subjects as test data.

Fig. 10. Configuration of the SNNs. Each subnetwork has three fully connected layers containing 1024, 256, and 5 neurons, respectively. Both subnetworks have the same construction and parameters. Herein, the outcome of the SNNs is the Euclidean distance (ED), which is used to characterize the differences between latent features.

Our goal is to measure the degree of similarity between two inputs. Thus, the input data are preprocessed before being encoded into the network. In this process, we first resize the input image to 100 × 100 pixels. Then, the corresponding similarity labels and input pairs are generated based on whether the two images belong to the same category. Although error backpropagation is also used in the training process, the loss function of SNNs is different from that of traditional ANNs. Normally, SNNs use the contrastive loss function [43], which is defined as $\mathrm{Loss} = \frac{1}{2N}\sum \left[ y D^2 + (1 - y)\max (m - D,\;0)^2 \right]$, where y denotes the similarity label, N represents the number of samples, and D and m are the ED and the expected value (margin), respectively [43]. This loss compares the outputs of the two subnetworks via a distance metric. After training, the distances for similar inputs should be as small as possible and the distances for different categories as large as possible.
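The Siamese forward pass and the contrastive loss can be sketched together in NumPy. The layer sizes (1024, 256, 5), the 100 × 100 input size, the Euclidean distance, and the loss form follow the description in this section; everything else is an assumption for illustration: the random weights, the plain ReLU used as a stand-in NAF on the hidden layers only, the margin value of 1.0, and the random test images.

```python
import numpy as np

rng = np.random.default_rng(0)
LAYER_SIZES = (100 * 100, 1024, 256, 5)  # flattened 100x100 image, then 1024-256-5

def init_subnetwork(sizes=LAYER_SIZES):
    # Random weights in float32 to keep memory modest (initialization is assumed).
    return [0.01 * rng.standard_normal((m, n), dtype=np.float32)
            for m, n in zip(sizes[:-1], sizes[1:])]

def embed(x, weights):
    """Map a batch of flattened images to 5-dimensional latent features."""
    h = x
    for w in weights[:-1]:
        h = np.maximum(h @ w, 0.0)  # stand-in NAF (ReLU) on hidden layers only
    return h @ weights[-1]

def euclidean_distance(a, b):
    """Row-wise Euclidean distance (ED) between two batches of features."""
    return np.linalg.norm(a - b, axis=1)

def contrastive_loss(d, y, margin=1.0):
    """(1 / 2N) * sum(y * D^2 + (1 - y) * max(m - D, 0)^2); y = 1 for similar pairs."""
    d = np.asarray(d, dtype=float)
    y = np.asarray(y, dtype=float)
    per_pair = y * d**2 + (1.0 - y) * np.maximum(margin - d, 0.0) ** 2
    return float(per_pair.sum() / (2 * d.size))

# Both branches share one set of weights, as in a Siamese network.
weights = init_subnetwork()
img_a = rng.random((2, LAYER_SIZES[0]), dtype=np.float32)
img_b = rng.random((2, LAYER_SIZES[0]), dtype=np.float32)

d_same = euclidean_distance(embed(img_a, weights), embed(img_a, weights))
d_diff = euclidean_distance(embed(img_a, weights), embed(img_b, weights))
print(d_same, d_diff)
print(contrastive_loss(d_diff, y=[0, 0]))
```

Because the two branches share weights, identical inputs always give an ED of exactly zero, regardless of the activation function used.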

Table 4 summarizes the similarity results of the SNNs under different nonlinear models. It can be observed that: (I) Compared with linear systems, SNNs show better recognition performance after nonlinearity is introduced. Specifically, for inputs from different categories, the ED becomes larger after nonlinearity is introduced, indicating a larger difference between the input pair. (II) For identical input pairs, the ED is equal to zero. (III) For similar input images, the NAFs ranked by ED in ascending order are M3, Linear, M1, M4, M2, Sigmoid, and ReLU. (IV) For different input images, M3 and M1 obtain the largest ED in Case-3 and Case-4, respectively. Ideally, the output should be small for similar inputs and large for different inputs. Therefore, the above results illustrate that our nonlinear model exhibits better performance.

Table 4. Similarity under different NAFs

Two different tasks have been constructed to demonstrate the performance of our model. The results indicate that the proposed scheme is effective and can meet the needs of practical applications. Likewise, wider applications based on the proposed framework can be explored. It should be emphasized that the performance of ONNs can be further improved by optimizing the network structure and algorithm. Since the main objective of this work is to explore the physical realization of optical nonlinearity, this related work will be carried out in the next stage.

5. Conclusion

This work presents an effective and simple method to realize all-optical nonlinear activation. The key to our proposal is to exploit the optical characteristics of the nonlinear medium to perform the desired transformations, meaning that no additional energy is required in this process. We exploited the ultrafast dynamics of SBS, resulting in a dynamic response bandwidth of 11.24 GHz, a minimum threshold power of 2.29 mW, and two different NAFs (i.e., ReLU and saturable absorption models). Additionally, our activator demonstrates WDM and dual-port transmission properties, which enhance the flexibility of our device. As a proof of concept, ANNs with different frameworks are employed for classification and face recognition tasks to illustrate their capabilities and feasibility. The simulations show that our method can achieve performance comparable to that of classical nonlinear activation functions, even in deep networks with complex structures. Our approach provides strong support for the realization of true all-optical neural networks.

Funding

National Key Research and Development Program of China (2021YFA1401100); Innovation Group Project of Sichuan Province (20CXTD0090).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. F. Böhm, D. Alonso-Urquijo, G. Verschaffelt, et al., “Noise-injected analog Ising machines enable ultrafast statistical sampling and machine learning,” Nat. Commun. 13(1), 5847–5859 (2022).

2. L. Mennel, J. Symonowicz, S. Wachter, et al., “Ultrafast machine vision with 2D material neural network image sensors,” Nature 579(7797), 62–66 (2020).

3. G. Wetzstein, A. Ozcan, S. Gigan, et al., “Inference in artificial intelligence with deep optics and photonics,” Nature 588(7836), 39–47 (2020).

4. X. Xu, M. Tan, B. Corcoran, et al., “11 TOPS photonic convolutional accelerator for optical neural networks,” Nature 589(7840), 44–51 (2021).

5. K. Liao, Y. Chen, Z. Yu, et al., “All-optical computing based on convolutional neural networks,” Opto-Electron. Adv. 4(11), 200060 (2021).

6. X. Guo, T. D. Barrett, Z. M. Wang, et al., “Backpropagation through nonlinear units for the all-optical training of neural networks,” Photonics Res. 9(3), B71 (2021).

7. Y. Zuo, B. Li, Y. Zhao, et al., “All-optical neural network with nonlinear activation functions,” Optica 6(9), 1132–1137 (2019).

8. Y. Shen, N. C. Harris, S. Skirlo, et al., “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11(7), 441–446 (2017).

9. Z. Yang, W. Tan, T. Zhang, et al., “MXene-based broadband ultrafast nonlinear activator for optical computing,” Adv. Opt. Mater. 10(1), 2200714 (2022).

10. L. G. Wright, T. Onodera, M. M. Stein, et al., “Deep physical neural networks trained with backpropagation,” Nature 601(7894), 549–555 (2022).

11. M. M. Pour Fard, I. A. D. Williamson, M. Edwards, et al., “Experimental realization of arbitrary activation functions for optical neural networks,” Opt. Express 28(8), 12138–12148 (2020).

12. G. Mourgias-Alexandris, A. Tsakyridis, N. Passalis, et al., “An all-optical neuron with Sigmoid activation function,” Opt. Express 27(7), 9620–9630 (2019).

13. K. Roy, A. Jaiswal, and P. Panda, “Towards spike-based machine intelligence with neuromorphic computing,” Nature 575(7784), 607–617 (2019).

14. J. Jiang, M. Chen, and J. A. Fan, “Deep neural networks for the evaluation and design of photonic devices,” Nat. Rev. Mater. 6(8), 679–700 (2020).

15. B. J. Shastri, A. N. Tait, T. F. d. Lima, et al., “Photonics for artificial intelligence and neuromorphic computing,” Nat. Photonics 15(2), 102–114 (2021).

16. H. Zhou, J. Dong, J. Cheng, et al., “Photonic matrix multiplication lights up photonic accelerator and beyond,” Light: Sci. Appl. 11(1), 30–50 (2022).

17. Q. Zhang, H. Yu, M. Barbiero, et al., “Artificial neural networks enabled by nanophotonics,” Light: Sci. Appl. 8(1), 42–55 (2019).

18. M. Reck, A. Zeilinger, H. J. Bernstein, et al., “Experimental realization of any discrete unitary operator,” Phys. Rev. Lett. 73(1), 58–61 (1994).

19. H. Zhang, M. Gu, X. D. Jiang, et al., “An optical neural chip for implementing complex-valued neural network,” Nat. Commun. 12(1), 457–467 (2021).

20. T. W. Hughes, M. Minkov, Y. Shi, et al., “Training of photonic neural networks through in situ backpropagation and gradient measurement,” Optica 5(7), 864–871 (2018).

21. D. Cotter, “Stimulated Brillouin scattering in monomode optical fiber,” J. Opt. Commun. 4(1), 10–19 (1983).

22. A. L. Gaeta and R. W. Boyd, “Stochastic dynamics of stimulated Brillouin scattering in an optical fiber,” Phys. Rev. A 44(5), 3205–3209 (1991).

23. A. Kobyakov, M. Sauer, and D. Chowdhury, “Stimulated Brillouin scattering in optical fibers,” Adv. Opt. Photonics 2(1), 1–59 (2010).

24. B. J. Eggleton, C. G. Poulton, and R. Pant, “Inducing and harnessing stimulated Brillouin scattering in photonic integrated circuits,” Adv. Opt. Photonics 5(4), 536–587 (2013).

25. M. O. van Deventer and A. J. Boot, “Polarization properties of stimulated Brillouin scattering in single-mode fibers,” J. Lightwave Technol. 12(4), 585–590 (1994).

26. L. Chen and X. Bao, “Analytical and numerical solutions for steady state stimulated Brillouin scattering in a single-mode fiber,” Opt. Commun. 152(1-3), 65–70 (1998).

27. Q. Bao, H. Zhang, Z. Ni, et al., “Monolayer graphene as a saturable absorber in a mode-locked laser,” Nano Res. 4(3), 297–307 (2011).

28. B. Wu, H. Li, W. Tong, et al., “Low-threshold all-optical nonlinear activation function based on a Ge/Si hybrid structure in a microring resonator,” Opt. Mater. Express 12(3), 970–980 (2022).

29. F. Vernuccio, A. Bresci, V. Cimini, et al., “Artificial intelligence in classical and quantum photonics,” Laser Photonics Rev. 16(5), 2100399 (2022).

30. T. Tan, X. Jiang, C. Wang, et al., “2D material optoelectronics for information functional device applications: status and challenges,” Adv. Sci. 7(11), 2000058 (2020).

31. P. Dainese, P. S. J. Russell, N. Joly, et al., “Stimulated Brillouin scattering from multi-GHz-guided acoustic phonons in nanostructured photonic crystal fibres,” Nat. Phys. 2(6), 388–392 (2006).

32. M. Merklein, I. V. Kabakova, T. F. S. Büttner, et al., “Enhancing and inhibiting stimulated Brillouin scattering in photonic integrated circuits,” Nat. Commun. 6(1), 6396 (2015).

33. R. Botter, K. Ye, Y. Klaver, et al., “Guided-acoustic stimulated Brillouin in silicon nitride photonic circuits,” Sci. Adv. 8(40), 2196–2202 (2022).

34. H. Shin, W. Qiu, R. Jarecki, et al., “Tailorable stimulated Brillouin scattering in nanoscale silicon waveguides,” Nat. Commun. 4(1), 1944 (2013).

35. W. Ma, Z. Liu, Z. A. Kudyshev, et al., “Deep learning for the design of photonic structures,” Nat. Photonics 15(2), 77–90 (2021).

36. P. Shao, T. Liu, F. Che, et al., “Adaptive pseudo-Siamese policy network for temporal knowledge prediction,” Neural Netw. 160(1), 192–201 (2023).

37. S. Pan, C. Zhu, X. M. Zhao, et al., “A deep Siamese neural network improves metagenome-assembled genomes in microbiome datasets across different environments,” Nat. Commun. 13(1), 2326 (2022).

38. Z. Ma, B. Wang, L. Huang, et al., “Dimension-expanded-based matching method with Siamese convolutional neural networks for gravity-aided navigation,” IEEE Trans. Ind. Electron. 70(10), 10496–10505 (2023).

39. Y. Qiao, Y. Wu, F. Duo, et al., “Siamese neural networks for user identity linkage through web browsing,” IEEE Trans. Neural Netw. Learn. Syst. 31(8), 2741–2751 (2020).

40. R. Chiplunkar and B. Huang, “Siamese neural network-based supervised slow feature extraction for soft sensor application,” IEEE Trans. Ind. Electron. 68(9), 8953–8962 (2021).

41. M. Byra, K. D. Sobczak, Z. Klimonda, et al., “Early prediction of response to neoadjuvant chemotherapy in breast cancer sonography using Siamese convolutional neural networks,” IEEE J. Biomed. Health Inform. 25(3), 797–805 (2021).

42. F. S. Samaria and A. C. Harter, “Parameterisation of a stochastic model for human face identification,” Proceedings of IEEE Workshop on Applications of Computer Vision, 138–142 (1994).

43. S. Chopra, R. Hadsell, and Y. LeCun, “Learning a similarity metric discriminatively, with application to face verification,” Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 539–546 (2005).

Figures (10)

Fig. 1. (a) Schematic diagram of the structure of a neuron [13]. (b) General ONN construction, consisting of optical interference and nonlinear units [8].
Fig. 2. (a) Schematic diagram of the dynamic experiment. Points a, b, c, and d are the monitoring points. The forward and backward directions correspond to the FSBS and BSBS processes, respectively. (b) Picture of the experimental setup.
Fig. 3. BSBS transmission characteristics. (a) Output optical spectrum of BSBS. The wavelengths of the pump and scattered waves are 1553.32 nm and 1553.41 nm, respectively. (b) Frequency spectrum of the Brillouin frequency shift. (c) Time-domain waveforms of the baseband (point a) and demodulated (point d) signals. The signal period is 0.2857 ns, corresponding to a frequency of 3.5 GHz. (d) Spectra with (point c) and without (point b) the nonlinear activator, showing that the modulated signal undergoes the SBS effect.
Fig. 4. Output spectra of demodulated signals under different modulation formats. (a) DSB-based demodulated signal spectrum. (b) SSB-based demodulated signal spectrum. The signal generates no new frequency components within the transmission bandwidth.
Fig. 5. (a) Schematic diagram of the experimental measurement of the WDM-based nonlinear output. Here, Mi (i = 1, 2, 3, 4) represents the nonlinear model for different directions and different wavelength conditions. (b) Picture of the experimental setup.
Fig. 6. WDM-based nonlinear output. (a) and (c) show the backscattered and forward-transmitted power versus pump power at a wavelength of 1530.02 nm, with the pump signal at 1553.32 nm fixed at different injection powers. (b) and (d) show the backward and forward transmission characteristics at 1553.32 nm, with the 1530.02-nm pump signal fixed at different powers.
Fig. 7. Realization of optical nonlinear activation models. (a) Nonlinear mapping of the backward Stokes wave at 1553.32 nm. The blue curve and red dots are the Eq. (2)-based theoretical analysis and the experimental verification, respectively. (b) Forward transmission characteristics at 1553.32 nm. The fitted data (blue line) are simulated according to Eq. (3). The zoomed insets show the fine perturbation at an input power of 11.42 mW, where the corresponding SDs of (a) and (b) are 2.92 × 10⁻³ and 3.37 × 10⁻³, respectively. The small perturbations reveal the stability of our device.
Fig. 8. (a) Fully connected network architecture for classification tasks. The network has a single hidden layer of 784 neurons. (b) Parameter-sharing-based structure. x and y represent the input data and output label of task 1; m and n denote the input data and output label of task 2. w1 and w2 indicate the weight matrices corresponding to NAF1 and NAF2. h and z denote the preprocessing parameters and their corresponding results.
Fig. 9. Learning curves for different NAFs. (a) MNIST and (b) Fashion-MNIST datasets. For comparison, the classification performance of existing activation functions is also shown.
Fig. 10. Configuration of the SNNs. Each subnetwork consists of three fully connected layers containing 1024, 256, and 5 neurons, respectively. Both subnetworks have the same construction and parameters. The output of the SNNs is the Euclidean distance (ED), which is used to characterize the differences between latent features.

Tables (4)

Table 1. Backward transmission characteristic parameters
Table 2. Forward transmission characteristic parameters
Table 3. Classification accuracy under different NAFs
Table 4. Similarity under different NAFs

Equations (3)


$$
\begin{cases}
\dfrac{dI_{FSBS}}{dI_{in}} = -\alpha I_{FSBS} - \dfrac{g_B I_{FSBS} I_{BSBS}}{A_{eff}}, \\[6pt]
\dfrac{dI_{BSBS}}{dI_{in}} = \alpha I_{BSBS} - \dfrac{g_B I_{FSBS} I_{BSBS}}{A_{eff}},
\end{cases} \tag{1}
$$
$$
I_{BSBS} =
\begin{cases}
0, & I_{in} \le u, \\
S \times I_{in}, & I_{in} > u,
\end{cases} \tag{2}
$$
$$
I_{FSBS} = 1 - A_s \times \exp\left(-\frac{I_{in}}{I_s}\right) - A_{Ns}, \tag{3}
$$
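
For readers who want to experiment with the two activation models numerically, the backward (Relu-type) and forward (saturable-absorption-type) relations listed above, Eqs. (2) and (3), can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' simulation code. It assumes the conventional reading of the expressions: a threshold comparison I_in ≤ u in Eq. (2), and minus signs in Eq. (3), i.e. 1 − A_s·exp(−I_in/I_s) − A_Ns. The function names and every parameter value except the 2.29 mW threshold are hypothetical placeholders, not fitted values from Tables 1 and 2.

```python
import math


def bsbs_relu(i_in, u, s):
    """Eq. (2)-style model: 0 for i_in <= u, s * i_in above the threshold u."""
    return s * i_in if i_in > u else 0.0


def fsbs_saturable(i_in, a_s, i_s, a_ns):
    """Eq. (3)-style model (assumed signs): 1 - a_s * exp(-i_in / i_s) - a_ns."""
    return 1.0 - a_s * math.exp(-i_in / i_s) - a_ns


# Hypothetical parameters for illustration only.
# U_MW reuses the ~2.29 mW threshold quoted in the text; the rest are made up.
U_MW, SLOPE = 2.29, 0.5
A_S, I_S_MW, A_NS = 0.6, 5.0, 0.1

powers_mw = [0.0, 1.0, 2.29, 5.0, 10.0]
backward = [bsbs_relu(p, U_MW, SLOPE) for p in powers_mw]
forward = [fsbs_saturable(p, A_S, I_S_MW, A_NS) for p in powers_mw]

for p, b, f in zip(powers_mw, backward, forward):
    print(f"I_in = {p:5.2f} mW  backward = {b:6.3f}  forward = {f:6.3f}")
```

With these placeholder values the backward output stays at zero up to the threshold and then grows linearly, while the forward output rises monotonically and saturates below 1 − A_Ns. Either function can be dropped into a network simulation as an element-wise activation.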
© Copyright 2024 | Optica Publishing Group. All rights reserved, including rights for text and data mining and training of artificial technologies or similar technologies.