Optica Publishing Group

Machine Learning assisted aggregation schemes for optical cross-connect in hybrid electrical/optical data center networks

Open Access

Abstract

Making optical circuit switching suitable for highly dynamic, rapidly changing traffic is a considerable challenge. This motivates the development of a hybrid electrical/optical network with high bandwidth and low latency. Compared with the traditional non-aggregation scheme, we propose two machine learning assisted aggregation schemes. The first redesigns the optical cross-connect switches to increase the throughput of the circuit-switched network: the optical cross-connect serves both delay-sensitive and delay-tolerant traffic flows, so network throughput rises rapidly while the number of optical switch ports remains unchanged. The second scheme adds a small number of ports, which maximizes throughput while relaxing the accuracy requirements on the machine learning algorithms. From a set of four machine learning algorithms, only the most suitable one is selected at a time, and it is deployed at the edge nodes instead of a central network management system, which simultaneously reduces network overhead and latency. Both aggregation schemes outperform the traditional non-aggregation scheme in terms of throughput, delay, and flow completion time.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

To meet the ever-increasing traffic demand, with an annual growth rate of 25$\%$ [1], the number of servers in data centers has increased dramatically. Data center network (DCN) requirements include high scalability, full bisection bandwidth, low energy consumption, and fast switching speed. To cope with these requirements, some network operators focus on developing all-optical networks [2] or hybrid electrical/optical networks [3] to replace the traditional purely electrical DCN designs. Six switching technologies have been developed, based on 2D or 3D micro-electro-mechanical switches (MEMS) [4], tunable lasers with arrayed waveguide grating routers [5], liquid crystal on silicon [6], semiconductor optical amplifiers (SOA) [7], Mach-Zehnder interferometers [8], or microring resonators [9]. However, recent advancements in fast, large-port-count all-optical switches have not been able to meet the exponential increase in bandwidth demand. In this context, legacy hybrid network architectures, such as Helios [10] or c-Through [3], are still promising candidates. They combine the low latency of electrical fast packet switching (EPS) with the high bandwidth of slower optical circuit switching (OCS).

One of the significant challenges is to make a slow optical circuit-switched network also available for delay-sensitive applications. In an existing network like Mahout [11], the EPS provides all-to-all communication for delay-sensitive flows, and the OCS establishes a point-to-point connection for each throughput-sensitive flow. Only the "elephant flows", long-lasting flows that are sensitive to throughput, can benefit from the advantages of OCS. The challenge is two-fold. First, delay-sensitive "mouse" flows may transfer messages between all source-destination pairs, which may number in the tens of thousands [12]. However, a nanosecond-scale optical switch has only a very low port count, about 32 [13]. Therefore, it is impossible to establish an optical path for each source-destination pair. Second, data center applications place strict requirements on interconnection networks. The millisecond-level reconfiguration latency of 3D-MEMS based OCS is too long for dynamic delay-sensitive flows: after such a long configuration period, most delay-sensitive traffic will have expired. All these drawbacks seriously hinder the practical implementation of the hybrid network.

To effectively carry delay-sensitive traffic, the hybrid network should know the flow type in advance. However, collecting the statistical distribution of all flows has lagged significantly behind the development of flow detection algorithms. Recently, machine learning (ML) algorithms have been considered a promising solution to this problem. Many supervised ML algorithms have been developed [14-17], such as decision tree (DT), Naive Bayes (NB), support vector machine (SVM), and k-nearest neighbors (KNN), to estimate the flow types, and the prediction accuracy has been improved to more than 90$\%$ [18]. However, using machine learning predictions alone is insufficient, because this technique is unlikely to be perfectly accurate on small data sets. For example, if the algorithm treats a small mouse flow as a giant elephant flow, the misprediction wastes high-bandwidth optical resources. Besides, most machine learning algorithms run on a central management system [19]. Continuous monitoring of the network drives a growing demand for data exchange between the central network management system and the edge nodes, which increases network overhead: sending a response may take several hundred milliseconds instead of one hundred nanoseconds [20].

To this end, machine learning assisted traffic aggregation schemes help to solve these problems effectively. First, a traffic aggregation scheme can direct multiple flows with the same source-destination pair onto several wavelengths, so the total number of ports is shared among all flow types. Intuitively, the OCS does not need to be turned on and off frequently in response to the continually changing mice flows, since only aggregate flows, not individual flows, may apply for optical resources. Finally, the aggregation scheme relaxes the accuracy requirement on flow prediction. For example, suppose a machine learning algorithm treats a small mouse flow as a giant elephant flow; the lightpath established in this way allows not only the mouse flow but also the elephant flow to pass. The maturity of new optical switches and machine learning algorithms paves the way for high-bandwidth, low-latency DCNs.

In this paper, we propose two machine learning assisted aggregation schemes and compare them with the traditional non-aggregation scheme. The first scheme redesigns the optical cross-connect so that it serves both delay-sensitive and delay-tolerant traffic flows, raising the throughput of the circuit-switched network while keeping the number of optical switch ports unchanged. The second scheme adds a small number of ports, maximizing throughput while relaxing the accuracy required of the machine learning algorithms. From a set of four machine learning algorithms, only the most suitable one is selected at a time, and it is deployed at the edge nodes rather than in a central network management system, which reduces both network overhead and latency. Both aggregation schemes outperform the non-aggregation scheme in terms of throughput, delay, and flow completion time.

The rest of the paper is organized as follows. Section 2 introduces the hybrid network, its control plane, and the two proposed aggregation schemes. Section 3 describes the traffic flow characteristics of the servers in a data center and three indicators used to evaluate performance; it then shows how to implement the aggregation schemes to meet the DCN requirements. Section 4 presents the simulation parameters used to generate realistic DCN traffic flows, collects the key indicators of the hybrid network, and compares the results with and without an aggregation scheme. Finally, Section 5 concludes the paper.

2. Hybrid network, a control plane, and two proposed aggregation schemes

2.1 Hybrid network and an edge node structure

The structure of a hybrid network with the hardware foundation for handling delay-sensitive flows is shown in Fig. 1. The EPS still adopts the most widely used tree-based structure [21], as shown in the upper left of Fig. 1; it ensures fast switching and full connectivity for tens of thousands of nodes. However, as a DCN continues to grow, increasing the bandwidth of the EPS requires a substantially larger number of switches and longer cables. Therefore, commercial MEMS-based circuit switches are introduced: high-bandwidth OCS replaces part of the core switching layer, providing increased bandwidth at reduced cost. However, the traditional approach suffers from a high reconfiguration time (milliseconds [3,14]). We therefore incorporate a nanosecond optical cross-connect (OXC) into the hybrid network. It enables nanosecond-scale optical circuit switching with multiple SOA switches [22]; see the upper right part of Fig. 1. This replacement eliminates the millisecond switches from the OCS and shortens the switching time by six orders of magnitude. In addition, wavelength division multiplexing expands the scalability of the optical network, increasing the number of optical connections the OXC can accept by a factor of one hundred.


Fig. 1. A hybrid electrical/optical data center network.


In a conventional design, racks are grouped into clusters, and ToR switches are interconnected through cluster switches. In that environment, optical technology is used only for point-to-point links between the switches of a cluster. In our work, the edge node structure has been redesigned to support the hybrid network with multiple OCSes; see the bottom part of Fig. 1. Thus, we can accommodate the same number of servers in a rack as in a cluster, and in this sense we use rack and cluster interchangeably. A field-programmable gate array (FPGA) board is added in front of the top-of-rack (ToR) switch. The FPGA-based ToR includes three types of interfaces. It uses 1GbE optical ports as server and control plane interfaces, and these interfaces act as agents for the software-defined network (SDN). Unlike [23], it uses 10GbE optical ports to connect to the EPS and a 100Gbps optical port with a set of DWDM channels to connect to the OCS. Once the input buffer receives a flow, the FPGA board samples the first 30-40 packets of the flow to obtain its statistical information. Similar to [23], we also build a flow table to store all relevant information about the connections to the local server. Ten features form a feature vector, which a machine learning algorithm accepts as input in order to output the flow type.

The training process in this model includes five parts. First, the system generates a total of 100 epochs; in each epoch, the system randomly selects among 1000 flows based on the traffic load. Second, a user-defined function performs data cleaning. Before training starts, we delete some highly correlated duplicate data to prevent the algorithm from reporting errors; this deletion must not be excessive, because the algorithm also reports errors when the amount of data is insufficient. Third, according to the requirements of the input function, part of the data is converted from numerical values to binary symbols {-1,+1}. Since some Matlab functions, such as those for SVM and NB, only accept two-class labels as input, they must be applied iteratively to achieve multi-class classification. Fourth, the model parameters are locally fine-tuned, with an accuracy of 80$\%$ as the reference target. Fifth, the trained model predicts the data, the predictions are converted back to numerical values, and the results are compared with the original data to measure accuracy.
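The five steps above can be outlined in code. The paper's implementation is in Matlab; the following is a minimal Python sketch in which the feature generator, the rounding-based duplicate filter, and the label names are illustrative assumptions rather than the authors' exact procedure:

```python
import random

def make_flow(load):
    """Step 1 helper: draw one labeled flow; `load` is a placeholder knob."""
    label = random.choice(["elephant", "coflow", "mouse"])
    features = [random.random() for _ in range(10)]  # ten-feature vector
    return features, label

def clean(dataset):
    """Step 2: drop near-duplicate samples, but keep enough data to fit."""
    seen, kept = set(), []
    for x, y in dataset:
        key = tuple(round(v, 3) for v in x)
        if key not in seen:
            seen.add(key)
            kept.append((x, y))
    return kept

def binarize(label, positive):
    """Step 3: {-1, +1} encoding; a two-class learner (SVM, NB) is then
    applied once per class (one-vs-rest) for multi-class prediction."""
    return +1 if label == positive else -1

random.seed(0)
for epoch in range(100):                                   # Step 1: 100 epochs
    batch = clean([make_flow(0.5) for _ in range(1000)])   # Steps 1-2
    targets = [binarize(y, "elephant") for _, y in batch]  # Step 3
    # Steps 4-5: fit the binary model, tune toward the 80% reference
    # accuracy, then map predictions back to flow types (omitted here).
```

The one-vs-rest loop in step 3 is what makes a strictly binary classifier usable for the three flow classes.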

As mentioned earlier, using machine learning predictions alone is insufficient, because the technique cannot be perfectly accurate on small data sets [17-20]. Worse, if the algorithm treats a small mouse flow as a giant elephant flow, the misprediction wastes bandwidth. Using a traffic aggregation scheme that directs multiple flows with the same source-destination pair onto several wavelengths helps to solve these problems effectively. Unlike [24], our aggregation scheme accepts the ML predictions as inputs. By searching all lightpaths with the same source-destination pair, the proposed scheme finds the aggregate flows as outputs. Specifically, the product of the flow duration and the inter-arrival time determines the estimated traffic volume. All flows are then sorted in decreasing order of traffic volume, and an aggregate flow is declared when its share of the traffic exceeds the per-flow average. By sending a request, we notify the central controller that an aggregate flow has occurred and request action.
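The detection step just described can be sketched as follows. The flow-record fields (`src`, `dst`, `duration`, `rate`) and the use of duration times average rate as the volume estimate are illustrative assumptions standing in for the paper's duration/inter-arrival statistic:

```python
from collections import defaultdict

def find_aggregate_flows(flows):
    """Group flows by source-destination pair, sort groups by estimated
    volume, and mark a group as an aggregate flow when its share of the
    total traffic exceeds the per-pair average."""
    volume = defaultdict(float)
    for f in flows:
        # Assumed volume estimate: duration x average rate.
        volume[(f["src"], f["dst"])] += f["duration"] * f["rate"]
    total = sum(volume.values())
    if total == 0:
        return []
    avg_share = 1.0 / len(volume)   # average proportion per pair
    ranked = sorted(volume.items(), key=lambda kv: kv[1], reverse=True)
    return [pair for pair, v in ranked if v / total > avg_share]
```

A pair returned by this function is what triggers the request to the central controller in the text above.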

Once an aggregate flow is detected, the central controller initiates the decision-making process. Specifically, it calls a series of operations to establish the optical path. First, it sorts all aggregate flows in descending order of volume. Second, it periodically enables the optical connections of the aggregate flows that are most urgently needed. The centralized topology manager is then responsible for reconfiguring the connections, and the central controller uses SDN commands to deploy the optical switches remotely. After receiving the report from the SDN control agent, the optical switch updates its routing table, and a new routing table entry is created for the associated ToR in the routing manager. Finally, these ToRs can carry their traffic over the OCS to the destinations.
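The controller's periodic grant step might look like the sketch below. The request fields and the one-port-per-granted-flow accounting are assumptions made for illustration; the paper's allocation algorithm is more elaborate:

```python
def controller_cycle(requests, ports_free):
    """Sort pending aggregate-flow requests by volume (descending) and
    grant optical connections until the OCS ports run out.
    Returns (granted, denied) request lists."""
    granted, denied = [], []
    for req in sorted(requests, key=lambda r: r["volume"], reverse=True):
        if ports_free > 0:
            granted.append(req)
            ports_free -= 1          # assume one port per granted flow
        else:
            denied.append(req)
    return granted, denied
```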

This method has several advantages. First, it simplifies the coordination and orchestration of the hybrid network, because the machine learning algorithm runs on the edge nodes, not on the centralized controller. In this way, it avoids the high overhead caused by frequent communication between the edge nodes and the controller: resources only need to be allocated periodically to the few much-needed aggregate flows. This also prevents frequent reconfiguration due to small time-scale traffic. Besides, the EPS interface and the multiple OCS interfaces physically share the same IP address, so after receiving the data, the receiver can use it directly without an additional reordering process.

2.2 Proposed traffic aggregation schemes and a central controller

One of the significant challenges is to make a slow optical circuit-switched network also available for delay-sensitive applications. In this regard, traffic aggregation schemes have been considered the most promising solution [24]. To eliminate the impact of slow optical switching on delay-sensitive applications, the data center network needs a novel solution. The proposed schemes divide, aggregate, and transport aggregate flows all-optically to maximize the throughput of an OCS with a fixed port count. Meanwhile, they separate, deflect, and transport the remaining flows electrically to meet latency requirements. A high-bandwidth, low-latency hybrid network is obtained using the following steps, see Fig. 2.


Fig. 2. Flow chart implementing the proposed aggregation schemes in a hybrid network.


The first selector divides all flows into two categories. If a flow belongs to an aggregate flow, the OCS accepts it as an input and switching is done all-optically; the EPS takes the remaining flows as inputs and switches them electronically. In this way, we can offload most of the high-volume traffic to the OCS while avoiding severe congestion in the EPS. Consider the traditional EPS on the far right of Fig. 2 [21]. It has a three-layer architecture: a core layer, an aggregation layer, and an edge layer. Unlike the EPS part, the OCS usually does not have any aggregation switch at the edge node [23]; here, an FPGA board performs the flow aggregation function. The aggregation process starts before the decision is returned from the central controller, to reduce the waiting time. Delay-sensitive flows are placed at the front and delay-tolerant flows at the back, to avoid head-of-line blocking in the next step. Finally, the OCS accepts aggregate flows as inputs (blue), and the EPS takes the non-aggregated flows as inputs (orange).
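The first selector's dispatch rule can be written down directly. This is a sketch under assumed field names; the ordering trick at the end implements the "delay-sensitive in front" queue placement described above:

```python
def first_selector(flows, aggregate_pairs):
    """Aggregate flows go to the OCS queue (all-optical switching);
    everything else goes to the EPS queue (electrical switching)."""
    ocs, eps = [], []
    for f in flows:
        if (f["src"], f["dst"]) in aggregate_pairs:
            ocs.append(f)
        else:
            eps.append(f)
    # Delay-sensitive flows first, to avoid head-of-line blocking
    # (False sorts before True with this key).
    ocs.sort(key=lambda f: not f["delay_sensitive"])
    return ocs, eps
```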

The second selector is necessary to separate and deflect the delay-sensitive part of the aggregate flows to the EPS. As mentioned earlier, establishing an OCS connection is time-consuming, and our goal is a low-latency network. With the second selector, the delay-sensitive part is separated from the whole flow. The deflected packets jump to a new queue in the EPS, which has a higher scheduling priority than the existing queues. In the electrical domain, intracluster traffic can be exchanged either through ToR switches or through the edge layer switch, while intercluster traffic is switched through the edge node and the aggregation layer. Finally, all traffic entering and leaving the data center passes through all three layers. To prevent EPS congestion caused by traffic deflected from the OCS, equal-cost multi-path (ECMP) routing is usually used to balance the traffic load. If there is no local congestion, the delay in the EPS network is relatively low. As a result, the long switching time of the OCS has relatively little effect on delay-sensitive flows. Deflecting a packet from the OCS to the EPS adds a time penalty of ten nanoseconds [25].

For the first selector, we use two aggregation schemes to replace the non-aggregation scheme, see Fig. 2 on the far left. The goal is to allow an aggregate flow, not a single flow type, to apply for optical spectrum. In the traditional non-aggregation scheme [11], the OCS is dedicated to elephant flows, and no other type of flow is allowed to use the OCS at any time. The first aggregation scheme is quite similar to the scheme in [24]: only when an elephant flow is detected may other delay-sensitive flows with the same source-destination pair use the OCS. The scheduling policy uses the first-in-first-out algorithm. The second aggregation scheme allows all aggregate flows to use spectrum in the optical network, whether or not they include elephant flows: if the traffic volume is large enough, the aggregate flow can apply for spectrum in the OCS.

For the second selector, the switching strategy is modified in several aspects. First, we set the lifetime of an optical path to one second, which is much longer than the average flow duration [26], to avoid reconfiguring the optical path too frequently. A life cycle begins after receiving an acknowledgement from the central controller. Unlike [26], once the OCS is ready, the entire aggregate flow, regardless of delay sensitivity, is transmitted through the OCS and no longer consumes EPS resources. If the central controller does not grant permission, the delay-sensitive part of the aggregate flow takes an alternative path through the EPS (orange link), while the remaining part updates its priority and applies for spectrum again next time. In the next cycle, we release all acquired resources, and all wavelengths await reallocation. Once released, the optical connection cannot deliver any packet, so the remaining delay-sensitive flows use alternative routes through the EPS. Unlike [26], the remaining delay-tolerant flows do not pass through the EPS; instead, they are given a higher priority and wait for confirmation in the next cycle.
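The second selector's per-cycle routing decision can be summarized in a small sketch. The packet fields and the priority bump are illustrative assumptions:

```python
def second_selector(aggregate_flow, ocs_ready, granted):
    """Route the parts of one aggregate flow during a single cycle.
    Returns (to_ocs, to_eps, deferred)."""
    sensitive = [p for p in aggregate_flow if p["delay_sensitive"]]
    tolerant = [p for p in aggregate_flow if not p["delay_sensitive"]]
    if ocs_ready and granted:
        # Once the OCS connection is up, the whole aggregate flow rides it
        # and no longer consumes EPS resources.
        return aggregate_flow, [], []
    # No grant yet: deflect the delay-sensitive part to the EPS; the
    # delay-tolerant part waits, with raised priority, for the next cycle.
    for p in tolerant:
        p["priority"] = p.get("priority", 0) + 1
    return [], sensitive, tolerant
```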

Finally, we use a two-step strategy in the central controller. First, the strategy resolves traffic congestion by using a fair bandwidth allocation algorithm [27]. To ensure fairness, we try to set up a connection for each source-destination pair; when the traffic load is small or moderate, we adjust the remaining resources to meet the demands as much as possible. The principle is to preferentially allocate more bandwidth to the larger aggregate flows [28]. Second, wavelength contention is resolved by modelling the OCS network as a crossbar [29], assuming that an input port connects to only one output port in a reconfiguration. This practice eliminates wavelength contention and maximizes the amount of traffic that the OCS can accommodate, so the delay in the OCS is minimized.
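The crossbar constraint can be illustrated with a greedy sketch. The paper's fair allocation algorithm [27] is more elaborate; this only demonstrates the one-input-to-one-output rule that eliminates wavelength contention:

```python
def crossbar_assign(demand):
    """demand[i][j]: aggregate traffic from rack i to rack j.
    Greedily pick the largest remaining demand whose input row and
    output column are both still free, as in a crossbar where each
    input connects to exactly one output per reconfiguration."""
    n = len(demand)
    entries = sorted(((demand[i][j], i, j)
                      for i in range(n) for j in range(n)), reverse=True)
    used_in, used_out, match = set(), set(), {}
    for vol, i, j in entries:
        if vol > 0 and i not in used_in and j not in used_out:
            used_in.add(i)
            used_out.add(j)
            match[i] = j            # input port i -> output port j
    return match
```

Greedy matching favors the largest aggregate flows first, consistent with the bandwidth allocation principle above.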

3. Traffic characteristics and main indicators

Understanding traffic characteristics plays an essential role in evaluating a hybrid network under different aggregation schemes. Today, Microsoft data centers may process 19 different types of flows [30]. For simplicity, we consider the most common traffic flows here: elephant flows (EF), co-flows (CF), and mice flows (MF). The elephant flow is tremendous, and its maximum size may exceed 25MB [31]. The co-flow is a collection of parallel, delay-sensitive flows whose total size is less than 8MB [32]. The small flow, often termed a mice flow, is a delay-sensitive request with a maximum size of 1MB. We denote elephant flows as f$_{e}$(i,j), co-flows as f$_{c}$(i,j), and mice flows as f$_{m}$(i,j), where i indicates the source node, j indicates the destination node, and 1$\leq$i,j$\leq$N, where N denotes the number of server racks.

Leading indicators (network throughput, network delay, and flow completion time) are a set of measurable values that demonstrate the ability of an aggregation scheme to achieve key network targets. Network throughput is the dominant indicator for elephant flows, which are in high demand and require much bandwidth. End-to-end latency has a significant impact on mice flows, which consist of short, latency-sensitive requests. Flow completion time represents the time from the first packet to the last packet of coexisting flows [32]. The simulation is carried out independently for each problem, and the analysis models are introduced in a later section. Table 1 summarizes the leading indicators and their targets [33,34].


Table 1. Compared features of traffic flow, indicators and targets.

Table 1 shows that a small fraction of flows carries a large proportion of data center traffic. Therefore, it is reasonable to transfer most of the traffic through the OCS with an aggregation scheme. However, the aggregation scheme is only the first step towards a high-bandwidth, low-latency network. Latency is a stringent requirement, and a scheduling algorithm is needed to prevent mice flows from queuing behind an elephant flow. Generally, the flow completion delay is 1,000 times the delay requirement. Unfortunately, this is one of the most challenging problems that may arise with co-flows: their flow patterns cannot be maintained once the flows take different paths, so it is not easy to know in advance when the last flow reaches its destination.

3.1 Throughput of the optical circuit switch

Let us formulate the problem of maximizing the OCS throughput beyond 70$\%$ with a relatively small-scale optical switch. Similar to the studies in [24], we intentionally divide the traffic volume R(i,j) into two parts. Let A$_{l}^{(k)}$(i,j) be the network ports required by all source-destination pairs of the k$^{th}$ flow type, where i,j$\in${1,2,$\ldots$,N}, k$\in${1,2,3}, and N is the number of server racks. Then, let f(i,j) represent the estimated traffic that needs to be transmitted in the network. A machine learning prediction algorithm detects the flow type at the edge nodes, and its results are stored as the aggregation model's input. Table 2 lists the necessary symbols, variables, and their definitions.


Table 2. Notations and their definitions.

Objective:

The optimization problem is to minimize the total port counts A$_{l}$ (i,j) while maximizing the throughput of an optical network.

Minimize: $A_{l} (i,j)= A_{l}^{(k1)}(i,j)\cup A_{l}^{(k2)}(i,j)$

Constraints:

  • 1. The target of the traffic volume related to OCS
    $$R^{OCS} (i,j)\geq \rho R(i,j)$$
  • 2. The hybrid network constraints
    $$R(i,j)=R^{OCS}(i,j)+R^{EPS}(i,j)$$
  • 3. The non-aggregation scheme constraints (EF only), where
    $$R^{OCS}(i,j)=A_{l}^{(1)}(i,j)=\sum_{l}f_{e}(i,j)$$
    $$R^{EPS}(i,j)=A_{l}^{(2)}(i,j)\bigcup A_{l}^{(3)}(i,j)=\sum_{l}f_{c}(i,j)+\sum_{l}f_{m}(i,j)$$
  • 4. The first aggregation scheme constraints (EF agg), where
    $$\begin{aligned} R^{OCS}(i,j) & = A_{l}^{(1)}(i,j)\bigcup A_{(i,j)\in A_{l}^{(1)}} ^{(2)}(i,j)\bigcup A_{(i,j)\in A_{l}^{(1)}}^{(3)}(i,j)\\ & = \sum_{l}f_{e}(i,j)+\xi _{2}\sum_{l}f_{c}(i,j)+\xi _{3}\sum_{l}f_{m}(i,j) \end{aligned}$$
    $$\begin{aligned} R^{EPS}(i,j) & =A_{(i,j)\notin A_{l}^{(1)}} ^{(2)}(i,j)\bigcup A_{(i,j)\notin A_{l}^{(1)}}^{(3)}(i,j)\\ & =(1-\xi _{2})\sum_{l}f_{c}(i,j)+(1-\xi _{3})\sum_{l}f_{m}(i,j) \end{aligned}$$
  • 5. The second aggregation scheme constraints (CF+EF agg), where
    $$\begin{aligned} R^{OCS}(i,j) & =A_{l}^{(1)}(i,j)\bigcup A_{l}^{(2)}(i,j)\bigcup A_{(i,j)\in A_{l}^{(1)}\bigcup A_{l}^{(2)}}^{(3)}(i,j)\\ & =\sum_{l}f_{e}(i,j)+\sum_{l}f_{c}(i,j)+\xi_{3}\sum_{l}f_{m}(i,j) \end{aligned}$$
    $$R^{EPS}(i,j)=A_{(i,j)\notin A_{l}^{(1)}\bigcup A_{l}^{(2)}} ^{(3)}(i,j)=(1-\xi_{3})\sum_{l}f_{m}(i,j)$$
  • 6. The ML assisted aggregation scheme constraints, with imperfect prediction accuracies $\xi ^{Ml}$, where
    $$R^{OCS}(i,j)=\xi _{1}^{Ml}\sum_{l}f_{e}(i,j)+\xi _{2}^{Ml}\sum_{l}f_{c}(i,j)+\xi _{3}^{Ml}\sum_{l}f_{m}(i,j)$$
    $$\begin{aligned} R^{EPS}(i,j) & =(1-\xi _{1}^{Ml})\sum_{l}f_{e}(i,j)+(1-\xi _{2}^{Ml})\sum_{l}f_{c}(i,j)\\ & +(1-\xi _{3}^{Ml})\sum_{l}f_{m}(i,j) \end{aligned}$$
(1) indicates that the traffic volume related to the OCS should exceed 70$\%$ of the total. (2) is the hybrid network decomposition, and (3)-(4) represent the constraints of the traditional non-aggregation scheme, in which the OCS accepts only EF flows as inputs. (5)-(6) are the constraints of the first aggregation scheme, in which co-flows and mice flows sharing a source-destination pair with an elephant flow are aggregated onto the OCS. (7)-(8) are the constraints of the second aggregation scheme, which aggregates both CF and EF flows. (9)-(10) are the constraints of a machine learning assisted aggregation scheme with some inevitable prediction errors. The first aggregation scheme has been reported elsewhere [24]. It improves the throughput of an OCS without increasing its port count. However, even if the prediction accuracy for co-flows reaches 100$\%$, i.e., $\xi _{2}^{Ml}$ equals 1, the throughput of the OCS may still not exceed 70$\%$. In contrast, the second aggregation scheme can not only tolerate an inaccurate flow prediction algorithm to a certain extent but also maximize the network throughput. If the accuracies for both flows are fixed above 78$\%$, i.e., $\xi _{1}^{Ml}$ and $\xi _{2}^{Ml}$ are more than 78$\%$, the throughput of the OCS exceeds 70$\%$.
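The 78$\%$ threshold can be made concrete with a small numeric check of Eq. (9). The traffic shares below (elephants 80$\%$, co-flows 15$\%$, mice 5$\%$ of total volume) are purely illustrative assumptions, not the proportions of Table 1:

```python
# Assumed traffic shares (illustrative, not from Table 1).
share_ef, share_cf, share_mf = 0.80, 0.15, 0.05
# Prediction accuracies at the claimed 78% threshold; mice stay on EPS.
xi1 = xi2 = 0.78
xi3 = 0.0
# Eq. (9): fraction of total traffic volume carried by the OCS.
r_ocs = xi1 * share_ef + xi2 * share_cf + xi3 * share_mf
print(round(r_ocs, 3))  # 0.741, above the 70% target under these shares
```

Under these assumed shares, any accuracy pair above 0.78 on elephants and co-flows keeps the OCS share above 0.7, illustrating why the second scheme tolerates imperfect predictions.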

3.2 Delay

Second, we formulate the delay model to reduce the delay of the mice flows below 100$\mu$s. To achieve low latency, we use the second selector. Similar to [35], we divide the end-to-end delay into four parts: the setup delay S, the propagation delay L, the transmission delay, and the queuing delay. For simplicity, we only take the first three parts into account. Let $\lambda$(i,j) denote the bandwidth allocation for a source-destination pair (i,j), where i,j$\in${1,2,…,N}. Then, delays are calculated according to the distance from the source node to the destination node. The bandwidth available at the (k-1)$^{th}$ layer depends on the switching capability achieved at the k$^{th}$ layer. In the EPS, the recommended bandwidth ratio follows 1:4:10 for the edge, aggregation, and core links. Finally, the estimated traffic volumes of the OCS and EPS are stored as the delay model's input.

Objective:

To minimize the network delay for sending the mice flows through a hybrid network.

Minimize:

$$D=\rho \eta D^{OCS}(i,j)+\rho (1-\eta )D^{DEF}(i,j)+(1-\rho )D^{EPS}(i,j)$$
Constraints:

1. Average delay target of mice flows.

$$D\le 100$$
2. EPS constraints:
$$D^{EPS}(i,j)= \left\{ \begin{array}{lr} L_{1}+10\cdot \lambda _{1}R(i,j)+S^{EPS}, 0\le |i-j|< N/4 & \\ L_{2}+2.5\cdot \lambda _{2}R(i,j)+S^{EPS}, N/4\le |i-j|< 3N/4 & \\ L_{3}+1\cdot \lambda _{3}R(i,j)+S^{EPS}, 3N/4\le |i-j|\le N-1 & \end{array} \right.$$
3. OCS constraints of all flows before an OCS is prepared.
$$D^{OCS}(i,j)=L^{OCS}+\lambda (i,j)R(i,j)+S^{OCS}$$
4. Deflection constraints of mice flows before an OCS is prepared.
$$D^{DEF}(i,j)=L^{DEF}+D^{EPS}$$
5. OCS constraints of all flows when the OCS is ready.
$$D^{OCS}(i,j)=L^{OCS}+\lambda (i,j)R(i,j)$$
(12) indicates that the average delay target of mice flows should be reduced to 100$\mu$s. (13) shows that the EPS constraints vary according to the distance between the source node and the destination node. (14) represents the Mahout constraints, in which all flows share the OCS in a first-in-first-out manner. (15) indicates that when the OCS is not ready, the system should immediately deflect the flows from the OCS to the EPS. (16) shows that when the OCS is ready, all flows are offloaded to the OCS by the aggregation scheme. Table 3 lists all parameters used in the numerical simulations, partly borrowed from [35,36]. Below, we discuss the spectrum allocation scheme to optimize flow completion time.
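Equations (13), (14), and (16) transcribe directly into code. This is a sketch only: the per-layer parameter values used in the test are placeholders, not the Table 3 settings:

```python
def eps_delay(i, j, R, lam, L, setup, n):
    """Piecewise EPS delay of Eq. (13): the layer a flow must climb
    depends on the rack distance |i - j|; lam and L hold the per-layer
    bandwidth shares and propagation delays."""
    d = abs(i - j)
    if d < n / 4:
        return L[0] + 10.0 * lam[0] * R + setup   # edge layer
    if d < 3 * n / 4:
        return L[1] + 2.5 * lam[1] * R + setup    # aggregation layer
    return L[2] + 1.0 * lam[2] * R + setup        # core layer

def ocs_delay(R, lam_ij, L_ocs, setup, ready):
    """Eq. (14) before the circuit is up, Eq. (16) once it is ready:
    the setup term S^OCS disappears after reconfiguration."""
    return L_ocs + lam_ij * R + (0.0 if ready else setup)
```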

Table 3. Parameter settings in the simulations.

3.3 Flow completion time

Flow completion time is a complicated network-level metric because it is difficult to know in advance when the last flow reaches its destination. We consider an indirect and straightforward model to evaluate this indicator. Generally, users want their process to complete as soon as possible. The challenge is that, after flows take different paths, the co-flow pattern cannot remain unchanged. Since the OCS uses a star topology, the routing and wavelength assignment problem is simplified into an optimal wavelength assignment problem. We consider two possible optical switches: the traditional MEMS switch with fixed wavelength transmitters and receivers, and an OXC node with tunable wavelength transmitters and receivers [22]. The model accepts the average delay matrix and the amount of bandwidth of the optical network as inputs.

Objective:

Minimizing the root mean square value is equivalent to minimizing the variation of the delay. By reducing this variation, the various aggregate flows reach their destinations with nearly the same completion time, see (17)-(19).

Minimize:

$$\sigma^{OCS}(W)=\sqrt{\frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N}(D^{OCS}(i,j)-\bar{D})^{2}}$$
where
$$\bar{D}=\frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N}D(i,j)$$
$$W(i,j)=w_{i,j}\cdot \lambda _{i,j}$$
Constraints:

1. MEMS constraints.

$$\lambda _{i,j}= \left\{ \begin{array}{lr} \textrm{M, if } w_{i,j}=1\\ \textrm{0, if } w_{i,j}=0 \end{array} \right.$$
$$\sum_{i=1}^{N}w_{i,j}=1,\sum_{j=1}^{N}w_{i,j}=1$$
2. OXC constraints.
$$\sum_{i=1}^{N}w_{i,j}=M,\sum_{j=1}^{N}w_{i,j}=M$$
(20)-(21) have their roots in the circuit-switching nature of MEMS: a row or column of the crossbar may host only a single connection, which occupies the full set of M wavelengths. WDM technology with tunable transceivers relaxes this restriction to a certain extent, see (22): a row or column may host up to M connections, but each connection may use only a given wavelength.
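The objective of Eqs. (17)-(18) is a plain root-mean-square spread of the per-pair delay matrix, which a few lines make explicit:

```python
import math

def rms_delay_spread(delay):
    """Eqs. (17)-(18): RMS deviation of the N x N delay matrix from its
    mean. Minimizing this over the wavelength assignment W pushes all
    aggregate flows toward the same completion time."""
    n = len(delay)
    flat = [delay[i][j] for i in range(n) for j in range(n)]
    mean = sum(flat) / (n * n)
    return math.sqrt(sum((d - mean) ** 2 for d in flat) / (n * n))
```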

4. Simulation environment and results

Quantitative simulation is designed to validate the effectiveness of the two aggregation schemes over the non-aggregation Mahout scheme. The model was developed in Matlab running on a workstation with an Intel Xeon E3-1536M and 64GB RAM. We simulated the architecture for 1K, 10K, and 100K servers with traffic loads ranging from 0 to 1. The model takes as inputs the number of servers, the topology configuration of the hybrid EPS/OCS interconnect, and the number of wavelengths. Connecting a server to the EPS at a 10 Gbps link rate is common today, whereas each node is connected to a set of DWDM (de)multiplexers through a 100Gbps port. Two architectural choices, MEMS optical switches and OXC switches, are compared. The total number of racks is 25, assuming 40 servers each. In both cases, a modern WDM system can handle 100 channels, where 25$\times$40$\times$100=100K.

4.1 Traffic generation

The traffic pattern generated in this article is modeled on real DCNs. Flow characteristics of interest include ratios and proportions [37], arrival rate [38], packet size, flow duration, and ON-OFF characteristics [16]. As mentioned earlier, we consider the most common flow types: elephant flows, co-flows, and mice flows. Their ratios and proportions are listed in Table 1. Figure 3 shows the distribution of packet sizes in the time and frequency domains. Unsurprisingly, most packet sizes lie at the two extremes, with 1500 and 64 bytes accounting for probabilities of 0.46 and 0.23, respectively. The duration of each flow type is measured and compared in Fig. 4(a). The results show that the shortest durations belong to mice flows, with a dynamic duration of 20$\mu$s, while the longest belong to elephant flows, with a duration of 1 second. The average duration of the co-flows equals the average duration of all flows (about 60ms). However, since the proportion of such flows is large, the co-flow curve is slightly steeper than the average curve.
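The bimodal packet-size distribution of Fig. 3 can be reproduced with a simple sampler. The sketch below assumes the remaining probability mass is spread uniformly over intermediate sizes; that is an illustrative choice, not the paper's exact generator:

```python
import random

def sample_packet_size(rng=random):
    """Bimodal packet sizes: 1500 B with probability 0.46, 64 B with
    probability 0.23, and (as an assumption) a uniform spread over
    intermediate sizes for the remaining mass."""
    u = rng.random()
    if u < 0.46:
        return 1500
    if u < 0.69:          # 0.46 + 0.23
        return 64
    return rng.randint(65, 1499)

random.seed(7)
sizes = [sample_packet_size() for _ in range(100_000)]
print(sizes.count(1500) / len(sizes))  # close to 0.46
print(sizes.count(64) / len(sizes))    # close to 0.23
```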


Fig. 3. Time and histogram of the packet sizes. (a) Generated traffic flows over a second. (b) Histogram. (c) Cumulative distribution function.



Fig. 4. CDF of generated traffic flows. (a) Flow durations of the three flow types. (b) On-period lengths. (c) Off-period lengths of the generated traffic flows.


In this study, our destination selection is also closer to that of an actual data center. Elephant flows conform to one-to-all characteristics, while co-flows conform to many-to-many characteristics; bursty mice flows can be seen everywhere. Therefore, for mice flows, the destination is randomly selected with equal probability among all server racks. For co-flows, on the other hand, there is a specific correlation between destinations. To clarify this point, suppose the first data packet has a random position R(i,j), where 1$\leq$ i, j$\leq$ N. The subsequent addresses are then selected relative to the first packet: they appear on the diagonal at positions R(i+k, j+k), where 1-max(i,j)$\leq$ k$\leq$ N-max(i,j). Finally, assume the first data packet of an elephant flow also has a random position R(i,j). The following addresses then appear randomly on the same column, at positions R(i,k) with 1$\leq$ k$\ne$j$\leq$ N.
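The three destination rules above can be expressed compactly. A hypothetical Python sketch, using rack indices 1..N following the paper's R(i,j) notation (positions falling outside the grid are dropped, an assumption made here for safety):

```python
import random

def mice_dest(N, rng=random):
    # Mice flows: destination chosen uniformly at random over all racks.
    return rng.randrange(1, N + 1), rng.randrange(1, N + 1)

def coflow_dests(i, j, N):
    # Co-flows: subsequent destinations lie on the diagonal through R(i, j),
    # i.e. R(i+k, j+k) with 1-max(i, j) <= k <= N-max(i, j).
    return [(i + k, j + k)
            for k in range(1 - max(i, j), N - max(i, j) + 1)
            if i + k >= 1 and j + k >= 1]

def elephant_dests(i, j, N):
    # Elephant flows: subsequent destinations appear at R(i, k), k != j.
    return [(i, k) for k in range(1, N + 1) if k != j]

print(coflow_dests(2, 3, 4))    # → [(1, 2), (2, 3), (3, 4)]
print(elephant_dests(1, 2, 3))  # → [(1, 1), (1, 3)]
```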

4.2 Throughput results

Figure 5(a)-(c) compares the accuracy of the four machine learning assisted flow detection algorithms on elephant flows, co-flows, and mice flows under all loads. This paper uses an 80$\%$ accuracy target as a reference. The decision tree method is the most accurate, outperforming the other three methods, with the accuracy for each flow type exceeding 90$\%$. Two other methods suitable for the aggregation schemes are support vector machines and k-nearest neighbors, because their prediction accuracy for each flow type surpasses 80$\%$. The Naive Bayes method is the least accurate. It still achieves 80$\%$ accuracy when detecting co-flows and mice flows; however, for elephant flows it falls slightly short (less than 80$\%$). The reason behind this failure is that the Naive Bayes model in Matlab accepts only two columns as inputs. Besides, elephant flows have the fewest samples, which may make this the least accurate case. In the end, we select the three machine learning prediction algorithms suitable for our aggregation schemes.
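As a rough illustration of the detection task, the sketch below classifies synthetic flows by (log) packet size and duration with a from-scratch k-nearest-neighbor rule. The class mean durations follow Sec. 4.1, but the feature values and the 700 B co-flow packet size are invented stand-ins for the paper's Matlab training data, not the actual dataset:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training flows (Euclidean distance)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in y_train[idx]])

rng = np.random.default_rng(0)
n = 600
y = rng.integers(0, 3, n)  # 0: mice, 1: co-flows, 2: elephants
# Class means: (mean packet size in bytes, duration in seconds).
means = np.array([[64, 20e-6], [700, 60e-3], [1500, 1.0]])
X = np.log(means[y] * rng.lognormal(0.0, 0.3, (n, 2)))

y_hat = knn_predict(X[:400], y[:400], X[400:])
acc = (y_hat == y[400:]).mean()
print(f"accuracy: {acc:.2f}")
```

On such well-separated toy features any of the four detectors clears the 80$\%$ target; the interesting differences in Fig. 5 arise only on realistic, overlapping traffic.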


Fig. 5. Prediction accuracy of four ML algorithms and network throughput with and without aggregation schemes. Prediction accuracy of EFs (a), of CFs (b), of MFs (c), and total throughput of the hybrid network (d).


Figure 5(d) illustrates the total throughput under all loads with and without the aggregation schemes, considering the four ML algorithms. First, regardless of whether an aggregation algorithm is used, the throughput curves of three algorithms, namely decision trees, support vector machines, and k-nearest neighbors, nearly overlap, while the Naive Bayes curve is slightly lower. The second aggregation algorithm yields the highest throughput (over 80$\%$), the first aggregation algorithm the second highest (over 40$\%$), and the non-aggregation algorithm the lowest (less than 20$\%$). Besides, the second aggregation algorithm requires the lowest prediction accuracy (>70$\%$), because the throughput curve of the Naive Bayes method overlaps with those of the other algorithms. Unlike the theoretical prediction, the first aggregation scheme reaches a stable total throughput of more than 80$\%$ only when the offered load exceeds 0.5.

In Fig. 6, the actual and predicted flows for each flow type are compared. We choose the DT method to predict the flow pattern because it has the highest accuracy among the three suitable machine learning algorithms. The results indicate that all three predicted flow matrices show trends similar to the actual flows. The traffic distribution patterns differ from each other. First, elephant flows conform to one-to-all characteristics, and their destinations appear randomly in the same column. Second, co-flows conform to many-to-many characteristics, and their destinations appear along the diagonal. Third, bursty mice flows can be seen everywhere. However, prediction errors occur in some local areas due to prediction inaccuracy.


Fig. 6. Actual and predicted traffic using the decision tree algorithm for three types of flows. Traffic matrix of EFs (a), of CFs (b), and of MFs (c). Predicted traffic of EFs (d), of CFs (e), and of MFs (f).


4.3 Latency results

Figure 7(a)-(c) compares the mice flow delay under all loads with and without the aggregation schemes. In summary, we conduct a comprehensive evaluation of nine aggregation/non-aggregation solutions and three priority algorithms. First, in most cases, the delay curves of the two aggregation schemes (the EF and CF curves) overlap, while the MF curves are the lowest. The MF-first curve takes the most port resources, while the EF-first curve occupies the least. Second, in most cases, the delay of the second aggregation algorithm is the lowest (<100ns), the delay of the first aggregation algorithm is the second lowest (<10$\mu$s), and the delay of the non-aggregation algorithm is the highest (about 100$\mu$s). As the degree of aggregation deepens, more flows are carried by the OCS and the EPS becomes idler; thus, the average latency of mice flows is the smallest. Finally, in the first-in-first-out scheme, the non-aggregation algorithm obtains a lower average delay (<1$\mu$s) than the first aggregation scheme (<10$\mu$s). We believe the main reason is the accurate decision tree method: mice flows and elephant flows are completely separated, so mice flows are no longer blocked by elephant flows in the EPS. Besides, ECMP balances the number of flows on each link, thereby significantly reducing the average latency in the EPS. On the other hand, the first aggregation algorithm needs to deflect mice flows from the OCS to the EPS, increasing the average delay.


Fig. 7. Averaged delay under nine aggregation/non-aggregation schemes and three priority algorithms.


Figure 8(a)-(b) compares the average flow completion time of the co-flows with and without the aggregation schemes under all loads, using the two optical switching schemes and the results of the decision tree algorithm. First, regardless of whether the aggregation algorithm is used, the OXC switching scheme achieves a shorter flow completion time than the MEMS switching scheme. This is because the MEMS switch must grant full bandwidth (25 wavelengths) in every port direction, whereas the OXC switch allows each wavelength to work independently: overloaded racks can be allocated a larger bandwidth share (12 wavelengths), while other racks receive a smaller share (0.5 wavelengths). Second, with either an OXC switch or a MEMS switch, the second aggregation scheme minimizes both the average flow completion time (<1s) and the delay variation (<30). Compared with the non-aggregation scheme (4$\%$) and the first aggregation scheme (4$\%$), the second aggregation scheme increases the number of ports to 20$\%$. We have also further relaxed the requirements for requesting optical network resources. The increase in the number of optical ports and the effective use of optical bandwidth significantly reduce the flow completion time. Finally, due to the long-tail effect, the flow completion time of the second aggregation scheme still exceeds the limit set by the DCN (100ms). The pursuit of shorter flow completion times requires more flexible resource scheduling methods, not just faster switching devices. To illustrate this, Fig. 8(c)-(e) depicts one MEMS resource allocation scheme and two OXC resource allocation schemes over one second. The switching speed of a MEMS is 1,000,000 times slower than that of an OXC. Compared with the MEMS switch ($\sigma$=25), the resource allocations of the OXC schemes ($\sigma$=23.5 and $\sigma$=0) have a higher similarity to the actual flow matrix.
Ideally, when the system is optimized by flexible-grid technology with precise bandwidth and center-wavelength tunability, the estimated completion time can drop below the 100 ms limit.
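The $\sigma$ values quoted above measure how far an allocation matrix deviates from a uniform spread. The toy comparison below is our own illustration with made-up demand (its $\sigma$ values will not reproduce the paper's 25/23.5/0): a MEMS all-or-nothing grant is contrasted with an OXC per-wavelength split:

```python
import numpy as np

N, M = 4, 25
rng = np.random.default_rng(1)

# MEMS: the full set of M wavelengths follows one permutation (0-or-25 entries).
mems = np.eye(N) * M
# OXC: the M wavelengths of each source rack are split in proportion to demand.
demand = rng.random((N, N))
oxc = demand / demand.sum(axis=1, keepdims=True) * M

print(f"MEMS sigma = {mems.std():.1f}")  # coarse, all-or-nothing allocation
print(f"OXC  sigma = {oxc.std():.1f}")   # smoother, demand-shaped allocation
```

Both schemes grant exactly M wavelengths per rack; only the OXC can shape the allocation to the demand matrix, which is what drives its lower $\sigma$.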


Fig. 8. (a)-(b) Flow completion time of traffic with and without aggregation schemes. (c)-(e) Resource allocations under either a MEMS switch or an OXC switch with different standard deviation values.


5. Conclusion

We have investigated a novel hybrid DCN architecture based on a nanosecond OXC with fast flow control. Based on a realistic DC traffic model, we numerically assess the system performance in terms of throughput, latency, and flow completion time with and without the aggregation schemes. The results show that the second aggregation algorithm achieves a maximum optical-circuit throughput of more than 90$\%$, compared to the non-aggregation method (below 20$\%$) and the first aggregation method (below 70$\%$). Besides, the second aggregation algorithm requires the lowest prediction accuracy (>70$\%$). Second, compared with the first aggregation algorithm (delay <10$\mu$s) and the non-aggregation algorithm (delay about 100$\mu$s), the second aggregation algorithm offers the lowest delay for mice flows (<100ns). Finally, compared to the flow completion time of the first aggregation algorithm (5s) and the non-aggregation algorithm (100s), the combination of the second aggregation algorithm and the optical cross-connect reduces the average flow completion time below 0.5 seconds. Furthermore, if the second aggregation algorithm is combined with flex-grid technology, the average flow completion time can be reduced to 100ms.

Funding

China Scholarship Council (201908310028).

Disclosures

The authors declare no conflicts of interest.

References

1. B. Buscaino, B. D. Taylor, and J. M. Kahn, “Multi-tb/s-per-fiber coherent co-packaged optical interfaces for data center switches,” J. Lightwave Technol. 37(13), 3401–3412 (2019). [CrossRef]  

2. M. Yuang, P.-L. Tien, W.-Z. Ruan, T.-C. Lin, S.-C. Wen, P.-J. Tseng, C.-C. Lin, C.-N. Chen, C.-T. Chen, Y.-A. Luo, M.-R. Tsai, and S. Zhong, “Optuns: Optical intra-data center network architecture and prototype testbed for a 5g edge cloud,” J. Opt. Commun. Netw. 12(1), A28–A37 (2020). [CrossRef]  

3. G. Wang, D. G. Andersen, M. Kaminsky, K. Papagiannaki, T. E. Ng, M. Kozuch, and M. Ryan, “c-through: Part-time optics in data centers,” in Proceedings of the ACM SIGCOMM 2010 conference, (2010), pp. 327–338.

4. C. Pollock, F. Pardo, M. Imboden, and D. Bishop, “Open loop control theory algorithms for high-speed 3d mems optical switches,” Opt. Express 28(2), 2010–2019 (2020). [CrossRef]  

5. S. Ma, H. Gu, H. Lan, X. Yu, and K. Wang, “Rss: a relay-based schedule scheme for optical data center network,” Photon. Netw. Commun. 39(1), 70–77 (2020). [CrossRef]  

6. J. M. D. Mendinueta, S. Shinada, Y. Hirota, H. Furukawa, and N. Wada, “High-capacity super-channel-enabled multi-core fiber optical switching system for converged inter/intra data center and edge optical networks,” IEEE J. Sel. Top. Quantum Electron. 26(4), 1–13 (2020). [CrossRef]  

7. A. Ghazisaeidi, “Theory of coherent wdm systems using in-line semiconductor optical amplifiers,” J. Lightwave Technol. 37(17), 4188–4200 (2019). [CrossRef]  

8. Y. Muranaka, S. Ibrahim, T. Nakahara, H. Ishikawa, Y. Sakamaki, and T. Hashimoto, “Fast optical switching technologies for inter/intra data center networks,” in Optical Interconnects XIX, vol. 10924 (International Society for Optics and Photonics, 2019), p. 109240F.

9. M. D. Garcia, P. Girault, S. Joly, L. Oyhenart, V. Raimbault, C. Dejous, and L. Bechou, “An analytical approach to predict maximal sensitivity of microring resonators for absorption spectroscopy,” J. Lightwave Technol. 37(21), 5500–5506 (2019). [CrossRef]  

10. N. Farrington, G. Porter, S. Radhakrishnan, H. H. Bazzaz, V. Subramanya, Y. Fainman, G. Papen, and A. Vahdat, “Helios: a hybrid electrical/optical switch architecture for modular data centers,” in Proceedings of the ACM SIGCOMM 2010 conference, (2010), pp. 339–350.

11. A. R. Curtis, W. Kim, and P. Yalagandula, “Mahout: Low-overhead datacenter traffic management using end-host-based elephant detection,” in 2011 Proceedings IEEE INFOCOM, (IEEE, 2011), pp. 1629–1637.

12. Y. Xiong, J. Shi, Y. Yang, Y. Lv, and G. N. Rouskas, “Lightpath management in sdn-based elastic optical networks with power consumption considerations,” J. Lightwave Technol. 36(9), 1650–1660 (2018). [CrossRef]  

13. M. Moralis-Pegios, N. Terzenidis, G. Mourgias-Alexandris, K. Vyrsokinos, and N. Pleros, “A 1024-port optical uni-and multicast packet switch fabric,” J. Lightwave Technol. 37(4), 1415–1423 (2019). [CrossRef]  

14. C. Wang, H. Cao, S. Yang, J. Guo, H. Guo, and J. Wu, “Decision tree classification based mix-flows scheduling in optical switched dcns,” in 2018 Optical Fiber Communications Conference and Exposition (OFC), (IEEE, 2018), pp. 1–3.

15. M. M. Saritas and A. Yasar, “Performance analysis of ann and naive bayes classification algorithm for data classification,” Int. J. Intell. Syst. Appl. Eng. 7(2), 88–91 (2019). [CrossRef]  

16. A. Yu, H. Yang, Q. Yao, Y. Li, H. Guo, T. Peng, H. Li, and J. Zhang, “Scheduling with flow prediction based on time and frequency 2d classification for hybrid electrical/optical intra-datacenter networks,” in 2019 Optical Fiber Communication Conference, (Optical Society of America, 2019), pp. Th1H–3.

17. J. Zhang, M. Gao, W. Chen, and G. Shen, “Non-data-aided k-nearest neighbors technique for optical fiber nonlinearity mitigation,” J. Lightwave Technol. 36(17), 3564–3572 (2018). [CrossRef]  

18. L. Wang, X. Wang, M. Tornatore, K. J. Kim, S. M. Kim, D.-U. Kim, K.-E. Han, and B. Mukherjee, “Scheduling with machine-learning-based flow detection for packet-switched optical data center networks,” J. Opt. Commun. Netw. 10(4), 365–375 (2018). [CrossRef]  

19. A. Yu, H. Yang, W. Bai, L. He, H. Xiao, and J. Zhang, “Leveraging deep learning to achieve efficient resource allocation with traffic evaluation in datacenter optical networks,” in 2018 Optical Fiber Communications Conference and Exposition (OFC), (IEEE, 2018), pp. 1–3.

20. C.-T. Lea, “A scalable awgr-based optical switch,” J. Lightwave Technol. 33(22), 4612–4621 (2015). [CrossRef]  

21. A. S. Hamza, “Recent advances in the design of optical wireless data center networks,” in Broadband Access Communication Technologies XIII, vol. 10945 (International Society for Optics and Photonics, 2019), p. 109450K.

22. K. Prifti, R. Santos, J. Shin, H. Kim, N. Tessema, P. Stabile, S. Kleijn, L. Augustin, H. Jung, S. Park, Y. Baek, S. Hyun, and N. Calabretta, “All-optical cross-connect switch for data center network application,” in 2020 Optical Fiber Communications Conference and Exhibition (OFC), (IEEE, 2020), pp. 1–3.

23. H. Rastegarfar, M. Glick, N. Viljoen, M. Yang, J. Wissinger, L. LaComb, and N. Peyghambarian, “Tcp flow classification and bandwidth aggregation in optically interconnected data center networks,” J. Opt. Commun. Netw. 8(10), 777–786 (2016). [CrossRef]  

24. Q. Kong, Y. Zhan, and P. Wan, “Hybrid ocs/obs interconnect in intra-data-center network,” Chin. Opt. Lett. 17(8), 080605 (2019). [CrossRef]  

25. G. M. Saridis, S. Peng, Y. Yan, A. Aguado, B. Guo, M. Arslan, C. Jackson, W. Miao, N. Calabretta, F. Agraz, S. Spadaro, G. Bernini, N. Ciulli, G. Zervas, R. Nejabati, and D. Simeonidou, “Lightness: A function-virtualizable software defined data center network with all-optical circuit/packet switching,” J. Lightwave Technol. 34(7), 1618–1627 (2016). [CrossRef]  

26. A. R. Curtis, J. C. Mogul, J. Tourrilhes, P. Yalagandula, P. Sharma, and S. Banerjee, “Devoflow: Scaling flow management for high-performance networks,” in Proceedings of the ACM SIGCOMM 2011 conference, (2011), pp. 254–265.

27. R. B. Basat, G. Einziger, R. Friedman, and Y. Kassner, “Optimal elephant flow detection,” in IEEE INFOCOM 2017-IEEE Conference on Computer Communications, (IEEE, 2017), pp. 1–9.

28. M. Chowdhury and I. Stoica, “Efficient coflow scheduling without prior knowledge,” SIGCOMM Comput. Commun. Rev. 45(4), 393–406 (2015). [CrossRef]  

29. Z. Feng, W. Sun, J. Zhu, J. Shao, and W. Hu, “Resource allocation in electrical/optical hybrid switching data center networks,” J. Opt. Commun. Netw. 9(8), 648–657 (2017). [CrossRef]  

30. T. Benson, A. Anand, A. Akella, and M. Zhang, “Understanding data center traffic characteristics,” SIGCOMM Comput. Commun. Rev. 40(1), 92–99 (2010). [CrossRef]  

31. A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, “Vl2: a scalable and flexible data center network,” Commun. ACM 54(3), 95–104 (2011). [CrossRef]  

32. X. S. Huang, X. S. Sun, and T. E. Ng, “Sunflow: Efficient optical circuit scheduling for coflows,” in Proceedings of the 12th International on Conference on emerging Networking EXperiments and Technologies, (2016), pp. 297–311.

33. S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu, J. Zolla, U. Hölzle, S. Stuart, and A. Vahdat, “B4: Experience with a globally-deployed software defined wan,” SIGCOMM Comput. Commun. Rev. 43(4), 3–14 (2013). [CrossRef]  

34. X. Jin, Y. Li, D. Wei, S. Li, J. Gao, L. Xu, G. Li, W. Xu, and J. Rexford, “Optimizing bulk transfers with software-defined optical wan,” in Proceedings of the 2016 ACM SIGCOMM Conference, (2016), pp. 87–100.

35. Z. Wang, J. Xu, P. Yang, Z. Wang, L. H. K. Duong, and X. Chen, “High-radix nonblocking integrated optical switching fabric for data center,” J. Lightwave Technol. 35(19), 4268–4281 (2017). [CrossRef]  

36. R. Mayer and H.-A. Jacobsen, “Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools,” ACM Comput. Surv. 53(1), 1–37 (2020). [CrossRef]  

37. F. Yan, X. Xue, and N. Calabretta, “Hifost: A scalable and low-latency hybrid data center network architecture based on flow-controlled fast optical switches,” J. Opt. Commun. Netw. 10(7), B1–B14 (2018). [CrossRef]  

38. S. Kandula, J. Padhye, and V. Bahl, “Flyways to de-congest data center networks,” in Proc. ACM HotNets, (2009), pp. 1–6.



Figures (8)

Fig. 1. A hybrid electrical/optical data center network.
Fig. 2. Flow chart implementing the proposed aggregation schemes in a hybrid network.
Fig. 3. Time and histogram of the packet sizes. (a) Generated traffic flows over a second. (b) Histogram. (c) Cumulative distribution function.
Fig. 4. CDF of generated traffic flows. (a) Flow durations of the three flow types. (b) On-period lengths. (c) Off-period lengths of the generated traffic flows.
Fig. 5. Prediction accuracy of four ML algorithms and network throughput with and without aggregation schemes. Prediction accuracy of EFs (a), of CFs (b), of MFs (c), and total throughput of the hybrid network (d).
Fig. 6. Actual and predicted traffic using the decision tree algorithm for three types of flows. Traffic matrix of EFs (a), of CFs (b), and of MFs (c). Predicted traffic of EFs (d), of CFs (e), and of MFs (f).
Fig. 7. Averaged delay under nine aggregation/non-aggregation schemes and three priority algorithms.
Fig. 8. (a)-(b) Flow completion time of traffic with and without aggregation schemes. (c)-(e) Resource allocations under either a MEMS switch or an OXC switch with different standard deviation values.

Tables (3)

Table 1. Compared features of traffic flows, indicators, and targets.
Table 2. Notations and their definitions.
Table 3. Parameter settings in the simulations.
