Optica Publishing Group

Opening up ROADMs: streaming telemetry [Invited]

Open Access

Abstract

We present an implementation of streaming telemetry of optical metrics within open hardware reconfigurable add/drop multiplexers (ROADMs). Our key achievements are sub-second updates of high-resolution spectrum scans, and we demonstrate a sustained telemetry stream of the full C-band with a sub-GHz resolution. The telemetry streaming is implemented over a standard, Internet Engineering Task Force (IETF)-defined protocol (YANG Push) in combination with an open-source YANG software stack and device-specific code. As the telemetry collector, we used a common time series database (TSDB) along with a visualization dashboard. We also extended the Open Network Operating System (ONOS) software-defined network (SDN) controller to act as a telemetry receiver.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. INTRODUCTION

Contemporary optical networks are complex, analog systems [1] of light waveguides. In order to avoid driving such a system blind [2], network operators require some data. Only with proper metrics can the network be measured, characterized, and tuned for the right balance of performance and reliability. Offline planning tools such as GNPy [3] are excellent for network optimization, yet they help only so much with investigation of day-to-day degradations. Once the bit error rate (BER) counters hit their thresholds and the proprietary forward error correction (FEC) encodings throw their proverbial towels into the ring, someone needs to investigate which part of the waveguide network is misbehaving. That is when the detailed, yet conveniently available, history of performance data shows its true value [4].

Streaming telemetry is an important addition to a network operator’s troubleshooting tool bag. The first demonstrations focused on the digital side of the dense wavelength division multiplexing (DWDM) network, mainly on signal quality metrics obtained from transponders [5]. Subsequent research emphasized the importance of monitoring the underlying optical line system (OLS) as well [6,7].

When the telemetry system provides sufficient data, faults can be detected [8] and networks optimized [9,10], yet too much data might occasionally constitute an operational problem on its own [11]. With the right tools and data, network maintenance might eventually become automated and proactive rather than reactive [12].

Of course, proper monitoring of the optical layer predates the current popularity of the term “telemetry” [13]. A modern telemetry system, however, improves upon the past state of the art by delivering a frequent, yet controllable stream of performance data [14].

At Layer-0 [15], processing the optical telemetry requires an out-of-band communication channel. That is unlike the digital networks that support high-level programmable queries [16] and in-band telemetry data [17], or the IP-optical networks where the telemetry can cross layer boundaries [18,19]. The usual software-defined network (SDN) principles are followed, and the network elements (NEs) are programmable and generic in their functionality. Specific functionality, such as threshold-based optical path protection [20], is not built in to the NEs; instead, the SDN controller acts upon the data transferred via the telemetry stream as needed.

In this work we set out to tweak our open hardware reconfigurable add/drop multiplexer (ROADM) designs [21–23] with a focus on Layer-0 telemetry. We aimed at providing the optical measurements to the network management system (NMS) as accurately and frequently as is practical, and at implementing these features in a robust manner with open standards and, preferably, using existing open-source code. Optical networking is not the first field in the world that discovered a need for frequent updates of performance metrics; in particular, the computing industry has faced a similar set of problems with the move to virtualization, cloud computing, and containerized microservices. This paper aims at the lowest layers of a network telemetry stack, focusing on the actual gathering of raw optical data and its efficient transport toward the upper layers of the monitoring stack. Results of basic telemetry processing are presented as well.

Fig. 1. Monitoring capabilities within a ROADM node.

The rest of the paper is structured as follows. Hardware features of ROADMs that are important for telemetry and monitoring are described in Section 2. In Section 3 we present the protocol landscape of streaming telemetry, and in Section 4 we describe the software stack in the open hardware ROADMs, focusing on telemetry handling. In Section 5, we present two examples of how to use the telemetry data: a telemetry visualization and an integration into an SDN controller. Finally, Section 6 concludes the paper with a summary of our findings and suggested next steps.

2. MONITORING CAPABILITIES OF OPTICAL NETWORK ELEMENTS

The primary purpose of an OLS is to carry a set of media channels (MCs) over the optical network. The importance of non-intrusive monitoring of MCs is described in ITU-T G.872 [24], and in a typical scenario, qualitatively different monitoring is implemented at the edges of the optical transport sections (OTSes) compared to the edges of the optical multiplex sections (OMSes). Often, the OTS monitoring takes place at the amplifier nodes via an optical photodetector (PD) with no means of spectrum filtering. The OMS monitoring, implemented in the ROADMs, is often capable of reporting the optical power on a per-MC basis at various points [25]. More advanced ROADMs support measuring of the optical spectra within the MCs, i.e., their capabilities to a certain extent approach those of a lab-grade optical spectrum analyzer (OSA). “Exhaustive monitoring is possible with an unlimited budget” [26]; however, for practical reasons, we only considered spectrum analysis. More advanced measurements, such as chromatic dispersion (CD) and polarization mode dispersion (PMD) monitoring or polarization analysis [27], are out of the scope of the presented study.

In this work we are using fully disaggregated ROADM devices [21] where each ROADM node comprises several ROADM modules. From the management perspective, each of the ROADM modules acts as an independent entity, with its own operating system instance, NETCONF server, independent northbound control interface, and dedicated telemetry stream. Individual ROADM modules are not necessarily homogeneous; as an example, we utilize several add/drop architectures that differ in MC capabilities, in the type of measuring device, and in its spectral resolution.

A. ROADM Spectrum Monitoring

A typical ROADM node of degree three with redundant add/drop stages is shown in Fig. 1 as an example. In each Line Degree ROADM module, a multiport optical channel monitor (OCM) analyzes spectra at the Line-IN and Line-OUT ports. The express ports are not monitored directly, but only through the corresponding wavelength-selective switch (WSS) module. This architecture represents a practical trade-off in terms of equipment price, complexity, and measurement latencies. An important result of this trade-off is that the Line Degree node on its own is not capable of measuring performance of MCs from those Express IN ports, which have not been selected for egress via Line OUT. We will, however, show that this is easily compensated via measurement at the Express OUT port of the preceding ROADM module.

The add/drop stages themselves can be implemented in several ways; the options that we have experimentally verified are shown in Fig. 2. The simplest option is a passive architecture [Fig. 2(a)]. Client signals from transponders are passively broadcast to ingress the Express port of each Line Degree, and all signals coming from Line ports are routed by the Line Degrees to the relevant client transponder via a passive coupler. In this architecture, the add/drop stage is fully passive, and the optical spectrum of client signals can, therefore, only be monitored after it has been selected by the egress WSS. This design is feasible for small two-degree ROADM nodes only, or for non-redundant terminal nodes where transponders plug directly to the Line Degree ROADMs. The limiting factor is an excessive power loss in multiport couplers in the Drop direction.

Fig. 2. Add/drop architectures determine physical measurement capabilities. (a) Passive A/D for two-degree ROADMs. (b) Coherent A/D. (c) Twin WSSes with dual-port OCM. (d) High-resolution measurement at client ports.

The same approach can be extended with active amplification and PD-based monitoring on Add (but still no spectrum filtering) as shown in Fig. 2(b) (and as demonstrated in [22]). Such an approach allows independent monitoring of full-band power on all Add ports. No spectrum filtering or MC-level spectrum monitoring is available, though, so this approach is only suitable for coherent-detection transponders under the control of the OLS operator.

More advanced add/drop stage designs typically employ WSSes and OCMs. Figure 2(c) shows the internal topology of one such add/drop module based on twin WSSes and a dual-port, flexgrid OCM [21]. The OCM capabilities provide sub-MC spectrum monitoring for each Drop port. Monitoring of the Add signals requires a proper WSS configuration. However, because we assume a full route-and-select [28] architecture of ROADM nodes, the mere presence of a signal on an Express cross-connect will not substantially affect other signals thanks to the WSS at the egress direction of the Line Degree module. That being said, the WSS modules themselves suffer from wavelength contention, i.e., they are not truly contentionless devices in an $M \times N$ configuration. This means that it is not possible to monitor the full spectrum of all client ports when there is some traffic to be added. If an MC is added via a port, the corresponding wavelength range is blocked on the WSS, and the corresponding spectrum cannot be monitored on other client ports.

This limitation is addressed by a more complex ROADM add/drop module design as shown in Fig. 2(d). Compared to the previous dual-port OCM approach, this extended design enables direct monitoring of optical spectrum on all client ports—a feature that is especially important for alien wavelengths (AWs) [29,30]. Such large-port OCMs are typically not commercially available, so our design utilizes an optical switch as an extra component between the monitoring fiber taps and the OCM input port. Apart from slightly increased software complexity, this results in slower operation because the OCM module can no longer perform multiport scans in parallel.

None of the described designs utilizes direct optical monitoring of the Drop ports. The reported power levels are computed from a monitoring tap located immediately before the egress WSS [upper right in Figs. 2(c) and 2(d)] and adjusted for the WSS per-port insertion losses. The ROADM, therefore, cannot detect a potential fault in the Drop WSS hardware, and the measurements do not directly account for the effect of the filter roll-off. While the latter can be compensated for in software, physical post-WSS monitoring would require adding dozens of monitoring taps and, therefore, another many-port, low-loss optical switch. For Drop, the measurement inaccuracy depends on physical tolerances of components (which are calibrated at manufacturing time) and on possible faults of the ROADM hardware. That is unlike the Add direction, where the signal properties depend on an external component, the transponder, and indirect measurements are only possible once the spectrum is routed. Since the component cost of the existing optical switch in Fig. 2(d) already constitutes roughly 20% of the add/drop costs, the proposed design relies on indirect monitoring at the Drop ports.

B. Physical Device Performance

Our devices [23] use optical measurement submodules of varying capabilities and performance. Often, the richer the measurements that are supported, the slower the device returns its results. Table 1 summarizes a few typical measurement components.

Table 1. Measurement Capabilities versus Measurement Latency

At the simpler side of the spectrum, erbium-doped fiber amplifiers (EDFAs)—which are used in ROADMs as boosters and preamps, and in-line amplifiers as well—provide a relatively straightforward interface. Typically utilizing just a few PDs internally for an internal power control loop, EDFAs offer a readout of the total optical power aggregated over the full spectrum at their input and output ports. This power level monitoring is crucial for transient suppression, and literature suggests that the typical end-to-end latencies of the whole control loop are of the order of 10–100 µs [31,32]. These internal control-loop latencies are, however, not always achievable by software, which runs outside of the EDFA component.

The particular component that we are using communicates with the host via a universal asynchronous receiver-transmitter (UART) interface. When asked to measure the power levels at the PDs, transferring the required number of bytes over the UART consumes about 20 ms. Based on advice from the vendor, our device software polls the EDFAs for power level data about every 100 ms.

In the “Coherent A/D” [22] [Fig. 2(b)], we required real-time performance monitoring of all Client IN ports. We used an eight-channel PD array with an integrated A/D converter and a microcontroller with an $\mathrm{I^2C}$ interface. Full readout of eight channels takes about 15 ms on our hardware where the bus is physically shared with the small form-factor pluggable (SFP) transceiver cage, which might impose a 100 kHz clock rate on the bus. We slowed the monitoring loop down a bit further, and the device software monitors the power levels at 20 Hz, yielding about 50 ms latencies.

The components presented so far only monitor the aggregate power levels over their wide spectral range. For more detailed information about the channel spectra, an OCM is required. In the Line Degree ROADM modules and in the WSS-based add/drop [Fig. 2(c)] we used a multiport OCM (p. 5 in [21]) with a serial peripheral interface (SPI). The SPI bus is usually employed for high-speed communication, and indeed the component we selected supports clock rates over 10 MHz. The amount of returned data grows as well, though. An extended C-band scan using a sweeping window with 2 GHz increments, which is the most detailed measurement this particular component supports, produces about 20 kB of data when a two-port scan is requested. Other measurement configurations are available, but we focus on the most universal component configuration.

The communication protocol imposes special timing constraints, and the SPI transfer, therefore, takes roughly 60 ms. A more time-consuming step is actual signal acquisition, which requires roughly 250 ms. In total, the system can produce full C-band scans three times in a second.
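The scan rate quoted above follows from adding the two latencies. The following minimal sketch reproduces that arithmetic under the assumption that acquisition and SPI transfer run strictly one after the other with no overlap; the function name is ours and purely illustrative.

```python
# Latencies quoted in the text for the flexgrid OCM, in milliseconds.
SPI_TRANSFER_MS = 60.0
ACQUISITION_MS = 250.0


def scans_per_second(*stage_latencies_ms: float) -> float:
    """Rate of a strictly sequential pipeline: one scan per sum of all stages."""
    return 1000.0 / sum(stage_latencies_ms)


rate = scans_per_second(SPI_TRANSFER_MS, ACQUISITION_MS)
print(f"{rate:.2f} full C-band scans per second")
```

With roughly 310 ms per scan, the result is slightly above three scans per second, which matches the figure stated in the text.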

Finally, the most advanced spectrum monitoring capabilities are provided by a special, higher-resolution OCM [33,34], which we used in our latest add/drop ROADM module design [Figs. 2(d) and 3]. The spectrum scan step size is just 312.5 MHz, yielding 15,600 measurement points over the full C-band range (this improvement in optical spectrum resolution is shown in Fig. 4). Each scan produces about 30 kB of data, which requires roughly 50 ms of SPI transfer time. The device requires additional time to acquire the signal and analyze the result, so the end-to-end measurement latency is roughly 750 ms on our system, with a potential for further improvements via request pipelining. In a typical ROADM scenario, the OCM is connected via an optical switch to several measurement points, so the effective scan rate for a given optical port is reduced depending on the user-defined policy. For example, individual client ports might be monitored less often than the output port.
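To illustrate how a switch policy translates into per-port refresh intervals, the sketch below assumes a simple repeating schedule in which every entry costs one full 750 ms scan. The port names and the schedule itself are hypothetical and do not describe the policy implemented in our devices; switch settling time is ignored.

```python
from collections import Counter
from typing import Dict, List

SCAN_LATENCY_MS = 750.0   # end-to-end latency of one high-resolution scan (from the text)
STEP_MHZ = 312.5          # scan step size (from the text)
N_POINTS = 15_600         # measurement points per scan (from the text)

# Spectral span covered by one scan, in THz.
span_thz = N_POINTS * STEP_MHZ * 1e6 / 1e12


def refresh_intervals_ms(schedule: List[str]) -> Dict[str, float]:
    """Average time between two scans of the same port for a repeating schedule."""
    cycle_ms = len(schedule) * SCAN_LATENCY_MS
    return {port: cycle_ms / count for port, count in Counter(schedule).items()}


# Hypothetical policy: revisit the output port between every client port.
schedule = ["out", "client-1", "out", "client-2", "out", "client-3"]
intervals = refresh_intervals_ms(schedule)
for port, interval in intervals.items():
    print(f"{port}: one scan every {interval / 1000:.1f} s")
```

Under this assumed policy the output port is refreshed every 1.5 s, while each client port is refreshed every 4.5 s.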

Fig. 3. Lab testing of the updated ROADM electronic interface. A high-resolution OCM is underneath the board.

Fig. 4. Comparing the resolution of a flexgrid (black) and a high-resolution (green) OCM.

As a practical consideration, the addition of an optical switch imposes additional attenuation to the measurement path and requires extra calibration.

Given an unlimited budget, it is possible to monitor more properties of the optical signal [35] than just the optical spectrum. The spectrum analysis alone, however, provides significant insight [36] into the performance of an OLS.

3. TELEMETRY PROTOCOLS

Historically, periodic data retrieval from network devices has been handled by a variety of protocols. Some of these protocols were tailor-made for networking devices [37–39], while others shared the transport protocol with monitoring solutions from the computing and server world [40,41]. In particular, collectd deserves honorable mention [42] thanks to its efficient binary protocol. When combined with rrdcached, the system easily scaled to almost 500,000 metrics on 2007-era hardware [43].

Since the widespread adoption of the YANG modeling language [44], telemetry protocols that communicate updates to YANG-described data have become more important. The main benefit of a YANG-driven modeling tool is the self-describing nature of the telemetry update stream. A particularly popular choice is gNMI [45–47] with roots in the OpenConfig [48] project.

Unfortunately, the OpenConfig project does not fully conform to the YANG standard [49]. A major interoperability concern is the format of the regular expressions where OpenConfig mandates a certain dialect [50] of the Portable Operating System Interface (POSIX) extended regular expressions [51]. Several projects attempted to implement compatibility quirks [52–54] with varying levels of success. As of 2021, the situation remains unclear with parts of OpenConfig using both YANG-compliant and POSIX-inspired regular expressions via an extension at the same time. However, some of the converted models still contain regular expression patterns that will reject valid data due to the implicit pattern anchoring in standard YANG.

Native OpenConfig models are expected to implement two data subtrees at each level: the config container for the desired configuration and the state container for operational state. Based on the problem statement from the OpenConfig project [55] and earlier work by others [56], this was eventually addressed via the Network Management Datastore Architecture [57] (NMDA). This opened up ways for implementing telemetry in a data-agnostic manner on top of generic datastores. The authors decided to investigate the practical utility of the streaming telemetry in a manner which is aligned with the Internet Engineering Task Force (IETF) standards.

A. YANG Push

The IETF started their YANG-based telemetry efforts in 2014 [58], eventually defining a suite of request for comments (RFC) documents [59–63]. Conceptually, these documents describe an operating model where the data that is subject to streaming telemetry is described by a YANG model and stored in a datastore. Telemetry receivers have a choice of what changes to subscribe to, how often to send the updates, and what transport mechanism to use. The standards are generic, even though a practical transport method is typically either NETCONF [64] or RESTCONF [65]. Change notifications are either periodic or, for supported subtrees of YANG datastores, on-change.

The IETF standards define two elementary notification data types that differ in the amount and format of data returned. Using the push-update notification, the YANG datastore is expected to send a complete representation of the full subtree that is the subject of the notification subscription. An example notification in the XML serialization is shown in Fig. 5.

Fig. 5. YANG push-update notification example.

On the other hand, the push-change-update notification contains a YANG Patch [66] with an incremental datastore update. An example, once again using the XML serialization, is shown in Fig. 6.

Fig. 6. YANG push-change-update notification example.

It is important to note that both of these mechanisms support conveying information about any possible change within the YANG datastore. Compared to the telemetry use-case, which often centers around delivering updates to, essentially, a set of numbers, the mechanism defined in YANG Push is more flexible, and it also comes with a higher overhead. Both of these examples clearly illustrate the amount of boilerplate required for a simple change of two leafs. Perhaps surprisingly, the method based on sending an incremental update actually fares much worse in this trivial example, mainly due to the requirement to use the rather verbose YANG Patch [66] data format, which, on the other hand, can encode complex transformations of YANG trees more effectively.
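The envelope overhead can be made concrete with a minimal sketch that builds a push-update notification for two leafs and compares the message length with the length of the two values. The notification and YANG Push namespaces are those of RFC 5277 and RFC 8641; the `urn:example:demo-optical` module, its `port` container, and the leaf names are hypothetical stand-ins, not the payload of Fig. 5 and not our device model.

```python
import xml.etree.ElementTree as ET
from typing import Dict

NOTIF_NS = "urn:ietf:params:xml:ns:netconf:notification:1.0"  # RFC 5277 envelope
PUSH_NS = "urn:ietf:params:xml:ns:yang:ietf-yang-push"         # RFC 8641
DEMO_NS = "urn:example:demo-optical"                           # hypothetical module


def build_push_update(subscription_id: int, event_time: str, leafs: Dict[str, str]) -> str:
    """Serialize a push-update notification carrying the given leaf values."""
    notification = ET.Element(f"{{{NOTIF_NS}}}notification")
    ET.SubElement(notification, f"{{{NOTIF_NS}}}eventTime").text = event_time
    update = ET.SubElement(notification, f"{{{PUSH_NS}}}push-update")
    ET.SubElement(update, f"{{{PUSH_NS}}}id").text = str(subscription_id)
    contents = ET.SubElement(update, f"{{{PUSH_NS}}}datastore-contents")
    port = ET.SubElement(contents, f"{{{DEMO_NS}}}port")  # hypothetical container
    for name, value in leafs.items():
        ET.SubElement(port, f"{{{DEMO_NS}}}{name}").text = value
    return ET.tostring(notification, encoding="unicode")


leafs = {"input-power": "-12.3", "output-power": "1.5"}  # hypothetical leaf names
message = build_push_update(1, "2021-03-01T12:00:00Z", leafs)
payload_chars = sum(len(value) for value in leafs.values())
overhead_ratio = len(message) / payload_chars
print(f"{len(message)} characters on the wire for {payload_chars} characters of values")
```

The exact ratio depends on the serializer and on namespace prefix choices, so the number printed here is only indicative of the order of magnitude.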

In both cases, the actual change is delivered as a YANG notification, and therefore the data can be transferred either in JSON or XML formats. After serialization, the change notifications are delivered over an application-specific data channel. In the case of NETCONF [64], the notification stream with XML data is the only standardized option. In the case of RESTCONF [65], the channel is a long-lived HTTP request using the text/event-stream [67] media type, and the serialization is either JSON or XML as determined by the client.
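On the receiving side, a RESTCONF telemetry client has to reassemble events from the text/event-stream framing before it can parse the JSON. The sketch below is a deliberately reduced parser that handles only `data:` fields and comment lines; the sample stream is invented for illustration, and a production client would normally rely on an existing server-sent events library.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a text/event-stream.

    Only `data:` fields are collected; comments and the `event:`, `id:`,
    and `retry:` fields are ignored in this sketch.
    """
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            data.append(value[1:] if value.startswith(" ") else value)


# Invented sample stream with one JSON-encoded notification.
stream = [
    ": keep-alive comment\n",
    'data: {"ietf-restconf:notification": {\n',
    'data:   "eventTime": "2021-03-01T12:00:00Z"\n',
    "data: }}\n",
    "\n",
]
events = [json.loads(payload) for payload in iter_sse_events(stream)]
print(events[0]["ietf-restconf:notification"]["eventTime"])
```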

Some experimental drafts exist that try to address the complexity of these serialization formats. Recently, there has been a renewed interest in delivering push notifications via outgoing HTTP connections [68]. It is feasible that a combination of using a compact Concise Binary Object Representation (CBOR) [69] to serialize YANG data [70], possibly with string compaction, will address possible data length concerns and formatting overhead. Some other drafts attempt to remove the overhead of the text-based communication channel even further by moving to User Datagram Protocol (UDP) [71]. Paired with the CBOR encoding, the proposal could reduce the overhead substantially.

B. Modeling the Optical Metrics

In the optical L0 domain, which deals with MCs and optical spectrum, some important metrics are vectors, not scalars. As an example, it is feasible to use a pair of YANG leafs for information about the input and output power at the given optical port, and implicitly assume that the power refers to a full spectrum measurement with negligible contribution of signals from outside of, e.g., the C-band. Extending this measurement to return a per-MC result is straightforward, perhaps via a pair of such leafs in a new container per MC that is configured. However, this approach does not scale to flexgrid spectrum measurement with sub-MC resolution.

In Section 2.B we presented physical capabilities of commercially available OCM modules. Clearly, returning 15,600 individual measurements imposes different performance constraints than pushing a few dozen metrics.

In our previous work [72], we used NETCONF remote procedure calls (RPCs) for triggering the spectrum scan on-demand. The results were returned as a binary YANG type to bypass overhead of the YANG processing libraries; others used a similar approach [39]. This workaround solved the immediate problem, but it required ad hoc support on both sides of the processing pipeline. Because the scan was always triggered on-demand, we were concerned about scalability issues with concurrent access, and the usability for telemetry streaming was limited.

As we will show in Section 4.D, we have extended the YANG models of the ROADM devices so that the frequency of full-band scans is user-defined.

The YANG language supports defining complex structures, and at first glance, using a list data structure keyed by the frequency and returning the measured optical power was the most obvious solution. The advantages of this approach are its simplicity and support for returning arbitrary data, which is possibly not uniformly spaced. The biggest disadvantage is the amount of data returned, where the list key in particular contains a lot of redundant information.

As an alternative, we considered using a leaf-list holding just a vector of the measured power values, one per frequency step. The leaf-list is held in a container that contains sibling nodes indicating the lowest frequency and the measurement step size (i.e., frequency resolution).
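The difference in data volume between the two encodings can be estimated with a short sketch. It serializes the same synthetic scan once as a list keyed by frequency and once as a plain vector accompanied by the lowest frequency and the step size. All node names, the start frequency, and the power values are made up for illustration; JSON is used only as a convenient yardstick.

```python
import json

LOWEST_FREQUENCY_MHZ = 191_325_000.0  # hypothetical start of the scan
STEP_MHZ = 312.5                      # step size of the high-resolution OCM
N_POINTS = 15_600

# Synthetic power readings in dBm; real data would come from the OCM.
powers = [round(-30.0 + (i % 50) * 0.1, 1) for i in range(N_POINTS)]

# Encoding 1: list keyed by frequency, one entry per measurement point.
keyed_list = {
    "spectrum": [
        {"frequency": LOWEST_FREQUENCY_MHZ + i * STEP_MHZ, "power": p}
        for i, p in enumerate(powers)
    ]
}

# Encoding 2: vector of powers plus two sibling nodes describing the grid.
vector = {"lowest-frequency": LOWEST_FREQUENCY_MHZ, "step": STEP_MHZ, "power": powers}


def frequency_of(doc: dict, index: int) -> float:
    """Recover the frequency of the index-th point in the vector encoding."""
    return doc["lowest-frequency"] + index * doc["step"]


keyed_json = json.dumps(keyed_list)
vector_json = json.dumps(vector)
print(f"keyed list: {len(keyed_json)} bytes, vector: {len(vector_json)} bytes")
```

The vector form stays several times smaller because the frequency key is never repeated, at the cost of supporting only uniformly spaced data.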

An example model that explores both approaches is shown in Fig. 7. A real model used in our ROADMs is presented later (Section 4.D).

Fig. 7. YANG tree for OCM spectrum scanning.

While the leaf-list approach was initially very promising, we identified significant obstacles in it. The XPath encoding used for leaf-lists uses values of the individual items as a unique index, which, in practice, makes it impossible to refer to the $n$th element in such a vector directly. It is also very common within the YANG software ecosystem to approach leaf-lists as sets (possibly unordered, possibly ordered, depending on the schema definition). That is, the data structure is usually optimized for insertion, deletion, and membership testing. Those leaf-lists with an ordered-by user statement also support explicit reordering operations, but once again these work by referring to items by their value rather than by their index in the enclosing leaf-list. In fact, there is no such concept as an enclosing leaf-list in YANG at all, making it impossible to perform operations such as removal of all leaf-list values [73].

Another complication is the fact that value repetition has only been allowed in YANG since the 1.1 release (p. 11 in [44]). As a result, some popular software might enforce value uniqueness even on state data [74]. It is also common to try to compute diffs and create YANG-level patches when comparing two YANG trees with leaf-lists (cf. Section 4.D).
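The practical consequence of value-based identity can be shown with a toy model. The sketch below mimics a software stack that treats a leaf-list as an ordered set and therefore enforces value uniqueness; it is an illustration of the failure mode, not the behavior of any specific YANG library.

```python
from typing import List


def as_value_keyed_leaf_list(values: List[float]) -> List[float]:
    """Mimic a leaf-list whose entries are identified by value (an ordered set).

    Duplicates collapse into the first occurrence, as they would in a stack
    that enforces value uniqueness.
    """
    return list(dict.fromkeys(values))


# Five consecutive power readings in dBm; repeated values are entirely normal.
scan = [-30.1, -30.1, -12.4, -12.4, -30.1]
stored = as_value_keyed_leaf_list(scan)
print(stored)  # the mapping from position to frequency is lost
```

Once the duplicates are gone, the position of a value no longer corresponds to a frequency step, so the spectrum cannot be reconstructed from the stored data.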

As a pragmatic approach, we explored the anydata YANG construct (cf. Section 4.D). That way, the vector is transported in the exact same syntax as if it were a leaf-list, while at the same time often bypassing the inappropriate handling of the leaf-list in the YANG software stack.

C. Push, or Pull?

No discussion of streaming telemetry is complete without mentioning the difference between push and pull monitoring [75,76]. The gist of the idea of streaming telemetry is that the telemetry consumer expresses a wish to be notified about certain events without any active polling. This is often explained in contrast to legacy systems with, e.g., Simple Network Management Protocol (SNMP) polling at overly long intervals, such as 15 min. Unlike the old systems, a modern approach based on streaming telemetry should deliver updates to metrics without undue delay.

Fig. 8. Software architecture of the YANG/NETCONF/RESTCONF stack.

However, streaming telemetry does not imply an approach where any update is delivered immediately. Indeed, a typical telemetry consumer might not be interested in obtaining on-change, immediate updates for a byte counter on a 600 Gbps DWDM line card. For all practical purposes, even those systems that are physically capable of reading a metric every 100 µs—and a software counter definitely is one of these instances—are throttled to keep the resource usage within a reasonable range. Furthermore, other metrics, especially the optical ones, are derived from physical measurements that require a substantial time for their actual readout.
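Throttling of this kind can be sketched as a small rate limiter that forwards at most one reading per configured interval and drops the rest. The class name, the 100 ms interval, and the simulated counter are assumptions made for this example; they do not describe the mechanism used in our devices.

```python
from typing import List, Optional, Tuple


class Throttle:
    """Forward at most one reading per `min_interval_us`; drop the others."""

    def __init__(self, min_interval_us: int) -> None:
        self.min_interval_us = min_interval_us
        self._last_sent_us: Optional[int] = None

    def offer(self, now_us: int, value: int) -> Optional[Tuple[int, int]]:
        """Return (timestamp, value) if the reading may be sent now, else None."""
        if self._last_sent_us is not None and now_us - self._last_sent_us < self.min_interval_us:
            return None
        self._last_sent_us = now_us
        return (now_us, value)


BYTES_PER_US = 75_000  # 600 Gbps expressed in bytes per microsecond
throttle = Throttle(min_interval_us=100_000)  # assumed 100 ms update interval

# Simulate one second of a byte counter that could be read every 100 microseconds.
sent: List[Tuple[int, int]] = []
for now_us in range(0, 1_000_000, 100):
    update = throttle.offer(now_us, now_us * BYTES_PER_US)
    if update is not None:
        sent.append(update)

print(f"{len(sent)} updates sent out of 10000 readings")
```

Because the counter is cumulative, dropping intermediate readings loses no information beyond time resolution, which is why throttling is acceptable for this class of metrics.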

Care should also be taken not to confuse the choice of the telemetry connection originator with the resulting performance. NETCONF notifications can be delivered over both dial-in and callback connections, and many cloud-native monitoring tools perform a simple periodic poll over HTTP, and yet they are considered to deliver proper streaming telemetry.

4. SOFTWARE ARCHITECTURE

The basis and the central piece of the whole architecture is sysrepo (Fig. 8). It is a YANG datastore implementation in the form of a library with many features allowing the user to implement all kinds of YANG modules. One such feature is the support for publishing state data, which take the form of specific operational data in the NMDA specification that sysrepo is compliant with. Using this mechanism, applications that run locally on the ROADMs (highlighted in gray in Fig. 8) can communicate their state information to any other processes connected to sysrepo very efficiently.

A. YANG Datastore: sysrepo

sysrepo is a full-fledged YANG datastore with many advanced features [77]. It acts as global data storage for all managed applications or devices on the local system. Inter-process communication (IPC) is implemented using designated shared memory (SHM) segments with two master segments holding all sysrepo global metadata.

The primary feature of sysrepo is storing and providing YANG configuration data. Applications register callbacks to be notified about specific changes in designated subtrees of the YANG datastore. That way, actual device modifications are delegated to the application-specific code, with sysrepo focusing on the generic handling of YANG-formatted data. The operation can be customized via callbacks, so that, e.g., additional integrity constraints can be enforced on top of the standard YANG-level validation. Arbitrary conditions can be verified by the user code, and targeted error messages and return codes can be communicated back to the change originator. Similar principles based on callbacks are also employed for state data, custom RPC invocation, and notification delivery. State data are distinct in several aspects as discussed in the following section.

1. Operational Data

State data, i.e., data tagged with config: false in the YANG model, appear only in the operational datastore as defined by the NMDA. The operational datastore is defined to accurately reflect the current state of the device, which in turn comprises two kinds of data: the system configuration that is actually in effect, and the YANG-level state data. Since the mechanisms for providing both of these are the same in sysrepo, the data are called operational data.

There are two ways for applications to provide operational data: push or pull. If the pull method is selected, the application registers a callback, which is invoked on-demand, each time a particular YANG subtree is required. As a result, the data are not stored directly in sysrepo. This method is suitable for operational data that change often and where application-specific polling could introduce unacceptable latency, as well as for data that are required only occasionally. The second possibility for providing operational data is the push method. In that case the application simply sets the specific data the same way they are set for all the other datastores, as if the application were making a change in configuration, for example. It is the domain-specific application that controls when the data are pushed, and the data are stored and cached within sysrepo until the originator decides to update them. As such, this approach is especially suitable for data that are, e.g., created by a periodic process within the NE, or for data where the device-level code already provides asynchronous, event-based notifications.
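The difference between the two methods can be illustrated with a toy in-memory store. This is emphatically not the sysrepo API: the class, its methods, and the paths are hypothetical and serve only to show where the data live and when application code runs.

```python
from typing import Callable, Dict, List


class ToyOperationalStore:
    """Illustrative stand-in for an operational datastore; not the sysrepo API."""

    def __init__(self) -> None:
        self._pushed: Dict[str, object] = {}
        self._providers: Dict[str, Callable[[], object]] = {}

    def push(self, path: str, value: object) -> None:
        """Push: the application stores a value, cached until the next push."""
        self._pushed[path] = value

    def register_pull(self, path: str, provider: Callable[[], object]) -> None:
        """Pull: the application registers a callback invoked on every read."""
        self._providers[path] = provider

    def get(self, path: str) -> object:
        if path in self._providers:
            return self._providers[path]()  # nothing is stored in the datastore
        return self._pushed[path]           # served from the cache


store = ToyOperationalStore()
calls: List[str] = []


def read_temperature() -> float:
    calls.append("read")  # stands in for a slow hardware access
    return 41.5


store.register_pull("/demo:ocm/temperature", read_temperature)  # hypothetical path
store.push("/demo:edfa/input-power", -18.2)                     # hypothetical path

store.get("/demo:ocm/temperature")
store.get("/demo:ocm/temperature")
print(len(calls), store.get("/demo:edfa/input-power"))
```

Each read of the pulled path reaches the application, whereas the pushed value is returned without involving the application at all; only the pushed data give the datastore something to compare against when generating change notifications.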

While the application programming interface (API) used is the same for all the datastores, the functionality differs significantly. The operational datastore is a complex datastore whose final content depends on several conditions, including the dynamic binding of applications and their respective callbacks to YANG subtrees. The consequence is that push operational data are not stored directly as YANG data within sysrepo but rather as the difference between the current data and the desired data. Moreover, whenever the current data change, these stored operational data must be updated to reflect these changes. Despite the complexities, there are advantages to using the push approach. Since the operational data are stored, the application must update them only when they change; otherwise, any further use is managed fully by sysrepo transparently for the application. Also, given that sysrepo is fully aware of the data life cycle and their changes, it is possible to propagate notifications about data changes, possibly using standard protocols, and that is indeed how we implemented the streaming telemetry.

The scope of the sysrepo APIs is internal to the devices, and it predates standardization of various YANG-level protocols for data change handling (cf. Section 3). This protocol independence is intentional because it allows for several such servers to be connected to sysrepo simultaneously while handling only the functionality relevant for the protocol in question.

B. NETCONF Server: netopeer2

netopeer2 [78] is an open-source NETCONF [64] server, which utilizes the libnetconf2 library for all NETCONF-related work, built on top of sysrepo. Internally, netopeer2 connects to sysrepo as any other client, setting up the required callbacks and invoking the APIs in response to incoming NETCONF protocol messages.

Full support for YANG Push, including the dynamic filtering of subscriptions, subscribed notifications, and ad hoc notification channels, is being actively worked on by the authors.

C. RESTCONF Server: rousette

While the netopeer2 project implements a NETCONF server, we started the rousette project to provide a RESTCONF interface to sysrepo. As of March 2021, the project is not RESTCONF-compliant yet, and only the practical minimum for basic telemetry streaming has been implemented. In particular, the server can retrieve subtrees of YANG data in the JSON format, and there is a predefined endpoint that provides a continuous stream of text/event-stream [67] responses for changes in the DWDM-related parameters. The returned data follow the push-update mechanism as defined in YANG Push (Section 3.A). Rousette is an open-source project [79] implemented in C++17 using an asynchronous HTTP/2 server library [80].
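Consuming such a stream requires only a small parser on the client side. The sketch below splits a text/event-stream body into events and decodes the JSON carried in the data fields. The sample payload and its `example:power` key are illustrative assumptions rather than actual rousette output, and handling of the event, id, and retry fields is omitted.

```python
import json
from typing import Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each Server-Sent Event found in `lines`."""
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield json.loads("\n".join(data))
                data = []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            data.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.


# Hypothetical stream content; a real client would iterate over the
# lines of a long-lived HTTP response instead.
sample = [
    ": keep-alive\n",
    'data: {"ietf-yang-push:push-update": {\n',
    'data:   "datastore-contents": {"example:power": -3.2}}}\n',
    "\n",
]
events = list(parse_sse(sample))
```

Multiple data lines belonging to one event are joined with newlines before decoding, which is why a pretty-printed JSON document spanning several lines is still parsed as a single update.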

Fig. 9. YANG tree of the ROADM device model. All nodes support streaming telemetry updates. Some identifiers were shortened.

D. Application Code

Changes that are requested within a YANG datastore are handled by sysrepo and eventually handed over to the application code that is running in the NE. The application runs domain-specific code (pp. 6–8 in [21]), typically with low-level drivers that are specific to the optical components used in a particular device model. For example, our ROADMs support three different OCM models, WSSes, and EDFAs.

Adding support for high-resolution telemetry (cf. Section 3.B) posed a set of challenges for the whole YANG/NETCONF/RESTCONF software stack. Serialization of list and leaf-list data as used for spectrum measurement was of particular interest. Spectrum measurement involves transfers of vectors rather than scalars (cf. Section 2.B), and we analyzed the performance with vectors of 20,000 items. Our profiling data [81] showed that an end-to-end latency of about 0.7 s was realistic with list instances, whereas the leaf-list structure (which actually carries less data) initially required 3.4 s. We removed several bottlenecks in the process, and we eventually identified a problem common to many YANG libraries that attempt to store differences of YANG data. Unlike a typical use case for a leaf-list in device configuration, which must only contain unique values, vectors of measurement results might contain duplicates. Upon pushing of revised operational data (or, in the case of other libraries, also when receiving the result), the YANG software stack would helpfully try to compute an incremental update of two large vectors. Apart from having an $\mathcal{O}(n^2)$ time complexity, this computation would often report an incorrect result because duplicate values were accidentally deduplicated.
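The failure mode can be reproduced with a toy example. The `naive_leaf_list_diff` function below is a hypothetical illustration, not code from libyang or sysrepo: it identifies leaf-list entries by value, which is valid for configuration data with unique values, and it scans the other vector once for each item.

```python
from typing import List, Tuple


def naive_leaf_list_diff(
    old: List[float], new: List[float]
) -> Tuple[List[float], List[float]]:
    """Return (removed, added), assuming every value occurs at most once.

    Each `in` test scans a whole list, so the cost grows as O(n^2).
    """
    removed = [v for v in old if v not in new]
    added = [v for v in new if v not in old]
    return removed, added


def apply_diff(
    old: List[float], removed: List[float], added: List[float]
) -> List[float]:
    """Rebuild the receiver-side vector from the previous one and the diff."""
    kept = [v for v in old if v not in removed]
    return kept + added


# Two spectrum scans that share the same set of values but differ
# in how often and where those values occur.
old_scan = [-20.0, -20.0, -3.5, -20.0]
new_scan = [-20.0, -3.5, -3.5, -20.0]

removed, added = naive_leaf_list_diff(old_scan, new_scan)
reconstructed = apply_diff(old_scan, removed, added)
```

Here both `removed` and `added` come out empty, so a receiver that applies the diff keeps the stale scan even though the measured vector has changed.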

We solved this with a switch to the anydata YANG construct (Figs. 9 and 10). Upon processing the OCM scan result, the application code prepares a pre-serialized array of values that is suitable for transmission over RESTCONF. The downside is that we cannot use the YANG model to enforce, e.g., the type of the individual array items or their allowed numeric range. At the same time, we effectively bypass any bottleneck in the associated YANG libraries, with the resulting data push consuming about 10 ms. The numbers were measured on a modern, high-performance CPU (AMD Ryzen 7 PRO 4750U) (Fig. 11). When ported to the actual ROADMs that use an embedded system with an ARM system on chip (SoC), Marvell Armada 388 88F6828, a dual-core Cortex A9, we observed latency of roughly 80 ms.
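The idea of pre-serialization can be sketched as follows. The function name and the fixed two-decimal formatting are assumptions made for illustration; the actual application code and its number formatting may differ.

```python
import json
from typing import List


def preserialize_scan(powers_dbm: List[float]) -> str:
    """Serialize a whole spectrum scan into one JSON array string.

    The result is handed over as an opaque blob, so no per-item
    tree nodes need to be created, validated, or diffed.
    """
    return "[" + ",".join(f"{p:.2f}" for p in powers_dbm) + "]"


# Synthetic scan with 20,000 points, matching the vector size used above.
scan = [-30.0 + 0.001 * i for i in range(20_000)]
blob = preserialize_scan(scan)

# A consumer recovers the vector with an ordinary JSON parser.
decoded = json.loads(blob)
```

The trade-off described above is visible here: nothing in the blob itself prevents a malformed or out-of-range item, so any validation has to happen in the producer or the consumer rather than in the YANG layer.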

Fig. 10. HTTP telemetry stream in the YANG Push push-change format. The JSON payload was reformatted to increase readability. Some data were omitted for brevity.

Fig. 11. Relative performance of list, leaf-list, and anydata in sysrepo when pushing 20,000 items.

Replacing the YANG list with the anydata statement is not a final solution; the loss of data validation and the lack of a machine-readable structure description in particular are significant drawbacks. There is clearly potential for optimization, including a more aggressive grouping of memory allocation for adjacent data tree nodes. However, even after the bottlenecks in libyang and sysrepo are addressed, a similar performance issue can be expected at clients that are YANG-aware to the extent of understanding push updates. For example, any software that attempts to compute diffs of YANG data trees is prone to hitting an $\mathcal{O}(n^2)$ code path.

5. CONSUMING THE DATA STREAM

Generating telemetry data in the NEs is just the first part of the story; the telemetry data need to be retrieved and processed. We acknowledge that there are significant opportunities for applying artificial intelligence and machine learning (AI/ML) in the optical domain [4,8,36]; these applications are, however, outside the scope of this paper.

In our demonstration at OFC 2020 [72], we used a web dashboard for visualizing live spectrum data from eight ROADM modules. In this section we explore how feasible it is to replace this bespoke code with a set of off-the-shelf software that builds on the streaming telemetry. In particular, we investigate plugging the telemetry streams with sub-second refresh rate into a time series database (TSDB) to produce graphs that visualize the C-band spectrum on a live web dashboard. We also demonstrate how we integrated the streaming telemetry support into an SDN controller.

A. Streaming Telemetry in ONOS

The Open Network Operating System (ONOS) [82] is one of the leading SDN controllers, and it has gained remarkable popularity among researchers working in the optical domain thanks to its Open Disaggregated Transport Network (ODTN) [83] project. ODTN follows an operator-driven process, and thanks to ONOS being an open-source project, interested parties can contribute code, ideas, use-cases, and demos.

The initial version of the device driver for our ROADMs relied on NETCONF exclusively. NETCONF was used for device discovery, MC provisioning, and obtaining power readouts on devices’ ports. This meant that the optical power was obtained on-demand via a separate NETCONF protocol transaction.

For this work we contributed [84] a patch that turns ONOS into a telemetry receiver. When the optical power data are first requested, the device driver opens a long-lived HTTP connection toward the ROADM and processes the YANG data in the YANG Push format (cf. Section 3.A) directly. During the course of the implementation, we hit some technical issues. In particular, it appears that ONOS currently assumes that each device only ever talks one protocol, and this assumption is reflected prominently in the DeviceId internal device identifier. We, therefore, had to bypass the RESTCONF driver support, implement the HTTP connection, and parse Server-Sent Events directly.

A visual overview of the just-updated snapshot of power levels is shown in Fig. 12. The currently released version of ONOS does not directly utilize the optical power metrics for internal purposes. Once ODTN gets extended with, e.g., ROADM channel power equalization, having access to a stable stream of frequently updated telemetry data will improve the robustness of the control algorithm.

Fig. 12. Live telemetry data are shown in ONOS port view.

B. Time Series Database

A TSDB is a special-purpose database that is optimized to store a series of values that change over time. A well-configured TSDB can store tens of millions of individual metrics over time, with progressively degrading time resolution as the data records age. TSDBs have evolved since the time of round-robin databases (RRDs) [85], with Prometheus and InfluxDB appearing to have gained significant traction—especially in the world of containerized services and cloud infrastructure. Given our focus on sub-second telemetry streaming, we chose VictoriaMetrics [86], a relatively new contender that preserves API compatibility with PromQL [87], a popular querying language, as well as the import and export APIs of Prometheus.

A typical modern TSDB supports OpenMetrics [88] as a data ingest format, while our devices offer an IETF-aligned YANG Push (Section 3.A) telemetry stream. OpenMetrics, a de facto standard for telemetry scraping, can be adopted as the output format of the NEs; however, there is currently no standard for mapping YANG-formatted telemetry updates into OpenMetrics keys. As a simplification, we implemented a translating agent in 99 lines [89] of Python code. Our translator connects to a number of devices, listens for YANG Push updates, extracts a set of chosen telemetry values, and feeds them into an OpenMetrics-compliant data collector.
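Since no standard mapping exists, any translation is a matter of convention. The following sketch flattens the datastore contents of one push-update into OpenMetrics-style text lines. The notification layout, the `example-roadm` path names, the `device` label, and the metric naming scheme are assumptions for illustration and are not taken from the translator in [89].

```python
import re
from typing import Dict, Iterator, List, Tuple


def flatten(node: object, prefix: str = "") -> Iterator[Tuple[str, float]]:
    """Walk nested dicts and yield (path, value) for numeric leaves."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, f"{prefix}/{key}")
    elif isinstance(node, (int, float)) and not isinstance(node, bool):
        yield prefix, float(node)


def to_openmetrics(notification: Dict, device: str) -> List[str]:
    """Convert one push-update into 'name{labels} value' lines."""
    contents = notification["ietf-yang-push:push-update"]["datastore-contents"]
    lines = []
    for path, value in flatten(contents):
        # Replace every character outside [a-zA-Z0-9_] with an underscore
        # so that the YANG path becomes a conservative metric name.
        name = re.sub(r"[^a-zA-Z0-9_]", "_", path.strip("/"))
        lines.append(f'{name}{{device="{device}"}} {value}')
    return lines


# Hypothetical update with two power readings.
update = {
    "ietf-yang-push:push-update": {
        "id": 1,
        "datastore-contents": {
            "example-roadm:line": {"input-power": -12.5, "output-power": 1.0}
        },
    }
}
metrics = to_openmetrics(update, device="roadm-1")
```

Deriving the metric name from the full YANG path keeps the mapping reversible by convention, at the cost of long names; a production mapping would more likely use a curated table of the chosen telemetry values.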

C. Visualization

As a graphical front-end we selected Grafana [90], which is a platform for data visualization through dashboards and metrics. Grafana utilizes a range of plugins for working with time series and tabular data. Accessing data via VictoriaMetrics is natively supported via the PromQL back-end. The concept of plugins is also used for data presentation. Grafana ships with a set of bundled plugins for visualization of charts, lists, gauge monitors, pie charts, tables, and maps. There are third-party plugins available via an online store, mainly focusing on highly specific kinds of data. Some plugins offload rendering to a third-party JavaScript library such as Plotly [91]. We utilized the natel-plotly-panel [92], which shows a scatter plot of the most recent snapshot of the spectrum.

The result is shown in Fig. 13 (with Fig. 4 shown earlier also rendered via Grafana). We deliberately aimed for a graphical layout similar to our earlier approach, which was implemented via on-device web dashboards (Fig. 4 [72]). We were able to visualize the spectrum, but the sheer number of individual time series metrics required for each graph is probably already slightly beyond the practical limit [93] of today’s Grafana. It is important to emphasize, though, that we have hit a limit of the graphical visualization tool and not the limit of the telemetry processor itself.

Fig. 13. Optical spectra from a live telemetry stream, as visualized via Grafana.

The current state of spectrum visualization is adequate as a static snapshot, but it presents ample opportunities for improvement. Apart from rendering performance, a welcome improvement would be showing the history of the data. The standard approach for spectrum visualization is spectrograms, but an even simpler visualization with persistent spectrum would be a significant improvement [94]. However, a persistent spectrum plot requires essentially a regular histogram (or a time-decaying histogram) per underlying time series, i.e., one histogram per pixel column in the plot. Grafana’s support for histograms (Heatmap) appears rather limited for this purpose as the horizontal resolution of our spectrum is up to 15,600 data sets per measurement point.
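A time-decaying histogram of this kind is simple to state even though it is costly to render at full resolution. The sketch below keeps one row of power bins per frequency point and multiplies all counts by a decay factor before adding each new scan. The class name, the bin edges, and the decay constant are illustrative assumptions, not part of our implementation.

```python
from typing import List


class PersistentSpectrum:
    """Time-decaying 2D histogram: one row of power bins per frequency point."""

    def __init__(self, n_points: int, p_min: float, p_max: float,
                 n_bins: int, decay: float = 0.9) -> None:
        self.p_min = p_min
        self.p_max = p_max
        self.n_bins = n_bins
        self.decay = decay
        self.counts: List[List[float]] = [
            [0.0] * n_bins for _ in range(n_points)
        ]

    def _bin(self, power: float) -> int:
        frac = (power - self.p_min) / (self.p_max - self.p_min)
        # Clamp out-of-range readings into the first or last bin.
        return min(self.n_bins - 1, max(0, int(frac * self.n_bins)))

    def add_scan(self, scan: List[float]) -> None:
        for row, power in zip(self.counts, scan):
            for i in range(self.n_bins):
                row[i] *= self.decay  # older scans fade out
            row[self._bin(power)] += 1.0


# Three frequency points, four 10 dB wide power bins from -40 to 0 dBm.
ps = PersistentSpectrum(n_points=3, p_min=-40.0, p_max=0.0, n_bins=4)
ps.add_scan([-35.0, -5.0, -5.0])
ps.add_scan([-35.0, -5.0, -25.0])
```

After the two scans, the third frequency point holds a faded count in its highest bin and a fresh count in a lower one, which is the effect that a persistent spectrum display renders as a dimmer trace behind the current one.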

6. CONCLUSION

We have extended our ROADM designs with native support for streaming telemetry of all measured parameters. Of particular interest are high-resolution data from spectrum scans, which now cover the C-band at a resolution of 312.5 MHz. The ROADMs deliver their measurements over an IETF-aligned notification stream that follows the YANG Push specification. We kept our focus on performance, and as a result, the measurements are stored into a standard TSDB at a higher rate than once per second. The resulting telemetry is processed via an open-source, high-performance TSDB software stack, and the optical spectra are visualized in real time via a common, off-the-shelf plotting tool, as well as passed to the ONOS SDN controller.

To the best of our knowledge, this is the first time that a sub-second streaming of spectrum data from ROADMs has been practically demonstrated—and we have achieved this milestone with open hardware and a largely open-source software stack.

Visualization of these high-volume data has opportunities for enhancement, and we identified performance bottlenecks in the plotting components. However, we observed no scalability issues in the telemetry streams, metric transport, storage, and analysis, and as such, we did not have to resort to any throttling of metric update rates [39,95] at the devices themselves.

As shown in the existing literature, the collected data provide significant opportunities for AI/ML-driven analysis. Within the YANG middleware, we plan to further optimize the change processing performance, explore more efficient wire protocols, and support finer-grained, dynamic event selection. Additional topics worth exploring include usage of telemetry within the optical domain SDN controller for further network optimization and ROADM control.

Funding

Ministerstvo školství, mládeže a tělovýchovy (LM2018140); Telecom Infra Project (OOPT-PSE).

Acknowledgment

We would like to thank Andrea Campanella, Jan Kofroň, Jan Růžička, and Jakub Mer for their valuable contribution toward this article.

REFERENCES

1. C. Xie, L. Wang, L. Dou, M. Xia, S. Chen, H. Zhang, Z. Sun, and J. Cheng, “Open and disaggregated optical transport networks for data center interconnects [Invited],” J. Opt. Commun. Netw. 12, C12–C22 (2020).

2. T. Monteiro, “Driving blind? See the light in your optical network using software and automation,” 2020, https://www.infinera.com/blog/driving-blind-see-the-light-in-your-optical-network-using-software-and-automation/tag/software-and-automation/.

3. A. Ferrari, M. Filer, K. Balasubramanian, Y. Yin, E. L. Rouzic, J. Kundrát, G. Grammel, G. Galimberti, and V. Curri, “GNPy: an open source application for physical layer aware open optical networks,” J. Opt. Commun. Netw. 12, C31–C40 (2020).

4. A. P. Vela, B. Shariati, M. Ruiz, F. Cugini, A. Castro, H. Lu, R. Proietti, J. Comellas, P. Castoldi, S. J. B. Yoo, and L. Velasco, “Soft failure localization during commissioning testing and lightpath operation,” J. Opt. Commun. Netw. 10, A27–A36 (2018).

5. A. Sadasivarao, S. Jain, S. Syed, K. Pithewan, P. Kantak, B. Lu, and L. Paraschis, “High performance streaming telemetry in optical transport networks,” in Optical Fiber Communication Conference (Optical Society of America, 2018), paper Tu3D.3.

6. A. Sgambelluri, J.-L. Izquierdo-Zaragoza, A. Giorgetti, L. Gifre, L. Velasco, F. Paolucci, N. Sambo, F. Fresi, P. Castoldi, A. C. Piat, R. Morro, E. Riccardi, A. D’Errico, and F. Cugini, “Fully disaggregated ROADM white box with NETCONF/YANG control, telemetry, and machine learning-based monitoring,” in Optical Fiber Communication Conference (Optical Society of America, 2018), paper Tu3D.12.

7. R. Martínez, R. Casellas, J. M. Fabrega, R. Vilalta, R. M. Noz, L. Nadal, M. S. Moreolo, A. Villafranca, and P. Sevillano, “Experimental validation of transport SDN restoration of signal-degraded connections in flexi-grid networks,” in Optical Fiber Communication Conference (Optical Society of America, 2018), paper M3A.6.

8. K. S. Mayer, J. A. Soares, R. P. Pinto, C. E. Rothenberg, D. S. Arantes, and D. A. A. Mello, “Soft failure localization using machine learning with SDN-based network-wide telemetry,” in European Conference on Optical Communications (ECOC) (2020).

9. A. Sgambelluri, A. Giorgetti, D. Scano, F. Cugini, and F. Paolucci, “OpenConfig and OpenROADM automation of operational modes in disaggregated optical networks,” IEEE Access 8, 190094–190107 (2020).

10. L. Paraschis, H. Bock, A. S. Sadasivarao, S. Syed, B. Sommerkorn-Krombholz, J. Rahn, B. Lu, J. Pedro, P. Doolan, and P. Kandappan, “System innovations in open WDM DCI networks,” Photon. Netw. Commun. 40, 269–280 (2020).

11. S. Xu, Y. Hirota, M. Shiraiwa, M. Tornatore, S. Ferdousi, Y. Awaji, N. Wada, and B. Mukherjee, “Emergency OPM recreation and telemetry for disaster recovery in optical networks,” J. Lightwave Technol. 38, 2656–2668 (2020).

12. T. Tanaka, S. Kuwabara, T. Oda, K. Kitamura, F. Inuzuka, and T. Inui, “Advances toward AI-assisted autonomous network diagnosis,” in Asia Communications and Photonics Conference (ACPC) (Optical Society of America, 2019), paper S4C.2.

13. M. Dallaglio, Q. P. Van, F. Boitier, C. Delezoide, D. Verchere, P. Layec, A. Dupas, N. Sambo, S. Bigo, and P. Castoldi, “Demonstration of a SDN-based spectrum monitoring of elastic optical networks,” in Optical Fiber Communication Conference (Optical Society of America, 2017), paper Tu3L.5.

14. K. Ishii, S. Yanagimachi, A. Tajima, and S. Namiki, “Submilisecond control/monitoring of disaggregated optical node through a direct memory access based architecture,” in Optical Fiber Communication Conference (OFC) (Optical Society of America, 2019), paper Tu3H.5.

15. O. F. Yilmaz, S. St-Laurent, and M. Mitchell, “Automated management and control of a multi-vendor disaggregated network at the L0 layer,” in Optical Fiber Communication Conference (Optical Society of America, 2018), paper Tu3D.9.

16. M. Yu, “Network telemetry: towards a top-down approach,” SIGCOMM Comput. Commun. Rev. 49, 11–17 (2019).

17. L. Tan, W. Su, W. Zhang, J. Lv, Z. Zhang, J. Miao, X. Liu, and N. Li, “In-band network telemetry: a survey,” Comput. Netw. 186, 107763 (2021).

18. B. Niu, J. Kong, S. Tang, Y. Li, and Z. Zhu, “Visualize your IP-over-optical network in realtime: a P4-based flexible multilayer in-band network telemetry (ML-INT) system,” IEEE Access 7, 82413–82423 (2019).

19. K. Christodoulopoulos, P. Kokkinos, A. Di Giglio, A. Pagano, N. Argyris, C. Spatharakis, S. Dris, H. Avramopoulos, J. Antona, C. Delezoide, P. Jennevé, J. Pesic, Y. Pointurier, N. Sambo, F. Cugini, P. Castoldi, G. Bernini, G. Carrozzo, and E. Varvarigos, “ORCHESTRA–optical performance monitoring enabling flexible networking,” in 17th International Conference on Transparent Optical Networks (ICTON) (2015).

20. J. Kundrát, J. Vojtěch, P. Škoda, R. Vohnout, J. Radil, and O. Havliš, “YANG/NETCONF ROADM: evolving open DWDM toward SDN applications,” J. Lightwave Technol. 36, 3105–3114 (2018).

21. J. Kundrát, O. Havliš, J. Jedlinský, and J. Vojtěch, “Opening up ROADMs: let us build a disaggregated open optical line system,” J. Lightwave Technol. 37, 4041–4051 (2019).

22. J. Kundrát, O. Havliš, J. Radil, J. Jedlinský, and J. Vojtěch, “Opening up ROADMs: a filterless add/drop module for coherent-detection signals,” J. Opt. Commun. Netw. 12, C41–C49 (2020).

23. “Czech Light open line system,” 2019, https://czechlight.cesnet.cz/en/open-line-system/sdn-roadm.

24. “Architecture of optical transport networks,” ITU-T Recommendation G.872, 2012.

25. F. Paolucci and A. Sgambelluri, “Telemetry in disaggregated optical networks,” in International Conference on Optical Network Design and Modeling (ONDM) (2020).

26. D. C. Kilper, R. Bach, D. J. Blumenthal, D. Einstein, T. Landolsi, L. Ostar, M. Preiss, and A. E. Willner, “Optical performance monitoring,” J. Lightwave Technol. 22, 294–304 (2004).

27. M. Šlapák, J. Vojtěch, O. Havliš, and R. Slavík, “Monitoring of fibre optic links with a machine learning-assisted low-cost polarimeter,” IEEE Access 8, 183965–183971 (2020).

28. M. Filer and S. Tibuleac, “N-degree ROADM architecture comparison: broadcast-and-select versus route-and-select in 120 Gb/s DP-QPSK transmission systems,” in Optical Fiber Communication Conference (OFC) (Optical Society of America, 2014), paper Th1I.2.

29. L. Alahdab, E. Le Rouzic, C. Ware, J. Meuric, A. Triki, J.-L. Augé, and T. Marcot, “Alien wavelengths over optical transport networks,” J. Opt. Commun. Netw. 10, 878–888 (2018).

30. H. Wessing, P. Skoda, M. N. Petersen, A. Pilimon, P. Rydlichowski, G. Roberts, R. Smets, J. Radil, J. Vojtech, R. Lund, Z. Zhou, C. Tziouvaras, and K. Bozorgebrahimi, “Alien wavelengths in national research and education network infrastructures based on open line systems: challenges and opportunities,” J. Opt. Commun. Netw. 11, 118–129 (2019).

31. C. Tian and S. Kinoshita, “Analysis and control of transient dynamics of EDFA pumped by 1480- and 980-nm lasers,” J. Lightwave Technol. 21, 1728–1734 (2003).

32. H. S. Carvalho, I. J. G. Cassimiro, F. H. C. S. Filho, J. R. F. de Oliveira, and A. C. Bordonalli, “AGC EDFA transient suppression algorithm assisted by cognitive neural network,” in International Telecommunications Symposium (ITS) (2014).

33. H. Rosenfeldt, I. Clarke, S. Frisken, G. Dash, X. Huang, H. Li, W. Cui, J. Zhang, J. Chen, Z. Kong, and S. Poole, “Miniaturized heterodyne channel monitor with tone detection,” in Optical Fiber Communication Conference (Optical Society of America, 2015), paper W4D.7.

34. “Flexgrid high resolution optical channel monitor (OCM) FOCM01FXC1MN,” https://ii-vi.com/product/high-resolution-optical-channel-monitor-focm-series/.

35. Z. Dong, F. N. Khan, Q. Sui, K. Zhong, C. Lu, and A. P. T. Lau, “Optical performance monitoring: a review of current and future technologies,” J. Lightwave Technol. 34, 525–543 (2016).

36. B. Shariati, M. Ruiz, J. Comellas, and L. Velasco, “Learning from the optical spectrum: failure detection and identification,” J. Lightwave Technol. 37, 433–440 (2019).

37. F. Paolucci, A. Sgambelluri, F. Cugini, and P. Castoldi, “Network telemetry streaming services in SDN-based disaggregated optical networks,” J. Lightwave Technol. 36, 3142–3149 (2018).

38. B. Bullers, “True unmanned telemetry collection using OC-12 network data forwarding,” in International Telemetering Conference Proceedings (2003).

39. A. Sadasivarao, S. Syed, D. Panda, P. Gomes, R. Rao, J. Buset, L. Paraschis, J. Brar, and K. Raj, “Demonstration of extensible threshold-based streaming telemetry for open DWDM analytics and verification,” in Optical Fiber Communication Conference (OFC) (Optical Society of America, 2020), paper M3Z.5.

40. D. Josephsen, “iVoyeur: rediscovering collectd,” ;login: 39, 52–54 (2014).

41. L. Gardi, “Hardware monitoring with collectd,” Tech. Rep. (2018).

42. J. Kundrát, M. Adam, D. Adamová, J. Chudoba, T. Kouba, M. Lokajček, A. Mikula, V. Říkal, J. Švec, and R. Vohnout, “Grids and clouds in the Czech NGI,” Phys. Part. Nucl. Lett. 13, 669–671 (2016).

43. D. Plonka, A. Gupta, and D. Carder, “Application buffer-cache management for performance: running the world’s largest MRTG,” in Large Installation System Administration Conference (LISA) (2007), pp. 63–78.

44. M. Bjorklund, “The YANG 1.1 data modeling language,” RFC 7950 (RFC Editor, 2016).

45. B. Claise, J. Clarke, and J. Lindblad, Network Programmability with YANG: The Structure of Network Automation with YANG, NETCONF, RESTCONF, and gNMI (Addison-Wesley Professional, 2019).

46. R. Vilalta, C. Manso, N. Yoshikane, R. Muñoz, R. Casellas, R. Martínez, T. Tsuritani, and I. Morita, “Telemetry-enabled cloud-native transport SDN controller for real-time monitoring of optical transponders using gNMI,” in European Conference on Optical Communications (ECOC) (2020).

47. A. Shaikh and J. George, SDN in the Management Plane: OpenConfig and Streaming Telemetry (2015).

48. A. Shaikh, T. Hofmeister, V. Dangui, and V. Vusirikala, “Vendor-neutral network representations for transport SDN,” in Optical Fiber Communication Conference (Optical Society of America, 2016), paper Th4G.3.

49. R. Shakir and A. Shaikh, “Observations on modelling configuration and state in YANG,” in Routing Area Working Group Session at IETF’98, Chicago, 2017, https://datatracker.ietf.org/meeting/98/materials/slides-98-rtgwg-open-config-modeling-and-observations-00.

50. “OpenConfig issue 44: wrong pattern in openconfig files,” 2017, https://github.com/openconfig/public/issues/44.

51. “Regular Expressions, IEEE Std 1003.1-2017 (revision of IEEE Std 1003.1-2008)—IEEE standard for information technology—portable operating system interface (POSIX) base specifications, issue 7,” 2018, https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html.

52. “Why doesn’t my OpenConfig YANG module work with netconfd-pro or yangcli-pro?” 2018, https://yumaworks.freshdesk.com/support/solutions/articles/1000254022-why-doesn-t-my-openconfig-yang-module-work-with-netconfd-pro-or-yangcli-pro-.

53. “CZ-NIC/yangson: honor regexp-posix in openconfig models,” 2019, https://github.com/CZ-NIC/yangson/issues/23.

54. “p4lang/PI: Remove regex anchors when importing OpenConfig YANG,” 2018, https://github.com/p4lang/PI/commit/109105775aaff9e5eadb67dead05e74597aa59ec.

55. R. Shakir, A. Shaikh, and M. Hines, “Consistent modeling of operational state data in YANG,” Internet-Draft draft-openconfig-netmod-opstate-01 (IETF Secretariat, 2015).

56. M. Björklund and L. Lhotka, “Operational data in NETCONF and YANG,” Internet-Draft draft-bjorklund-netmod-operational-00 (IETF Secretariat, 2012).

57. M. Björklund, J. Schönwälder, P. Shafer, K. Watsen, and R. Wilton, “Network management datastore architecture (NMDA),” RFC 8342 (RFC Editor, 2018).

58. B. Claise, “Model-driven Telemetry: IETF YANG Push and/or OpenConfig Streaming Telemetry?” 2020, https://www.claise.be/model-driven-telemetry-ietf-yang-push-and-or-openconfig-streaming-telemetry/.

59. E. Voit, A. Clemm, and A. G. Prieto, “Requirements for subscription to YANG datastores,” RFC 7923 (RFC Editor, 2016).

60. E. Voit, A. Clemm, A. G. Prieto, E. Nilsen-Nygaard, and A. Tripathy, “Subscription to YANG notifications,” RFC 8639 (RFC Editor, 2019).

61. A. Clemm and E. Voit, “Subscription to YANG notifications for datastore updates,” RFC 8641 (RFC Editor, 2019).

62. E. Voit, A. Clemm, A. G. Prieto, E. Nilsen-Nygaard, and A. Tripathy, “Dynamic subscription to YANG events and datastores over NETCONF,” RFC 8640 (RFC Editor, 2019).

63. E. Voit, R. Rahman, E. Nilsen-Nygaard, A. Clemm, and A. Bierman, “Dynamic subscription to YANG events and datastores over RESTCONF,” RFC 8650 (RFC Editor, 2019).

64. R. Enns, M. Bjorklund, J. Schoenwaelder, and A. Bierman, “Network configuration protocol (NETCONF),” RFC 6241 (RFC Editor, 2011).

65. A. Bierman, M. Bjorklund, and K. Watsen, “RESTCONF protocol,” RFC 8040 (RFC Editor, 2017).

66. A. Bierman, M. Bjorklund, and K. Watsen, “YANG patch media type,” RFC 8072 (RFC Editor, 2017).

67. I. Hickson, “Server-sent events,” W3C Recommendation (W3C, 2015), https://www.w3.org/TR/2015/REC-eventsource-20150203/.

68. M. Jethanandani and K. Watsen, “An HTTPS-based transport for configured subscriptions,” Internet-Draft draft-ietf-netconf-https-notif-07 (IETF Secretariat, 2021).

69. C. Bormann and P. Hoffman, “Concise binary object representation (CBOR),” STD 94 (RFC Editor, 2020).

70. M. Veillette, I. Petrov, and A. Pelov, “CBOR encoding of data modeled with YANG,” Internet-Draft draft-ietf-core-yang-cbor-15 (IETF Secretariat, 2021).

71. G. Zheng, T. Zhou, T. Graf, P. Francois, and P. Lucente, “UDP-based transport for configured subscriptions,” Internet-Draft draft-ietf-netconf-udp-notif-01 (IETF Secretariat, 2020).

72. J. Kundrát, A. Campanella, E. Le Rouzic, A. Ferrari, O. Havliš, M. Hažlinský, G. Grammel, G. Galimberti, and V. Curri, “Physical-layer awareness: GNPy and ONOS for end-to-end circuits in disaggregated networks,” in Optical Fiber Communication Conference (OFC) (Optical Society of America, 2020), paper M3Z.17.

73. “How do I delete all list or leaf-list entries at once in NETCONF?” 2020, https://yumaworks.freshdesk.com/support/solutions/articles/1000254811-how-do-i-delete-all-list-or-leaf-list-entries-at-once-in-netconf-.

74. “pyangbind issue 179: Leaf-list stores multiple duplicate values,” 2018, https://github.com/robshakir/pyangbind/issues/179.

75. D. Rafique and L. Velasco, “Machine learning for network automation: overview, architecture, and applications [Invited Tutorial],” J. Opt. Commun. Netw. 10, D126–D143 (2018).

76. J. Volz, “Pull doesn’t scale—or does it?” 2016, https://prometheus.io/blog/2016/07/23/pull-does-not-scale-or-does-it/.

77. CESNET, “Sysrepo: storing and managing YANG-based configurations for UNIX/Linux applications,” https://www.sysrepo.org/.

78. CESNET, “netopeer2: Netopeer2–NETCONF Server,” 2021, https://github.com/CESNET/Netopeer2/.

79. CESNET, “rousette: An almost-RESTCONF server,” 2021, https://github.com/CESNET/rousette.

80. T. Tsujikawa, “Nghttp2: HTTP/2 C and C++ library,” 2015, https://nghttp2.org/.

81. “sysrepo issue 2288: Pushing 20k items into a leaf-list in the operational DS,” https://github.com/sysrepo/sysrepo/issues/2288.

82. P. Berde, M. Gerola, J. Hart, Y. Higuchi, M. Kobayashi, T. Koide, B. Lantz, B. O’Connor, P. Radoslavov, W. Snow, and G. Parulkar, “ONOS: towards an open, distributed SDN OS,” in Proceedings of the 3rd Workshop on Hot Topics in Software Defined Networking (HotSDN) (Association for Computing Machinery, 2014).

83. A. Campanella, B. Yan, R. Casellas, A. Giorgetti, V. Lopez, Y. Zhao, and A. Mayoral, “Reliable optical networks with ODTN: resiliency and fail-over in data and control planes,” J. Lightwave Technol. 38, 2755–2764 (2020).

84. J. Kundrát, “ONOS Gerrit Change 24455: CzechLight: Handle incoming streaming telemetry,” 2021, https://gerrit.onosproject.org/c/onos/+/24455.

85. T. Oetiker, “Monitoring your IT gear: the MRTG story,” IT Prof. 3, 44–48 (2001).

86. “Victoria metrics: the aspiring open source monitoring solution,” 2018, https://victoriametrics.com/.

87. N. Sabharwal and P. Pandey, Working with Prometheus Query Language (PromQL) (Apress, 2020), pp. 141–167.

88. R. Hartmann, B. Kochie, B. Brazil, and R. Skillington, “OpenMetrics, a cloud-native, highly scalable metrics protocol,” Internet-Draft draft-richih-opsawg-openmetrics-00 (IETF Secretariat, 2020).

89. CESNET, “rupicapra: Reading the YANG push telemetry stream,” 2021, https://github.com/CESNET/rupicapra.

90. “Grafana—The analytics platform,” 2021, https://grafana.com/grafana/.

91. “Plotly: The front end for ML and data science models,” 2021, https://plotly.com.

92. Natel Energy, “Plot.ly Panel for Grafana,” 2019, https://grafana.com/grafana/plugins/natel-plotly-panel/.

93. R. Callon, “The twelve networking truths,” RFC 1925 (RFC Editor, 1996).

94. B. Zarlingo, “Analyze agile or elusive signals using real-time measurement and triggering,” Agilent Technologies Inc., 2013, https://www.keysight.com/us/en/assets/9018-50150/training-materials/9018-50150.pdf.

95. T. Tanaka, S. Kuwabara, H. Nishizawa, T. Inui, S. Kobayashi, and A. Hirano, “Field demonstration of real-time optical network diagnosis using deep neural network and telemetry,” in Optical Fiber Communication Conference (OFC) (Optical Society of America, 2019), paper Tu2E.5.
