Introduction

Last modified by Kevin Austin on 2018/11/27 04:10

This document describes the design of OpenKilda, including its subsystems and how they interact.

kilda_block_diagram.png

Apache Storm

OpenKilda uses Apache Storm as a real-time correlation engine that monitors different events to determine whether a flow, or an end-to-end service, requires a topology change. With Apache Storm, you can correlate multiple data points to detect a real topology change and then try to reroute the affected flows.

A Storm topology is a collection of spouts and bolts. Spouts inject tuples (data points) into a topology of bolts, and bolts are where the information is processed. Bolts are similar in concept to microservices but smaller: the functions they provide are very specific. Every bolt can have multiple instances running at the same time, so processing is parallelized at the level of the individual bolt and the topology can scale horizontally.
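The spout and bolt relationship can be illustrated with a minimal sketch in plain Python. It does not use the Storm API; the names number_spout, validate_bolt, double_bolt and run_topology are hypothetical, and the sketch runs in a single process, so it shows only how tuples flow from a spout through a chain of narrowly scoped bolts, not the parallelism that Storm provides.

```python
# Toy model of a topology: a spout emits tuples, each bolt does one small job.
# This is an illustration only and is not how Storm itself is implemented.

def number_spout(values):
    """Spout: injects one tuple (modelled here as a dict) per input value."""
    for value in values:
        yield {"value": value}

def validate_bolt(stream):
    """Bolt: passes on only tuples that carry an integer value."""
    for tup in stream:
        if isinstance(tup.get("value"), int):
            yield tup

def double_bolt(stream):
    """Bolt: performs one very specific transformation on each tuple."""
    for tup in stream:
        yield {"value": tup["value"] * 2}

def run_topology(spout, bolts):
    """Chain the spout through each bolt in order and collect the output."""
    stream = spout
    for bolt in bolts:
        stream = bolt(stream)
    return list(stream)

result = run_topology(number_spout([1, "bad", 3]), [validate_bolt, double_bolt])
```

In this sketch the malformed tuple is dropped by the validation bolt, and the remaining tuples continue to the next bolt.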

When a tuple comes into Storm, it is broken up into smaller pieces and validated to make sure it contains the right data points. After validation, it is sent on to the next bolt. An example workflow for stats would be:

Screen Shot 2018-11-12 at 8.11.55 PM.png

Visualization of "stats" Storm Topology

  1. Floodlight receives a stats message for port information from a switch and passes the payload to Kafka
  2. Kafka passes the payload to a Stats Topology in Storm
  3. Storm classifies the messages into what kind of stats have been received
  4. Storm breaks the payload down into individual payloads for each stat and port from the switch and puts it into a specific message format
  5. Storm passes the message for each of the datapoints for every port back to Kafka
  6. Another topology pulls those messages back out of Kafka into a cache for OpenTSDB. The data is then compared with what is cached: if it has not changed in 10 minutes it is discarded, and if it has changed within the past 10 minutes it is written via a micro-cache mechanism into HBase (for use with OpenTSDB).
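Step 4 of the workflow above, where one switch-level payload is broken down into an individual message per stat and port, can be sketched as follows. The payload layout, the field names and the split_port_stats function are assumptions made for illustration; the actual OpenKilda message format is not described in this document.

```python
def split_port_stats(payload):
    """Break one stats payload into one message per (port, stat) pair.

    The input and output shapes are hypothetical, chosen only to
    illustrate the fan-out from one payload to many data points.
    """
    messages = []
    for port in payload["ports"]:
        for stat_name, value in port["stats"].items():
            messages.append({
                "metric": stat_name,
                "timestamp": payload["timestamp"],
                "value": value,
                "tags": {
                    "switch_id": payload["switch_id"],
                    "port": port["port_no"],
                },
            })
    return messages

# Usage: a payload with two ports and two stats each yields four messages.
example_payload = {
    "switch_id": "sw-1",
    "timestamp": 1542000000,
    "ports": [
        {"port_no": 1, "stats": {"rx_bytes": 100, "tx_bytes": 200}},
        {"port_no": 2, "stats": {"rx_bytes": 300, "tx_bytes": 400}},
    ],
}
datapoints = split_port_stats(example_payload)
```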

Storm Topologies

Two Storm topologies currently send messages back through the speakers, and a third is planned:

  • Flow Modifications
  • ISL Discovery
  • Switch Controller (coming soon)

The state for Flow Modifications and ISL Discovery is always kept in Storm. Currently, OpenKilda allows you to turn ports on and off. The switch controller topology will also allow you to set port speeds, list every switch, and get an inventory of the ports on each switch.

Floodlight – OpenFlow Speaker

Floodlight is used as a lightweight OpenFlow speaker, essentially a translator: it talks OpenFlow on the southbound interface and listens to the Kafka queues to the north. Floodlight does not keep any state information except for which switches are registered to it. At specified intervals, Floodlight sends the switches requests for flow and port stats. Most of the features of Floodlight are turned off except for the topology features. OpenKilda adds modifications to Floodlight, which are in the /services/src/floodlight-modules directory.

In the future, OpenKilda will support a model where multiple Floodlight instances can be deployed, either residing locally to switches in different geographic regions or acting as dedicated speakers for different message types (discovery packets, flow updates, flow stats, etc.) for better resiliency.

OpenKilda has a pseudo-implementation of priority queues. Currently, Floodlight pulls messages from Kafka through two listeners spread across four threads:

  • Flow Mod operations
  • ISL Discovery topics

Floodlight pushes messages to Kafka across several queues, depending on the type of message. The queues are:

  • Stats
  • Topology discovery
  • Northbound interface queue

OpenTSDB/HBase

The DatapointParseBolt validates the message format and does field-based grouping on the tuple: it creates a hash of the switch ID, the port, and the data point that was collected. For a flow, the hash is based on the flow ID and the direction of the data point. The result is sent to the OpenTSDBFilterBolt, which checks whether the data has changed in the last 10 minutes and discards it if there is no change. The otsdb-bolt commits new or updated messages to OpenTSDB.
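The 10-minute comparison can be sketched in Python as a small cache keyed by data point. This is not the OpenTSDBFilterBolt code; the DatapointFilter class and its method are hypothetical. The sketch also assumes one particular reading of the rule: a data point is forwarded when it is new, when its value differs from the cached one, or when the cached entry is at least 10 minutes old.

```python
TEN_MINUTES = 600  # seconds

class DatapointFilter:
    """Drops data points whose value is unchanged within the time window."""

    def __init__(self, window_seconds=TEN_MINUTES):
        self.window = window_seconds
        self._last = {}  # key -> (value, time the value was last forwarded)

    def should_forward(self, key, value, now):
        previous = self._last.get(key)
        if previous is not None:
            prev_value, prev_time = previous
            # Same value seen recently: discard (assumed interpretation).
            if prev_value == value and now - prev_time < self.window:
                return False
        self._last[key] = (value, now)
        return True

# Usage: the key mirrors the switch ID, port and data point described above.
flt = DatapointFilter()
key = ("sw-1", 1, "rx_bytes")
decisions = [
    flt.should_forward(key, 10, now=0),    # first sighting
    flt.should_forward(key, 10, now=60),   # unchanged, inside the window
    flt.should_forward(key, 11, now=120),  # value changed
    flt.should_forward(key, 11, now=720),  # unchanged, but window elapsed
]
```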

Screen Shot 2018-11-13 at 8.52.01 PM.png

To scale the process, you can run multiple instances of the same bolts or workers. The incoming data is sharded across the workers using the hash that is created in the DatapointParseBolt. If one of the workers is lost, Storm automatically starts a new worker and assigns that hash to it. If a new worker is spawned for scaling purposes, new incoming hashes are assigned to that new worker.
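The idea of sharding by hash can be sketched as below. Storm performs this assignment internally when a fields grouping is used, so this is only an illustration of the principle that the same key always lands on the same worker. The functions datapoint_key and worker_for are hypothetical, and the choice of SHA-256 with a modulo is an assumption, not Storm's actual hashing scheme.

```python
import hashlib

def datapoint_key(switch_id, port, metric):
    """Build the grouping key from the fields named in the text."""
    return f"{switch_id}:{port}:{metric}"

def worker_for(key, worker_count):
    """Map a key to a worker index deterministically.

    A stable digest is used instead of Python's built-in hash(), which is
    randomized per process for strings.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % worker_count

# Usage: repeated data points for the same port and metric go to one worker.
key = datapoint_key("sw-1", 1, "rx_bytes")
first = worker_for(key, worker_count=4)
second = worker_for(key, worker_count=4)
```

Because every data point for a given key reaches the same worker, that worker's local cache is enough to decide whether the value has changed.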

Proactive OpenFlow Model

If the controller loses connectivity with the network, the flows within the data plane continue to operate. This is the same model as a traditional switch or router losing its host control processor. When the controller reconnects with the switches, the flow information within the switches is not erased; the controller uses the switches to learn the state of the network and reconciles inconsistencies. This is especially useful in catastrophic situations where the controller has to restart with no network state in its database.
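A reconciliation step of this kind can be sketched as a comparison between the flows the controller has on record and the flows a switch reports. The reconcile function and its category names are hypothetical; this document does not describe how OpenKilda actually classifies or resolves the differences.

```python
def reconcile(recorded, reported):
    """Compare controller records with what a switch reports.

    Both arguments are sets of flow identifiers. The result only
    classifies the differences; deciding what to do with each category
    is left out of this sketch.
    """
    return {
        "in_sync": sorted(recorded & reported),
        "missing_on_switch": sorted(recorded - reported),
        "unknown_to_controller": sorted(reported - recorded),
    }

# Usage: after a restart with an empty database, every flow reported by the
# switch is unknown to the controller and can be learned from the switch.
after_restart = reconcile(recorded=set(), reported={"flow-a", "flow-b"})
partial = reconcile(recorded={"flow-a", "flow-c"}, reported={"flow-a", "flow-b"})
```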

Neo4j

OpenKilda does not use any of the graphing features within Neo4j, but Neo4j can be used for diagnosing a problem through its GUI using Cypher queries. Flows, switches, and ISLs are all committed to Neo4j.

Neo4j is not used as the path computation engine since its shortest path algorithm does not deal with islands of switches. 

Path Computation Engine

In OpenKilda, you define a source port and a destination port, and OpenKilda programs each of the switches in the path for that flow individually. The PCE accounts for link utilization so that you do not oversubscribe any individual link.

The PCE is a breadth-first algorithm written in Java that is called from the flow topology. A breadth-first algorithm can deal with negative cost, and OpenKilda can pass in other variables to create a path. For instance, one of the features in OpenKilda is “negative affinity”, where you can specify a switch that should not be considered in the path of the flow.
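A breadth-first search that respects available bandwidth and an excluded switch can be sketched as follows. This is not the OpenKilda PCE, which is written in Java; it is a simplified Python illustration that finds the path with the fewest hops and ignores link cost. The find_path function, its parameters and the link representation are all assumptions.

```python
from collections import deque

def find_path(links, src, dst, bandwidth, excluded=frozenset()):
    """Return a list of switches from src to dst, or None if no path exists.

    links maps a directed (from_switch, to_switch) pair to its available
    bandwidth. Links without enough bandwidth and links touching an
    excluded switch are skipped.
    """
    if src in excluded or dst in excluded:
        return None
    neighbors = {}
    for (a, b), available in links.items():
        if available >= bandwidth and a not in excluded and b not in excluded:
            neighbors.setdefault(a, []).append(b)

    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in parents:  # visit each switch once
                parents[nxt] = node
                queue.append(nxt)
    return None

# Usage: the direct A-D link lacks capacity, so the flow goes through B,
# unless B is excluded, in which case it goes through C.
links = {
    ("A", "B"): 100, ("B", "D"): 100,
    ("A", "C"): 100, ("C", "D"): 100,
    ("A", "D"): 10,
}
default_path = find_path(links, "A", "D", bandwidth=50)
avoiding_b = find_path(links, "A", "D", bandwidth=50, excluded={"B"})
```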

Flow path creation

To illustrate how the systems in OpenKilda work together, we can use the creation of a flow as an example. When a user makes a request through the Northbound Interface, the request is forwarded to the Flow Topology through a Kafka topic. The Flow Topology gets a path from the Path Computation Engine and sends a series of flow commands to the speaker (Floodlight) to program each of the switches in the path of the flow. After this completes successfully, the path is committed and a response is sent to the Northbound API.

flow_path_timing_diagram.png

 

Created by Kevin Austin on 2018/08/25 16:17
©2018 OpenKilda