Monday, November 24, 2014

SDN, OpenFlow, and Google

SDN for Everyone!

Let's face it, everyone is talking about SDN (Software Defined Networking). Vendors all up and down the infrastructure stack are evangelizing SDN. As a future-focused kind of engineer, I recently had a chance to watch what Google is doing (has done) with OpenFlow in their environment (video). From a conceptual perspective, SDN isn't new--we've used SDN in various incarnations since the dawn of networking. For example, a device can send an SNMP trap to an NMS, and the NMS can respond with an SNMP set that brings down a link or backs up a configuration. Another example would be Remotely Triggered Black Hole routing (RFC 5635), whereby a controlling component signals all other nodes to discard traffic to or from a given IP address (in this case the triggering component could be an IPS/IDS or an application that detects unwanted behavior). Still another would be RSVP (RFC 2205), which is a solid first attempt at making the network more application aware.
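To make the black hole example a bit more concrete, here is a toy Python sketch of the idea: every router already has a "discard" next hop, and a trigger installs a host route for the victim that points at it. This is only a simulation under my own assumptions. Real RTBH signals the route over BGP rather than through direct function calls, and the Router class, the trigger_black_hole helper, and all of the addresses are made up for illustration (they come from the documentation ranges).

```python
import ipaddress

# Assumption: routers are pre-provisioned so this next hop resolves to a null interface.
DISCARD_NEXT_HOP = ipaddress.ip_address("192.0.2.1")


class Router:
    """Hypothetical router model with a longest-prefix-match table."""

    def __init__(self, name):
        self.name = name
        self.routes = {}  # ip_network -> next-hop ip_address

    def install(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = ipaddress.ip_address(next_hop)

    def forward(self, dst):
        addr = ipaddress.ip_address(dst)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return "no route"
        # Longest prefix wins, so a /32 black hole route beats the covering prefix.
        best = max(matches, key=lambda net: net.prefixlen)
        next_hop = self.routes[best]
        if next_hop == DISCARD_NEXT_HOP:
            return "discard"
        return f"forward via {next_hop}"


def trigger_black_hole(routers, victim):
    """Stand-in for the triggering component (an IPS/IDS, for example)."""
    for router in routers:
        router.install(f"{victim}/32", str(DISCARD_NEXT_HOP))


routers = [Router("edge1"), Router("edge2")]
for r in routers:
    r.install("203.0.113.0/24", "10.0.0.1")  # normal route toward the customer

before = routers[0].forward("203.0.113.7")
trigger_black_hole(routers, "203.0.113.7")
after = routers[0].forward("203.0.113.7")
neighbor = routers[1].forward("203.0.113.8")  # other hosts are unaffected
```

The point is the shape of the interaction: one component decides, and every node changes its forwarding behavior as a result.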

Leading from in Front

What was so impressive about the Google talk was not the technology but the leadership and focus on engineering. In the end, the business needs to run applications that people use to bring in money--it's not particularly interested in how the network provides such a service. Introducing OpenFlow, as Urs explained, was not a small-risk proposition. Certainly any business would find it safer to stick with a system that works now than to go with a system that may work later (or otherwise has characteristics that may be useful in the future). That's where the technical leadership comes in. All too often the best technical solution is cast aside for the status quo or the "tried and true." Google's OpenFlow rollout is a prime example of the technical organization understanding the risks, deciding the path, executing, and, hopefully, reaping the rewards.

Cannot Improve What You Can't Measure

Before Google rolled out a single change, they took the time and invested in making sure they could properly test the change. What that meant for Google was building a simulation environment. It paid big dividends in that they could prototype their controller in an environment that closely matched the operational parameters and characteristics of their network. Without this, the development-deployment feedback loop is broken, as there's no way to precisely measure the impact of your change. In this case, the change is in how path selection is done. Instead of a report-driven, trending, analysis loop, path selection is near real-time, based on the current constraints of the network and the needs of the application. Google envisioned the tool needed to test the product, not just the product itself.
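As a rough illustration of path selection driven by current network constraints, here is a minimal Python sketch of constrained shortest path: prune every link that can't carry the requested bandwidth right now, then run Dijkstra over what's left. This is a generic textbook technique, not a description of Google's controller, and the topology, costs, and bandwidth numbers are invented for the example.

```python
import heapq


def constrained_shortest_path(links, src, dst, demand_mbps):
    """links: iterable of directed (a, b, cost, available_mbps) tuples."""
    # Step 1: keep only links with enough spare capacity for this demand.
    graph = {}
    for a, b, cost, available in links:
        if available >= demand_mbps:
            graph.setdefault(a, []).append((b, cost))

    # Step 2: plain Dijkstra on the pruned graph.
    best = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == dst:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, cost in graph.get(node, []):
            candidate = dist + cost
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                prev[nxt] = node
                heapq.heappush(heap, (candidate, nxt))

    if dst not in best:
        return None  # no path satisfies the constraint

    # Walk the predecessor chain back from dst to src.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical topology: the cheap path through B is short on bandwidth.
LINKS = [
    ("A", "B", 1, 100),
    ("B", "D", 1, 20),
    ("A", "C", 2, 80),
    ("C", "D", 2, 80),
]

small_flow = constrained_shortest_path(LINKS, "A", "D", 10)
large_flow = constrained_shortest_path(LINKS, "A", "D", 50)
too_large = constrained_shortest_path(LINKS, "A", "D", 500)
```

With these made-up numbers, the small flow takes the cheap path through B, the large flow is steered through C because the B to D link lacks capacity, and the oversized request gets no path at all. Feed the function fresh bandwidth figures and the answer changes with the network, which is the contrast with a report-and-trend loop.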