What happened to the OpenFlow dream?

Wednesday, 29 Apr 2020

Apparently I've had this blog idea in draft for nearly two years, since April 2018, so now seems as apt a time as any to expand on it. I hope this won't come out as one of those twee "Thot Leader" pieces, as I'm certainly not one; nor, unlike many Thot Leaders, do I claim any authority to tell anyone what to do or think. But I've had a drink, so I will wax lyrical about the topic with an opinion or two.

The OpenFlow Dream

Cast your mind back to 2018 (or perhaps a few years earlier if you're lucky enough to not work in #PubSec), and the two things you'd have heard ad infinitum in the Twitterverse and Tech World:

  1. Software Defined Networking (SDN) will take over all teh Networkz!!!11!!
  2. omgomgOMG OpenFlow is teh only futurez!!1111!! CLI is lame lmao

Well, OK - maybe not; maybe it's unfair to think the Thot Leaders could spell that well or luck out on sentence structure, but humour me a little. Certainly from where I stood (or sat), everyone seemed to be banging on about a centralised SDN Controller of some sort controlling all Network Control Planes ever, across the Data Centre, Enterprise, Wireless and, well, pretty much everything. For some reason everyone had simultaneously seen the light around the death of "One box, one job" (i.e. Firewall is one box; Router is another box; Switch is another box; DDoS is another box; IPS is...) while also espousing a utopian future with one centralised Controller that would replace all these middleboxen, and essentially become the entire Network.

The Rhetoric

Irn Bru must get through

It was compelling, you could even argue it was so obvious it was revolutionary - none of these middleboxes existed just 5-10 years prior, when mostly we were worried about Routers and Switches (or for the unlucky of you, the Core Switch Routing Modules - I still have the scars, Cisco...) - all we previously had to manage was maybe a handful of L2 Access Switches or ToRs, and a few Campus Edge Routers, and away we went. But then the fleets of middleboxes came, and suddenly the Campus or Data Centre looked like some horrific mash up of Lego meets Playmobil meets Duplo: nothing quite fitted, everything was managed in something else, and it was bloody impossible to even think about driving it based on higher-level abstractions like "Intent".

So when someone turned around and said, "Yeah, sack all those boxes off; bang in an OpenFlow Controller, and it'll do the whole lot for you, mate" people listened. Many people listened in fact, and I was definitely one of them - which led to this:

Northbound Networks ZodiacFX OpenFlow Switch

The Northbound Networks ZodiacFX

Buzzword-happy as I was, imagine my excitement when Northbound Networks announced a small, mini-USB powered 4-port OpenFlow Switch - I didn't even mind my PayPal being charged in AUD to get my hands on one, this was the future! After a few weeks of international shipping it arrived, and I immediately got to work messing with various OpenFlow Controllers.

Faucet was probably my favourite (I'm a sucker for a cool name or logo, I really am) - and it was great. I sat through hours of tutorials from people like David Bombal and INE Instructor Jasson Casey and duly saw that I could program an OpenFlow Controller to per-packet police, Access Control, Rate Limit and basically do anything I'd previously had to do on a fleet of middleboxes. This was great! The future was surely OpenFlow, all hail the new king... right?

Well, no - wrong. Just like the yo-yo craze had quickly come and gone in my '90s childhood (I'm bitter, ProYo, I couldn't afford you at the time), so too did my love for OpenFlow evaporate near-overnight. But why? Am I just another Network Buzzword Jockey (well maybe, but let's keep some of my fragile ego intact...)? Did I not "get" the OpenFlow paradigm shift? What was going on?

The OSI Model

Remember this, the stupid thing they force you to learn by rote in Computer Science and Cisco Exams:

  7. Application
  6. Presentation
  5. Session
  4. Transport
  3. Network
  2. Data Link
  1. Physical

You learn it, you don't really understand why you're learning it, you spit it out for the exam; you move on with your life (wholly indoors, during this time of COVID-19, for the history books [Hi, Archive.org waves]). A few years later, you get a job in IT/Telco/Networking, maybe a few people superficially reference it now and again; you're still not getting it. Then, in more recent years, people say a few phrases that really make it "click" in your head - one of them being "up the stack".

Up the Stack

The Stack being an allegorical notion, it has a bottom, a middle, and a top - and is largely used in the IT/Networking field when talking about the collection of all the Routers, Switches, Servers, Middleboxes and so on you require to host a given Application/Set of Applications/"Service". For me, in the CapEx-rich/OpEx-poor Public Sector world, this often means one unique stack of Routers/Switches/Servers/etc per Application/"Service", due to a unique Project-driven focus in the world that no Real World Private Sector firm could afford to employ (you do "Multi-tenant", I do "Same tenant, second house on the street"). Either way, "the Stack" - for me as an Infrastructure Guy - refers to the collection of stuff that makes this up. If you were a Software Girl, this might refer to the stack of middleware (SQL, MongoDB, Amazon SQS etc) as well as the data structures and ultimately code or language binaries/compilers that run your Application - it's much the same concept, you'd just be decomposing an upper part of the OSI Model.

Which aptly brings me to my point; Infrastructure peoples like me are at the "bottom of the Stack"; Software peoples are instead at the "top of the Stack". A major paradigm shift did indeed occur around 2016-2018, but it wasn't SDN; it was something inspired by the notion that Software is Eating The World, and it moved the problem "up the Stack" (see, that OSI Model is useful - as a shared point of reference - that's the bit they neglect to tell you in school). Where to? Glad you asked.

Kubernetes and the Sunshine Gang

"Na, na, na, na, na..." Alright, I give (it) up, that's not going to work. By now (2016 for the Real World, no idea when for the PubSec world), something called k8s was taking over mindshare from OpenFlow, and SDN in general, and really driving home the point of software being the driving force of the IT world. The same was (and is) happening in the world of Cloud vs On-Prem; the true value of Cloud is in its higher-layer abstraction and orchestration capabilities (up that Stack again), not because it's marginally cheaper than On-Prem. There are many more things you can do in the world of k8s, OpenShift, EKS et al around overcoming the "middlebox problem" - and pushing it back to where it belongs/who knows the most about it (the Application and Software Peoples) - than using an SDN-backed approach.

Regarding the term "On-Prem" - I know, I know; but it sounds cool. Fight me.

Kubernetes searches on Google Search Trends

The nail in the OpenFlow coffin

I don't doubt that OpenFlow has a few valid use cases in the real world, but they are few and far between; in the main, the bitter cold truth is the IT world is, currently, split into sects aligned around two disciplines:

  1. Developers
  2. Operators

How do I prove I'm right here? DevOps - you can't have an abbreviation based on tribes that don't exist. You know what this means in practice, or did mean in practice? Tribes of people, aligned to that pesky OSI Model:

  1. Applicationy Peoples
    1. Aligned to OSI L4-7
  2. Infrastructurey Peoples
    1. Aligned to OSI L1-3

How do I prove I'm right here? Middleboxes - those things that were neither OSI L3 nor OSI L4, they were a bit of both. You know why they were always a pain in the arse? Because they were trying to do with physical kit what DevOps is trying to do with people; align the tribes.

As the world has progressively moved up the Stack, and in doing so to "enabling the Application" (and by extension, the Developer), unfortunately the Infrastructure Peoples (myself included) have become less relevant. With k8s and its ilk, no longer do you need us to mangle some middlebox via Chinese Whispers ("I'm sure he said TCP/1521? That's the Oracle DB Port isn't it? I'll set up a Load Balancer VIP Pool for that, not got time to ask him..."); you can do it yourself with things like Istio, Envoy and other cool-kid stuff I'm not Dev enough to do.

And frankly, why wouldn't you? It's your App; you built it (or are unlucky enough to be charged with keeping its COTS form alive and kicking); you know what it does, what it needs to do, and what it doesn't need to do. Us Infrastructure folk, frankly, don't; and we don't really care, because we're too busy tweaking various OSI L1-3 knobs to stop everything setting on fire.

Which brings me nicely back to OpenFlow. Sure, it's a great idea; but it's an Infrastructure person's view of the world, not a Developer's view of the world. It's us as Infrastructure folk trying to bring our detailed, abstraction-averse OSI L1-3 thinking (as you go down the Stack, the level of detail for any one given Layer goes up) up the Stack to the more abstraction-dependent OSI L4-7 thinking of the Developer folk. Sprinkle on some organisational politics and Development vs Operations tribal thinking, add in some Enterprise "JFDI it, my golf mate wrote it, we must use Crapplication 1.2 now!" and you've got a recipe for a self-tapping hammer for the nail in the OpenFlow coffin.

Farewell, OpenFlow - I hardly knew ye.

Hitlessly change a Cisco switchport from an Access to a Trunk Port in Production

Monday, 27 Apr 2020

So you've got an Access Port on a Cisco Switch going down to a Host/Server of some sort, and for whatever reason you then swap out that Server for maybe a Virtualisation Host (ESXi, oVirt, Docker, k3s...), or in my case a Cisco WAVE594 with a built-in Tyan AST2050 BMC (where the iLO shares the same physical NIC as the Server, no separate Mgmt Port) - and you want to do it hitlessly. Hmm, sounds like a good challenge to me!

Draw it up then!

What we have is:

  • Left-hand side (before)
    • Baremetal Server SERVER Eth0 connected to Cisco Switch SWITCH Gig 1/0/9
    • Eth0 set as 10.0.0.9/24 without VLAN tagging/membership
    • Cisco Switch SWITCH Gig 1/0/9 set as Member (Access) of VLAN 99
    • VLAN 99 associated with SVI99 10.0.0.1/24 for inter-VLAN Routing
  • Right-hand side (after)
    • Baremetal Server SERVER Eth0 connected to Cisco Switch SWITCH Gig 1/0/9
    • Eth0 associated with Software vSwitch (or similar)
    • Internal Mgmt Loopback/Interface associated with Physical NIC Eth0, set as 10.0.0.9/24 without VLAN tagging/membership
    • Virtual Machine VIRTUALMACHINE vNIC0 associated ultimately with SERVER Eth0 pNIC, but set as 10.2.0.28/24 and to tag VLAN 86
    • Cisco Switch SWITCH Gig 1/0/9 set as Trunk, with VLAN 99 as the Native (untagged) VLAN
    • VLAN 99 associated with SVI99 10.0.0.1/24 for inter-VLAN Routing
    • VLAN 86 associated with SVI86 10.2.0.1/24 for inter-VLAN Routing

Topology showing Cisco Switches and Access-connected Hosts

Tag vs Untag / Cisco vs Others

First, some terminology that always caught me out as a Cisco Guy - the difference between Tag and Untag, and Access and Trunk. In my simple Cisco Head, the world is just two things:

  • Access Ports
    • Member of one VLAN
    • 1:1 Port to VLAN mapping
  • Trunk Ports
    • Member of as many VLANs as you like
    • 1:many Port to VLAN mapping

I'd then learnt about a "Native" VLAN, but it was mostly a superficial reference to something that didn't make sense, so I moved on with my life. Then I encountered the following, which changed my views and made it all "click" for me:

  • Tag Ports
  • Untag Ports
  • Voice VLANs

Voice VLANs never sat right with me, as they break the Simple Cisco Paradigm of "A (Access) or B (Trunk); there is no C (Other)" - as Cisco sell a Voice VLAN as if it's an Access Port:

interface Gig1/0/2
 description "Connection to an Overpriced Cisco Phone"
 switchport mode access
 switchport access vlan 100
 switchport voice vlan 101
end

Yet I know that's two VLANs down one port, so it's a friggin' 802.1q Trunk dammit - nothing access'y about that at all, but don't let that stop you using the "access" keyword! My mental model, you done broke my mental model argh!

Access is Trunk, Trunk is Voice, Voice is Access?

After a few coffees (fine, whiskeys) though, it clicks - Cisco are, once again, lying; the true answer is there are three states a port can be in:

  • Untagged for one VLAN (or "Access" as Cisco call it)
  • Tagged for many VLANs (or "Trunk" as Cisco call it)
  • Tagged for many VLANs and Untagged for one special-little-snowflake VLAN (or "Trunk" with a "Native VLAN")

Except when it's voice, because who loves consistency? Not Cisco BUs, clearly - "Oh hi, welcome to Model1! Yes, it's 'show mac-address-table' here... Oh hi, welcome to Model2! Yes, it's 'show mac address-table' here. Mess up your scripting, did that? LOLZ, sucks to be you, fool!"

The third is what we're going to use, where one VLAN (of your choosing) doesn't get a VLAN tag added to its frames when they're thrown onto the wire; the reverse is also true, in that untagged frames arriving on the port get dropped into that same VLAN. What the above hints at is a Switch-centric view of the world, but if we take that from a Server-centric view of the world:

Port Type | Switch-centric View (Port Egress) | Server-centric View (Port Ingress) | Action
Access    | No VLAN tag present               | No VLAN tag present                | Switch appends VLAN on ingress; Server has no clue it's in a VLAN
Trunk     | VLAN tag present                  | VLAN tag present                   | Switch sees VLAN on ingress; Server knows it's in a VLAN

...which is why the "Tagged" and "Untagged" definitions make infinitely more sense than normal Cisco nomenclature "Access" versus "Trunk", as the Cisco speak hides the fact that a port can have three states (Access; Trunk with no Native & Trunk with Native), whereas a given logical Server port's membership is one of two states (Explicit Member - Tagged, Server knows the VLAN exists or Silent Member - Untagged, Server has no clue it is within a VLAN [Native]).
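If it helps, here's the three-state idea as a toy Python model. To be clear, this is my own sketch and not anything Cisco ships: the Port class and the two helper functions are hypothetical names, the VLAN numbers are borrowed from this post, and it deliberately ignores real-world wrinkles (allowed-VLAN lists, how a given platform treats frames explicitly tagged with the Native VLAN, and so on).

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Port:
    """Hypothetical model of a switch port's VLAN membership (not a Cisco API)."""
    untagged_vlan: Optional[int] = None         # the "Access" or "Native" VLAN, if any
    tagged_vlans: FrozenSet[int] = frozenset()  # VLANs carried with an 802.1Q tag


def classify_ingress(port: Port, tag: Optional[int]) -> Optional[int]:
    """VLAN that a frame arriving from the Server lands in (None = dropped)."""
    if tag is None:
        # Untagged frame: the Server has no clue, the Switch picks the VLAN.
        return port.untagged_vlan
    # Tagged frame: only accepted if the port carries that VLAN tagged.
    return tag if tag in port.tagged_vlans else None


def egress_tag(port: Port, vlan: int) -> Optional[int]:
    """Tag put on the wire towards the Server (None = sent untagged)."""
    if vlan == port.untagged_vlan:
        return None
    if vlan in port.tagged_vlans:
        return vlan
    raise ValueError(f"port is not a member of VLAN {vlan}")


# The three states, plus the Voice VLAN oddity from earlier.
access = Port(untagged_vlan=99)
trunk = Port(tagged_vlans=frozenset({86, 99}))
trunk_native = Port(untagged_vlan=99, tagged_vlans=frozenset({86}))
voice = Port(untagged_vlan=100, tagged_vlans=frozenset({101}))
```

Seen this way, the "Access" port with a Voice VLAN has exactly the same shape as the Trunk with a Native VLAN: one untagged VLAN plus some tagged ones.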

What's a BU? A Business Unit, obviously dummy - everyone in Cisco knows that, so we don't bother explaining that acronym, duh. What, you mean you didn't know that? Best hand that CCNP back, boyzone...

The solution

So back to the problem at hand; I want the Server's Mgmt IP to remain at 10.0.0.9/24, but not be Tagged (not know it's within a VLAN), and I want to hitlessly (or as near as possible) change the configuration on the fly from Access to Trunk. Here's my way forward:

  • Current configuration
    • interface GigabitEthernet1/0/9
      description "To SERVER Eth0"
      switchport mode access
      switchport access vlan 99
      no shutdown
      end
  • Future configuration
    • interface GigabitEthernet1/0/9
      description "To SERVER Eth0"
      switchport trunk encapsulation dot1q
      switchport trunk native vlan 99
      switchport mode trunk
      no shutdown
      end

Which effectively tells the ingress-behaviour (Server-to-Switch) of the Trunk to bang all untagged Server traffic (in our case, that's to or from Mgmt virtual/logical Interface, 10.0.0.9/24) into VLAN99 - without the Server ever knowing it is a member of VLAN99, but while allowing the Trunk to pass explicitly-tagged Virtual Machine, or other, traffic (such as VLAN86).
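If you're scripting the change, the order you push the lines in is worth a thought. Below is a minimal Python sketch that only builds the command list; the function name and its parameters are made up for illustration, and it doesn't talk to a switch at all. The ordering is my assumption about the safest sequence: stage the trunk settings while the port is still an Access Port, and flip the mode last, so untagged traffic lands in VLAN 99 from the moment the port becomes a Trunk. On platforms that offer the encapsulation command, it generally needs to go in before "switchport mode trunk" will be accepted; on dot1q-only platforms the command may not exist, hence the flag.

```python
from typing import List


def access_to_trunk_commands(
    interface: str,
    native_vlan: int,
    needs_encapsulation: bool = True,  # assumption: False on dot1q-only platforms
) -> List[str]:
    """Build the IOS lines for the Access-to-Trunk change, mode flip last."""
    commands = [f"interface {interface}"]
    if needs_encapsulation:
        # Staged first: the trunk mode line depends on a fixed encapsulation.
        commands.append("switchport trunk encapsulation dot1q")
    # Staged while still in access mode; it only takes effect once trunking.
    commands.append(f"switchport trunk native vlan {native_vlan}")
    # The actual cut-over from Access to Trunk.
    commands.append("switchport mode trunk")
    commands.append("end")
    return commands


if __name__ == "__main__":
    for line in access_to_trunk_commands("GigabitEthernet1/0/9", 99):
        print(line)
```

Paste the output into a config session or feed it to whatever automation you already trust; try it on a lab port first.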

This is incredibly useful when working with inline iDRAC/BMC configurations, as you can do this without burning another cabling run/Server port (or in my case with the Cisco WAVE594, because it doesn't have any other option/no in-built Dedicated iDRAC Mgmt port):

  • Operating System/ISO Eth0 Port
    • Untag (hits the "Native" VLAN)
  • iDRAC/BMC Eth0 Port
    • Tag (hits the explicit VLAN)

Or the other way around - whatever works for you. This is exactly the same thing that Cisco ended up bodging their "Voice VLAN" configuration to do, only that's really a Trunk, not an Access... let's not go there again.
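For the curious, "Tag" versus "Untag" on the wire is just four extra bytes. Here's a rough Python sketch of what the tagging side does to an Ethernet frame, using the standard 802.1Q layout (TPID 0x8100, then a 3-bit priority, a 1-bit DEI left at zero here, and a 12-bit VLAN ID). The function names and the dummy MAC addresses are mine, purely for illustration, and the sketch assumes a well-formed frame of at least 14 bytes.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def add_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert an 802.1Q tag after the destination and source MAC addresses."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be between 0 and 7")
    tci = (priority << 13) | vlan_id  # DEI bit left at 0
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


def vlan_of(frame: bytes) -> Optional[int]:
    """Return the VLAN ID of a tagged frame, or None if it is untagged."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != TPID_8021Q:
        return None
    (tci,) = struct.unpack("!H", frame[14:16])
    return tci & 0x0FFF


# Dummy frame: destination MAC, source MAC, IPv4 EtherType, placeholder payload.
untagged = bytes(6) + bytes.fromhex("020000000009") + struct.pack("!H", 0x0800) + b"payload"
tagged = add_vlan_tag(untagged, 86)
```

The untagged frame is what the Switch drops into the Native VLAN (99 in this post); the tagged one announces VLAN 86 for itself.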

Get bodging!