When BGP AS-Override goes the wrong way

Sunday, 13 Jan 2019

BGP AS-Override

Much like my post on when BGP SoO goes the wrong way, I seem to have a problem with directionality of commands on Cisco IOS - this time, with BGP AS-Override. I came across this in an Enterprise Network (the same kind where we say "MPLS" but actually mean "IP VPN we buy from someone else"), where the ISP we used had an offering they called "Shared Access" - which basically means they'll let you hook an Access Circuit into someone else's IP VPN/VRF with them, as long as you, the ISP and the "VRF Owning Company" co-sign an agreement saying it's allowed.

Why might you want to do this? Think along the lines of Extranets, and furthering the idea that "Everything is just a Line Card" across Company boundaries; particularly useful if you work in the Large Enterprise and Public Sector space, as here there are often strange agreements where multiple Managed Service Providers (MSPs), Systems Integrators (SIs) and sometimes even Service Providers (SPs) (reluctantly) come together to offer a common "Service" back to either the General Public, or perhaps some large Industry Sector. Regardless of the why, the problem is normally the same old BGP-over-VRF limitation - if you use the same ISP for multiple IP VPNs/VRFs, and have end-to-end BGP reachability, BGP doesn't know to turn off it's split-horizon-based-on-ASN functionality; because it just sees the same ASN twice in the AS-PATH, rather than "knowing" that the AS-PATH consists of two differing VRFs/Routing Domains.

The Scenario Topology

BGP AS-Override Topology

This is the Scenario Network Topology, showing:

  • 2x My Network MPLS CE Network Customer Edge (CE) Router
  • 4x MPLS SP Network Provider Edge (PE) Routers
    • 2x Connected to My Company Network IP VPN/VRF VPNN123456
    • 2x Connected to Other Company Network IP VPN/VRF VPNN654321
  • 2x eBGP Peering from My Company Network < -> SP MPLS PE Router, connected to My Company IP VPN/VRF VPNN123456
  • 2x eBGP Peering from Other Company Network <-> SP MPLS PE Router, connected to Other Company IP VPN/VRF VPNN654321
    • 1x "Foreign Network" CE Router @ My Company Data Centre
  • AS-Override applied on My Network MPLS CE Network (CE) Router (towards My Company IP VPN/VRF VPNN123456)
    • Note that I am "piggy-in-the-middle"

Some notes on SP Terminology

As some of this is specific to using a Third Party SP's MPLS Network, through a "wires-only" IP VPN offering - here's a quick primer on some terminology I'm using, as this will differ between varying SP's:

  • "wires-only" - Means the SP drops a NTE/NTU in My Company's Premises, to which I attach my self-managed CE Router
    • The SP does not manage any of CE Router; I eBGP Peer direct from a Private ASN to the SP's Public ASN (or whatever they use)
    • I'm told this model is more popular in the USA than Europe (but I'm in the UK, so there are exceptions to the rule...)
  • VPNNxxxxxx - The SP-allocated IP VPN/VRF Identifier, so that they can differentiate between their various Customers (they could name their VRF instances by Company Name, but what happens when the Company changes name, or two different Companies have the same/similar names...)
  • ASN Numbers - Those on the left-hand side are My Network ones; those on the right-hand side are "Foreign" (Other Company Network) ones
    • Just like between IPsec Encryption Domains, it's a good idea to make sure these don't conflict (tricky when everyone is using the same Private BGP ASN Range)
    • It is the same Core ASN/PE-CE Peering ASN that the SP uses for all Customers
  • CE Devices - I am the Customer (or one of two), and not the SP here; I have no visibility or access to any of the PE's in this topology
    • This is a very different slant to most write-ups and blog posts I've read on the matter; everyone seems to work for an SP bar me!
  • AS-Override - This is applied at My Company end only; the "Foreign" Company are not performing AS-Override
    • So the AS-PATH they "advertise" to me contains the raw SP ASN for their own CE-PE Peering,Their CE1 <-> PE2 and Their CE15 <-> PE66

What I thought would happen

Caveat - apparently, Cisco IOS doesn't let you use AS-Override in the Global Routing Table (GRT, y'know, the one that's not in an "address-family" command); but it sometimes does (worked on my ASR1K's), and that's not the point of this post.

Focussing on My Company Data Centre - and ignoring the "Southbound" eBGP Peering from this DC into MPLS IP VPN/VRF VPNN123456 - here's an example of the Prefix I'm looking at, received from "Foreign" Company:

CE1#172.31.0.0/24 via , AS-PATH: 65007 1234 64999

Now, if we look at the "Southbound" eBGP Peering towards My Company IP VPN/VRF VPNN123456, I want to re-advertise "Foreign" Company Prefix 172.31.0.0/24 onward, via VPNN123456, into My Company Other Campus DEF (bottom-right). Given the "as-override" command is applied towards the SP's PE Router, I expected the "find-and-replace" operation to work in a similar (outbound) manner. That is, for this configuration on my CE1 Router @ My Company Network, Data Centre ABC:

CE1#sh run | sec router bgp
router bgp 65432
 neighbor 192.168.0.1 remote-as 1234
 neighbor 192.168.0.1 as-override

I thought my CE1 Router would therefore rewrite it's own AS65432 (Local ASN, CE1 Router) with the SP's AS1234 (Foreign ASN, CE1 Router perspective) - so an AS-PATH that actually looks like this, to the downstream PE1 (and any other Routers) on VPNN123456:

PE1(VRF "VPNN123456") or CE99#172.31.0.0/24 via 192.168.0.2, AS-PATH: 65432 65439 65007 1234 64999

...but that's not how AS-Override works here.

What actually happens

It transpires the "find-and-replace" behaviour isn't working with the "find" parameter I think it is. If I use some colouring here, this will be easier to see. If we show the entire AS-PATH (including the Routers at either end, which you normally wouldn't see in BGP outputs), here's what you've got for Prefix 172.31.0.0/24 going all the way to CE1 @ My Company Data Centre ABC:

  • 64999 1234 65007 65439 65432 1234 65430

I appreciate this runs inverse/reverse to the AS-PATH that CE1 actually sees; but bear with my incorrect directional thinking here. So the part I'm focusing in on is between CE1 <-> PE1, or this part:

  • ...65432 1234...

At this point, in my head, I'm thinking "The neighbour command is applied outbound to the 192.168.0.1 SP PE1 peering, so it must use this relationship in the find-replace activity", so I'm thinking, after the AS-Override rewrite, it looks like this:

  • 64999 1234 65007 65439 65432 65432 65430

Here's the kicker

The reality is that AS-Override doesn't care about eBGP Peering relationships; it acts as a dumb "find-replace" algorithm, but it uses the eBGP Peering configuration to get it's "find" parameter, by looking at the ASN value after the "remote-as" command, so here for CE1:

router bgp 65432
 neighbor 192.168.0.1 remote-as 1234

What it then "dumbly" does is looks at the entire AS-PATH it already has, and simply replaces the value with it's , before "advertising" this out, so for CE1 it would do this instead:

  • 64999 65432 65007 65439 65432 65432 65430

Which completely broke my thinking, as I hadn't appreciated that a downstream Router could overwrite an AS-PATH entry that happened much earlier-on in the formation of the AS-PATH (i.e. for a Peering Association it wasn't involved in, so how could it dare overwrite anything to do with that?).

So what next

For the example given, we actually ended up moving all this entirely, such that we had a PE-like Router where we could control ingress/egress into both IP VPNs (and AS-Override in both directions, between both IP VPNs) - but this isn't always possible. Technologically, it's easy to look dismissively at the Scenario Topology; but if you step back a bit, you appreciate our hand was forced. As I described earlier, this is a politically complex setup, with various MSPs and SIs - and as you can see, although CE15 sits in "our" DC (actually an MSP, but anyway...), it's actually a CE Router of our "Foreign" (think Extranet) Company's IP VPN (VPNN654321); which they just so happen to have with the same SP that we have Our Company IP VPN (VPNN123456) with.

Sure, this isn't a great place to be - but (in that time-honoured phrase), "It is, what it is"; looking longingly at CCNP and CCIE Greenfield Exam Topologies isn't making this self-rectify. We were fortunate because we had the capability to entirely redesign this (something for another blog post), but if we hadn't, there's a whole manner of constraints here causing pain, such as:

  • SP won't let us reconfigure their PEs on either IP VPN/VRF (so no quick-win "Bang AS-Override on PE66 and PE1" for you)
  • Commercials mean we can't collapse-out the CE15 <-> PE66 arrangement
  • CE1 / Data Centre ABC doesn't just exist for this flow (so no quick-win "Bang the VPNN123456 eBGP Peering into a VRF Lite instance, instead of the GRT")

What's the point then?

Ignoring the goal of getting this working, this was a useful real-world exercise, as it taught me:

  • BGP AS-Override is dumb, and will quite happily assume the to Peering is the only one that contains the , which couldn't possibly already be in the AS-PATH
  • BGP is not VRF-aware; it's rules of split-horizon are there to annoy me and rob me of sleep
  • Stop reading "neighbor" commands and assuming they imply the directionality of the thing they are doing
  • Googling for issues like this throws up limited results, because everyone else seems to be able to access the SP PE Routers
  • I need to flip-round the way I think about AS-PATH as "Destination-to-Source" rather than "Source-to-Destination"