Installing ESXi onto a Cisco WAVE 594 WAN Optimisation Appliance

Saturday, 09 Feb 2019

Installing ESXi on a Cisco Wide Area Virtualisation Engine Appliance

Why would you want to do this? No real reason, but we've been decommissioning some hardware, and it's pretty clear that Cisco WAVE Appliances are just a Compute Server, with some stuff like VGA Ports removed. Originally these Appliances were designed for CDN-like WanOp purposes, so they have extras like Cavium Crypto/Offload Cards onboard, and some SATA storage; so I thought I'd have a go at loading VMware ESXi Hypervisor onto them.

The box I have is a Cisco WAVE 594, with specifications as follows:

  • Processor - Intel Xeon X3430 @ 2.4 GHz
  • Memory - 8 GB DDR3 RAM
  • Storage - 2x Hot-pluggable 500 GB SATA 7.2k Hard Drives
  • Storage - 1x Internal 4 GB USB Flash Disk
  • Network* - 2x Intel 82574L 1 GbE Network Ports

* = Not detected by ESXi, even though they're on the VMware Hardware Compatibility List (HCL)

What have we got here, Captain?

Here's a few photos of what we've got to work with:

undefined

undefined

Inside, you'll notice an internal USB port, plugged into a 4 GB USB Flash Drive (by some company I've never heard of); outside, you'll notice I've plugged in a USB 3 Ethernet Adapter (that uses the Realtek RTL8152 Chipset).

Port-wise, all we have to play with is:

  • 1x External USB Port
  • 1x micro-USB Console Port
  • 1x RJ45 Console Port (Serial Port)
  • 2x RJ45 1 Gbps Network Ports

What you don't have is a VGA Port, or spare USB Port to plug a Keyboard into (as well as a USB Flash Disk for the ESXi HV/OS Volume), which will make it pretty hard to process the Next/Next/F11 sequence required to install ESXi.

Time to ask a friend

I was a bit flummoxed at this point, but handily a friend suggested that ESXi doesn't care about hardware changes after the fact - so I could stage all this by pre-installing ESXi onto the internal 4 GB USB Drive. Which is exactly what I did, so to do this, I:

  1. Created a VMware Workstation (I know, it's a work machine - I'm normally a VirtualBox man) Virtual Machine called "USB Test" on my Laptop
    1. Allocate this at least 2x vCPUs with 2x Cores
    2. Allocate this at least 4 GB RAM
  2. Followed this guide on How To USB Boot a VM in VMware Workstation 11
  3. Downloaded ESXi 6.5.0 ISO from VMware vSphere Hypervisor (ESXi) 6.5
  4. Inserted the 4 GB USB Drive
  5. Opened Rufus Bootable USB Maker
  6. Flashed VMware-VMvisor-Installer-6.5.0-4564106.x86_64.iso onto my 4 GB USB Drive
  7. Booted my "USB Test" VM, which boots the 4 GB USB Drive
  8. Followed the ESXi installation process and installed ESXi over the 4 GB USB Drive volume
  9. Rebooted the "USB Test" VM, and attached a "Host-only" Network Adapter to it
  10. Waited for ESXi to Boot, and receive a 192.168.85.x Host-only IP Address

Now I've got ESXi built onto the 4 GB USB, I need to tweak a few bits before I plug it into the Cisco WAVE 594. Using the Host-only NIC in VMware Workstation means I can locally navigate to https://192.168.85.x/ui/ on the same Laptop running VMware Workstation to jump onto ESXi vSphere and configure it ("Host-only" means it's a virtual network between just that VM and your Laptop's OS - Windows 7 for me - which sees it as a Virtual NIC).

Making it work without VGA

As well as any other ESXi settings - such as Hostname, vmk0 IP Address, Storage Volumes (although no point doing that until this is plugged into the Cisco WAVE 594 itself) - I'll need to tweak ESXi to output it's boot screen (VMware call this the Direct Console User Interface, or DCUI; I call it the "yellow and black ESXi boot screen", much catchier) somewhere other than VGA, as the WAVE 594 doesn't have a VGA Port.

Doing this is quite easy; what ends up happening is that a VGA-like output (i.e. the VMware DCUI) gets redirected to the Serial port, which in this case is the trusty old blue RJ45 Console port. To do this, follow the instructions on VMware's website Redirect the Direct Console to a Serial Port Using the vSphere Client:

  1. Login to the vSphere HTML Client (i.e. https://192.168.85.x/ui/)

  2. Click the Configuration tab

  3. Click Host, then Advanced Settings

  4. Search for parameter VMkernel.Boot.logPort

    1. Make sure it says default

  5. Search for parameter VMkernel.Boot.gdbPort

    1. Make sure it says default

  6. Search for VMkernel.Boot.tty2Port
    1. Set it to com1
  7. Click OK

Job done, now we can simply insert the USB Drive into the internal USB slot, connect our trusty blue Console Cable and USB Adapter into the Console Port, and set PuTTY or Screen to 115200 Baud rate*, and boot the Cisco WAVE, then wait for the ESXi Boot Messages and DCUI to flow...

undefined

* = If you want to see the WAVE BIOS boot messages, you'll have to set it to 9600 baud first, and then change it to 115200 when you get garbage characters on your screen output.

So close, but yet so far

Remember that asterisk note I wrote before, where VMware lie and say they support the Intel 82574L in their HCL? Well, they don't - and to save you time, they:

  • Don't in ESXi 5.5
  • Don't in ESXi 6.0
  • Don't in ESXi 6.5
  • Don't even when you mess around with custom and obsolete net1000e VIB driver packs

Now what, not much use having an ESXi Node with no Physical Networking on it! This is where the second brainwave clicks in; lets use that USB Ethernet Adapter we've got lying around! Luckily Jose Gomes has had exactly the same idea and created a lovely guide on using a USB Ethernet driver for ESXi 6.5 - so follow that. For me, this looked like:

  1. Download the Driver VIB for the Realtek USB Adapter
  2. Enable SSH Service in ESXi vSphere Web UI (the Service is called "tsm-ssh")
  3. Use FileZilla to login as "root", and copy-paste the VIB to /tmp/
  4. Follow VMware KB Article 2147650 to disable the newer USB Drivers
  5. Install the custom Realtek VIB, from SSH this command should do it:
    1. esxcli software vib install -v /tmp/r8152-2.06.0-4_esxi65.vib
  6. Reboot ESXi

Let's see what we get this time then, when we also plug our cheapo USB 3 Ethernet Adapter in to the front USB port (and ESXi 4 GB USB into the internal USB port):

undefined

Great Success!

There is a caveat here - I find that, on reboots, ESXi DCUI will uncheck the "Use vmnic32 for Management" box, so it won't be contactable from the Network/won't get a DHCP IP until you manually press F2 -> Login to DCUI -> Re-enable it, which isn't much use if it's remote and the power goes.

Apparently there's a fix for that here in Install ESXi on a server/laptop with only USB Ethernet with an aptly-named file called "weasel", but I've had stoat-all success in getting it to work, so it's a limitation I've just lived with.

As a side note, because we didn't run the interactive installer on ESXi while it was connected to the WAVE 594 Hardware, you'll need to manually use the ESXi Datastore -> Storage -> Adapter -> Delete Partition option to wipe the partitions of data on both the 2x 500 GB SATA Disks, and can then set them both up as "New Datastores", so they can be used to hold VMs as VMDK virtual hard drive files.

Here's a handy guide on How To Erase ESXi Disks With ESXi Host Client v3.

Have fun!

Automation - The "Script it" versus "Do it" continuum

Sunday, 03 Feb 2019

The "Script it" versus "Do it" continuum

recent tweet from @nickrusso4258 got me thinking about something I've been trying to express in my professional (don't laugh, people sometimes say I am) life for a while now, that can strike a nerve with the "Automate ALL THE THINGS!" crowd; scripting something (and by extension automating something), isn't always the right answer for an Organisation's use of Time (read: your 9-5 they pay for).

As I appreciate that not everyone is a Coder, DevOps or new-kid (some of us still get paid to be Cisco Mario; not everything is up in Toad Cloud yet...), this concept can apply a little wider than just to Developers, and even probably to the Business-y people all us IT Folk interact with on the daily. Using my finely-honed MS Paint skills (side-note: you've not lived until you've done a Network Diagram in MS Paint), here's a sexy graphical approximation of the theory:

undefined

Making stuff up #1 - Payback sweet spot

What the graph is trying to demonstrate is that the world of repetitive tasks can loosely be split into two partisan camps:

  1. "Script it"
    1. i.e. Put the additional effort (more than to just "bang another <repetitive task> out") in, and automate it/script it/somehow make it easier to perform than just doing the do over-and-over, with two tangible outputs:
      1. Completion of <the task>
      2. Automation of <the task>
  2. "Do it"
    1. i.e. Don't worry about the why, just repeat the manual steps you'd normally do and "bang another <repetitive task> out", with one tangible output:
      1. Completion of <the task>

The obvious sweet spot here is that, for a given number of repetitions of <the task> over time, eventually the additional effort of "scripting it" (the time taken to do the automation, on top of that of just <doing the task>) will eventually pay itself back, as after a given "Payback sweet spot", you've now got time back to do other stuff, which you'd otherwise have spent just doing <the task> again and again.

Alright, I'm buying it #2 - Positive opportunity cost

Or in other words, you're now in "Positive opportunity cost" - that is, <the task> is in someway automated, and you can dedicate your time to the other fifteen-million items on your "To Do" list, instead of this <task>. All is well in the world, you've automated all the things - and bar a little troubleshooting and debugging you unexpectedly have to do (i.e. when you discover your vGhetto VCB Cron Job uses a file that gets overwritten at ESXi System Reboot...), you're actually "earning time" saved through the script parallel-working the task for you.

Bully for you; your life is complete, you've moved all the things to teh Cloudz, and you're about to marry Princess Toadstool, and live in the Kingdom of the Mushroom Cloud forever mor...

Wait a minute, what's this #3 - Negative opportunity cost

But look over there on the left-hand side of this conceptual model; what's that pesky "Negative opportunity cost" all about then? I'm just about to pop the ring on Princess Toadstool, you saying I've got a problem here?

What I'm referring to here is the cold reality of Work; you're ultimately paid to produce output that a Customer wants - whether that's direct tangible stuff ("Hi, make this Network Switch go now please") or otherwise intangible stuff ("Hi, move these Apps to the Cloud? Mkay, you'll need to make a Project Plan for that, I get it...") - it's all output that's working towards a tangible goal.

You know what isn't output working towards a tangible goal? Scripting.

You know what you can't accurately do? Predict the future.

You know what the problem is here? Scripting a <task> that actually needs to be run, in future, with less individual repetitions of duration than just manually repeating the <task> would have taken you (and you get multiple lots of tangible output for that).

Let's give you a worked example; suppose you need to write a script to output all the IP Helper Addresses of a Cisco IOS Script, and (you don't know it yet), but you're not great with Bash Shell script (well, you do know that...), and it'll take end up taking you 16 hours. Sounds great; much easier than ripping through 500 devices and doing that manually, right - that'd take you maybe, I dunno, 5 hours and a bit of hand-cramp?

But what if I said to you that, unbeknown to you, we're about to swap out all that Cisco IOS kit for <SD-WAN Vendor XYZ> kit; where stuff like this (IP DHCP Relay Address) is pushed out in a programmatic, templated fashion anyway. What's happened now? Well, in Business Output terms, you've just wasted the time it would have taken to do it manually (5 hours) subtracted from the time it took you to script it and get the tangible output (16 hours), so you've cost the Business 11 hours of time you could have been doing something else productive.

Which would be 11 hours' worth of "Negative opportunity cost", and seems to be something the Automation Crowd rarely focus on; none of you are Mystic Meg; none of you have Crystal Balls; none of you can predict the future.

Something to dwell on. Meanwhile, I hope Princess Toadstool likes Hula Hoop crisps as engagement rings...

Disappearing DNS entries when your CNAME TTL differs from your PaaS Provider's

Saturday, 26 Jan 2019

The dreaded "Page can't be displayed" error

Most people in the field of IT or Networking will have seen this lovely Internet Explorer error, and immediately recognised their day was about to change course away from the schedule:

undefined

The why can vary massively; for this blog post, we'll look at one case in point - what happens when your DNS Time to Live (TTL) record, on your CNAME, doesn't match-up with your Platform as a Service (PaaS) provider's A Name. But first, a bit of background here - names changed to protect the innocent.

The Scenario with the PaaS Provider

We've got a Web Application that we've decided to farm-out to a PaaS Provider, which used to be on-premises (or "on-prem" for you cool Cloud Kids). It's very important to the Business, but for the purpose of technology employed it's nothing special - think a HTTPS Website, where the PaaS provider does DNS-based "Elastic" (boing!) Load Balancing - also known as GSLB, but the new Cloudy World has to re-invent the terms we're already used to... *grumble* *grumble*

Let's throw in some made-up pseudonyms to anonymise this a bit, and add some context:

  • My Employer (Enterprise Business, or "the Business")
    • Name - MyCompany Ltd
    • Main External URL/Domain - mycompany.com
    • Main Internal URL/Domain - prod.mycompany.uk
  • PaaS Provider
    • Name - PaaS Co. Ltd
    • Main PaaS URL/Domain - paasco.com
    • Cloud Environment Name - PaaSCloud
    • Use Load Balancers from - BigAssLoadBalancers (Vendor)

Because the Business (rightly) thinks that a new PaaS URL of https://bigassloadbalancers-appnameprodmycompany-paascloud.paasco.com might not be as easy-to-remember as the old on-prem (yes, I'm trying to bait you with that phrase) one of https://appname.prod.mycompany.uk; and because we've got no choice about the PaaS URL, we've taken the decision to make a new sub-domain of *.paascloud.mycompany.com. While we're there, we think we'll sort out the outmoded concept of Internal (prod.mycompany.uk) vs External (mycompany.com) URLs, because this is all hosted off-prem anyway; so it's technically no longer part of our "internal" Domain.

Regardless of PasS Co, MyCompany uses Internal DNS that sits on Active Directory Domain Controllers; for the sake of ease, I'll call this "Internal DNS". MyCompany outsources it's Internet DMZ Data Centres to another MSP; we'll call them MSPCo. MSPCo's only relevance here is that they run our External DNS/Domain (from Internet-facing ns1.mspco.com DNS Servers), whereas we run our Internal DNS/Domain AD-DC DNS Servers. Or, in short:

  • MyCompany
    • Run Internal DNS Servers (i.e. pdc1.mycompany.uk) that are authoritative (but not advertised to Internet) for *.mycompany.uk
  • MSPCo
    • Run External DNS Servers (i.e. ns1.mspco.com) that are authoritative for *.mycompany.com

To give us an easy-to-remember FQDN for the AppName Web Application, we've setup the following which means it will be https://appname.paascloud.mycompany.com:

  • Sub-Domain Space (for all Apps on PaaS Co)
    • *.paascloud.my.company.com
  • Current PaaS Web App (one of the Apps on PaaS Co)
    • appname.paascloud.mycompany.com
  • Internal DNS (MyCompany, i.e. pdc1.mycompany.uk)
    • Authoritatively Resolve requests for *.prod.mycompany.uk
    • Conditional Forward requests for *.paascloud.mycompany.com to ns1.mspco.com
  • External DNS (MSPCo, i.e. ns1.mspco.com)
    • Authoritatively Resolve requests for *.paascloud.mycompany.com

The Problem with DNS Recursion

All that we've achieved above is a series of "forwarders", such that, for the worst case (Internal Client), they'll do this:

  1. Lookup appname.paascloud.mycompany.com against Internal AD-DC DNS (i.e. pdc1.mycompany.uk)
  2. Internal AD-DC DNS Condtional Forwards this to MSPCo External DNS (i.e. ns1.mspco.com)
  3. MSPCo External DNS (i.e. ns1.mspco.com) resolves this to a CNAME of bigassloadbalancers-appnameprodmycompany-paascloud.paasco.com
    1. MSPCo External DNS (i.e. ns1.mspco.com) then Recursively Resolves this against it's upstream DNS Provider (let's say dns1.bigisp.com)...
    2. ...Which queries the Root DNS Servers (i.e. a.root-servers.net), which tell it to ask the PaaS Co Authoritative DNS Servers (i.e. ns1.paasco.com) for the A Name associated with bigassloadbalancers-appnameprodmycompany-paascloud.paasco.com...
    3. ...Which comes back from PaaS Co DNS Servers (i.e. ns1.paasco.com) as Public IP Address 203.0.113.234 (not real, check out RFC 5737 - IPv4 Address Blocks Reserved for Documentation)
  4. Internal AD-DC DNS replies back to the Internal Client, for a request of appname.paascloud.mycompany.com, with:
    1. (The CNAME) bigassloadbalancers-appnameprodmycompany-paascloud.paasco.com
    2. (The A Name) 203.0.113.234

Phew, there's a lot of steps eh? But at least we're out of the woods now, the client has the IPv4 Address it needs, so what's the "Page not Displayed" thing all about?

Pesky DNS TTLs

Here's the bit where the hierarchy of recursion in DNS starts to 1-up you, and the bad day kicks in - perhaps as known all-too-well by these graffiti artists:

undefined

Firstly, a caveat - all of the below may be different for your scenario, depending on how MSPCo DNS Recursion is/isn't setup.

If we make use of the lovely nslookup tool on Windows, here's what we can deduce for our good response (i.e. when the page actually displays, rather than the dreaded IE "Page not Displayed" error). Remember that pdc1.mycompany.uk is my Internal DNS Server (for this example anyway, in reality AD has a Parent/Child Regional Domain Controller hierarchy, so each Client uses a different AD-DC):

C:\Users\NervousAdmin>nslookup
> set debug
> server pdc1.mycompany.uk
<snip - goes off and resolves pdc1.mycompany.uk to IP 10.0.1.99>
> appname.paascloud.mycompany.com.
Server: pdc1.mycompany.uk
Address: 10.0.1.99

------------
Got answer:
 HEADER:
 opcode = QUERY, id = 24, rcode = NOERROR
 header flags: response, want recursion, recursion avail.
 questions = 1, answers = 2, authority records = 0, additional = 0

 QUESTIONS:
 appname.paascloud.mycompany.com, type = A, class = IN
 ANSWERS:
 -> appname.paascloud.mycompany.com
 canonical name = bigassloadbalancers-appnameprodmycompany-paascloud.paasco.com
 ttl = 7200 (2 hours)
 -> bigassloadbalancers-appnameprodmycompany-paascloud.paasco.com 
 internet address = 203.0.113.234
 ttl = 60 (1 min)
<snip>
------------
Name: bigassloadbalancers-appnameprodmycompany-paascloud.paasco.com
Address: 203.0.113.234
Aliases: appname.paascloud.mycompany.com

Given the response above is good (when everything is working), what does the above tell you? If we focus on the TTL sections, you'll see Windows has cached two responses here:

  1. appname.paascloud.mycompany.com -[CNAME]-> bigassloadbalancers-appnameprodmycompany-paascloud.paasco.com, cached for 7200 seconds (or 2 hours)
  2. bigassloadbalancers-appnameprodmycompany-paascloud.paasco.com -[A Name]-> 203.0.113.234, cached for 60 seconds (1 min)

So what happens in 60 seconds, when that A Name expires then? Let's find out - the ">" shows you are within nslookup, so just hit the Up key, and Enter to re-lookup "appname.paascloud.mycompany.com." (as per prior posts, the appended dot means "just this exact FQDN, and no additional DNS Suffixes"), eventually you'll notice the bigassloadbalancers-appnameprodmycompany-paascloud.paasco.com section goes to a TTL of 0:

> appname.paascloud.mycompany.com.
Server:  pdc1.mycompany.uk
Address:  10.0.1.99
<snip - only interested in the CNAME ttl section>
ANSWERS:
<snip>
-> bigassloadbalancers-appnameprodmycompany-paascloud.paasco.com 
 internet address = 203.0.113.234
 ttl = 0

But you'll notice your browser access to https://appname.paascloud.mycompany.com works fine during these tests; until you do the nslookup again, after the "ttl = 0" response. Now, there be dragons.

Uh-oh, where's my response gone?

When you refresh again, your heart will drop, your bum will tighten, your browser access to https://appname.paascloud.mycompany.com will stop working, and you'll see this:

C:\Users\NervousAdmin>nslookup
> set debug
> server pdc1.mycompany.uk
<snip - goes off and resolves pdc1.mycompany.uk to IP 10.0.1.99>
> appname.paascloud.mycompany.com.
Server:  pdc1.mycompany.uk
Address:  10.0.1.99

------------
Got answer:
    HEADER:
        opcode = QUERY, id = 28, rcode = NOERROR
        header flags:  response, want recursion, recursion avail.
        questions = 1,  answers = 1,  authority records = 0,  additional = 0

    QUESTIONS:
        appname.paascloud.mycompany.com, type = A, class = IN
    ANSWERS:
    ->  appname.paascloud.mycompany.com
        canonical name = bigassloadbalancers-appnameprodmycompany-paascloud.paasco.com
        ttl = 6926 (1 hour 55 mins 26 secs)

<snip>
------------
Name:    appname.paascloud.mycompany.com

Which will give you your dreaded "Page not Displayed friend", for exactly another 1 hour, 55 minutes and 26 seconds.

And how do I know that? Because that's what the TTL says that CNAME entry will stay in your cache for - regardless of the fact your Windows Client hasn't had a recursive response of the actual IP Address that it ultimately resolves to (203.0.113.234).

So what's the fix? Firstly, lets touch on DNS TTL. This isn't much different to IPv4 TTL; it just means that, once the TTL hits 0, the entry will be purged from your local DNS Cache. What happens next is the crucial part, dictated by the "DNS Response Hierarchy" your response had; if it's just a straight single-level hierarchy (i.e. domain.com -> 203.0.113.1), then your Client will go off and re-request the DNS Request to lookup domain.com to an IP Address.

But our case is different, and not in a good way - our "DNS Response Hierarchy" looks like this:

  1. (Parent) Fetch appname.paascloud.mycompany.com
    1. (Child) If you got here, now fetch bigassloadbalancers-appnameprodmycompany-paascloud.paasco.com

But our TTL's look like this:

  1. (Parent) appname.paascloud.mycompany.com = TTL <bigger than "Child">
    1. (Child) bigassloadbalancers-appnameprodmycompany-paascloud.paasco.com = TTL <smaller than "Parent">

That's not what we want at all; given these are two differing DNS Administrative Domains (owned and operated by two differing Companies - MSPCo for appname.paascloud.mycompany.com and PaaS Co for bigassloadbalancers-appnameprodmycompany-paascloud.paasco.com), we (MyCompany) don't have any direct control over these. Regardless though, we need them to flip-it-around so that this happens:

  1. (Parent) appname.paascloud.mycompany.com = TTL <smaller (or same) than "Child">
    1. (Child) bigassloadbalancers-appnameprodmycompany-paascloud.paasco.com = TTL <bigger than "Parent">

This way, when the "Parent" (initial, or root, or "actual FQDN I wanted the IP for") TTL expires, it will remove the "Child" (CNAME) entry with it; which means the DNS Lookup process will re-occur, and we'll happily get an IPv4 Address back. Technically simple, but you try and explain that to MSPCo and PaaS Co, and you'll find your "shouty voice TTL" quickly gets towards that precious 0...

Remotely changing the Management SVI on a Cisco 3524XL

Friday, 25 Jan 2019

A Cisco 35-what-what now?

You probably haven't heard of a Cisco 3524XL. You're possibly sat reading this thinking: "I've heard of the Nexus 3K, sure, but WTF is a 3520-Seires, am I behind already?". The answer is no, you aren't (or yes, you are if you're unfortunate enough to know what a C3524XL is) - but don't take my word for it, let's ask what Danny Dyer thinks:

undefined

Why are you blogging about a Cisco Switch that went EoL over a decade ago?

Indeed, the Cisco Catalyst 3524XL went End of Life in 2002 - far before I even started working in the field of Networking. So why am I talking about it here? Well, a few reasons:

  1. @DarrenFullwel challenged me to on Twitter
  2. It's got lessons to teach us all
  3. History needs to remind us that banging on the suffix "XL" should only be confined to fast food and t-shirts

Let's focus on what it can teach us - first, a little primer on my chief bugbear with it as a "capable Layer 3 Campus Access Switch".

The C3524XL only supports one SVI

That's not too bad you might think; you probably only want to give it a Management IP Address to the SVI, and let something more capable handle inter-VLAN Routing. But what happens when you want to do something like this:

  1. Remotely re-IP Address the Management IP (and the boss won't let you hire a van and take the day to drive to the arse-end of nowhere)
  2. Remotely change the configuration your colleague left with it using VLAN1 as the SVI, but everywhere else uses VLAN55 for Switch Management (and the boss still won't let you hire that van)

Any ideas on how you're going to sort that out, remotely? Let me introduce you to the age-old Network Engineering practice of...

Squeaky bum time

undefined

There's nothing for it, soldier; we've got two basic choices to do this remotely, and we're gonna need a stock of toilet roll for both:

  1. Use a SNMP-based config upload tool like Network Billy (coincidentally the finest thing to have come out of a GeoCities website)
  2. Use a TFTP-based config upload tool (like TFTPd32)
  3. Keep hassling the boss for that van

I went for option two, TFTP-based; but the basic concepts are the same. Firstly, we're going to double-check what we want to achieve; for my scenario, that's two things:

  1. Disable VLAN1
  2. Migrate the Management IP to VLAN55 (172.31.0.0/24)
    1. I'll also have to change this upstream, so that my L3 Default Gateway Switch/Router moves 172.31.0.0/24 from VLAN1 to VLAN55, or have both co-exist for a while and VRF Lite one VLAN off from the other; but that's for another blog post

To do this interactively, I'd want to do something like the following:

conf t
int vlan1
 no ip address
 no desc
 shut
vlan 55
 name Mgmt_VLAN
int vlan55
 desc Management VLAN
 ip address 172.31.0.99 255.255.255.0
 no shutdown
ip default-gateway 172.31.0.1
end
wr mem

But we don't have that luxury, so we'll go for a three-step approach.

Step 1 - The interactive bit

We need to setup the VLAN (just at Layer 2) ready to go; as we're talking about an archaic C3524XL, depending on the age of IOS on the Switch, that's either going to be the "new Cisco way" (as above), or if you're as unlucky as Dyer thinks, the old VLAN Database method, like this:

C3524XL#vlan database
vlan 55
exit

Regardless of which, we'll then check we've got the VLAN ready to go, and if necessary, add it to any 802.1q Trunk interfaces up to the Core (L3) Switch:

C3524XL#sh vlan id 55
C3524XL#sh int trunk | inc Span|Port|55

Now onward to the offline part.

Step 2 - The offline bit

Firstly, we need to grab the config file off the C3524XL. If you've got TFTPd32 running on your PC (which needs to be accessible from the existing C3524XL VLAN1 SVI IP Address, say your PC is 10.0.0.99), this is just a matter of turning TFTPd32 on, configuring it to a directory and ensuring Winblows Firewall isn't blocking inbound TFTP (UDP/69). Then login to your C3524XL, and do something like this to copy the config from the Switch to your PC:

C3524XL#copy run tftp://10.0.0.99/c3524xl-confg
yes

Now you have the file locally, we'll be editing it in a text editor to make the changes above, and turn it into the startup-config (for the sake of space, I'm only showing the changed lines; the rest of the config needs to be there, you are only Find-Replacing these sections):

<snip - rest of config removed, but would be there>
hostname C3524XL
<snip - rest of config removed, but would be there>
int vlan1
 no ip address
 no desc
 shut
int vlan55
 desc Management VLAN
 ip address 172.31.0.99 255.255.255.0
 no shutdown
<snip - rest of config removed, but would be there>
ip default-gateway 172.31.0.1
<snip - rest of config removed, but would be there>

A few handy hints here:

  • Make sure all your interconnect, Trunks and Management SVI VLAN55 are set to "no shutdown"
  • Triple-check that in your scenario it is actually VLAN 55 for Management; the IP Address is correct and doesn't conflict & VLAN55 exists and would be allowed on the Trunk

Nothing left now but to execute our actions and make rocket go now!

Step 3 - The bit you make a calming brew beforehand for

Now it's crunch time. You've obviously got an RFC Change Request that's approved to do this (because you wouldn't "Lab on Live", would you?), so what's to fear, eh?

Firstly, we upload the amended config file, straight into startup-config:

C3524XL#copy tftp://10.0.0.99/c3524xl-startup.txt startup-config

Then we get paranoid and double-check it copied everything correctly, that we're definitely Trunking that VLAN55 and we've set the Management VLAN 55 to "no shut":

C3524XL#sh start
C3524XL#sh vlan id 55
C3524XL#sh int trunk | inc Span|Port|55

And finally we sup-up that brew, clench the derriere, and invoke the outage-causing Management IP switchover:

C3524XL#reload
yes

Then we wait, and nervously set our local PC Command Prompt "ping-t" going, waiting for it to pop back up with the new Management IP address:

C:\Users\NervousAdmin>ping -t 172.31.0.99

Pinging 172.31.0.99 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
<2-3 nervous minutes later>
Reply for 172.31.0.99: bytes=32 time=13ms TTL=64
Reply for 172.31.0.99: bytes=32 time=13ms TTL=64
[CTRL+C]

Wrapping it up

And there we go; remotely changing the Management VLAN and IP Address of a Switch that's older than time - and hopefully a useful tip if you have a similar single-SVI-only piece of sh... kit. Enjoy!

Newer posts → Home ← Older posts