Terraforming an F5 Cluster into Azure with pesky DO

Thursday, 23 Mar 2023

Struggling to make an F5 BIG-IP Virtual Edition (VE) just cluster up already and stop giving errors like these?

Error 422: Invalid IP Address
Failed to send declaration
Error 500 invalid config - rolled back

Then you're in luck, because I too have been there, gotten the t-shirt and will one day wreak revenge upon F5 by using other Load Balancers instead out of spite. But first, we'll need some primers on the over-convoluted ecosystem that F5 use to orchestrate provisioning of their BIG-IP via IaC techniques - or as they brand it, the F5 BIG-IP Automation Toolchain. This mainly consists of the following "Extensions", which are in effect F5's equivalent of an apt-get/yum package on other Linux distributions:

  • Declarative Onboarding (DO)
  • Cloud Failover Extension (CFE)
  • Application Services 3 (AS3)
  • Cloud-init
    • This isn't strictly speaking an F5 thing, but you'll need to wrangle with it to get DO and CFE to do their thing on boot of the F5 BIG-IP VE.

The BIG-IP Azure Terraform Module

Handily (or so you'll initially think), F5 supply a Terraform Module to "rapidly" get you up and running with single or clustered F5 BIG-IP VE node(s) in Azure, published in the Terraform Registry as F5Networks/bigip-module/azure. For those of you not familiar with a Terraform Module, it's just a collection of Terraform Resources with an opinionated setup (i.e. the F5 Module deploys Azure VMs whose Hostnames end in "-f5vm01"), but which can accept some configurable options - you can quickly see how the BIG-IP Module works by scrutinising main.tf and working through how variables are passed into it from the module "bigip" input calls outside.
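
As a rough sketch of that plumbing (the module's other inputs and the exact var.* names are my assumptions - check the module's variables.tf and your own tfvars for the real ones), the wiring in your main.tf looks something like this:

module "bigip" {
  source = "F5Networks/bigip-module/azure"
  # version = "..."  # pin to whichever release you're actually using

  # ...resource group, subnet, NSG and credential inputs omitted for brevity...

  # Render the Cloud-init/bootstrap template and hand it to the module as custom_user_data,
  # substituting the ${...} style variables referenced inside the .tmpl file
  custom_user_data = templatefile("${path.module}/custom-onboard-big.tmpl", {
    bigip_username             = var.bigip_username
    bigip_password             = var.bigip_password
    ssh_keypair                = var.ssh_keypair
    az_keyvault_authentication = var.az_keyvault_authentication
    vault_url                  = var.vault_url
    secret_id                  = var.secret_id
    INIT_URL                   = var.INIT_URL
    DO_URL                     = var.DO_URL
    DO_VER                     = var.DO_VER
    AS3_URL                    = var.AS3_URL
    AS3_VER                    = var.AS3_VER
    TS_URL                     = var.TS_URL
    TS_VER                     = var.TS_VER
    CFE_URL                    = var.CFE_URL
    CFE_VER                    = var.CFE_VER
    FAST_URL                   = var.FAST_URL
    FAST_VER                   = var.FAST_VER
  })
}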

The savvy amongst you will be drawn in by the custom_user_data input - essentially this is used by the BIG-IP Module to invoke a one-time Linux startup/bootstrap script (using Cloud-init under the hood) to pass the DO, CFE and AS3 declarations through to the F5 in a rendered file which is part Bash Script, part YAML and all pain. It's probably best to start at this Cloud-init Bash Script, as that will help you understand some more of the relationships between variables passed through from the tfvars file - let's take the suggested custom-onboard-big.tmpl and break down what it does. First, here's the code - basically one big Bash script:

#!/bin/bash -x

# NOTE: Startup Script is run once / initialization only (Cloud-Init behavior vs. typical re-entrant for Azure Custom Script Extension )
# For 15.1+ and above, Cloud-Init will run the script directly and can remove Azure Custom Script Extension


mkdir -p  /var/log/cloud /config/cloud /var/config/rest/downloads

mkdir -p /config/cloud

LOG_FILE=/var/log/cloud/startup-script.log
[[ ! -f $LOG_FILE ]] && touch $LOG_FILE || { echo "Run Only Once. Exiting"; exit; }
npipe=/tmp/$$.tmp
trap "rm -f $npipe" EXIT
mknod $npipe p
tee <$npipe -a $LOG_FILE /dev/ttyS0 &
exec 1>&-
exec 1>$npipe
exec 2>&1

# Run Immediately Before MCPD
/usr/bin/setdb provision.extramb 1000
/usr/bin/setdb restjavad.useextramb true

curl -o /config/cloud/do_w_admin.json -s --fail --retry 60 -m 10 -L https://raw.githubusercontent.com/F5Networks/terraform-azure-bigip-module/main/config/onboard_do.json


### write_files:
# Download or Render BIG-IP Runtime Init Config

cat << 'EOF' > /config/cloud/runtime-init-conf.yaml
---
runtime_parameters:
  - name: USER_NAME
    type: static
    value: ${bigip_username}
  - name: HOST_NAME
    type: metadata
    metadataProvider:
      environment: azure
      type: compute
      field: name
  - name: SSH_KEYS
    type: static
    value: "${ssh_keypair}"
EOF

if ${az_keyvault_authentication}
then
   cat << 'EOF' >> /config/cloud/runtime-init-conf.yaml
  - name: ADMIN_PASS
    type: secret
    secretProvider:
      environment: azure
      type: KeyVault
      vaultUrl: ${vault_url}
      secretId: ${secret_id}
pre_onboard_enabled: []
EOF
else

   cat << 'EOF' >> /config/cloud/runtime-init-conf.yaml
  - name: ADMIN_PASS
    type: static
    value: ${bigip_password}
pre_onboard_enabled: []
EOF
fi

cat /config/cloud/runtime-init-conf.yaml > /config/cloud/runtime-init-conf-backup.yaml

cat << 'EOF' >> /config/cloud/runtime-init-conf.yaml
extension_packages:
  install_operations:
    - extensionType: do
      extensionVersion: ${DO_VER}
      extensionUrl: ${DO_URL}
    - extensionType: as3
      extensionVersion: ${AS3_VER}
      extensionUrl: ${AS3_URL}
    - extensionType: ts
      extensionVersion: ${TS_VER}
      extensionUrl: ${TS_URL}
    - extensionType: cf
      extensionVersion: ${CFE_VER}
      extensionUrl: ${CFE_URL}
    - extensionType: fast
      extensionVersion: ${FAST_VER}
      extensionUrl: ${FAST_URL}
extension_services:
  service_operations:
    - extensionType: do
      type: inline
      value:
        schemaVersion: 1.0.0
        class: Device
        async: true
        Common:
          class: Tenant
          hostname: '{{{HOST_NAME}}}.com'
          myNtp:
            class: NTP
            servers:
              - 0.pool.ntp.org
            timezone: UTC
          myDns:
            class: DNS
            nameServers:
              - 168.63.129.16
          admin:
            class: User
            partitionAccess:
              all-partitions:
                role: admin
            password: '{{{ADMIN_PASS}}}'
            shell: bash
            keys:
              - '{{{SSH_KEYS}}}'
            userType: regular
          '{{{USER_NAME}}}':
            class: User
            partitionAccess:
              all-partitions:
                role: admin
            password: '{{{ADMIN_PASS}}}'
            shell: bash
            keys:
              - '{{{SSH_KEYS}}}'
            userType: regular
post_onboard_enabled: []
EOF

cat << 'EOF' >> /config/cloud/runtime-init-conf-backup.yaml
extension_services:
  service_operations:
    - extensionType: do
      type: inline
      value:
        schemaVersion: 1.0.0
        class: Device
        async: true
        Common:
          class: Tenant
          hostname: '{{{HOST_NAME}}}.com'
          myNtp:
            class: NTP
            servers:
              - 0.pool.ntp.org
            timezone: UTC
          myDns:
            class: DNS
            nameServers:
              - 168.63.129.16
          admin:
            class: User
            partitionAccess:
              all-partitions:
                role: admin
            password: '{{{ADMIN_PASS}}}'
            shell: bash
            keys:
              - '{{{SSH_KEYS}}}'
            userType: regular
          '{{{USER_NAME}}}':
            class: User
            partitionAccess:
              all-partitions:
                role: admin
            password: '{{{ADMIN_PASS}}}'
            shell: bash
            keys:
              - '{{{SSH_KEYS}}}'
            userType: regular
post_onboard_enabled: []
EOF

# # Download
#PACKAGE_URL='https://cdn.f5.com/product/cloudsolutions/f5-bigip-runtime-init/v1.1.0/dist/f5-bigip-runtime-init-1.1.0-1.gz.run'
#PACKAGE_URL='https://cdn.f5.com/product/cloudsolutions/f5-bigip-runtime-init/v1.2.0/dist/f5-bigip-runtime-init-1.2.0-1.gz.run'
for i in {1..30}; do
    curl -fv --retry 1 --connect-timeout 5 -L ${INIT_URL} -o "/var/config/rest/downloads/f5-bigip-runtime-init.gz.run" && break || sleep 10
done
# Install
bash /var/config/rest/downloads/f5-bigip-runtime-init.gz.run -- '--cloud azure'
# Run
f5-bigip-runtime-init --config-file /config/cloud/runtime-init-conf.yaml
sleep 5
f5-bigip-runtime-init --config-file /config/cloud/runtime-init-conf-backup.yaml

So let's break down what happens when this is passed by the Azure VM Extension to be run as a Bash Script and saved on the F5 BIG-IP VE itself as executable /var/lib/waagent/CustomData. If you're interested in how this ends up on the F5 Linux VM as this file, this is a good write-up of custom data and Cloud-init on Azure Virtual Machines.
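
If you want to eyeball what actually landed on the box, run something like the following from the bash shell on the BIG-IP VE once it has booted - the paths are the ones used by the script above:

# The rendered bootstrap script as delivered by the Azure VM Extension
cat /var/lib/waagent/CustomData

# The log file the script writes to (LOG_FILE at the top of the script)
tail -f /var/log/cloud/startup-script.log

# The DO YAML the script generated for f5-bigip-runtime-init
cat /config/cloud/runtime-init-conf.yaml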

Here's the skinny on that Bash Script's workings:

  • (Lines 1-20) Set this up as a Bash Script for execution, and set up some of the Linux log, config and Named Pipe operations to allow it to interact with stdin/stdout
  • (Lines 21-23) Do some performance tweaks that I don't know why F5 don't just bake into their stock image
  • (Line 25) Download some factory DO JSON (yeah I know, I said YAML earlier - it takes both because F5 hate consistency it seems) for use as a pre-install
  • (Lines 31-46) Use heredoc (or multi-line strings to you and me) to generate the DO YAML file from the passed-in variables and save it within the F5 BIG-IP itself as YAML file /config/cloud/runtime-init-conf.yaml
    • You'll see two types of variable that can be passed-through here (and used anywhere within the heredoc definition, that is from Line 31 to Line 175, effectively)
      • So-called "moustache" variables are like {{{THIS}}} and refer back to the values passed in to the runtime_parameters section of the runtime-init-conf.yaml DO declaration - these effectively only reference variables locally defined within the same tmpl Bash Script file.
      • Standard Linux escape variables are like ${this} and refer back to the values passed in from the inbuilt variables definition within the templatefile definition of the custom_user_data variable in your main.tf file
        • Which in turn, are probably references back to Terraform variables such as var.INIT_URL, specified in your terraform.tfvars file (turtles all the way down)
    • When run, if you login to the F5 BIG-IP VE instance CLI (using Azure Serial Console), you can cat /config/cloud/runtime-init-conf.yaml to see the difference in how these two variable types behave at runtime (there's a quick grep sketch after this list), where
      • "Moustache" variables (like {{this}}) remain the same as you typed them initially; the replacement is done on execution of f5-bigip-runtime-init by this binary itself - so maybe still look like hostname: '{{{HOST_NAME}}}.com'
      • Standard Linux escape variables (like ${this}) have already been replaced by the text string and differ from how you typed them initially; the replacement has been done by the Terraform run itself - so maybe now look like value: password123 instead of previously being value: ${bigip_password}
  • (Lines 48-68) Do a Bash "if" branch based on the value derived from the Standard Linux escape variable ${az_keyvault_authentication} (true or false) - which is defined in main.tf in the templatefile definition of the custom_user_data variable (so only passed through one layer of turtles, from main.tf into custom-onboard-big.tmpl)
    • Output the next section of F5 DO YAML into /config/cloud/runtime-init-conf.yaml based on whether this was set to true (i.e. your F5 CFE cluster password is stored in an Azure Keyvault) or false (i.e. you're just hard-setting a password in the YAML DO definition)
  • (Line 70) Make a backup of the /config/cloud/runtime-init-conf.yaml definition and save this as /config/cloud/runtime-init-conf-backup.yaml
  • (Lines 72-131) Append the F5 DO YAML soup which tells the BIG-IP which extensions to install (if you've used Linux, this is the equivalent of a string of apt-get install... commands, shown instead as YAML), and uses the moustache/Linux escape variables you defined earlier to set up the box - Hostname, DNS, NTP, System Users and so on
  • (Lines 133-175) Repeat what was done for the "production" /config/cloud/runtime-init-conf.yaml F5 DO YAML file above for the "backup" F5 DO YAML file located at /config/cloud/runtime-init-conf-backup.yaml
  • (Lines 177-184) Download and install the f5-bigip-runtime-init executable - which is effectively F5's version of Cloud-init
  • (Line 186) Invoke F5 Cloud-init with the /config/cloud/runtime-init-conf.yaml file - to kick in the DO, CFE and AS3 processes and make your F5 go whir now
    • (Line 188) Bonus "do that again for no particular reason" run (Cloud-init is a one-time operation, not something that runs on every reboot...)
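
To sanity-check which substitutions happened when, something like this (purely illustrative) makes the split obvious - the ${...} style values should already be real strings courtesy of Terraform, while the {{{...}}} ones stay literal until f5-bigip-runtime-init runs:

# Show any Terraform-style and moustache-style placeholders left in the rendered DO YAML
grep -nE '\$\{|\{\{\{' /config/cloud/runtime-init-conf.yaml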

So that was fun eh? You mean to say I got all that from one bag of F5 oranges?

Troy McClure squeezing those F5 YAML oranges the hard way

What are you on about these turtles for?

It's a fancy way of saying (with the F5 DO/YAML, what feels like) infinite recursion - this Wikipedia write-up explains it better than I can.

Under the hood

To understand some of the pain you're going to encounter (yes, there's more), it's worth understanding the internals of what really happens under the hood. That's right, there's even more fun to this story that's hidden in those extension_services stanzas - and to expand on this, we need to move away from YAML for a second and focus on the F5 BIG-IP Automation Toolchain, namely what happens when these stanzas execute in the Cloud-init YAML:

  • extension_services -> service_operations -> extensionType: do
  • extension_services -> service_operations -> extensionType: cf

Declarative Onboarding (Aga DO, DO, push pineapple...)

Somewhere in the backend, your YAML is converted into JSON and POSTed to an HTTP REST API endpoint - specifically one you can probe yourself in advance by posting the content of a file saved as do_test.json, after swapping from the F5 BIG-IP default tmsh shell to the standard Linux bash shell as follows:

  1. Login to F5 via SSH or Azure Serial Console
  2. Swap to Bash prompt by typing: bash then hit Return key
  3. Save some DO-formatted JSON (like this example) as a file called do_test.json
  4. Throw it at the HTTP REST API with a curl post as follows:
curl -su admin: -d "@do_test.json" http://127.0.0.1:8100/mgmt/shared/declarative-onboarding | jq
  5. You'll get a JSON payload back, consisting first of an HTTP Status Code for the result, and also a playback of the JSON payload you posted in the declaration section

Here's an example F5 DO JSON payload you can tweak and play with:

{
  "schemaVersion": "1.0.0",
  "class": "Device",
  "async": true,
  "Common": {
    "class": "Tenant",
    "hostname": "f5vm01.test.net",
    "myDb": {
      "class": "DbVariables",
      "provision.extramb": 1000,
      "restjavad.useextramb": true,
      "dhclient.mgmt": "disable",
      "config.allow.rfc3927": "enable",
      "tm.tcpudptxchecksum": "Software-only"
    },
    "myModules": {
      "class": "Provision",
      "asm": "nominal",
      "ltm": "nominal"
    },
    "myNtp": {
      "class": "NTP",
      "servers": [
        "time.windows.com"
      ],
      "timezone": "UTC"
    },
    "myDns": {
      "class": "DNS",
      "nameServers": [
        "168.63.129.16"
      ]
    },
    "admin": {
      "class": "User",
      "partitionAccess": {
        "all-partitions": {
          "role": "admin"
        }
      },
      "shell": "bash",
      "userType": "regular",
      "keys": []
    },
    "bigipuser": {
      "class": "User",
      "partitionAccess": {
        "all-partitions": {
          "role": "admin"
        }
      },
      "shell": "bash",
      "userType": "regular",
      "keys": []
    },
    "internal": {
      "class": "VLAN",
      "interfaces": [
        {
          "name": "1.1",
          "tagged": false
        }
      ],
      "mtu": 1500,
      "tag": 4094,
      "cmpHash": "default",
      "failsafeEnabled": false,
      "failsafeAction": "failover-restart-tm",
      "failsafeTimeout": 90
    },
    "internal-self": {
      "class": "SelfIp",
      "address": "10.255.2.4/24",
      "vlan": "internal",
      "allowService": "none",
      "trafficGroup": "traffic-group-local-only"
    },
    "configSync": {
      "class": "ConfigSync",
      "configsyncIp": "/Common/internal-self/address"
    },
    "failoverAddress": {
      "class": "FailoverUnicast",
      "address": "/Common/internal-self/address",
      "port": 1026
    },
    "failoverGroup": {
      "class": "DeviceGroup",
      "type": "sync-failover",
      "members": [
        "f5vm01.test.net",
        "f5vm02.test.net"
      ],
      "owner": "/Common/failoverGroup/members/0",
      "autoSync": true,
      "saveOnAutoSync": false,
      "networkFailover": true,
      "fullLoadOnSync": false,
      "asmSync": false
    },
    "trust": {
      "class": "DeviceTrust",
      "localUsername": "admin",
      "remoteHost": "/Common/failoverGroup/members/0",
      "remoteUsername": "admin"
    }
  }
}

Generally here, 200 or 20x (where x is any number) means times are gravy, and the F5 successfully took your DO and configured itself as per your commands. Anything else and you should sit yourself down for some debugging fun, some helpful hints here:

  • It won't tell you which line your invalid IP Address is in, so good luck fishing
    • Note that in F5 land, this is a valid IP Address that effectively refers to whatever you configured the Internal NIC as: /Common/internal-self/address, or you can go for the more traditional 10.255.2.4 approach if, y'know, you like sleeping and/or seeing your kids of an evening
  • Sometimes it decides an error is not passable, and rolls back your entire config accordingly
    • It's much quicker having the F5 kick you in the balls via a "try some JSON and see if it works" approach - tweaking do_test.json and POSTing it to the HTTP REST Endpoint URL - than re-forming the runtime-init-conf.yaml file from the initial custom-onboard-big.tmpl, waiting for terraform apply and the related Azure VM Extension to kick in, and running f5-bigip-runtime-init all over again (2-4 minutes a go)
  • Not that the F5 ARM template examples make it obvious, but on both nodes in an HA Active/Passive cluster, it wants the failoverGroup members listed in a consistent order: (Line 1) node0.hostname.com and (Line 2) node1.hostname.com
    • If you're like me, you'll read them, notice the order-swapping of remoteHost between instance01.yaml and instance02.yaml, and think "Huh, so it's specified Node0/Node1 in the Node0 DO YAML, then swaps to Node1/Node0 order in the Node1 DO YAML file" - nope, it's just that by "remote" F5 actually mean "not the current node, y'know, the other one" - meaning it changes each time and is locally relative

If you just want to check the status of the latest DO JSON invocation without supplying some fresh DO JSON to execute, then run:

curl -su admin: http://127.0.0.1:8100/mgmt/shared/declarative-onboarding | jq

(Note: The jq command the output is piped into simply takes the JSON response and pretty-prints it as indented, multi-line JSON, rather than just-showing-it-as-one-big-block-of-illegible-text)
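
If you're impatient and want to poll rather than keep re-running that by hand, a loop along these lines does the job - it assumes DO's documented behaviour of the GET returning HTTP 202 while a declaration is still processing, so treat it as a sketch rather than gospel:

# Poll the DO status endpoint until it stops returning 202 (async declaration still running), then show the result
while [ "$(curl -su admin: -o /dev/null -w '%{http_code}' http://127.0.0.1:8100/mgmt/shared/declarative-onboarding)" = "202" ]; do
  sleep 10
done
curl -su admin: http://127.0.0.1:8100/mgmt/shared/declarative-onboarding | jq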

Cloud Failover Extension (DO isn't an "Extension" clearly, otherwise it'd be acronym'd as "DOE")

Pretty much the same idea applies here, but there's less you can configure - this is a good reference of some working CFE JSON. To test CFE in advance, you would:

  1. Login to F5 via SSH or Azure Serial Console
  2. Swap to Bash prompt by typing: bash then hit Return key
  3. Save some CFE-formatted JSON (like this example) as a file called cfe_test.json
  4. Throw it at the HTTP REST API with a curl post as follows:
curl -su admin: -d "@cfe_test.json" http://127.0.0.1:8100/mgmt/shared/cloud-failover/declare | jq
  5. You'll get a JSON payload back, consisting first of an HTTP Status Code for the result, and also a playback of the JSON payload you posted in the declaration section

Here's an example F5 CFE JSON payload you can tweak and play with:

{
  "failoverAddresses":{
     "enabled":true,
     "scopingTags": {
        "f5_cloud_failover_label": "mydeployment"
     },
     "addressGroupDefinitions": [
        {
           "type": "networkInterfaceAddress",
           "scopingAddress": "10.0.1.100"
        },
        {
           "type": "networkInterfaceAddress",
           "scopingAddress": "10.0.1.101"
        }
     ]
  }
}

If you just want to check the status of the latest CFE JSON invocation without supplying some fresh CFE JSON to execute, then run:

curl -su admin: http://127.0.0.1:8100/mgmt/shared/cloud-failover/info | jq

(Note: The jq command the output is piped into simply takes the JSON response and pretty-prints it as indented, multi-line JSON, rather than just-showing-it-as-one-big-block-of-illegible-text)

Bonus Timesaver - F5 Extension versions that don't work with each other

For added bonus fun, when you're looking around the Interwebs to find the correct F5 Extension RPMs (yes, at least that bit is standard Linux-like), you might stumble on a few in some of F5Networks' own GitHub repos that don't work well together at all, and in one case cause the Cloud-init process to crap out just after the DO process is done and before the CFE process even begins. Here's a list of F5 Extension versions that don't play well together and which you should avoid using:

F5 Extension | Version | URL
DO | 1.21.0 | https://github.com/F5Networks/f5-declarative-onboarding/releases/download/v1.21.0/f5-declarative-onboarding-1.21.0-3.noarch.rpm
TS | 1.20.0 | https://github.com/F5Networks/f5-telemetry-streaming/releases/download/v1.20.0/f5-telemetry-1.20.0-3.noarch.rpm
FAST | 1.9.0 | https://github.com/F5Networks/f5-appsvcs-templates/releases/download/v1.9.0/f5-appsvcs-templates-1.9.0-1.noarch.rpm
CFE | 1.8.0 | https://github.com/F5Networks/f5-cloud-failover-extension/releases/download/v1.8.0/f5-cloud-failover-1.8.0-0.noarch.rpm
Cloud-init | 1.2.1 | https://cdn.f5.com/product/cloudsolutions/f5-bigip-runtime-init/v1.2.1/dist/f5-bigip-runtime-init-1.2.1-1.gz.run

All or nothing

One big thing to note with the Cloud-init/DO combo is that it's all or nothing - if any one single part of that large /config/cloud/runtime-init-conf.yaml file goes wrong during the Cloud-init process, the F5 rolls itself back to its factory state like a wet fish. Flapping. In the Azure winds. With an unknown default username and password, so you can't log in to your newly-deployed localhost.localdomain to debug.

My Terraform'd F5 leaving me unable to do anything, which was nice

What you therefore might like to do, to combat this situation while working out the magic incantations needed to generate your Cloud-init YAML (what, you weren't just born knowing the F5 Schema?), is to plant a known username and password before the Cloud-init process kicks off, by inserting the relevant F5 tmsh commands into the top part of the custom-onboard-big.tmpl Bash script, before the cat /config/cloud/runtime-init-conf.yaml > /config/cloud/runtime-init-conf-backup.yaml section. Probably something like:

# Set Admin User password before DO process fails miserably
tmsh modify auth user bigipuser password password123!

Time to hit refresh (yes, that is an F5 pun)

Much like an F5 deployed through Cloud-init, I've hit the end of my metaphorical anger threshold and it's probably time for a reset on what's left of my soul and sanity. I genuinely hope some of this helps those of you unfortunate enough to have to deploy an F5 into Azure through automation.

For the rest of you, I implore you - make better Load Balancer decisions. F5s appear to be flakier than a Cornetto from your local Ice Cream van in the summer.

Juniper SRX Filter-based Forwarding (FBF) Policy Based Routing

Thursday, 15 Sep 2022

If you're a Cisco Guy (or Girl, I'm not biased; both sexes are welcome to the Cisco Pain Train), you'll likely have come across a need for Policy Based Routing (PBR) to change the normal Destination IP-based logic of Network Routing to instead be a Source-IP or Interface-based logic, for some purpose of bypass. On Cisco gear, this is normally achieved via Route Maps or Route Policy Language (RPL) matching Access Control Lists (ACLs), Prefix Lists, Interfaces or some other attribute to match the desired traffic flows to "PBR" away from normal Routing.

Junos - or specifically Filter-based Forwarding (FBF) - takes a very different approach to achieving this, which may take your "Cisco brain" a bit of context-switching to adapt to, namely:

  • PBR (FBF) is performed within a new, separate, Routing Instance (VR, Virtual Router, or VRF)
    • So all entry (ingress) / exit (egress) interfaces will need to be "imported" to also "attach" to this PBR-specific VRF
  • FBF is applied inbound on the Source-facing interface
    • So normal, non-PBR traffic flow still needs to be explicitly allowed through (otherwise it'll not just not be PBR'd, it'll be blocked from flowing entirely...)

Worked example

A worked example probably helps here, so suppose we have the following:

  • Zscaler Internet Access (or equivalent Security Service Edge [SSE] or Cloud-based Internet Firewall/Proxy)
    • Traffic is forced into Zscaler via a GRE or IPsec Tunnel which runs atop the Internet into the closest/selected Zscaler Data Centre PoP (Point of Presence)
    • Zscaler (fake) Data Centre GRE Endpoint 165.225.99.99/32 acts as the far-side GRE Tunnel IP Endpoint/Peer
  • 1 Gb Direct Internet Access (DIA) from BigNetCo with Provider Aggregatable (PA) 192.0.2.0/24 IPv4 Address Block, effectively "leased" to Your Company
    • Static IPv4 /30 Handoff/Linknet 203.0.113.0/30 between Your Company and BigNetCo (out of BigNetCo's IPv4 Address Space)
  • Default Route on your Juniper SRX towards VRF "Zscaler-Internet"
    • GRE Tunnel is transited via Underlay VRF Internet, but forms an extension of Overlay VRF Zscaler-Internet - that is, Default Route on LAN-side of Firewall forwards traffic (following the Default) via VRF Zscaler-Internet into the GRE Tunnel to Zscaler
    • Underlay VRF "Internet" only really exists to transit the GRE Tunnel itself (i.e. has a Static Route via next-hop 203.0.113.1 BigNetCo for GRE Peer Endpoint 165.225.99.99/32 only)
  • VRF Prod exists as the only internal-facing DMZ/Security Zone, and routes the whole 10.0.0.0/8 Your Company Summary IPv4
    • Of this, most of 10.0.0.0/8 should have "Normal Internet access" - that is to say, flow via the GRE Tunnel to be scrubbed by Zscaler (i.e. appear to the Internet as a 165.x.x.x Zscaler Source IP)
    • Apart from "Direct Internet access" Subnet 10.2.0.0/24 - which (for reasons) needs to bypass Zscaler inspection and go direct to the Internet (i.e. appear to the Internet as a 192.0.2.0/24 Source IP)
      • And only then if they are going to HTTPS destinations; if they try and go to, say, HTTP or DNS, we want to send that via Zscaler inspection as per "Normal Internet access"

Topology

A picture paints a thousand words, so that summates as:

Juniper PBR Filter-based Forwarding Topology

Configuration

Let's look at the raw config first, and then deconstruct what it's doing. For brevity I've omitted the GRE Tunnel configuration, but if you're interested I suggest you read about Juniper SRX Overlay and Underlay VRF-separated GRE Tunnels.

Ditto, in practice you'd want to Source NAT (SNAT) the traffic to a Public IPv4 to make it routable on the Internet, but again, for brevity I'm omitting that and focusing only on the PBR-required configuration itself - as you might want to adopt similar for Private-to-Private PBR requirements.

set routing-instances Internet-Bypass-ZIA instance-type forwarding
set routing-instances Internet-Bypass-ZIA routing-options static route 0.0.0.0/0 next-hop 203.0.113.1
set routing-instances Internet-Bypass-ZIA routing-options instance-import POL_EXPORT_FROM_Prod_TO_Internet-Bypass-ZIA
set routing-instances Internet-Bypass-ZIA routing-options instance-import POL_EXPORT_FROM_Internet_TO_Internet-Bypass-ZIA
!
set firewall family inet filter PBR-BYPASS-ZSCALER-ZIA term PBR-DIRECT-INTERNET from source-address 10.2.0.0/24
set firewall family inet filter PBR-BYPASS-ZSCALER-ZIA term PBR-DIRECT-INTERNET from destination-address 0.0.0.0/0
set firewall family inet filter PBR-BYPASS-ZSCALER-ZIA term PBR-DIRECT-INTERNET from destination-port https
set firewall family inet filter PBR-BYPASS-ZSCALER-ZIA term PBR-DIRECT-INTERNET then routing-instance Internet-Bypass-ZIA
set firewall family inet filter PBR-BYPASS-ZSCALER-ZIA term term-999 then accept
!
set policy-options prefix-list PFX_FROM_Prod_TO_Internet-Bypass-ZIA 10.99.99.0/30
set policy-options prefix-list PFX_FROM_Prod_TO_Internet-Bypass-ZIA 10.0.0.0/8
set policy-options policy-statement POL_EXPORT_FROM_Prod_TO_Internet-Bypass-ZIA term 10 from instance Prod
set policy-options policy-statement POL_EXPORT_FROM_Prod_TO_Internet-Bypass-ZIA term 10 from prefix-list PFX_FROM_Prod_TO_Internet-Bypass-ZIA
set policy-options policy-statement POL_EXPORT_FROM_Prod_TO_Internet-Bypass-ZIA term 10 then accept
set policy-options policy-statement POL_EXPORT_FROM_Prod_TO_Internet-Bypass-ZIA term 999 from instance Prod
set policy-options policy-statement POL_EXPORT_FROM_Prod_TO_Internet-Bypass-ZIA term 999 then reject
!
set policy-options prefix-list PFX_FROM_Internet_TO_Internet-Bypass-ZIA 203.0.113.0/30
set policy-options policy-statement POL_EXPORT_FROM_Internet_TO_Internet-Bypass-ZIA term 10 from instance Internet
set policy-options policy-statement POL_EXPORT_FROM_Internet_TO_Internet-Bypass-ZIA term 10 from prefix-list PFX_FROM_Internet_TO_Internet-Bypass-ZIA
set policy-options policy-statement POL_EXPORT_FROM_Internet_TO_Internet-Bypass-ZIA term 10 then accept
set policy-options policy-statement POL_EXPORT_FROM_Internet_TO_Internet-Bypass-ZIA term 999 from instance Internet
set policy-options policy-statement POL_EXPORT_FROM_Internet_TO_Internet-Bypass-ZIA term 999 then reject
!
set interfaces xe-0/0/1 unit 10 family inet filter input PBR-BYPASS-ZSCALER-ZIA

Break it down now y'all

That's quite a chunk to digest, so let's break it down piece by piece into what each part achieves, using each exclamation mark (!) as a section divider:

  1. Create a new VRF Internet-Bypass-ZIA for the PBR to occur within; map the Internet-facing WAN and Prod-facing LAN Interfaces and next-hop Prefixes into it, and set a Static Default Route out of it via the Internet-facing WAN interface
  2. Define a FBF/PBR policy called PBR-BYPASS-ZSCALER-ZIA which has one term/policy statement called PBR-DIRECT-INTERNET that matches HTTPS requests from only the "Direct Internet access" Subnet 10.2.0.0/24, and bypasses (does not PBR) any other flows (i.e. leaves them as-was to flow via Zscaler Internet as if the PBR configuration didn't exist)
  3. Create a Routing Policy POL_EXPORT_FROM_Prod_TO_Internet-Bypass-ZIA to inter-VRF the LAN-facing 10.99.99.0/30 Handoff/Linknet and associated next-hop 10.0.0.0/8 Route from the Prod VRF into the PBR Internet-Bypass-ZIA VRF
  4. Create a Routing Policy POL_EXPORT_FROM_Internet_TO_Internet-Bypass-ZIA to inter-VRF the WAN-facing 203.0.113.0/30 Handoff/Linknet from the Internet VRF into the PBR Internet-Bypass-ZIA VRF
  5. Tie it all together by applying the FBF/PBR policy PBR-BYPASS-ZSCALER-ZIA to ingress LAN-facing interface xe-0/0/1.10
    1. Note if we changed the FBF/PBR policy PBR-BYPASS-ZSCALER-ZIA term-999 line from "...then accept" to "...then reject" - or just did not have that line - then any non-PBR'd traffic would not just not be PBR'd, it'd be dropped entirely - so be very careful that the PBR/FBF policy is 100% correct before you apply it to the ingress interface
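
Once that lot is committed, a couple of quick checks (using the example names above) confirm that the Internet-Bypass-ZIA table really does contain the imported Linknets plus the Static Default Route, and that the FBF filter is applied to the LAN-facing interface:

show route table Internet-Bypass-ZIA.inet.0
show configuration interfaces xe-0/0/1 unit 10 family inet filter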

One hop this time

Hopefully this has been a useful jaunt through how Junos SRX performs what we Cisco folk would know as PBR, and shows how it works in practice - effectively using a "third" VRF (in our example) which acts as a PBR container to glue it all together. In practice this is nice, as it keeps it obvious which flows/Source IPs/Destination IPs are being PBR'd and which aren't, but it can be confusing to wrap your head around. I also hit a few gotchas when I forgot/mismatched the Linknets needed to "import" the WAN/LAN interfaces into what is effectively a "foreign VRF" from their respective "Prod" or "Internet" native VRFs.