WiPi Monitoring with a Raspberry Pi WLAN Device
The idea
I wanted to monitor the quality of the Wireless in my Office to emulate the real-world experience of some of our End Users, as I was struggling to correlate what I was seeing in the Wireless LAN Controller (WLC)'s Access Point (AP) monitoring stats with the poor experience being reported. Simplistically, this takes the magic of a Raspberry Pi, a bit of JSON and a Log Analysis Stack (I chose Splunk; I could have used ELK Stack, or Logz.io or...) that gives you pretty dashboards of the data.
The gear
- A Virtual Private Server (VPS) with at least 8 GB of Disk (I went with Vultr)
- A Raspberry Pi bundle (I went with a Raspberry Pi 3 Official Desktop Starter Kit (16Gb, Black))
- Splunk Free Edition
- Raspbian (Lite) OS
- A few hours of your spare time
The steps
Any values below <like_this> are variables to show an example value, specific to your installation; for instance, <vps-server.com> is whatever DNS Domain Name/Dynamic DNS Domain Name/Public IP Address your VPS uses.
VPS Setup
We'll start with the VPS setup, from a high-level overview; if you're using a Provider like Vultr you should definitely harden your VPS and configure the Firewall/WAF to only allow your WiPi to send HTTP JSON feeds.
- Install your Linux distribution of choice when you boot your VPS (I'm a Debian man)
- Install Splunk onto your VPS, or more specifically:
  - Download the RPM from Splunk's website onto your VPS (I ended up downloading it locally, and then uploading the RPM to the VPS via SCP):
    scp splunk.rpm admin@vps-server.com:/tmp/
  - Install the RPM from the temporary directory:
    rpm -i /tmp/splunk.rpm
  - Run Splunk from the /opt/splunk directory and accept the Terms:
    /opt/splunk/bin/splunk start
- Set up your VPS Firewall/WAF to allow tcp/8000 and tcp/8088 (and probably tcp/22 so you can SSH into it)
- I'd suggest using a Domain Name to point at it, or a Free Dynamic DNS Service such as DDNSS.de, and pointing an A Record at your VPS's Public IP
- Harden your VPS by installing Fail2Ban and other similar tools
Splunk Setup
I'm focusing on Splunk because that's what I used, but similar steps will exist for ELK or Logz.io. The main advantage of Splunk is that it's free and quick; the Free version does have limitations, however, such as 500 MB of Indexed Data per day and a limit on concurrent Search Queries used in your Dashboards, i.e. the number of widgets you can use. There's much more that you can do, but I'll get you going with some basics.
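To get a feel for how far a single WiPi is from Splunk Free's 500 MB daily indexing allowance, here is a rough back-of-the-envelope sketch. The sample event and the `daily_bytes` helper are made up for illustration, and the figure ignores any indexing overhead Splunk adds, so treat it as an estimate only.

```python
import json

# Hypothetical sample event, trimmed from the kind of payload the WiPi sends
sample_event = {
    "host": "wipi1",
    "location": "Some House",
    "ping_bbc_avg": 22,
    "ping_bbc_loss": "0%",
    "wlan_signal": "-61 dBm",
}


def daily_bytes(event, interval_minutes=5):
    """Estimate the bytes sent per day by one device reporting every interval_minutes."""
    runs_per_day = (24 * 60) // interval_minutes
    payload_size = len(json.dumps({"event": event}).encode("utf-8"))
    return runs_per_day * payload_size


estimate = daily_bytes(sample_event)
print("Approx. {:.1f} KB per day".format(estimate / 1024.0))
```

At one report every 5 minutes that is 288 events a day, so even a payload of a kilobyte or so per event stays far below the allowance.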
- Create an Index from Settings -> Data -> Index -> New Index
  - I called mine wipi_monitoring and accepted the defaults
  - Indexes store the Events, so you can reference the data in this Index with a SQL-like Query, like this, from the Search & Reporting App on the Homepage:
    index="wipi_monitoring"
- Create a HTTP Event Collector (HEC) to receive the JSON payload from the WiPi
  - Go to Settings -> Data -> Data Inputs -> HTTP Event Collector -> Add new
  - Give it a Name (I went for WiPi Monitoring) and on the Next screen, associate it with the wipi_monitoring Index you made earlier
    - You can also tell it the Source Type is _json to speed processing up
    - Make a note of the API Key it generates, you'll need this later
- Turn on HEC (it doesn't auto-enable) from the Global Settings -> Enable -> Save option, next to the New Token option within Settings -> Data -> Data Inputs -> HTTP Event Collector
- Create a simple Splunk Dashboard, from Splunk -> Search & Reporting -> Dashboards -> Create New Dashboard
  - Most of my Panels are either Line Charts or Single Values; here are some of the example Searches used for them, style them how you want:
    - (Uptime Panel, Single Value) Search:
      index="wipi_monitoring" | timechart max(uptime)
    - (BBC.co.uk Ping RTT Panel, Line Chart) Search:
      index="wipi_monitoring" | timechart max(ping_bbc_avg)
    - (WLAN MAC Change Count Panel, Line Chart) Search:
      index="wipi_monitoring" | timechart distinct_count(wlan_ap_mac)
      - This returns a count of unique AP MAC Addresses seen (i.e. if it goes up from one, you've roamed between AP Coverage Areas)
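To make the distinct_count search a little more concrete, here is a small Python sketch of the same idea: bucket events by time, then count the unique AP MACs in each bucket. It is only an illustration of the logic, not how Splunk implements it; the function name, the 30-minute span and the sample data are all made up.

```python
from collections import defaultdict


def distinct_macs_per_bucket(events, span_seconds=1800):
    """Group (epoch_seconds, mac) pairs into fixed time buckets and count unique MACs in each."""
    buckets = defaultdict(set)
    for timestamp, mac in events:
        # Round the timestamp down to the start of its bucket
        bucket_start = int(timestamp) - (int(timestamp) % span_seconds)
        buckets[bucket_start].add(mac)
    return {start: len(macs) for start, macs in sorted(buckets.items())}


# Made-up sample data: the WiPi roams to a second AP in the second half hour
events = [
    (0, "00:11:22:33:44:55"),
    (300, "00:11:22:33:44:55"),
    (1800, "00:11:22:33:44:55"),
    (2100, "66:77:88:99:AA:BB"),
]
print(distinct_macs_per_bucket(events))
```

The first bucket reports one MAC and the second reports two, which is the "goes up from one" signal that suggests a roam.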
You should now be able to log in to your Splunk instance at http://<vps-server.com>:8000.
Make sure you can send HTTP to your Splunk instance on Port 8088 (or whatever <HEC Port> you picked otherwise); a quick way to check this is to use Telnet to see if it connects at all, i.e. telnet <vps-server.com> 8088.
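If Telnet isn't installed, the same connectivity check can be sketched in Python using only the standard library. The `port_is_open` helper is a hypothetical name of mine; it only proves that a TCP connection can be opened, not that the HEC will accept your token.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures
        return False
    sock.close()
    return True
```

You would call it as port_is_open("<vps-server.com>", 8088) from the WiPi, swapping in your own host and HEC Port.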
The script
The following script will perform a series of checks (i.e. Ping a host, or TCP-connect to a host), time how long each took, consolidate the results into a JSON payload similar to the one below, and finally push this to Splunk via the HTTP Event Collector as frequently as scheduled with your Cron job. It also scrapes information from the WiPi's WLAN interface, such as RTS/CTS issues, signal quality, current bitrate, AP MACs seen and so on.
You can customise the variables towards the top of the script with your values; don't forget to replace <vps-server.com> with your VPS Server's IP Address/DNS Name, <your_location> with a Location String that is meaningful to you, and <your_splunk_api_token> with your Splunk HEC API Token from earlier on.
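As a rough illustration of what the push to the HEC involves, the sketch below builds the URL, the Authorization header and the JSON body without sending anything. `build_hec_request` is a hypothetical helper, the host and token are placeholders, and setting `index` and `sourcetype` in the body is optional (this assumes your token is allowed to write to that Index).

```python
import json


def build_hec_request(server, token, event, index="wipi_monitoring", sourcetype="_json"):
    """Return (url, headers, body) for a Splunk HEC event POST, without sending it."""
    url = server.rstrip("/") + "/services/collector"
    headers = {"Authorization": "Splunk " + token}
    body = json.dumps({"event": event, "index": index, "sourcetype": sourcetype})
    return url, headers, body


url, headers, body = build_hec_request(
    "https://vps-server.example:8088",        # placeholder host
    "00000000-0000-0000-0000-000000000000",   # placeholder token
    {"host": "wipi1", "uptime": 417.7},
)
print(url)
print(body)
```

Printing the request like this before wiring it into a Cron job is a handy way to check the token and payload shape by eye.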
JSON Payload example
{
  "dns_ms_onedrive": 101,
  "host": "wipi1",
  "http_ms_teams": 220,
  "http_google": 500,
  "location": "Some House",
  "ping_bbc_avg": 22,
  "ping_bbc_loss": "0%",
  "ping_bbc_max": 24,
  "ping_default_gateway_avg": 13,
  "ping_default_gateway_loss": "0%",
  "ping_default_gateway_max": 47,
  "uptime": 417.7,
  "wlan_ap_mac": "00:11:22:33:44:55",
  "wlan_bitrate": "72.2 Mb/s",
  "wlan_fragment_threshold": "off",
  "wlan_invalid": 0,
  "wlan_link_quality": "49/70",
  "wlan_missed_beacon": 0,
  "wlan_rts_threshold": "off",
  "wlan_rx_invalid_crypt": 0,
  "wlan_rx_invalid_frag": 0,
  "wlan_rx_invalid_nwid": 0,
  "wlan_signal": "-61 dBm",
  "wlan_tx_excessive_retries": 10,
  "wlan_tx_power": "31 dBm"
}
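Several of these values are strings with units attached (for example "-61 dBm", "72.2 Mb/s", "0%" and "49/70"). If you would rather chart them as numbers, one option is to strip the units before sending. The two helpers below are a hypothetical sketch of that idea and are not part of the script that follows.

```python
import re


def leading_number(value):
    """Return the first number found in a string such as '-61 dBm' or '0%', else None."""
    match = re.search(r"-?\d+(?:\.\d+)?", str(value))
    return float(match.group(0)) if match else None


def link_quality_percent(value):
    """Convert an iwconfig-style 'current/max' string such as '49/70' to a percentage."""
    current, maximum = value.split("/")
    return round(100.0 * float(current) / float(maximum), 1)


print(leading_number("-61 dBm"))
print(leading_number("72.2 Mb/s"))
print(link_quality_percent("49/70"))
```

Values with no number in them, such as "off", come back as None, so you can decide whether to drop or keep those fields.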
Cron job example
Adding this to /etc/crontab would cause the script to be run every 5 minutes. Change admin to the Admin User of your Pi (which you should change from the factory pi username for security reasons):
# WiPi Monitoring report back to Splunk
*/5 * * * * admin python /opt/wipi-monitoring/main.py >> /opt/wipi-monitoring/main.log 2>&1
Python Script
This will require you to pip install netifaces dnspython requests first.
# Author: notworkd.com
# Date: 19-Jun-2020
# Description: Monitor WiFi data and send back to WiPi Monitoring Dashboard
import netifaces
import os
import subprocess
import requests
import json
import dns.resolver
import time
import datetime
# Define constants
SPLUNK_SERVER = 'https://<vps-server.com>:8088'
SPLUNK_API_KEY = '<your_splunk_api_token>'
WIPI_LOCATION = '<your_location>'
# Functions
# Update Splunk HTTP Event Collector
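def updateSplunkHec(**data):
    url = SPLUNK_SERVER + '/services/collector'
    post = {
        "event": data
    }
    # verify=False skips TLS certificate checks, which is needed if the HEC uses a self-signed certificate
    r = requests.post(url, json=post, headers={"Authorization": "Splunk " + SPLUNK_API_KEY}, verify=False)
    print(r.text)
    # Report success only when the HEC answered with HTTP 200
    return r.status_code == 200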
# Get Hostname of this WiPi
def getHostname():
    hostname = os.uname()[1]
    return hostname

# Get Default Gateway for WiFi Adapter
def getDefaultGateway():
    gws = netifaces.gateways()
    return gws['default'][netifaces.AF_INET][0]

# Get ICMP Ping response time
def getIcmpPing(host, count=5):
    cmd = "ping -c {} {}".format(count, host).split(' ')
    try:
        output = subprocess.check_output(cmd).decode().strip()
        lines = output.split("\n")
        # Second-to-last line holds the packet summary, last line holds the RTT figures
        total = lines[-2].split(',')[3].split()[1]
        loss = lines[-2].split(',')[2].split()[0]
        timing = lines[-1].split()[3].split('/')
        return {
            'type': 'rtt',
            'min': float(timing[0]),
            'avg': float(timing[1]),
            'max': float(timing[2]),
            'mdev': float(timing[3]),
            'total': str(total),
            'loss': str(loss),
        }
    except Exception as e:
        print(e)
        return None

# Get HTTP Connect response time
def getHttpConnect(url, timeout):
    r = requests.get(url, timeout=timeout)
    return int(1000 * round(r.elapsed.total_seconds(), 2))

# Get DNS response time
def getDnsResolve(fqdn):
    answers = dns.resolver.query(fqdn, 'a')
    return int(1000 * answers.response.time)
# Get WLAN Interface Stats
def getWlanStats(adapter):
    cmd = "iwconfig " + adapter
    try:
        output = subprocess.check_output(cmd, shell=True).decode().strip()
        lines = output.split("\n")
        # iwconfig separates fields with runs of spaces, so split on a double space
        # (a single-space split would not line up with the indexes used below)
        frequency = lines[-7].split('  ')[6].split(':')[1]
        ap = lines[-7].split('  ')[7].split(': ')[1]
        bitrate = lines[-6].split('  ')[5].split('=')[1]
        txpwr = lines[-6].split('  ')[6].split('=')[1]
        rtsthrsh = lines[-5].split('  ')[6].split(':')[1]
        frgthrsh = lines[-5].split('  ')[7].split(':')[1]
        link = lines[-3].split('  ')[5].split('=')[1]
        snr = lines[-3].split('  ')[6].split('=')[1]
        rxinnw = lines[-2].split('  ')[5].split(':')[1]
        rxincr = lines[-2].split('  ')[6].split(':')[1]
        rxinfr = lines[-2].split('  ')[7].split(':')[1]
        txrtry = lines[-1].split('  ')[5].split(':')[1]
        invalid = lines[-1].split('  ')[6].split(':')[1]
        missedbcn = lines[-1].split('  ')[7].split(':')[1]
        return {
            'frequency': frequency,
            'access_point': ap,
            'bitrate': bitrate,
            'tx_power': txpwr,
            'rts_threshold': rtsthrsh,
            'fragment_threshold': frgthrsh,
            'link_quality': link,
            'signal': snr,
            'rx_invalid_nwid': rxinnw,
            'rx_invalid_crypt': rxincr,
            'rx_invalid_frag': rxinfr,
            'tx_excessive_retries': txrtry,
            'invalid': invalid,
            'missed_beacon': missedbcn
        }
    except Exception as e:
        print(e)
        return None
# Get Device Uptime
def getUptime():
    # /proc/uptime reports seconds; dividing by 60 gives uptime in minutes
    cmd = "awk '{print $0/60;}' /proc/uptime"
    try:
        output = subprocess.check_output(cmd, shell=True).decode().strip()
        return output
    except Exception as e:
        print(e)
        return None
# Main program
# Initialise variables
output = {'host': getHostname(), 'location': WIPI_LOCATION, 'uptime': getUptime()}
# Ping Default Gateway (max, avg and loss)
ping = getIcmpPing(getDefaultGateway())
output.update({'ping_default_gateway_max': int(ping['max'])})
output.update({'ping_default_gateway_avg': int(ping['avg'])})
output.update({'ping_default_gateway_loss': str(ping['loss'])})
# Ping BBC
ping = getIcmpPing('www.bbc.co.uk')
output.update({'ping_bbc_max': int(ping['max'])})
output.update({'ping_bbc_avg': int(ping['avg'])})
output.update({'ping_bbc_loss': str(ping['loss'])})
# HTTP Connect Google
http = getHttpConnect('https://www.google.co.uk', 30)
output.update({'http_google': int(http)})
# HTTP Connect Microsoft Teams
http = getHttpConnect('https://teams.microsoft.com', 30)
output.update({'http_ms_teams': int(http)})
# DNS Resolve OneDrive
# Named dns_ms so the result does not overwrite the imported dns module
dns_ms = getDnsResolve('sharepoint.com')
output.update({'dns_ms_onedrive': int(dns_ms)})
# Get WiFi Stats
wifi = getWlanStats('wlan0')
output.update({'wlan_ap_mac': str(wifi['access_point'])})
output.update({'wlan_bitrate': str(wifi['bitrate'])})
output.update({'wlan_tx_power': str(wifi['tx_power'])})
output.update({'wlan_rts_threshold': str(wifi['rts_threshold'])})
output.update({'wlan_fragment_threshold': str(wifi['fragment_threshold'])})
output.update({'wlan_link_quality': str(wifi['link_quality'])})
output.update({'wlan_signal': str(wifi['signal'])})
output.update({'wlan_rx_invalid_nwid': int(wifi['rx_invalid_nwid'])})
output.update({'wlan_rx_invalid_crypt': int(wifi['rx_invalid_crypt'])})
output.update({'wlan_rx_invalid_frag': int(wifi['rx_invalid_frag'])})
output.update({'wlan_tx_excessive_retries': int(wifi['tx_excessive_retries'])})
output.update({'wlan_invalid': int(wifi['invalid'])})
output.update({'wlan_missed_beacon': int(wifi['missed_beacon'])})
# Update Splunk via REST API
print(datetime.datetime.now().replace(microsecond=0).isoformat())
updateSplunkHec(**output)
print("----")
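One thing to be aware of: getIcmpPing and getWlanStats return None when a check fails, and the main program above would then stop with an error before anything is sent. A minimal sketch of one way to guard against that is below; `add_ping_results` is a hypothetical helper, not part of the original script, and the sample values are made up.

```python
def add_ping_results(output, prefix, ping):
    """Copy max/avg/loss from a ping result into output; skip quietly if the probe failed."""
    if ping is None:
        return False
    output.update({
        prefix + '_max': int(ping['max']),
        prefix + '_avg': int(ping['avg']),
        prefix + '_loss': str(ping['loss']),
    })
    return True


output = {'host': 'wipi1'}
add_ping_results(output, 'ping_bbc', {'max': 24.1, 'avg': 22.4, 'loss': '0%'})
add_ping_results(output, 'ping_default_gateway', None)  # simulated failed probe
print(output)
```

With this approach a failed Ping simply leaves its fields out of the payload, so the remaining checks still reach Splunk.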
The outcome
Once done, you'll get JSON payloads being sent into Splunk via the HEC every 5 minutes, and you'll start to have lovely visual dashboards like this, showing you the HTTP, DNS, Ping and WLAN stats a real-world User in that Office Location might actually be seeing.
Enjoy finally having a use for that Pi other than taking up space in your drawer :).