Thursday 1 October 2015

Cloud Networking and Monitoring with VyOS

Some useful hints and tips from my experience working with VyOS (http://vyos.net) routers in AWS, KVM and Xen environments. VyOS is a fork of Vyatta, the Linux-based router OS that is now owned by Brocade.

IPsec, BGP, AWS and VTI: tunnels dropping


I had (and still have) an issue where tunnels to AWS periodically drop. The IPsec and IKE security associations renegotiate and the tunnel is re-established, but the VTI (virtual tunnel interface) is not brought back up.

This showed up as BGP peerings going offline and traffic no longer passing through some routers.

I haven't yet had a chance to dig into the cause within VyOS (I will if time ever becomes available), so as a hacky workaround I run the following script from cron every 15 minutes, scheduled with the "set system task-scheduler" commands. That interval gives us reasonable cover: we have at least two routers connected to each VPC, each with two tunnels to the AWS VGW, so we would have to lose four tunnels before seeing an actual outage. For each admin-down VTI, the script brings the interface back up if both the IKE and IPsec SAs for its peer are up; otherwise it reports the state, including the BGP neighbour, and exits non-zero:

#!/bin/vbash
# Bring back VTI interfaces left admin-down after their IPsec tunnel re-established.

opcmd=/opt/vyatta/bin/vyatta-op-cmd-wrapper

# Admin-down interfaces ("A/D" in the S/L column), restricted to vti* so that
# deliberately disabled ports are not reported as failed tunnels.
down_ifs=$("$opcmd" show interfaces | awk '$1 ~ /^vti/ && $3 ~ "A/D" {print $1}')
failed=0

for iface in $down_ifs
do
  # Remote tunnel endpoint, and the local tunnel address with the prefix length stripped
  vpnpeer=$("$opcmd" show interfaces vti "$iface" | awk '$3 ~ "peer" {print $NF}')
  vpnlocal=$("$opcmd" show interfaces vti "$iface" brief | awk -v i="$iface" '$1 ~ i {print $2}' | awk -F/ '{print $1}')
  # BGP neighbour on the far side of this tunnel (only used for reporting)
  bgpneigh=$("$opcmd" show ip bgp neighbors | egrep -A 1 "^Local.*${vpnlocal}" | tail -n1 | awk '{print $3}' | sed 's/,//')
  # State of the IPsec SA and the IKE SA for this peer
  ipsecstate=$("$opcmd" show vpn ipsec sa peer "$vpnpeer" tunnel vti | awk '$1 ~ "vti" {print $2}')
  ikestate=$("$opcmd" show vpn ike sa peer "$vpnpeer" | egrep 'up|down' | awk '{print $1}')

  if [ "$ipsecstate" = "up" ] && [ "$ikestate" = "up" ]
  then
    # Both SAs are up, so only the interface is stuck: bring it back up
    sudo /sbin/ip link set "$iface" up
  else
    echo "Interface $iface down, ike state $ikestate, tunnel state $ipsecstate, bgp neighbour $bgpneigh"
    failed=1
  fi
done

exit $failed
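For reference, scheduling the script from the VyOS CLI looks roughly like the configuration-mode session below. This is a sketch, assuming the `system task-scheduler` command tree found in VyOS 1.1 (check with tab completion on your version). The task name `fix-vti` and the path `/config/scripts/fix-vti.sh` are hypothetical; keeping the script under /config should let it survive image upgrades, and it needs to be executable (`chmod +x`).

```shell
configure
# Hypothetical task name and script path: run the VTI check every 15 minutes
set system task-scheduler task fix-vti interval 15m
set system task-scheduler task fix-vti executable path /config/scripts/fix-vti.sh
commit
save
exit
```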

Monitoring in the cloud with DataDog


We monitor our cloud systems (and a number of on-premises systems as well) with DataDog, which has agents for all major Linux distributions as well as Windows and a few other Unixes. This is great: VyOS is Debian-based, so it should just work, right? Uh-oh, no dice.

VyOS handles updates and config rollbacks in a similar way to Juniper. It's a neat solution, although rollback can be a bit cumbersome. Under the hood, though, it relies on mounting the filesystem with overlayfs.
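To see this on a particular router, you can check which filesystem type backs the agent's default socket directory and /tmp using GNU `stat -f`. This is a small sketch: `fs_type_report` is a hypothetical helper name, and older coreutils releases may not recognise overlayfs and print `UNKNOWN` followed by a hex magic number instead of a type name.

```shell
#!/bin/bash
# Report which filesystem type backs each directory passed as an argument.
# Sketch only: fs_type_report is a hypothetical helper name.
fs_type_report() {
  local dir
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      # stat -f describes the filesystem containing $dir; %T prints its type name
      printf '%s %s\n' "$dir" "$(stat -f -c %T "$dir")"
    else
      printf '%s missing\n' "$dir"
    fi
  done
}

# The agent's default socket directory, then the tmpfs-backed /tmp
fs_type_report /opt/datadog-agent/run /tmp
```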

Unix sockets created on overlayfs are buggy at best and cause problems for many different systems, the DataDog agent among them. Since /tmp on VyOS is mounted as tmpfs, my workaround was to use Ansible to move the agent's supervisor socket there, making the following changes to the agent configuration and init script:

In /etc/dd-agent/supervisor.conf, point the supervisor socket at /tmp:
sed -i 's!/opt/datadog-agent/run/datadog-supervisor.sock!/tmp/datadog-supervisor.sock!g' /etc/dd-agent/supervisor.conf

In /etc/init.d/datadog-agent, set SUPERVISOR_SOCK to match:
sed -i 's!^SUPERVISOR_SOCK.*$!SUPERVISOR_SOCK=/tmp/datadog-supervisor.sock!' /etc/init.d/datadog-agent

And that's it: you can now configure the agent and start the process as normal. Keep an eye on dd-agent updates and VyOS version upgrades, as you will likely have to re-apply these changes afterwards.
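Because the changes can be lost on upgrade, it helps if re-applying them is harmless when they are already in place. The sketch below wraps the same two sed substitutions in a hypothetical `apply_dd_socket_fix` shell function (the name and the extra checks are not part of the original Ansible workaround) that can be re-run after any dd-agent or VyOS upgrade and fails loudly if either file does not end up pointing at the /tmp socket.

```shell
#!/bin/bash
# Hypothetical helper: idempotently re-apply the DataDog socket workaround
# and verify it. Paths default to the files edited above.
SOCK=/tmp/datadog-supervisor.sock

apply_dd_socket_fix() {
  local supervisor_conf=${1:-/etc/dd-agent/supervisor.conf}
  local init_script=${2:-/etc/init.d/datadog-agent}
  local f

  for f in "$supervisor_conf" "$init_script"; do
    [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
  done

  # Same substitutions as the sed commands above, using '|' as the delimiter;
  # running them a second time changes nothing.
  sed -i "s|/opt/datadog-agent/run/datadog-supervisor.sock|${SOCK}|g" "$supervisor_conf"
  sed -i "s|^SUPERVISOR_SOCK.*\$|SUPERVISOR_SOCK=${SOCK}|" "$init_script"

  # Fail if either file does not reference the /tmp socket afterwards
  grep -q "$SOCK" "$supervisor_conf" || { echo "socket path not set in $supervisor_conf" >&2; return 1; }
  grep -q "^SUPERVISOR_SOCK=${SOCK}\$" "$init_script" || { echo "SUPERVISOR_SOCK not set in $init_script" >&2; return 1; }
}
```

Calling `apply_dd_socket_fix` with no arguments targets the real files; passing copies as the two arguments lets you try it out first. Restart the agent afterwards (for example with `/etc/init.d/datadog-agent restart`) so that supervisor picks up the new socket path.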