IPSec, BGP, AWS and VTI - tunnels dropping
I had (and still suffer from) an issue where tunnels to AWS periodically drop. The IPsec and IKE security associations renegotiate and re-establish the tunnel, but the VTI interface does not get brought back up.
This showed up as BGP peerings going offline and traffic no longer passing through some routers.
I've not yet been able to look in depth at the cause of the problem in VyOS, though I will if time ever becomes available. In the meantime I use a hacky workaround: a script triggered from cron (configured using the "set system task-scheduler" commands) every 15 minutes. We should be quite well covered with this, since we have a minimum of two routers connected to each VPC, each with two tunnels to the AWS VGW, so we would have to lose 4 tunnels to see any actual outage. The script:
#!/bin/vbash
# Bring VTI interfaces back up when their IPsec and IKE SAs are up but the interface is still down.

# Interfaces whose state/link column reads A/D (admin down / link down)
down_ifs=$(/opt/vyatta/bin/vyatta-op-cmd-wrapper show interfaces | awk '$3 ~ "A/D" {print $1}')
failed=0
for iface in $down_ifs
do
    # Remote peer address and local tunnel address of this VTI
    vpnpeer=$(/opt/vyatta/bin/vyatta-op-cmd-wrapper show interfaces vti "$iface" | awk '$3 ~ "peer" {print $NF}')
    vpnlocal=$(/opt/vyatta/bin/vyatta-op-cmd-wrapper show interfaces vti "$iface" brief | awk -v i="$iface" '$1 ~ i {print $2}' | awk -F/ '{print $1}')
    # BGP neighbour reached over this tunnel (only used in the log message)
    bgpneigh=$(/opt/vyatta/bin/vyatta-op-cmd-wrapper show ip bgp neighbors | grep -E -A 1 "^Local.*${vpnlocal}" | tail -n1 | awk '{print $3}' | sed 's/,//')
    # Current IPsec and IKE SA state for the peer
    ipsecstate=$(/opt/vyatta/bin/vyatta-op-cmd-wrapper show vpn ipsec sa peer "$vpnpeer" tunnel vti | awk '$1 ~ "vti" {print $2}')
    ikestate=$(/opt/vyatta/bin/vyatta-op-cmd-wrapper show vpn ike sa peer "$vpnpeer" | grep -E 'up|down' | awk '{print $1}')
    if [ "$ipsecstate" = "up" ] && [ "$ikestate" = "up" ]
    then
        # SAs are healthy, so only the interface needs raising
        sudo /sbin/ip link set "$iface" up
    else
        echo "Interface $iface down, ike state $ikestate, tunnel state $ipsecstate, bgp neighbour $bgpneigh"
        failed=1
    fi
done
# Non-zero exit if any tunnel could not be recovered
exit $failed
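For the scheduling side, here is a minimal sketch of how the script could be run every 15 minutes. It assumes the script has been saved as /config/scripts/vti-check.sh and the task is called vti-check; both names are made up for illustration. It also assumes the scheduler is configured under "set system task-scheduler", so check the syntax against your VyOS release:

```
configure
# "vti-check" and the script path are hypothetical names
set system task-scheduler task vti-check interval 15m
set system task-scheduler task vti-check executable path /config/scripts/vti-check.sh
commit
save
exit
```

Keeping the script under /config means it should survive image upgrades along with the rest of the configuration.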
Monitoring in the cloud with DataDog
We monitor our cloud systems (and a number of on-premise systems as well) with DataDog, which has agents for all major Linux distributions as well as Windows and a few other unixes. This is great: with VyOS being Debian-based it should just work, right? Uh-oh, no dice.
VyOS handles updates and config rollbacks in a similar way to Juniper. It's a neat solution, although the rollback can be a bit cumbersome. However, it does so by mounting the filesystem with overlayfs.
This causes problems for DataDog, as unix sockets created on overlayfs are buggy at best and cause problems for many different systems. My solution was to use Ansible to make the following changes to the agent configuration and init scripts as a workaround, relying on the fact that on VyOS the /tmp folder is mounted as tmpfs:
Replace in /etc/dd-agent/supervisor.conf:
sed -i 's!/opt/datadog-agent/run/datadog-supervisor.sock!/tmp/datadog-supervisor.sock!g' /etc/dd-agent/supervisor.conf
Replace in /etc/init.d/datadog-agent:
sed -i 's!^SUPERVISOR_SOCK.*$!SUPERVISOR_SOCK=/tmp/datadog-supervisor.sock!' /etc/init.d/datadog-agent
And that's it: you can now configure the agent and start the process as normal. Keep an eye on updates to dd-agent and version upgrades of VyOS, as you will likely have to re-apply those changes.
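Since the changes need re-applying after upgrades, the two substitutions above can be wrapped in a small helper that is safe to run repeatedly. This is only a sketch: relocate_supervisor_sock is a made-up function name, not something that ships with dd-agent, and it assumes GNU sed (for -i). It uses | instead of ! as the sed delimiter so the lines can also be pasted into an interactive bash session without triggering history expansion.

```shell
#!/bin/bash
# Sketch: move the DataDog supervisor socket onto tmpfs, idempotently.
# relocate_supervisor_sock is a hypothetical helper, not part of dd-agent.

relocate_supervisor_sock() {
    local conf_file=$1 init_file=$2
    local new_sock=${3:-/tmp/datadog-supervisor.sock}
    local old_sock=/opt/datadog-agent/run/datadog-supervisor.sock
    local f

    # Bail out early if either file is missing (e.g. agent not installed yet).
    for f in "$conf_file" "$init_file"; do
        [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
    done

    # Same two substitutions as above; both are no-ops once already applied.
    sed -i "s|${old_sock}|${new_sock}|g" "$conf_file"
    sed -i "s|^SUPERVISOR_SOCK.*\$|SUPERVISOR_SOCK=${new_sock}|" "$init_file"

    # Succeed only if the old path is gone and the new setting is present.
    ! grep -qF "$old_sock" "$conf_file" &&
        grep -q "^SUPERVISOR_SOCK=${new_sock}\$" "$init_file"
}
```

On a router it would be called as root with the real paths, e.g. relocate_supervisor_sock /etc/dd-agent/supervisor.conf /etc/init.d/datadog-agent, and a non-zero return is a hint that an agent update changed the files in a way the substitutions no longer match.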
1 comment:
This is really great. I was just looking for monitoring for Vyatta and this works splendidly. I also didn't know about DataDog, so I really got two great outcomes from this article! I owe you a coke! :)