
Testing Keepalived

Welcome back! I just finished my lethargic aestival period, emerging from the swimming pool like a manatee. Back to the (awaited) office routine, I had a clash with Keepalived; actually with around four hundred instances. Yes, 400 instances spawned to control the same number of high availability routers in OpenStack.

The OpenStack L3 agent is in charge, among other operations, of deploying the routing infrastructure, depending on the selected configuration. In the case of high availability, a router instance is spawned on each HA physical host. A router instance, in OpenStack, is a namespace connected to the networks it routes, where the packets are routed using the kernel routing table and netfilter rules.

Those router instances use the Virtual Router Redundancy Protocol (VRRP), via Keepalived. This protocol ensures that only one instance is in active mode, while the other ones are in backup state. When a router becomes active, Keepalived sets the virtual routes in the namespace and the IP addresses on the interfaces. It is precisely during this transition that the problems start. But first, let me explain briefly what VRRP and Keepalived are.

VRRP protocol and Keepalived.

The VRRP protocol “is a computer networking protocol that provides for automatic assignment of available Internet Protocol (IP) routers to participating hosts”. In other words, VRRP is an election protocol that dynamically assigns an IP address to a single router out of a pool of them in the same network. The protocol defines a virtual router, which is an abstract representation of multiple physical routers. Outside this abstraction, other systems only see a single gateway (a single IP address), belonging at any given moment to a single physical router interface.

A VRRP virtual router uses the MAC address 00:00:5E:00:01:XX, where XX is the Virtual Router Identifier (VRID), which is different for each virtual router in the network. Physical routers within the virtual router (that is, each instance spawned in a host which is part of a virtual router) must communicate among themselves using packets with multicast address 224.0.0.18 and IP protocol number 112. In a nutshell, this is what you’ll see when capturing traffic on the HA interface.
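If you want to reproduce such a capture yourself, a filter on the VRRP protocol number or on the multicast group is enough; for example (the interface name is just a placeholder):

$ sudo tcpdump -nei <ha-interface> 'ip proto 112 or host 224.0.0.18'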

The active router periodically sends a VRRP advertisement packet. When the backup routers do not receive this packet for longer than roughly three times the advertisement interval, the active router is considered dead. The virtual router then enters a transition state to elect a new active router, and the backup router with the highest priority becomes the new active one.
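To be precise, RFC 3768 defines the master down interval as Master_Down_Interval = (3 * Advertisement_Interval) + Skew_Time, with Skew_Time = (256 - Priority) / 256, so higher-priority backups react slightly faster. For example, with an advertisement interval of 1 second and priority 203, a backup waits roughly 3.2 seconds before starting a new election.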

Keepalived is a service that provides load balancing and high availability to Linux systems. It uses VRRP to provide the high availability. If you want to start digging into this topic, the VRRP RFC and the official Keepalived documentation are good starting points.

The problem and how we mitigated (not solved) it.

The environment is an OpenStack deployment consisting of three beefy controller servers and several compute nodes. Because the virtual routers are hosted in the controller nodes, the compute hosts are irrelevant here. When the OpenStack L3 agents are started, the routers, if not already present in the hosts, are spawned sequentially. In this deployment, with 400 routers, this process takes some minutes. During this time, the physical router instances (and their Keepalived processes) belonging to a virtual router are started in each controller node. When a Keepalived instance is started, the voting process starts too. Each Keepalived instance starts sending VRRP advertisements with its configured priority; these are received by the other Keepalived instances of the virtual router, which decide which instance becomes active. Once one instance is declared active, the VIPs are configured on the router interfaces. To announce to the network the new location (MAC address) of those IP addresses, several GARPs are sent per IP address at a fixed interval.

Everything runs smoothly until a failover occurs. Just when we need Keepalived the most, it becomes the problem. When one of the controllers is disconnected, the active router instances spawned on it are removed from the virtual routers. A third of those 400 virtual routers start a new voting process, sending multicast VRRP messages and, once the VIPs are configured, GARPs. The network backend, OVS, was unable to handle the torrent of multicast VRRP messages (to vote for a new active instance) and ARP packets. The Keepalived processes kept sending VRRP advertisement messages, increasing the softirq CPU consumption until the system became unresponsive. After a quick exorcism (“$ sudo reboot”), all the defunct and zombie processes rested in peace.

To mitigate (not to solve) this problem, we tweaked some Keepalived configuration parameters:

  • The VRRP advertisement interval. This parameter defines the delay, in seconds, between two advertisement packets. By increasing this value we gave the CPU more time to process all the incoming messages during a voting process. Of course, increasing this parameter also increases the transition time from backup to active: the virtual router’s response time to a failure grows proportionally. The value of this parameter should be a compromise between response time, CPU power and the number of routers spawned on a single controller.
  • The GARP delay and the GARP repeat count. The GARP for each VIP assigned to a router that becomes active is sent several times: the first parameter controls the time between those transmissions, and the second one controls how many times the GARP is sent. Increasing the delay and reducing the number of times the GARP for a single VIP is sent reduces the burden on the network backend and the CPU (see the sketch right after this list).
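As a reference, this is roughly how those knobs look inside a vrrp_instance block; the values below are just illustrative, not the ones we deployed:

vrrp_instance VI_2 {
    ...
    advert_int 5            # seconds between two VRRP advertisements
    garp_master_delay 10    # seconds before repeating the GARP burst after becoming master
    garp_master_repeat 1    # number of GARPs sent per VIP in each burst
    ...
}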

Two easy tweaks for a difficult problem, but very effective. Please take a look at the Keepalived configuration documentation to see how flexible this process is and in how many ways it can be configured. For example, if, as we initially expected, the problem lies in the multicast packets, you can configure Keepalived to use unicast.
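As an illustration (not something we ended up deploying), switching a vrrp_instance to unicast is just a matter of declaring the source address and the peers; the addresses here match the ones used later in this post:

vrrp_instance VI_2 {
    ...
    unicast_src_ip 10.1.0.2    # address this instance sends its advertisements from
    unicast_peer {
        10.1.0.3               # the other members of this virtual router
    }
    ...
}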

How to test Keepalived on your own computer.

And now the fun part: messing with the system. This section provides a quick way to test Keepalived and check experimentally how the VRRP protocol works in this implementation.

To test Keepalived you will need at least two instances; one will be the active one and the other one the backup (as long as there is communication between both). I’ve created a small environment with three namespaces: one playing the network switch role and the other two hosting the Keepalived instances. Once you finish testing, you can clean up your environment by simply deleting the three namespaces.

To build it you can execute the following commands:

ip netns add ns01
ip netns add ns02
ip netns add ns03

ip link add veth01_02 type veth peer name veth02_01
ip link set dev veth01_02 netns ns01
ip link set dev veth02_01 netns ns02
ip link add veth01_03 type veth peer name veth03_01
ip link set dev veth01_03 netns ns01
ip link set dev veth03_01 netns ns03

ip netns exec ns01 ip link set veth01_02 up
ip netns exec ns01 ip link set veth01_03 up
ip netns exec ns02 ip link set veth02_01 up
ip netns exec ns03 ip link set veth03_01 up

ip netns exec ns01 ip link add bridge0 type bridge
ip netns exec ns01 ip link set dev bridge0 up
ip netns exec ns01 ip a add 10.1.0.1/24 dev bridge0

ip netns exec ns01 ip link set veth01_02 master bridge0
ip netns exec ns01 ip link set veth01_03 master bridge0

ip netns exec ns02 ip a add 10.1.0.2/24 dev veth02_01
ip netns exec ns03 ip a add 10.1.0.3/24 dev veth03_01
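
Before moving on, a quick ping between the two end namespaces is an easy way to check the wiring; this is just a sanity check, not part of the setup itself:

ip netns exec ns02 ping -c 2 10.1.0.3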

Now you have your isolated namespaces to run the Keepalived instances, which will play the role of the controller nodes in this simulation. The next step is to create the configuration files. There are plenty of examples on the internet; the following ones contain the minimum set of parameters needed to run.

$ cat /tmp/keep02.conf
vrrp_instance VI_2 {
    state BACKUP
    interface veth02_01
    virtual_router_id 51
    priority 202
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass pass
    }
    virtual_ipaddress {
        10.1.0.10/24
    }
}

$ cat /tmp/keep03.conf
vrrp_instance VI_3 {
    state BACKUP
    interface veth03_01
    virtual_router_id 51
    priority 203
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass pass
    }
    virtual_ipaddress {
        10.1.0.10/24
    }
}

Notice the small differences between both configs: the instance name, the interface and the priority.

“Start your engines”.

To run both Keepalived instances, execute the following commands:

$ ip netns exec ns02 keepalived -f /tmp/keep02.conf -p /var/run/keep02.pid -r /var/run/keep02_vrrp.pid -c /var/run/keep02_checkers.pid
$ ip netns exec ns03 keepalived -f /tmp/keep03.conf -p /var/run/keep03.pid -r /var/run/keep03_vrrp.pid -c /var/run/keep03_checkers.pid
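
After a few seconds you can check which instance won the election: the active one is the instance holding the VIP. Assuming ns03 wins thanks to its higher priority, the following command should list 10.1.0.10 on its interface:

$ ip netns exec ns03 ip address show dev veth03_01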

Now you can capture the traffic on the “bridge0” interface. You will see how, for example, both instances send the VRRP advertisement packets to the same multicast IP address, 224.0.0.18, as mentioned before. You’ll also see how VI_3, once it wins the voting process because of its higher priority, becomes master and sends the GARPs associated with the configured virtual IP address.
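Any capture tool will do; for instance, one possible way is:

$ ip netns exec ns01 tcpdump -nei bridge0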

This is an example of traffic capture with the previous configuration:

289 288.147840219    10.1.0.2 → 224.0.0.18   VRRP 54 Announcement (v2)
290 289.149170757    10.1.0.2 → 224.0.0.18   VRRP 54 Announcement (v2)
291 290.150459728    10.1.0.2 → 224.0.0.18   VRRP 54 Announcement (v2)
292 290.781150191    10.1.0.3 → 224.0.0.22   IGMPv3 54 Membership Report / Leave group 224.0.0.18
293 290.833163104    10.1.0.3 → 224.0.0.22   IGMPv3 54 Membership Report / Join group 224.0.0.18 for any sources
294 291.151769259    10.1.0.2 → 224.0.0.18   VRRP 54 Announcement (v2)
295 291.269178990    10.1.0.3 → 224.0.0.22   IGMPv3 54 Membership Report / Join group 224.0.0.18 for any sources
296 292.152456908    10.1.0.2 → 224.0.0.18   VRRP 54 Announcement (v2)
298 293.153391474    10.1.0.3 → 224.0.0.18   VRRP 54 Announcement (v2)
299 294.154561143 ce:b8:43:a3:f4:6e → Broadcast   ARP 42 Gratuitous ARP for 10.1.0.10 (Request)
304 294.154852499    10.1.0.3 → 224.0.0.18   VRRP 54 Announcement (v2)
305 295.155467814    10.1.0.3 → 224.0.0.18   VRRP 54 Announcement (v2)
306 296.156314649    10.1.0.3 → 224.0.0.18   VRRP 54 Announcement (v2)
307 297.156955655    10.1.0.3 → 224.0.0.18   VRRP 54 Announcement (v2)
308 298.157513686    10.1.0.3 → 224.0.0.18   VRRP 54 Announcement (v2)
309 299.155671882 ce:b8:43:a3:f4:6e → Broadcast   ARP 42 Gratuitous ARP for 10.1.0.10 (Request)
314 299.157843563    10.1.0.3 → 224.0.0.18   VRRP 54 Announcement (v2)
315 300.158573557    10.1.0.3 → 224.0.0.18   VRRP 54 Announcement (v2)
316 301.159233499    10.1.0.3 → 224.0.0.18   VRRP 54 Announcement (v2)
317 302.159935692    10.1.0.3 → 224.0.0.18   VRRP 54 Announcement (v2)
318 303.160624007    10.1.0.3 → 224.0.0.18   VRRP 54 Announcement (v2)

With this small environment in place, you can play with the instance configuration files, changing the virtual router priority, the set of virtual IP addresses, the GARP parameters, etc. To reload an instance configuration once you have finished editing its config file, you need to send a SIGHUP to the process.
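For example, to reload the ns03 instance using the pid file passed on its command line earlier:

$ kill -HUP $(cat /var/run/keep03.pid)

And when you are done playing, stopping the Keepalived processes and removing the three namespaces leaves the machine as it was; a minimal sketch, assuming the pid files used above:

$ kill $(cat /var/run/keep02.pid) $(cat /var/run/keep03.pid)
$ ip netns del ns01
$ ip netns del ns02
$ ip netns del ns03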

I hope this small guide helps you to understand a bit better how the VRRP protocol and Keepalived work.
