Introduction to IP Failover for your Server
When managing any type of server, it’s generally wise to have a plan in place for when the server fails or needs to be taken offline. In a high availability server setup, the question is how to quickly transfer activity from the failed or offline server to the backup server. One method is IP failover, which is the focus of this post. In short, IP failover moves the IP address of the failed server to the backup server, so that the backup server handles requests until the main server is functional again.
IP failover is mostly a solution for servers on the same subnetwork. It’s possible to use IP failover for servers on different networks, but there are much better solutions for that case (e.g. a proxy or load balancing server). When the main server goes offline, the backup server brings up a new virtual network interface and configures it with the shared IP address that clients use to reach the main server. The backup server then sends ARP announcements (gratuitous ARP) to update the ARP cache on the router and on neighboring devices, so that future traffic for that address is delivered to the backup server. Once the main server is functional again, the reverse happens: the main server creates a virtual network interface with the shared IP address, and the backup server removes its own virtual network interface. The main server then broadcasts a gratuitous ARP to once again update where network packets should be sent.
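To make the ARP announcement step more concrete, the sketch below builds the raw bytes of a gratuitous ARP frame in Python. It is an illustration under stated assumptions, not a definitive implementation: the function names, the MAC address, and the 192.0.2.10 address are placeholders invented for this example, and it uses the request form of the announcement, where the sender IP and target IP are both the shared address.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6   # deliver the frame to every device on the subnet
ETHERTYPE_ARP = 0x0806
ARP_REQUEST = 1


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_gratuitous_arp(sender_mac: str, shared_ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request.

    The sender and target IP are both the shared address, which tells
    neighbors to map that address to sender_mac in their ARP caches.
    """
    sha = mac_to_bytes(sender_mac)
    ip = socket.inet_aton(shared_ip)
    ethernet_header = struct.pack("!6s6sH", BROADCAST_MAC, sha, ETHERTYPE_ARP)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        ARP_REQUEST,  # opcode
        sha,          # sender MAC: the server taking over the address
        ip,           # sender IP: the shared address
        b"\x00" * 6,  # target MAC: left empty in an announcement
        ip,           # target IP: same as the sender IP
    )
    return ethernet_header + arp_payload
```

Actually sending the frame requires a raw socket and root privileges, which is why most setups leave this step to an existing tool such as arping rather than hand-built packets.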
In an IP failover setup of two servers, three IP addresses are actually used. Each server has its own static IP address inside the subnetwork, which is never removed or switched between the servers. The third IP address is the one that floats between the two servers depending on which server is available. There are multiple reasons to dedicate an IP address to each server, even while that server is not handling requests. First, we can use the IP address (or better yet, a hostname) to verify whether the server is online. Second, keeping both servers connected to the network allows remote maintenance on a server while still being able to change which server handles requests to the floating IP address (also called the failover IP address).
There are multiple methods to determine if the main server is offline. The most basic solution is to continuously ping the main server from the backup server until requests start to time out. Relying only on ping raises a few issues. The first problem is that a service could fail on the main server, causing errors for the end user, while the server keeps answering pings normally, so the backup server is never used. The second problem is that it’s more difficult to manually take the main server out of service and redirect traffic to the backup server. A better solution is to create a small program on the main server that responds to requests from the backup server and reports whether the backup server should take over. Your program can check whether all the services are running properly (by checking logs and the status of each service) or whether the server is about to have maintenance performed, and then tell the backup server to take the floating IP even if the main server isn’t offline. There are also programs such as heartbeat that perform a similar service.
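The decision logic described above can be sketched in a few lines of Python. This is a minimal example under assumptions, not a reference implementation: the class name FailoverMonitor, the threshold of three consecutive failures, and the maintenance flag are all invented for illustration, and the check callable stands in for whatever test you use (a ping, a request to your own status program, and so on).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailoverMonitor:
    """Tracks health checks of the main server, as seen from the backup server."""

    check: Callable[[], bool]      # returns True when the main server is healthy
    failure_threshold: int = 3     # consecutive failures required before failing over
    consecutive_failures: int = 0
    maintenance: bool = False      # manual switch: force traffic to the backup server

    def should_fail_over(self) -> bool:
        """Run one health check and report whether the backup should hold the floating IP."""
        if self.maintenance:
            return True
        if self.check():
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold
```

Requiring several failures in a row is a design choice that avoids moving the floating IP because of a single dropped packet. The trade-off is that detection takes at least the threshold multiplied by the check interval.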
VPS IP Failover
Multiple VPS and dedicated server providers have implemented IP failover by using a floating IP address. Through the hosting provider’s website or API, you will configure your “floating IP address” to be assigned to one of your servers. These VPS and hosting providers generally include load balancing solutions in conjunction with the floating IP address since the technology is basically the same.
A few VPS providers that offer floating IP addresses are UpCloud, CloudVPS, LunaNode, and DigitalOcean. Each company implements the feature differently, but they all work in essentially the same way: an IP address is reassigned between the servers you choose. Amazon (AWS) also offers a similar solution called Elastic IP, and OpenStack servers apparently support native IP failover.
IP Failover Solutions
Depending on your server environment, you might have many solutions to choose from or very few. If you’re running a Virtual Private Server, you will have to find a VPS provider that supports IP failover or a floating IP address. If your provider doesn’t offer these services, you can also set up an additional server as a load balancer. The load balancer accepts all of the normal requests to your service and routes each one to one of your other servers. This helps keep your service available and could also improve the performance of your application by distributing the requests among multiple servers.
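As a rough illustration of how a load balancer keeps a service available, here is a small round-robin sketch in Python that skips servers marked as down. The class and method names are hypothetical, and a real load balancer would also handle connections, timeouts, and health checks on its own; this only shows the selection step.

```python
from typing import List, Optional, Sequence, Set


class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any that are marked as down."""

    def __init__(self, servers: Sequence[str]) -> None:
        self.servers: List[str] = list(servers)
        self.healthy: Set[str] = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self.servers:
            self.healthy.add(server)

    def pick(self) -> Optional[str]:
        """Return the next healthy server, or None if every server is down."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        return None
```

When one server is marked as down, requests simply continue to the remaining servers, which is the same effect IP failover aims for, achieved at the request level instead of the network level.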
Another solution is to use Dynamic DNS, which is slower than IP failover but still gets the job done for less critical services. When your main server goes down, you update the Dynamic DNS records to point to your backup server’s IP address. Due to DNS propagation and improper caching on some DNS servers, it could take anywhere from 2 to 120 seconds for requests to be redirected to the backup server, possibly longer depending on TTL settings. An answer on Super User goes into depth on how long DNS records are cached.
If you have full control of your server environment, you can use any tool you see fit. The tools heartbeat and pacemaker are very popular for creating a high availability server setup. But for small setups where a simple solution is all you are looking for, a basic shell script might be all that you need. You can easily add and delete virtual network interfaces using ip or ifconfig, and check if a server is responsive with ping or fping. Load balancing is also worth looking into: it distributes your requests among all of your servers and keeps your service running smoothly if one of the servers becomes unresponsive or is taken offline for maintenance.
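For a sense of what such a basic script does, the sketch below expresses the same idea in Python by building the ip and ping command lines and deciding which one to run. It assumes a Linux host with the iproute2 ip command and the iputils version of ping (where -W is a timeout in seconds). The function names, the eth0 interface, and the 192.0.2.10/24 address are placeholders for this example, and changing addresses with ip requires root privileges.

```python
import subprocess
from typing import List, Optional


def add_ip_command(ip_cidr: str, interface: str) -> List[str]:
    """Command that assigns the floating IP to an interface."""
    return ["ip", "addr", "add", ip_cidr, "dev", interface]


def del_ip_command(ip_cidr: str, interface: str) -> List[str]:
    """Command that removes the floating IP from an interface."""
    return ["ip", "addr", "del", ip_cidr, "dev", interface]


def ping_command(host: str, timeout_s: int = 1) -> List[str]:
    """Command that sends one ping and waits at most timeout_s seconds."""
    return ["ping", "-c", "1", "-W", str(timeout_s), host]


def is_reachable(host: str) -> bool:
    """Run a single ping; exit code 0 means a reply was received."""
    result = subprocess.run(
        ping_command(host), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def next_command(
    main_up: bool, holding_ip: bool, ip_cidr: str, interface: str
) -> Optional[List[str]]:
    """Decide what the backup server should run next, if anything."""
    if not main_up and not holding_ip:
        return add_ip_command(ip_cidr, interface)   # take over the floating IP
    if main_up and holding_ip:
        return del_ip_command(ip_cidr, interface)   # hand it back to the main server
    return None                                     # nothing to change
```

On the backup server, a loop could call is_reachable with the main server's own static address every few seconds, pass the result to next_command, and run the returned command with subprocess.run, followed by an ARP announcement after taking over the address.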