Improving CloudForms VMDB failover with keepalived and a virtual IP

Out of the box CloudForms comes with the ability to deploy PostgreSQL appliances that can be configured into a primary/standby relationship. If the primary fails, the standby takes over automatically.

Your non-database appliances are hardcoded to reference the primary via it’s IP address. Unfortunately, when the primary fails over to a standby this IP has changed but your appliances aren’t immediately aware. A watchdog service running on each appliance keeps an eye on the database and identifies when the primary has failed over. After a set period of time the watchdog updates the hardcoded database IP to the new primary and then restarts your evmserverd process to make the change take effect.

This occurs on every non-database appliance and so a primary failover event means an unavoidable outage across your entire region. Not good. But what if we could at least reduce the outage duration, perhaps by avoiding the restart of your main CloudForms service?

This post discusses one technique that doesn’t require CloudForms service restarts – use a virtual IP for your database. This VIP will live on the database that is the current primary and move when the role of primary fails over. With no more need to restart your CloudForms services recovery time from failover events is substantially reduced.

Read more “Improving CloudForms VMDB failover with keepalived and a virtual IP”