Improving CloudForms VMDB failover with keepalived and a virtual IP

Out of the box, CloudForms comes with the ability to deploy PostgreSQL appliances that can be configured into a primary/standby relationship. If the primary fails, the standby takes over automatically.

Your non-database appliances are hardcoded to reference the primary via its IP address. Unfortunately, when the primary fails over to a standby this IP changes, but your appliances aren’t immediately aware. A watchdog service running on each appliance keeps an eye on the database and identifies when the primary has failed over. After a set period of time the watchdog updates the hardcoded database IP to the new primary and then restarts your evmserverd process to make the change take effect.
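The watchdog behaviour described above can be sketched roughly as follows. This is an illustrative Python model only, not the actual CloudForms watchdog: the class name FailoverWatchdog, the grace_seconds setting and the restart_service callback are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FailoverWatchdog:
    """Hypothetical model of the per-appliance watchdog (not CloudForms code)."""

    configured_ip: str                    # the hardcoded database IP
    grace_seconds: float                  # assumed wait before acting on a change
    restart_service: Callable[[], None]   # stands in for restarting evmserverd
    _changed_since: Optional[float] = field(default=None, init=False)

    def observe(self, current_primary_ip: str, now: float) -> bool:
        """Record one poll result; return True if a repoint and restart happened."""
        if current_primary_ip == self.configured_ip:
            self._changed_since = None    # primary is where we expect it
            return False
        if self._changed_since is None:
            self._changed_since = now     # first poll that notices the move
            return False
        if now - self._changed_since < self.grace_seconds:
            return False                  # still inside the grace period
        self.configured_ip = current_primary_ip   # repoint at the new primary
        self._changed_since = None
        self.restart_service()            # this restart is the outage window
        return True
```

The point of the sketch is the last branch: every appliance has to wait out the grace period and then restart its main service, which is where the region-wide outage comes from.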

This occurs on every non-database appliance and so a primary failover event means an unavoidable outage across your entire region. Not good. But what if we could at least reduce the outage duration, perhaps by avoiding the restart of your main CloudForms service?

This post discusses one technique that doesn’t require CloudForms service restarts – using a virtual IP for your database. This VIP will live on the database that is the current primary and move when the role of primary fails over. With no more need to restart your CloudForms services, recovery time from failover events is substantially reduced.

Read more “Improving CloudForms VMDB failover with keepalived and a virtual IP”

Configuration domains in CloudForms/ManageIQ – part deux

In the last post on this topic I discussed a technique of separating your configuration from your code in CloudForms’ Automation Engine. The technique relied on creating class overrides in a higher priority domain, enabling attributes on those higher priority classes to be passed down.

It works but it’s…clunky, and it requires a duplicate of every class in a higher domain. Not the most scalable technique!
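To make the override idea concrete, here is a deliberately simplified Python model of priority-based lookup. It is not how the Automate engine resolves domains internally; the domain names, class path and attribute values are all hypothetical, and the merge rule is an assumption made for illustration.

```python
from typing import Any, Dict, List, Tuple

# Each domain is (name, {class_path: attributes}); the list is ordered
# highest priority first. All names and values here are hypothetical.
Domain = Tuple[str, Dict[str, Dict[str, Any]]]


def resolve(domains: List[Domain], class_path: str) -> Dict[str, Any]:
    """Merge attributes for class_path, letting higher-priority domains win."""
    merged: Dict[str, Any] = {}
    for _name, classes in reversed(domains):   # apply lowest priority first
        merged.update(classes.get(class_path, {}))
    return merged


domains = [
    ("Config_Prod", {"/Integration/DNS": {"server": "dns.prod.example.com"}}),
    ("Code", {"/Integration/DNS": {"server": None, "timeout": 30}}),
]
settings = resolve(domains, "/Integration/DNS")
```

Note that the configuration domain has to carry its own copy of every class path it wants to override, which is exactly the duplication that makes this approach hard to scale.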

Below I present two other methods – one that uses $evm.instance_get, and another that loads the config directly onto the root object.

Read more “Configuration domains in CloudForms/ManageIQ – part deux”

PKI sign-on to CloudForms using RH SSO 7.2 – Part 2 of 2

In the previous post, we looked at configuring SSO 7.2 for mutual TLS, requesting a user certificate that is validated against a configured trust store.

In this post we’ll look at the second half of that task – configuring CloudForms for SAML authentication and enabling the X.509 Browser Flow in SSO.

Read more “PKI sign-on to CloudForms using RH SSO 7.2 – Part 2 of 2”

PKI sign-on to CloudForms using RH SSO 7.2 – Part 1 of 2

(Part 2 is available here!)

With the advent of Public Key Infrastructure across organisations, it became possible to authenticate a user based on the certificate they provide. Red Hat Single Sign On 7.2 is able to authenticate users based on a provided certificate, matching some value from the certificate (e.g. CN, email) against RH SSO’s internal database of users.
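As a rough illustration of the matching step, the Python sketch below pulls one attribute out of a certificate subject and looks it up in a user store. It is not RH SSO code: the function names and the in-memory user dictionary are hypothetical, the parsing is simplified, and it assumes the certificate has already been validated against the trust store during the TLS handshake.

```python
from typing import Any, Dict, Optional


def subject_attribute(subject_dn: str, attribute: str) -> Optional[str]:
    """Return the first value of `attribute` in a comma-separated subject DN.

    Simplified on purpose: escaped commas and multi-valued RDNs are not handled.
    """
    for part in subject_dn.split(","):
        key, sep, value = part.strip().partition("=")
        if sep and key.upper() == attribute.upper():
            return value
    return None


def match_user(
    subject_dn: str,
    users_by_username: Dict[str, Dict[str, Any]],
    attribute: str = "CN",
) -> Optional[Dict[str, Any]]:
    """Look up the user whose username equals the chosen subject attribute."""
    value = subject_attribute(subject_dn, attribute)
    if value is None:
        return None
    return users_by_username.get(value)


# Hypothetical user store and certificate subject.
users = {"jsmith": {"email": "jsmith@example.com"}}
user = match_user("CN=jsmith,OU=People,O=Example", users)
```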

When combined with the Security Assertion Markup Language (SAML) authentication available out-of-the-box in CloudForms, we can achieve passwordless, certificate-based sign on to CloudForms.

There are three main areas to this configuration:

  1. Configuring RH SSO 7.2 for mutual TLS, requesting a client certificate.
  2. Configuring CloudForms for SAML against RH SSO 7.2.
  3. Enabling the X.509 browser authentication flow in RH SSO 7.2.

Step 1 is the focus of this blog post. Steps 2 and 3 will follow in the next post.

Read more “PKI sign-on to CloudForms using RH SSO 7.2 – Part 1 of 2”

Configuration Domains with CloudForms/ManageIQ

The CloudForms/ManageIQ automation engine gives us great power to create classes and methods that we can execute in response to various actions (e.g. button presses, events in the environment, REST API calls, etc).

Using Ruby and Ansible we can execute custom code to customise and enhance the CloudForms experience for our customers and users – we can add buttons to interfaces, override provisioning workflows, override approval workflows…etc.

Normally, we’ll have at least two environments – a Quality Assurance environment, and a Production environment. We don’t want those two to mix, and quite often key settings will be different between the two environments. For example, the FQDNs and credentials used to access remote services (e.g. Single Sign On, corporate DNS) may differ between QA and Production. But the underlying logic in the code remains the same – it’s just environment-specific configuration that changes.

To get to a full Continuous Integration/Continuous Deployment model we need to be able to promote Automate code between environments cleanly. Having to execute find/replace across our automate code to replace QA settings with Production settings is not clean.

So how do we keep our common automation logic from intermixing with our environment-specific settings and configuration?

The answer is a configuration domain.

Read more “Configuration Domains with CloudForms/ManageIQ”

Building resilient state machines with CloudForms/ManageIQ

State Machines are a powerful feature of CloudForms automation. If you are unfamiliar with the concept, a state machine in the context of CloudForms automation is a series of steps that are executed sequentially by the Automation Engine.

In particular, state machines give us:

  • The ability to retry steps if they fail.
  • The ability to jump between steps by name, or skip the immediately following step.
  • The ability to execute code on enter or exit of a step, or if the step returns an error.
  • Store state variables in earlier steps that can be referenced in later steps.
  • Since CloudForms 4.6, the ability to execute Ansible playbooks as steps in a state machine.
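As a rough mental model, the retry and state-variable behaviour can be sketched in a few lines of Python. This is not the Automation Engine: the run function, the "ok"/"retry"/"error" return values and the example steps are hypothetical, and entry/exit hooks, jumps and Ansible steps are left out.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]                    # state variables shared by steps
Step = Tuple[str, Callable[[State], str]]    # (name, handler)


def run(steps: List[Step], max_retries: int = 3) -> Tuple[str, State]:
    """Run steps in order; each handler returns 'ok', 'retry' or 'error'."""
    state: State = {}
    for name, handler in steps:
        attempts = 0
        while True:
            result = handler(state)
            if result == "ok":
                break                        # move on to the next step
            attempts += 1
            if result == "error" or attempts > max_retries:
                return f"failed at {name}", state
    return "finished", state


def allocate_ip(state: State) -> str:
    state["ip"] = "192.0.2.10"               # stored for later steps
    return "ok"


def wait_for_vm(state: State) -> str:
    state["polls"] = int(state.get("polls", 0)) + 1
    return "ok" if state["polls"] >= 3 else "retry"


outcome, final_state = run(
    [("allocate_ip", allocate_ip), ("wait_for_vm", wait_for_vm)]
)
```

In this model a failure simply stops the run and hands back whatever state was accumulated so far, with nothing to undo or resume from.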

State machines are already used heavily for the provisioning workflows that ship with CloudForms out-of-the-box. If you’d like to know more about creating state machines, have a look at Mastering Automation in CloudForms and ManageIQ, available here on the customer portal.

State machines are undoubtedly powerful – when they work start to finish.

What happens if a state machine fails before it’s complete?

Read more “Building resilient state machines with CloudForms/ManageIQ”

Automation of CloudForms appliance setup with Ansible

CloudForms ships as an appliance to minimise the deployment and configuration required. Whilst this deployment method removes a substantial amount of complexity by shipping with all the packages and configuration needed to get a working appliance in a very short time, it isn’t entirely without human intervention.

At a minimum you will need to:

  1. Set hostname and network configuration, particularly if you wish to use a static IP address.
  2. Create a new Virtual Management Database (VMDB) and associated Region, or join an existing one.
  3. Configure encryption keys, particularly if you are joining an existing region.
  4. Set up external authentication via IPA, if your deployment method calls for it.
  5. Start the EVM server processes.

These steps can all be performed using the appliance console that ships with the appliance. Unfortunately, this menu-based interface doesn’t lend itself to automation (unless you want to get your hands dirty with expect).

If you’ve got one or two appliances that’s not a big impost. But if you’ve got 5? 10? Then we start to look at Ansible and think “I wonder if I could automate this?”

Turns out, you can!

Read more “Automation of CloudForms appliance setup with Ansible”