OpenStack, Identity Management/IPA and TLS-Everywhere

novajoin is a WSGI application that serves dynamic vendordata to overcloud nodes (and instances, if you wish) via the cloud-init process. Its purpose is to register a host with an IPA server and create any necessary services in IPA so that certificates for them can be created on the hosts.

The purpose of this post is to describe how the “TLS Everywhere” functionality operates in OSP13 onwards. In particular, I wanted to answer these questions:

  • What does novajoin do?
  • How and when does novajoin register hosts in IPA?
  • What changes does novajoin make to IPA?
  • When does the host enrol to IPA, and how does it get its configuration?
Read more “OpenStack, Identity Management/IPA and TLS-Everywhere”

Improving CloudForms VMDB failover with keepalived and a virtual IP

Out of the box CloudForms comes with the ability to deploy PostgreSQL appliances that can be configured into a primary/standby relationship. If the primary fails, the standby takes over automatically.

Your non-database appliances are hardcoded to reference the primary via its IP address. Unfortunately, when the primary fails over to a standby, this IP changes but your appliances aren’t immediately aware of it. A watchdog service running on each appliance keeps an eye on the database and identifies when the primary has failed over. After a set period of time the watchdog updates the hardcoded database IP to the new primary and then restarts your evmserverd process to make the change take effect.

This occurs on every non-database appliance and so a primary failover event means an unavoidable outage across your entire region. Not good. But what if we could at least reduce the outage duration, perhaps by avoiding the restart of your main CloudForms service?

This post discusses one technique that doesn’t require CloudForms service restarts: use a virtual IP for your database. This VIP lives on whichever database is the current primary and moves when the role of primary fails over. With no more need to restart your CloudForms services, recovery time from failover events is substantially reduced.

Read more “Improving CloudForms VMDB failover with keepalived and a virtual IP”

ConnectionError: (‘Connection aborted.’, BadStatusLine(“””)) – OpenStack

I’m attempting to run through the OSP10 -> OSP13 fast-forward upgrade process on my home lab. Unfortunately, I kept running into an error when preparing for the fast-forward upgrade:

Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. 
Execution ID: 3fb62e82-9025-4057-a5ff-8e2189e42a99
Plan updated.
Processing templates in the directory /tmp/tripleoclient-rjkHaX/tripleo-heat-templates
Unable to establish connection to https://192.168.0.162:13989/v2/action_executions: ('Connection aborted.', BadStatusLine("''",))

This error arises from httplib in the Python standard library. In short, it’s telling you that the remote end closed the connection without sending an HTTP status line. The empty string in the message is what the client read where the status line should have been.
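To make the failure mode concrete, here is a minimal sketch that reproduces it locally. It assumes Python 3, where httplib has become http.client and the exception raised is RemoteDisconnected, a subclass of BadStatusLine. The throwaway server and the function names are purely illustrative and have nothing to do with Mistral itself; the request path is borrowed from the log above only for flavour.

```python
import http.client
import socket
import threading


def serve_once_and_hang_up(server_sock):
    """Accept one connection, read the request, then close without replying."""
    conn, _ = server_sock.accept()
    data = b""
    # Read until the end of the request headers so nothing is left unread.
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(4096)
        if not chunk:
            break
        data += chunk
    conn.close()  # hang up: no status line is ever sent


def request_against_silent_server():
    """Return the exception raised when the server sends no status line."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    worker = threading.Thread(target=serve_once_and_hang_up, args=(server_sock,))
    worker.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/v2/action_executions")
        client.getresponse()
    except http.client.BadStatusLine as exc:
        return exc
    finally:
        client.close()
        worker.join()
        server_sock.close()
    return None


if __name__ == "__main__":
    error = request_against_silent_server()
    print(type(error).__name__, error)
```

The takeaway is that the client did nothing wrong: the request was sent and the other side simply went away, so the place to look is the service behind the endpoint rather than the client.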

Read more “ConnectionError: (‘Connection aborted.’, BadStatusLine(“””)) – OpenStack”

Neutron security groups and OVS, Part 3: tracing OVS’s hooks and claws…

So far we’ve looked at:

  • What tap interfaces are and why the VMs require them for network connectivity. (Part 1)
  • How security groups are implemented as iptables rules. (Part 2)
  • The implementation detail that iptables is just a front-end to the netfilter framework within the kernel, a framework that operates at layer 3.

None of that explains why we need the Linux bridge in the middle, however.

Read more “Neutron security groups and OVS, Part 3: tracing OVS’s hooks and claws…”

Neutron security groups and OVS, part 2: security groups implementation

Security groups provide IP traffic filtering for your VM instances. You can specify ingress and egress rules and filter traffic based on port, source address, destination address, etc. Here’s a shot from my lab, with some basic security group rules assigned to my demo project:

Under the hood, when using the iptables_hybrid firewall driver, these are all implemented as iptables rules on every compute node where an instance is running with this security group assigned.

Read more “Neutron security groups and OVS, part 2: security groups implementation”

Neutron security groups and OVS, part 1: tap interfaces and VM connectivity

Today I did a little digging into the implementation of security groups when using Open vSwitch. In particular, I was curious about this: why is it that security groups require the creation of a Linux bridge on the compute node? Why can’t we just attach the VM directly to the OVS integration bridge (br-int) and set iptables rules on the VM interface like we otherwise would?

This applies when using the iptables_hybrid firewall driver for Neutron with the ML2+OVS subsystem. If you use the openvswitch firewall driver, these firewall rules are implemented entirely by OpenFlow rules that use the conntrack module in the kernel.

This was originally going to be one post but I ended up rambling on for so long I opted to split it into a few related posts. This is the first!

Read more “Neutron security groups and OVS, part 1: tap interfaces and VM connectivity”

OpenStack Heat vs Ansible

You’ve almost certainly heard of Ansible – the uber-simple IT automation engine now maintained by Red Hat. Perhaps you’ve also heard of OpenStack Heat, the orchestration engine built into the OpenStack platform.

In this post I’m going to try and summarise the major differences between these two technologies (and there are many). Mostly, however, I’m aiming to show how these two fantastic technologies can be combined to enable powerful, flawless orchestration and configuration of infrastructure deployed on OpenStack.

Read more “OpenStack Heat vs Ansible”

OSP13: clean up old images in a local container registry

Red Hat OpenStack Platform 13 (upstream Queens) is a fully containerised solution, meaning that all of its components are deployed as containers, rather than traditional RPM-based packages.

You have a few options for how you obtain these container images – you can point directly at the Red Hat Container Catalog, you can point to a container registry elsewhere in your environment, or you can create and use a registry on the undercloud. All of these options are covered in the documentation, but for this post I’m assuming you use a local registry on the undercloud.

As you update the overcloud and new container versions arrive, older versions remain in the registry consuming valuable disk space on the undercloud. Better to clean out older versions of images once your updates are successful! Unfortunately there’s no simple method to remove old images, so (with a little help from some Googling) I’ve developed a simple script to do just that.

Read more “OSP13: clean up old images in a local container registry”

Rebooting a host with Ansible using reboot hints

The need to configure a host, reboot it if needed, then wait for it to return, is an extremely common pattern in Ansible – so common it will (finally!) become the reboot module in Ansible 2.7.

For those of us using Ansible 2.6 or earlier, we need a way to reboot a host. There’s no shortage of suggestions out there and mine below hopes to add to those. The method I propose will:

  1. Verify if a reboot is required (using /bin/needs-restarting – CentOS/RHEL only).
  2. If necessary, reboot the host.
  3. Wait for SSH to disappear, meaning the host has progressed far into its reboot process.
  4. Wait for SSH to return, meaning the host is alive and ready for the play to continue.
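The waiting logic in steps 3 and 4 can be sketched outside Ansible as a pair of port polls. This is only an illustration in Python, not the playbook from the post: the function names, timeouts and polling interval are all assumptions, and steps 1 and 2 (checking whether a reboot is needed and triggering it) are left out.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port, want_open, deadline_s=300.0, interval_s=1.0):
    """Poll until the port reaches the wanted state; False if the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if port_is_open(host, port) == want_open:
            return True
        time.sleep(interval_s)
    return False


def wait_for_reboot(host, port=22, down_timeout_s=120.0, up_timeout_s=600.0):
    """Step 3: wait for SSH to disappear. Step 4: wait for it to come back."""
    if not wait_for_port(host, port, want_open=False, deadline_s=down_timeout_s):
        return False  # SSH never went away, so the reboot probably did not start
    return wait_for_port(host, port, want_open=True, deadline_s=up_timeout_s)
```

Waiting for the port to close first matters: without it, a check made immediately after issuing the reboot could find SSH still up and wrongly conclude that the host is already back.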

Read more “Rebooting a host with Ansible using reboot hints”