OpenStack Queens, Split Deployment Gotcha #2: role ordering for split control plane

The previous post in this series described how, without the ‘primary’ and ‘controller’ tags on the role running haproxy, your overcloud_endpoint.pem certificate file is never copied to those nodes, causing haproxy startup to fail.

This post documents a second gotcha: the ordering of your split roles in roles_data.yaml determines whether some of your bootstrap tasks are run at all.

The problem I ran into was that my stack deploy failed at Step 5, because my Compute nodes never successfully connected to the Placement API. After much debugging, I discovered it was because there was nothing defined within Keystone: no services, no endpoints, no service users. It was as if the bootstrap tasks designed to take place in Step 3 had never run.

So naturally authentication to the Placement API kept failing. The evidence that the bootstrap tasks had not run was in lines like these in the logs on my Keystone nodes:

Mar 16 14:44:18 os1-ospservices-0.os1.home.ajg.id.au os-collect-config[12343]: TASK [Write docker-puppet-tasks json files] ************************************
Mar 16 14:44:18 os1-ospservices-0.os1.home.ajg.id.au os-collect-config[12343]: skipping: [localhost] => (item={'value': [{u'puppet_tags': u'keystone_config,keystone_domain_config,

Notice the ‘skipping’. The task is skipped when the current server’s UUID is not the UUID of the designated ‘bootstrap’ server.

The bootstrap server is determined as the first host in the ‘primary role’.

From the last post, we know that the primary role is determined by finding the last role in roles_data.yaml that has the following stanza:

  tags:
    - primary
    - controller
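
The selection rule can be sketched in a few lines of Python. This only models the behaviour as described in this post; it is not the actual TripleO template code. The role layout, the "Pacemaker" role name and the service lists are hypothetical stand-ins for a parsed roles_data.yaml.

```python
# Hypothetical, trimmed-down stand-in for the parsed contents of roles_data.yaml.
# List order mirrors file order; role names and service lists are illustrative.
ROLES = [
    {
        "name": "ControllerOpenstack",
        "tags": ["primary", "controller"],
        "services": ["OS::TripleO::Services::Keystone"],
    },
    {
        "name": "Compute",
        "tags": [],
        "services": ["OS::TripleO::Services::NovaCompute"],
    },
    {
        "name": "Pacemaker",
        "tags": ["primary", "controller"],
        "services": ["OS::TripleO::Services::Pacemaker"],
    },
]


def find_primary_role(roles):
    """Return the last role tagged both 'primary' and 'controller', or None."""
    primary = None
    for role in roles:
        tags = role.get("tags", [])
        if "primary" in tags and "controller" in tags:
            primary = role  # a later match overrides an earlier one
    return primary


primary_role = find_primary_role(ROLES)
print(primary_role["name"])
```

With two roles carrying both tags, the one that appears later in the file wins, regardless of which services it hosts.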

Here’s where the bite comes in. My OS::TripleO::Services::Keystone service is defined on my ControllerOpenstack role.

But my bootstrap node (for the purposes of running Ansible) is one of my Pacemaker nodes – which doesn’t have the Keystone service defined on it. My Pacemaker role was defined last in roles_data.yaml.

As a result the Keystone docker-puppet-tasks are never performed, meaning that all of my Keystone setup tasks never happen, and authentication fails across the board.

Now there’s a problem.

I can re-order my roles_data.yaml to place my ControllerOpenstack role last and ensure that one of those servers is selected as the Ansible bootstrap node.
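
As a rough sanity check before redeploying, the re-ordering can be modelled like this. It is a sketch under the same assumption as the rest of this post (the last role tagged ‘primary’ and ‘controller’ becomes the primary role); the "Pacemaker" role name, the service lists and the helper functions are hypothetical, not part of TripleO.

```python
KEYSTONE = "OS::TripleO::Services::Keystone"

# Hypothetical role layout in file order, with the Pacemaker role defined last.
ROLES = [
    {"name": "ControllerOpenstack", "tags": ["primary", "controller"],
     "services": [KEYSTONE]},
    {"name": "Pacemaker", "tags": ["primary", "controller"],
     "services": ["OS::TripleO::Services::Pacemaker"]},
]


def find_primary_role(roles):
    """Return the last role tagged both 'primary' and 'controller', or None."""
    tagged = [r for r in roles
              if "primary" in r.get("tags", []) and "controller" in r.get("tags", [])]
    return tagged[-1] if tagged else None


def move_role_last(roles, name):
    """Return a new list with the named role moved to the end of the file order."""
    others = [r for r in roles if r["name"] != name]
    moved = [r for r in roles if r["name"] == name]
    return others + moved


def primary_role_hosts(roles, service):
    """True if the primary role carries the given service."""
    primary = find_primary_role(roles)
    return primary is not None and service in primary.get("services", [])


before = primary_role_hosts(ROLES, KEYSTONE)
reordered = move_role_last(ROLES, "ControllerOpenstack")
after = primary_role_hosts(reordered, KEYSTONE)
print(before, after)
```

The same check could be repeated for any other service whose setup tasks only run on the bootstrap node.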

The unknown is what Pacemaker-managed services might be impacted by doing this. That will require some more analysis.
