Keystone, LDAP domains, and “An Error Occurred Authenticating”

When integrating LDAP with OpenStack Keystone, you might see an error like the following when you attempt to sign in through Horizon:

“An error occurred authenticating. Please try again later”

You will also see HTTP 500 response codes when attempting to use the CLI.

This has bitten me a couple of times. Three things to check:

  • If you are using LDAPS, is the CA certificate for the LDAP server present in the Keystone container? Ensure it’s part of the CAMap that is copied onto the host (a quick way to check the CA against the server is sketched after this list).
  • Does the password for Keystone’s LDAP user (i.e. the one it binds with to conduct searches) contain any $ symbols? Oslo Config treats these as substitution variables, so when it reads the password from /etc/keystone/domains/keystone.<domain>.conf it will raise an exception. Escape any $ symbols in your TripleO template like so: “pa\\$sw0rd”. Note: unescaped dollar signs will cause authentication to fail for every domain, so even if you aren’t attempting to sign into an LDAP-backed domain, check this anyway.
  • Are the credentials for the Keystone bind user correct? Attempt an authenticated bind, similar to the example below, to be sure: -W prompts for the password, -D specifies the distinguished name you are binding as, -H gives the host and protocol, and the search filter goes at the end:
ldapsearch -W -H ldaps://my.ldap.host -D "uid=openstack,cn=users,cn=accounts,dc=my,dc=ldap,dc=host" "uid=someuser"
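
For the first point, a quick way to confirm that the CA you are injecting actually validates the LDAP server’s certificate is openssl s_client. The hostname and CA path below are placeholders based on the example above, and I’m assuming the standard LDAPS port of 636:

openssl s_client -connect my.ldap.host:636 -CAfile /path/to/ldap-ca.pem < /dev/null

Look for “Verify return code: 0 (ok)” at the end of the output – anything else means the certificate chain does not validate against that CA.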

IP helpers, native VLANs, and PXE boot with Satellite 6.5

I’ve started a bit of an upgrade to my home lab. Well, not sure if you’d call it an upgrade, but something different.

I’m building a cloud, based on Intel NUC hardware. Specifically I’m using this NUC, with 16GB RAM, a 120GB M.2 SSD for the OS, and a 480GB SSD as an OSD for Ceph storage.

More on that later though. This post is about my wrestling with my Cisco 3560 L3 switch, and taking a crash course in its use.

Read more “IP helpers, native VLANs, and PXE boot with Satellite 6.5”

Tips from the Trenches vol 3: openssl s_client and SNI

Having trouble with the certificates for your web service? Need to check exactly what TLS certificate is being served?

Use openssl s_client to dump the server’s certificate in a format that’s easy to read:

[agoossen@agoossen ~]$ openssl s_client -connect ajg.id.au:443 -showcerts | openssl x509 -noout -text                  
depth=2 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA
verify return:1
depth=1 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = RapidSSL RSA CA 2018
verify return:1
depth=0 CN = www.ajg.id.au
verify return:1
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            02:5c:fe:60:69:26:9d:f5:5c:86:c7:a8:ed:42:1f:e7
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, O = DigiCert Inc, OU = www.digicert.com, CN = RapidSSL RSA CA 2018
        Validity
            Not Before: Dec  8 00:00:00 2018 GMT
            Not After : Feb  6 12:00:00 2020 GMT

Going through the commands here, we’re using the client available in openssl to connect to ajg.id.au on port 443 and instructing it to show the certificates. We then pipe the result into openssl again, this time using the x509 function to dump the certificate in a text format we can easily read.

A word of warning: if you are connecting to a host that uses virtual hosts, i.e. one that serves multiple domain names from the same IP address, you may find you do not get the certificate you expect. To solve this, add the -servername parameter with the domain name you are connecting to.

A good example of this gotcha is attempting to verify the certificate your TLS-passthrough OpenShift service is using. Without -servername, you will get back the certificate of the router pod – not what you want to see!

This is due to a TLS extension called Server Name Indication (SNI) – the focus of the next part of this post.

Server Name Indication – solving the TLS chicken-and-egg problem

When your browser connects to a site, it resolves the domain name and connects to the IP address of the server. Most web hosting providers, at least the cheap ones, carry multiple virtual hosts on a single server.

The problem is that an IP address alone doesn’t tell the server which virtual host you’re actually connecting to. With plain HTTP this is solved by the Host header: the browser passes along the domain you are connecting to, and the server maps it to a virtual host and serves up the content.

When you wrap that session in TLS, there’s now a problem – each virtual host will, presumably, have a different TLS certificate corresponding to its domain name. The server needs to know which virtual host you’re after so it can serve the correct certificate.

Which domain you are connecting to is contained in the HTTP Host header…except that isn’t provided to the server until after the TLS session is established. So the server can’t serve up the correct certificate until it gets the Host header, which it won’t get until the connection is established…this is the chicken-and-egg problem.

To solve this problem, Server Name Indication (SNI) was introduced as an extension to TLS. The client passes along the name of the server it is connecting to, in the clear, as part of the TLS negotiation. The server can then use this to identify which virtual host you require and serve the correct certificate.

Taking our OpenShift example, the proxy service running on the router pods will need SNI when conducting TLS passthrough – otherwise, it doesn’t know which route you’re connecting to, and therefore doesn’t know which service to proxy your connection to.

You can specify the SNI hostname to openssl s_client by using the -servername parameter:

openssl s_client -connect ajg.id.au:443 -servername ajg.id.au | openssl x509 -text -noout
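
Applied to the OpenShift passthrough example, you connect to the router’s address but set -servername to the route’s hostname. The hostnames here are made up for illustration:

openssl s_client -connect router.apps.example.com:443 -servername myapp.apps.example.com | openssl x509 -text -noout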


Using Containers for Day 2 Operations

There is no shortage of material discussing deploying applications in containers. They’re small, lightweight, and completely self-contained – an environment purpose-built for running your application.

Those same features – in particular the purpose-built environment – make containers excellent for Day 2 operations as well. Load your tools into a container customised specifically for them, then use it to administer your application or environment.

In this blog post I’ll show you one technique – we’ll build a container using Buildah, copy in some playbooks, run the container using Podman with a vault password mounted in, and then execute the playbooks.
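
A rough sketch of that flow, assuming a Fedora base image, your playbooks in ./playbooks, and the vault password in ~/.vault_pass.txt (all placeholders here), might look like this:

# Build a small tooling image with Buildah (any base image with Ansible available will do)
container=$(buildah from registry.fedoraproject.org/fedora:latest)
buildah run "$container" -- dnf install -y ansible-core
buildah copy "$container" ./playbooks /opt/playbooks
buildah config --workingdir /opt/playbooks "$container"
buildah commit "$container" day2-tools

# Run it with Podman, mounting in the vault password, and execute a playbook
podman run --rm -v ~/.vault_pass.txt:/run/vault_pass:Z localhost/day2-tools \
    ansible-playbook site.yml --vault-password-file /run/vault_pass

The playbooks live entirely inside the image; only the vault password crosses the boundary, and only at runtime.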

Read more “Using Containers for Day 2 Operations”

OpenStack Queens split control plane gotcha #3 – split OVN services

So here’s the third gotcha I’ve run into. For background, I’m trying to deploy a ‘split’ OpenStack control plane in OpenStack Queens using ML2+OVN – 3x nodes running all Pacemaker-managed services, and 3x nodes running the non-Pacemaker services (e.g. Keystone, Neutron Server, etc).

The problem is that the Neutron API service – which deploys the neutron-server-ovn image – requires the NeutronCorePlugin service. This in turn runs a puppet manifest that will fail because of a missing piece of hieradata – ovn::northbound::port.

neutron-server-ovn (the OS::TripleO::Services::NeutronAPI service) needs the northbound port in order to set up the logical flows in the northbound DB. When running in a monolithic controller model this is already available, thanks to the OS::TripleO::Services::OVNDBs service.

In a split model OVNDBs is split away from NeutronAPI, necessitating a change.

Read more “OpenStack Queens split control plane gotcha #3 – split OVN services”

OpenStack Queens, Split Deployment Gotcha #2: role ordering for split control plane

The last post described how, without the ‘primary’ and ‘controller’ tags on the role running haproxy, your overcloud_endpoint.pem certificate file is never copied to those nodes, causing haproxy startup to fail.

This post documents a second gotcha – the ordering of your split roles in roles_data.yaml determines if some of your bootstrap tasks are run.

Read more “OpenStack Queens, Split Deployment Gotcha #2: role ordering for split control plane”

OpenStack role tags: ‘primary’ and ‘controller’

You’ll see these tags in the roles_data.yaml file and might be wondering what they’re for. This post answers that question, but also outlines a ‘gotcha’ where the NodeTLSData resource will not be created for a role if that role does not have the primary and controller tags set.

This applies to OpenStack Queens – in Rocky the NodeTLSData resource was changed to use Ansible for deployment of the public TLS certificate, and therefore this restriction doesn’t apply anymore.

Read more “OpenStack role tags: ‘primary’ and ‘controller’”

Applying TLS Everywhere to an existing OpenStack 13 (Queens) cloud

TLS-Everywhere was introduced in the Queens cycle to provide TLS security over pretty much all communication paths within OpenStack. Not just the public endpoints – that’s been present for a while – but also the internal endpoints, admin endpoints, RabbitMQ bus and Galera replication/connections too.

Unfortunately, out of the box you cannot apply the TLS everywhere environment files on an existing OSP13 cloud and expect it to just work. The TLS everywhere feature in Queens, and indeed Rocky, is based on the assumption that you are deploying a fresh cloud.

After some work over the last few days with some colleagues, there’s a solution to applying TLS-everywhere retrospectively on an OSP13 deployment. But be warned: it’s messy.

Read more “Applying TLS Everywhere to an existing OpenStack 13 (Queens) cloud”