I’m running through the OSP10 -> OSP13 fast-forward upgrade process in my home lab. Unfortunately, I kept hitting this error when preparing for the fast-forward upgrade:
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan.
Execution ID: 3fb62e82-9025-4057-a5ff-8e2189e42a99
Processing templates in the directory /tmp/tripleoclient-rjkHaX/tripleo-heat-templates
Unable to establish connection to https://192.168.0.162:13989/v2/action_executions: ('Connection aborted.', BadStatusLine("''",))
This error comes from httplib in the Python standard library. In short, it’s telling you that the remote end closed the connection without sending an HTTP status line. The empty string ('') in the message is what the client received in place of one.
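To make the failure mode concrete, here is a minimal sketch that reproduces it locally. It assumes Python 3, where httplib is called http.client and the same condition surfaces as RemoteDisconnected, a subclass of BadStatusLine. The helper names and the throwaway local server are purely illustrative and have nothing to do with Mistral or haproxy.

```python
import http.client
import socket
import threading


def serve_once_and_hang_up(listener):
    """Accept one connection, read the request, then close without replying."""
    conn, _ = listener.accept()
    conn.recv(65536)  # consume the request bytes
    conn.close()      # hang up: no status line is ever sent


def fetch_status_line_error():
    """Return the exception the client raises when the server hangs up."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=serve_once_and_hang_up, args=(listener,))
    server.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/v2/action_executions")
        client.getresponse()  # fails: the peer closed before answering
    except http.client.BadStatusLine as exc:
        return exc
    finally:
        client.close()
        server.join()
        listener.close()
    return None
```

Calling fetch_status_line_error() should hand back the same family of exception that the tripleoclient output above wraps in its 'Connection aborted.' message.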
The culprit in this instance is haproxy and its timeout for the server side of the connection. Here, the Mistral API is waiting for a reply to a RabbitMQ message it passed to the Mistral engine. Because this is a virtual undercloud running on a heavily oversubscribed KVM host, the Mistral engine takes longer than the haproxy server timeout to finish and return a response to the Mistral API.
haproxy then times out waiting to receive data from the server and so kills the client connection as well. The client interprets this as ‘Connection aborted’.
The solution: increase the server timeout in /etc/haproxy/haproxy.cfg. It defaults to 2m; on my deployment (KVM-based) things take longer to run, so I’ve raised it to 5m:
timeout http-request 10s
timeout queue 2m
timeout connect 10s
timeout client 2m
timeout server 5m
timeout check 10s
Reload the haproxy service (systemctl reload haproxy) to make the config change take effect. haproxy will now wait five minutes for the server side of a connection to return data to the client before terminating the connection.
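If you want to double-check what the file actually says after editing it, a small parser along these lines may help. This is only a sketch: parse_timeouts and SAMPLE are names made up for this example, and it flattens every 'timeout' line into one dictionary, so it ignores which haproxy section (defaults, listen, and so on) a value belongs to. It relies on haproxy's documented time suffixes (us, ms, s, m, h, d), with a bare number meaning milliseconds.

```python
import re

# Multipliers to seconds for the time-unit suffixes haproxy accepts.
# A bare number with no suffix is treated as milliseconds.
_UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1, "m": 60, "h": 3600, "d": 86400}

_TIMEOUT_RE = re.compile(
    r"^\s*timeout\s+(\S+)\s+(\d+)(us|ms|s|m|h|d)?\s*(?:#.*)?$"
)

# The timeout lines from this post, used as sample input.
SAMPLE = """\
timeout http-request 10s
timeout queue 2m
timeout connect 10s
timeout client 2m
timeout server 5m
timeout check 10s
"""


def parse_timeouts(config_text):
    """Return {name: seconds} for every 'timeout <name> <value>' line.

    Simplification: later lines overwrite earlier ones regardless of
    which config section they appear in.
    """
    timeouts = {}
    for line in config_text.splitlines():
        match = _TIMEOUT_RE.match(line)
        if match:
            name, number, unit = match.groups()
            timeouts[name] = int(number) * _UNIT_SECONDS[unit or "ms"]
    return timeouts
```

For the real file, something like parse_timeouts(open("/etc/haproxy/haproxy.cfg").read()) would do, bearing in mind the section caveat above.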
You may also need to increase the DEFAULT/rpc_response_timeout parameter in /etc/mistral/mistral.conf; mine is set to 240 seconds (default is 60 seconds).
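As a rough way to read that value back, the sketch below uses Python's configparser on the text of mistral.conf. The function names are hypothetical, and configparser is only an approximation of how oslo.config parses the file, so treat it as a sanity check rather than the authoritative answer. The second helper encodes an assumption of mine rather than anything stated above: that the RPC timeout should stay below the haproxy server timeout, so the Mistral API gets a chance to answer before the proxy drops the connection.

```python
import configparser


def rpc_response_timeout(conf_text, default=60):
    """Return DEFAULT/rpc_response_timeout in seconds, or the default if unset."""
    # interpolation=None so '%' characters in other options are left alone;
    # strict=False tolerates duplicated sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getint("DEFAULT", "rpc_response_timeout", fallback=default)


def fits_within_proxy(rpc_seconds, proxy_server_seconds):
    """True if an RPC reply or RPC timeout would arrive before haproxy gives up."""
    return rpc_seconds < proxy_server_seconds
```

With the numbers from this post, fits_within_proxy(240, 300) holds: 240 seconds of RPC timeout sits under the 5m (300 second) server timeout.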
Started Mistral Workflow tripleo.package_update.v1.package_update_plan.
Execution ID: 1dafecc3-aa65-4ad9-b9fb-5fd74ef2c463
Waiting for messages on queue 'tripleo' with no timeout.
2019-01-29 10:27:19Z [DeploymentServerBlacklistDict]: CREATE_COMPLETE state changed
2019-01-29 10:27:20Z [overcloud-ServiceNetMap-a45lzxobzujw]: UPDATE_IN_PROGRESS Stack UPDATE started
2019-01-29 10:27:20Z [RabbitCookie]: UPDATE_IN_PROGRESS state changed
2019-01-29 10:27:20Z [RabbitCookie]: UPDATE_COMPLETE state changed