First experience installing OpenStack

Over the last two days I’ve been installing OpenStack Pike, following the installation guide for self-service networks. The installation guide was pretty straightforward and everything mostly worked fine.

There are a couple of quirks in my installation that caught me out. For context, I’m following a basic installation with one controller node and one compute node. Both of these are VMs running on a single hypervisor (AMD Ryzen 5 [6 core], 64 GB RAM, 500 GB SSD). My OSP VMs are therefore VMs inside VMs (nested virtualisation), which caused the first problem.
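In a nested setup, one thing worth checking is whether the compute node VM can see hardware virtualisation extensions at all. The Pike install guide does this with `egrep -c '(vmx|svm)' /proc/cpuinfo`: a count of zero means libvirt should be configured to use QEMU instead of KVM (`virt_type = qemu` in the `[libvirt]` section of the Nova compute configuration). The sketch below mirrors that check in Python. The function names are hypothetical, and this is not necessarily the problem the post goes on to describe.

```python
import re

# Same pattern as the install guide's `egrep -c '(vmx|svm)' /proc/cpuinfo`:
# vmx = Intel VT-x, svm = AMD-V.
_HW_VIRT = re.compile(r"vmx|svm")


def count_hw_virt_lines(cpuinfo_text: str) -> int:
    """Count lines mentioning vmx or svm (egrep -c counts matching lines)."""
    return sum(1 for line in cpuinfo_text.splitlines() if _HW_VIRT.search(line))


def suggested_virt_type(cpuinfo_text: str) -> str:
    """Hypothetical helper: 'kvm' if acceleration is visible, else 'qemu'."""
    return "kvm" if count_hw_virt_lines(cpuinfo_text) > 0 else "qemu"
```

On the compute node itself, you would pass it the contents of `/proc/cpuinfo`. Inside a nested guest on an AMD host, `svm` only shows up if nested virtualisation is enabled on the hypervisor.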


Unique Set Size now used in ManageIQ

Interesting! I didn’t see it when I wrote the post last night, but 18 days ago a commit went into ManageIQ that specifically addresses the issue of proportional set size (see here). Notably, they draw the same conclusions I did about increased PSS and the lifetime of the main server process, so it’s nice to have some confirmation I wasn’t off on a tangent!

In a nutshell, ManageIQ will now use the unique set size (USS) when calculating a worker’s memory consumption to decide whether to kill it. USS (introduced at the same time as PSS; see the Wikipedia article) is designed to guarantee that the reported value is physical memory owned exclusively by the process, so it never includes shared memory.
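On Linux the kernel exposes per-mapping memory counters in `/proc/<pid>/smaps`. PSS is the sum of the `Pss` fields, and USS can be derived by summing each mapping’s `Private_Clean` and `Private_Dirty` fields. The sketch below illustrates the difference. It is not the ManageIQ implementation, and the function names are hypothetical.

```python
def memory_from_smaps(smaps_text: str) -> dict:
    """Summarise a /proc/<pid>/smaps dump into RSS, PSS and USS (all in kB)."""
    totals = {"Rss": 0, "Pss": 0, "Private_Clean": 0, "Private_Dirty": 0}
    for line in smaps_text.splitlines():
        fields = line.split()
        # Counter lines look like "Pss:   123 kB"; mapping header lines
        # start with an address range, not a "Name:" token.
        if len(fields) >= 2 and fields[0].endswith(":"):
            name = fields[0][:-1]
            if name in totals:
                totals[name] += int(fields[1])
    return {
        "rss_kb": totals["Rss"],  # shared pages counted in full
        "pss_kb": totals["Pss"],  # shared pages split between their sharers
        # USS: pages mapped by this process alone
        "uss_kb": totals["Private_Clean"] + totals["Private_Dirty"],
    }


def process_memory(pid: int) -> dict:
    """Read smaps for a live process (Linux only)."""
    with open(f"/proc/{pid}/smaps") as f:
        return memory_from_smaps(f.read())
```

For real monitoring, `psutil.Process(pid).memory_full_info()` already reports `uss` and `pss` on Linux, so hand-parsing smaps is mainly useful for seeing where the numbers come from.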

With this new method, workers will no longer be ‘penalised’ for having a large PSS (as happens when the miqserver process is leaking memory). The calculation will reflect the worker’s actual memory usage, hopefully making worker kills and restarts less frequent.
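To see how a PSS-based check can ‘penalise’ a worker, here is a toy calculation. PSS charges a process its private pages plus an equal share of every page it shares, while USS charges only the private pages. PSS can therefore rise even though the worker allocates nothing new, for example when the number of processes sharing a region drops. The sizes and the 500 MB limit are made-up illustrative numbers, and `exceeds_limit` is a hypothetical stand-in for a worker memory check, not ManageIQ’s code.

```python
def pss_mb(private_mb: float, shared_regions: list[tuple[float, int]]) -> float:
    """PSS = private memory + each shared region divided by its number of sharers."""
    return private_mb + sum(size / sharers for size, sharers in shared_regions)


def uss_mb(private_mb: float) -> float:
    """USS counts only memory owned exclusively by the process."""
    return private_mb


def exceeds_limit(measured_mb: float, limit_mb: float) -> bool:
    """Hypothetical stand-in for a worker memory-threshold check."""
    return measured_mb > limit_mb


LIMIT_MB = 500  # illustrative threshold only
private = 150   # MB owned by this worker alone (illustrative)

# A 1000 MB region of shared pages, first shared by 4 processes, then by 2.
before = pss_mb(private, [(1000, 4)])  # 400.0
after = pss_mb(private, [(1000, 2)])   # 650.0
print(exceeds_limit(before, LIMIT_MB), exceeds_limit(after, LIMIT_MB))  # False True
print(exceeds_limit(uss_mb(private), LIMIT_MB))  # False in both cases
```

The worker’s own footprint never changes in this example; only its share of the common pages does, which is exactly what a USS-based check ignores.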

See also Bugzilla 1479356.

Nice!

ManageIQ workers, proportional set size (PSS) and workers out of memory

At work we support an installation of multiple CloudForms appliances, each customised for a particular set of roles (e.g. web services, user interface, EMS operations). It’s not uncommon to see “evm_worker_memory_exceeded” alerts in the log files, followed by log entries showing the worker process being terminated and re-created.

The longer an appliance has been running, the more often this tends to happen. To understand why, I dug through the ManageIQ (upstream CloudForms) code base to see what’s going on. How can a small worker process possibly chew up 500 MB of RAM so quickly?

Below is my theory.
