Kernel panic with Packstack and AMD Ryzen on instance boot

I configured my AMD Ryzen box as a Packstack deployment today and ran into a perplexing problem. Whenever I tried to boot an instance – CirrOS, CentOS, it didn’t matter – I’d see lines like this in the instance’s console log:

[    0.329569] ---[ end trace 8761dba085238f6f ]---
[    0.330876] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.332674] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!

The issue turned out to be the CPU model that libvirt was exposing to the VM.

A full stack trace from a failed CirrOS VM looks like this:

[    0.269064] divide error: 0000 [#1] SMP 
[    0.270140] Modules linked in:
[    0.270948] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-28-generic #47-Ubuntu
[    0.272585] Hardware name: RDO OpenStack Compute, BIOS 1.11.0-2.el7 04/01/2014
[    0.274421] task: ffffffff81e11500 ti: ffffffff81e00000 task.ti: ffffffff81e00000
[    0.276575] RIP: 0010:[<ffffffff81042ea0>]  [<ffffffff81042ea0>] init_amd+0x300/0x750
[    0.278941] RSP: 0000:ffffffff81e03e88  EFLAGS: 00010246
[    0.280366] RAX: 0000000000000000 RBX: ffffffff81f33540 RCX: 0000000000000000
[    0.282089] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[    0.284047] RBP: ffffffff81e03ef0 R08: 0000000000000000 R09: 0000000000000004
[    0.285984] R10: ffff88001f4a3440 R11: 000000000001ab28 R12: ffffffff81f33562
[    0.287926] R13: 0000000000000000 R14: ffffffff81e03e9c R15: 0000000000000000
[    0.289848] FS:  0000000000000000(0000) GS:ffff88001f800000(0000) knlGS:0000000000000000
[    0.292153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.293765] CR2: 00000000ffffffff CR3: 0000000001e0a000 CR4: 00000000000406b0
[    0.295678] Stack:
[    0.296389]  0000000000000001 0000034000000007 0000000000000340 ffffffff81e03ef0
[    0.298995]  ffffffff81040a84 0000080000600f20 0000000000000000 09791aa21a806920
[    0.301521]  ffffffff81f33540 ffffffff81f335c1 ffffffff82003920 ffffffff820102e0
[    0.304022] Call Trace:
[    0.304859]  [<ffffffff81040a84>] ? get_cpu_cap+0x2a4/0x310
[    0.306489]  [<ffffffff81040d32>] identify_cpu+0x242/0x3d0
[    0.307799]  [<ffffffff81f6b988>] identify_boot_cpu+0x10/0x7a
[    0.309442]  [<ffffffff81f6ba26>] check_bugs+0x9/0x2d
[    0.310822]  [<ffffffff81f5afe8>] start_kernel+0x458/0x4a2
[    0.312152]  [<ffffffff81f5a120>] ? early_idt_handler_array+0x120/0x120
[    0.313635]  [<ffffffff81f5a339>] x86_64_start_reservations+0x2a/0x2c
[    0.315128]  [<ffffffff81f5a485>] x86_64_start_kernel+0x14a/0x16d
[    0.316767] Code: 0f b6 ff f7 35 72 5a de 00 48 8b 14 cd 00 4a f3 81 89 c6 48 c7 c0 20 a0 00 00 66 89 3c 02 0f b7 83 d6 00 00 00 31 d2 44 0f af c6 <41> f7 f0 0f b6 83 d8 00 00 00 66 89 93 d6 00 00 00 31 d2 f7 f6 
[    0.326845] RIP  [<ffffffff81042ea0>] init_amd+0x300/0x750
[    0.328495]  RSP <ffffffff81e03e88>
[    0.329569] ---[ end trace 8761dba085238f6f ]---
[    0.330876] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.332674] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!
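
Before getting to the fix, it’s worth seeing exactly what CPU definition libvirt hands to the guest. On the compute node you can dump the domain XML for the failing instance and look at its <cpu> element (the instance name below is just a placeholder – take the real one from virsh list):

# List the libvirt domains Nova has defined on this host
virsh list --all

# Dump the XML for the failing instance and inspect its <cpu> element
# ("instance-00000001" is a placeholder name)
virsh dumpxml instance-00000001 | grep -A 10 '<cpu'

With host-model the element ends up as a named model plus a list of extra feature flags; with host-passthrough it is simply <cpu mode='host-passthrough'>.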

After a bit of random Googling, I eventually changed libvirt.cpu_mode in nova.conf (the cpu_mode option under the [libvirt] section) from the Packstack default of “host-model” to “host-passthrough”:

# Possible values:
# host-model - <No description provided>
# host-passthrough - <No description provided>
# custom - <No description provided>
# none - <No description provided>
#cpu_mode=<None>
cpu_mode=host-passthrough
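
One thing to note: nova-compute only reads nova.conf at startup, so the service needs a restart before the new cpu_mode is used, and an existing instance only picks it up once its libvirt XML is regenerated (a hard reboot does that). On an RDO/Packstack all-in-one that looks roughly like:

systemctl restart openstack-nova-compute

# Regenerate the guest XML for an existing instance
# ("myinstance" is a placeholder name)
openstack server reboot --hard myinstance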

Problem solved.

Taking a read of https://wiki.openstack.org/wiki/LibvirtXMLCPUModel (which provides an explanation of the difference between host-model and host-passthrough), we can see the following warning under host-model:

“Beware, due to the way libvirt detects host CPU, CPU configuration created using host-model may not work as expected. The guest CPU may confuse guest OS (i.e. even cause a kernel panic) by using a combination of CPU features and other parameters (such as CPUID level) that don’t work.”

It seems this is what was causing the issue. With host-model, libvirt picks the closest matching model from its cpu_map.xml, layers on whatever additional feature flags it detects on the host, and presents the result to the VM. In other words, there’s some ‘educated guessing’ on libvirt’s part.
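
You can look at both halves of that guess on the compute node: what libvirt thinks the host CPU is, and the list of named models it matches against (the cpu_map.xml path below is where libvirt on CentOS 7 keeps it; newer libvirt versions split it into a cpu_map directory):

# libvirt's view of the host CPU (model, vendor, feature flags)
virsh capabilities | grep -A 20 '<cpu>'

# Named CPU models libvirt knows about for x86_64
virsh cpu-models x86_64

# The map that host-model matches against on this libvirt version
less /usr/share/libvirt/cpu_map.xml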

Using host-passthrough gives the best performance by simply presenting the underlying CPU to the guest as-is. The caveat is that it makes it harder to migrate VMs to another compute node unless that node has exactly the same CPU configuration.

I’m not fussed, since this is an all-in-one Packstack deployment. But if I were cobbling together compute nodes from a variety of different architectures (e.g. Intel and AMD, plus various generations of each), that would likely cause some substantial headaches when trying to migrate VMs (i.e. it wouldn’t work).

In a situation like that, the best solution is probably to set cpu_mode to “custom” and specify a more general CPU model via libvirt.cpu_model in nova.conf – something that all the hardware can satisfy. You give up some VM performance, but you keep the ability to migrate. Something like kvm64 would be a reasonable starting point.
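
A minimal sketch of what that might look like, again under the [libvirt] section of nova.conf (option names as per the Nova configuration reference – worth double-checking against your own release):

[libvirt]
# Use a named model from libvirt's cpu_map.xml rather than mirroring the host
cpu_mode=custom
# A deliberately conservative model that both Intel and AMD hosts should satisfy
cpu_model=kvm64

Every compute node would need the same setting, and running instances only pick it up after a hard reboot.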

Full disclosure, I’m running the EPEL mainline kernel 4.16.3, rather than the stock version that ships with RHEL 7.5, so that may have contributed to the underlying issue as well.
