Test selective enabling/disabling of CPU flags in Nova

Setup

My host processor is "Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz": https://ark.intel.com/content/www/us/en/ark/products/81897/intel-xeon-processor-e5-2609-v3-15m-cache-1-90-ghz.html

The hardware itself does not support Intel TSX. (Refer to the annex at the bottom for the `lscpu` output.)

My Compute "host" is an all-in-one DevStack (Fedora 32), running in a level-1 VM with 'host-passthrough', so a Nova instance will be a nested (i.e. level-2) guest.

And Nova is running with this patch: https://review.opendev.org/c/openstack/nova/+/774240. The git describe output:

[nova] $> git describe
22.0.0-483-gd897ce2a54

Test procedure

  1. Assuming you've got a default DevStack configured with:

    ENABLED_SERVICES=key,n-api,n-cpu,n-cond,n-sch,n-novnc,n-api-meta,placement-api,placement-client,g-api,g-reg,q-svc,q-dhcp,q-meta,q-agt,q-l3,horizon,rabbit,mysql
    

    (and an SSH keypair generated, and a CirrOS image imported into Glance)

  2. Edit nova.conf as shown in one of the several test variants
  3. Restart Nova service: sudo systemctl restart "devstack@n-cpu"
  4. Launch a "nano" instance: openstack server create test_vm1 --flavor m1.nano --key-name mykey1 --image cirros-0.5.1-x86_64-disk --net private
  5. Observe the CPU flags the guest gets
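
Step 5 can be done by reading the `<feature>` elements under `<cpu>` in the live guest XML (obtained, for example, with `virsh dumpxml`). The following is only a minimal sketch, assuming the XML has already been captured into a string; the helper name `feature_policies` and the trimmed sample XML are illustrative and not part of Nova or libvirt.

```python
import xml.etree.ElementTree as ET


def feature_policies(domain_xml):
    """Map each CPU feature name in a libvirt domain XML to its policy."""
    root = ET.fromstring(domain_xml)
    # Accept either a full <domain> document or a bare <cpu> element.
    cpu = root if root.tag == "cpu" else root.find("cpu")
    if cpu is None:
        return {}
    return {f.get("name"): f.get("policy") for f in cpu.findall("feature")}


# Trimmed-down sample in the shape of the guest XML shown in this document.
sample = """
<domain type='kvm'>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Nehalem-IBRS</model>
    <feature policy='require' name='pcid'/>
    <feature policy='disable' name='ssbd'/>
  </cpu>
</domain>
"""

for name, policy in feature_policies(sample).items():
    print(f"{name}: {policy}")
```

A policy of 'require' means the flag is enabled for the guest, and 'disable' means it is masked out.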

Test-1: Enable PCID; disable SSBD

nova.conf was configured with:

[libvirt]
cpu_models = Nehalem-IBRS
cpu_model_extra_flags = +pcid,-ssbd
cpu_mode = custom
virt_type = kvm

NOTE: The Nehalem-IBRS model also automatically includes two [...]

Resulting (live) guest XML:

[...]
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Nehalem-IBRS</model>
  <topology sockets='1' dies='1' cores='1' threads='1'/>
  <feature policy='require' name='pcid'/>
  <feature policy='disable' name='ssbd'/>
  [...]
</cpu>
[...]

Result: The PCID flag is enabled ('require' in libvirt parlance) and the SSBD flag is disabled, as configured.

Test-2: Enable three flags ('md-clear', 'pcid', and 'ssbd') but disable two ('pdpe1gb' and 'mtrr')

nova.conf was configured with:

$ grep "\[libvirt\]" -A5 /etc/nova/nova-cpu.conf
[libvirt]
cpu_models = Nehalem-IBRS
cpu_model_extra_flags = +md-clear,+pcid,ssbd,-pdpe1gb,-mtrr
cpu_mode = custom
virt_type = kvm

Resulting (live) guest XML:

[...]
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Nehalem-IBRS</model>
  <topology sockets='1' dies='1' cores='1' threads='1'/>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='pcid'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='disable' name='pdpe1gb'/>
  <feature policy='disable' name='mtrr'/>
  [...]
</cpu>

Observe:

  • The guest correctly gets (as indicated by the 'require' policy in the XML) the three CPU flags: 'md-clear', 'pcid', and 'ssbd'. The last one was specified with neither a "+" nor a "-" prefix, so it gets enabled by default.
  • The 'pdpe1gb' and 'mtrr' CPU flags are marked as 'disable', as requested.
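
The prefix behaviour observed above ("+" or no prefix enables a flag, "-" disables it) can be sketched as follows. This is an illustration of the rule as seen in this test, not Nova's actual parsing code; `split_extra_flags` is a hypothetical helper.

```python
def split_extra_flags(value):
    """Split a cpu_model_extra_flags value into (enabled, disabled) lists."""
    enabled, disabled = [], []
    for item in value.split(","):
        flag = item.strip().lower()
        if not flag:
            continue  # ignore empty entries such as a trailing comma
        if flag.startswith("-"):
            disabled.append(flag[1:])
        else:
            # "+flag" and a bare "flag" are treated the same way: enabled.
            enabled.append(flag.lstrip("+"))
    return enabled, disabled


enabled, disabled = split_extra_flags("+md-clear,+pcid,ssbd,-pdpe1gb,-mtrr")
print("require:", enabled)
print("disable:", disabled)
```

With the Test-2 value, the enabled list corresponds to the 'require' features in the guest XML and the disabled list to the 'disable' features.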

And the QEMU command-line (notice the -cpu bit):

[...] libvirt version: 6.1.0, package: 4.fc32 (Fedora Project, 2020-06-02-17:50:10, ), qemu version: 4.2.1qemu-4.2.1-1.fc32, kernel: 5.10.13-100.fc32.x86_64, hostname: dstack-f32
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
HOME=/var/lib/libvirt/qemu/domain-5-instance-00000005 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-5-instance-00000005/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-5-instance-00000005/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-5-instance-00000005/.config \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-x86_64 \
-name guest=instance-00000005,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-5-instance-00000005/master-key.aes \
-machine pc-i440fx-4.2,accel=kvm,usb=off,dump-guest-core=off \
-cpu Nehalem-IBRS,md-clear=on,pcid=on,ssbd=on,pdpe1gb=off,mtrr=off \
-m 64 \
-overcommit mem-lock=off \
-smp 1,sockets=1,dies=1,cores=1,threads=1 \
-uuid c69e8c13-6b84-4347-8661-89ee03527af5 \
-smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=22.1.0,serial=c69e8c13-6b84-4347-8661-89ee03527af5,uuid=c69e8c13-6b84-4347-8661-89ee03527af5,family=Virtual Machine' \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=46,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-boot strict=on \
-blockdev '{"driver":"file","filename":"/home/stack/src/cloud/data/nova/instances/_base/8e147458643d240e4b578acf7e84b6785aa4225c","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
-blockdev '{"driver":"file","filename":"/home/stack/src/cloud/data/nova/instances/c69e8c13-6b84-4347-8661-89ee03527af5/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=50,id=hostnet0,vhost=on,vhostfd=51 \
-device virtio-net-pci,host_mtu=1450,netdev=hostnet0,id=net0,mac=fa:16:3e:91:bf:c4,bus=pci.0,addr=0x3 \
-add-fd set=3,fd=53 \
-chardev pty,id=charserial0,logfile=/dev/fdset/3,logappend=on \
-device isa-serial,chardev=charserial0,id=serial0 \
-vnc 0.0.0.0:4 \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/6 (label charserial0)
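
To check the `-cpu` argument from a log like the one above without reading it by eye, a small parser is enough. This is a sketch that handles only the `name=on` / `name=off` form that appears in this log; `parse_cpu_arg` is a hypothetical helper, not a QEMU or libvirt API.

```python
def parse_cpu_arg(value):
    """Split a QEMU -cpu value into (model, {feature: enabled})."""
    model, *props = value.split(",")
    features = {}
    for prop in props:
        name, _, state = prop.partition("=")
        features[name] = (state == "on")
    return model, features


model, features = parse_cpu_arg(
    "Nehalem-IBRS,md-clear=on,pcid=on,ssbd=on,pdpe1gb=off,mtrr=off"
)
print(model)
for name, is_on in features.items():
    print(f"  {name}: {'on' if is_on else 'off'}")
```

The 'on' entries line up with the 'require' features in the guest XML, and the 'off' entries with the 'disable' features.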

Annex: `lscpu` from host (level-0) and compute node (level-1)

Baremetal host (L0)

[root@taroxhost ~]# lscpu 
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          12
On-line CPU(s) list:             0-11
Thread(s) per core:              1
Core(s) per socket:              6
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           63
Model name:                      Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz
Stepping:                        2
CPU MHz:                         1200.001
CPU max MHz:                     1900.0000
CPU min MHz:                     1200.0000
BogoMIPS:                        3800.10
Virtualization:                  VT-x
L1d cache:                       384 KiB
L1i cache:                       384 KiB
L2 cache:                        3 MiB
L3 cache:                        30 MiB
NUMA node0 CPU(s):               0-5
NUMA node1 CPU(s):               6-11
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT disabled
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts md_clear flush_l1d

Compute node (L1)

NB: This VM is using "host-passthrough".

[stack@dstack-f32 devstack]$ lscpu 
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   40 bits physical, 48 bits virtual
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              1
Core(s) per socket:              1
Socket(s):                       2
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           63
Model name:                      Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz
Stepping:                        2
CPU MHz:                         1899.999
BogoMIPS:                        3799.99
Virtualization:                  VT-x
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       32 KiB
L1i cache:                       32 KiB
L2 cache:                        256 KiB
L3 cache:                        15 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX flush not necessary, SMT disabled
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat umip md_clear arch_capabilities
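
The flag names in the `lscpu` output use the kernel's spelling (e.g. 'md_clear'), while the libvirt XML and nova.conf use hyphens ('md-clear'). The sketch below checks whether the flags used in the tests appear in a "Flags:" line. It assumes a naive underscore-to-hyphen normalization, which is enough for the flags in this document but not a general kernel-to-libvirt name mapping; `host_has_flags` is a hypothetical helper, and the input is a shortened excerpt of the compute node (L1) flags.

```python
def host_has_flags(lscpu_flags, wanted):
    """Report which of the wanted (libvirt-style) flags appear in lscpu flags."""
    # Naive normalization: kernel names use "_", libvirt names use "-".
    present = {name.replace("_", "-") for name in lscpu_flags.split()}
    return {flag: flag in present for flag in wanted}


# Shortened excerpt of the L1 "Flags:" line, not the full list.
flags_excerpt = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
    "pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm "
    "pcid ssbd md_clear"
)

result = host_has_flags(
    flags_excerpt, ["md-clear", "pcid", "ssbd", "pdpe1gb", "mtrr", "rtm"]
)
for flag, found in result.items():
    print(f"{flag}: {'present' if found else 'missing'}")
```

Here 'rtm' (one of the Intel TSX flags) is included only to show what a missing flag looks like, consistent with the hardware not supporting TSX.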