KVM scalability
---------------
- Big VMs: users want to move workloads that are "big" on bare metal into "the cloud"
  - (note to self: try an 80vCPU guest)
- Topics: NUMA, locking, virtual I/O

- NUMA:
  - NUMA topology within a guest is important for:
    - promoting CPU-memory locality
    - data partitioning
  - The number of NUMA nodes inside a guest directly affects the guest's performance.
  - Try: kernel compile on an 80vCPU guest
  - AutoNUMA & SchedNUMA are two solutions currently being developed:
    - AutoNUMA -- good at grouping vCPU threads & memory on 5-, 10- & 20-vCPU VMs
    - SchedNUMA -- typically the majority of VM memory ends up in a single host NUMA node, but the vCPUs are spread out
  - AutoNUMA & SchedNUMA should do the automatic vCPU & memory placement.
  - NUMA & CPU-memory locality

- Locking:
  - When vCPUs do not execute simultaneously, spin time can increase significantly.
  - The more vCPUs we have, the more likely there'll be lock contention.

=================================================================================================

Integrated testing in QEMU
--------------------------
- qtest
  - All vCPU <-> hardware communication happens over a domain socket
    (PIO, MMIO, interrupts supported today; extensible to hypercalls;
    commands to control vm_clock progress)
  - Simple line-based protocol
  - Test case runs as a separate process, written in C and compiled natively
  - Full access to libc
  - qtest uses gtester (a test framework for glib)

- libqos
  - Mini-OS library for writing qtest cases
  - More complicated devices require interaction with the PCI bus & APIC

- qemu-test

=================================================================================================

Spice status update
-------------------
- Open remote computing.
- Showed a demo of:
  - using a webcam inside the guest
  - multiple monitors inside the guest

=================================================================================================

New features in libguestfs & virt-tools
---------------------------------------
- virtio-scsi (allows adding up to 255 devices)
- Uses libvirt to invoke libguestfs
- sVirt/SELinux
- Further work related to:
  - snapshot=on (finer control)
  - faster 'qemu-img create'
  - safer qemu-img
  - TRIM/SCSI UNMAP from guest to host
  - nested VMX
  - more detailed 9pfs configuration

=================================================================================================

Application sandboxes on top of KVM or LXC using libvirt
--------------------------------------------------------

=================================================================================================

Day 2 (Thursday)
----------------

KVM & memory management updates
-------------------------------
- EPT Accessed & Dirty bits
- 1GB hugepages
  - interface to hugetlbfs
- Ballooning vs. Transparent Huge Pages
- Memory overcommit
- Automatic ballooning
- Ballooning & QoS

=================================================================================================

A block layer overview
----------------------
- Virtual devices
  - ide, virtio-blk
- Backend
  - Block drivers
    - formats: raw, qcow2
    - protocols: file, nbd, iscsi, gluster
  - I/O throttling, copy-on-read
- Block jobs
  - streaming, mirroring, commit

- Virtual devices
  - IDE
  - AHCI
  - SCSI
  - Floppy, CD-ROM, USB storage
- Paravirt devices
  - virtio-blk
  - virtio-scsi (uses the SCSI command set)
- SCSI passthrough (scsi-generic / scsi-block)
  - needs a real block device, not just an image file

      qemu -device ide-drive,help

Question:
  - qemu-nbd? when we have guestmount

Image format performance:
  - Pain point of image snapshots is initial writes (cluster allocation); see the sketch below.
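Not from the talk -- a rough sketch of how that initial-allocation cost can be poked at with
qemu-img's qcow2 creation options (cluster_size, preallocation, backing_file/backing_fmt are
real qcow2 options; the file names and sizes below are made-up placeholders):

      # Larger clusters + preallocated metadata can reduce the cost of
      # first-time cluster allocation when the guest writes.
      qemu-img create -f qcow2 -o cluster_size=2M,preallocation=metadata base.qcow2 100G

      # A snapshot/overlay on top of a backing file; the first write to each
      # cluster still pays an allocation + copy-on-write cost in the new layer.
      qemu-img create -f qcow2 -o backing_file=base.qcow2,backing_fmt=qcow2 overlay.qcow2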
Backing storage
  - file: NFS, local storage
  - block device: whole disk or partition, logical volume, external implementation of iSCSI, NBD, ...
  - nbd, glusterfs
    - a direct KVM/QEMU implementation is faster than FUSE on the host

Cache options
  - writeback, none, writethrough, directsync
  - default mode is 'writeback' (since 1.2)
  - write cache enabled (WCE) is safe for a correct guest OS
  - cache=unsafe (e.g. for guest installation)

Host page cache:
  - usually doesn't make sense: the guest already has a page cache, so data would be duplicated -- a waste of memory
  - makes sense if many guests share a single host cache
  - short-lived guests
  - MUST bypass the host page cache for live migration

AIO mode
  - -drive aio=threads (performs better on file systems)
  - aio=native (performs better on block devices)
  - blockdev will enable driver-specific command-line options

Block jobs:
  - Long-running background jobs on block devices
  - Live storage migration
  - Deleting/merging external snapshots
  - Started & controlled using monitor commands
    - block-stream, block-job-complete, block-job-cancel, etc.

Image streaming:
  - Pulls data from backing files into the active layer.
  - Use cases: copy an image from a slow source in the background while the VM is running; delete the top-most external snapshots.

Live commit:
  - Apply a delta to a backing file.

Image mirroring:
  - Live storage migration
  - Full chain or only the active layer

Built-in NBD server
  - Allows storage migration w/o shared storage
  - Destination QEMU starts an NBD server
  - Source QEMU mirrors its image over an NBD connection
  - (a rough command sketch is at the end of these notes)

=================================================================================================

QEMU Live Block Operations: Snapshots, Merging & Mirroring
----------------------------------------------------------

ARM virt
--------
- KVM virtualizes CPU & memory
- QEMU virtualizes the platform & devices
- VGIC & timer
- I/O virtualization -- all I/O is MMIO

OpenStack:
----------
Details:
  - Boot: kernel+initrd, HD, or CD-ROM
  - CPU config: host-model, custom, default
  - Disks: root, ephemeral, swap, config drive
    - qcow2, virtio
  - Volumes: iSCSI, RBD, Sheepdog, NetApp
  - Multiple NICs: virtio, private network, filtering

GNOME Boxes:
------------
- libosinfo; SPICE
- A nice illustration of git commits.
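Appendix to the "Built-in NBD server" notes above -- a rough, untested sketch of live storage
migration over NBD. The QMP commands (nbd-server-start, nbd-server-add, drive-mirror) are the
ones that work introduces; the socket paths, host name, port, drive IDs, image sizes, and the
use of socat to drive the QMP sockets are all my own placeholders:

      # Destination host: pre-create an image of the same size and start QEMU
      # paused (-S), with a QMP socket reachable from the shell.
      qemu-img create -f qcow2 dst.qcow2 20G
      qemu-system-x86_64 -m 1024 -drive file=dst.qcow2,if=virtio,cache=none \
          -qmp unix:/tmp/qmp-dst.sock,server,nowait -S &

      # Destination: start the built-in NBD server and export the drive.
      echo '{"execute":"qmp_capabilities"}
      {"execute":"nbd-server-start","arguments":{"addr":{"type":"inet","data":{"host":"0.0.0.0","port":"10809"}}}}
      {"execute":"nbd-server-add","arguments":{"device":"virtio0","writable":true}}' \
          | socat - UNIX-CONNECT:/tmp/qmp-dst.sock

      # Source (assumes the running QEMU has -qmp unix:/tmp/qmp-src.sock,server,nowait
      # and the same drive ID): mirror the guest's disk to the destination's export.
      echo '{"execute":"qmp_capabilities"}
      {"execute":"drive-mirror","arguments":{"device":"virtio0","target":"nbd:DEST_HOST:10809:exportname=virtio0","format":"raw","sync":"full","mode":"existing"}}' \
          | socat - UNIX-CONNECT:/tmp/qmp-src.sock

      # Once the mirror reaches steady state, the RAM/device-state migration
      # itself is a separate 'migrate' step (not shown).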