# Virtualisation & Containers

This lecture will focus on running multiple operating systems on the same physical computer. Until now, we have always assumed that the operating system (in particular the kernel) has direct control over physical resources. This week, we will see that this does not always need to be the case (in fact, it is increasingly rare in production systems). Instead, we will see that multiple operating systems may share a single computer in a manner similar to how multiple applications (processes) co-exist within an operating system. We will also explore a compromise approach, known as «containers», where only the user-space parts of the operating system are duplicated and isolated from each other, while the kernel remains shared and retains direct control of the underlying machine.

│ Lecture Overview
│
│ 1. Hypervisors
│ 2. Containers
│ 3. Management

The lecture is split into 3 parts: the first part will introduce full-blown virtualisation and the concept of a «hypervisor», while the second part will discuss «containers». Finally, we will look at a few topics which are common to both systems, and in some sense are also relevant when managing networks of physical computers.

## Hypervisors

In the domain of hardware-accelerated virtualisation, a «hypervisor» is the part of the VM software that is roughly equivalent to an operating system kernel.

│ What is a Hypervisor
│
│ • also known as a Virtual Machine Monitor
│ • allows execution of «multiple operating systems»
│ • like a kernel that runs kernels
│ • improves «hardware utilisation»

While the hypervisor itself behaves a bit like a kernel, standing as it does between the hardware and the virtualised operating systems, the systems running on top of it are, in a sense, like processes (including their kernels). In particular, they are isolated in physical memory (using either a regular MMU and a bit of software magic, or an MMU capable of second-level translation) and they time-share on the available processors.

│ Motivation
│
│ • OS-level sharing is tricky
│ ◦ «user isolation» is often «insufficient»
│ ◦ only ‹root› can install software
│ • the hypervisor/OS interface is «simple»
│ ◦ compared to OS-application interfaces

Virtualised operating systems allow a degree of autonomy that is not usually possible when multiple users share a single operating system. This is partially due to the simplicity of the interface between the hypervisor and the operating system: there are no file systems, in fact no communication between the operating systems (other than through standard networking), no user management and so on. Virtual machines simply bundle up some resources and make them available to the operating system.

│ Virtualisation in General
│
│ • many resources are “virtualised”
│ ◦ physical «memory» by the MMU
│ ◦ «peripherals» by the OS
│ • makes «resource management» easier
│ • enables «isolation» of components

Operating systems (or computers, if you prefer) are of course not the only thing that can be (or is) virtualised. If you think about it, a lot of the operating system itself is built around some sort of virtualisation: virtual memory, file systems, the network stack, device drivers – they all, in some sense, virtualise hardware resources. This in turn makes it possible for multiple programs, and multiple users, to share those resources safely and fairly.
│ Hypervisor Types
│
│ • type 1: bare metal
│ ◦ standalone, microkernel-like
│ • type 2: hosted
│ ◦ runs on top of normal OS
│ ◦ usually need «kernel support»

There are two basic types of hypervisors, based on how the overall system is layered. In type 1, the hypervisor is at the bottom of the stack (just above hardware), and is responsible for management of the basic resources (a bit like a simple microkernel): processor and RAM (scheduling and memory management, respectively). On the other hand, type 2 hypervisors run on top of an operating system and reuse its scheduler and memory management: the virtual machines appear as actual processes of the host system.

│ Type 1 (Bare Metal)
│
│ • IBM z/VM
│ • (Citrix) Xen
│ • Microsoft Hyper-V
│ • VMWare ESX
│
│ Type 2 (Hosted)
│
│ • VMWare (Workstation, Player)
│ • Oracle VirtualBox
│ • Linux KVM
│ • FreeBSD bhyve
│ • OpenBSD vmm

│ History
│
│ • started with mainframe computers
│ • IBM CP/CMS: 1968
│ • IBM VM/370: 1972
│ • IBM z/VM: 2000

The first foray into running multiple operating systems on the same hardware was made by IBM in the late 60s, and the capability became a rather standard feature on big iron soon after.

│ Desktop Virtualisation
│
│ • ‹x86› hardware lacks «virtual supervisor mode»
│ • «software-only» solutions viable since late 90s
│ ◦ Bochs: 1994
│ ◦ VMWare Workstation: 1999
│ ◦ QEMU: 2003

Small (personal) computers, for a long time, did not offer any OS virtualisation capabilities. The performance of PC processors became sufficient for PC-on-PC emulation in the mid-90s, but the performance penalty was initially huge, making the approach suitable only for running legacy software (which was designed for much slower hardware).

│ Paravirtualisation
│
│ • introduced as VMI in 2005 by VMWare
│ • alternative approach in Xen in 2006
│ • relies on «modification» of the «guest OS»
│ • near-native speed without HW support

A decade later, VMWare made a breakthrough in software-based virtualisation technology with paravirtualisation: this required modifications to the guest operating system, but by that time, open-source operating systems were gaining a foothold – and porting open-source systems to a paravirtualising hypervisor was not too hard.

│ The Virtual ‹x86› Revolution
│
│ • 2005: virtualisation extensions on ‹x86›
│ • 2008: MMU virtualisation
│ • «unmodified» guest at near-native speed
│ • most «software-only» solutions became «obsolete»

Around the same time, vendors of desktop CPUs started to incorporate virtualisation extensions, which in turn made it unnecessary to modify the guest operating system (at least in principle). By 2008, mainstream desktop processors offered MMU virtualisation, further simplifying x86 hypervisor design (and making it more efficient at the same time). A small user-space check for these CPU extensions is sketched below.

│ Paravirtual Devices
│
│ • special «drivers» for «virtualised devices»
│ ◦ block storage, network, console
│ ◦ random number generator
│ • «faster» and «simpler» than emulation
│ ◦ orthogonal to CPU/MMU virtualisation

However, paravirtualisation made a quick and dramatic comeback: while virtualisation of the CPU and memory was, for the most part, handled by the hardware itself, a hardware-based approach is not economical for the virtualisation of peripherals. Moreover, paravirtualised peripherals do not require changes in the guest operating system: all that is needed is a fairly ordinary device driver which speaks the respective protocol, and through which the virtual peripherals offered by the host appear in the guest as regular devices.
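Whether a given x86 CPU offers the hardware virtualisation extensions mentioned above can be queried from user space with the ‹CPUID› instruction. The following is a minimal sketch (assuming GCC or Clang on an x86 machine, using the compiler-provided ‹cpuid.h› helper); it only detects the CPU capability – the feature may still be disabled in firmware.

```c
#include <stdio.h>
#include <cpuid.h> /* GCC/Clang helper for the CPUID instruction */

int main(void)
{
    unsigned eax, ebx, ecx, edx;

    /* leaf 1, ECX bit 5: Intel VT-x (VMX) */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 5)))
        puts("Intel VT-x available");

    /* extended leaf 0x80000001, ECX bit 2: AMD-V (SVM) */
    if (__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 2)))
        puts("AMD-V available");

    return 0;
}
```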
│ Virtual Computers
│
│ • usually known as Virtual Machines
│ • everything in the computer is virtual
│ ◦ either via hardware (VT-x, EPT)
│ ◦ or software (QEMU, ‹virtio›, ...)
│ • much «easier to manage» than actual hardware

The entire system running under a virtualised operating system is known as a virtual machine (or, sometimes, a virtual computer), not to be confused with program-level VMs like the Java Virtual Machine.

│ Essential Resources
│
│ • the CPU and RAM
│ • persistent (block) storage
│ • network connection
│ • a console device

A typical virtual machine will offer at least a processor, memory, block storage (on which the operating system will store a file system), a network connection and a console for management. While other peripherals are possible, they are not very common, at least not on servers.

│ CPU Sharing
│
│ • same principle as normal «processes»
│ • there is a «scheduler» in the hypervisor
│ ◦ simpler, with different trade-offs
│ • privileged instructions are trapped

Most instructions (specifically those available to user-space programs) are executed directly by the host CPU, without additional overhead and without involvement of the hypervisor. The hypervisor does, however, manage the virtualised MMU, and, just as importantly, when the CPU encounters certain privileged instructions, it traps into the hypervisor, which performs the required actions in software.

│ RAM Sharing
│
│ • very similar to standard «paging»
│ • software (shadow paging)
│ • or hardware (second-level translation)
│ • fixed amount of RAM for each VM

Like CPU virtualisation, memory sharing is built on the same basic principles that standard operating systems use to isolate processes from each other. Memory is sliced into pages and the MMU does the heavy lifting of address translation.

│ Shadow Page Tables
│
│ • the «guest» system «cannot» access the MMU
│ • set up «shadow table», invisible to the guest
│ • guest page tables are sync'd to the sPT by VMM
│ • the gPT can be made read-only to cause traps

The resulting traps allow the hypervisor to keep the guest page tables (gPT) synchronised with the shadow page tables (sPT): the two are translated versions of each other. The ‘physical’ addresses stored in the gPT are, in reality, virtual addresses of the hypervisor, while the sPT stores real physical addresses, since it is the table used by the real MMU.

│ Second-Level Translation
│
│ • hardware-assisted MMU virtualisation
│ • adds guest-physical to host-physical layer
│ • greatly «simplifies» the VMM
│ • also much «faster» than shadow page tables

Shadow page tables cause a lot of overhead: every change of a guest page table traps into the hypervisor, and unfortunately, page tables are rearranged by the guest operating system rather often (on real hardware, this is comparatively cheap). Modern processors, however, offer another level of translation, which is inaccessible to the guest operating system. Since the MMU is aware of virtualisation, the guest can directly modify its page tables, without compromising the isolation of VMs from each other (and from the hypervisor).
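To make the two translation steps concrete, here is a toy model in C (a hypothetical sketch: real hardware uses multi-level page tables, and the names ‹guest_pt›, ‹nested_pt› and ‹translate› are invented for illustration). A guest-virtual address is first translated by the guest's own page table into a guest-physical address, which the second-level table, maintained by the hypervisor, then maps to a host-physical address.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy model of second-level (nested) address translation. Pages are 4 KiB
 * and both "page tables" are flat arrays of page numbers, which is nothing
 * like the real multi-level structures -- only the principle is the same. */

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NPAGES     16

static uint32_t guest_pt[NPAGES];  /* guest-virtual page  -> guest-physical page (managed by the guest) */
static uint32_t nested_pt[NPAGES]; /* guest-physical page -> host-physical page (managed by the hypervisor) */

static uint32_t translate(uint32_t guest_virtual)
{
    uint32_t page   = guest_virtual >> PAGE_SHIFT;
    uint32_t offset = guest_virtual & (PAGE_SIZE - 1);
    uint32_t guest_physical = guest_pt[page];             /* first level: guest page table */
    uint32_t host_physical  = nested_pt[guest_physical];  /* second level: invisible to the guest */
    return (host_physical << PAGE_SHIFT) | offset;
}

int main(void)
{
    guest_pt[3]  = 7;   /* the guest maps its virtual page 3 to "physical" page 7 */
    nested_pt[7] = 12;  /* the hypervisor backs guest-physical page 7 with host page 12 */
    printf("guest-virtual 0x3abc -> host-physical 0x%x\n", translate(0x3abc));
    return 0;
}
```

With shadow paging, by contrast, the hypervisor pre-computes the composition of the two mappings into a single table, which is the one the real MMU walks.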
│ Network Sharing
│
│ • usually a paravirtualised NIC
│ ◦ transports «frames» between guest and host
│ ◦ usually connected to a «SW bridge» in the host
│ ◦ alternatives: routing, NAT
│ • a single physical NIC is used by everyone

In contemporary virtualisation solutions, networking uses a paravirtual NIC (network interface card) which is connected to an Ethernet tunnel pseudo-device in the host system (essentially a virtual network interface card that handles Ethernet frames). The frames sent on the paravirtual device appear on the virtual NIC in the host and vice versa. The pseudo-device is then either software-bridged to the hardware NIC (and hence to the outside ethernet), or alternatively, routing (layer 3) is set up between the pseudo-device and the hardware NIC.

│ Virtual Block Devices
│
│ • usually also paravirtualised
│ • often backed by normal «files»
│ ◦ maybe in a special format
│ ◦ e.g. based on «copy-on-write»
│ • but can be a real «block device»

Like networking, block storage is typically paravirtualised. In this case, the host side of the device is either backed by a regular file in the file system of the host, or by a block device on the host side (often itself virtualised, e.g. through LVM/device-mapper or a similar technology, but sometimes a hardware block device directly).

│ Special Resources
│
│ • mainly useful in «desktop systems»
│ • GPU / graphics hardware
│ • audio equipment
│ • printers, scanners, ...

Now that we have covered the essentials, let's briefly look at other classes of hardware. With the possible exception of compute GPUs, though, these peripherals are mainly useful on desktop systems, which are a tiny market compared to server virtualisation.

│ PCI Passthrough
│
│ • an anti-virtualisation technology
│ • based on an IO-MMU (VT-d, AMD-Vi)
│ • a «virtual» OS can touch «real» hardware
│ ◦ only one OS at a time, of course

Let's first mention a very generic, but decidedly anti-virtualisation method of giving hardware access to a virtual machine: exposing a PCI device to the guest operating system directly, via IO-MMU-mapped memory. An IO-MMU must be involved, because otherwise the guest OS could direct the hardware to overwrite physical memory that belongs to the host, or to another VM running on the same system. With that covered, though, there is nothing that stops the host system from handing over control of specific PCI endpoints to a guest (of course, the host system must not attempt to communicate with those devices through its own drivers, else chaos would ensue).

│ GPUs and Virtualisation
│
│ • can be «assigned» (via VT-d) to a «single OS»
│ • or «time-shared» using native drivers (GVT-g)
│ • paravirtualised
│ • shared by other means (X11, SPICE, RDP)

Since a GPU is attached through PCI, it can of course be assigned to a single guest using the IO-MMU (VT-d) approach described above. However, modern GPUs all support time-sharing (i.e. they allow contexts to be suspended and resumed, just like threads and processes on a CPU). For this to work, the hypervisor (or the host OS) must provide drivers for the GPU in question, so that it can mediate access for individual VMs. Another solution is paravirtualisation: the guest uses a vendor-neutral protocol to send a command stream to the driver running in the hypervisor, which in turn does the multiplexing. The guest system still needs the user-space part of the GPU driver to generate the command stream and to compile shaders. Finally, existing network graphics protocols can, of course, be used between a guest and the host, though they are never quite as efficient as one of the specialised options.

│ Peripherals
│
│ • useful either via «passthrough»
│ ◦ audio, webcams, ...
│ • or «standard sharing» technology
│ ◦ network printers & scanners
│ ◦ networked audio servers

Finally, there is a wide array of peripherals that can be attached to a PC.
Some of them, like printers and scanners, and in some cases (or rather, in some operating systems) audio hardware, can be shared over standard networks, and hence also between guests and the host over a virtual network. For this type of peripheral, there is either no loss in performance (printers, scanners) or possibly a small increase in latency (this mainly affects audio devices).

│ Peripheral Passthrough
│
│ • «virtual» PCI, USB or SATA bus
│ • «forwarding» to a real device
│ ◦ e.g. a single USB stick
│ ◦ or a single SATA drive

Of course, network-based sharing is not always practical. Fortunately, most peripherals attach to the host system through a handful of standard buses, which are not hard to either pass through, or paravirtualise. The devices then appear as endpoints on the virtual bus of the requisite type exposed to the guest operating system.

│ Suspend & Resume
│
│ • the VM can be quite easily «stopped»
│ • the RAM of a stopped VM can be «copied»
│ ◦ e.g. to a «file» in the host filesystem
│ ◦ along with «registers» and other state
│ • and also later «loaded» and «resumed»

An important feature available in most virtualisation solutions is the ability to suspend the execution of a VM and store its state in a file (i.e. create an image of the running virtualised OS). Of course, this is only useful if the image can later be loaded and resumed ‘as if nothing happened’. On the outside, this looks rather like what happens when a laptop's lid is closed: the computer stops (in this case to save energy) and when it is opened again, continues where it left off. An important difference here is that in a VM, the guest operating system does not need to cooperate, or even be aware of the suspend/resume operation.

│ Migration Basics
│
│ • the stored state can be «sent over network»
│ • and resumed on a «different host»
│ • as long as the virtual environment is the same
│ • this is known as «paused» migration

If an image can be stored in a file, it can just as well be sent over a network. Resuming an image on a different host is called a ‘paused’ migration, since the VM is paused for the duration of the network transfer: depending on the size of the image, this can be long enough to time out TCP connections or application-level protocols, and even if that does not happen, there will be a noticeable lag for any interactive use of such a system. The operation is also predicated on the requirement that the supporting environment «on the outside» of the VM is sufficiently compatible between the hosts: in particular, the backing storage for virtualised block devices and the virtual networking infrastructure need to match.

│ Live Migration
│
│ • uses «asynchronous» memory snapshots
│ • host copies pages and marks them read-only
│ • the snapshot is sent as it is constructed
│ • changed pages are sent at the end

Live migration is an improvement over paused migration in that it does not cause noticeable lag and does not endanger TCP or other stateful connections that use timeouts to detect broken links. The main idea that enables live migration is that the VM can continue to run as normal while its memory is being copied, with the proviso that any subsequent writes must be tracked by the hypervisor. This is achieved through the standard ‘copy-on-write’ trick: pages are marked read-only right before they are copied, and the hypervisor traps the resulting faults, allowing each write to proceed but marking the affected page as dirty. When the initial sweep is finished, another pass is made, this time only over the dirty pages, re-sending their current content and marking them clean again.
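The pre-copy loop described above can be illustrated with a small self-contained simulation (purely a sketch: ‘pages’ are single bytes, the dirty bitmap stands in for write-protection faults, and ‘sending’ a page is just a memory copy; no real hypervisor or network is involved).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NPAGES 8

static char src[NPAGES];   /* memory of the VM running on the source host */
static char dst[NPAGES];   /* copy being assembled on the destination host */
static bool dirty[NPAGES]; /* which pages changed since they were last sent */

/* The guest keeps running during migration; a write-protection fault would
 * normally record the write -- here we simply set the dirty bit by hand. */
static void guest_write(int page, char value)
{
    src[page] = value;
    dirty[page] = true;
}

/* One pre-copy pass: "send" every dirty page and write-protect it again. */
static int send_dirty_pages(void)
{
    int sent = 0;
    for (int i = 0; i < NPAGES; ++i)
        if (dirty[i]) {
            dst[i] = src[i];
            dirty[i] = false;
            ++sent;
        }
    return sent;
}

int main(void)
{
    memset(dirty, true, sizeof dirty); /* round 0: every page still needs sending */
    send_dirty_pages();                /* the initial full sweep */

    guest_write(2, 'x');               /* the guest modified some pages while */
    guest_write(5, 'y');               /* the initial sweep was in progress   */

    while (send_dirty_pages() > 1)     /* iterate until the dirty set is tiny */
        ;

    /* handoff: a real hypervisor would now pause the VM, send the last few
     * dirty pages together with the CPU state, and resume at the remote end */
    printf("dirty pages remaining at handoff: %d\n", send_dirty_pages());
    return 0;
}
```

If the guest dirties pages faster than they can be sent, such a loop might never converge; real implementations therefore cap the number of rounds and fall back to pausing the VM earlier.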
│ Live Migration Handoff
│
│ • the VM is then paused
│ • registers and last few pages are sent
│ • the VM is «resumed» at the remote end
│ • usually within a «few milliseconds»

When the number of dirty pages at the end of an iteration is sufficiently small, the VM is paused, the remaining dirty pages and the CPU context are copied over, and the VM is immediately resumed. Since this last transfer is only a few hundred kilobytes, the switchover latency is almost negligible.

│ Memory Ballooning
│
│ • how to «deallocate» “physical” memory?
│ ◦ i.e. return it to the hypervisor
│ • this is often desirable in virtualisation
│ • needs a special host/guest interface

One final consideration is that the hypervisor allocates memory to the guest VMs on demand, but operating systems normally have no concept of ‘deallocating’ physical memory that they are not actively using. Under these circumstances, if a VM sees a spike in memory use, the memory will remain locked by that VM indefinitely, even though it no longer has any use for it. A commonly employed solution is a so-called ‘memory ballooning driver’, which runs on the guest side and returns unused ‘physical’ (from the point of view of the guest) memory to the host operating system. The memory is unmapped on the host side (i.e. its content is lost to the guest) and mapped again later if the demand arises.

## Containers

While hardware-accelerated virtualisation is rather efficient when it comes to CPU overhead, there are other associated costs. Some of them can be mitigated by clever tricks (like memory ballooning, TRIM, copy-on-write disk images, etc.) but others are harder to eliminate. When maximal resource utilisation is a requirement, containers can often outperform full virtualisation, without significantly compromising other aspects, like maintainability, isolation, or security.

│ What are Containers?
│
│ • OS-level virtualisation
│ ◦ e.g. virtualised «network stack»
│ ◦ or restricted «file system» access
│ • «not» a complete virtual computer
│ • turbocharged processes

Containers use virtualisation (in the broad sense of the word) already built into the operating system, mainly based on processes. This is augmented with additional separation, where groups of processes can share, for instance, a network stack which is separate from the network stack available to a different set of processes. While both stacks use the same hardware, they have separate IP addresses, separate routing tables, and so on. Likewise, access to the file system is partitioned (e.g. with ‹chroot›), the user mapping is separated, as are process tables.

│ Why Containers
│
│ • virtual machines take a while to boot
│ • each VM needs its «own kernel»
│ ◦ this adds up if you need many VMs
│ • easier to «share memory» efficiently
│ • easier to cut down the OS image

There are two main selling points of containers:

 1. so-called ‘provisioning speed’ – the time it takes from ‘I want a fresh system’ to having one booted,
 2. more efficient resource use.

Both are in large part enabled by sharing a kernel between the containers: in the first case, there is no need to initialise (boot) a new kernel, which saves a non-negligible amount of time. For the second point, this is even more important: within a single kernel, containers can share files (e.g.
through common mounts) and processes across containers can still share memory – especially executable images and shared libraries that are backed by common files. Achieving the same effect with virtual machines is quite impossible.

│ Kernel Sharing
│
│ • multiple containers share a «single kernel»
│ • but not user tables, process tables, ...
│ • the kernel must explicitly support this
│ • another level of «isolation» (process, user, container)

Of course, since a single kernel serves multiple containers, the kernel in question must support an additional isolation level (on top of processes and users), where separate containers also have separate process tables and so on.

│ Boot Time
│
│ • a light virtual machine takes a second or two
│ • a container can take under 50ms
│ • but VMs can be suspended and resumed
│ • but dormant VMs take up a lot more space

Even setting aside issues like preparation of disk images, on boot time alone a container can be 20 times faster than a conventional virtual machine (not counting exokernels and similar tiny operating systems).

│ ‹chroot›
│
│ • the mother of all container systems
│ • not very sophisticated or secure
│ • but allows multiple OS images under 1 kernel
│ • everything else is shared

The ‹chroot› system call can be (ab)used to run multiple OS images (the user-space parts thereof, to be more specific) under a single kernel. However, since everything besides the file system is fully shared, we cannot really speak about containers yet. (A minimal sketch of entering a ‹chroot› follows below.)

│ ‹chroot›-based ‘Containers’
│
│ • process tables, network, etc. are shared
│ • the superuser must also be shared
│ • containers have their «own view» of the filesystem
│ ◦ including «system libraries» and «utilities»

Since the process tables, networking and other important services are shared across the images, there is a lot of interference. For instance, it is impossible to run two independent web servers from two different ‹chroot› pseudo-containers, since only one can bind to the (shared) port 80 (or 443 if you are feeling modern). Another implication is that the role of the super-user in the container is not contained: ‹root› on the inside can easily become ‹root› on the outside.

│ BSD Jails
│
│ • an evolution of the ‹chroot› container
│ • adds «user» and «process table» separation
│ • and a virtualised network stack
│ ◦ each jail can get its own IP address
│ • ‹root› in the jail has limited power

The jail mechanism on FreeBSD is an evolution of ‹chroot› that adds what is missing: separation of users, process tables and network stacks. The jail also limits what the ‘inside’ ‹root› can do (and prevents them from gaining privileges outside the jail). It is one of the oldest open-source containerisation solutions.

│ Linux VServer
│
│ • like BSD jails but on Linux
│ ◦ FreeBSD jail 2000, VServer 2001
│ • not part of the mainline kernel
│ • jailed ‹root› user is partially isolated

Similar work was done on the Linux kernel a year later, but it was not accepted into the official version of the kernel and was long distributed as a set of third-party patches.
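To make the ‹chroot› approach above concrete, here is a minimal sketch of entering a chroot-based pseudo-container (the directory ‹/srv/guest› is an assumed example path, populated with the guest's user-space tree including its libraries and a shell; the program must be run as ‹root›).

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* /srv/guest is an assumed path holding the guest's user-space tree */
    if (chroot("/srv/guest") != 0 || chdir("/") != 0) {
        perror("chroot");
        return 1;
    }

    /* from here on, "/" refers to /srv/guest; everything else (processes,
     * users, network) is still fully shared with the host system */
    execl("/bin/sh", "sh", (char *)NULL);
    perror("execl"); /* only reached if the guest tree has no /bin/sh */
    return 1;
}
```

Note the ‹chdir("/")›: without it, the process would keep a working directory outside the new root, which is one of the well-known ways to escape a plain ‹chroot›.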
│ Namespaces
│
│ • «visibility» compartments in the Linux kernel
│ • virtualises common OS resources
│ ◦ the filesystem hierarchy (including mounts)
│ ◦ process tables
│ ◦ networking (IP address)

The solution that was eventually added to official Linux kernels is based around «namespaces», which handle each aspect of containerisation separately: when a new process is created (with a ‹fork›-like system call, called ‹clone›), the parent can specify which aspects are to be shared with the parent, and which are to be separated. (A small example using the related ‹unshare› system call is sketched at the end of this part.)

│ ‹cgroups›
│
│ • controls «HW resource allocation» in Linux
│ • a CPU group is a fair scheduling unit
│ • a memory group sets limits on memory use
│ • mostly orthogonal to namespaces

The other important component of Linux containers is ‘control groups’, which limit resource usage of a process sub-tree (which can coincide with the process sub-tree that belongs to a single container). This allows containers to be isolated not only with respect to their access to OS-level objects, but also with respect to resource consumption.

│ LXC
│
│ • mainline Linux way to do containers
│ • based on namespaces and ‹cgroups›
│ • relative newcomer (2008, 7 years after VServer)
│ • feature set similar to VServer, OpenVZ &c.

LXC is a suite of user-space tools for management of containers based on Linux namespaces and control groups. Since version 1.0 (circa 2014), LXC also offers separation of the in-container super-user, and also unprivileged containers which can be created and managed by regular users (limitations apply).

│ User-Mode Linux
│
│ • halfway between a container and a virtual machine
│ • an early fully paravirtualised system
│ • a Linux kernel runs as a process on another Linux
│ • integrated in Linux 2.6 in 2003

Ports of kernels ‘to themselves’, so to speak (a regime where the kernel runs as an ordinary user-space process on top of a different configuration of the same kernel), sit somewhere between containers and full virtual machines. They rely quite heavily on paravirtualisation techniques, although in a rather unusual fashion: since the kernel is a standard process, it can directly access the POSIX API of the host operating system, for instance directly sharing the host file system.

│ DragonFlyBSD Virtual Kernels
│
│ • very similar to User-Mode Linux
│ • part of DFlyBSD since 2007
│ • uses standard ‹libc›, unlike UML
│ • paravirtual ethernet, storage and console

Another example of the same approach is known as ‘virtual kernels’ in DragonFlyBSD. In this case, the user-mode port of the kernel even uses the standard ‹libc›, just like any other program. Unfortunately, no direct access to the host file system is possible, making this approach closer to standard VMs.

│ User Mode Kernels
│
│ • easier to retrofit securely
│ ◦ uses existing security mechanisms
│ ◦ for the host, mostly a standard process
│ • the kernel needs to be ported though
│ ◦ analogous to a new hardware platform

When it comes to implementation effort, user-mode kernels are simpler than containers, and offer better host-side security, since they appear as regular processes, without special status.

│ Migration
│
│ • not widely supported, unlike in hypervisors
│ • process state is much harder to serialise
│ ◦ file descriptors, network connections &c.
│ • somewhat mitigated by fast shutdown/boot time

One major drawback of both containers and user-mode kernels is the lack of support for suspend and resume, and hence for migration. In both cases, this comes down to the much more complex state of a process, as opposed to a virtual machine, though the issue is considerably more serious for containers (the user-mode kernel is often just a single process on the host, whereas processes in containers are, in fact, real host-side processes).
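As promised above, here is a minimal sketch of Linux namespaces in action, using the ‹unshare(2)› system call (a close relative of the ‹clone› flags mentioned earlier). It is only a fragment of what a real container runtime does: run as ‹root›, it detaches the process from the host's hostname (UTS) and mount namespaces; PID and network namespaces, a changed root directory and ‹cgroup› limits would be added in the same spirit.

```c
#define _GNU_SOURCE
#include <sched.h>  /* unshare, CLONE_* flags */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* give this process its own hostname and mount table (requires root) */
    if (unshare(CLONE_NEWUTS | CLONE_NEWNS) != 0) {
        perror("unshare");
        return 1;
    }

    /* changes made here are invisible to the rest of the system */
    if (sethostname("container", 9) != 0)
        perror("sethostname");

    /* a shell started now sees the new hostname, but still shares the
     * process table, users and network with the host -- the remaining
     * namespaces would be unshared the same way (CLONE_NEWPID, ...)   */
    execl("/bin/sh", "sh", (char *)NULL);
    perror("execl");
    return 1;
}
```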
## Management

│ Disk Images
│
│ • disk image is the embodiment of the VM
│ • the virtual OS needs to be installed
│ • the image can be a simple file
│ • or a dedicated block device on the host

│ Snapshots
│
│ • making a copy of the image = snapshot
│ • can be done more efficiently: copy on write
│ • alternative to OS installation
│ ◦ make copies of the «freshly installed» image
│ ◦ and run updates after cloning the image

│ Duplication
│
│ • each image will have a copy of the system
│ • copy-on-write snapshots can help
│ ◦ most of the base system will not change
│ ◦ regression as images are updated separately
│ • block-level de-duplication is expensive

│ File Systems
│
│ • disk images contain entire file systems
│ • the virtual disk is of (apparently) fixed size
│ • sparse images: unwritten area is not stored
│ • initially only filesystem metadata is allocated

│ Overcommit
│
│ • the host can allocate more resources than it has
│ • this works as long as not many VMs reach limits
│ • enabled by sparse images and CoW snapshots
│ • also applies to available RAM

│ Thin Provisioning
│
│ • the act of obtaining resources on demand
│ • the host system can be extended as needed
│ ◦ to keep pace with growing guest demands
│ • alternatively, VMs can be migrated out
│ • improves resource utilisation

│ Configuration
│
│ • each OS has its own configuration files
│ • same methods apply as for physical networks
│ ◦ software configuration management
│ • bundled services are deployed to VMs

│ Bundling vs Sharing
│
│ • bundling makes deployment easier
│ • the bundled components have known behaviour
│ • but updates are much trickier
│ • this also prevents resource sharing

│ Security
│
│ • hypervisors have a decent track record
│ ◦ security here means protection of host from guest
│ ◦ breaking out is still possible sometimes
│ • containers are more of a mixed bag
│ ◦ many hooks are needed into the kernel

│ Updates
│
│ • each system needs to be updated separately
│ ◦ this also applies to containers
│ • blocks coming from a common ancestor are shared
│ ◦ but updating images means loss of sharing

│ Container vs VM Updates
│
│ • de-duplication may be easier in containers
│ ◦ shared file system – e.g. link farming
│ • kernel updates: containers and type 2 hypervisors
│ ◦ can be mitigated by live migration
│ • type 1 hypervisors need less downtime

│ Docker
│
│ • automated container image management
│ • mainly a service deployment tool
│ • containers share a single Linux kernel
│ ◦ the kernel itself can run in a VM
│ • rides on a wave of bundling resurgence

│ The Cloud
│
│ • public virtualisation infrastructure
│ • “someone else's computer”
│ • the guests are «not» secure against the host
│ ◦ entire memory is exposed, including secret keys
│ ◦ host compromise is fatal
│ • the host is mostly secure from the guests

│ Review Questions
│
│ 41. What is a hypervisor?
│ 42. What is paravirtualisation?
│ 43. How are VMs suspended and migrated?
│ 44. What is a container?