# Virtualisation & Containers

This lecture will focus on running multiple operating systems on the same physical computer. Until now, we have always assumed that the operating system (in particular the kernel) has direct control over physical resources. This week, we will see that this does not always need to be the case (in fact, it is increasingly rare in production systems). Instead, we will see that multiple operating systems may share a single computer in a manner similar to how multiple applications (processes) co-exist within an operating system. We will also explore a compromise approach, known as «containers», where only the user-space parts of the operating system are duplicated and isolated from each other, while the kernel remains shared and retains direct control of the underlying machine.

│ Lecture Overview
│
│ 1. Hypervisors
│ 2. Containers
│ 3. Management

The lecture is split into 3 parts: the first part will introduce full-blown virtualisation and the concept of a «hypervisor», while the second part will discuss «containers». Finally, we will look at a few topics which are common to both systems, and in some sense are also relevant when managing networks of physical computers.

## Hypervisors

In the domain of hardware-accelerated virtualisation, a «hypervisor» is the part of the VM software that is roughly equivalent to an operating system kernel.

│ What is a Hypervisor
│
│ • also known as a Virtual Machine Monitor
│ • allows execution of «multiple operating systems»
│ • like a kernel that runs kernels
│ • improves «hardware utilisation»

While the hypervisor itself behaves a bit like a kernel, standing as it does between the hardware and the virtualised operating systems, the systems running on top of it are, in a sense, like processes (including their kernels). In particular, they are isolated in physical memory (using either a regular MMU and a bit of software magic, or an MMU capable of second-level translation) and they time-share on the available processors.

│ Motivation
│
│ • OS-level sharing is tricky
│ ◦ «user isolation» is often «insufficient»
│ ◦ only ‹root› can install software
│ • the hypervisor/OS interface is «simple»
│ ◦ compared to OS-application interfaces

Virtualised operating systems allow a degree of autonomy that is not usually possible when multiple users share a single operating system. This is partially due to the simplicity of the interface between the hypervisor and the operating system: there are no file systems, in fact no communication between the operating systems (other than through standard networking), no user management and so on. Virtual machines simply bundle up some resources and make them available to the operating system.

│ Virtualisation in General
│
│ • many resources are “virtualised”
│ ◦ physical «memory» by the MMU
│ ◦ «peripherals» by the OS
│ • makes «resource management» easier
│ • enables «isolation» of components

Operating systems (or computers, if you prefer) are of course not the only thing that can be (or is) virtualised. If you think about it, a lot of the operating system itself is built around some sort of virtualisation: virtual memory, file systems, the network stack, device drivers – they all, in some sense, virtualise hardware resources. This in turn makes it possible for multiple programs, and multiple users, to share those resources safely and fairly.
│ Hypervisor Types
│
│ • type 1: bare metal
│ ◦ standalone, microkernel-like
│ • type 2: hosted
│ ◦ runs on top of normal OS
│ ◦ usually need «kernel support»

There are two basic types of hypervisors, based on how the overall system is layered. In type 1, the hypervisor is at the bottom of the stack (just above hardware), and is responsible for management of the basic resources (a bit like a simple microkernel): processor and RAM (scheduling and memory management, respectively). On the other hand, type 2 hypervisors run on top of an operating system and reuse its scheduler and memory management: the virtual machines appear as actual processes of the host system.

│ Type 1 (Bare Metal)
│
│ • IBM z/VM
│ • (Citrix) Xen
│ • Microsoft Hyper-V
│ • VMWare ESX
│
│ Type 2 (Hosted)
│
│ • VMWare (Workstation, Player)
│ • Oracle VirtualBox
│ • Linux KVM
│ • FreeBSD bhyve
│ • OpenBSD vmm

│ History
│
│ • started with mainframe computers
│ • IBM CP/CMS: 1968
│ • IBM VM/370: 1972
│ • IBM z/VM: 2000

The first foray into running multiple operating systems on the same hardware was made by IBM in the late 60s, and the capability became a rather standard feature on big iron soon after.

│ Desktop Virtualisation
│
│ • ‹x86› hardware lacks «virtual supervisor mode»
│ • «software-only» solutions viable since late 90s
│ ◦ Bochs: 1994
│ ◦ VMWare Workstation: 1999
│ ◦ QEMU: 2003

Small (personal) computers, for a long time, did not offer any OS virtualisation capabilities. The performance of PC processors became sufficient for PC-on-PC emulation in the mid-90s, but the performance penalty was initially huge, making the approach suitable only for running legacy software (which was designed for much slower hardware).

│ Paravirtualisation
│
│ • introduced as VMI in 2005 by VMWare
│ • alternative approach in Xen in 2006
│ • relies on «modification» of the «guest OS»
│ • near-native speed without HW support

A decade later, VMWare made a breakthrough in software-based virtualisation technology with paravirtualisation: this required modifications to the guest operating system, but by that time, open-source operating systems were gaining a foothold – and porting open-source systems to a paravirtualising hypervisor was not too hard.

│ The Virtual ‹x86› Revolution
│
│ • 2005: virtualisation extensions on ‹x86›
│ • 2008: MMU virtualisation
│ • «unmodified» guest at near-native speed
│ • most «software-only» solutions became «obsolete»

Around the same time, vendors of desktop CPUs started to incorporate virtualisation extensions, which in turn made it unnecessary to modify the guest operating system (at least in principle). By 2008, mainstream desktop processors offered MMU virtualisation, further simplifying x86 hypervisor design (and making it more efficient at the same time). A small user-space check for these CPU extensions is sketched below.

│ Paravirtual Devices
│
│ • special «drivers» for «virtualised devices»
│ ◦ block storage, network, console
│ ◦ random number generator
│ • «faster» and «simpler» than emulation
│ ◦ orthogonal to CPU/MMU virtualisation

However, paravirtualisation made a quick and dramatic comeback: while virtualisation of the CPU and memory was, for the most part, handled by the hardware itself, a hardware-based approach is not economical for the virtualisation of peripherals. Moreover, paravirtualised peripherals do not require changes in the guest operating system: all that is needed is a fairly ordinary device driver which speaks the respective protocol, and through which the virtual peripherals offered by the host appear in the guest as regular devices.
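Whether a given x86 CPU offers the hardware virtualisation extensions mentioned above can be queried from user space with the ‹CPUID› instruction. The following is a minimal sketch (assuming GCC or Clang on an x86 machine, using the compiler-provided ‹cpuid.h› helper); it only detects the CPU capability – the feature may still be disabled in firmware.

```c
#include <stdio.h>
#include <cpuid.h> /* GCC/Clang helper for the CPUID instruction */

int main(void)
{
    unsigned eax, ebx, ecx, edx;

    /* leaf 1, ECX bit 5: Intel VT-x (VMX) */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 5)))
        puts("Intel VT-x available");

    /* extended leaf 0x80000001, ECX bit 2: AMD-V (SVM) */
    if (__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 2)))
        puts("AMD-V available");

    return 0;
}
```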
│ Virtual Computers
│
│ • usually known as Virtual Machines
│ • everything in the computer is virtual
│ ◦ either via hardware (VT-x, EPT)
│ ◦ or software (QEMU, ‹virtio›, ...)
│ • much «easier to manage» than actual hardware

The entire system running under a virtualised operating system is known as a virtual machine (or, sometimes, a virtual computer), not to be confused with program-level VMs like the Java Virtual Machine.

│ Essential Resources
│
│ • the CPU and RAM
│ • persistent (block) storage
│ • network connection
│ • a console device

A typical virtual machine will offer at least a processor, memory, block storage (on which the operating system will store a file system), a network connection and a console for management. While other peripherals are possible, they are not very common, at least not on servers.

│ CPU Sharing
│
│ • same principle as normal «processes»
│ • there is a «scheduler» in the hypervisor
│ ◦ simpler, with different trade-offs
│ • privileged instructions are trapped

Most instructions (specifically those available to user-space programs) are executed directly by the host CPU, without additional overhead and without involvement of the hypervisor. The hypervisor does, however, manage the virtualised MMU, and, just as importantly, when the CPU encounters certain privileged instructions, it traps into the hypervisor, which performs the required actions in software.

│ RAM Sharing
│
│ • very similar to standard «paging»
│ • software (shadow paging)
│ • or hardware (second-level translation)
│ • fixed amount of RAM for each VM

Like CPU virtualisation, memory sharing is built on the same basic principles that standard operating systems use to isolate processes from each other. Memory is sliced into pages and the MMU does the heavy lifting of address translation.

│ Shadow Page Tables
│
│ • the «guest» system «cannot» access the MMU
│ • set up «shadow table», invisible to the guest
│ • guest page tables are sync'd to the sPT by VMM
│ • the gPT can be made read-only to cause traps

The resulting traps allow the hypervisor to keep the guest page tables (gPT) synchronised with the shadow page tables (sPT): the two are translated versions of each other. The ‘physical’ addresses stored in the gPT are, in reality, virtual addresses of the hypervisor, while the sPT stores real physical addresses, since it is the table used by the real MMU.

│ Second-Level Translation
│
│ • hardware-assisted MMU virtualisation
│ • adds guest-physical to host-physical layer
│ • greatly «simplifies» the VMM
│ • also much «faster» than shadow page tables

Shadow page tables cause a lot of overhead: every change of a guest page table traps into the hypervisor, and unfortunately, page tables are rearranged by the guest operating system rather often (on real hardware, this is comparatively cheap). Modern processors, however, offer another level of translation, which is inaccessible to the guest operating system. Since the MMU is aware of virtualisation, the guest can directly modify its page tables, without compromising the isolation of VMs from each other (and from the hypervisor).
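To make the two translation steps concrete, here is a toy model in C (a hypothetical sketch: real hardware uses multi-level page tables, and the names ‹guest_pt›, ‹nested_pt› and ‹translate› are invented for illustration). A guest-virtual address is first translated by the guest's own page table into a guest-physical address, which the second-level table, maintained by the hypervisor, then maps to a host-physical address.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy model of second-level (nested) address translation. Pages are 4 KiB
 * and both "page tables" are flat arrays of page numbers, which is nothing
 * like the real multi-level structures -- only the principle is the same. */

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NPAGES     16

static uint32_t guest_pt[NPAGES];  /* guest-virtual page  -> guest-physical page (managed by the guest) */
static uint32_t nested_pt[NPAGES]; /* guest-physical page -> host-physical page (managed by the hypervisor) */

static uint32_t translate(uint32_t guest_virtual)
{
    uint32_t page   = guest_virtual >> PAGE_SHIFT;
    uint32_t offset = guest_virtual & (PAGE_SIZE - 1);
    uint32_t guest_physical = guest_pt[page];             /* first level: guest page table */
    uint32_t host_physical  = nested_pt[guest_physical];  /* second level: invisible to the guest */
    return (host_physical << PAGE_SHIFT) | offset;
}

int main(void)
{
    guest_pt[3]  = 7;   /* the guest maps its virtual page 3 to "physical" page 7 */
    nested_pt[7] = 12;  /* the hypervisor backs guest-physical page 7 with host page 12 */
    printf("guest-virtual 0x3abc -> host-physical 0x%x\n", translate(0x3abc));
    return 0;
}
```

With shadow paging, by contrast, the hypervisor pre-computes the composition of the two mappings into a single table, which is the one the real MMU walks.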
│ Network Sharing
│
│ • usually a paravirtualised NIC
│ ◦ transports «frames» between guest and host
│ ◦ usually connected to a «SW bridge» in the host
│ ◦ alternatives: routing, NAT
│ • a single physical NIC is used by everyone

In contemporary virtualisation solutions, networking uses a paravirtual NIC (network interface card) which is connected to an Ethernet tunnel pseudo-device in the host system (essentially a virtual network interface card that handles Ethernet frames). The frames sent on the paravirtual device appear on the virtual NIC in the host and vice versa. The pseudo-device is then either software-bridged to the hardware NIC (and hence to the outside ethernet), or alternatively, routing (layer 3) is set up between the pseudo-device and the hardware NIC.

│ Virtual Block Devices
│
│ • usually also paravirtualised
│ • often backed by normal «files»
│ ◦ maybe in a special format
│ ◦ e.g. based on «copy-on-write»
│ • but can be a real «block device»

Like networking, block storage is typically paravirtualised. In this case, the host side of the device is either backed by a regular file in the file system of the host, or by a block device on the host side (often itself virtualised, e.g. through LVM/device-mapper or a similar technology, but sometimes a hardware block device directly).

│ Special Resources
│
│ • mainly useful in «desktop systems»
│ • GPU / graphics hardware
│ • audio equipment
│ • printers, scanners, ...

Now that we have covered the essentials, let's briefly look at other classes of hardware. With the possible exception of compute GPUs, though, these peripherals are mainly useful on desktop systems, which are a tiny market compared to server virtualisation.

│ PCI Passthrough
│
│ • an anti-virtualisation technology
│ • based on an IO-MMU (VT-d, AMD-Vi)
│ • a «virtual» OS can touch «real» hardware
│ ◦ only one OS at a time, of course

Let's first mention a very generic, but decidedly anti-virtualisation method of giving hardware access to a virtual machine: exposing a PCI device to the guest operating system directly, via IO-MMU-mapped memory. An IO-MMU must be involved, because otherwise the guest OS could direct the hardware to overwrite physical memory that belongs to the host, or to another VM running on the same system. With that covered, though, there is nothing that stops the host system from handing over control of specific PCI endpoints to a guest (of course, the host system must not attempt to communicate with those devices through its own drivers, else chaos would ensue).

│ GPUs and Virtualisation
│
│ • can be «assigned» (via VT-d) to a «single OS»
│ • or «time-shared» using native drivers (GVT-g)
│ • paravirtualised
│ • shared by other means (X11, SPICE, RDP)

Since a GPU is attached through PCI, it can of course be assigned to a single guest using the IO-MMU (VT-d) approach described above. However, modern GPUs all support time-sharing (i.e. they allow contexts to be suspended and resumed, just like threads and processes on a CPU). For this to work, the hypervisor (or the host OS) must provide drivers for the GPU in question, so that it can mediate access for individual VMs. Another solution is paravirtualisation: the guest uses a vendor-neutral protocol to send a command stream to the driver running in the hypervisor, which in turn does the multiplexing. The guest system still needs the user-space part of the GPU driver to generate the command stream and to compile shaders. Finally, existing network graphics protocols can, of course, be used between a guest and the host, though they are never quite as efficient as one of the specialised options.

│ Peripherals
│
│ • useful either via «passthrough»
│ ◦ audio, webcams, ...
│ • or «standard sharing» technology
│ ◦ network printers & scanners
│ ◦ networked audio servers

Finally, there is a wide array of peripherals that can be attached to a PC.
Some of them, like printers and scanners, and in some cases (or rather, in some operating systems) audio hardware, can be shared over standard networks, and hence also between guests and the host over a virtual network. For this type of peripheral, there is either no loss in performance (printers, scanners) or possibly a small increase in latency (this mainly affects audio devices).

│ Peripheral Passthrough
│
│ • «virtual» PCI, USB or SATA bus
│ • «forwarding» to a real device
│ ◦ e.g. a single USB stick
│ ◦ or a single SATA drive

Of course, network-based sharing is not always practical. Fortunately, most peripherals attach to the host system through a handful of standard buses, which are not hard to either pass through, or paravirtualise. The devices then appear as endpoints on the virtual bus of the requisite type exposed to the guest operating system.

│ Suspend & Resume
│
│ • the VM can be quite easily «stopped»
│ • the RAM of a stopped VM can be «copied»
│ ◦ e.g. to a «file» in the host filesystem
│ ◦ along with «registers» and other state
│ • and also later «loaded» and «resumed»

An important feature available in most virtualisation solutions is the ability to suspend the execution of a VM and store its state in a file (i.e. create an image of the running virtualised OS). Of course, this is only useful if the image can later be loaded and resumed ‘as if nothing happened’. On the outside, this looks rather like what happens when a laptop's lid is closed: the computer stops (in this case to save energy) and when it is opened again, continues where it left off. An important difference here is that in a VM, the guest operating system does not need to cooperate, or even be aware of the suspend/resume operation.

│ Migration Basics
│
│ • the stored state can be «sent over network»
│ • and resumed on a «different host»
│ • as long as the virtual environment is the same
│ • this is known as «paused» migration

If an image can be stored in a file, it can just as well be sent over a network. Resuming an image on a different host is called a ‘paused’ migration, since the VM is paused for the duration of the network transfer: depending on the size of the image, this can be long enough to time out TCP connections or application-level protocols, and even if that does not happen, there will be a noticeable lag for any interactive use of such a system. The operation is also predicated on the requirement that the supporting environment «on the outside» of the VM is sufficiently compatible between the hosts: in particular, the backing storage for virtualised block devices and the virtual networking infrastructure need to match.

│ Live Migration
│
│ • uses «asynchronous» memory snapshots
│ • host copies pages and marks them read-only
│ • the snapshot is sent as it is constructed
│ • changed pages are sent at the end

Live migration is an improvement over paused migration in that it does not cause noticeable lag and does not endanger TCP or other stateful connections that use timeouts to detect broken links. The main idea that enables live migration is that the VM can continue to run as normal while its memory is being copied, with the proviso that any subsequent writes must be tracked by the hypervisor. This is achieved through the standard ‘copy-on-write’ trick: pages are marked read-only right before they are copied, and the hypervisor traps the resulting faults, allowing each write to proceed but marking the affected page as dirty. When the initial sweep is finished, another pass is made, this time only over the dirty pages, re-sending their current content and marking them clean again.
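The pre-copy loop described above can be illustrated with a small self-contained simulation (purely a sketch: ‘pages’ are single bytes, the dirty bitmap stands in for write-protection faults, and ‘sending’ a page is just a memory copy; no real hypervisor or network is involved).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NPAGES 8

static char src[NPAGES];   /* memory of the VM running on the source host */
static char dst[NPAGES];   /* copy being assembled on the destination host */
static bool dirty[NPAGES]; /* which pages changed since they were last sent */

/* The guest keeps running during migration; a write-protection fault would
 * normally record the write -- here we simply set the dirty bit by hand. */
static void guest_write(int page, char value)
{
    src[page] = value;
    dirty[page] = true;
}

/* One pre-copy pass: "send" every dirty page and write-protect it again. */
static int send_dirty_pages(void)
{
    int sent = 0;
    for (int i = 0; i < NPAGES; ++i)
        if (dirty[i]) {
            dst[i] = src[i];
            dirty[i] = false;
            ++sent;
        }
    return sent;
}

int main(void)
{
    memset(dirty, true, sizeof dirty); /* round 0: every page still needs sending */
    send_dirty_pages();                /* the initial full sweep */

    guest_write(2, 'x');               /* the guest modified some pages while */
    guest_write(5, 'y');               /* the initial sweep was in progress   */

    while (send_dirty_pages() > 1)     /* iterate until the dirty set is tiny */
        ;

    /* handoff: a real hypervisor would now pause the VM, send the last few
     * dirty pages together with the CPU state, and resume at the remote end */
    printf("dirty pages remaining at handoff: %d\n", send_dirty_pages());
    return 0;
}
```

If the guest dirties pages faster than they can be sent, such a loop might never converge; real implementations therefore cap the number of rounds and fall back to pausing the VM earlier.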
│ Live Migration Handoff
│
│ • the VM is then paused
│ • registers and last few pages are sent
│ • the VM is «resumed» at the remote end
│ • usually within a «few milliseconds»

When the number of dirty pages at the end of an iteration is sufficiently small, the VM is paused, the remaining dirty pages and the CPU context are copied over, and the VM is immediately resumed. Since this last transfer is only a few hundred kilobytes, the switchover latency is almost negligible.

│ Memory Ballooning
│
│ • how to «deallocate» “physical” memory?
│ ◦ i.e. return it to the hypervisor
│ • this is often desirable in virtualisation
│ • needs a special host/guest interface

One final consideration is that the hypervisor allocates memory to the guest VMs on demand, but operating systems normally have no concept of ‘deallocating’ physical memory that they are not actively using. Under these circumstances, if a VM sees a spike in memory use, the memory will remain locked by that VM indefinitely, even though it no longer has any use for it. A commonly employed solution is a so-called ‘memory ballooning driver’, which runs on the guest side and returns unused ‘physical’ (from the point of view of the guest) memory to the host operating system. The memory is unmapped on the host side (i.e. its content is lost to the guest) and mapped again later if the demand arises.

## Containers

While hardware-accelerated virtualisation is rather efficient when it comes to CPU overhead, there are other associated costs. Some of them can be mitigated by clever tricks (like memory ballooning, TRIM, copy-on-write disk images, etc.) but others are harder to eliminate. When maximal resource utilisation is a requirement, containers can often outperform full virtualisation, without significantly compromising other aspects, like maintainability, isolation, or security.

│ What are Containers?
│
│ • OS-level virtualisation
│ ◦ e.g. virtualised «network stack»
│ ◦ or restricted «file system» access
│ • «not» a complete virtual computer
│ • turbocharged processes

Containers use virtualisation (in the broad sense of the word) already built into the operating system, mainly based on processes. This is augmented with additional separation, where groups of processes can share, for instance, a network stack which is separate from the network stack available to a different set of processes. While both stacks use the same hardware, they have separate IP addresses, separate routing tables, and so on. Likewise, access to the file system is partitioned (e.g. with ‹chroot›), the user mapping is separated, as are process tables.

│ Why Containers
│
│ • virtual machines take a while to boot
│ • each VM needs its «own kernel»
│ ◦ this adds up if you need many VMs
│ • easier to «share memory» efficiently
│ • easier to cut down the OS image

There are two main selling points of containers:

 1. so-called ‘provisioning speed’ – the time it takes from ‘I want a fresh system’ to having one booted,
 2. more efficient resource use.

Both are in large part enabled by sharing a kernel between the containers: in the first case, there is no need to initialise (boot) a new kernel, which saves a non-negligible amount of time. For the second point, this is even more important: within a single kernel, containers can share files (e.g.
through common mounts) and processes across containers can still share memory – especially executable images and shared libraries that are backed by common files. Achieving the same effect with virtual machines is quite impossible.

│ Kernel Sharing
│
│ • multiple containers share a «single kernel»
│ • but not user tables, process tables, ...
│ • the kernel must explicitly support this
│ • another level of «isolation» (process, user, container)

Of course, since a single kernel serves multiple containers, the kernel in question must support an additional isolation level (on top of processes and users), where separate containers also have separate process tables and so on.

│ Boot Time
│
│ • a light virtual machine takes a second or two
│ • a container can take under 50ms
│ • but VMs can be suspended and resumed
│ • but dormant VMs take up a lot more space

Even setting aside issues like preparation of disk images, on boot time alone a container can be 20 times faster than a conventional virtual machine (not counting exokernels and similar tiny operating systems).

│ ‹chroot›
│
│ • the mother of all container systems
│ • not very sophisticated or secure
│ • but allows multiple OS images under 1 kernel
│ • everything else is shared

The ‹chroot› system call can be (ab)used to run multiple OS images (the user-space parts thereof, to be more specific) under a single kernel. However, since everything besides the file system is fully shared, we cannot really speak about containers yet. (A minimal sketch of entering a ‹chroot› follows below.)

│ ‹chroot›-based ‘Containers’
│
│ • process tables, network, etc. are shared
│ • the superuser must also be shared
│ • containers have their «own view» of the filesystem
│ ◦ including «system libraries» and «utilities»

Since the process tables, networking and other important services are shared across the images, there is a lot of interference. For instance, it is impossible to run two independent web servers from two different ‹chroot› pseudo-containers, since only one can bind to the (shared) port 80 (or 443 if you are feeling modern). Another implication is that the role of the super-user in the container is not contained: ‹root› on the inside can easily become ‹root› on the outside.

│ BSD Jails
│
│ • an evolution of the ‹chroot› container
│ • adds «user» and «process table» separation
│ • and a virtualised network stack
│ ◦ each jail can get its own IP address
│ • ‹root› in the jail has limited power

The jail mechanism on FreeBSD is an evolution of ‹chroot› that adds what is missing: separation of users, process tables and network stacks. The jail also limits what the ‘inside’ ‹root› can do (and prevents them from gaining privileges outside the jail). It is one of the oldest open-source containerisation solutions.

│ Linux VServer
│
│ • like BSD jails but on Linux
│ ◦ FreeBSD jail 2000, VServer 2001
│ • not part of the mainline kernel
│ • jailed ‹root› user is partially isolated

Similar work was done on the Linux kernel a year later, but it was not accepted into the official version of the kernel and was long distributed as a set of third-party patches.
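To make the ‹chroot› approach above concrete, here is a minimal sketch of entering a chroot-based pseudo-container (the directory ‹/srv/guest› is an assumed example path, populated with the guest's user-space tree including its libraries and a shell; the program must be run as ‹root›).

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* /srv/guest is an assumed path holding the guest's user-space tree */
    if (chroot("/srv/guest") != 0 || chdir("/") != 0) {
        perror("chroot");
        return 1;
    }

    /* from here on, "/" refers to /srv/guest; everything else (processes,
     * users, network) is still fully shared with the host system */
    execl("/bin/sh", "sh", (char *)NULL);
    perror("execl"); /* only reached if the guest tree has no /bin/sh */
    return 1;
}
```

Note the ‹chdir("/")›: without it, the process would keep a working directory outside the new root, which is one of the well-known ways to escape a plain ‹chroot›.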
│ Namespaces
│
│ • «visibility» compartments in the Linux kernel
│ • virtualises common OS resources
│ ◦ the filesystem hierarchy (including mounts)
│ ◦ process tables
│ ◦ networking (IP address)

The solution that was eventually added to official Linux kernels is based around «namespaces», which handle each aspect of containerisation separately: when a new process is created (with a ‹fork›-like system call, called ‹clone›), the parent can specify which aspects are to be shared with the parent, and which are to be separated. (A small example using the related ‹unshare› system call is sketched at the end of this part.)

│ ‹cgroups›
│
│ • controls «HW resource allocation» in Linux
│ • a CPU group is a fair scheduling unit
│ • a memory group sets limits on memory use
│ • mostly orthogonal to namespaces

The other important component of Linux containers is ‘control groups’, which limit resource usage of a process sub-tree (which can coincide with the process sub-tree that belongs to a single container). This allows containers to be isolated not only with respect to their access to OS-level objects, but also with respect to resource consumption.

│ LXC
│
│ • mainline Linux way to do containers
│ • based on namespaces and ‹cgroups›
│ • relative newcomer (2008, 7 years after VServer)
│ • feature set similar to VServer, OpenVZ &c.

LXC is a suite of user-space tools for management of containers based on Linux namespaces and control groups. Since version 1.0 (circa 2014), LXC also offers separation of the in-container super-user, and also unprivileged containers which can be created and managed by regular users (limitations apply).

│ User-Mode Linux
│
│ • halfway between a container and a virtual machine
│ • an early fully paravirtualised system
│ • a Linux kernel runs as a process on another Linux
│ • integrated in Linux 2.6 in 2003

Ports of kernels ‘to themselves’, so to speak (a regime where the kernel runs as an ordinary user-space process on top of a different configuration of the same kernel), sit somewhere between containers and full virtual machines. They rely quite heavily on paravirtualisation techniques, although in a rather unusual fashion: since the kernel is a standard process, it can directly access the POSIX API of the host operating system, for instance directly sharing the host file system.

│ DragonFlyBSD Virtual Kernels
│
│ • very similar to User-Mode Linux
│ • part of DFlyBSD since 2007
│ • uses standard ‹libc›, unlike UML
│ • paravirtual ethernet, storage and console

Another example of the same approach is known as ‘virtual kernels’ in DragonFlyBSD. In this case, the user-mode port of the kernel even uses the standard ‹libc›, just like any other program. Unfortunately, no direct access to the host file system is possible, making this approach closer to standard VMs.

│ User Mode Kernels
│
│ • easier to retrofit securely
│ ◦ uses existing security mechanisms
│ ◦ for the host, mostly a standard process
│ • the kernel needs to be ported though
│ ◦ analogous to a new hardware platform

When it comes to implementation effort, user-mode kernels are simpler than containers, and offer better host-side security, since they appear as regular processes, without special status.

│ Migration
│
│ • not widely supported, unlike in hypervisors
│ • process state is much harder to serialise
│ ◦ file descriptors, network connections &c.
│ • somewhat mitigated by fast shutdown/boot time

One major drawback of both containers and user-mode kernels is the lack of support for suspend and resume, and hence for migration. In both cases, this comes down to the much more complex state of a process, as opposed to a virtual machine, though the issue is considerably more serious for containers (the user-mode kernel is often just a single process on the host, whereas processes in containers are, in fact, real host-side processes).
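As promised above, here is a minimal sketch of Linux namespaces in action, using the ‹unshare(2)› system call (a close relative of the ‹clone› flags mentioned earlier). It is only a fragment of what a real container runtime does: run as ‹root›, it detaches the process from the host's hostname (UTS) and mount namespaces; PID and network namespaces, a changed root directory and ‹cgroup› limits would be added in the same spirit.

```c
#define _GNU_SOURCE
#include <sched.h>  /* unshare, CLONE_* flags */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* give this process its own hostname and mount table (requires root) */
    if (unshare(CLONE_NEWUTS | CLONE_NEWNS) != 0) {
        perror("unshare");
        return 1;
    }

    /* changes made here are invisible to the rest of the system */
    if (sethostname("container", 9) != 0)
        perror("sethostname");

    /* a shell started now sees the new hostname, but still shares the
     * process table, users and network with the host -- the remaining
     * namespaces would be unshared the same way (CLONE_NEWPID, ...)   */
    execl("/bin/sh", "sh", (char *)NULL);
    perror("execl");
    return 1;
}
```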
## Management

│ Disk Images
│
│ • disk image is the embodiment of the VM
│ • the virtual OS needs to be installed
│ • the image can be a simple file
│ • or a dedicated block device on the host

│ Snapshots
│
│ • making a copy of the image = snapshot
│ • can be done more efficiently: copy on write
│ • alternative to OS installation
│ ◦ make copies of the «freshly installed» image
│ ◦ and run updates after cloning the image

│ Duplication
│
│ • each image will have a copy of the system
│ • copy-on-write snapshots can help
│ ◦ most of the base system will not change
│ ◦ regression as images are updated separately
│ • block-level de-duplication is expensive

│ File Systems
│
│ • disk images contain entire file systems
│ • the virtual disk is of (apparently) fixed size
│ • sparse images: unwritten area is not stored
│ • initially only filesystem metadata is allocated

│ Overcommit
│
│ • the host can allocate more resources than it has
│ • this works as long as not many VMs reach limits
│ • enabled by sparse images and CoW snapshots
│ • also applies to available RAM

│ Thin Provisioning
│
│ • the act of obtaining resources on demand
│ • the host system can be extended as needed
│ ◦ to keep pace with growing guest demands
│ • alternatively, VMs can be migrated out
│ • improves resource utilisation

│ Configuration
│
│ • each OS has its own configuration files
│ • same methods apply as for physical networks
│ ◦ software configuration management
│ • bundled services are deployed to VMs

│ Bundling vs Sharing
│
│ • bundling makes deployment easier
│ • the bundled components have known behaviour
│ • but updates are much trickier
│ • this also prevents resource sharing

│ Security
│
│ • hypervisors have a decent track record
│ ◦ security here means protection of host from guest
│ ◦ breaking out is still possible sometimes
│ • containers are more of a mixed bag
│ ◦ many hooks are needed into the kernel

│ Updates
│
│ • each system needs to be updated separately
│ ◦ this also applies to containers
│ • blocks coming from a common ancestor are shared
│ ◦ but updating images means loss of sharing

│ Container vs VM Updates
│
│ • de-duplication may be easier in containers
│ ◦ shared file system – e.g. link farming
│ • kernel updates: containers and type 2 hypervisors
│ ◦ can be mitigated by live migration
│ • type 1 hypervisors need less downtime

│ Docker
│
│ • automated container image management
│ • mainly a service deployment tool
│ • containers share a single Linux kernel
│ ◦ the kernel itself can run in a VM
│ • rides on a wave of bundling resurgence

│ The Cloud
│
│ • public virtualisation infrastructure
│ • “someone else's computer”
│ • the guests are «not» secure against the host
│ ◦ entire memory is exposed, including secret keys
│ ◦ host compromise is fatal
│ • the host is mostly secure from the guests

│ Review Questions
│
│ 41. What is a hypervisor?
│ 42. What is paravirtualisation?
│ 43. How are VMs suspended and migrated?
│ 44. What is a container?