# Concurrency and Locking This lecture will deal with the issues that arise from running multiple threads and processes at the same time, both using time-sharing of a single processor and by executing on multiple physical CPU cores. │ Lecture Overview │ │ 1. Inter-Process Communication │ 2. Synchronisation │ 3. Deadlocks In the first part, we will explore the basic why's and how's of inter-process and inter-thread communication. This will naturally lead to questions about shared resources, and to the topic of thread synchronisation, mutual exclusion and so on. Finally, we will deal with waiting and deadlocks, which arise whenever multiple threads can wait for each other. │ What is Concurrency? │ │ • events that can happen at the same time │ • it is not important if it «does», only that it «can» │ • events can be given a «happens-before» partial order │ • they are «concurrent» if unordered by «happens-before» Generally speaking, an event «happens-before» another event when they are causally connected: the first event makes the other event possible, or directly causes it. However, just as often, events are «not» causally connected: they might happen in any order. We say that such events are «concurrent»: they can happen in either order, or they can happen, figuratively speaking, at the same time. │ Why Concurrency? │ │ • problem decomposition │ ◦ different «tasks» can be largely independent │ • reflecting external concurrency │ ◦ serving «multiple clients» at once │ • performance and hardware limitations │ ◦ «higher throughput» on multicore computers Concurrency arises, in software systems, for a number of reasons: first, and perhaps most important, is problem decomposition: it is much easier to design a large system without explicitly ordering all possible events. A complex system will perform a large number of naturally independent – concurrent -- tasks. While it would be certainly possible to impose an artificial order on such tasks, it is not a very practical approach. Another important reason is that the concurrency came from outside: if there are external events that are concurrent, it is hard to imagine that the responses of the system would be strictly ordered: normally, the actual order of concurrent, external events cannot be predicted, and imposing an order on reactions would mean that at least some of the reactions would be unacceptably delayed. Finally, concurrency in software allows the underlying hardware to perform better: concurrent instructions can be executed in parallel, without much concern about their relative ordering – which allows the hardware to use its limited resources more efficiently. │ Parallel Hardware │ │ • hardware is inherently parallel │ • software is inherently sequential │ • something has to give │ ◦ hint: it's not going to be hardware Unlike software, which usually consists of a «sequence» of instructions that are executed in order, hardware consists of spatially organized circuits. Most of the circuitry can operate independently of other circuits. For instance, a typical CPU will contain circuits for multiplying numbers, and other circuits for adding numbers. Those circuits are almost entirely independent, and can operate at the same time, on different pairs of numbers. If the processor is adding numbers, it could be multiplying some other numbers at the same time, without any ill effect on the process of addition happening in another part of the processor. 
Of course, multi-core processors are an extreme manifestation of this: the separate cores are simply (mostly) independent copies of the same (very complicated) circuit. Since we cannot make sequential execution much faster than it already is, the best we can hope for is leveraging concurrency to execute as many things in parallel (on parallel hardware) as we can. ## Inter-Process Communication Communication is an important part of all but the most trivial software. While the mechanisms described in this section are traditionally known as inter-«process» communication (IPC), we will also consider cases where threads of a single process use those mechanisms (and will not use a special name, even though the communication is, in fact, «intra»-process in those cases). │ Reminder: What is a Thread │ │ • thread is a «sequence» of instructions │ • each instruction «happens-before» the next │ ◦ or: happens-before is a total order on the thread │ • basic unit of «scheduling» Before we go any further, we need to recall, from the last lecture, that a thread is a sequence of instructions: as such, each instruction «happens before» the next instruction (for the more mathematically inclined, this simply means that on the instructions of a single thread, happens-before is a linear ordering). Additionally, threads are a basic unit of scheduling: each processor core is running, at any given time, a single thread. │ Reminder: What is a Process │ │ • the basic unit of «resource ownership» │ ◦ primarily «memory», but also open files &c. │ • may contain one or more «threads» │ • processes are «isolated» from each other │ ◦ IPC creates gaps in that isolation A process, then, is a unit of resource ownership – processes own memory (virtual address spaces), open files, network connections and so on. Each process has at least one (but possibly multiple) threads, which carry out the computation. Since each process is isolated in its own virtual address space, there is no direct way that threads from different processes can interact (communicate, synchronize). However, some degree of communication is desirable, and required: the operating system therefore provides a few primitives, which allow processes to talk to each other, creating controlled gaps in the isolation. │ I/O vs Communication │ │ • take standard input and output │ ◦ imagine process A «writes» a file │ ◦ later, process B «reads» that file │ • «communication» happens in «real time» │ ◦ between two running threads / processes │ ◦ automatic: without user intervention Another term that we want to delineate is that of «communication»: in particular, we need to differentiate it from ‘offline’ input and output. Technically speaking, reading a file from disk is communication, because some program wrote that file earlier. However, this is not the kind of thing that we are interested in: we are only interested in the instances that happen in real time. More formally, there should be alternation of actions from two threads, ordered by happens-before, i.e. at least one instruction of thread A is happens-before-sandwiched between instructions of thread B. │ Direction │ │ • «bidirectional» communication is typical │ ◦ this is analogous to a conversation │ • but unidirectional communication also makes sense │ ◦ e.g. sending commands to a child process │ ◦ do acknowledgments count as communication? Strictly speaking, using the above definition, all communication is bidirectional. 
Indeed, truly unidirectional communication is quite rare: even semantically unidirectional communication usually includes an acknowledgement or some other form of reply, satisfying the definition above.

│ Communication Example
│
│  • «network services» are a typical example
│  • take a web server and a web browser
│  • the browser «sends a request» for a web page
│  • the server «responds» by sending data

An intuitive, if technically complicated, example of communication would be the interaction of a web browser with a web server. In this case, the browser will use a network connection to send a request, which will be processed by the server. Finally, the server will send a reply, for which the browser is waiting. The causal connections are easy to see: the request happens-before the reply, which happens-before the browser showing the content to the user.

│ Files
│
│  • it is possible to communicate through «files»
│  • multiple processes can open the «same file»
│  • one can write data and another can process it
│    ◦ the original program picks up the results
│    ◦ typical when using «programs as modules»

Multiple programs (processes) can open the same file, and read or write data into it at the same time. With a little care, this can be done safely. It is not hard to imagine, then, that the two programs could use this capability for communication, as defined above: one of the programs writes some data at a particular offset, the other program reads it and perhaps confirms the receipt, possibly using a different mechanism.

│ A File-Based IPC Example
│
│  • files are used e.g. when you run ‹cc file.c›
│    ◦ it first runs a preprocessor: ‹cpp -o file.i file.c›
│    ◦ then the compiler proper: ‹cc1 -o file.o file.i›
│    ◦ and finally a linker: ‹ld file.o crt.o -lc›
│  • the «intermediate» files may be hidden in ‹/tmp›
│    ◦ and deleted when the task is completed

As an example, even if a little specific, let's recall how the compiler driver communicates with the individual compilation stages (the compilation process has been covered in the second lecture, in case you need to review it). Each step looks the same: the driver arranges an intermediate form of the program to be written to a file, then invokes a subprocess which reads that file and writes its output into another file. Clearly, the invocation of the next stage «happens after» the output of the previous stage has been written.

│ Directories
│
│  • communication by «placing» files or links
│  • typical use: a «spool» directory
│    ◦ clients drop files into the directory for processing
│    ◦ a server periodically «picks up» files from there
│  • used for e.g. «printing» and «email»

Another approach for communication through the file system leverages directories: a daemon sets up a directory, into which its clients can drop files. These files are then picked up and processed by the daemon, and finally unlinked from the directory. This approach is often seen in print spooling: the client program drops a file (in a suitable format) that it wants printed into a «spool directory», where it is picked up by the printer daemon, which arranges for the file to be printed and removed from the queue. Email systems use an analogous process, where mail is either dropped into spool directories as part of internal processing (most mail servers are built from multiple services connected by IPC), or for pickup by individual users.
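To make the client side of this pattern concrete, here is a minimal sketch (the spool directory ‹/tmp/spool› and the daemon watching it are hypothetical): the job is first written into a temporary file and only then ‹rename›d into the spool directory, so that the daemon never picks up a half-written job.

    /* sketch: drop a job into a (hypothetical) spool directory */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int spool_drop( const char *dir, const char *data )
    {
        char tmp[ 256 ], final[ 256 ];

        snprintf( tmp, sizeof tmp, "%s/.job-XXXXXX", dir );
        int fd = mkstemp( tmp );      /* create a unique temporary file */
        if ( fd < 0 )
            return -1;

        FILE *f = fdopen( fd, "w" );
        fputs( data, f );
        fclose( f );                  /* the job is now fully written */

        snprintf( final, sizeof final, "%s/job-%d", dir, ( int ) getpid() );
        return rename( tmp, final );  /* atomically publish the job */
    }

    int main( void )
    {
        return spool_drop( "/tmp/spool", "hello printer\n" ) != 0;
    }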
│ Pipes
│
│  • a device for moving «bytes» in a «stream»
│    ◦ note the difference from messages
│  • one process «writes», the other «reads»
│  • the reader «blocks» if the pipe is «empty»
│  • the writer «blocks» if the pipe buffer is «full»

We have already talked about pipes earlier: recall that a pipe moves bytes from one process to another using the standard file system API, where data is sent using ‹write› and received using ‹read›. An important consideration is that each pipe has a (finite) buffer attached to it: when the buffer is full, a process trying to ‹write› data will be blocked, until the other process issues a ‹read›, removing some of the data from the buffer. Likewise, when the buffer is empty, attempting to ‹read› from the pipe will block, until the other side issues a ‹write› and hence provides some data for ‹read› to return.

│ UNIX and Pipes
│
│  • pipes are used extensively in UNIX
│  • «pipelines» built via the shell's ‹|› operator
│  • e.g. ‹ls | grep hello.c›
│  • most useful for processing data in «stages»

Especially in the case of user-setup pipes (via shell pipelines), the boundary between IPC and “standard IO” is rather blurry. Programs in UNIX are often written in a style where they read input, process it and write the result as their output. This makes them amenable to use in pipelines. While pipes are arguably “more automatic” than dropping the output into a file and running another command to process it, a degree of manual intervention is still involved. Another way to look at the difference may be that a pipe is primarily an IPC mechanism, while a file is primarily a storage mechanism.

│ Sockets
│
│  • similar to, but «more capable» than pipes
│  • allows one «server» to talk to many clients
│  • each «connection» acts like a bidirectional pipe
│  • could be local but also connected via a «network»

You might remember that sockets are pipe-like devices, but more powerful (and more complicated). A socket allows a single server to open communication channels to multiple clients at the same time (possibly over a network), providing a bidirectional channel between each of the clients and the server. Additionally, sockets also come in a ‘datagram’ flavour, in which case they do not establish connections and do not behave like pipes at all: instead, in this mode, they can be used to send messages.

│ Shared Memory
│
│  • memory is «shared» when multiple threads can access it
│    ◦ happens naturally for «threads» of a single process
│    ◦ the primary means of inter-thread communication
│  • many processes can map the same physical location
│    ◦ this is the more traditional setting
│    ◦ hence also allows inter-«process» communication

Shared memory is, in some sense, the most straightforward communication mechanism possible: memory access is, after all, a very common operation in a program. Among threads of the same process, all memory is ‘shared’ in the sense that all threads can read and write all memory (since they share the same page table). Of course, in most programs, only a small, carefully delineated part of the memory is used for exchanging data between threads. Additionally, processes can set up a piece of shared memory between them. However, since there is no shared virtual address space, each process may have the shared block of memory mapped at a different virtual address (let's call the relevant virtual addresses A and B). Still, there is only one block of physical memory involved: the shared memory resides at the physical address P.
Inter-process shared memory is then set up in such a way that, for the first process, A translates to P and, in the second process, B translates to P:

    process 1:  virtual address A ──╮
                                    ├──▶ physical address P
    process 2:  virtual address B ──╯

The address B may or may not be a valid address in the first process, and the same holds for A in the second process.

│ Message Passing
│
│  • communication using discrete «messages»
│  • we may or may not care about «delivery order»
│  • we can decide to tolerate «message loss»
│  • often used across a «network»
│  • can be implemented on top of sockets

While it is hard to beat the efficiency of shared memory as a communication mechanism, message passing has other advantages: first of all, it is much safer (communication through shared memory can be quite tricky to get right) and, for some use cases, considerably easier to use. Moreover, message passing can be done over networks, which makes it especially suitable for distributed systems.

## Synchronisation

In this section, we will deal with a specific form of communication, where no (application) data is exchanged. The purpose of synchronisation primitives is to ensure that actions of multiple threads or processes proceed in a correct order (of which there are usually many). Our main motivation will be prevention of «race conditions», which are, essentially, the «wrong orderings» of «concurrent actions» (for a suitable value of ‘wrong’).

│ Shared Variables
│
│  • structured view of «shared memory»
│  • typical in «multi-threaded» programs
│  • e.g. any «global» variable in a program
│  • but may also live in memory from ‹malloc›

Before we proceed to talk about synchronisation proper, we will have a look at a communication device which we will use as an example: a «shared variable» is simply a named piece of shared memory, available in multiple threads under the same name.

│ Shared Heap Variable
│
│  void *thread( int *x ) { *x = 7; } /* C */
│
│  int main()
│  {
│      pthread_t id;
│      int *x = malloc( sizeof( int ) );
│      pthread_create( &id, NULL, thread, x );
│  }

It is also possible to have anonymous shared variables, allocated dynamically on the heap. They are accessed through a pointer.

│ Race Condition: Example
│
│  • consider a «shared counter», ‹i›
│  • and the following «two threads»
│
│  int i = 0; /* C */
│  void thread1() { i = i + 1; }
│  void thread2() { i = i - 1; }
│
│  What is the value of ‹i› after both finish?

Before we attempt to define a race condition, let's look at an example. In the above program, two threads modify the same shared variable: one adds one to it, the other subtracts one. Try to think about the possible outcomes before you look ahead.

│ Race on a Variable
│
│  • memory access is «not» atomic
│  • take ‹i = i + 1› / ‹i = i - 1›
│
│   a₀ ← load i   |   b₀ ← load i
│   a₁ ← a₀ + 1   |   b₁ ← b₀ - 1
│   store a₁ i    |   store b₁ i

To understand why the innocent-looking code from the previous slide fails, we need to look at a low-level description of the program (at the level of, essentially, machine instructions). Notice that each of the two statements translates into three separate instructions: load a value from memory, perform the arithmetic operation, and store the result back into memory.
At this level of detail, it should be easy to see how the instructions from the two threads can be ordered to give a wrong (or at least unexpected) result, without violating the happens-before relation (remember that instructions of each thread separately are linearly ordered). │ Critical Section │ │ • any section of code that must «not» be «interrupted» │ • the statement ‹x = x + 1› «could» be a critical section │ • what is a critical section is «domain-dependent» │ ◦ another example could be a bank transaction │ ◦ or an insertion of an element into a linked list If we want the result of the above program to be always 0, we must demand that each of the two statements is an instruction sequence that belongs to a common «critical section»: that is, a span of code that must not be interrupted by any instruction that belongs to the same critical section. In this case, this means that once either of the ‹load› instructions starts executing, the thread which did so must first get all the way to the corresponding ‹store› before the other thread can execute its ‹load›. Critical sections do not modify the happens-before relation: the critical sections, as units, can happen in either order (they are still concurrent). However, the entirety of the critical section behaves, with regard to other instances of the same critical section, as a single atomic instruction. │ Race Condition: Definition │ │ • (anomalous) behaviour that «depends on timing» │ • typically among «multiple threads» or processes │ • an «unexpected sequence» of events happens │ • recall that ordering is not guaranteed We can now attempt a definition: a «race condition» is a behaviour which 1. depends on timing (in the sense that it depends on a specific ordering of concurrent events), 2. was not anticipated by the programmer. With the above definition, a race condition can be benign, and the term is often used in this sense. In our study of the phenomenon, however, it is more useful to also include a third clause: 3. the behaviour is erroneous. Race conditions often lurk in places where there is a lot of concurrency: threaded programs are a prime example (almost all instructions are concurrent with a large number of other instructions coming from other threads). However, it is important to keep in mind, that some form of communication is necessary for a race condition to arise – concurrency alone is not sufficient. This is because «completely concurrent» processes cannot influence each other, and hence their behaviour cannot depend on the particular ordering of the events that are concurrent among them. │ Races in a Filesystem │ │ • the file system is also a «shared resource» │ • and as such, prone to race conditions │ • e.g. two threads both try to create the «same file» │ ◦ what happens if they both succeed? │ ◦ if both write data, the result will be garbled Since file system is a resource that is shared by otherwise unrelated processes, it creates an environment prone to race conditions: there is ample concurrency, but also a lot of (sometimes accidental) communication. │ Mutual Exclusion │ │ • context: only one thread can access a resource at once │ • ensured by a «mutual exclusion device» (a.k.a «mutex») │ • a mutex has 2 operations: ‹lock› and ‹unlock› │ • ‹lock› may need to wait until another thread ‹unlocks› Now that we have basic understanding of the problems associated with concurrency, let's look at some of the available remedies. 
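As a concrete preview of the ‹lock› / ‹unlock› interface, here is a minimal sketch (assuming POSIX threads, with error handling omitted) of the shared counter from the earlier example, with both updates guarded by a single mutex, so that the program always prints 0:

    #include <pthread.h>
    #include <stdio.h>

    int i = 0;
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

    /* both load-modify-store sequences are now critical sections guarded
     * by the same mutex, hence they can no longer interleave */
    void *thread1( void *arg )
    {
        pthread_mutex_lock( &mutex );
        i = i + 1;
        pthread_mutex_unlock( &mutex );
        return NULL;
    }

    void *thread2( void *arg )
    {
        pthread_mutex_lock( &mutex );
        i = i - 1;
        pthread_mutex_unlock( &mutex );
        return NULL;
    }

    int main()
    {
        pthread_t t1, t2;
        pthread_create( &t1, NULL, thread1, NULL );
        pthread_create( &t2, NULL, thread2, NULL );
        pthread_join( t1, NULL );
        pthread_join( t2, NULL );
        printf( "%d\n", i ); /* always 0 */
    }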
The first, and in some sense simplest, synchronisation primitive is a «mutual exclusion device», designed to enforce «critical sections». It is often the case that the critical section is associated with a «resource»: that is, all the code between each «acquisition» and the corresponding «release» of the resource is part of a single critical section. This essentially means that one of the steps of the acquisition process is «locking a mutex» and, analogously, the mutex is «unlocked» as part of the release process associated with the resource.

│ Semaphore
│
│  • somewhat «more general» than a mutex
│  • allows «multiple» interchangeable «instances of a resource»
│    ◦ consider there are N identical printers
│    ◦ then N processes can be printing at any given time
│  • basically an atomic counter

Semaphores are, in a sense, a generalisation of a mutex – however, they are only useful when dealing with resources (i.e. a semaphore can only be used to guard a general critical section in the case where it is exactly equivalent to a mutex). The difference between a mutex and a semaphore is that multiple threads can enter the section of code guarded by a semaphore, but the number is capped by a constant. If the constant is set to 1, then the semaphore is the same as a mutex.

│ Monitors
│
│  • a programming «language» device (not OS-provided)
│  • internally uses standard «mutual exclusion»
│  • data of the monitor is only accessible to its methods
│  • only «one thread» can enter the monitor at once

Essentially, a monitor is a fully encapsulated «object» (as defined in object oriented programming), with one additional restriction: each instance of a monitor has a critical section associated with it, and each of its methods is, in its entirety, part of this critical section. As long as each public method leaves the object in a consistent state (which is normally required of all objects), a monitor is automatically thread-safe, that is, multiple threads can call its methods without risking race conditions.

│ Condition Variables
│
│  • what if the monitor needs to «wait» for something?
│  • imagine a bounded queue implemented as a monitor
│    ◦ what happens if it becomes «full»?
│    ◦ the writer must be «suspended»
│  • condition variables have ‹wait› and ‹signal› operations

Since critical sections are often associated with communication, it may happen that code currently in a critical section cannot proceed until some other thread performs a particular action. This use case is served by another synchronisation device, known as a «condition variable»: the variable can be «waited for», which blocks the calling thread, and «signalled», which wakes up the waiting thread and allows it to proceed.

│ Spinlocks
│
│  • a «spinlock» is the simplest form of a «mutex»
│  • the ‹lock› method repeatedly tries to acquire the lock
│    ◦ this means it is taking up «processor time»
│    ◦ also known as «busy waiting»
│  • spinlock contention on the «same CPU» is very «bad»
│    ◦ but can be very efficient «between» CPUs

A spinlock is a particularly simple «implementation» of the abstract mutex concept. The state of the device is a single bit; the lock operation uses an atomic compare and swap to change the bit from 0 to 1 «atomically» (i.e. in such a way that the operation fails in case some other thread succeeded in changing it to 1 while the operation was running), looping until it succeeds. The unlock operation simply resets the bit to 0.
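A minimal sketch of such a spinlock, using C11 atomics (here ‹atomic_flag_test_and_set› plays the role of the atomic read-modify-write described above; memory ordering details are left at their defaults):

    #include <stdatomic.h>

    typedef struct { atomic_flag bit; } spinlock_t;
    #define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

    void spin_lock( spinlock_t *l )
    {
        /* atomically set the bit and obtain its previous value: a previous
         * value of 1 means some other thread holds the lock, keep trying */
        while ( atomic_flag_test_and_set( &l->bit ) )
            ; /* busy wait */
    }

    void spin_unlock( spinlock_t *l )
    {
        atomic_flag_clear( &l->bit ); /* reset the bit to 0 */
    }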
There are a few upsides: the implementation is very simple, the state is very compact and the latency is as good as it gets, assuming that the contention is against another CPU core. The uncontended case is pretty much optimal. There is, however, one serious (and in many scenarios, fatal) drawback: contention on the same CPU core has the worst possible behaviour (maximal latency and minimal throughput). Another important problem, though less severe, is that throughput strongly depends on the average length of the protected critical section: locks that are only held for a short time perform well, but longer critical sections will waste significant resources by holding up a CPU core in the busy-waiting loop. Overall, spinlocks are useful for protecting critical sections which are short and which are guaranteed to be contended by threads running on different CPU cores. This often happens inside the kernel, but only rarely in user-space programs. │ Suspending Mutexes │ │ • these need cooperation from the OS «scheduler» │ • when lock acquisition fails, the thread «sleeps» │ ◦ it is put on a «waiting» queue in the scheduler │ • «unlocking» the mutex will «wake up» the waiting thread │ • needs a system call → «slow» compared to a spinlock A different, much more complicated, but also much more versatile, implementation of a mutex is a «suspending mutex». The main difference lies in the contended case of the «lock» operation: whenever there is an attempt to lock an already-locked mutex, the unlucky thread is suspended and placed on a wait queue of the scheduler. The «unlock» operation, of course, needs to be augmented to check if there are any threads waiting for the mutex, and if yes, wake up one of them. The interaction with the scheduler means that both the «lock» and «unlock» operations must perform a system call, at least in some cases. Compared to atomic instructions, system calls are much more expensive, increasing the overhead of the mutex considerably. A common implementation technique uses a combination of a spinlock and a suspending mutex: the lock operation runs a small number of loops trying to acquire the mutex, and suspends the thread if this fails. This approach combines the good behaviours of both, but is the most complicated implementation-wise, and the state of the mutex is also at least as big as that of a suspending mutex. │ Condition Variables Revisited │ │ • same principle as a «suspending» mutex │ • the waiting thread goes into a wait queue │ • ‹signal› moves the thread back to a run queue │ • the busy-wait version is known as «polling» The usual way to implement a condition variable is to interact with the scheduler, allowing the waiting thread to free up the CPU core it was occupying. A busy-waiting, spinlock-like alternative is possible, though not commonly used. │ Barrier │ │ • sometimes, «parallel» computation proceeds in «phases» │ ◦ «all» threads must finish phase 1 │ ◦ before «any» can start phase 2 │ • this is achieved with a barrier │ ◦ blocks all threads until the «last one arrives» │ ◦ waiting threads are usually «suspended» Another synchronisation device that we will consider is a barrier. While the devices that we have described so far can be used in scenarios with more than two threads, their behaviour is always governed by pairwise interactions. A barrier is different: it synchronises a number of threads at a single point. Only when all the participating threads gather at the barrier are they allowed to continue. 
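A minimal sketch of the two-phase pattern described above, assuming the (optional) POSIX barrier interface ‹pthread_barrier_t› is available; all ‘phase 1’ lines are printed before any ‘phase 2’ line:

    #include <pthread.h>
    #include <stdio.h>

    #define THREADS 4
    pthread_barrier_t barrier;

    void *worker( void *arg )
    {
        long id = ( long ) arg;
        printf( "thread %ld: phase 1\n", id );
        pthread_barrier_wait( &barrier ); /* blocks until all THREADS arrive */
        printf( "thread %ld: phase 2\n", id );
        return NULL;
    }

    int main()
    {
        pthread_t t[ THREADS ];
        pthread_barrier_init( &barrier, NULL, THREADS );

        for ( long i = 0; i < THREADS; ++i )
            pthread_create( &t[ i ], NULL, worker, ( void * ) i );
        for ( int i = 0; i < THREADS; ++i )
            pthread_join( t[ i ], NULL );

        pthread_barrier_destroy( &barrier );
    }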
│ Readers and Writers
│
│  • imagine a «shared database»
│  • many threads can read the database at once
│  • but if one is writing, no other can read nor write
│  • what if there are always some readers?

Let us consider another synchronisation problem, again involving more than two threads: there is a piece of data (a database if you like) that many threads need to consult, and sometimes update. The naive solution would be to simply wrap all access to the data structure in a critical section: this is certainly correct, but wasteful, since multiple readers do not interact with each other, but still need to queue up to interact with the data. The synchronisation device that solves the problem is called an rwlock, or read-write lock, with 3 operations: «lock for reading», «lock for writing» and «unlock». The invariant is that if the rwlock is locked for writing, it is locked by exactly one thread. Otherwise, there might be multiple threads holding a read lock (which of course prevents any write locks from being taken until the read locks are all released).

│ Read-Copy-Update
│
│  • the fastest lock is «no lock»
│  • RCU allows «readers» to work while «updates» are done
│    ◦ make a copy and «update» the copy
│    ◦ point «new readers» to the updated copy
│  • when is it safe to «reclaim memory»?

In some cases, the readers & writers scenario can be solved without taking any locks at all (not even a critical section is required). The way this works is that a writer, instead of modifying the data in place (and therefore interfering with concurrent reads), makes a copy of the data (which is a read operation, and hence can be safely performed concurrently), performs the updates in this new copy of the data, and then instructs all future readers to use this copy instead, usually by updating a pointer.

## Deadlocks and Starvation

│ Dining Philosophers
│
│  ╭────────────╮  ┌──────┐  ╭────────────╮
│  │ Aristotle  │  │ fork │  │   Plato    │
│  ╰────────────╯  └──────┘  ╰────────────╯
│  ┌──────┐                      ┌──────┐
│  │ fork │         bowl         │ fork │
│  └──────┘                      └──────┘
│  ╭────────────╮  ┌──────┐  ╭────────────╮
│  │ Pythagoras │  │ fork │  │  Socrates  │
│  ╰────────────╯  └──────┘  ╰────────────╯

In the ‘dining philosophers’ problem, there is a bowl of food in the middle of a table, a number of philosophers seated around the table, and between each pair of philosophers, there is a single fork. To eat, each philosopher needs to use two forks at the same time. The philosophers sit and think, but when they become hungry, they pick up the forks and eat. This scenario is used to illustrate the problem of deadlocks and starvation in concurrent systems. Can you come up with instructions for the philosophers that would ensure that each hungry philosopher gets to eat (in finite time)? Consider why simple algorithms fail.

│ Shared Resources
│
│  • hardware comes in a «limited» number of «instances»
│  • many devices can only do «one» thing at a time
│  • think «printers», DVD writers, tape drives, ...
│  • we want to use the devices «efficiently» → «sharing»
│  • resources can be «acquired» and «released»

The most natural setting to contemplate deadlocks is one in which multiple threads (and/or processes) compete for finite «resources». In this case, a resource is an abstract object, which can be acquired, used, and released. A resource cannot be used before it has been acquired, and can no longer be used after being released. The thread that has acquired (but not yet released) a resource is said to «own» it.
│ Network-based Sharing
│
│  • «sharing» is not limited to processes on one computer
│  • printers and scanners can be «network-attached»
│  • the entire network may need to «coordinate access»
│    ◦ this could lead to multi-computer «deadlocks»

In practice, some resources are ‘remote’ – made available to processes on computer A by another computer B attached to the same network. This does not make any practical difference to our consideration (other than perhaps realizing that a single deadlock may span multiple computers).

│ Locks as Resources
│
│  • we explored «locks» in the previous section
│  • locks (mutexes) are also a form of «resource»
│    ◦ a mutex can be acquired (locked) and released
│    ◦ a locked mutex «belongs» to a particular thread
│  • locks are «proxy» (stand-in) resources

In the following, we will consider locks (or even critical sections as such) to be simply a form of resource: the operations on an abstract resource neatly map to operations on an (abstract) lock.

│ Preemptable Resources
│
│  • sometimes, held resources can be «taken away»
│  • this is the case with e.g. «physical memory»
│    ◦ a process can be «swapped» to disk if need be
│  • preemptability may also depend on «context»
│    ◦ maybe paging is not available

Some resources can be taken away from their owner temporarily, without causing ill effects, or at an acceptable cost (in terms of latency, loss of throughput, or resource-specific considerations, such as waste of material). A canonical example would be a «page of memory»: even though it is acquired by a particular process, the system might take it away without cooperation from the process, and transparently give it back whenever it is actually used (this technique is known as swapping).

│ Non-preemptable Resources
│
│  • those resources «cannot» be (easily) taken away
│  • think photo printer in the middle of a page
│  • or a DVD burner in the middle of writing
│  • «non»-preemptable resources can cause «deadlocks»

Many resources are, however, practically non-preemptable; that is, once acquired by a thread or a process, they cannot be unilaterally taken away without considerable damage, e.g. killing the owning process or irreparably damaging the output it was producing on the device.

│ Resource Acquisition
│
│  • a process needs to «request access» to a resource
│  • this is called an «acquisition»
│  • when the request is granted, it can use the device
│  • after it is done, it must «release» the device
│    ◦ this makes it available for other processes

Now that we have discussed resources in a bit more detail, let's just quickly revisit the protocol for their acquisition and release. In the following, we operate on the assumption that only a limited number of processes can own a particular resource at any given time (most often just one, in fact).

│ Waiting
│
│  • what to do if we wish to «acquire» a «busy» resource?
│  • unless we don't really need it, we have to «wait»
│  • this is the same as waiting for a «mutex»
│  • the thread is moved to a wait queue

We will also assume that acquisition is «blocking»: if a resource is busy (owned by another process), the process will wait until it is released before continuing execution. This does not always need to be the case, but it is the most common approach.

│ Resource Deadlock
│
│  • two resources, A and B
│  • two threads (processes), P and Q
│  • P «acquires» A, Q «acquires» B
│  • P tries to «acquire» B but has to «wait» for Q
│  • Q tries to «acquire» A but has to «wait» for P

Now we are finally equipped to look at (resource) «deadlocks».
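Using a pair of mutexes as the two resources, the scenario from the slide can be reproduced directly; a minimal sketch (POSIX threads; the ‹sleep› calls merely make the unlucky interleaving practically certain):

    #include <pthread.h>
    #include <unistd.h>

    pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER,
                    B = PTHREAD_MUTEX_INITIALIZER;

    void *P( void *arg )
    {
        pthread_mutex_lock( &A );
        sleep( 1 );               /* give Q time to acquire B */
        pthread_mutex_lock( &B ); /* blocks forever */
        pthread_mutex_unlock( &B );
        pthread_mutex_unlock( &A );
        return NULL;
    }

    void *Q( void *arg )
    {
        pthread_mutex_lock( &B );
        sleep( 1 );               /* give P time to acquire A */
        pthread_mutex_lock( &A ); /* blocks forever */
        pthread_mutex_unlock( &A );
        pthread_mutex_unlock( &B );
        return NULL;
    }

    int main()
    {
        pthread_t p, q;
        pthread_create( &p, NULL, P, NULL );
        pthread_create( &q, NULL, Q, NULL );
        pthread_join( p, NULL );  /* never returns */
    }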
We will look at the simplest possible case, with two threads and two resources. After the above sequence of events, there can be no further progress: without an outside intervention, both P and Q will be blocked forever. This is what we call a «deadlock». Please note that it is «usually» the case that both acquisitions in P are concurrent with both acquisitions in Q. Of course, the above situation can be generalized to any number of threads and resources (as long as there are at least 2 of each). │ Resource Deadlock Conditions │ │ 1. mutual exclusion │ 2. hold and wait condition │ 3. non-preemtability │ 4. circular wait │ │ Deadlock is only possible if all 4 are present. A number of sound design decisions conspire to create the conditions for a deadlock to occur. It is certainly natural that a resource should only be used by a single thread at any given time – that is, after all, the nature of resources. The «hold and wait» condition arises when a single thread can request multiple resources at once, performing the acquisitions in a sequence. It is hard to object here too, since it fits the linear nature of most programs – the program (or rather each thread of a program) is essentially a sequence of instructions, and each instruction is only performed after the previous has finished. Resource acquisition cannot finish while a resource is busy, and the only way to have more than a single resource is to acquire each in turn. In some cases, non-preemptability is not even negotiable: it is a property of the resource in question. There is some maneuvering space with critical sections that cause no interaction with any external entity (resource, other threads) with the exception of their associated lock. In this case, it might be possible to roll back the effects of the critical section and attempt to restart it. This is, however, not easy to achieve. Let us consider a static «resource dependency graph», in which nodes are resources and edges indicate that a hold-and-wait condition exists between them, that is, an edge A → B is present in the graph if there is a thread which attempts to acquire B while holding A. The «circular wait» condition is satisfied whenever there is a cycle in this graph. │ Non-Resource Deadlocks │ │ • not all deadlocks are due to «resource» contention │ • imagine a «message-passing» system │ • process A is «waiting» for a message │ • process B sends a message to A and «waits» for reply │ • the message is «lost» in transit Quite importantly, the above four conditions only pertain to «resource deadlocks»: there are other types of deadlocks with a different set of conditions. This means that even if we can eliminate one of the 4 conditions, and hence prevent resource deadlocks, this does not mean, unfortunately, that our system will be deadlock-free. │ Example: Pipe Deadlock │ │ • recall that both the «reader» and «writer» can «block» │ • what if we create a pipe in «each direction»? │ • process A writes data and tries to read a reply │ ◦ it blocks because the «opposite» pipe is «empty» │ • process B reads the data but «waits for more» → deadlock A typical example of a non-resource deadlock has to do with pipes: remember that an empty pipe blocks on read, while a full pipe blocks on write. There is a number of ways this can go wrong, if there is more than one pipe involved. You can probably spot similarities between this type of deadlock, and the resource deadlock we have discussed at length. │ Deadlocks: Do We Care? 
│
│  • deadlocks can be very «hard to debug»
│  • they can also be exceedingly «rare»
│  • we may find the risk of a deadlock «acceptable»
│  • just «reboot» everything if we hit a deadlock
│    ◦ also known as the ostrich algorithm

Many (probably most) deadlocks arise from race conditions, and hence will not happen very often. The so-called «ostrich algorithm» takes advantage of this rarity: if a deadlock happens, kill all the involved processes, or possibly just reboot the entire system. Of course, deciding that a deadlock has happened can be tricky.

│ Deadlock Detection
│
│  • we can at least try to «detect» deadlocks
│  • usually by checking the «circular wait» condition
│  • keep a graph of ownership vs waiting
│  • if there is a «loop» in the graph → «deadlock»

One way to detect deadlocks is to use a dynamic counterpart of the static «circular wait» condition: in this case, the dependency graph has two types of nodes, threads and resources. Edges are always between a thread T and a resource R: R → T exists if thread T currently owns resource R, and T → R exists if thread T is waiting for the acquisition of R. If a cycle exists in this graph, the system is in a deadlock (and the threads which lie on the cycle are all blocked).

Let's revisit the example with two threads, P and Q, and two resources, A and B, that we saw earlier:

 1. P successfully «acquires» A (giving rise to edge A → P)
 2. likewise, Q «acquires» B, meaning B → Q
 3. P tries to «acquire» B but has to «wait», hence P → B
 4. Q tries to «acquire» A but has to «wait», hence Q → A

This is the resulting graph, with an obvious cycle in it:

    ┌───┐  ╭───╮
    │ P │◀─│ A │
    └───┘  ╰───╯
      ▼      ▲
    ╭───╮  ┌───┐
    │ B │─▶│ Q │
    ╰───╯  └───┘

│ Deadlock Recovery
│
│  • if a preemptable resource is involved, «reassign» it
│  • otherwise, it may be possible to do a «rollback»
│    ◦ this needs elaborate «checkpointing» mechanisms
│  • all else failing, «kill» some of the processes
│    ◦ the devices may need to be «re-initialised»

A deadlock involving at least one preemptable resource can still happen, but unlike in the ‘standard’ case, it is possible to recover and continue computation without forcibly terminating any of the threads or processes involved. Instead, one of the resources is preempted, breaking the cycle and allowing the system to make progress again. If the deadlock instead includes a restartable critical section (as outlined earlier), then this critical section can be rolled back and restarted, again allowing other threads to make progress. Finally, if all else fails, the system can pick a victim thread on the cycle and terminate it, releasing any resources it might have been holding. Like the other cases, this allows the system to progress again.

│ Deadlock Avoidance
│
│  • we can possibly «deny acquisitions» to avoid deadlocks
│  • must know the «maximum» resources for each process
│  • avoidance relies on «safe states»
│    ◦ worst case: «all processes» ask for «maximum resources»
│    ◦ «safe» means deadlocks are «avoided» in the «worst» case

In general, deadlock avoidance is an approach where the system will deny resource acquisitions if they might later lead to a deadlock. The best-known instance of deadlock avoidance is Banker's Algorithm, invented by E. Dijkstra. This algorithm applies in cases where each resource has a number of fungible instances greater than any single thread might request at once. The input to the algorithm is the maximal number of instances of a resource that can be requested by each thread. The «safety» invariant is this:
 1. there is a thread T, such that the available resources are sufficient to satisfy its maximal resource allocation,
 2. when the thread T terminates (returning all its currently allocated resources), the invariant still holds.

Any resource acquisition which would violate this invariant is denied. The initial conditions imply that, at the outset, the system is in a safe state; since every granted acquisition keeps the state safe, and a safe state always leaves at least one thread able to run to completion, no deadlock can ever arise. The algorithm assumes that nothing apart from resource acquisitions might block any of the threads.

The following table illustrates the algorithm (P, Q and R are threads, the numbers are resources held / maximal allocation):

    │ step │  P  │  Q  │  R  │ available │ action        │
    ├──────┼─────┼─────┼─────┼───────────┼───────────────┤
    │  1   │ 0/3 │ 0/2 │ 0/4 │    5/5    │ P acquires 1  │
    │  2   │ 1/3 │ 0/2 │ 0/4 │    4/5    │ R acquires 2  │
    │  3   │ 1/3 │ 0/2 │ 2/4 │    2/5    │ Q acquires 1  │
    │  4   │ 1/3 │ 1/2 │ 2/4 │    1/5    │ P is denied 1 │
    │  5   │ 1/3 │ 1/2 │ 2/4 │    1/5    │ Q acquires 1  │
    │  6   │ 1/3 │ 2/2 │ 2/4 │    0/5    │ Q terminates  │
    │  7   │ 1/3 │  –  │ 2/4 │    2/5    │ P acquires 1  │
    │  8   │ 2/3 │  –  │ 2/4 │    1/5    │               │

Notice that in step 4, the request by P has been denied: granting it would leave 0 instances available, while P, Q and R might still need up to 1, 1 and 2 more instances respectively, so no thread would be guaranteed to be able to make sufficient progress to release resources.

│ Deadlock Prevention
│
│  • deadlock avoidance is typically «impractical»
│  • there are 4 «conditions» for deadlocks to exist
│  • we can try attacking those conditions
│  • if we can remove one of them, deadlocks are «prevented»

Deadlock avoidance is, unfortunately, not very practical in general-purpose operating systems: it is often the case that a particular resource has exactly one instance available, and the above algorithm would then force all programs which might use the resource to run one after another. Moreover, the assumption that there are no other interactions between threads is not very realistic either.

│ Prevention via Spooling
│
│  • this attacks the «mutual exclusion» property
│  • multiple programs could write to a printer
│  • the data is «collected» by a spooling daemon
│  • which then sends the jobs to the printer «in sequence»

This approach can trade a deadlock on the printer for a deadlock on disk space. However, disk space is much more likely to be preemptable in this scenario, since a job blocked by a full disk can be canceled (and erased from disk) and later retried (unlike a half-printed page).

│ Prevention via Reservation
│
│  • we can also try removing «hold-and-wait»
│  • for instance, we can only allow «batch acquisition»
│    ◦ the process must request everything at once
│    ◦ this is usually «impractical»
│  • alternative: release and «re-acquire»

It is certainly conceivable that we might require that all resource acquisitions are done in batches, always requesting, as a single atomic action, all the resources required in a particular section of code. The resources may be released one at a time. If an additional resource is required while other resources are already held, the program must first release everything it holds before requesting the new batch.
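One possible realisation of the ‘release and re-acquire’ rule uses the non-blocking ‹pthread_mutex_trylock›: a sketch (POSIX threads) in which a thread that cannot get its second lock immediately gives the first one back and starts over, instead of holding it while waiting:

    #include <pthread.h>
    #include <sched.h>

    pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER,
                    B = PTHREAD_MUTEX_INITIALIZER;

    /* acquire both A and B without ever blocking while holding one of them */
    void lock_both( void )
    {
        for ( ;; )
        {
            pthread_mutex_lock( &A );
            if ( pthread_mutex_trylock( &B ) == 0 )
                return;                 /* got both */
            pthread_mutex_unlock( &A ); /* give A back and retry */
            sched_yield();              /* let other threads run first */
        }
    }

    void unlock_both( void )
    {
        pthread_mutex_unlock( &B );
        pthread_mutex_unlock( &A );
    }

Since no thread ever waits while holding a lock, the hold-and-wait condition is eliminated; the price, as discussed below, is the possibility of a livelock.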
│ Prevention via Ordering │ │ • this approach eliminates «circular waits» │ • we impose a «global order» on «resources» │ • a process can only acquire resources «in this order» │ ◦ must release + «re-acquire» if the order is wrong │ • it is impossible to form a cycle this way Finally, the perhaps most practical approach «within a single program» (or another closed system, but not so much in an operating system) is to impose a global order on resource acquisition, which each thread must observe. This implies that the static «resource dependency graph» is acyclic and deadlocks cannot occur. │ Livelock │ │ • in a deadlock, no progress can be made │ • but it's not much better if processes go back and forth │ ◦ for instance releasing and re-acquiring resources │ ◦ they make no «useful» progress │ ◦ they additionally consume resources │ • this is a «livelock» and is just as bad as a deadlock In a deadlock, the blocked threads are all usually suspended (not runnable). If some or all of the threads involved in a deadlock-like condition are active (busy waiting, polling or otherwise attempting an action which is bound to fail forever), we talk about a livelock: threads are executing instructions, but progress is not being made. │ Starvation │ │ • starvation happens when a process «can't» make «progress» │ • «generalisation» of both «deadlock» and «livelock» │ • for instance, «unfair» scheduling on a busy system │ • also recall the «readers and writers» problem Finally, the most general notion is «starvation», which is a condition when a thread cannot make progress, for whatever reason: it is blocked on a resource, it is denied processor time, etc. An example that is neither a deadlock nor a livelock would arise in a naive solution to the readers & writers problem from earlier: if there are sufficiently many readers so that there is always a read lock on the resource, writers will be blocked indefinitely, and hence starved. │ Review Questions │ │ 21. What is a mutex? │ 22. What is a deadlock? │ 23. What are the conditions for a deadlock to form? │ 24. What is a race condition?