# Concurrency and Locking This lecture will deal with the issues that arise from running multiple threads and processes at the same time, both using time-sharing of a single processor and by executing on multiple physical CPU cores. │ Lecture Overview │ │ 1. Inter-Process Communication │ 2. Synchronisation │ 3. Deadlocks In the first part, we will explore the basic why's and how's of inter-process and inter-thread communication. This will naturally lead to questions about shared resources, and to the topic of thread synchronisation, mutual exclusion and so on. Finally, we will deal with waiting and deadlocks, which arise whenever multiple threads can wait for each other. │ What is Concurrency? │ │ • events that can happen at the same time │ • it is not important if it «does», only that it «can» │ • events can be given a «happens-before» partial order │ • they are «concurrent» if unordered by «happens-before» Generally speaking, an event «happens-before» another event when they are causally connected: the first event makes the other event possible, or directly causes it. However, just as often, events are «not» causally connected: they might happen in any order. We say that such events are «concurrent»: they can happen in either order, or they can happen, figuratively speaking, at the same time. │ Why Concurrency? │ │ • problem decomposition │ ◦ different «tasks» can be largely independent │ • reflecting external concurrency │ ◦ serving «multiple clients» at once │ • performance and hardware limitations │ ◦ «higher throughput» on multicore computers Concurrency arises, in software systems, for a number of reasons: first, and perhaps most important, is problem decomposition: it is much easier to design a large system without explicitly ordering all possible events. A complex system will perform a large number of naturally independent – concurrent -- tasks. While it would be certainly possible to impose an artificial order on such tasks, it is not a very practical approach. Another important reason is that the concurrency came from outside: if there are external events that are concurrent, it is hard to imagine that the responses of the system would be strictly ordered: normally, the actual order of concurrent, external events cannot be predicted, and imposing an order on reactions would mean that at least some of the reactions would be unacceptably delayed. Finally, concurrency in software allows the underlying hardware to perform better: concurrent instructions can be executed in parallel, without much concern about their relative ordering – which allows the hardware to use its limited resources more efficiently. │ Parallel Hardware │ │ • hardware is inherently parallel │ • software is inherently sequential │ • something has to give │ ◦ hint: it's not going to be hardware Unlike software, which usually consists of a «sequence» of instructions that are executed in order, hardware consists of spatially organized circuits. Most of the circuitry can operate independently of other circuits. For instance, a typical CPU will contain circuits for multiplying numbers, and other circuits for adding numbers. Those circuits are almost entirely independent, and can operate at the same time, on different pairs of numbers. If the processor is adding numbers, it could be multiplying some other numbers at the same time, without any ill effect on the process of addition happening in another part of the processor. 
Of course, multi-core processors are an extreme manifestation of this: the separate cores are simply (mostly) independent copies of the same (very complicated) circuit. Since we cannot make sequential execution much faster than it already is, the best we can hope for is leveraging concurrency to execute as many things in parallel (on parallel hardware) as we can. ## Inter-Process Communication Communication is an important part of all but the most trivial software. While the mechanisms described in this section are traditionally known as inter-«process» communication (IPC), we will also consider cases where threads of a single process use those mechanisms (and will not use a special name, even though the communication is, in fact, «intra»-process in those cases). │ Reminder: What is a Thread │ │ • thread is a «sequence» of instructions │ • each instruction «happens-before» the next │ ◦ or: happens-before is a total order on the thread │ • basic unit of «scheduling» Before we go any further, we need to recall, from the last lecture, that a thread is a sequence of instructions: as such, each instruction «happens before» the next instruction (for the more mathematically inclined, this simply means that on the instructions of a single thread, happens-before is a linear ordering). Additionally, threads are a basic unit of scheduling: each processor core is running, at any given time, a single thread. │ Reminder: What is a Process │ │ • the basic unit of «resource ownership» │ ◦ primarily «memory», but also open files &c. │ • may contain one or more «threads» │ • processes are «isolated» from each other │ ◦ IPC creates gaps in that isolation A process, then, is a unit of resource ownership – processes own memory (virtual address spaces), open files, network connections and so on. Each process has at least one (but possibly multiple) threads, which carry out the computation. Since each process is isolated in its own virtual address space, there is no direct way that threads from different processes can interact (communicate, synchronize). However, some degree of communication is desirable, and required: the operating system therefore provides a few primitives, which allow processes to talk to each other, creating controlled gaps in the isolation. │ I/O vs Communication │ │ • take standard input and output │ ◦ imagine process A «writes» a file │ ◦ later, process B «reads» that file │ • «communication» happens in «real time» │ ◦ between two running threads / processes │ ◦ automatic: without user intervention Another term that we want to delineate is that of «communication»: in particular, we need to differentiate it from ‘offline’ input and output. Technically speaking, reading a file from disk is communication, because some program wrote that file earlier. However, this is not the kind of thing that we are interested in: we are only interested in the instances that happen in real time. More formally, there should be alternation of actions from two threads, ordered by happens-before, i.e. at least one instruction of thread A is happens-before-sandwiched between instructions of thread B. │ Direction │ │ • «bidirectional» communication is typical │ ◦ this is analogous to a conversation │ • but unidirectional communication also makes sense │ ◦ e.g. sending commands to a child process │ ◦ do acknowledgments count as communication? Strictly speaking, using the above definition, all communication is bidirectional. 
Indeed, truly unidirectional communication is quite rare: even semantically unidirectional communication usually includes an acknowledgement or some other form of reply, satisfying the definition above.

│ Communication Example
│
│  • «network services» are a typical example
│  • take a web server and a web browser
│  • the browser «sends a request» for a web page
│  • the server «responds» by sending data

An intuitive, if technically complicated, example of communication would be the interaction of a web browser with a web server. In this case, the browser will use a network connection to send a request, which will be processed by the server. Finally, the server will send a reply, for which the browser is waiting. The causal connections are easy to see: the request happens-before the reply, which happens-before the browser showing the content to the user.

│ Files
│
│  • it is possible to communicate through «files»
│  • multiple processes can open the «same file»
│  • one can write data and another can process it
│    ◦ the original program picks up the results
│    ◦ typical when using «programs as modules»

Multiple programs (processes) can open the same file, and read or write data into it at the same time. With a little care, this can be done safely. It is not hard to imagine, then, that the two programs could use this capability for communication, as defined above: one of the programs writes some data at a particular offset, the other program reads it and perhaps confirms the receipt, possibly using a different mechanism.

│ A File-Based IPC Example
│
│  • files are used e.g. when you run ‹cc file.c›
│    ◦ it first runs a preprocessor: ‹cpp -o file.i file.c›
│    ◦ then the compiler proper: ‹cc1 -o file.o file.i›
│    ◦ and finally a linker: ‹ld file.o crt.o -lc›
│  • the «intermediate» files may be hidden in ‹/tmp›
│    ◦ and deleted when the task is completed

As an example, even if a little specific, let's recall how the compiler driver communicates with the individual compilation stages (the compilation process has been covered in the second lecture, in case you need to review it). Each step looks the same: the driver arranges an intermediate form of the program to be written to a file, then invokes a subprocess which reads that file and writes its output into another file. Clearly, the invocation of the next stage «happens after» the output of the previous stage has been written.

│ Directories
│
│  • communication by «placing» files or links
│  • typical use: a «spool» directory
│    ◦ clients drop files into the directory for processing
│    ◦ a server periodically «picks up» files from there
│  • used for e.g. «printing» and «email»

Another approach for communication through the file system leverages directories: a daemon sets up a directory, into which its clients can drop files. These files are then picked up and processed by the daemon, and finally unlinked from the directory. This approach is often seen in print spooling: the client program drops a file (in a suitable format) that it wants printed into a «spool directory», where it is picked up by the printer daemon, which arranges for the file to be printed and removed from the queue. Email systems use an analogous process, where mail is either dropped into spool directories as part of internal processing (most mail servers are built from multiple services connected by IPC), or for pickup by individual users.
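To make the client side of this pattern concrete, here is a minimal sketch (the spool directory ‹/tmp/spool› and the daemon watching it are hypothetical): the job is first written into a temporary file and only then ‹rename›d into the spool directory, so that the daemon never picks up a half-written job.

    /* sketch: drop a job into a (hypothetical) spool directory */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int spool_drop( const char *dir, const char *data )
    {
        char tmp[ 256 ], final[ 256 ];

        snprintf( tmp, sizeof tmp, "%s/.job-XXXXXX", dir );
        int fd = mkstemp( tmp );      /* create a unique temporary file */
        if ( fd < 0 )
            return -1;

        FILE *f = fdopen( fd, "w" );
        fputs( data, f );
        fclose( f );                  /* the job is now fully written */

        snprintf( final, sizeof final, "%s/job-%d", dir, ( int ) getpid() );
        return rename( tmp, final );  /* atomically publish the job */
    }

    int main( void )
    {
        return spool_drop( "/tmp/spool", "hello printer\n" ) != 0;
    }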
│ Pipes
│
│  • a device for moving «bytes» in a «stream»
│    ◦ note the difference from messages
│  • one process «writes», the other «reads»
│  • the reader «blocks» if the pipe is «empty»
│  • the writer «blocks» if the pipe buffer is «full»

We have already talked about pipes earlier: recall that a pipe moves bytes from one process to another using the standard file system API, where data is sent using ‹write› and received using ‹read›. An important consideration is that each pipe has a (finite) buffer attached to it: when the buffer is full, a process trying to ‹write› data will be blocked, until the other process issues a ‹read›, removing some of the data from the buffer. Likewise, when the buffer is empty, attempting to ‹read› from the pipe will block, until the other side issues a ‹write› and hence provides some data for ‹read› to return.

│ UNIX and Pipes
│
│  • pipes are used extensively in UNIX
│  • «pipelines» built via the shell's ‹|› operator
│  • e.g. ‹ls | grep hello.c›
│  • most useful for processing data in «stages»

Especially in the case of user-setup pipes (via shell pipelines), the boundary between IPC and “standard IO” is rather blurry. Programs in UNIX are often written in a style where they read input, process it and write the result as their output. This makes them amenable to use in pipelines. While pipes are arguably “more automatic” than dropping the output into a file and running another command to process it, a degree of manual intervention is still involved. Another way to look at the difference may be that a pipe is primarily an IPC mechanism, while a file is primarily a storage mechanism.

│ Sockets
│
│  • similar to, but «more capable» than pipes
│  • allows one «server» to talk to many clients
│  • each «connection» acts like a bidirectional pipe
│  • could be local but also connected via a «network»

You might remember that sockets are pipe-like devices, but more powerful (and more complicated). A socket allows a single server to open communication channels to multiple clients at the same time (possibly over a network), providing a bidirectional channel between each of the clients and the server. Additionally, sockets also come in a ‘datagram’ flavour, in which case they do not establish connections and do not behave like pipes at all: instead, in this mode, they can be used to send messages.

│ Shared Memory
│
│  • memory is «shared» when multiple threads can access it
│    ◦ happens naturally for «threads» of a single process
│    ◦ the primary means of inter-thread communication
│  • many processes can map the same physical location
│    ◦ this is the more traditional setting
│    ◦ hence also allows inter-«process» communication

Shared memory is, in some sense, the most straightforward communication mechanism possible: memory access is, after all, a very common operation in a program. Among threads of the same process, all memory is ‘shared’ in the sense that all threads can read and write all memory (since they share the same page table). Of course, in most programs, only a small, carefully delineated part of the memory is used for exchanging data between threads. Additionally, processes can set up a piece of shared memory between them. However, since there is no shared virtual address space, each process may have the shared block of memory mapped at a different virtual address (let's call the relevant virtual addresses A and B). Still, there is only one block of physical memory involved: the shared memory resides at the physical address P.
Inter-process shared memory is then set up in such a way that, for the first process, A translates to P and, in the second process, B translates to P:

    process 1:  virtual address A ──╮
                                    ├──▶ physical address P
    process 2:  virtual address B ──╯

The address B may or may not be a valid address in the first process, and the same holds for A in the second process.

│ Message Passing
│
│  • communication using discrete «messages»
│  • we may or may not care about «delivery order»
│  • we can decide to tolerate «message loss»
│  • often used across a «network»
│  • can be implemented on top of sockets

While it is hard to beat the efficiency of shared memory as a communication mechanism, message passing has other advantages: first of all, it is much safer (communication through shared memory can be quite tricky to get right) and, for some use cases, considerably easier to use. Moreover, message passing can be done over networks, which makes it especially suitable for distributed systems.

## Synchronisation

In this section, we will deal with a specific form of communication, where no (application) data is exchanged. The purpose of synchronisation primitives is to ensure that actions of multiple threads or processes proceed in a correct order (of which there are usually many). Our main motivation will be prevention of «race conditions», which are, essentially, the «wrong orderings» of «concurrent actions» (for a suitable value of ‘wrong’).

│ Shared Variables
│
│  • structured view of «shared memory»
│  • typical in «multi-threaded» programs
│  • e.g. any «global» variable in a program
│  • but may also live in memory from ‹malloc›

Before we proceed to talk about synchronisation proper, we will have a look at a communication device which we will use as an example: a «shared variable» is simply a named piece of shared memory, available in multiple threads under the same name.

│ Shared Heap Variable
│
│  void *thread( int *x ) { *x = 7; } /* C */
│
│  int main()
│  {
│      pthread_t id;
│      int *x = malloc( sizeof( int ) );
│      pthread_create( &id, NULL, thread, x );
│  }

It is also possible to have anonymous shared variables, allocated dynamically on the heap. They are accessed through a pointer.

│ Race Condition: Example
│
│  • consider a «shared counter», ‹i›
│  • and the following «two threads»
│
│  int i = 0; /* C */
│  void thread1() { i = i + 1; }
│  void thread2() { i = i - 1; }
│
│  What is the value of ‹i› after both finish?

Before we attempt to define a race condition, let's look at an example. In the above program, two threads modify the same shared variable: one adds one to it, the other subtracts one. Try to think about the possible outcomes before you look ahead.

│ Race on a Variable
│
│  • memory access is «not» atomic
│  • take ‹i = i + 1› / ‹i = i - 1›
│
│   a₀ ← load i   |   b₀ ← load i
│   a₁ ← a₀ + 1   |   b₁ ← b₀ - 1
│   store a₁ i    |   store b₁ i

To understand why the innocent-looking code from the previous slide fails, we need to look at a low-level description of the program (at the level of, essentially, machine instructions). Notice that each of the two statements translates into three separate instructions: load a value from memory, perform the arithmetic operation, and store the result back into memory.
At this level of detail, it should be easy to see how the instructions from the two threads can be ordered to give a wrong (or at least unexpected) result, without violating the happens-before relation (remember that instructions of each thread separately are linearly ordered). │ Critical Section │ │ • any section of code that must «not» be «interrupted» │ • the statement ‹x = x + 1› «could» be a critical section │ • what is a critical section is «domain-dependent» │ ◦ another example could be a bank transaction │ ◦ or an insertion of an element into a linked list If we want the result of the above program to be always 0, we must demand that each of the two statements is an instruction sequence that belongs to a common «critical section»: that is, a span of code that must not be interrupted by any instruction that belongs to the same critical section. In this case, this means that once either of the ‹load› instructions starts executing, the thread which did so must first get all the way to the corresponding ‹store› before the other thread can execute its ‹load›. Critical sections do not modify the happens-before relation: the critical sections, as units, can happen in either order (they are still concurrent). However, the entirety of the critical section behaves, with regard to other instances of the same critical section, as a single atomic instruction. │ Race Condition: Definition │ │ • (anomalous) behaviour that «depends on timing» │ • typically among «multiple threads» or processes │ • an «unexpected sequence» of events happens │ • recall that ordering is not guaranteed We can now attempt a definition: a «race condition» is a behaviour which 1. depends on timing (in the sense that it depends on a specific ordering of concurrent events), 2. was not anticipated by the programmer. With the above definition, a race condition can be benign, and the term is often used in this sense. In our study of the phenomenon, however, it is more useful to also include a third clause: 3. the behaviour is erroneous. Race conditions often lurk in places where there is a lot of concurrency: threaded programs are a prime example (almost all instructions are concurrent with a large number of other instructions coming from other threads). However, it is important to keep in mind, that some form of communication is necessary for a race condition to arise – concurrency alone is not sufficient. This is because «completely concurrent» processes cannot influence each other, and hence their behaviour cannot depend on the particular ordering of the events that are concurrent among them. │ Races in a Filesystem │ │ • the file system is also a «shared resource» │ • and as such, prone to race conditions │ • e.g. two threads both try to create the «same file» │ ◦ what happens if they both succeed? │ ◦ if both write data, the result will be garbled Since file system is a resource that is shared by otherwise unrelated processes, it creates an environment prone to race conditions: there is ample concurrency, but also a lot of (sometimes accidental) communication. │ Mutual Exclusion │ │ • context: only one thread can access a resource at once │ • ensured by a «mutual exclusion device» (a.k.a «mutex») │ • a mutex has 2 operations: ‹lock› and ‹unlock› │ • ‹lock› may need to wait until another thread ‹unlocks› Now that we have basic understanding of the problems associated with concurrency, let's look at some of the available remedies. 
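As a concrete preview of the ‹lock› / ‹unlock› interface, here is a minimal sketch (assuming POSIX threads, with error handling omitted) of the shared counter from the earlier example, with both updates guarded by a single mutex, so that the program always prints 0:

    #include <pthread.h>
    #include <stdio.h>

    int i = 0;
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

    /* both load-modify-store sequences are now critical sections guarded
     * by the same mutex, hence they can no longer interleave */
    void *thread1( void *arg )
    {
        pthread_mutex_lock( &mutex );
        i = i + 1;
        pthread_mutex_unlock( &mutex );
        return NULL;
    }

    void *thread2( void *arg )
    {
        pthread_mutex_lock( &mutex );
        i = i - 1;
        pthread_mutex_unlock( &mutex );
        return NULL;
    }

    int main()
    {
        pthread_t t1, t2;
        pthread_create( &t1, NULL, thread1, NULL );
        pthread_create( &t2, NULL, thread2, NULL );
        pthread_join( t1, NULL );
        pthread_join( t2, NULL );
        printf( "%d\n", i ); /* always 0 */
    }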
The first, and in some sense simplest, synchronisation primitive is a «mutual exclusion device», designed to enforce «critical sections». It is often the case that the critical section is associated with a «resource»: that is, all the code between each «acquisition» and the corresponding «release» of the resource is part of a single critical section. This essentially means that one of the steps of the acquisition process is «locking a mutex» and, analogously, the mutex is «unlocked» as part of the release process associated with the resource.

│ Semaphore
│
│  • somewhat «more general» than a mutex
│  • allows «multiple» interchangeable «instances of a resource»
│    ◦ consider there are N identical printers
│    ◦ then N processes can be printing at any given time
│  • basically an atomic counter

Semaphores are, in a sense, a generalisation of a mutex – however, they are only useful when dealing with resources (i.e. a semaphore can only be used to guard a general critical section in the case where it is exactly equivalent to a mutex). The difference between a mutex and a semaphore is that multiple threads can enter the section of code guarded by a semaphore, but the number is capped by a constant. If the constant is set to 1, then the semaphore is the same as a mutex.

│ Monitors
│
│  • a programming «language» device (not OS-provided)
│  • internally uses standard «mutual exclusion»
│  • data of the monitor is only accessible to its methods
│  • only «one thread» can enter the monitor at once

Essentially, a monitor is a fully encapsulated «object» (as defined in object oriented programming), with one additional restriction: each instance of a monitor has a critical section associated with it, and each of its methods is, in its entirety, part of this critical section. As long as each public method leaves the object in a consistent state (which is normally required of all objects), a monitor is automatically thread-safe, that is, multiple threads can call its methods without risking race conditions.

│ Condition Variables
│
│  • what if the monitor needs to «wait» for something?
│  • imagine a bounded queue implemented as a monitor
│    ◦ what happens if it becomes «full»?
│    ◦ the writer must be «suspended»
│  • condition variables have ‹wait› and ‹signal› operations

Since critical sections are often associated with communication, it may happen that code currently in a critical section cannot proceed until some other thread performs a particular action. This use case is served by another synchronisation device, known as a «condition variable»: the variable can be «waited for», which blocks the calling thread, and «signalled», which wakes up the waiting thread and allows it to proceed.

│ Spinlocks
│
│  • a «spinlock» is the simplest form of a «mutex»
│  • the ‹lock› method repeatedly tries to acquire the lock
│    ◦ this means it is taking up «processor time»
│    ◦ also known as «busy waiting»
│  • spinlock contention on the «same CPU» is very «bad»
│    ◦ but can be very efficient «between» CPUs

A spinlock is a particularly simple «implementation» of the abstract mutex concept. The state of the device is a single bit; the lock operation uses an atomic compare and swap to change the bit from 0 to 1 «atomically» (i.e. in such a way that the operation fails in case some other thread succeeded in changing it to 1 while the operation was running), looping until it succeeds. The unlock operation simply resets the bit to 0.
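A minimal sketch of such a spinlock, using C11 atomics (here ‹atomic_flag_test_and_set› plays the role of the atomic read-modify-write described above; memory ordering details are left at their defaults):

    #include <stdatomic.h>

    typedef struct { atomic_flag bit; } spinlock_t;
    #define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

    void spin_lock( spinlock_t *l )
    {
        /* atomically set the bit and obtain its previous value: a previous
         * value of 1 means some other thread holds the lock, keep trying */
        while ( atomic_flag_test_and_set( &l->bit ) )
            ; /* busy wait */
    }

    void spin_unlock( spinlock_t *l )
    {
        atomic_flag_clear( &l->bit ); /* reset the bit to 0 */
    }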
There are a few upsides: the implementation is very simple, the state is very compact and the latency is as good as it gets, assuming that the contention is against another CPU core. The uncontended case is pretty much optimal. There is, however, one serious (and in many scenarios, fatal) drawback: contention on the same CPU core has the worst possible behaviour (maximal latency and minimal throughput). Another important problem, though less severe, is that throughput strongly depends on the average length of the protected critical section: locks that are only held for a short time perform well, but longer critical sections will waste significant resources by holding up a CPU core in the busy-waiting loop. Overall, spinlocks are useful for protecting critical sections which are short and which are guaranteed to be contended by threads running on different CPU cores. This often happens inside the kernel, but only rarely in user-space programs. │ Suspending Mutexes │ │ • these need cooperation from the OS «scheduler» │ • when lock acquisition fails, the thread «sleeps» │ ◦ it is put on a «waiting» queue in the scheduler │ • «unlocking» the mutex will «wake up» the waiting thread │ • needs a system call → «slow» compared to a spinlock A different, much more complicated, but also much more versatile, implementation of a mutex is a «suspending mutex». The main difference lies in the contended case of the «lock» operation: whenever there is an attempt to lock an already-locked mutex, the unlucky thread is suspended and placed on a wait queue of the scheduler. The «unlock» operation, of course, needs to be augmented to check if there are any threads waiting for the mutex, and if yes, wake up one of them. The interaction with the scheduler means that both the «lock» and «unlock» operations must perform a system call, at least in some cases. Compared to atomic instructions, system calls are much more expensive, increasing the overhead of the mutex considerably. A common implementation technique uses a combination of a spinlock and a suspending mutex: the lock operation runs a small number of loops trying to acquire the mutex, and suspends the thread if this fails. This approach combines the good behaviours of both, but is the most complicated implementation-wise, and the state of the mutex is also at least as big as that of a suspending mutex. │ Condition Variables Revisited │ │ • same principle as a «suspending» mutex │ • the waiting thread goes into a wait queue │ • ‹signal› moves the thread back to a run queue │ • the busy-wait version is known as «polling» The usual way to implement a condition variable is to interact with the scheduler, allowing the waiting thread to free up the CPU core it was occupying. A busy-waiting, spinlock-like alternative is possible, though not commonly used. │ Barrier │ │ • sometimes, «parallel» computation proceeds in «phases» │ ◦ «all» threads must finish phase 1 │ ◦ before «any» can start phase 2 │ • this is achieved with a barrier │ ◦ blocks all threads until the «last one arrives» │ ◦ waiting threads are usually «suspended» Another synchronisation device that we will consider is a barrier. While the devices that we have described so far can be used in scenarios with more than two threads, their behaviour is always governed by pairwise interactions. A barrier is different: it synchronises a number of threads at a single point. Only when all the participating threads gather at the barrier are they allowed to continue. 
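A minimal sketch of the two-phase pattern described above, assuming the (optional) POSIX barrier interface ‹pthread_barrier_t› is available; all ‘phase 1’ lines are printed before any ‘phase 2’ line:

    #include <pthread.h>
    #include <stdio.h>

    #define THREADS 4
    pthread_barrier_t barrier;

    void *worker( void *arg )
    {
        long id = ( long ) arg;
        printf( "thread %ld: phase 1\n", id );
        pthread_barrier_wait( &barrier ); /* blocks until all THREADS arrive */
        printf( "thread %ld: phase 2\n", id );
        return NULL;
    }

    int main()
    {
        pthread_t t[ THREADS ];
        pthread_barrier_init( &barrier, NULL, THREADS );

        for ( long i = 0; i < THREADS; ++i )
            pthread_create( &t[ i ], NULL, worker, ( void * ) i );
        for ( int i = 0; i < THREADS; ++i )
            pthread_join( t[ i ], NULL );

        pthread_barrier_destroy( &barrier );
    }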
│ Readers and Writers
│
│  • imagine a «shared database»
│  • many threads can read the database at once
│  • but if one is writing, no other can read nor write
│  • what if there are always some readers?

Let us consider another synchronisation problem, again involving more than two threads: there is a piece of data (a database if you like) that many threads need to consult, and sometimes update. The naive solution would be to simply wrap all access to the data structure in a critical section: this is certainly correct, but wasteful, since multiple readers do not interact with each other, but still need to queue up to interact with the data. The synchronisation device that solves the problem is called an rwlock, or read-write lock, with 3 operations: «lock for reading», «lock for writing» and «unlock». The invariant is that if the rwlock is locked for writing, it is locked by exactly one thread. Otherwise, there might be multiple threads holding a read lock (which of course prevents any write locks from being taken until the read locks are all released).

│ Read-Copy-Update
│
│  • the fastest lock is «no lock»
│  • RCU allows «readers» to work while «updates» are done
│    ◦ make a copy and «update» the copy
│    ◦ point «new readers» to the updated copy
│  • when is it safe to «reclaim memory»?

In some cases, the readers & writers scenario can be solved without taking any locks at all (not even a critical section is required). The way this works is that a writer, instead of modifying the data in place (and therefore interfering with concurrent reads), makes a copy of the data (which is a read operation, and hence can be safely performed concurrently), performs the updates in this new copy of the data, and then instructs all future readers to use this copy instead, usually by updating a pointer.

## Deadlocks and Starvation

│ Dining Philosophers
│
│  ╭────────────╮  ┌──────┐  ╭────────────╮
│  │ Aristotle  │  │ fork │  │   Plato    │
│  ╰────────────╯  └──────┘  ╰────────────╯
│  ┌──────┐                      ┌──────┐
│  │ fork │         bowl         │ fork │
│  └──────┘                      └──────┘
│  ╭────────────╮  ┌──────┐  ╭────────────╮
│  │ Pythagoras │  │ fork │  │  Socrates  │
│  ╰────────────╯  └──────┘  ╰────────────╯

In the ‘dining philosophers’ problem, there is a bowl of food in the middle of a table, a number of philosophers seated around the table, and between each pair of philosophers, there is a single fork. To eat, each philosopher needs to use two forks at the same time. The philosophers sit and think, but when they become hungry, they pick up the forks and eat. This scenario is used to illustrate the problem of deadlocks and starvation in concurrent systems. Can you come up with instructions for the philosophers that would ensure that each hungry philosopher gets to eat (in finite time)? Consider why simple algorithms fail.

│ Shared Resources
│
│  • hardware comes in a «limited» number of «instances»
│  • many devices can only do «one» thing at a time
│  • think «printers», DVD writers, tape drives, ...
│  • we want to use the devices «efficiently» → «sharing»
│  • resources can be «acquired» and «released»

The most natural setting to contemplate deadlocks is one in which multiple threads (and/or processes) compete for finite «resources». In this case, a resource is an abstract object, which can be acquired, used, and released. A resource cannot be used before it has been acquired, and can no longer be used after being released. The thread that has acquired (but not yet released) a resource is said to «own» it.
│ Network-based Sharing
│
│  • «sharing» is not limited to processes on one computer
│  • printers and scanners can be «network-attached»
│  • the entire network may need to «coordinate access»
│    ◦ this could lead to multi-computer «deadlocks»

In practice, some resources are ‘remote’ – made available to processes on computer A by another computer B attached to the same network. This does not make any practical difference to our consideration (other than perhaps realizing that a single deadlock may span multiple computers).

│ Locks as Resources
│
│  • we explored «locks» in the previous section
│  • locks (mutexes) are also a form of «resource»
│    ◦ a mutex can be acquired (locked) and released
│    ◦ a locked mutex «belongs» to a particular thread
│  • locks are «proxy» (stand-in) resources

In the following, we will consider locks (or even critical sections as such) to be simply a form of resource: the operations on an abstract resource neatly map to operations on an (abstract) lock.

│ Preemptable Resources
│
│  • sometimes, held resources can be «taken away»
│  • this is the case with e.g. «physical memory»
│    ◦ a process can be «swapped» to disk if need be
│  • preemptability may also depend on «context»
│    ◦ maybe paging is not available

Some resources can be taken away from their owner temporarily, without causing ill effects, or at an acceptable cost (in terms of latency, loss of throughput, or resource-specific considerations, such as waste of material). A canonical example would be a «page of memory»: even though it is acquired by a particular process, the system might take it away without cooperation from the process, and transparently give it back whenever it is actually used (this technique is known as swapping).

│ Non-preemptable Resources
│
│  • those resources «cannot» be (easily) taken away
│  • think photo printer in the middle of a page
│  • or a DVD burner in the middle of writing
│  • «non»-preemptable resources can cause «deadlocks»

Many resources are, however, practically non-preemptable; that is, once acquired by a thread or a process, they cannot be unilaterally taken away without considerable damage, e.g. killing the owning process or irreparably damaging the output it was producing on the device.

│ Resource Acquisition
│
│  • a process needs to «request access» to a resource
│  • this is called an «acquisition»
│  • when the request is granted, it can use the device
│  • after it is done, it must «release» the device
│    ◦ this makes it available for other processes

Now that we have discussed resources in a bit more detail, let's just quickly revisit the protocol for their acquisition and release. In the following, we operate on the assumption that only a limited number of processes can own a particular resource at any given time (most often just one, in fact).

│ Waiting
│
│  • what to do if we wish to «acquire» a «busy» resource?
│  • unless we don't really need it, we have to «wait»
│  • this is the same as waiting for a «mutex»
│  • the thread is moved to a wait queue

We will also assume that acquisition is «blocking»: if a resource is busy (owned by another process), the process will wait until it is released before continuing execution. This does not always need to be the case, but it is the most common approach.

│ Resource Deadlock
│
│  • two resources, A and B
│  • two threads (processes), P and Q
│  • P «acquires» A, Q «acquires» B
│  • P tries to «acquire» B but has to «wait» for Q
│  • Q tries to «acquire» A but has to «wait» for P

Now we are finally equipped to look at (resource) «deadlocks».
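Using a pair of mutexes as the two resources, the scenario from the slide can be reproduced directly; a minimal sketch (POSIX threads; the ‹sleep› calls merely make the unlucky interleaving practically certain):

    #include <pthread.h>
    #include <unistd.h>

    pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER,
                    B = PTHREAD_MUTEX_INITIALIZER;

    void *P( void *arg )
    {
        pthread_mutex_lock( &A );
        sleep( 1 );               /* give Q time to acquire B */
        pthread_mutex_lock( &B ); /* blocks forever */
        pthread_mutex_unlock( &B );
        pthread_mutex_unlock( &A );
        return NULL;
    }

    void *Q( void *arg )
    {
        pthread_mutex_lock( &B );
        sleep( 1 );               /* give P time to acquire A */
        pthread_mutex_lock( &A ); /* blocks forever */
        pthread_mutex_unlock( &A );
        pthread_mutex_unlock( &B );
        return NULL;
    }

    int main()
    {
        pthread_t p, q;
        pthread_create( &p, NULL, P, NULL );
        pthread_create( &q, NULL, Q, NULL );
        pthread_join( p, NULL );  /* never returns */
    }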
We will look at the simplest possible case, with two threads and two resources. After the above sequence of events, there can be no further progress: without an outside intervention, both P and Q will be blocked forever. This is what we call a «deadlock». Please note that it is «usually» the case that both acquisitions in P are concurrent with both acquisitions in Q. Of course, the above situation can be generalized to any number of threads and resources (as long as there are at least 2 of each). │ Resource Deadlock Conditions │ │ 1. mutual exclusion │ 2. hold and wait condition │ 3. non-preemtability │ 4. circular wait │ │ Deadlock is only possible if all 4 are present. A number of sound design decisions conspire to create the conditions for a deadlock to occur. It is certainly natural that a resource should only be used by a single thread at any given time – that is, after all, the nature of resources. The «hold and wait» condition arises when a single thread can request multiple resources at once, performing the acquisitions in a sequence. It is hard to object here too, since it fits the linear nature of most programs – the program (or rather each thread of a program) is essentially a sequence of instructions, and each instruction is only performed after the previous has finished. Resource acquisition cannot finish while a resource is busy, and the only way to have more than a single resource is to acquire each in turn. In some cases, non-preemptability is not even negotiable: it is a property of the resource in question. There is some maneuvering space with critical sections that cause no interaction with any external entity (resource, other threads) with the exception of their associated lock. In this case, it might be possible to roll back the effects of the critical section and attempt to restart it. This is, however, not easy to achieve. Let us consider a static «resource dependency graph», in which nodes are resources and edges indicate that a hold-and-wait condition exists between them, that is, an edge A → B is present in the graph if there is a thread which attempts to acquire B while holding A. The «circular wait» condition is satisfied whenever there is a cycle in this graph. │ Non-Resource Deadlocks │ │ • not all deadlocks are due to «resource» contention │ • imagine a «message-passing» system │ • process A is «waiting» for a message │ • process B sends a message to A and «waits» for reply │ • the message is «lost» in transit Quite importantly, the above four conditions only pertain to «resource deadlocks»: there are other types of deadlocks with a different set of conditions. This means that even if we can eliminate one of the 4 conditions, and hence prevent resource deadlocks, this does not mean, unfortunately, that our system will be deadlock-free. │ Example: Pipe Deadlock │ │ • recall that both the «reader» and «writer» can «block» │ • what if we create a pipe in «each direction»? │ • process A writes data and tries to read a reply │ ◦ it blocks because the «opposite» pipe is «empty» │ • process B reads the data but «waits for more» → deadlock A typical example of a non-resource deadlock has to do with pipes: remember that an empty pipe blocks on read, while a full pipe blocks on write. There is a number of ways this can go wrong, if there is more than one pipe involved. You can probably spot similarities between this type of deadlock, and the resource deadlock we have discussed at length. │ Deadlocks: Do We Care? 
│
│  • deadlocks can be very «hard to debug»
│  • they can also be exceedingly «rare»
│  • we may find the risk of a deadlock «acceptable»
│  • just «reboot» everything if we hit a deadlock
│    ◦ also known as the ostrich algorithm

Many (probably most) deadlocks arise from race conditions, and hence will not happen very often. The so-called «ostrich algorithm» takes advantage of this rarity: if a deadlock happens, kill all the involved processes, or possibly just reboot the entire system. Of course, deciding that a deadlock has happened can be tricky.

│ Deadlock Detection
│
│  • we can at least try to «detect» deadlocks
│  • usually by checking the «circular wait» condition
│  • keep a graph of ownership vs waiting
│  • if there is a «loop» in the graph → «deadlock»

One way to detect deadlocks is to use a dynamic counterpart of the static «circular wait» condition: in this case, the dependency graph has two types of nodes, threads and resources. Edges are always between a thread T and a resource R: R → T exists if thread T currently owns resource R, and T → R exists if thread T is waiting for the acquisition of R. If a cycle exists in this graph, the system is in a deadlock (and the threads which lie on the cycle are all blocked).

Let's revisit the example with two threads, P and Q, and two resources, A and B, that we saw earlier:

 1. P successfully «acquires» A (giving rise to edge A → P)
 2. likewise, Q «acquires» B, meaning B → Q
 3. P tries to «acquire» B but has to «wait», hence P → B
 4. Q tries to «acquire» A but has to «wait», hence Q → A

This is the resulting graph, with an obvious cycle in it:

    ┌───┐  ╭───╮
    │ P │◀─│ A │
    └───┘  ╰───╯
      ▼      ▲
    ╭───╮  ┌───┐
    │ B │─▶│ Q │
    ╰───╯  └───┘

│ Deadlock Recovery
│
│  • if a preemptable resource is involved, «reassign» it
│  • otherwise, it may be possible to do a «rollback»
│    ◦ this needs elaborate «checkpointing» mechanisms
│  • all else failing, «kill» some of the processes
│    ◦ the devices may need to be «re-initialised»

A deadlock involving at least one preemptable resource can still happen, but unlike in the ‘standard’ case, it is possible to recover and continue computation without forcibly terminating any of the threads or processes involved. Instead, one of the resources is preempted, breaking the cycle and allowing the system to make progress again. If the deadlock instead includes a restartable critical section (as outlined earlier), then this critical section can be rolled back and restarted, again allowing other threads to make progress. Finally, if all else fails, the system can pick a victim thread on the cycle and terminate it, releasing any resources it might have been holding. Like the other cases, this allows the system to progress again.

│ Deadlock Avoidance
│
│  • we can possibly «deny acquisitions» to avoid deadlocks
│  • must know the «maximum» resources for each process
│  • avoidance relies on «safe states»
│    ◦ worst case: «all processes» ask for «maximum resources»
│    ◦ «safe» means deadlocks are «avoided» in the «worst» case

In general, deadlock avoidance is an approach where the system will deny resource acquisitions if they might later lead to a deadlock. The best-known instance of deadlock avoidance is Banker's Algorithm, invented by E. Dijkstra. This algorithm applies in cases where each resource has a number of fungible instances greater than any single thread might request at once. The input to the algorithm is the maximal number of instances of a resource that can be requested by each thread. The «safety» invariant is this:
 1. there is a thread T, such that the available resources are sufficient to satisfy its maximal resource allocation,
 2. when the thread T terminates (returning all its currently allocated resources), the invariant still holds.

Any resource acquisition which would violate this invariant is denied. The initial conditions imply that, at the outset, the system is in a safe state; since every granted acquisition keeps the state safe, and a safe state always leaves at least one thread able to run to completion, no deadlock can ever arise. The algorithm assumes that nothing apart from resource acquisitions might block any of the threads.

The following table illustrates the algorithm (P, Q and R are threads, the numbers are resources held / maximal allocation):

    │ step │  P  │  Q  │  R  │ available │ action        │
    ├──────┼─────┼─────┼─────┼───────────┼───────────────┤
    │  1   │ 0/3 │ 0/2 │ 0/4 │    5/5    │ P acquires 1  │
    │  2   │ 1/3 │ 0/2 │ 0/4 │    4/5    │ R acquires 2  │
    │  3   │ 1/3 │ 0/2 │ 2/4 │    2/5    │ Q acquires 1  │
    │  4   │ 1/3 │ 1/2 │ 2/4 │    1/5    │ P is denied 1 │
    │  5   │ 1/3 │ 1/2 │ 2/4 │    1/5    │ Q acquires 1  │
    │  6   │ 1/3 │ 2/2 │ 2/4 │    0/5    │ Q terminates  │
    │  7   │ 1/3 │  –  │ 2/4 │    2/5    │ P acquires 1  │
    │  8   │ 2/3 │  –  │ 2/4 │    1/5    │               │

Notice that in step 4, the request by P has been denied: granting it would leave 0 instances available, while P, Q and R might still need up to 1, 1 and 2 more instances respectively, so no thread would be guaranteed to be able to make sufficient progress to release resources.

│ Deadlock Prevention
│
│  • deadlock avoidance is typically «impractical»
│  • there are 4 «conditions» for deadlocks to exist
│  • we can try attacking those conditions
│  • if we can remove one of them, deadlocks are «prevented»

Deadlock avoidance is, unfortunately, not very practical in general-purpose operating systems: it is often the case that a particular resource has exactly one instance available, and the above algorithm would then force all programs which might use the resource to run one after another. Moreover, the assumption that there are no other interactions between threads is not very realistic either.

│ Prevention via Spooling
│
│  • this attacks the «mutual exclusion» property
│  • multiple programs could write to a printer
│  • the data is «collected» by a spooling daemon
│  • which then sends the jobs to the printer «in sequence»

This approach can trade a deadlock on the printer for a deadlock on disk space. However, disk space is much more likely to be preemptable in this scenario, since a job blocked by a full disk can be canceled (and erased from disk) and later retried (unlike a half-printed page).

│ Prevention via Reservation
│
│  • we can also try removing «hold-and-wait»
│  • for instance, we can only allow «batch acquisition»
│    ◦ the process must request everything at once
│    ◦ this is usually «impractical»
│  • alternative: release and «re-acquire»

It is certainly conceivable that we might require that all resource acquisitions are done in batches, always requesting, as a single atomic action, all the resources required in a particular section of code. The resources may be released one at a time. If an additional resource is required while other resources are already held, the program must first release everything it holds before requesting the new batch.
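One possible realisation of the ‘release and re-acquire’ rule uses the non-blocking ‹pthread_mutex_trylock›: a sketch (POSIX threads) in which a thread that cannot get its second lock immediately gives the first one back and starts over, instead of holding it while waiting:

    #include <pthread.h>
    #include <sched.h>

    pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER,
                    B = PTHREAD_MUTEX_INITIALIZER;

    /* acquire both A and B without ever blocking while holding one of them */
    void lock_both( void )
    {
        for ( ;; )
        {
            pthread_mutex_lock( &A );
            if ( pthread_mutex_trylock( &B ) == 0 )
                return;                 /* got both */
            pthread_mutex_unlock( &A ); /* give A back and retry */
            sched_yield();              /* let other threads run first */
        }
    }

    void unlock_both( void )
    {
        pthread_mutex_unlock( &B );
        pthread_mutex_unlock( &A );
    }

Since no thread ever waits while holding a lock, the hold-and-wait condition is eliminated; the price, as discussed below, is the possibility of a livelock.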
│ Prevention via Ordering │ │ • this approach eliminates «circular waits» │ • we impose a «global order» on «resources» │ • a process can only acquire resources «in this order» │ ◦ must release + «re-acquire» if the order is wrong │ • it is impossible to form a cycle this way Finally, the perhaps most practical approach «within a single program» (or another closed system, but not so much in an operating system) is to impose a global order on resource acquisition, which each thread must observe. This implies that the static «resource dependency graph» is acyclic and deadlocks cannot occur. │ Livelock │ │ • in a deadlock, no progress can be made │ • but it's not much better if processes go back and forth │ ◦ for instance releasing and re-acquiring resources │ ◦ they make no «useful» progress │ ◦ they additionally consume resources │ • this is a «livelock» and is just as bad as a deadlock In a deadlock, the blocked threads are all usually suspended (not runnable). If some or all of the threads involved in a deadlock-like condition are active (busy waiting, polling or otherwise attempting an action which is bound to fail forever), we talk about a livelock: threads are executing instructions, but progress is not being made. │ Starvation │ │ • starvation happens when a process «can't» make «progress» │ • «generalisation» of both «deadlock» and «livelock» │ • for instance, «unfair» scheduling on a busy system │ • also recall the «readers and writers» problem Finally, the most general notion is «starvation», which is a condition when a thread cannot make progress, for whatever reason: it is blocked on a resource, it is denied processor time, etc. An example that is neither a deadlock nor a livelock would arise in a naive solution to the readers & writers problem from earlier: if there are sufficiently many readers so that there is always a read lock on the resource, writers will be blocked indefinitely, and hence starved. │ Review Questions │ │ 21. What is a mutex? │ 22. What is a deadlock? │ 23. What are the conditions for a deadlock to form? │ 24. What is a race condition?