# System Libraries and APIs In this section, we will study the programming interfaces of operating systems, first in some generality, without a specific system in mind. We will then go on to deal specifically with the C-language interface of POSIX systems. │ Programming Interfaces │ │ • kernel «system call» interface │ • → «system libraries» / APIs ← │ • inter-process protocols │ • command-line utilities (scripting) In most operating systems, the lowest-level interface accessible to application programs is the «system call» interface. It is, typically, specified in terms of a machine-language-level protocol (that is, an ABI), but usually also provided as a C API. This is the case for POSIX-mandated system calls, but also on e.g. Windows NT systems. │ Lecture Overview │ │ 1. The C Programming Language │ 2. System Libraries │ ◦ what is a library? │ ◦ header files & libraries │ 3. Compiler & Linker │ ◦ object files, executables │ 4. File-based APIs In this lecture, we will start by reviewing (or perhaps introducing) the C programming language. Then we will move on to the subject of libraries in general and system libraries in particular. We will look at how libraries enter the program compilation process and what other ingredients there are. Finally, we will have a closer look at a specific set of file-based programming interfaces. │ Sidenote: UNIX and POSIX │ │ • we will mostly use those terms interchangeably │ • it is a «family» of operating systems │ ◦ started in late 60s / early 70s │ • POSIX is a «specification» │ ◦ a document describing what the OS should provide │ ◦ including programming interfaces │ │ We will «assume POSIX» unless noted otherwise Before we begin, it should be noted that throughout this course, we will use POSIX and UNIX systems as examples. If a specific function or interface is mentioned without further qualification, it is assumed to be specified by POSIX and implemented by UNIX-like systems. 
## The C Programming Language The C programming language is one of the most commonly used languages in operating system implementations. It is also the subject of PB071, and at this point, you should already be familiar with its basic syntax. Likewise, you are expected to understand the concept of a «function» and other basic building blocks of programs. Even if you don't know the specific C syntax, the idea is very similar to any other programming language you might know. │ Programming Languages │ │ • there are many different languages │ ◦ C, C++, Java, C#, ... │ ◦ Python, Perl, Ruby, ... │ ◦ ML, Haskell, Agda, ... │ • but «C» has a «special place» in most OSes Different programming languages have different use-cases in mind, and exist at different levels of abstraction. Most languages other than C that you will meet, both at the university and in practice, are so-called high-level languages. There are quite a few language families, and there are a number of higher-level languages derived from C, like C++, Java or C#. For the purposes of this course, we will mostly deal with plain C, and with POSIX (Bourne-style) «shell», which can also be thought of as a programming language. │ C: The Least Common Denominator │ │ • except for assembly, C is the “bare minimum” │ • you can almost think of C as «portable assembly» │ • it is very easy to call C functions │ • and to use C data structures │ │ You can use C libraries in almost every language You could think of C as a ‘portable assembler’, with a few minor bells and whistles in the form of the standard library. Apart from this library of basic and widely useful subroutines, C offers these main advantages over assembly: abstraction from machine opcodes (with human-friendly infix operator syntax), structured control flow, and automatic local variables.
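These advantages are easy to see in even a trivial function (a made-up example, not taken from any particular system):

```c
/* Structured control flow and automatic local variables: in assembly,
 * the loop below would require hand-picked registers or stack slots,
 * explicit compare and branch instructions, and manually chosen labels. */
int sum_to( int n )
{
    int total = 0;                   /* automatic (stack-allocated) variable */
    for ( int i = 1; i <= n; ++i )   /* structured loop, infix operators */
        total += i;
    return total;
}
```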
In particular, the abstraction over the target processor and its instruction set proved to be instrumental in early operating systems, and helped establish the idea that an operating system is an entity separate from the hardware. On top of that, C is also popular as a systems programming language because almost any program, regardless of what language it is written in, can quite easily call C functions and use C data structures. │ The Language of Operating Systems │ │ • many (most) kernels are «written in C» │ • this usually extends to system libraries │ • and sometimes to almost the entire OS │ • non-C operating systems provide «C APIs» Consequently, C has essentially become a ‘language of operating systems’: most kernels, and even the bulk of most operating systems, are written in C. Each operating system (apart from perhaps a few exceptions) provides a C standard library in some form and can execute programs written in C (and more importantly, provide them with essential services). ## System Libraries We have already touched on the topic of system libraries last week, in the ‘anatomy’ section. It is now time to look at them in more detail: what they contain, how they are stored in the file system, and how they are combined with programs. We will also briefly talk about system call wrappers (which mediate low-level access to kernel services – we will discuss this topic in more detail in the next lecture). Finally, we will look at a few examples of system libraries which appear in popular operating systems. │ (System) Libraries │ │ • mainly «C functions» and «data types» │ • interfaces defined in «header files» │ • definitions provided in «libraries» │ ◦ static libraries (archives): ‹libc.a› │ ◦ shared (dynamic) libraries: ‹libc.so› │ • on Windows: ‹msvcrt.lib› and ‹msvcrt.dll› │ • there are (many) more besides ‹libc› / ‹msvcrt› In this course, when we talk about libraries, we will mean C libraries specifically – not Python or Haskell modules, which are quite different.
That said, a typical C library has basically two parts: the header files, which provide a description of the interface (the API), and the compiled library code (an archive or a shared library). The interface (as described in header files) consists of functions (for which the types of arguments and the type of return value are given in a header file) and of data structures. The bodies of the functions (their implementation) are what makes up the compiled library code. To illustrate: │ Declaration: «what» but not «how» │ │ int sum( int a, int b ); /* C */ │ │ Definition: «how» is the operation done? │ │ int sum( int a, int b ) /* C */ │ { │ return a + b; │ } The first example on this slide is a declaration: it tells us the name of a function, its inputs and its output. The second example is called a «definition» (or sometimes a «body») of the function and contains the operations to be performed when the function is called. │ Library Files │ │ • ‹/usr/lib› on most Unices │ ◦ may be mixed with «application libraries» │ ◦ especially on Linux-derived systems │ ◦ also ‹/usr/local/lib› for user/app libraries │ • on Windows: ‹C:\Windows\System32› │ ◦ user libraries often «bundled» with programs The machine code that makes up the library (i.e. the code that was generated from function definitions) resides in files. Those files are what we usually call ‘libraries’ and they usually live in a specific filesystem location. On most UNIX systems, those locations are ‹/usr/lib› and possibly ‹/lib› for system libraries and ‹/usr/local/lib› for user or application libraries. On certain systems (especially Linux-based), user libraries are mixed with system libraries and they are all stored in ‹/usr/lib›. On Windows, the situation is similar in that both system and application libraries are installed in a common location. Additionally, on Windows (and on macOS), shared libraries are often installed alongside the application.
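To sketch how the two halves of a library meet in practice (the file names in the comments are made up for illustration): application code is compiled against the declaration alone, and the definition is supplied later, by the linker, from the library.

```c
/* sum.h – the header file: a declaration only (the API) */
int sum( int a, int b );

/* main.c – application code, compiled knowing only the declaration;
 * the definition will be supplied by the linker, from the library */
int twice( int x )
{
    return sum( x, x );
}

/* sum.c – the library author's code, compiled into libsum.a / libsum.so */
int sum( int a, int b )
{
    return a + b;
}
```

In reality, the three parts live in separate files (and the last one in a library); they are shown together here only to keep the sketch self-contained.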
│ Static Libraries │ │ • stored in ‹libfile.a›, or ‹file.lib› (Windows) │ • only needed for «compiling» (linking) programs │ • the code is «copied» into the executable │ • the resulting executable is also called «static» │ ◦ and is easier to work with for the OS │ ◦ but also more wasteful Static libraries are only used when building executables and are not required for normal operation of the system. Therefore, many operating systems do not install them by default – they have to be installed separately as part of the developer kit. When a static library is linked into a program, this basically entails copying the machine code from the library into the final executable. In this scenario, after linking is performed, the library is no longer needed since the executable contains all the code required for its execution. For system libraries, this means that the code that comes from the library is present on the system in many copies, once in each program that uses the library. This is somewhat alleviated by linkers only copying the parts of the library that are actually needed by the program, but there is still substantial duplication. The duplication arising this way does not only affect the file system, but also memory (RAM) when those programs are loaded – multiple copies of the same function will be loaded into memory when such programs are executed. │ Shared (Dynamic) Libraries │ │ • required for «running» programs │ • linking is done at «execution» time │ • less code duplication │ • can be «upgraded» separately │ • but: dependency problems The other approach to libraries is «dynamic», or «shared» libraries. In this case, the library is required to actually run the program: the linker does not copy the machine code from the library into the executable. Instead, it only notes that the library must be loaded alongside with the program when the latter is executed. This reduces code duplication, both on disk and in memory. 
It also means that the library can be updated separately from the application. This often makes updates easier, especially in case a library is used by many programs and is, for example, found to contain a security problem. In a static library, this would mean that each program that uses the library needs to be updated. A shared library can be replaced and the fixed code will be loaded alongside programs as usual. The downside is that it is difficult to maintain binary compatibility – to ensure that programs that were built against one version of the library also work with a later version. When this is violated, as often happens, people run into dependency problems (also known as DLL hell on Windows). │ Header Files │ │ • on UNIX: ‹/usr/include› │ • contains «prototypes» of C functions │ • and definitions of C data structures │ • required to «compile» C and C++ programs Like static libraries, header files are only required when building programs, but not when using them. Header files are fragments of C source code, and on UNIX systems are traditionally stored in ‹/usr/include›. User-installed header files (i.e. not those provided by system libraries) live under ‹/usr/local/include› (though again, on Linux-based systems user and system headers are often intermixed in ‹/usr/include›). │ Header Example 1 (from ‹unistd.h›) │ │ int execv(char *, char **); /* C */ │ pid_t fork(void); │ int pipe(int *); │ ssize_t read(int, void *, size_t); │ │ (and many more prototypes) This is an excerpt from an actual system header file, and declares a few of the functions that comprise the POSIX C API. │ Header Example 2 (from ‹sys/time.h›) │ │ struct timeval /* C */ │ { │ time_t tv_sec; │ long tv_usec; │ }; │ │ /* ... */ │ │ int gettimeofday(timeval *, timezone *); │ int settimeofday(timeval *, timezone *); This is another excerpt from an actual header – this time the snippet contains a definition of a «data structure». 
The layout (order of fields and their types, along with hidden «padding») of such structures is quite important, since that becomes part of the ABI. In other words, the definition above describes not just the high-level interface but also how bytes are laid out in memory. │ The POSIX C Library │ │ • ‹libc› – the C runtime library │ • contains ISO C functions │ ◦ ‹printf›, ‹fopen›, ‹fread› │ • and a number of POSIX functions │ ◦ ‹open›, ‹read›, ‹gethostbyname›, ... │ ◦ C wrappers for system calls As we have already mentioned previously, it is a tradition of UNIX systems that ‹libc› combines the basic C library and the basic POSIX library. For the following, a particular subset of the POSIX library is going to be rather important, namely the «system call wrappers». Those are C functions whose only purpose is to invoke their matching «system calls». │ System Calls: Numbers │ │ • system calls are performed at «machine level» │ • which syscall to perform is decided by a «number» │ ◦ e.g. ‹SYS_write› is 4 on OpenBSD │ ◦ numbers defined by ‹sys/syscall.h› │ ◦ different for each OS At the level of the OS kernel (cue next week), system calls are represented by «numbers» (which are often given symbolic names like ‹SYS_write›, but are nonetheless just small integers and not memory addresses like with ordinary C functions). The numbers are specific to any given kernel. And of course, the ‹libc› must use the same numbering as the kernel. │ System Calls: the ‹syscall› function │ │ • there is a C function called ‹syscall› │ ◦ prototype: ‹int syscall( int number, ... 
)› │ • this implements the «low-level» syscall sequence │ • it takes a «syscall number» and syscall parameters │ ◦ this is a bit like ‹printf› │ ◦ the first parameter decides what the other parameters are │ • (more about how ‹syscall()› works next week) Typically, all system calls work essentially the same way: the library takes the (syscall) number and some additional data (parameters), stores them at the pre-arranged location (registers, memory) and jumps into the kernel. Since this sequence is uniform across system calls, it is possible to have a single C function which can perform any system call, given its number. This function actually exists and is called ‹syscall›. It's entirely possible to perform all your syscalls using this one C function, and never call the more convenient single-purpose wrappers (see also below). │ System Calls: Wrappers │ │ • using ‹syscall()› directly is inconvenient │ • ‹libc› has a function for each system call │ ◦ ‹SYS_write› → ‹int write( int, char *, size_t )› │ ◦ ‹SYS_open› → ‹int open( char *, int )› │ ◦ and so on and so forth │ • those wrappers may use ‹syscall()› internally To make programming a fair bit more convenient, instead of saying syscall( SYS_write, fd, buffer, size ); /* C */ we can use a function called ‹write›, like this: write( fd, buffer, size ); /* C */ Besides being shorter to type, it is also safer: the compiler can check that we passed the right number and types of arguments. The function might internally use the equivalent ‹syscall()› invocation – though in practice, we prefer to sacrifice this particular bit of abstraction to save a few instructions on the comparatively hot (hot = one that is executed often) code path. That is, each syscall wrapper contains a copy of the code for entering the kernel, instead of calling ‹syscall›.
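Both routes can be tried side by side; a sketch (assuming a glibc-style system, where ‹sys/syscall.h› provides ‹SYS_write› and ‹unistd.h› declares ‹syscall›; the function names are our own):

```c
#define _DEFAULT_SOURCE  /* glibc: expose syscall() */
#include <sys/syscall.h> /* SYS_write and friends */
#include <unistd.h>      /* syscall(), write() */

/* the generic route: syscall number plus parameters */
long raw_write( int fd, const void *buf, size_t n )
{
    return syscall( SYS_write, fd, buf, n );
}

/* the same operation through the single-purpose wrapper */
long wrapped_write( int fd, const void *buf, size_t n )
{
    return write( fd, buf, n );
}
```

Both functions perform the same system call; the second is type-checked by the compiler at every call site, which is why the wrappers are preferred in real code.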
│ Portability │ │ • libraries provide an «abstraction layer» over OS internals │ • they are responsible for «application portability» │ ◦ along with standardised filesystem locations │ ◦ and user-space utilities to some degree │ • higher-level languages rely on system libraries An important function of libraries is to provide a uniform API to the upper layers of the system. The designers of an operating system may decide to substantially depart from the traditional system call protocol, or even from the traditional set of system calls. However, even if the kernel looks quite non-POSIX-y, it is often still possible to provide a set of C functions that behave as POSIX specifies. This has been done more than once, most often on top of microkernels, e.g. Microsoft NT (Windows NT, XP and later) or on Mach (macOS, HURD). All those systems are capable of supporting POSIX programs without being built around a UNIX-like monolithic kernel. Of course, the API alone is not sufficient to make POSIX programs work correctly: there are certain expectations about the filesystem (both semantics of the file system itself, but also which files exist and what they contain) and other aspects of the system. │ NeXTSTEP and Objective C │ │ • the NeXT OS was built around «Objective C» │ • system libraries had ObjC APIs │ • in API terms, ObjC is very «different from C» │ ◦ also very different from C++ │ ◦ traditional «OOP» features (like Smalltalk) │ • this has been partly inherited into «macOS» │ ◦ Objective C evolved into Swift Not all operating systems provide (exclusively) C APIs. Historically, one of the earlier departures was the NeXT operating system, which used Objective C extensively. While the procedural part of the language is simply C, the object-oriented part is based on Smalltalk, with pervasive late binding and dynamic types. 
│ System Libraries: UNIX │ │ • the math library ‹libm› │ ◦ implements math functions like ‹sin› and ‹exp› │ • thread library ‹libpthread› │ • terminal access: ‹libcurses› │ • cryptography: ‹libcrypto› (OpenSSL) │ • the C++ standard library ‹libstdc++› or ‹libc++› While ‹libc› is quite central, there are many other libraries that are part of a UNIX system. You would find most of the above examples on most UNIX systems in some form. │ System Libraries: Windows │ │ • ‹msvcrt.dll› – the ISO C functions │ • ‹kernel32.dll› – basic OS APIs │ • ‹gdi32.dll› – Graphics Device Interface │ • ‹user32.dll› – standard GUI elements System libraries look quite different on Windows: there is no ‹libc›; instead, the C standard library has its own DLL (‹msvcrt›, from MicroSoft Visual C RunTime) while operating system services (the low-level kind) live in ‹kernel32.dll›. The other two libraries allow applications to provide a graphical user interface. The libraries mentioned here all provide C APIs, though there are also C++ and C# interfaces (which are partly wrappers around the above libraries, but not exclusively). │ Documentation │ │ • manual pages on UNIX │ ◦ try e.g. ‹man 2 write› on ‹aisa.fi.muni.cz› │ ◦ section 2: system calls │ ◦ section 3: library functions (‹man 3 printf›) │ • MSDN for Windows │ ◦ │ • you can learn «a lot» from those sources Most OS vendors provide extensive documentation of their programmer's interfaces. On UNIX, this is typically part of the OS installation itself (manual pages, accessed with the ‹man› command), while on Windows, this is a separate resource (these days accessible online, previously distributed in print or on optical media). ## Compiler & Linker While compiling (and linking) programs is not core functionality of an operating system, it is quite useful to understand how these components work. Moreover, in earlier systems, a C compiler was considered a rather essential component, and this tradition continues in many modern UNIX systems to this day.
We will discuss different artefacts of compilation – object files, libraries and executables, as well as the process of linking object code and libraries to produce executables. We will also highlight the differences between static and shared (dynamic) libraries. │ C Compiler │ │ • many POSIX systems ship with a «C compiler» │ • the compiler takes a C «source file» as input │ ◦ a text file with a ‹.c› suffix │ • and produces an «object file» as its output │ ◦ binary file with machine code in it │ ◦ but cannot be directly executed Compilers transform human-readable programs into machine-executable programs. Of course, both those forms of the program need to be stored in files: the first is usually in the form of «plain text» (usually encoded as UTF-8, or in older systems as ASCII). In this form, bytes stored in the file encode human-readable letters. On the output side, the file is «binary» (which is really just a catch-all term for files that are not plain text), and stores machine-friendly «instructions» – primitive operations that the CPU can execute. However, the compiler output cannot be directly executed yet, even though most of the instructions are in their final form. The missing pieces are the addresses: numbers which describe memory locations within the program itself (they may point at instructions or at data embedded in the program). At this stage, though, neither code nor data has been assigned to particular addresses, and hence the program cannot be executed (it will need to be «linked» first, more on that later). │ Object Files │ │ • contain native «machine» (executable) code │ • along with static data │ ◦ e.g.
string literals used in the program │ • possibly split into a number of «sections» │ ◦ ‹.text›, ‹.rodata›, ‹.data› and so on │ • and metadata │ ◦ list of «symbols» (function names) and their addresses The purpose of object files is to store this semi-finished machine code, along with any static data (like string literals or numeric constants) that appears in the program. All this is sorted into «sections» – usually one section for machine code (traditionally called text, stored as ‹.text› in the object file), another for read-only data (e.g. string literals), called ‹.rodata›, and another for mutable but statically-initialized variables – ‹.data›. Bundled with all this is «metadata», which describes the content of the file (again in a machine-readable form). One example of such metadata is a «symbol table», which gives file-relative addresses of high-level functions that have been compiled into the object file. That is, the compiler will take a definition of a function that we wrote in C and emit machine code for this function. The ‹.text› section of an object file will consist of a number of such functions, one after another: the symbol table then tells us where each of the functions begins. │ Object File Formats │ │ • ‹a.out› – earliest UNIX object format │ • COFF – Common Object File Format │ ◦ adds support for sections over ‹a.out› │ • PE – Portable Executable (MS «Windows») │ • Mach-O – Mach Microkernel Executable («macOS») │ • «ELF» – Executable and Linkable Format (all modern Unices) There are a number of different physical layouts of object files, and each of those also carries slightly different semantics. By far the most common format used in POSIX systems is «ELF». The other common formats in contemporary use are «PE» (used by MS operating systems) and «Mach-O» (used by Apple operating systems).
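With common toolchains, the parts of a simple C file would typically land in the sections described above (a sketch; exact placement depends on the compiler and its options):

```c
/* the string literal "hello" is placed in .rodata; the pointer
 * variable itself is mutable, initialized data, so it goes to .data */
const char *greeting = "hello";

int counter = 1; /* also initialized and mutable: .data */

/* the machine code of the function goes to .text, and 'add_one'
 * becomes an entry in the object file's symbol table */
int add_one( int x )
{
    return x + 1;
}
```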
│ Archives (Static Libraries) │ │ • static libraries on UNIX are called «archives» │ • this is why they get the ‹.a› suffix │ • they are like a ‹zip› file full of «object files» │ • plus a table of symbols (function names) An archive is the simplest way to bundle multiple object files. As the name implies, it is essentially just a collection of object files stored as a single file. Each object file retains its identity and its content does not change in any way when it is bundled into an archive. The only difference from a typical data archive (a ‹tar› or a ‹zip› archive, say) is that besides the object files themselves, the archive contains an additional metadata section – a symbol table, or rather a symbol index. If someone (typically the linker) needs to find the definition of a particular function (symbol), it can first consult this archive-wide index to find which object file provides that symbol. This makes linking more efficient, since the linker does not need to sequentially scan each object file in the archive to find the definition. │ Linker │ │ • object files are «incomplete» │ • they can refer to «symbols» that they do not define │ ◦ the definitions can be in libraries │ ◦ or in other object files │ • a «linker» puts multiple object files together │ ◦ to produce a «single executable» │ ◦ or maybe a shared library As pointed out earlier, it is the job of a «linker» to combine object files (and libraries) into executables. The process is fairly involved, so we will describe it across the next few slides. The «input» to the linker is a bunch of «object files» and the output is a single «executable» or sometimes a single «shared library». Even though archives are handled specially by the linker, object files which are given to the linker directly will always become part of the final executable. Object files provided in archives are only used if they provide symbols which are required to complete the executable. 
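The ‘incompleteness’ of object files is easy to provoke; a sketch of code that uses a symbol it does not itself define (the function names here are made up):

```c
/* declared but not (yet) defined: the compiler emits a call to the
 * undefined symbol 'helper' and records it in the object file; the
 * linker must later find the definition in another object or library */
int helper( int x );

int twice_helped( int x )
{
    return helper( helper( x ) );
}

/* in this sketch the definition follows in the same file, but it could
 * equally well come from a different object file inside an archive */
int helper( int x )
{
    return x + 1;
}
```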
│ Symbols vs Addresses │ │ • we use symbolic «names» to call functions &c. │ • but the ‹call› machine instruction needs an «address» │ │ • the executable will eventually live in memory │ • data and instructions need to be given «addresses» │ • what a linker does is «assign» those addresses The main entities that come up during linking are «symbols» and «addresses». In a program, the machine code and the data are loaded into memory, and as we know, each memory location has an «address». The program in its compiled form can use addresses to refer to parts of itself. For instance, to call a subroutine, we provide its starting address to a special ‹call› instruction, which tells the CPU to start executing code from that address. However, when humans write programs, they do not assign addresses to pieces of data, to functions or to individual instructions. Instead, if the program needs to refer to a part of itself, we give those parts names: those names are known as «symbols». It is the shared responsibility of the compiler and the linker to assign addresses to the individual symbols, in such a way that the objects stored in memory do not conflict (overlap). If you think about it, it would be very difficult to do by hand: we usually don't know how long the machine code for any given function will be, so we would have to guess, leave gaps in case a function needs to grow, and keep track of which code lives at which address. It is all very inconvenient, and even assembly programmers usually avoid assigning addresses by hand. In fact, one of the primary roles of an assembler is to translate from symbolic to numeric addresses. But I digress.
│ Resolving Symbols │ │ • the linker processes one object file at a time │ • it maintains a «symbol table» │ ◦ mapping symbols (names) to addresses │ ◦ dynamically updated as more objects are processed │ • relocations are typically processed all at once at the end │ • «resolving symbols» = finding their addresses The linker works by maintaining an ‘incomplete executable’ and makes progress by merging each of the input object files into this work-in-progress file. The strategy for assigning final addresses is simple enough: there's a single output ‹.text› section, a single output ‹.data› section and so on. When an input file is processed, its own ‹.text› section is simply appended to the ‹.text› produced so far. The same process is repeated for every section. The symbol tables of the input object files are likewise merged one by one, and the addresses adjusted as symbols are added. In addition to symbol «definitions», object files contain symbol «uses» – those are known as relocations, and are stored in a relocation table. Relocations contain the «address of the instruction» that needs to be patched and the «symbol» the address of which is to be patched in. Like the sections themselves and the symbol table, the relocation table is built up incrementally, one input file at a time. The relocations are also processed by the linker: usually, this means writing the final address of a particular symbol into an as-yet incomplete instruction or into a variable in the data section. This is usually done once the output symbol table is complete. The relocation and symbol tables are often discarded at the end (but may be retained in the output file in some cases – the symbol table more often than the relocation table).
│ Executable │ │ • finished «image» of a program to be executed │ • usually in the same format as «object files» │ • but already complete, with symbols resolved │ ◦ «but:» may use «shared libraries» │ ◦ in that case, «some» symbols remain unresolved The output of the linker is, in the usual case, an «executable». This is a file that is based on the same format as object files of the given operating system, but is «complete» in some sense. In static executables (those which don't use shared libraries), all references and relocations are already resolved and the program can be loaded into memory and directly executed by the CPU, without further adjustments. It is also worth noting that the addresses that the executable uses when referring to parts of itself are «virtual addresses» (this is also the case with shared libraries below). We will talk more about those in a later lecture, but right now we can at least say that this means that different programs on the same operating system can use overlapping addresses for their instructions and data. This is not a problem, because virtual addresses are private to each process, and hence each copy of each executing program. │ Shared Libraries │ │ • each shared library only needs to be in memory once │ • shared libraries use «symbolic names» (like object files) │ • there is a “mini linker” in the OS to resolve those names │ ◦ usually known as a «runtime» linker │ ◦ resolving = finding the addresses │ • shared libraries can use other shared libraries │ ◦ they can form a «DAG» (Directed Acyclic Graph) The downside of static libraries is that they need to be loaded separately (often in slightly different versions) along with each program that uses them: in fact, since the linker embedded them into the program, they are quite inseparable from it. As we have already mentioned, this is not very efficient. 
Instead, we can store the library code in separate executable-like files that get loaded into the address space of programs that need it. Of course, relocations in the main program that refer to symbols from shared libraries (and vice versa), and also relocations in shared libraries that refer to other shared libraries, all need to be resolved. This is usually done either when the program is loaded into memory, or lazily, right before the relocation is first used. In either case, there needs to be a program which will resolve those relocations: this is the «runtime linker» – it is superficially similar to the normal, compile-time linker, but in reality is quite different. │ Addresses Revisited │ │ • when you run a program, it is «loaded into memory» │ • parts of the program refer to other parts of the program │ ◦ this means they need to know «where» it will be loaded │ ◦ this is a responsibility of the «linker» │ • shared libraries use «position-independent code» │ ◦ works regardless of the base address it is loaded at │ ◦ we won't go into detail on how this is achieved We mentioned that executables and libraries use virtual addresses to refer to their own parts. However, this does not help in shared libraries as much as it helps in executables. The catch is that we want to load the same library along with multiple programs: but if the addresses used by the library are fixed, this means that the library needs to be loaded at the same start address into each program that uses that library. This quickly becomes impractical as we add more libraries into the system – no two libraries would be allowed to overlap and none of them would be allowed to overlap with any of the executables. In practice, we instead compile the libraries in such a way that they don't use absolute addresses to refer to parts of themselves.
This often adds a little execution overhead, but makes it possible to load the library at any address range that is available in the current process. This makes the job of the runtime linker much easier. │ Compiler, Linker &c. │ │ • the C compiler is usually called ‹cc› │ • the linker is known as ‹ld› │ • the archive (static library) manager is ‹ar› │ • the «runtime» linker is often known as ‹ld.so› On many UNIXes, the compiler and the linker are available as part of the system itself. The command names are standardized. ## File-Based APIs On POSIX systems, the API for using the filesystem is a very important one, because it in fact provides access to a number of additional resources, which appear as ‘abstract’ (special) files in the system. │ Everything is a File │ │ • part of the UNIX «design philosophy» │ • «directories» are files │ • «devices» are files │ • «pipes» are files │ • network connections are (almost) files A file is an abstraction: it is an object from which we can read bytes and into which we can write bytes (not all files will let us do both). In regular files, we can read and write at any offset, and if we write something we can later read that same thing (unless it was rewritten in the meantime). Directories are somewhat like this: we can read bytes from them to find out what files are present in that directory and how to find them in the file system. We can create new entries by writing into the directory. Incidentally, this is not how things are usually done, but it's not hard to imagine it could be. Quite a few «devices» (peripherals) behave this way: all kinds of hard drives (just a big bunch of bytes), printers (write some bytes to have them printed), scanners (write bytes to send commands, read bytes with the image data), audio devices (read bytes from microphones, write bytes into speakers), and so on. Pipes are like that too: one program writes bytes, and another reads them.
And network connections are more or less just pipes that work across the network. │ Why is Everything a File │ │ • «re-use» the comprehensive «file system API» │ • re-use existing file-based command-line tools │ • bugs are bad → «simplicity» is good │ • want to print? ‹cat file.txt > /dev/ulpt0› │ ◦ (reality is a little more complex) Since we already have an API to work with «abstract files» (because we need to work with real files anyway), it becomes reasonable to ask why not use this existing API to work with other objects that look like files. It makes sense not just at the level of C functions, but at the level of command-line programs too. In general, re-using existing mechanisms makes things more flexible, and often also simpler. Of course, there are caveats (devices often need to support operations that don't map well to reading or writing bytes, sockets are also somewhat problematic). │ What is a Filesystem? │ │ • a set of «files» and «directories» │ • usually lives on a single block device │ ◦ but may also be virtual │ • directories and files form a «tree» │ ◦ directories are internal nodes │ ◦ files are leaf nodes While we have a decent idea of what a «file» is, what about a file «system»? Well, a file system is a collection of files and directories, typically stored on a single block device. The directories and files form a tree (at least until symlinks come into play, at which point things start going south). Regular files are always leaf nodes in this tree. │ File Paths │ │ • filesystems use «paths» to point at files │ • a string with ‹/› as a directory delimiter │ ◦ the delimiter is ‹\› on Windows │ • a leading ‹/› indicates the «filesystem root» │ • e.g. ‹/usr/include› Paths are how we refer to files and directories within the tree. The top-level («root») directory is named ‹/›. Each directory «entry» carries a name (and a link to the actual file or directory it represents) – this name can be used in the path to refer to the given entity. 
So with a path like ‹/usr/include›, we start at the «root directory» (the initial slash), then in that directory, we look for an entity called ‹usr› and when we find it, we check that it is a directory again. If that is so, we then look at its direct descendants again and look for an entity labelled ‹include›. │ The File Hierarchy │ │ ╭───╮ │ ┌───────│ / │──────────┐ │ │ ╰───╯ │ │ ▼ ▼ ▼ │ ╭─────╮ ╭─────╮ ╭─────╮ │ │ home│ │ var │ ┌──│ usr │──┐ │ ╰─────╯ ╰─────╯ │ ╰─────╯ │ │ ▼ ▼ ▼ │ ╭───────╮ ╭───────╮ ╭─────╮ │ │xrockai│ ┌──────│include│ │ lib │──────┐ │ ╰───────╯ │ ╰───────╯ ╰─────╯ │ │ ▼ ▼ ▼ ▼ │ ╭───────╮ ╭─────────╮ ╭───────╮ ╭───────╮ │ │stdio.h│ │unistd.h │ │ libc.a│ │ libm.a│ │ ╰───────╯ ╰─────────╯ ╰───────╯ ╰───────╯ That's an example of a file system tree. You can practice looking up various paths in the tree, using the algorithm described above. │ The Role of Files and Filesystems │ │ • «very» central in «Plan9» │ • central in most UNIX systems │ ◦ cf. Linux pseudo-filesystems │ ◦ ‹/proc› provides info about all processes │ ◦ ‹/sys› gives info about the kernel and devices │ • somewhat «reduced» in Windows │ • quite «suppressed» in Android (and more on iOS) Different operating systems put different emphasis on the file system. We will take the way POSIX positions the file system as the baseline – in this case, the file system is quite central: in addition to regular files and directories, all sorts of special files appear in the file system and provide access to various OS facilities. However, there are also many services and APIs that are not based on the file system, including e.g. process management, memory management and so on. In many UNIX-like systems, the reliance on FS-based APIs is notched up a bit: e.g. process management is done via a virtual ‹/proc› filesystem (many different systems), or device discovery and configuration via ‹/sys› (Linux). 
Another level above that is Plan9, where essentially everything that can be made into a file system is made into one. Another experimental system, GNU/Hurd, has a similar ambition. If we go the other way from POSIX, we have the native Windows APIs, which emphasise the file system much less than would be typical in POSIX. Most objects have dedicated APIs, even if they are rather file-like. However, the file system is still prominently present both in the APIs and in the user interface. Both are further suppressed by modern ‘scaled-down’ operating systems like Android and iOS (even if both are POSIX-compatible under the hood, ‘normal’ applications are not allowed to access the POSIX API, or the file system, and it is usually also hidden from users). │ The Filesystem API │ │ • you «open» a file (using the ‹open()› syscall) │ • you can ‹read()› and ‹write()› data │ • you ‹close()› the file when you are done │ • you can ‹rename()› and ‹unlink()› files │ • you can use ‹mkdir()› to create directories So how does the file system API look on POSIX systems? To work with a file, you usually need to ‹open› it first: you supply a «path» and some flags to tell the OS what you intend to do with the file. More on that in a short while. When you have a file open, you can ‹read› data from it and ‹write› data into it. When you are done, you use ‹close› to free up the associated resources. To work with directories, you usually don't need to ‹open› them (though you can). You can rename files (this is a directory operation) using ‹rename›, remove them from the file system hierarchy using ‹unlink› (this erases the corresponding directory entry), and you can create new directories using ‹mkdir›. │ File Descriptors │ │ • the kernel keeps a table of open files │ • the «file descriptor» is an index into this table │ • you do everything using file descriptors │ • non-Unix systems have similar concepts │ ◦ descriptors are called «handles» on Windows Remember ‹open›? 
When we want to work with a file, we need a way to identify that file, and paths are not super convenient in this respect: someone could rename the file we were working with, and suddenly it is gone, or worse, the file could be replaced by a different file or even a directory. Additionally, looking up a file by its path is a comparatively expensive operation: the OS has to read every directory mentioned in the «path» and run a lookup on it. While this information is often cached in RAM, it still takes valuable time. When we open a file, we get back a «file descriptor» – this is a small integer, and using this descriptor as an index into a table, the kernel can look up all the metadata it needs (to carry out reads and writes) in constant time. The descriptor is also associated with the file directly, so if the file is moved around or even unlinked from the directory tree, the descriptor still points to the same file. Most non-POSIX file system APIs have a similar notion (sometimes open does not return a number but a different data type, e.g. a pointer, and sometimes this value is called a «handle» instead of a descriptor... but the concept is more or less the same). │ Regular files │ │ • these contain «sequential data» (bytes) │ • may have inner structure but the OS does not care │ • there is «metadata» attached to files │ ◦ like when were they last modified │ ◦ who can and who cannot access the file │ • you ‹read()› and ‹write()› files A regular file is what it appears to be. It is a sequence of bytes, stored on a persistent storage device and has metadata associated that makes it possible to locate all that data in actual disk sectors. The bytes inside the file are of no concern to the operating system. When data is read from a file, the operating system consults the file system metadata to find the particular sectors on disk that store the content. When data is overwritten, the same thing happens but those sectors are rewritten with the new data. 
When new data is appended, the operating system looks up some free space on the disk, then adjusts the file metadata to point at the (now taken) sectors and writes the data there. There is some additional metadata stored alongside each file, like whom it belongs to or when it was modified. │ Directories │ │ • a «list» of files and other directories │ ◦ internal nodes of the filesystem tree │ ◦ directories give names to files │ • can be opened just like files │ ◦ but ‹read()› and ‹write()› are not allowed │ ◦ files are created with ‹open()› or ‹creat()› │ ◦ directories with ‹mkdir()› │ ◦ directory listing with ‹opendir()› and ‹readdir()› A directory is a (potentially) internal node in the file hierarchy: its role is to give «names» to files, making them accessible via «paths». Like regular files, directories are self-contained objects, but instead of raw bytes, they contain structured data: namely, a directory maps file names to other files (those can be regular files, other directories, or one of the special file types we will talk about shortly). In principle, it would be possible to implement ‹read› and ‹write› for directories, but this would be problematic: if those functions dealt with the actual on-disk representation of a directory, user programs could easily corrupt directory entries. This is quite undesirable: instead, under normal circumstances, directories are used via «paths»: when we present a file path to ‹open›, the operating system will automatically traverse directories as needed. Of course, user programs sometimes need to iterate through all directory entries, i.e. list all files in a given directory. To this end, POSIX provides the ‹opendir› function along with ‹readdir›, ‹seekdir›, ‹closedir› and so on. These functions provide a high-level API for interacting with directories. Nonetheless, this API is read-only: directory entries are created whenever files are created using the corresponding path, e.g.
using ‹mkdir› or ‹open› with the ‹O_CREAT› flag. │ Mounts │ │ • UNIX joins all file systems into a single hierarchy │ • the root of one filesystem becomes a directory in another │ ◦ this is called a «mount point» │ • Windows uses «drive letters» instead (‹C:›, ‹D:› &c.) A single computer (and hence, a single operating system) may have more than one hard drive available to it. In this case, it is customary that each such device contains its own file system: the question arises of how to present multiple file systems to the user. The UNIX strategy is to present all the file systems within a single directory tree: to this end, one of the file systems is picked as a «root» file system: in this (and only in this) file system, the FS root directory ‹/› is the same as the system root directory. Every other file system is joined to the hierarchy by attaching its root to an existing directory of an already-attached file system. Consider two file systems: ╭────╮ ╭────╮ │ / │ │ / │ ╰────╯ ╰────╯ ┌─╯╰─────┐ ┌───╯ ╰──┐ ▼ ▼ ▼ ▼ ╭─────╮ ╭─────╮ ╭───────╮ ╭─────╮ │ home│ │ usr │ │include│ │ lib │ ╰─────╯ ╰─────╯ ╰───────╯ ╰─────╯ │ │ ╰──────┐ ▼ ▼ ▼ ╭───────╮ ╭───────╮ ╭───────╮ │stdio.h│ │ libc.a│ │ libm.a│ ╰───────╯ ╰───────╯ ╰───────╯ If we now «mount» the second file system onto the ‹/usr› directory of the first, we get the following unified hierarchy: ╭────╮ │ / │ ╰────╯ ┌─╯╰─────┐ ▼ ▼ ╭─────╮ ┌─────┐ │ home│ │ usr │ ╰─────╯ └─────┘ ┌───╯ ╰──┐ ▼ ▼ ╭───────╮ ╭─────╮ │include│ │ lib │ ╰───────╯ ╰─────╯ │ │ ╰──────┐ ▼ ▼ ▼ ╭───────╮ ╭───────╮ ╭───────╮ │stdio.h│ │ libc.a│ │ libm.a│ ╰───────╯ ╰───────╯ ╰───────╯ Usually, file systems are mounted onto «empty» directories: if ‹/usr› was not empty on the left (root) file system, its content would be hidden by the mount. The other strategy is to present multiple file systems using multiple separate trees. This is the strategy implemented by the MS Windows family of operating systems: each file system is assigned a single letter, and each becomes its own, separate tree.
│ Pipes │ │ • pipes are a simple communication device │ • one program can ‹write()› data to the pipe │ • another program can ‹read()› that same data │ • each end of the pipe gets a «file descriptor» │ • a pipe can live in the filesystem («named» pipe) Pipes are somewhat like files in that it's possible to write data (bytes) into them, and read data from them. In most cases, the program doing the writing is a different program from the one doing the reading. Unlike a regular file, the data is not permanently stored anywhere: it disappears from the pipe as soon as it is read. Of course, there is a buffer associated with a pipe, but it is only stored in RAM. This allows the writing process to write data even if the other end is not actively reading at the same time: the OS will buffer the write until such time it can be read. Normally, a pipe is an anonymous device, only accessible via file descriptors. When these are closed, the device is destroyed. There is another variant of a pipe, though, called a «named pipe» which is given a name in the file system. This does not mean the «data» is stored anywhere: a named pipe operates just like an anonymous pipe, the difference is that it can be passed to ‹open› using its path. │ Devices │ │ • «block» and «character» devices are (special) «files» │ • block devices are accessed one «block at a time» │ ◦ a typical block device would be a «disk» │ ◦ includes USB mass storage, flash storage, etc │ ◦ you can create a «file system» on a block device │ • «character» devices are more like normal files │ ◦ terminals, tapes, serial ports, audio devices As we have already mentioned, many peripheral devices look like sequences of bytes, or possibly as sequences of blocks. A typical block device is «addressable»: the user can seek to a particular location of the device and read a chunk of data (an integer number of blocks). 
Unless someone writes to a particular location of the device, reading from the same address multiple times will yield the same data. On the other hand, character devices often behave rather like pipes, in the sense that if a program writes some bytes into the device, reading the device will not yield those same bytes. Instead, character devices usually mediate communication with a peripheral that consumes bytes (which are written into the device) and/or provides some output (which is what the program gets when it reads from the device). Consider a printer: writing bytes into the printer's character device will cause those bytes to be printed (after possibly being interpreted by the printer). Another example would be that after a scanner has been instructed to scan a document, the pixels captured by its optical sensor can be, in some form, extracted by reading from its character device. Essentially, these types of character devices behave like a pipe, but instead of another program, the other end is a hardware device (or most likely its firmware). │ Sockets │ │ • the socket API comes from early BSD Unix │ • socket represents a (possible) «network connection» │ • sockets are more complicated than normal files │ ◦ establishing connections is hard │ ◦ messages get lost much more often than file data │ • you get a «file descriptor» for an open socket │ • you can ‹read()› and ‹write()› to sockets Sockets are, in some sense, a generalization of pipes. There are essentially 3 types of sockets: 1. a «listening» socket, which allows many «clients» to connect to a single «server» – strictly speaking, these sockets do not transport data, instead, they allow processes to establish «connections», 2. a «connected» socket, one of which is created for each connection and which behaves essentially like a bidirectional pipe (standard pipes being unidirectional), 3. a «datagram» socket, which can be used to send data without establishing connections, using special send/receive API. 
While the third is rather special and un-pipe-like, the first two are usually used together as a point-to-multipoint means of communication: the server listens on an «address», and any client which has this address can establish communication with the server. This is quite unlike pipes, which usually need to be pre-arranged (i.e. the programs must already be aware of each other). │ Socket Types │ │ • sockets can be «internet» or «unix domain» │ ◦ internet sockets connect to other computers │ ◦ Unix sockets live in the filesystem │ • sockets can be «stream» or «datagram» │ ◦ stream sockets are like pipes │ ◦ you can write a continuous «stream» of data │ ◦ datagram sockets can send individual «messages» There are two basic address types: internet sockets, which are used for inter-machine communication (using TCP/IP), and «unix domain sockets», which are used for local communication. A unix socket is like a named pipe: it has a path in the file system, and client programs can use this path to establish a connection to the server. │ Review Questions │ │ 5. What is a shared (dynamic) library? │ 6. What does a linker do? │ 7. What is a symbol in an object file? │ 8. What is a file descriptor?