# System Libraries and APIs In this section, we will study the programming interfaces of operating systems, first in some generality, without a specific system in mind. We will then go on to deal specifically with the C-language interface of POSIX systems. │ Programming Interfaces │ │ • kernel «system call» interface │ • → «system libraries» / APIs ← │ • inter-process protocols │ • command-line utilities (scripting) In most operating systems, the lowest-level interface accessible to application programs is the «system call» interface. It is, typically, specified in terms of a machine-language-level protocol (that is, an ABI), but usually also provided as a C API. This is the case for POSIX-mandated system calls, but also on e.g. Windows NT systems. │ Lecture Overview │ │ 1. The C Programming Language │ 2. System Libraries │ ◦ what is a library? │ ◦ header files & libraries │ 3. Compiler & Linker │ ◦ object files, executables │ 4. File-based APIs In this lecture, we will start by reviewing (or perhaps introducing) the C programming language. Then we will move on to the subject of libraries in general and system libraries in particular. We will look at how libraries enter the program compilation process and what other ingredients there are. Finally, we will have a closer look at a specific set of file-based programming interfaces. │ Sidenote: UNIX and POSIX │ │ • we will mostly use those terms interchangeably │ • it is a «family» of operating systems │ ◦ started in late 60s / early 70s │ • POSIX is a «specification» │ ◦ a document describing what the OS should provide │ ◦ including programming interfaces │ │ We will «assume POSIX» unless noted otherwise Before we begin, it should be noted that throughout this course, we will use POSIX and UNIX systems as examples. If a specific function or interface is mentioned without further qualification, it is assumed to be specified by POSIX and implemented by UNIX-like systems. 
## The C Programming Language The C programming language is one of the most commonly used languages in operating system implementations. It is also the subject of PB071, and at this point, you should already be familiar with its basic syntax. Likewise, you are expected to understand the concept of a «function» and other basic building blocks of programs. Even if you don't know the specific C syntax, the idea is very similar to any other programming language you might know. │ Programming Languages │ │ • there are many different languages │ ◦ C, C++, Java, C#, ... │ ◦ Python, Perl, Ruby, ... │ ◦ ML, Haskell, Agda, ... │ • but «C» has a «special place» in most OSes Different programming languages have different use-cases in mind, and exist at different levels of abstraction. Most languages other than C that you will meet, both at the university and in practice, are so-called high-level languages. There are quite a few language families, and there are a number of higher-level languages derived from C, like C++, Java or C#. For the purposes of this course, we will mostly deal with plain C, and with POSIX (Bourne-style) «shell», which can also be thought of as a programming language. │ C: The Least Common Denominator │ │ • except for assembly, C is the “bare minimum” │ • you can almost think of C as «portable assembly» │ • it is very easy to call C functions │ • and to use C data structures │ │ You can use C libraries in almost every language You could think of C as a ‘portable assembler’, with a few minor bells and whistles in the form of the standard library. Apart from this library of basic and widely useful subroutines, C offers these main advantages over assembly: abstraction from machine opcodes (with human-friendly infix operator syntax), structured control flow, and automatic local variables.
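These advantages are easy to see in even a trivial function (a made-up example, not taken from any particular system):

```c
/* Structured control flow and automatic local variables: in assembly,
 * the loop below would require hand-picked registers or stack slots,
 * explicit compare and branch instructions, and manually chosen labels. */
int sum_to( int n )
{
    int total = 0;                   /* automatic (stack-allocated) variable */
    for ( int i = 1; i <= n; ++i )   /* structured loop, infix operators */
        total += i;
    return total;
}
```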
In particular, the abstraction over the target processor and its instruction set proved to be instrumental in early operating systems, and helped establish the idea that an operating system is an entity separate from the hardware. On top of that, C is also popular as a systems programming language because almost any program, regardless of what language it is written in, can quite easily call C functions and use C data structures. │ The Language of Operating Systems │ │ • many (most) kernels are «written in C» │ • this usually extends to system libraries │ • and sometimes to almost the entire OS │ • non-C operating systems provide «C APIs» Consequently, C has essentially become a ‘language of operating systems’: most kernels, and even the bulk of most operating systems, are written in C. Each operating system (apart from perhaps a few exceptions) provides a C standard library in some form and can execute programs written in C (and more importantly, provide them with essential services). ## System Libraries We have already touched on the topic of system libraries last week, in the ‘anatomy’ section. It is now time to look at them in more detail: what they contain, how they are stored in the file system, and how they are combined with programs. We will also briefly talk about system call wrappers (which mediate low-level access to kernel services – we will discuss this topic in more detail in the next lecture). Finally, we will look at a few examples of system libraries which appear in popular operating systems. │ (System) Libraries │ │ • mainly «C functions» and «data types» │ • interfaces defined in «header files» │ • definitions provided in «libraries» │ ◦ static libraries (archives): ‹libc.a› │ ◦ shared (dynamic) libraries: ‹libc.so› │ • on Windows: ‹msvcrt.lib› and ‹msvcrt.dll› │ • there are (many) more besides ‹libc› / ‹msvcrt› In this course, when we talk about libraries, we will mean C libraries specifically – not Python or Haskell modules, which are quite different.
That said, a typical C library has basically two parts: the header files, which provide a description of the interface (the API), and the compiled library code (an archive or a shared library). The interface (as described in header files) consists of functions (for which the types of arguments and the type of return value are given in a header file) and of data structures. The bodies of the functions (their implementation) are what makes up the compiled library code. To illustrate: │ Declaration: «what» but not «how» │ │ int sum( int a, int b ); /* C */ │ │ Definition: «how» is the operation done? │ │ int sum( int a, int b ) /* C */ │ { │ return a + b; │ } The first example on this slide is a declaration: it tells us the name of a function, its inputs and its output. The second example is called a «definition» (or sometimes a «body») of the function and contains the operations to be performed when the function is called. │ Library Files │ │ • ‹/usr/lib› on most Unices │ ◦ may be mixed with «application libraries» │ ◦ especially on Linux-derived systems │ ◦ also ‹/usr/local/lib› for user/app libraries │ • on Windows: ‹C:\Windows\System32› │ ◦ user libraries often «bundled» with programs The machine code that makes up the library (i.e. the code that was generated from function definitions) resides in files. Those files are what we usually call ‘libraries’ and they usually live in a specific filesystem location. On most UNIX systems, those locations are ‹/usr/lib› and possibly ‹/lib› for system libraries and ‹/usr/local/lib› for user or application libraries. On certain systems (especially Linux-based), user libraries are mixed with system libraries and they are all stored in ‹/usr/lib›. On Windows, the situation is similar in that both system and application libraries are installed in a common location. Additionally, on Windows (and on macOS), shared libraries are often installed alongside the application.
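To sketch how the two halves of a library meet in practice (the file names in the comments are made up for illustration): application code is compiled against the declaration alone, and the definition is supplied later, by the linker, from the library.

```c
/* sum.h – the header file: a declaration only (the API) */
int sum( int a, int b );

/* main.c – application code, compiled knowing only the declaration;
 * the definition will be supplied by the linker, from the library */
int twice( int x )
{
    return sum( x, x );
}

/* sum.c – the library author's code, compiled into libsum.a / libsum.so */
int sum( int a, int b )
{
    return a + b;
}
```

In reality, the three parts live in separate files (and the last one in a library); they are shown together here only to keep the sketch self-contained.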
│ Static Libraries │ │ • stored in ‹libfile.a›, or ‹file.lib› (Windows) │ • only needed for «compiling» (linking) programs │ • the code is «copied» into the executable │ • the resulting executable is also called «static» │ ◦ and is easier to work with for the OS │ ◦ but also more wasteful Static libraries are only used when building executables and are not required for normal operation of the system. Therefore, many operating systems do not install them by default – they have to be installed separately as part of the developer kit. When a static library is linked into a program, this basically entails copying the machine code from the library into the final executable. In this scenario, after linking is performed, the library is no longer needed since the executable contains all the code required for its execution. For system libraries, this means that the code that comes from the library is present on the system in many copies, once in each program that uses the library. This is somewhat alleviated by linkers only copying the parts of the library that are actually needed by the program, but there is still substantial duplication. The duplication arising this way does not only affect the file system, but also memory (RAM) when those programs are loaded – multiple copies of the same function will be loaded into memory when such programs are executed. │ Shared (Dynamic) Libraries │ │ • required for «running» programs │ • linking is done at «execution» time │ • less code duplication │ • can be «upgraded» separately │ • but: dependency problems The other approach to libraries is «dynamic», or «shared» libraries. In this case, the library is required to actually run the program: the linker does not copy the machine code from the library into the executable. Instead, it only notes that the library must be loaded alongside with the program when the latter is executed. This reduces code duplication, both on disk and in memory. 
It also means that the library can be updated separately from the application. This often makes updates easier, especially in case a library is used by many programs and is, for example, found to contain a security problem. In a static library, this would mean that each program that uses the library needs to be updated. A shared library can be replaced and the fixed code will be loaded alongside programs as usual. The downside is that it is difficult to maintain binary compatibility – to ensure that programs that were built against one version of the library also work with a later version. When this is violated, as often happens, people run into dependency problems (also known as DLL hell on Windows). │ Header Files │ │ • on UNIX: ‹/usr/include› │ • contains «prototypes» of C functions │ • and definitions of C data structures │ • required to «compile» C and C++ programs Like static libraries, header files are only required when building programs, but not when using them. Header files are fragments of C source code, and on UNIX systems are traditionally stored in ‹/usr/include›. User-installed header files (i.e. not those provided by system libraries) live under ‹/usr/local/include› (though again, on Linux-based systems user and system headers are often intermixed in ‹/usr/include›). │ Header Example 1 (from ‹unistd.h›) │ │ int execv(char *, char **); /* C */ │ pid_t fork(void); │ int pipe(int *); │ ssize_t read(int, void *, size_t); │ │ (and many more prototypes) This is an excerpt from an actual system header file, and declares a few of the functions that comprise the POSIX C API. │ Header Example 2 (from ‹sys/time.h›) │ │ struct timeval /* C */ │ { │ time_t tv_sec; │ long tv_usec; │ }; │ │ /* ... */ │ │ int gettimeofday(timeval *, timezone *); │ int settimeofday(timeval *, timezone *); This is another excerpt from an actual header – this time the snippet contains a definition of a «data structure». 
The layout (order of fields and their types, along with hidden «padding») of such structures is quite important, since that becomes part of the ABI. In other words, the definition above describes not just the high-level interface but also how bytes are laid out in memory. │ The POSIX C Library │ │ • ‹libc› – the C runtime library │ • contains ISO C functions │ ◦ ‹printf›, ‹fopen›, ‹fread› │ • and a number of POSIX functions │ ◦ ‹open›, ‹read›, ‹gethostbyname›, ... │ ◦ C wrappers for system calls As we have already mentioned previously, it is a tradition of UNIX systems that ‹libc› combines the basic C library and the basic POSIX library. For the following, a particular subset of the POSIX library is going to be rather important, namely the «system call wrappers». Those are C functions whose only purpose is to invoke their matching «system calls». │ System Calls: Numbers │ │ • system calls are performed at «machine level» │ • which syscall to perform is decided by a «number» │ ◦ e.g. ‹SYS_write› is 4 on OpenBSD │ ◦ numbers defined by ‹sys/syscall.h› │ ◦ different for each OS At the level of the OS kernel (cue next week), system calls are represented by «numbers» (which are often given symbolic names like ‹SYS_write›, but are nonetheless just small integers and not memory addresses like with ordinary C functions). The numbers are specific to any given kernel. And of course, the ‹libc› must use the same numbering as the kernel. │ System Calls: the ‹syscall› function │ │ • there is a C function called ‹syscall› │ ◦ prototype: ‹int syscall( int number, ... 
)› │ • this implements the «low-level» syscall sequence │ • it takes a «syscall number» and syscall parameters │ ◦ this is a bit like ‹printf› │ ◦ the first parameter decides what the other parameters are │ • (more about how ‹syscall()› works next week) Typically, all system calls work essentially the same way: the library takes the (syscall) number and some additional data (parameters), stores them at the pre-arranged location (registers, memory) and jumps into the kernel. Since this sequence is uniform across system calls, it is possible to have a single C function which can perform any system call, given its number. This function actually exists and is called ‹syscall›. It's entirely possible to perform all your syscalls using this one C function, and never call the more convenient single-purpose wrappers (see also below). │ System Calls: Wrappers │ │ • using ‹syscall()› directly is inconvenient │ • ‹libc› has a function for each system call │ ◦ ‹SYS_write› → ‹int write( int, char *, size_t )› │ ◦ ‹SYS_open› → ‹int open( char *, int )› │ ◦ and so on and so forth │ • those wrappers may use ‹syscall()› internally To make programming a fair bit more convenient, instead of saying syscall( SYS_write, fd, buffer, size ); /* C */ we can use a function called ‹write›, like this: write( fd, buffer, size ); /* C */ Besides being shorter to type, it is also safer: the compiler can check that we passed the right number and types of arguments. The function might internally use the equivalent ‹syscall()› invocation – though in practice, we prefer to sacrifice this particular bit of abstraction to save a few instructions on the comparatively hot (hot = one that is executed often) code path. That is, each syscall wrapper contains a copy of the code for entering the kernel, instead of calling ‹syscall›.
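Both routes can be tried side by side; a sketch (assuming a glibc-style system, where ‹sys/syscall.h› provides ‹SYS_write› and ‹unistd.h› declares ‹syscall›; the function names are our own):

```c
#define _DEFAULT_SOURCE  /* glibc: expose syscall() */
#include <sys/syscall.h> /* SYS_write and friends */
#include <unistd.h>      /* syscall(), write() */

/* the generic route: syscall number plus parameters */
long raw_write( int fd, const void *buf, size_t n )
{
    return syscall( SYS_write, fd, buf, n );
}

/* the same operation through the single-purpose wrapper */
long wrapped_write( int fd, const void *buf, size_t n )
{
    return write( fd, buf, n );
}
```

Both functions perform the same system call; the second is type-checked by the compiler at every call site, which is why the wrappers are preferred in real code.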
│ Portability │ │ • libraries provide an «abstraction layer» over OS internals │ • they are responsible for «application portability» │ ◦ along with standardised filesystem locations │ ◦ and user-space utilities to some degree │ • higher-level languages rely on system libraries An important function of libraries is to provide a uniform API to the upper layers of the system. The designers of an operating system may decide to substantially depart from the traditional system call protocol, or even from the traditional set of system calls. However, even if the kernel looks quite non-POSIX-y, it is often still possible to provide a set of C functions that behave as POSIX specifies. This has been done more than once, most often on top of microkernels, e.g. Microsoft NT (Windows NT, XP and later) or on Mach (macOS, HURD). All those systems are capable of supporting POSIX programs without being built around a UNIX-like monolithic kernel. Of course, the API alone is not sufficient to make POSIX programs work correctly: there are certain expectations about the filesystem (both semantics of the file system itself, but also which files exist and what they contain) and other aspects of the system. │ NeXTSTEP and Objective C │ │ • the NeXT OS was built around «Objective C» │ • system libraries had ObjC APIs │ • in API terms, ObjC is very «different from C» │ ◦ also very different from C++ │ ◦ traditional «OOP» features (like Smalltalk) │ • this has been partly inherited into «macOS» │ ◦ Objective C evolved into Swift Not all operating systems provide (exclusively) C APIs. Historically, one of the earlier departures was the NeXT operating system, which used Objective C extensively. While the procedural part of the language is simply C, the object-oriented part is based on Smalltalk, with pervasive late binding and dynamic types. 
│ System Libraries: UNIX │ │ • the math library ‹libm› │ ◦ implements math functions like ‹sin› and ‹exp› │ • thread library ‹libpthread› │ • terminal access: ‹libcurses› │ • cryptography: ‹libcrypto› (OpenSSL) │ • the C++ standard library ‹libstdc++› or ‹libc++› While ‹libc› is quite central, there are many other libraries that are part of a UNIX system. You would find most of the above examples on most UNIX systems in some form. │ System Libraries: Windows │ │ • ‹msvcrt.dll› – the ISO C functions │ • ‹kernel32.dll› – basic OS APIs │ • ‹gdi32.dll› – Graphics Device Interface │ • ‹user32.dll› – standard GUI elements System libraries look quite different on Windows: there is no ‹libc›; instead, the C standard library has its own DLL (‹msvcrt›, from MicroSoft Visual C RunTime) while operating system services (the low-level kind) live in ‹kernel32.dll›. The other two libraries allow applications to provide a graphical user interface. The libraries mentioned here all provide C APIs, though there are also C++ and C# interfaces (which are partly wrappers around the above libraries, but not exclusively). │ Documentation │ │ • manual pages on UNIX │ ◦ try e.g. ‹man 2 write› on ‹aisa.fi.muni.cz› │ ◦ section 2: system calls │ ◦ section 3: library functions (‹man 3 printf›) │ • MSDN for Windows │ ◦ │ • you can learn «a lot» from those sources Most OS vendors provide extensive documentation of their programmer's interfaces. On UNIX, this is typically part of the OS installation itself (manual pages, accessed with the ‹man› command), while on Windows, this is a separate resource (these days accessible online, previously distributed in print or on optical media). ## Compiler & Linker While compiling (and linking) programs is not core functionality of an operating system, it is quite useful to understand how these components work. Moreover, in earlier systems, a C compiler was considered a rather essential component, and this tradition continues in many modern UNIX systems to this day.
We will discuss different artefacts of compilation – object files, libraries and executables, as well as the process of linking object code and libraries to produce executables. We will also highlight the differences between static and shared (dynamic) libraries. │ C Compiler │ │ • many POSIX systems ship with a «C compiler» │ • the compiler takes a C «source file» as input │ ◦ a text file with a ‹.c› suffix │ • and produces an «object file» as its output │ ◦ binary file with machine code in it │ ◦ but cannot be directly executed Compilers transform human-readable programs into machine-executable programs. Of course, both those forms of the program need to be stored in files: the first is usually in the form of «plain text» (usually encoded as UTF-8, or in older systems as ASCII). In this form, bytes stored in the file encode human-readable letters. On the output side, the file is «binary» (which is really just a catch-all term for files that are not plain text), and stores machine-friendly «instructions» – primitive operations that the CPU can execute. However, the compiler output cannot be directly executed yet, even though most of the instructions are in their final form. The missing pieces are the addresses: numbers which describe memory locations within the program itself (they may point at instructions or at data embedded in the program). At this stage, though, neither code nor data has been assigned to particular addresses, and hence the program cannot be executed (it will need to be «linked» first, more on that later). │ Object Files │ │ • contain native «machine» (executable) code │ • along with static data │ ◦ e.g.
string literals used in the program │ • possibly split into a number of «sections» │ ◦ ‹.text›, ‹.rodata›, ‹.data› and so on │ • and metadata │ ◦ list of «symbols» (function names) and their addresses The purpose of object files is to store this semi-finished machine code, along with any static data (like string literals or numeric constants) that appears in the program. All this is sorted into «sections» – usually one section for machine code (traditionally called text, stored as ‹.text› in the object file), another for read-only data (e.g. string literals), called ‹.rodata›, and another for mutable but statically-initialized variables – ‹.data›. Bundled with all this is «metadata», which describes the content of the file (again in a machine-readable form). One example of such metadata is a «symbol table», which gives file-relative addresses of high-level functions that have been compiled into the object file. That is, the compiler will take a definition of a function that we wrote in C and emit machine code for this function. The ‹.text› section of an object file will consist of a number of such functions, one after another: the symbol table then tells us where each of the functions begins. │ Object File Formats │ │ • ‹a.out› – earliest UNIX object format │ • COFF – Common Object File Format │ ◦ adds support for sections over ‹a.out› │ • PE – Portable Executable (MS «Windows») │ • Mach-O – Mach Microkernel Executable («macOS») │ • «ELF» – Executable and Linkable Format (all modern Unices) There are a number of different physical layouts of object files, and each of those also carries slightly different semantics. By far the most common format used in POSIX systems is «ELF». The other common formats in contemporary use are «PE» (used by MS operating systems) and «Mach-O» (used by Apple operating systems).
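With common toolchains, the parts of a simple C file would typically land in the sections described above (a sketch; exact placement depends on the compiler and its options):

```c
/* the string literal "hello" is placed in .rodata; the pointer
 * variable itself is mutable, initialized data, so it goes to .data */
const char *greeting = "hello";

int counter = 1; /* also initialized and mutable: .data */

/* the machine code of the function goes to .text, and 'add_one'
 * becomes an entry in the object file's symbol table */
int add_one( int x )
{
    return x + 1;
}
```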
│ Archives (Static Libraries) │ │ • static libraries on UNIX are called «archives» │ • this is why they get the ‹.a› suffix │ • they are like a ‹zip› file full of «object files» │ • plus a table of symbols (function names) An archive is the simplest way to bundle multiple object files. As the name implies, it is essentially just a collection of object files stored as a single file. Each object file retains its identity and its content does not change in any way when it is bundled into an archive. The only difference from a typical data archive (a ‹tar› or a ‹zip› archive, say) is that besides the object files themselves, the archive contains an additional metadata section – a symbol table, or rather a symbol index. If someone (typically the linker) needs to find the definition of a particular function (symbol), it can first consult this archive-wide index to find which object file provides that symbol. This makes linking more efficient, since the linker does not need to sequentially scan each object file in the archive to find the definition. │ Linker │ │ • object files are «incomplete» │ • they can refer to «symbols» that they do not define │ ◦ the definitions can be in libraries │ ◦ or in other object files │ • a «linker» puts multiple object files together │ ◦ to produce a «single executable» │ ◦ or maybe a shared library As pointed out earlier, it is the job of a «linker» to combine object files (and libraries) into executables. The process is fairly involved, so we will describe it across the next few slides. The «input» to the linker is a bunch of «object files» and the output is a single «executable» or sometimes a single «shared library». Even though archives are handled specially by the linker, object files which are given to the linker directly will always become part of the final executable. Object files provided in archives are only used if they provide symbols which are required to complete the executable. 
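The ‘incompleteness’ of object files is easy to provoke; a sketch of code that uses a symbol it does not itself define (the function names here are made up):

```c
/* declared but not (yet) defined: the compiler emits a call to the
 * undefined symbol 'helper' and records it in the object file; the
 * linker must later find the definition in another object or library */
int helper( int x );

int twice_helped( int x )
{
    return helper( helper( x ) );
}

/* in this sketch the definition follows in the same file, but it could
 * equally well come from a different object file inside an archive */
int helper( int x )
{
    return x + 1;
}
```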
│ Symbols vs Addresses │ │ • we use symbolic «names» to call functions &c. │ • but the ‹call› machine instruction needs an «address» │ │ • the executable will eventually live in memory │ • data and instructions need to be given «addresses» │ • what a linker does is «assign» those addresses The main entities that come up during linking are «symbols» and «addresses». In a program, the machine code and the data are loaded into memory, and as we know, each memory location has an «address». The program in its compiled form can use addresses to refer to parts of itself. For instance, to call a subroutine, we provide its starting address to a special ‹call› instruction, which tells the CPU to start executing code from that address. However, when humans write programs, they do not assign addresses to pieces of data, to functions or to individual instructions. Instead, if the program needs to refer to a part of itself, we give those parts names: those names are known as «symbols». It is the shared responsibility of the compiler and the linker to assign addresses to the individual symbols, in such a way that the objects stored in memory do not conflict (overlap). If you think about it, it would be very difficult to do by hand: we usually don't know how long the machine code for any given function will be, so we would have to guess, leave gaps in case a function needs to grow, and keep track of which code lives at which address. It is all very inconvenient, and even assembly programmers usually avoid assigning addresses by hand. In fact, one of the primary roles of an assembler is to translate from symbolic to numeric addresses. But I digress.
│ Resolving Symbols │ │ • the linker processes one object file at a time │ • it maintains a «symbol table» │ ◦ mapping symbols (names) to addresses │ ◦ dynamically updated as more objects are processed │ • relocations are typically processed all at once at the end │ • «resolving symbols» = finding their addresses The linker works by maintaining an ‘incomplete executable’ and makes progress by merging each of the input object files into this work-in-progress file. The strategy for assigning final addresses is simple enough: there's a single output ‹.text› section, a single output ‹.data› section and so on. When an input file is processed, its own ‹.text› section is simply appended to the ‹.text› produced so far. The same process is repeated for every section. The symbol tables of the input object files are likewise merged one by one, and the addresses adjusted as symbols are added. In addition to symbol «definitions», object files contain symbol «uses» – those are known as relocations, and are stored in a relocation table. Relocations contain the «address of the instruction» that needs to be patched and the «symbol» the address of which is to be patched in. Like the sections themselves and the symbol table, the relocation table is built up incrementally, one input file at a time. The relocations are also processed by the linker: usually, this means writing the final address of a particular symbol into an as-yet incomplete instruction or into a variable in the data section. This is usually done once the output symbol table is complete. The relocation and symbol tables are often discarded at the end (but may be retained in the output file in some cases – the symbol table more often than the relocation table).
│ Executable │ │ • finished «image» of a program to be executed │ • usually in the same format as «object files» │ • but already complete, with symbols resolved │ ◦ «but:» may use «shared libraries» │ ◦ in that case, «some» symbols remain unresolved The output of the linker is, in the usual case, an «executable». This is a file that is based on the same format as object files of the given operating system, but is «complete» in some sense. In static executables (those which don't use shared libraries), all references and relocations are already resolved and the program can be loaded into memory and directly executed by the CPU, without further adjustments. It is also worth noting that the addresses that the executable uses when referring to parts of itself are «virtual addresses» (this is also the case with shared libraries below). We will talk more about those in a later lecture, but right now we can at least say that this means that different programs on the same operating system can use overlapping addresses for their instructions and data. This is not a problem, because virtual addresses are private to each process, and hence each copy of each executing program. │ Shared Libraries │ │ • each shared library only needs to be in memory once │ • shared libraries use «symbolic names» (like object files) │ • there is a “mini linker” in the OS to resolve those names │ ◦ usually known as a «runtime» linker │ ◦ resolving = finding the addresses │ • shared libraries can use other shared libraries │ ◦ they can form a «DAG» (Directed Acyclic Graph) The downside of static libraries is that they need to be loaded separately (often in slightly different versions) along with each program that uses them: in fact, since the linker embedded them into the program, they are quite inseparable from it. As we have already mentioned, this is not very efficient. 
Instead, we can store the library code in separate executable-like files that get loaded into the address space of programs that need it. Of course, relocations in the main program that refer to symbols from shared libraries (and vice versa), and also relocations in shared libraries that refer to other shared libraries, all need to be resolved. This is usually done either when the program is loaded into memory, or lazily, right before the relocation is first used. In either case, there needs to be a program which will resolve those relocations: this is the «runtime linker» – it is superficially similar to the normal, compile-time linker, but in reality is quite different. │ Addresses Revisited │ │ • when you run a program, it is «loaded into memory» │ • parts of the program refer to other parts of the program │ ◦ this means they need to know «where» it will be loaded │ ◦ this is a responsibility of the «linker» │ • shared libraries use «position-independent code» │ ◦ works regardless of the base address it is loaded at │ ◦ we won't go into detail on how this is achieved We mentioned that executables and libraries use virtual addresses to refer to their own parts. However, this does not help in shared libraries as much as it helps in executables. The catch is that we want to load the same library along with multiple programs: but if the addresses used by the library are fixed, this means that the library needs to be loaded at the same start address into each program that uses that library. This quickly becomes impractical as we add more libraries into the system – no two libraries would be allowed to overlap and none of them would be allowed to overlap with any of the executables. In practice, we instead compile the libraries in such a way that they don't use absolute addresses to refer to parts of themselves.
This often adds a little execution overhead, but makes it possible to load the library at any address range that is available in the current process. This makes the job of the runtime linker much easier. │ Compiler, Linker &c. │ │ • the C compiler is usually called ‹cc› │ • the linker is known as ‹ld› │ • the archive (static library) manager is ‹ar› │ • the «runtime» linker is often known as ‹ld.so› On many UNIXes, the compiler and the linker are available as part of the system itself. The command names are standardized. ## File-Based APIs On POSIX systems, the API for using the filesystem is a very important one, because it in fact provides access to a number of additional resources, which appear as ‘abstract’ (special) files in the system. │ Everything is a File │ │ • part of the UNIX «design philosophy» │ • «directories» are files │ • «devices» are files │ • «pipes» are files │ • network connections are (almost) files A file is an abstraction: it is an object from which we can read bytes and into which we can write bytes (not all files will let us do both). In regular files, we can read and write at any offset, and if we write something we can later read that same thing (unless it was rewritten in the meantime). Directories are somewhat like this: we can read bytes from them to find out what files are present in that directory and how to find them in the file system. We can create new entries by writing into the directory. Incidentally, this is not how things are usually done, but it's not hard to imagine it could be. Quite a few «devices» (peripherals) behave this way: all kinds of hard drives (just a big bunch of bytes), printers (write some bytes to have them printed), scanners (write bytes to send commands, read bytes with the image data), audio devices (read bytes from microphones, write bytes into speakers), and so on. Pipes are like that too: one program writes bytes, and another reads them.
And network connections are more or less just pipes that work across the network. │ Why is Everything a File │ │ • «re-use» the comprehensive «file system API» │ • re-use existing file-based command-line tools │ • bugs are bad → «simplicity» is good │ • want to print? ‹cat file.txt > /dev/ulpt0› │ ◦ (reality is a little more complex) Since we already have an API to work with «abstract files» (because we need to work with real files anyway), it becomes reasonable to ask why not use this existing API to work with other objects that look like files. It makes sense not just at the level of C functions, but at the level of command-line programs too. In general, re-using existing mechanisms makes things more flexible, and often also simpler. Of course, there are caveats (devices often need to support operations that don't map well to reading or writing bytes, sockets are also somewhat problematic). │ What is a Filesystem? │ │ • a set of «files» and «directories» │ • usually lives on a single block device │ ◦ but may also be virtual │ • directories and files form a «tree» │ ◦ directories are internal nodes │ ◦ files are leaf nodes While we have a decent idea of what a «file» is, what about a file «system»? Well, a file system is a collection of files and directories, typically stored on a single block device. The directories and files form a tree (at least until symlinks come into play, at which point things start going south). Regular files are always leaf nodes in this tree. │ File Paths │ │ • filesystems use «paths» to point at files │ • a string with ‹/› as a directory delimiter │ ◦ the delimiter is ‹\› on Windows │ • a leading ‹/› indicates the «filesystem root» │ • e.g. ‹/usr/include› Paths are how we refer to files and directories within the tree. The top-level («root») directory is named ‹/›. Each directory «entry» carries a name (and a link to the actual file or directory it represents) – this name can be used in the path to refer to the given entity. 
So with a path like ‹/usr/include›, we start at the «root directory» (the initial slash), then in that directory, we look for an entity called ‹usr› and when we find it, we check that it is a directory again. If that is so, we then look at its direct descendants again and look for an entity labelled ‹include›. │ The File Hierarchy │ │ ╭───╮ │ ┌───────│ / │──────────┐ │ │ ╰───╯ │ │ ▼ ▼ ▼ │ ╭─────╮ ╭─────╮ ╭─────╮ │ │ home│ │ var │ ┌──│ usr │──┐ │ ╰─────╯ ╰─────╯ │ ╰─────╯ │ │ ▼ ▼ ▼ │ ╭───────╮ ╭───────╮ ╭─────╮ │ │xrockai│ ┌──────│include│ │ lib │──────┐ │ ╰───────╯ │ ╰───────╯ ╰─────╯ │ │ ▼ ▼ ▼ ▼ │ ╭───────╮ ╭─────────╮ ╭───────╮ ╭───────╮ │ │stdio.h│ │unistd.h │ │ libc.a│ │ libm.a│ │ ╰───────╯ ╰─────────╯ ╰───────╯ ╰───────╯ That's an example of a file system tree. You can practice looking up various paths in the tree, using the algorithm described above. │ The Role of Files and Filesystems │ │ • «very» central in «Plan9» │ • central in most UNIX systems │ ◦ cf. Linux pseudo-filesystems │ ◦ ‹/proc› provides info about all processes │ ◦ ‹/sys› gives info about the kernel and devices │ • somewhat «reduced» in Windows │ • quite «suppressed» in Android (and more on iOS) Different operating systems put different emphasis on the file system. We will take the way POSIX positions the file system as the baseline – in this case, the file system is quite central: in addition to regular files and directories, all sorts of special files appear in the file system and provide access to various OS facilities. However, there are also many services and APIs that are not based on the file system, including e.g. process management, memory management and so on. In many UNIX-like systems, the reliance on FS-based APIs is notched up a bit: e.g. process management is done via a virtual ‹/proc› filesystem (many different systems), or device discovery and configuration via ‹/sys› (Linux). 
Another level above that is Plan9, where essentially everything that can be made into a file system is made into one. Another experimental system, GNU/Hurd, has a similar ambition. If we go the other way from POSIX, we have the native Windows APIs, which emphasise the file system much less than would be typical in POSIX. Most objects have dedicated APIs, even if they are rather file-like. However, the file system is still prominently present both in the APIs and in the user interface. Both are further suppressed by modern ‘scaled-down’ operating systems like Android and iOS (even if both are POSIX-compatible under the hood, ‘normal’ applications are not allowed to access the POSIX API, or the file system, and it is usually also hidden from users). │ The Filesystem API │ │ • you «open» a file (using the ‹open()› syscall) │ • you can ‹read()› and ‹write()› data │ • you ‹close()› the file when you are done │ • you can ‹rename()› and ‹unlink()› files │ • you can use ‹mkdir()› to create directories So how does the file system API look on POSIX systems? To work with a file, you usually need to ‹open› it first: you supply a «path» and some flags to tell the OS what you intend to do with the file. More on that in a short while. When you have a file open, you can ‹read› data from it and ‹write› data into it. When you are done, you use ‹close› to free up the associated resources. To work with directories, you usually don't need to ‹open› them (though you can). You can rename files (this is a directory operation) using ‹rename›, remove them from the file system hierarchy using ‹unlink› (this erases the corresponding directory entry), and you can create new directories using ‹mkdir›. │ File Descriptors │ │ • the kernel keeps a table of open files │ • the «file descriptor» is an index into this table │ • you do everything using file descriptors │ • non-Unix systems have similar concepts │ ◦ descriptors are called «handles» on Windows Remember ‹open›? 
When we want to work with a file, we need a way to identify that file, and paths are not super convenient in this respect: someone could rename the file we were working with, and suddenly it is gone, or worse, the file could be replaced by a different file or even a directory. Additionally, looking up a file by its path is a comparatively expensive operation: the OS has to read every directory mentioned in the «path» and run a lookup on it. While this information is often cached in RAM, it still takes valuable time. When we open a file, we get back a «file descriptor» – this is a small integer, and using this descriptor as an index into a table, the kernel can look up all the metadata it needs (to carry out reads and writes) in constant time. The descriptor is also associated with the file directly, so if the file is moved around or even unlinked from the directory tree, the descriptor still points to the same file. Most non-POSIX file system APIs have a similar notion (sometimes open does not return a number but a different data type, e.g. a pointer, and sometimes this value is called a «handle» instead of a descriptor... but the concept is more or less the same). │ Regular files │ │ • these contain «sequential data» (bytes) │ • may have inner structure but the OS does not care │ • there is «metadata» attached to files │ ◦ like when were they last modified │ ◦ who can and who cannot access the file │ • you ‹read()› and ‹write()› files A regular file is what it appears to be. It is a sequence of bytes, stored on a persistent storage device and has metadata associated that makes it possible to locate all that data in actual disk sectors. The bytes inside the file are of no concern to the operating system. When data is read from a file, the operating system consults the file system metadata to find the particular sectors on disk that store the content. When data is overwritten, the same thing happens but those sectors are rewritten with the new data. 
When new data is appended, the operating system looks up some free space on the disk, then adjusts the file metadata to point at the (now taken) sectors and writes the data there. There is some additional metadata stored alongside each file, like whom it belongs to or when it was modified. │ Directories │ │ • a «list» of files and other directories │ ◦ internal nodes of the filesystem tree │ ◦ directories give names to files │ • can be opened just like files │ ◦ but ‹read()› and ‹write()› are not allowed │ ◦ files are created with ‹open()› or ‹creat()› │ ◦ directories with ‹mkdir()› │ ◦ directory listing with ‹opendir()› and ‹readdir()› A directory is a (potentially) internal node in the file hierarchy: its role is to give «names» to files, making them accessible via «paths». Like regular files, directories are self-contained objects, but instead of raw bytes, they contain structured data: namely, a directory maps file names to other files (those can be regular files, other directories, or one of the special file types we will talk about shortly). In principle, it would be possible to implement ‹read› and ‹write› for directories, but this would be problematic: if those functions dealt with the actual on-disk representation of a directory, user programs could easily corrupt directory entries. This is quite undesirable: instead, under normal circumstances, directories are used via «paths»: when we present a file path to ‹open›, the operating system will automatically traverse directories as needed. Of course, user programs sometimes need to iterate through all directory entries, i.e. list all files in a given directory. To this end, POSIX provides the ‹opendir› function along with ‹readdir›, ‹seekdir›, ‹closedir› and so on. These functions provide a high-level API for interacting with directories. Nonetheless, this API is read-only: directory entries are created whenever files are created using the corresponding path, e.g.
using ‹mkdir› or ‹open› with the ‹O_CREAT› flag. │ Mounts │ │ • UNIX joins all file systems into a single hierarchy │ • the root of one filesystem becomes a directory in another │ ◦ this is called a «mount point» │ • Windows uses «drive letters» instead (‹C:›, ‹D:› &c.) A single computer (and hence, a single operating system) may have more than one hard drive available to it. In this case, it is customary that each such device contains its own file system: the question arises of how to present multiple file systems to the user. The UNIX strategy is to present all the file systems within a single directory tree: to this end, one of the file systems is picked as a «root» file system: in this (and only in this) file system, the FS root directory ‹/› is the same as the system root directory. Every other file system is joined to the hierarchy by attaching its root to an existing directory of an already-attached file system. Consider two file systems: ╭────╮ ╭────╮ │ / │ │ / │ ╰────╯ ╰────╯ ┌─╯╰─────┐ ┌───╯ ╰──┐ ▼ ▼ ▼ ▼ ╭─────╮ ╭─────╮ ╭───────╮ ╭─────╮ │ home│ │ usr │ │include│ │ lib │ ╰─────╯ ╰─────╯ ╰───────╯ ╰─────╯ │ │ ╰──────┐ ▼ ▼ ▼ ╭───────╮ ╭───────╮ ╭───────╮ │stdio.h│ │ libc.a│ │ libm.a│ ╰───────╯ ╰───────╯ ╰───────╯ If we now «mount» the second file system onto the ‹/usr› directory of the first, we get the following unified hierarchy: ╭────╮ │ / │ ╰────╯ ┌─╯╰─────┐ ▼ ▼ ╭─────╮ ┌─────┐ │ home│ │ usr │ ╰─────╯ └─────┘ ┌───╯ ╰──┐ ▼ ▼ ╭───────╮ ╭─────╮ │include│ │ lib │ ╰───────╯ ╰─────╯ │ │ ╰──────┐ ▼ ▼ ▼ ╭───────╮ ╭───────╮ ╭───────╮ │stdio.h│ │ libc.a│ │ libm.a│ ╰───────╯ ╰───────╯ ╰───────╯ Usually, file systems are mounted onto «empty» directories: if ‹/usr› was not empty on the left (root) file system, its content would be hidden by the mount. The other strategy is to present multiple file systems using multiple separate trees. This is the strategy implemented by the MS Windows family of operating systems: each file system is assigned a single letter, and each becomes its own, separate tree.
│ Pipes │ │ • pipes are a simple communication device │ • one program can ‹write()› data to the pipe │ • another program can ‹read()› that same data │ • each end of the pipe gets a «file descriptor» │ • a pipe can live in the filesystem («named» pipe) Pipes are somewhat like files in that it's possible to write data (bytes) into them, and read data from them. In most cases, the program doing the writing is a different program from the one doing the reading. Unlike a regular file, the data is not permanently stored anywhere: it disappears from the pipe as soon as it is read. Of course, there is a buffer associated with a pipe, but it is only stored in RAM. This allows the writing process to write data even if the other end is not actively reading at the same time: the OS will buffer the write until such time it can be read. Normally, a pipe is an anonymous device, only accessible via file descriptors. When these are closed, the device is destroyed. There is another variant of a pipe, though, called a «named pipe» which is given a name in the file system. This does not mean the «data» is stored anywhere: a named pipe operates just like an anonymous pipe, the difference is that it can be passed to ‹open› using its path. │ Devices │ │ • «block» and «character» devices are (special) «files» │ • block devices are accessed one «block at a time» │ ◦ a typical block device would be a «disk» │ ◦ includes USB mass storage, flash storage, etc │ ◦ you can create a «file system» on a block device │ • «character» devices are more like normal files │ ◦ terminals, tapes, serial ports, audio devices As we have already mentioned, many peripheral devices look like sequences of bytes, or possibly as sequences of blocks. A typical block device is «addressable»: the user can seek to a particular location of the device and read a chunk of data (an integer number of blocks). 
Unless someone writes to a particular location of the device, reading from the same address multiple times will yield the same data. On the other hand, character devices often behave rather like pipes, in the sense that if a program writes some bytes into the device, reading the device will not yield those same bytes. Instead, character devices usually mediate communication with a peripheral that consumes bytes (which are written into the device) and/or provides some output (which is what the program gets when it reads from the device). Consider a printer: writing bytes into the printer's character device will cause those bytes to be printed (after possibly being interpreted by the printer). Another example would be that after a scanner has been instructed to scan a document, the pixels captured by its optical sensor can be, in some form, extracted by reading from its character device. Essentially, these types of character devices behave like a pipe, but instead of another program, the other end is a hardware device (or most likely its firmware). │ Sockets │ │ • the socket API comes from early BSD Unix │ • socket represents a (possible) «network connection» │ • sockets are more complicated than normal files │ ◦ establishing connections is hard │ ◦ messages get lost much more often than file data │ • you get a «file descriptor» for an open socket │ • you can ‹read()› and ‹write()› to sockets Sockets are, in some sense, a generalization of pipes. There are essentially 3 types of sockets: 1. a «listening» socket, which allows many «clients» to connect to a single «server» – strictly speaking, these sockets do not transport data, instead, they allow processes to establish «connections», 2. a «connected» socket, one of which is created for each connection and which behaves essentially like a bidirectional pipe (standard pipes being unidirectional), 3. a «datagram» socket, which can be used to send data without establishing connections, using special send/receive API. 
While the third is rather special and un-pipe-like, the first two are usually used together as a point-to-multipoint means of communication: the server listens on an «address», and any client which has this address can establish communication with the server. This is quite unlike pipes, which usually need to be pre-arranged (i.e. the programs must already be aware of each other). │ Socket Types │ │ • sockets can be «internet» or «unix domain» │ ◦ internet sockets connect to other computers │ ◦ Unix sockets live in the filesystem │ • sockets can be «stream» or «datagram» │ ◦ stream sockets are like pipes │ ◦ you can write a continuous «stream» of data │ ◦ datagram sockets can send individual «messages» There are two basic address types: internet sockets, which are used for inter-machine communication (using TCP/IP), and «unix domain sockets», which are used for local communication. A unix socket is like a named pipe: it has a path in the file system, and client programs can use this path to establish a connection to the server. │ Review Questions │ │ 5. What is a shared (dynamic) library? │ 6. What does a linker do? │ 7. What is a symbol in an object file? │ 8. What is a file descriptor?