# Access Control

This lecture will focus on basic security considerations in an operating system, with focus on file systems, which are typically the most visible instance of access control in an OS.

│ Lecture Overview
│
│ 1. Multi-User Systems
│ 2. File Systems
│ 3. Sub-user Granularity

We will first look at the motivation and implementation of «users», the basic unit of ownership and access control in an operating system. We will also look at some consequences and some applications of multi-user computing, and discuss how access control is implemented and enforced.

In the second part, we will focus on the canonical case study in access control: file systems. Finally, the last part will explore what happens when per-user access control is not sufficient and we need a more fine-grained permission system.

## Multi-User Systems

Multi-user systems had been the norm until the rise of personal computers circa the mid-1980s: earlier computers were too expensive and too bulky to be allocated to a single person. Instead, such systems used some form of multi-tenancy, whether implemented administratively (batch systems) or by the operating system (interactive, terminal-based computers).

│ Users
│
│ • originally a proxy for «people»
│ • currently a more «general abstraction»
│ • user is the unit of «ownership»
│ • many «permissions» are user-centered

The concept of a «user» has evolved from the need to keep separate accounts for distinct people (the eponymous users of the system). In modern systems, a «user» continues to be an abstraction that includes accounts for individual humans, but also covers other needs. Essentially, a «user» is a unit of ownership, and of access control.

│ Computer Sharing
│
│ • a computer is an (often costly) «resource»
│ • efficiency of use is a concern
│ ◦ a single user rarely exploits a computer fully
│ • data sharing makes access control a necessity

While efficient resource usage is what drove multi-tenancy of computer systems, it is the global shared file system that drove the requirement for access control: users do not necessarily wish to trust all other users of the system with access to their files.

│ Ownership
│
│ • various «objects» in an OS can be «owned»
│ ◦ primarily «files» and «processes»
│ • the owner is typically whoever «created» the object
│ ◦ though ownership can be «transferred»
│ ◦ restrictions usually apply

The standard model of access control in operating systems revolves around «ownership» of «objects». Generally speaking, ownership of an object confers both rights (to manipulate the object) and obligations (owned objects count towards quotas). Depending on circumstances, object ownership may be transferred, either by the original owner, or by system administrators.

│ Process Ownership
│
│ • each «process» belongs to some user
│ • the process acts «on behalf» of the user
│ ◦ the process gets the same privileges as its owner
│ ◦ this both «constrains» and «empowers» the process
│ • processes are «active» participants

Perhaps the most important ownership relationship is the one between users and their processes. This is because processes execute code on behalf of the user, and all actions a user takes on a system are mediated by some process or another. In this sense, processes act on behalf of their owner and the actions they perform are subject to any restrictions which apply to the user in question.
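Concretely, every process can ask the kernel for the identity it runs under. The following is a minimal sketch using standard POSIX calls; the difference between the real and the effective user will become relevant with ‹setuid› programs, covered later.

```c
/* Every process carries the identity of the user on whose behalf it
 * acts; the kernel consults this identity on permission checks. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("real uid: %d, effective uid: %d\n",
           (int) getuid(), (int) geteuid());
    return 0;
}
```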
│ File Ownership
│
│ • each «file» also belongs to some user
│ • this gives «rights» to the «user» (or rather their processes)
│ ◦ they can «read» and «write» the file
│ ◦ they can «change permissions» or ownership
│ • files are «passive» participants

Like processes, files are objects which are subject to ownership. However, unlike processes, files are passive: they do not perform any actions. Hence, in this case, ownership simply gives the owner certain rights to perform actions on the file (most importantly, to change the access control rights pertaining to that file).

│ Access Control Models
│
│ • «owners» usually decide who can access their objects
│ ◦ this is known as «discretionary» access control
│ • in high-security environments, this is not allowed
│ ◦ known as «mandatory» access control
│ ◦ a central authority decides the policy

There are two main approaches to access control: the common «discretionary» model, where owners decide who can interact with their files (or other objects, as applicable), and «mandatory», in which users are not trusted with matters of security, and decisions about access control are placed in the hands of a central authority. In both cases, the operating system grants (or denies) access to objects based on an «access control policy»; however, only in the latter case can this policy be thought of as a coherent, self-contained document (as opposed to a collection of rules decided by a number of uncoordinated users).

│ (Virtual) System Users
│
│ • users are a useful ownership «abstraction»
│ • various system services get their own ‘fake’ users
│ • this allows them to «own files» and «processes»
│ • and also «limit» their «access» to the rest of the OS

Users have turned out to be a really useful abstraction. It is common practice that services (whether system- or application-level) run under special users of their own. This means that these services can own files and other resources, and run processes under their own identity. Additionally, it means that those services can be restricted using the same mechanisms that apply to ‘normal’ users.

│ Principle of Least Privilege
│
│ • entities should have the «minimum» privilege required
│ ◦ applies to «software» components
│ ◦ but also to «human» users of the system
│ • this «limits» the scope of «mistakes»
│ ◦ and also of security compromises

The «principle of least privilege» is an important maxim for designing secure systems: it tells us that, regardless of the subject and object combination, permissions should only be granted where there is genuine need for the subject to manipulate the particular object. The rationale is that mistakes happen, and when they do, we would rather limit their scope (and hence the damage): mistakes cannot endanger objects which are inaccessible to the culprit.

│ Privilege Separation
│
│ • different parts of a system need different privileges
│ • least privilege dictates «splitting» the system
│ ◦ components are «isolated» from each other
│ ◦ they are given only the rights they need
│ • components «communicate» using very simple IPC

An important corollary of the principle of least privilege is the design pattern known as «privilege separation». Systems which follow it are split into a number of independent components, each serving a small, well-defined and, security-wise, self-contained function. Each of these modules can then be isolated in their own little sandbox and communicate with the rest of the system through narrowly defined interfaces (usually built on some form of inter-process communication).
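Returning for a moment to file ownership from the beginning of this part: ownership and permissions are ordinary file metadata, and the sketch below (plain POSIX ‹stat›) reads them back for a path given on the command line.

```c
/* Ownership and permissions are recorded in the file's metadata (on
 * UNIX file systems, in the i-node); stat() retrieves them. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
    struct stat st;

    if (argc != 2)
    {
        fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return 1;
    }

    if (stat(argv[1], &st) == -1)
    {
        perror("stat");
        return 1;
    }

    printf("owner uid: %d, group gid: %d, permissions: %03o\n",
           (int) st.st_uid, (int) st.st_gid,
           (unsigned) (st.st_mode & 07777));
    return 0;
}
```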
│ Process Separation
│
│ • recall that each process runs in its own «address space»
│ ◦ «shared memory» must be explicitly requested
│ • each «user» has a view of the «filesystem»
│ ◦ a lot more is shared by default in the filesystem
│ ◦ especially the «namespace» (directory hierarchy)

There is not much need for access control of memory: each process has its own and cannot see the memory of any other process (with small, controlled exceptions created through mutual consent of the two processes). The file system is, however, very different: there is a global, shared namespace that is visible to all users and all processes. Moreover, many of the objects (files) are «meant» to be shared, in a rather ad-hoc fashion, either through ‘well-known’ paths (this being the case with many system files) or through passing paths around. Importantly, paths are «not» any sort of access token, and in almost all circumstances, withholding a path does not prevent access to the object (paths can be easily discovered).

│ Access Control Policy
│
│ • there are 3 pieces of information
│ ◦ the «subject» (user)
│ ◦ the «action»/«verb» (what is to be done)
│ ◦ the «object» (the file or other resource)
│ • there are many ways to «encode» this information

We mentioned earlier that the totality of the rules which decide which actions are allowed, and which disallowed, is known as an «access control policy». In the abstract, it is a rulebook which answers questions of the form ‘Is (subject) allowed to perform (action) on (object)?’ There are clearly many different ways in which this rulebook can be encoded: we will look at some of the most common strategies later.

│ Access Rights Subjects
│
│ • in a typical OS those are (possibly virtual) «users»
│ ◦ sub-user units are possible (e.g. programs)
│ ◦ «roles» and «groups» could also be subjects
│ • the subject must be «named» (names, identifiers)
│ ◦ easy on a single system, «hard» in a «network»

The most common access control «subjects» (at least when it comes to access policy «specification») are, as was already hinted at, «users», whether ‘real’ (those that stand in for people) or virtual (those that stand for services). In most circumstances, it must be possible to «name» the subjects, so that it is possible to refer to them in rules. Sometimes, however, rules can be directly attached to subjects, in which case there is no need for these subjects to have stable identifiers attached.

│ Access Rights Actions (Verbs)
│
│ • the available ‘verbs’ (actions) depend on «object» type
│ • a typical object would be a «file»
│ ◦ files can be «read», «written», «executed»
│ ◦ «directories» can be «searched» or «listed» or «changed»
│ • network connections can be established &c.

The particular choice of actions depends on the object type: each such type has a fixed list of actions, which correspond to operations, or variants of operations, that the operating system offers through its interfaces. The actions may be affected by the policy directly or indirectly – for instance, the «read» permission on a file is not enforced at the time a ‹read› call is performed: instead, it is checked at the time of ‹open›, with the provision that ‹read› can only be used on file descriptors that are «open for reading». That is, the program is required to indicate, at the time of ‹open›, whether it wishes to read from the file.
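A small sketch to illustrate this: a denial surfaces as an ‹EACCES› error from ‹open›, and the ‹read› call itself is never reached. The path is just an example of a file that ordinary users typically cannot read.

```c
/* The read permission is checked at open() time: if the policy does
 * not allow us to read the file, open() fails with EACCES, and no
 * read() will ever be attempted on it. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/etc/shadow", O_RDONLY); /* readable by root only */

    if (fd == -1 && errno == EACCES)
        printf("the policy denies us read access\n");
    else if (fd != -1)
        close(fd);
    return 0;
}
```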
│ Access Rights Objects
│
│ • anything that can be «manipulated» by «programs»
│ ◦ although not everything is subject to access control
│ • could be «files», «directories», «sockets», shared «memory», ...
│ • object «names» depend on their type
│ ◦ file paths, i-node numbers, IP addresses, ...

Like subjects, objects need to have names, unless the pieces of policy relevant to them are directly attached to the objects themselves. However, in the case of objects, this direct attachment is much more common: it is rather typical that an i-node embeds permission information.

│ Subjects in POSIX
│
│ • there are 2 types of «subjects»: «users» and «groups»
│ • each «user» can belong to «multiple groups»
│ • users are split into «normal» users and ‹root›
│ ◦ ‹root› is also known as the «super-user»

In POSIX systems, there are two basic types of subjects that can appear in the access control policy: users and groups. Since POSIX only covers access control for the file system, objects do not need to be named: their permissions are attached to the i-node. A special user, known as ‹root›, represents the system administrator (also known as the super-user). This account is not subject to permission checking. Additionally, there are a number of actions (usually not attached to particular objects) which only the ‹root› user can perform (e.g. rebooting the computer).

│ User and Group Identifiers
│
│ • users and groups are represented as «numbers»
│ ◦ this improves «efficiency» of many operations
│ ◦ the numbers are called ‹uid› and ‹gid›
│ • those numbers are valid on a «single computer»
│ ◦ or at most, a local network

In the access control policy, users and groups are identified by numbers (each user and each group getting a small, locally unique integer). Since these identifiers have a fixed size, they can be stored very compactly in i-nodes, and can also be very efficiently compared, both of which have been historically important considerations. Besides efficiency, the numeric identifiers also make the layout of data structures which carry them simpler, reducing the scope for bugs.

│ User Management
│
│ • the system needs a «database» of «users»
│ • in a network, user «identities» often need to be «shared»
│ • could be as simple as a «text file»
│ ◦ ‹/etc/passwd› and ‹/etc/group› on UNIX systems
│ • or as complex as a distributed database

The user database serves two basic roles: it tells the system which users are authorized to access the system (more on this later), and it maps between human-readable user names and the numeric identifiers that the system uses internally. In local networks, it is often desirable that all computers have the same idea about who the users are, and that they use the same mapping between their names and id's. LDAP and Active Directory are popular choices for centralised network-level user databases.

│ Changing Identities
│
│ • each «process» belongs to a particular «user»
│ • ownership is «inherited» across ‹fork()›
│ • «super-user» processes can use ‹setuid()›
│ • ‹exec()› can sometimes change a process owner

Recall that all processes are created using the ‹fork› system call, with the exception of ‹init›. When a process forks, the child process inherits the ownership of the parent, that is, it belongs to the same user as the parent does (whose ownership is not affected by ‹fork›). However, if a process is owned by the super-user, it can change its owner by using the ‹setuid› system call.
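A minimal sketch of the call in action (the target uid 1000 is a made-up example):

```c
/* A super-user process giving up its identity: after a successful
 * setuid(), the process permanently belongs to the new user, and so
 * will any children it forks from that point on. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    if (setuid(1000) == -1) /* fails unless we are the super-user */
    {
        perror("setuid");
        return 1;
    }

    /* from here on, we act (and are constrained) as uid 1000 */
    printf("now running as uid %d\n", (int) getuid());
    return 0;
}
```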
Additionally, ‹exec› can sometimes change the owner of the process, via the so-called ‹setuid› bit (not to be confused with the system call of the same name). The ‹init› process is owned by the super-user.

│ Login
│
│ • a super-user process manages «user logins»
│ • the user types in their name and «password»
│ ◦ the ‹login› program «authenticates» the user
│ ◦ then calls ‹setuid()› to change the process owner
│ ◦ and uses ‹exec()› to start a shell for the user

You may recall that at the end of the boot process, a ‹login› process is executed to allow users to authenticate themselves and start a session. The traditional implementation of ‹login› first asks the user for their user name and password, which it checks against the user database. If the credentials match, the ‹login› program sets up the basic environment, changes the owner of the process to the user who just authenticated themselves, and executes their preferred shell (as configured in the user database).

│ User Authentication
│
│ • the user needs to «authenticate» themselves
│ • «passwords» are the most commonly used method
│ ◦ the «system» needs to recognize the right password
│ ◦ user should be able to change their password
│ • «biometric» methods are also quite popular

By far the most common method of authenticating users (that is, ascertaining that they are who they claim they are) is by asking for a secret – a password or a passphrase. The idea is that only the legitimate owner of the account in question knows this secret. In the ideal case, the system does not store the password itself (in case the password database is compromised), but instead stores information that can be used to check that a password the user typed in is correct. The usual way this is done is via (salted) cryptographic hash functions. Besides passwords, other authentication methods exist, most notably cryptographic tokens and biometrics.

│ Remote Login
│
│ • authentication over «network» is more complicated
│ • «passwords» are easiest, but not easy
│ ◦ «encryption» is needed to safely transmit passwords
│ ◦ along with «computer authentication»
│ • «2-factor» authentication is a popular improvement

While a password is simply a short string that can be quite easily sent across a network, there are caveats. First, the network itself is often insecure, and the password could be snooped by an attacker. This means we need to use cryptography to transmit the password, or otherwise prove its knowledge. The other problem is that, even if we send the password encrypted, the computer at the other end may not be the one we expect (i.e. it could belong to an attacker). Since the user is not required to be physically present to attempt authentication, this significantly increases the risk of attacks, making strong passwords much more important. Besides strong passwords, security can be improved by 2-factor authentication (more on this shortly).

│ Computer Authentication
│
│ • how to ensure we send the password to the «right party»?
│ ◦ an attacker could «impersonate» our remote computer
│ • usually via «asymmetric cryptography»
│ ◦ a private key can be used to «sign» messages
│ ◦ the server signs a challenge to establish its «identity»

When interacting with a remote computer (via a network), it is rather important to ensure that we communicate with the computer that we intended to. While the most immediate concern is sending passwords, it is of course not the only concern: accidentally uploading secret data to the wrong computer would be as bad, if not worse.

A common approach, then, is that each computer gets a unique private key, while its public counterpart (or at least its fingerprint) is distributed to other computers. When connecting, the client can generate a random challenge, and ask the remote computer to sign it using the secret key associated with the computer that we intended to contact, in order to prove its identity. Unless the target computer itself has been compromised, an attacker will be unable to produce a valid signature and will be foiled.
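In outline, the exchange looks like the sketch below. This is only a schematic fragment, not a runnable program: the key types and the ‹random_bytes›, ‹remote_sign› and ‹verify› helpers are hypothetical stand-ins for a real cryptographic library and a real network round-trip.

```c
/* Schematic challenge-response: the client checks that the server can
 * produce a signature with the private key matching the public key the
 * client has on file. All primitives below are hypothetical
 * placeholders, not real library interfaces. */
#include <stdbool.h>
#include <stddef.h>

typedef struct public_key public_key; /* hypothetical key type */
typedef struct signature  signature;  /* hypothetical signature type */

void random_bytes(unsigned char *buf, size_t len);          /* hypothetical */
signature *remote_sign(const unsigned char *m, size_t len); /* ask the server */
bool verify(const public_key *pk, const unsigned char *m,
            size_t len, const signature *sig);              /* hypothetical */

bool authenticate_server(const public_key *expected)
{
    unsigned char challenge[32];

    random_bytes(challenge, sizeof challenge); /* fresh and unpredictable */
    signature *sig = remote_sign(challenge, sizeof challenge);

    /* only the holder of the matching private key could have signed */
    return verify(expected, challenge, sizeof challenge, sig);
}
```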
│ 2-factor Authentication
│
│ • 2 different types of authentication
│ ◦ harder to spoof «both» at the same time
│ • there are a few factors to pick from
│ ◦ something the user «knows» (password)
│ ◦ something the user «has» (keys, tokens)
│ ◦ what the user «is» (biometric)

Two-factor (or multi-factor) authentication is popular for remote authentication (as outlined earlier), since networks make attacks much cheaper and more frequent. In this case, the first factor is usually a password, and the second factor is a cryptographic «token» – a small device (often in the form of a keychain) which generates a unique sequence of codes, one of which the user transcribes to prove ownership of the token. Remote biometric authentication is somewhat less practical (though not impossible).

Of course, two-factor authentication can be used locally too, in which case biometrics become considerably more attractive. Cryptographic tokens or smart cards are also common, though in the local case, they usually communicate with the computer directly, instead of relying on the user to copy a code.

│ Enforcement: Hardware
│
│ • all «enforcement» begins with the hardware
│ ◦ the CPU provides a «privileged mode» for the kernel
│ ◦ DMA memory and IO instructions are «protected»
│ • the MMU allows the kernel to «isolate processes»
│ ◦ and protect its own integrity

Now that we have an access control policy and we have established the identity of the user, there is one last thing that needs to be addressed, and that is «enforcement» of the policy. Of course, an access control policy is useless if it can be circumvented. The ability of an operating system to enforce security stems from hardware facilities: software alone cannot sufficiently constrain other software running on the same computer. The main tools that allow the kernel to enforce its security policy are the MMU (and the fact that only the kernel can program it) and its control over interrupt handlers.

│ Enforcement: Kernel
│
│ • kernel uses «hardware facilities» to implement security
│ ◦ it stands between «resources» and «processes»
│ ◦ access is mediated through «system calls»
│ • «file systems» are part of the kernel
│ • «user» and «group» «abstractions» are part of the kernel

Hardware resources are controlled by the kernel: memory via the MMU, processors via the timer interrupt, and memory-mapped peripherals again through the MMU and through the interrupt handler table. Since user programs cannot directly access physical resources, any interaction with them must go through the kernel (via system calls), presenting an opportunity for the kernel to check the requested actions against the policy.
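This checking at the system call boundary can be observed directly. In the sketch below, signal number 0 asks ‹kill› to perform only the permission check: an ordinary user may not signal ‹init› (pid 1), so the call fails with ‹EPERM›, without any other effect.

```c
/* The kernel consults the access control policy whenever a system
 * call names an object: here, the object is process 1 (init) and the
 * action is sending a signal. Signal 0 means: perform the permission
 * check only, deliver nothing. */
#include <errno.h>
#include <signal.h>
#include <stdio.h>

int main(void)
{
    if (kill(1, 0) == -1 && errno == EPERM)
        printf("kernel: we may not signal pid 1\n");
    else
        printf("we are allowed to signal pid 1 (are we root?)\n");
    return 0;
}
```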
│ Enforcement: System Calls
│
│ • the kernel acts as an «arbitrator»
│ • a process is trapped in its own «address space»
│ • processes use system calls to access resources
│ ◦ kernel can decide what to allow
│ ◦ based on its «access control model» and «policy»

When a system call is executed, the kernel knows the owner of that process, and also any objects involved in the system call. Armed with this knowledge, it can easily consult the access control policy to decide whether the requested action is allowed, and if it is not, return an error to the process instead of performing the action.

│ Enforcement: Service APIs
│
│ • userland processes can enforce access control
│ ◦ usually system services which provide an IPC API
│ • e.g. via the ‹getpeereid()› system call
│ ◦ tells the caller «which user» is «connected» to a socket
│ ◦ user-level access control relies on «kernel» facilities

Just as the kernel sits on resources that user programs cannot directly access, the same principle can be applied in userspace programs, especially services. Probably the most illustrative example is a relational database: the database engine runs under a dedicated (virtual) user and stores its data in a collection of files. The permissions on those files are set such that only the owner can read or write them – hence, the kernel will disallow any other process from interacting with those files directly.

Nonetheless, the database system can selectively allow other programs to «indirectly» interact with the data it stores: the programs connect to a database server using a UNIX socket. At this point, the database can ask the operating system to provide the user identifier under which the client is running (using ‹getpeereid›). Since the server can directly access the files which store the data, it can, on behalf of the client, execute queries and return the results. It can, however, also disallow certain queries based on its own access control policy and the user id of the client.
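A sketch of how such a server might vet its clients follows. Note that ‹getpeereid› is a BSD interface (on Linux, the ‹SO_PEERCRED› socket option provides the same information), and the single trusted uid is a made-up example of a server-side policy.

```c
/* User-level access control on a UNIX socket: the kernel tells us the
 * (effective) uid and gid of the connected peer, and the server
 * applies its own policy based on that identity. */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int check_client(int client_fd, uid_t trusted_uid)
{
    uid_t euid;
    gid_t egid;

    if (getpeereid(client_fd, &euid, &egid) == -1)
        return -1; /* could not establish the client's identity */

    if (euid != trusted_uid)
        return -1; /* deny: some other user is calling */

    return 0; /* allow: the trusted user is on the other end */
}
```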
## File Systems

As outlined earlier, file systems are usually the most user-visible aspect of an operating system to which access control is applied. Additionally, permissions in the file system are usually directly visible to users and manipulated by them.

│ File Access Rights
│
│ • «file systems» are a case study in access control
│ • all modern file systems maintain «permissions»
│ ◦ the only extant «exception» is FAT (USB sticks)
│ • different systems adopt different representations

│ Representation
│
│ • file systems are usually «object-centric»
│ ◦ permissions are attached to individual objects
│ ◦ easily answers ‘who can access this file?’
│ • there is a «fixed» set of «verbs»
│ ◦ those may be different for «files» and «directories»
│ ◦ different «systems» allow «different verbs»

│ The UNIX Model
│
│ • each file and directory has a single «owner»
│ • plus a single owning «group»
│ ◦ not limited to those the owner belongs to
│ • «ownership» and «permissions» are attached to «i-nodes»

│ Access vs Ownership
│
│ • POSIX ties «ownership» and «access» rights
│ • only 3 subjects can be named on a file
│ ◦ the owner (user)
│ ◦ the owning group
│ ◦ anyone else

│ Access Verbs in POSIX File Systems
│
│ • read: «read» a file, «list» a directory
│ • write: «write» a file, «link»/«unlink» i-nodes to a directory
│ • execute: ‹exec› a program, enter the directory
│ • execute as owner (group): ‹setuid›/‹setgid›

│ Permission Bits
│
│ • basic UNIX «permissions» can be encoded in «9 bits»
│ • 3 bits per 3 subject designations
│ ◦ first comes the owner, then group, then others
│ ◦ written as e.g. ‹rwxr-x---› or ‹0750›
│ • plus two numbers for the owner/group identifiers

│ Changing File Ownership
│
│ • the owner and ‹root› can change file owners
│ • ‹chown› and ‹chgrp› system utilities
│ • or via the C API
│ ◦ ‹chown()›, ‹fchown()›, ‹fchownat()›, ‹lchown()›
│ ◦ same set for ‹chgrp›

│ Changing File Permissions
│
│ • again available to the owner and to ‹root›
│ • ‹chmod› is the user space utility
│ ◦ either numeric argument: ‹chmod 644 file.txt›
│ ◦ or symbolic: ‹chmod +x script.sh›
│ • and the corresponding system call (numeric-only)

│ ‹setuid› and ‹setgid›
│
│ • «special permissions» on «executable» files
│ • they allow ‹exec› to also change the process owner
│ • often used for granting extra privileges
│ ◦ e.g. the ‹mount› command runs as the «super-user»

│ Sticky Directories
│
│ • file creation and deletion is a «directory» permission
│ ◦ this is problematic for «shared directories»
│ ◦ in particular the system ‹/tmp› directory
│ • in a «sticky» directory, different rules apply
│ ◦ new files can be created as usual
│ ◦ only the «owner» can «unlink» a file from the directory

│ Access Control Lists
│
│ • an ACL is a list of ACEs (access control «entries»)
│ ◦ each ACE is a subject + verb pair
│ ◦ it can name an arbitrary user
│ • an ACL is attached to an object (file, directory)
│ • more flexible than the traditional UNIX system

│ ACLs and POSIX
│
│ • part of POSIX.1e (security extensions)
│ • most POSIX systems implement ACLs
│ ◦ this does «not» supersede UNIX permission bits
│ ◦ instead, they are interpreted as part of the ACL
│ • «file system» support is not universal (but widespread)

│ Device Files
│
│ • UNIX represents «devices» as «special i-nodes»
│ ◦ this makes them subject to normal «access control»
│ • the particular device is described in the «i-node»
│ ◦ only a «super-user» can create device nodes
│ ◦ users could otherwise gain access to any device

│ Sockets and Pipes
│
│ • «named» sockets and pipes are just «i-nodes»
│ ◦ also subject to standard file permissions
│ • especially useful with «sockets»
│ ◦ a service sets up a «named socket» in the file system
│ ◦ «file permissions» decide who can talk to the service

│ Special Attributes
│
│ • flags that allow «additional restrictions» on file use
│ ◦ e.g. «immutable» files (cannot be changed by anyone)
│ ◦ «append-only» files (for logfile integrity protection)
│ ◦ compression, copy-on-write controls
│ • «non-standard» (Linux ‹chattr›, BSD ‹chflags›)
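To tie the permission-bit encoding from earlier in this section to the C API: the sketch below sets mode ‹0640› (‹rw-r-----›) on a made-up file name, first numerically, then using the symbolic constants from ‹sys/stat.h›.

```c
/* Setting permission bits from C: mode 0640 means the owner may read
 * and write, the owning group may read, and everyone else gets
 * nothing. */
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    /* numeric form, exactly what 'chmod 640 notes.txt' would do */
    if (chmod("notes.txt", 0640) == -1)
        perror("chmod (octal)");

    /* the same mode, spelled with symbolic constants */
    if (chmod("notes.txt", S_IRUSR | S_IWUSR | S_IRGRP) == -1)
        perror("chmod (symbolic)");

    return 0;
}
```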
│ Network File System
│
│ • NFS 3.0 simply transmits numeric ‹uid› and ‹gid›
│ ◦ the numbering needs to be «synchronised»
│ ◦ can be done via a «central user database»
│ • NFS 4.0 uses «per-user» authentication
│ ◦ the user authenticates to the server directly
│ ◦ filesystem ‹uid› and ‹gid› values are mapped

│ File System Quotas
│
│ • «storage space» is limited, «shared» by users
│ ◦ files take up storage space
│ ◦ file ownership is also a «liability»
│ • «quotas» set «limits» on space use by users
│ ◦ exhausted quota can lead to «denial» of «access»

│ Removable Media
│
│ • access control at «file system» level makes no sense
│ ◦ other computers may choose to «ignore» permissions
│ ◦ «user names» or id's would not make sense anyway
│ • option 1: «encryption» (for denying reads)
│ • option 2: «hardware»-level controls
│ ◦ usually read-only vs read-write on the entire medium

│ The ‹chroot› System Call
│
│ • each process in UNIX has its own «root directory»
│ ◦ for most, this coincides with the «system root»
│ • the root directory can be changed using ‹chroot()›
│ • can be useful to «limit» file system «access»
│ ◦ e.g. in «privilege separation» scenarios

│ Uses of ‹chroot›
│
│ • ‹chroot› alone is «not» a security mechanism
│ ◦ a super-user process can «get out» easily
│ ◦ but not easy for a «normal user» process
│ • also useful for «diagnostic» purposes
│ • and as a lightweight alternative to «virtualisation»

## Sub-User Granularity

In this section, we will explore a few cases where a more precise notion of an access control subject is required or useful.

│ Users are Not Enough
│
│ • users are not always the right abstraction
│ ◦ «creating users» is relatively «expensive»
│ ◦ only a super-user can create new users
│ • you may want to include «programs» as «subjects»
│ ◦ or rather, the combination user + program

One of the main drawbacks of the user-centric security paradigm is that it is heavyweight: creating users is relatively expensive and requires super-user privileges. In particular, normal users cannot easily confine their own processes by running them under auxiliary users (only via a ‹setuid› helper, which must again be configured by the ‹root› user).

A natural extension of the concept of an «access control subject» is to include the currently running program in the description – allowing the policy to say things like ‹/home/xuser/mail› can be accessed by thunderbird (a mail client) running under the account of ‹xuser›, but not by firefox (a web browser) running under the same account.

│ Naming Programs
│
│ • users have user names, but how about programs?
│ • option 1: cryptographic «signatures»
│ ◦ «portable» across computers but «complex»
│ ◦ establishes «identity» based on the «program itself»
│ • option 2: i-node of the «executable»
│ ◦ simple, local, identity based on «location»

Unfortunately, attaching policy rules to programs is much harder than it is for files or users, since their identity is rather elusive. There might be any number of programs called thunderbird, some of which may be different versions or builds of the same software, but some might just claim to be thunderbird to get to one's email. A fairly good, if complicated, solution is to embed a cryptographic signature into executables, stating the rough equivalent of ‘this program is Firefox, signed by Mozilla’.
Assuming we trust Mozilla (we probably do, since we run their software), we can refer to ‘Firefox by Mozilla’ in our access control policy. A variation of this approach is used by mobile operating systems, like Android and iOS.

The other option, much simpler, is to add a note like ‘this program is Firefox’ to the i-node of the executable. This approach is used by systems like SELinux (where the note is realized as a «security label»).

│ Program as a Subject
│
│ • program: passive (file) vs active (processes)
│ ◦ only a «process» can be a subject
│ ◦ but program «identity» is attached to the file
│ • rights of a «process» depend on its «program»
│ ◦ ‹exec()› will change privileges

Now that we have managed to delineate what a program is and how to identify it, a new problem pops up: in both cases, we have attached the identity to a file, but it actually belongs to a process. However, since processes are much more dynamic than files, assigning identifiers to them is even less practical. In this case, we can use the same trick that was used for ‹setuid› programs: the ‹exec› system call can examine the binary and adjust the privileges of the process accordingly.

│ Mandatory Access Control
│
│ • delegates permission control to a «central authority»
│ • often coupled with «security labels»
│ ◦ classifies «subjects» (users, processes)
│ ◦ and also «objects» (files, sockets, programs)
│ • the owner «cannot» change object permissions

Security labels are, in some sense, a generalisation of user groups. They can be attached to both objects and subjects, and ‹exec› will update the labels attached to a process based on the labels attached to the executable (file). Under mandatory access control, users are not allowed to change permissions on objects. However, in practical systems, both modes are usually combined: discretionary permissions are attached to files as usual, and applied to an action whenever the mandatory rules alone would have allowed it.

│ Capabilities
│
│ • not all verbs (actions) need to take objects
│ • e.g. shutting down the computer (there is only one)
│ • mounting file systems (they can't always be named)
│ • listening on ports with numbers less than 1024

The term ‘capabilities’ is often used to mean one of two forms of access control policy rules:

1. where the object is a singleton, i.e. there is only a single object for the given action, or
2. where it is impractical to name the objects or to attach permission information to them.

│ Dismantling the ‹root› User
│
│ • the traditional ‹root› user is «all-powerful»
│ ◦ “all or nothing” is often unsatisfactory
│ ◦ violates the principle of least privilege
│ • many special properties of ‹root› are capabilities
│ ◦ ‹root› then becomes the user with all capabilities
│ ◦ other users can get selective privileges

In many cases, the simple split between ‹root› and normal users (which, incidentally, mirrors the split between the kernel and user programs) is inadequate. There are three principal ways to address this:

1. ‹setuid› programs can extend some of the special ‹root›-only privileges to normal users (e.g. ‹mount›, ‹passwd›),
2. the system of «capabilities» adds the option of allowing certain users to perform some of the restricted operations,
3. the user-level approach mentioned at the end of section 1, where the service runs under ‹root› and acts on behalf of ordinary users (e.g. PolicyKit).
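To make the privileged-port example from the capabilities slide concrete: the sketch below fails with ‹EACCES› for an ordinary user, since binding a port below 1024 traditionally requires super-user privileges (on Linux, the ‹CAP_NET_BIND_SERVICE› capability).

```c
/* Binding to a 'privileged' port: the object (port 80) is awkward to
 * attach permissions to, so the right to bind it is treated as a
 * capability of the process (traditionally, of the super-user). */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;

    if (fd == -1)
        return 1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(80); /* below 1024, hence privileged */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(fd, (struct sockaddr *) &addr, sizeof addr) == -1)
        perror("bind"); /* EACCES unless suitably privileged */
    else
        printf("bound to port 80 (we must be privileged)\n");

    close(fd);
    return 0;
}
```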
│ Security and Execution
│
│ • security hinges on what is «allowed to execute»
│ • «arbitrary code execution» exploits are the worst
│ ◦ they allow «unauthorized» execution of code
│ ◦ same effect as «impersonating» the user
│ ◦ almost as bad as stolen credentials

Control over which code can execute (and with what privileges) is at the center of all access control restrictions. If a program can be tricked into executing code supplied by an attacker, all the privileges that the program had are automatically available to the attacker as well.

│ Untrusted Input
│
│ • programs often process «data» from «dubious sources»
│ ◦ think image viewers, audio & video players
│ ◦ archive extraction, font rendering, ...
│ • bugs in programs can be «exploited»
│ ◦ the program can be «tricked» into «executing data»

The most common way programs can be hijacked in this manner is through improper processing of «untrusted inputs», that is, content coming from untrustworthy sources. If unexpected input data can derail program execution, this opens the door for an attacker to take control of the program. The payload (the code that the attacker wants executed) is usually supplied as part of the input, and hence is normally treated as data by the program. However, in the presence of certain bugs, the program can be tricked into executing (or interpreting) this data as code.

│ Process as a Subject
│
│ • some privileges can be tied to a particular «process»
│ ◦ those only apply during the «lifetime» of the process
│ ◦ often «restrictions» rather than privileges
│ ◦ this is how «privilege dropping» is done
│ • restrictions are «inherited» across ‹fork()›

Programs (or parts of programs running in a separate process) can ask the operating system to remove some of their privileges (like file system access, network access, and so on). There are many ways to do this, though they are not very portable (i.e. they depend on non-POSIX features of particular operating systems, e.g. Linux user namespaces, seccomp, FreeBSD Capsicum, OpenBSD ‹pledge› and ‹unveil›, and so on).

One of the few portable approaches, known as privilege drop, is essentially a subset of privilege separation: a special user is created for the particular process, and the process, after having done any privileged initialization that it needed to do, uses ‹setuid› and perhaps ‹chroot› to lock itself down (a sketch follows at the end of this part).

│ Sandboxing
│
│ • tries to «limit damage» from code execution «exploits»
│ • the program «drops» all privileges it can
│ ◦ this is done «before» it touches any of the «input»
│ ◦ the attacker is stuck with the «reduced privileges»
│ ◦ this can often prevent a successful attack

Sandboxing is a collection of techniques (including some of the above) that tries to minimize the impact of a successful exploit against a program. Sandboxing can be voluntary (the program sets up its own sandbox) or involuntary (see also the next slide).

│ Untrusted Code
│
│ • traditionally, you would only execute «trusted» code
│ ◦ often based on «reputation» or other «external» factors
│ ◦ this does not «scale» to a large number of vendors
│ • it is common to execute «untrusted», even dubious code
│ ◦ this can be okay with sufficient «sandboxing»

Running code from questionable sources is always risky, and without precautions, it is essentially guaranteed to result in a compromise. However, since the modern web is full of executable code, we simply resort to locking it down as much as we can and hope for the best.
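Returning to the portable privilege-drop approach described above, a minimal sketch follows; the dedicated uid and gid 9999 and the empty directory are made-up examples. The order matters: ‹setuid› comes last, because after it succeeds, the process no longer has the privilege to perform any of the other steps.

```c
/* Voluntary privilege drop: a root-owned process confines itself to an
 * empty directory and a dedicated unprivileged user before it starts
 * processing any untrusted input. */
#include <stdio.h>
#include <unistd.h>

int drop_privileges(void)
{
    if (chroot("/var/empty") == -1) /* restrict the file system view */
        return -1;
    if (chdir("/") == -1)           /* keep no directory handle outside */
        return -1;
    if (setgid(9999) == -1)         /* drop the group first, while still root */
        return -1;
    if (setuid(9999) == -1)         /* the point of no return */
        return -1;
    return 0;
}

int main(void)
{
    /* ... privileged initialization would happen here ... */

    if (drop_privileges() == -1)
    {
        perror("drop_privileges");
        return 1;
    }

    /* ... now read and process untrusted input ... */
    return 0;
}
```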
│ API-Level Access Control
│
│ • capability system for «user-level resources»
│ ◦ things like contact lists, calendars, bookmarks
│ ◦ objects not provided directly by the kernel
│ • enforcement e.g. via a «virtual machine»
│ ◦ not applicable to execution of «native code»
│ ◦ alternative: an IPC-based API

Selectively granting permissions to programs through user-level permission systems is also possible for non-root users. There are two commonly employed methods:

1. a (program-level) virtual machine, like the JVM or the JavaScript virtual machines built into web browsers, which enforce that the program only talks to the system through restricted APIs,
2. a strict sandbox, with the only access to the system provided by a daemon running on the outside of the sandbox (e.g. snap and flatpak, to a degree).

Both approaches can be combined: a common technique is to lock down the VM itself using OS-level sandboxing, to defend against security bugs in the VM.

│ Android/iOS Permissions
│
│ • applications from a store are «semi-trusted»
│ • typically «single-user» computers/devices
│ • permissions are attached to «apps» instead of users
│ • partially virtual users, partially API-level

On Android, for instance, each application gets its own virtual user with very limited permissions, and interaction with the system is done almost exclusively through high-level APIs. These APIs then perform permission checks, possibly prompting the user for confirmation as needed.

│ Review Questions
│
│ 37. What is a user?
│ 38. What is the principle of least privilege?
│ 39. What is an access control object?
│ 40. What is a sandbox?