# Network Stack

In this lecture, we will look at networking from the point of view
of the operating system. We will mainly focus on the internet stack:
that is, TCP/IP and related protocols, plus host name resolution. We
will also look at network file systems (i.e. file systems which are
stored by one computer on a network, but can be used by multiple
other computers on the same network).

│ Lecture Overview
│
│  1. Networking Intro
│  2. The TCP/IP Stack
│  3. Using Networks
│  4. Network File Systems

We will first do a quick recap of networking terminology and of the
basic concepts in general terms. Afterwards we will look at the
TCP/IP stack more specifically, and how it matches the more general
notions introduced earlier. The next part of the lecture will focus
on network-related application programming interfaces. Finally, we
will look at file system sharing in a network environment.


## Networking Intro 

In this section, we will mostly deal with familiar network-related
concepts, so that we have sufficient context down the line, when we
delve into a bit more detail and into OS-level specifics.

│ Host and Domain Names
│
│  • «hostname» = human readable computer name
│  • «hierarchical» system, little endian: ‹www.fi.muni.cz›
│  • FQDN = fully-qualified domain name
│  • the «local» «suffix» may be omitted (‹ping aisa›)

The first thing we need to understand is how to identify computers
within a network. The primary means to do this is via «hostnames»:
human-readable names, which come in two flavours: the name of the
computer itself, and a fully-qualified name, which includes the name
of the network to which the computer is connected, so to speak.
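
The structure of a fully-qualified name can be illustrated with a
small sketch. The helper names ‹split_fqdn› and
‹labels_most_significant_first› are made up for this example and are
not part of any standard API; the code only manipulates strings and
performs no lookups.

```python
def split_fqdn(fqdn: str) -> tuple[str, str]:
    """Split a fully-qualified name into (hostname, domain suffix)."""
    # A trailing dot denotes the root and carries no label of its own.
    host, _, domain = fqdn.rstrip(".").partition(".")
    return host, domain


def labels_most_significant_first(fqdn: str) -> list[str]:
    """Return the labels ordered from the top of the hierarchy down."""
    return list(reversed(fqdn.rstrip(".").split(".")))


print(split_fqdn("www.fi.muni.cz"))
print(labels_most_significant_first("www.fi.muni.cz"))
```

The second function makes the ‘little endian’ ordering visible: the
most significant label (‹cz›) is written last.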

│ Network Addresses
│
│  • address = «machine»-friendly and numeric
│  • IPv4 address: 4 octets (bytes): ‹192.168.1.1›
│    ◦ the octets are ordered MSB-first (big endian)
│  • IPv6 address: 16 octets
│  • Ethernet (MAC): 6 octets, ‹c8:5b:76:bd:6e:0b›

While humans prefer to refer to computers using human-readable
names, those are not suitable for actual communication. Instead,
when computers need to refer to other computers, they use numeric
addresses (just like with memory locations or disk sectors).
Depending on the protocol, the size and structure of the address may
be different: traditional IPv4 uses 4 octets, while the addresses in
the newer IPv6 use up 16 (128 bits). One other type of address that
you can commonly encounter is MAC (from media access control), which
is best known from the Ethernet protocol.
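
The sizes and the byte order of these addresses can be checked with
Python's standard ‹ipaddress› module. This is only an illustrative
sketch: the IPv6 address ‹2001:db8::1› (from the documentation
prefix) and the helper ‹parse_mac› are assumptions added for the
example.

```python
import ipaddress

v4 = ipaddress.IPv4Address("192.168.1.1")
v6 = ipaddress.IPv6Address("2001:db8::1")  # example address, not from the lecture

# The packed form is what travels in packet headers:
# the most significant octet comes first (big endian).
v4_bytes = v4.packed
v6_bytes = v6.packed


def parse_mac(text: str) -> bytes:
    """Turn 'c8:5b:76:bd:6e:0b' into its 6 raw octets."""
    octets = bytes(int(part, 16) for part in text.split(":"))
    if len(octets) != 6:
        raise ValueError("a MAC address has exactly 6 octets")
    return octets


mac_bytes = parse_mac("c8:5b:76:bd:6e:0b")
print(len(v4_bytes), len(v6_bytes), len(mac_bytes))
```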

│ Network Types
│
│  • LAN = Local Area Network
│    ◦ Ethernet: «wired», up to 10Gb/s
│    ◦ WiFi (802.11): «wireless», up to 1Gb/s
│  • WAN = Wide Area Network (the internet)
│    ◦ PSTN, xDSL, PPPoE
│    ◦ GSM, 2G (GPRS, EDGE), 3G (UMTS), 4G (LTE)
│    ◦ also LAN technologies – Ethernet, WiFi

Networks are broadly categorized into two types: local area and wide
area. A local area network (LAN) spans an office, a household, maybe
a building. A LAN is usually a single «broadcast domain», which
means, roughly speaking, that each
computer can directly reach any other computer attached to the same
LAN. The most common technologies (layers 1 and 2) used in LANs are
the wired «ethernet» (the most common variety running at 1Gb/s, less
common but still mainstream versions at 10Gb/s) and the wireless
«WiFi» (formally known as IEEE 802.11).

Wide-area networks, on the other hand, span large distances and
connect a large number of computers. The canonical WAN is the
internet, or the network of an ISP (internet service provider). Wide
area networks often use a different set of low-level technologies.

│ Networking Layers
│
│  1. Link (Ethernet, WiFi)
│  2. Internet / Network (IP)
│  3. Transport (TCP, UDP, ...)
│  4. Application (HTTP, SMTP, ...)

The standard model of networking (known as Open Systems
Interconnection, or OSI for short) splits the stack into 7 layers,
but the TCP/IP-centric view of networking often only distinguishes
4, as outlined above. The link layer roughly corresponds to OSI
layers 1 (physical) and 2 (data link), the internet layer is OSI
layer 3, the
transport layer is OSI layer 4 and the rest (OSI layers 5 through 7)
is lumped under the application layer.

We will follow the simplified TCP/IP model, «but» whenever we refer
to layers by number, those are the OSI numbers, as is customary
(specifically, IP is layer 3 and TCP is layer 4).

│ Networking and Operating Systems
│
│  • a «network stack» is a standard part of an OS
│  • large part of the stack lives in the «kernel»
│    ◦ although this only applies to «monolithic» kernels
│    ◦ microkernels use «user-space» networking
│  • another chunk is in system «libraries» & «utilities»

For the last two decades or so, networking has been a standard
service provided by general-purpose operating systems. In systems
with a monolithic kernel, a significant part of the network stack
(everything up to and including the transport layer) is part of the
kernel and is exposed to user programs via the sockets API.

Additional application-layer functionality is usually available in
system libraries: most importantly domain name resolution (DNS) and
encryption (TLS, short for transport-layer security, which is
confusingly enough an application-layer technology).

│ Kernel-Side Networking
│
│  • device «drivers» for networking «hardware»
│  • network and transport «protocol» layers
│  • «routing» and packet filtering (firewalls)
│  • networking-related «system calls» (sockets)
│  • network «file systems» (SMB, NFS)

The link layer is generally covered by device drivers and the client
and server sides of TCP/IP are exposed via the socket API. There are
additional components in TCP/IP networks, though: some of them, like
routing and packet filtering, can often be done in software, and if
this is the case, they are usually implemented in the kernel.
Bridging and switching (which belong to the link layer) can be done
in software too, but is rarely practical. However, many operating
systems implement one or both to better support virtualisation.

A few application-layer network services may be implemented in the
kernel too, most notably network file systems, but sometimes also
other protocols (e.g. kernel-level HTTP acceleration).

│ System Libraries
│
│  • the «socket» and related APIs
│  • host «name resolution» (a DNS client)
│  • «encryption» and data «authentication» (SSL, TLS)
│  • «certificate» handling and validation

Strictly speaking, the socket API is the domain of system libraries
(though in most monolithic kernels, the C functions will map 1:1 to
system calls; however, in microkernels, the networking stack is
split differently and system libraries are likely to pick up a
bigger share of the work).

Since nearly all network-related programs need to be able to resolve
hostnames (translate the human-readable name to an IP address), this
service is usually provided by system libraries. Likewise,
encryption is ubiquitous in the modern internet, and most operating
systems provide an SSL/TLS stack, including certificate management.

│ System Utilities & Services
│
│  • network «configuration» (‹ifconfig›, ‹dhclient›, ‹dhcpd›)
│  • route management (‹route›, ‹bgpd›)
│  • «diagnostics» (‹ping›, ‹traceroute›)
│  • packet logging and inspection (‹tcpdump›)
│  • other network services (‹ntpd›, ‹sshd›, ‹inetd›)

The last component of the network stack is located in system
utilities and services (daemons). Those are concerned with
configuration (including assigning addresses to interfaces and
autoconfiguration, e.g. DHCP or SLAAC) and route management
(especially important for software-based routers and multi-homed
systems).

A suite of diagnostic tools is also usually present: at the very
least the ‹ping› and ‹traceroute› programs, which are useful for
checking connectivity, and perhaps tools like ‹tcpdump›, which allow
the operator to inspect packets arriving at an interface.

│ Networking Aspects
│
│  • packet format
│    ◦ what are the «units of communication»
│  • addressing
│    ◦ how are the sender and recipient «named»
│  • packet delivery
│    ◦ how a message is «delivered»

When looking at a network protocol, there are three main aspects to
consider: the first is, what constitutes the unit of communication,
i.e. how the packets look, what information they carry and so on.
The second is addressing: how are target computers and/or programs
designated. Finally, packet delivery is concerned with how messages
are delivered from one address to another: this could involve
routing and/or address translation (e.g. between link addresses and
IP addresses).

│ Protocol Nesting
│
│  • protocols run «on top» of each other
│  • this is why it is called a network «stack»
│  • higher levels make use of the lower levels
│    ◦ HTTP uses abstractions provided by TCP
│    ◦ TCP uses abstractions provided by IP

Since we are talking about a «protocol stack», it is important to
understand how the individual layers of the stack interact with each
other. Each of the above aspects cuts through the stack slightly
differently – we will discuss each in a bit more detail in the
following few slides.

│ Packet Nesting
│
│  • higher-level «packets» are just «data» to the lower level
│  • an Ethernet «frame» can carry an «IP packet» in it
│  • the «IP packet» can carry a «TCP packet»
│  • the «TCP packet» can carry (a fragment of) an «HTTP request»

When we consider packet structure, it is most natural to start with
the bottom layers: the packets of the higher layers are simply data
for the lower layer. The overall packet structure looks like a
matryoshka: an ethernet frame is wrapped around an IP packet, which
is wrapped around a UDP packet, and so on.

From the point of view of the upper layers, packet size is an
important consideration: when packet-oriented protocols are nested
in other packet-oriented protocols, it is useful if they can match
their packet sizes (most protocols have a limit on packet size).
With the size limitations in mind, in the view ‘from top’, a packet
is handed down to the lower layer as data, the upper layer being
oblivious to the additional framing (headers) that the lower layer
adds.
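
A rough sketch of the nesting and of the size bookkeeping follows.
It assumes a standard 1500-byte Ethernet payload limit, a minimal
20-byte IPv4 header and the 8-byte UDP header; the headers are
zero-filled placeholders, and the Ethernet trailer and minimum-size
padding are ignored. The function names are invented for the
example.

```python
ETHERNET_HEADER = 14   # destination MAC, source MAC, EtherType
IPV4_HEADER = 20       # minimal IPv4 header, no options
UDP_HEADER = 8
ETHERNET_MTU = 1500    # largest payload of a standard Ethernet frame


def encapsulate(header: bytes, payload: bytes) -> bytes:
    """The lower layer treats the upper-layer packet as opaque data."""
    return header + payload


def build_frame(app_data: bytes) -> bytes:
    udp = encapsulate(bytes(UDP_HEADER), app_data)   # placeholder header
    ip = encapsulate(bytes(IPV4_HEADER), udp)        # placeholder header
    if len(ip) > ETHERNET_MTU:
        raise ValueError("IP packet does not fit into a single frame")
    return encapsulate(bytes(ETHERNET_HEADER), ip)


# The largest UDP payload that fits into one frame without fragmentation.
MAX_UDP_PAYLOAD = ETHERNET_MTU - IPV4_HEADER - UDP_HEADER

frame = build_frame(b"hello")
print(len(frame), MAX_UDP_PAYLOAD)
```

The application only ever sees its 5 bytes of data; the remaining 42
bytes of the frame are framing added by the lower layers.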

│ Stacked Delivery
│
│  • delivery is, in the abstract, «point-to-point»
│    ◦ routing is mostly «hidden» from upper layers
│    ◦ the upper layer requests «delivery» to an «address»
│  • lower-layer protocols are usually «packet-oriented»
│    ◦ packet size mismatches can cause «fragmentation»
│  • a packet can pass through «different» low-level «domains»

When it comes to delivery, the relationships between layers are
perhaps the most complicated. In this case, the view from top to
bottom is the most appropriate, since lower layers provide delivery
as a service to the upper layer. 

Since the delivery on the internet layer (OSI layer 3) is
usually much wider in scope than that of the link layer, it is quite
common that a single IP packet will traverse a number of link-layer
domains.

│ Layers vs Addressing
│
│  • not as straightforward as packet nesting
│    ◦ address relationships are tricky
│  • «special protocols» exist to translate addresses
│    ◦ DNS for hostname vs IP address mapping
│    ◦ ARP for IP vs MAC address mapping

Finally, since (packet, data) delivery is a service provided by the
lower layers to the upper layers, the upper layer must understand
and provide correct lower-level addresses. The easiest way to look
at this aspect is pairwise: the link layer and the internet layer
obviously need to interact, usually through a special protocol which
executes on the link layer, but logically belongs to the internet
layer, since it deals with IP addresses.

The situation between the internet and transport layers is much simpler:
the address at the transport layer simply contains the internet
layer address as a field (e.g. a TCP address is an IP address + a
port number).

Finally, the relationship between the application layer and the
transport layer is analogous to (but not entirely the same as) the
internet/link situation. The application layer primarily uses host
names to identify computers, and uses a special protocol, known as
DNS, which operates using transport-layer addresses, but otherwise
belongs to the application layer.

│ ARP (Address Resolution Protocol)
│
│  • finds the MAC that corresponds to an IP
│  • required to allow «packet delivery»
│    ◦ IP uses the «link layer» to deliver its packets
│    ◦ the link layer must be given a «MAC address»
│  • the OS builds a «map» of IP $→$ MAC «translations»

The address resolution protocol, which straddles the link/internet
boundary, enables the internet layer to deliver its packets using
the services of the link layer. Of course, to request link-layer
delivery of a packet, a link address is required, but the IP packet
only contains an IP address. The ARP protocol is used to find link
addresses of IP nodes which exist in the local network (this
includes routers, which operate on the internet layer – in other
words, packets destined to leave the local network are sent to a
router, using the router's IP address, which is translated into a
link-layer address using ARP as usual).
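
The decision described above can be sketched as a lookup in a small
table. Everything concrete here is an assumption made for the
example: the network ‹192.168.1.0/24›, the router address, the
second MAC address and the function name ‹link_destination›. A real
ARP implementation would also send requests and expire old entries.

```python
import ipaddress

LOCAL_NETWORK = ipaddress.ip_network("192.168.1.0/24")
ROUTER_IP = ipaddress.ip_address("192.168.1.1")

# IP -> MAC translations learned so far (the ARP cache).
arp_cache = {
    ipaddress.ip_address("192.168.1.1"): "c8:5b:76:bd:6e:0b",
    ipaddress.ip_address("192.168.1.20"): "02:00:00:00:00:14",
}


def link_destination(dst_ip: str) -> str:
    """Return the MAC address that the frame carrying the packet is sent to."""
    dst = ipaddress.ip_address(dst_ip)
    # Local hosts are reached directly, everything else goes via the router.
    next_hop = dst if dst in LOCAL_NETWORK else ROUTER_IP
    try:
        return arp_cache[next_hop]
    except KeyError:
        # A real stack would broadcast an ARP request and wait for a reply.
        raise LookupError(f"no ARP entry for {next_hop}") from None


print(link_destination("192.168.1.20"))
print(link_destination("147.251.48.1"))
```

Note that for the remote destination, the IP packet still carries
the remote IP address; only the link-layer frame is addressed to the
router.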

│ Ethernet
│
│  • «link-level» communication protocol
│  • largely implemented «in hardware»
│  • the OS uses a well-defined interface
│    ◦ packet receive and submit
│    ◦ using MAC addresses (ARP is part of the OS)

Perhaps the most common link layer protocol is ethernet. Most of the
protocol is implemented directly in hardware and the operating
system simply uses a unified interface exposed by device drivers to
send and receive ethernet frames.

│ Packet Switching
│
│  • «shared media» are inefficient due to «collisions»
│  • ethernet is typically «packet switched»
│    ◦ a «switch» is usually a «hardware device»
│    ◦ but also in software (usually for virtualisation)
│    ◦ physical connections form a «star topology»

High-speed networks are almost exclusively «packet switched», that
is, a node sends packets (frames) to a «switch», which has a number
of physical ports and keeps track of which MAC addresses are
reachable on which physical ports. When a frame arrives at a
switch, the recipient MAC address is extracted, and the frame is
forwarded to the physical port(s) which are associated with that MAC
address.
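
A minimal model of such a switch is sketched below. It assumes the
usual ‘learning’ behaviour, which the text above does not spell out:
the table is filled from the source addresses of arriving frames,
and frames for unknown recipients are sent to all other ports. The
class and the MAC addresses are invented for the example.

```python
class LearningSwitch:
    def __init__(self, port_count: int) -> None:
        self.ports = range(port_count)
        self.mac_table: dict[str, int] = {}   # MAC address -> physical port

    def handle_frame(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        """Return the list of ports to which the frame is forwarded."""
        # Remember on which port the sender is reachable.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown recipient: flood to every port except the incoming one.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []   # the recipient is on the same port, nothing to do
        return [out_port]


switch = LearningSwitch(4)
print(switch.handle_frame(0, "02:00:00:00:00:0a", "02:00:00:00:00:0b"))
print(switch.handle_frame(1, "02:00:00:00:00:0b", "02:00:00:00:00:0a"))
```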

│ Bridging
│
│  • bridges operate at the «link layer» (layer 2)
│  • a bridge is a two-port device
│    ◦ each port is connected to a «different LAN»
│    ◦ the bridge joins the LANs by «forwarding» frames
│  • can be done in hardware or software
│    ◦ ‹brctl› on Linux, ‹ifconfig› on OpenBSD

Bridges are analogous to switches, with one major difference: the
expectation for a switch is that there are many physical ports, but
each has only one MAC address attached to it (with perhaps the
exception of a special ‘uplink’ port). A bridge, on the other hand,
is optimized for the case of two physical ports, but each side will
have many MAC addresses associated with it.

│ Tunneling
│
│  • tunnels are «virtual layer 2 or 3 devices»
│  • they «encapsulate» traffic using a higher-level protocol
│  • tunneling can implement «Virtual Private Networks»
│    ◦ a «software bridge» can operate over a UDP tunnel
│    ◦ the tunnel is usually «encrypted»

Tunneling is a technique which allows lower-layer traffic to be
nested in the application layer of an existing network. The typical
use case is to tie physically distant computers into a single
broadcast (link layer) or routing (internet layer) domain.

In this case, there are two instances of the network stack: the VPN
software implements an application layer protocol running in the
outer stack, while also acting as a link-layer interface (or an
internet-layer subnet) that is bridged (routed) as if it were just
another physical interface.

│ PPP (Point-to-Point Protocol)
│
│  • a «link-layer» protocol for «2-node networks»
│  • available over many «physical connections»
│    ◦ phone lines, cellular connections, DSL, Ethernet
│    ◦ often used to connect endpoints to the ISP
│  • supported by most operating systems
│    ◦ split between the «kernel» and «system utilities»

The point-to-point protocol is another somewhat important and
ubiquitous example of a link-layer protocol and is usually found on
connections between LANs, or between a LAN and a WAN.

│ Wireless
│
│  • WiFi is mostly like (slow, unreliable) Ethernet
│  • needs «encryption» since anyone can listen
│  • also «authentication» to prevent «rogue connections»
│    ◦ PSK (pre-shared key), EAP / 802.1X
│  • encryption needs «key management»

Finally, WiFi is, from the point of view of the rest of the stack,
essentially a slow, unreliable version of ethernet, though
internally, the protocol is much more complicated.

## The TCP/IP Stack

In this section, we will look at the TCP/IP stack proper, and we
will also discuss DNS in a bit more detail.

│ IP (Internet Protocol)
│
│  • uses 4 byte (v4) or 16 byte (v6) addresses
│    ◦ split into «network» and «host» parts
│  • it is a packet-based protocol
│  • is a «best-effort» protocol
│    ◦ packets may get lost, reordered or corrupted

IP is a low-overhead, packet-oriented protocol in wide use across
the internet and most local area networks (whether they are attached
to the internet or not). Quite importantly, its low-overhead nature
means that it does not guarantee delivery, nor the integrity of the
data it transports.

│ IP Networks
│
│  • IP networks roughly correspond to LANs
│    ◦ hosts on the «same network» are located with ARP
│    ◦ «remote» networks are reached via «routers»
│  • a «netmask» splits the address into network/host parts
│  • IP typically runs on top of Ethernet or PPP

Within a single IP network, delivery is handled by the link layer –
the local network being identified by a common address prefix (the
length of this prefix is part of the network configuration, and is
known as the netmask).
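
The split into network and host parts is a simple bit operation, as
the following sketch shows. The interface address
‹192.168.1.17/24› and the helper ‹is_local› are assumptions made for
the example.

```python
import ipaddress

# An interface configuration: an address plus the length of the network prefix.
iface = ipaddress.ip_interface("192.168.1.17/24")
print(iface.network)    # 192.168.1.0/24
print(iface.netmask)    # 255.255.255.0


def is_local(dst: str) -> bool:
    """Is the destination in the same IP network as our interface?"""
    return ipaddress.ip_address(dst) in iface.network


# The same split done by hand, on the 32-bit integer form of the address.
addr = int(iface.ip)
mask = int(iface.netmask)
network_part = addr & mask
host_part = addr & ~mask & 0xFFFFFFFF

print(ipaddress.IPv4Address(network_part), host_part)
```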

│ Routing
│
│  • routers «forward» packets «between networks»
│  • somewhat like «bridges» but «layer 3»
│  • routers act as normal «LAN endpoints»
│    ◦ but represent entire remote IP networks
│    ◦ or even the entire internet

Packets for recipients outside the local network (i.e. those which
do not share the network part of the address with the local host)
are «routed»: a layer 3 device, analogous to a layer 2 switch,
forwards the packet to one of its interfaces (into another
link-layer domain). The «routing tables» are, however, much more
complex than the information maintained by a switch, and their
maintenance across the internet is outside the scope of this
subject.
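
To give at least a rough idea of how a routing table is consulted,
here is a sketch of a lookup using the common ‘longest prefix wins’
rule. This rule is an assumption that goes beyond what the text
above states, and the table contents, the ‹Route› type and the
‹lookup› function are all invented for the example.

```python
import ipaddress
from typing import NamedTuple, Optional


class Route(NamedTuple):
    network: ipaddress.IPv4Network
    gateway: Optional[str]    # None means the network is directly attached
    interface: str


ROUTES = [
    Route(ipaddress.ip_network("192.168.1.0/24"), None, "eth0"),
    Route(ipaddress.ip_network("10.0.0.0/8"), "192.168.1.2", "eth0"),
    Route(ipaddress.ip_network("10.1.0.0/16"), "192.168.1.3", "eth0"),
    Route(ipaddress.ip_network("0.0.0.0/0"), "192.168.1.1", "eth0"),  # default
]


def lookup(dst: str) -> Route:
    """Pick the matching route with the longest (most specific) prefix."""
    addr = ipaddress.ip_address(dst)
    matching = [route for route in ROUTES if addr in route.network]
    if not matching:
        raise LookupError(f"no route to {dst}")
    return max(matching, key=lambda route: route.network.prefixlen)


print(lookup("10.1.2.3").gateway)
print(lookup("147.251.48.1").gateway)
```

The default route (prefix length 0) matches every address, so it is
only used when nothing more specific is available.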

│ ICMP: Internet Control Message Protocol
│
│  • «control» messages (packets)
│    ◦ destination host/network unreachable
│    ◦ time to live exceeded
│    ◦ fragmentation required
│  • «diagnostic» packets, e.g. the ‹ping› command
│    ◦ ‹echo request› and ‹echo reply›
│    ◦ combine with TTL for ‹traceroute›

ICMP is the ‘service protocol’ used for diagnostics, error reporting
and network management. The role of ICMP was substantially extended
with the introduction of IPv6 (e.g. to include automatic network
configuration, via router advertisements and router solicitation
packet types). ICMP does not directly provide any services to the
application layer.

│ Services and TCP/UDP Port Numbers
│
│  • networks are generally used to «provide services»
│    ◦ each computer can host multiple
│  • different «services» can run on different «ports»
│  • port is a 16-bit number and some are given names
│    ◦ port 25 is SMTP, port 80 is HTTP, ...

As we have briefly mentioned earlier, transport-layer addresses have
two components: the IP address of the destination computer and a
«port number», which designates a particular service or application
running on the destination node.

│ TCP: Transmission Control Protocol
│
│  • a «stream»-oriented protocol on top of IP
│  • works like a «pipe» (transfers a byte sequence)
│    ◦ must respect «delivery order»
│    ◦ and also «re-transmit» lost packets
│  • must establish «connections»

The two main transport protocols in the TCP/IP protocol family are
TCP and UDP, with the former being more common and also considerably
more complicated. Since TCP is stream-oriented and reliable, it
needs to implement the logic to slice a byte stream into individual
packets (for delivery using IP, which is packet-oriented),
consistency checks (packet checksums) and retransmission logic (in
case IP packets carrying TCP data are lost).

│ TCP Connections
│
│  • the endpoints must establish a «connection» first
│  • each connection serves as a separate «data stream»
│  • a connection is «bidirectional»
│  • TCP uses a 3-way handshake: SYN, SYN/ACK, ACK

To provide stream semantics to the user, TCP must implement a
mechanism which creates the illusion of a byte stream on top of a
packet-based foundation. This mechanism is known as a «connection»,
and essentially consists of some state shared by the two endpoints.
To establish this shared state, TCP uses a 3-way handshake.

│ Sequence Numbers
│
│  • TCP packets carry «sequence numbers»
│  • these numbers are used to «re-assemble» the stream
│    ◦ IP packets can arrive «out of order»
│  • they are also used to «acknowledge reception»
│    ◦ and subsequently to manage re-transmission

Sequence numbers are part of the connection state, and allow the
byte stream to be reassembled in the correct order, even if IP
packets carrying the stream get reordered during delivery.
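
Reassembly can be sketched in a few lines if we treat the sequence
number of a segment as the stream offset of its first byte. The
sketch is heavily simplified: it assumes the stream starts at
sequence number 0 and ignores overlapping segments and sequence
number wrap-around. The function name ‹reassemble› is made up.

```python
def reassemble(segments, initial_seq: int = 0):
    """Join (seq, data) pairs into the longest gap-free prefix of the stream.

    Returns the bytes recovered so far and the next expected sequence number.
    """
    pending = dict(segments)    # seq -> data; exact duplicates collapse
    stream = bytearray()
    next_seq = initial_seq
    while next_seq in pending:
        data = pending.pop(next_seq)
        stream += data
        next_seq += len(data)   # sequence numbers count bytes
    return bytes(stream), next_seq


# Segments arrive out of order, but the stream comes out right.
arrived = [(5, b" worl"), (0, b"hello"), (10, b"d")]
print(reassemble(arrived))
```

If a segment is missing, the loop stops at the gap and the returned
sequence number tells us which byte the receiver is still waiting
for.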

│ Packet Loss and Re-transmission
│
│  • packets can get «lost» for a variety of reasons
│    ◦ a «link goes down» for an extended period of time
│    ◦ «buffer overruns» on routing equipment
│  • TCP sends «acknowledgments» for received packets
│    ◦ the ACKs use «sequence numbers» to identify packets

Besides packet reordering, TCP also needs to deal with «packet
loss»: an event where an IP packet is sent, but vanishes without a
trace en route to its destination. A lost packet is detected as a
gap in sequence numbers. However, it is the «sender» which must
learn about a lost packet, so that it can be retransmitted: for this
reason, the recipient of the packet must «acknowledge» its receipt,
by sending a packet back (or more often, by piggybacking the
acknowledgement on a data packet that it would send anyway),
carrying the sequence numbers of packets that have been received.

If an acknowledgement is not received within a certain time
(dynamically adjusted) from the sending of the original packet, the
packet is sent again (retransmitted).
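
The sender-side bookkeeping might look roughly like the sketch
below. It makes several simplifying assumptions: the timeout is
fixed instead of dynamically adjusted, acknowledgements are
cumulative (the number says that everything before that byte has
arrived), and time is passed in explicitly. The ‹Sender› class is
invented for the example.

```python
class Sender:
    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        # seq -> (data, time of the last transmission)
        self.unacked: dict[int, tuple[bytes, float]] = {}

    def send(self, seq: int, data: bytes, now: float) -> None:
        """Record a segment as sent (or re-sent) at time `now`."""
        self.unacked[seq] = (data, now)

    def acknowledge(self, ack: int) -> None:
        """Cumulative ACK: every byte before `ack` has been received."""
        done = [seq for seq, (data, _) in self.unacked.items()
                if seq + len(data) <= ack]
        for seq in done:
            del self.unacked[seq]

    def due_for_retransmission(self, now: float) -> list[int]:
        """Sequence numbers of segments whose ACK is overdue."""
        return sorted(seq for seq, (_, sent) in self.unacked.items()
                      if now - sent >= self.timeout)


sender = Sender(timeout=1.0)
sender.send(0, b"hello", now=0.0)
sender.send(5, b" world", now=0.1)
sender.acknowledge(5)                       # the first segment arrived
print(sender.due_for_retransmission(1.2))   # the second one is overdue
```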

│ UDP: User (Unreliable) Datagram Protocol
│
│  • TCP comes with non-trivial «overhead»
│    ◦ and its guarantees are «not always required»
│  • UDP is a much «simpler» protocol
│    ◦ a very thin wrapper around IP
│    ◦ with «minimal overhead» on top of IP

Not all applications need the comparatively strong guarantees that
TCP provides, or conversely, cannot tolerate the additional latency
introduced by the algorithms that TCP employs to ensure reliable,
in-order delivery. For those cases, UDP presents a very light-weight
layer on top of IP, essentially only adding the port number to the
addresses, and a 16-bit checksum to the packet header (which is, in
its entirety, only 64 bits long).
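
The entire UDP header fits into a single ‹struct› format string:
four 16-bit fields in network byte order. The sketch below leaves
the checksum at zero, which in UDP over IPv4 means that no checksum
was computed; the helper names ‹build_udp› and ‹parse_udp› are made
up for the example.

```python
import struct

# source port, destination port, length (header + data), checksum
UDP_HEADER = struct.Struct("!HHHH")


def build_udp(src_port: int, dst_port: int, payload: bytes,
              checksum: int = 0) -> bytes:
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp(datagram: bytes) -> tuple[int, int, bytes]:
    src_port, dst_port, length, _checksum = UDP_HEADER.unpack_from(datagram)
    return src_port, dst_port, datagram[UDP_HEADER.size:length]


datagram = build_udp(12345, 53, b"query")
print(UDP_HEADER.size * 8, "header bits")
print(parse_udp(datagram))
```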

│ Firewalls
│
│  • the «name» comes from building construction
│    ◦ a fire-proof barrier between parts of a building
│  • the idea is to «separate networks» from each other
│    ◦ making attacks harder from the outside
│    ◦ «limiting damage» in case of compromise

A firewall is a device which separates two networks from each other,
typically by acting as the (only) router between them, but also
examining the packets and dropping or rejecting them if they appear
malicious, or attempt to use services that are not supposed to be
visible externally. Often, one of these networks is the internet.
Sometimes, the other network is just a single computer.

│ Packet Filtering
│
│  • packet filtering is an «implementation» of a «firewall»
│  • can be done on a «router» or at an «endpoint»
│  • «dedicated» routers + packet filters are «more secure»
│    ◦ a «single» such «firewall» protects the «entire network»
│    ◦ less opportunity for mis-configuration

As with other services, it usually pays off to centralize (within a
single network) the responsibility for packet filtering, reducing
the administrative burden and the space for misconfigured nodes to
endanger the entire network. Of course, it is reasonable to run
local firewalls on each node, as a second line of defence.

│ Packet Filter Operation
│
│  • packet filters operate on a set of «rules»
│    ◦ the rules are generally «operator»-provided
│  • each incoming packet is «classified» using the rules
│  • and then «dispatched» accordingly
│    ◦ may be «forwarded», dropped, «rejected» or edited

A packet filter is, essentially, a finite state machine (perhaps
with a bit of memory for connection tracking, in which case it is a
«stateful» packet filter) which examines each packet and decides
what action to take on it. The specific classification rules are
usually provided by the network administrator; in simple cases, they
match on source and destination IP addresses and port numbers, and
on the connection status (which is remembered by the packet filter),
for TCP packets.

After they are classified, the packets can be forwarded to their
destination (as a standard router would), quietly dropped, rejected
(sending an ICMP notification to the sender) or adjusted before
being sent along (most commonly for network address translation, or
NAT, the details of which are out of scope of this subject).
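
A stateless version of the classification step can be sketched as
follows. The sketch assumes ‘first matching rule wins’ semantics
(some real packet filters use the last matching rule instead), and
the rule set, the addresses and all the names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dst_port: int


@dataclass(frozen=True)
class Rule:
    action: str                     # "forward", "drop" or "reject"
    src_net: str = "0.0.0.0/0"      # the default matches any source
    dst_net: str = "0.0.0.0/0"      # the default matches any destination
    dst_port: Optional[int] = None  # None matches any port

    def matches(self, pkt: Packet) -> bool:
        src_ok = ipaddress.ip_address(pkt.src) in ipaddress.ip_network(self.src_net)
        dst_ok = ipaddress.ip_address(pkt.dst) in ipaddress.ip_network(self.dst_net)
        port_ok = self.dst_port is None or self.dst_port == pkt.dst_port
        return src_ok and dst_ok and port_ok


RULES = [
    Rule("forward", dst_net="192.168.1.10/32", dst_port=80),  # public web server
    Rule("reject", src_net="192.168.1.0/24", dst_port=25),    # no outgoing SMTP
    Rule("drop"),                                             # everything else
]


def classify(pkt: Packet, rules=RULES) -> str:
    """Return the action of the first rule that matches the packet."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.action
    return "drop"   # nothing matched: fail closed


print(classify(Packet("203.0.113.7", "192.168.1.10", 80)))
print(classify(Packet("203.0.113.7", "192.168.1.10", 22)))
```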

│ Packet Filter Examples
│
│  • packet filters are often part of the «kernel»
│  • the rule parser is a system utility
│    ◦ it loads rules from a «configuration file»
│    ◦ and sets up the kernel-side filter
│  • there are multiple «implementations»
│    ◦ ‹iptables›, ‹nftables› in Linux
│    ◦ ‹pf› in OpenBSD, ‹ipfw› in FreeBSD

There are usually two components to a packet filter: one is a system
utility which reads a human-readable description of the rules, and
based on those, compiles an efficient matcher for use in the kernel
component which does the actual classification.

│ Name Resolution
│
│  • users do not want to remember «numeric addresses»
│    ◦ phone numbers are bad enough
│  • host «names» are used instead
│  • can be stored in a file, e.g. ‹/etc/hosts›
│    ◦ not very practical for more than 3 computers
│    ◦ but there are millions of computers on the internet

In the last part of this section, let's have a look at hostname
resolution and the DNS protocol. What we need is a directory (a
yellow pages sort of thing), but one that can be efficiently updated
(many updates are done every hour) and also efficiently queried by
computers on the network. The system must be scalable enough to
handle many millions of names.

│ DNS: Domain Name System
│
│  • hierarchical «protocol» for name resolution
│    ◦ runs on top of TCP or UDP
│  • domain «names are split» into parts using dots
│    ◦ each domain knows whom to ask for the next bit
│    ◦ the name database is effectively «distributed»

Essentially, at the internet scale, we need some sort of a
distributed system (i.e. a distributed database). Unlike relational
databases though, delays in update propagation are acceptable,
making the design simpler.

The name space of host names is organized hierarchically, and the
structure of DNS follows this organisation: going from right to
left, starting with the root domain (a single dot, often left
out), one of the DNS servers for that domain is consulted about the
name immediately to the left, usually resulting in the address of
another DNS server which can get us more information. The process is
repeated until the entire name is resolved, usually resulting in an
IP address of the host.

│ DNS Recursion
│
│  • take ‹www.fi.muni.cz.› as an example domain
│  • resolution starts from the right at «root servers»
│    ◦ the root servers refer us to the ‹cz.› servers
│    ◦ the ‹cz.› servers refer us to ‹muni.cz.›
│    ◦ finally ‹muni.cz.› tells us about ‹fi.muni.cz.›

The process described above is called «recursion» and is usually
performed by a special type of DNS server, which performs the
recursion on behalf of its clients and caches the results for
subsequent queries. This also means that it can, most of the time,
start from the middle, since the name servers of the one or two
topmost domains are most likely in the cache.
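
The walk through the hierarchy, including the effect of caching, can
be imitated with a toy in-memory model. No network traffic is
involved: the ‹ZONES› table stands in for the real servers, a
referral simply names the next zone (instead of a name server
host), and all function names are invented for the example.

```python
# What the servers of each zone know: name -> (record type, value).
ZONES = {
    ".": {"cz.": ("NS", "cz.")},
    "cz.": {"muni.cz.": ("NS", "muni.cz.")},
    "muni.cz.": {"fi.muni.cz.": ("NS", "fi.muni.cz.")},
    "fi.muni.cz.": {"www.fi.muni.cz.": ("A", "147.251.48.1")},
}


def suffixes(name: str):
    """Yield the suffixes of a name, longest first: a.b. gives a.b., b."""
    labels = name.rstrip(".").split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:]) + "."


def ask(zone: str, name: str):
    """The answer (or referral) a server of `zone` gives for `name`."""
    records = ZONES[zone]
    for suffix in suffixes(name):
        if suffix in records:
            return records[suffix]
    raise LookupError(f"{zone} knows nothing about {name}")


def resolve_toy(name: str, cache: set):
    """Return the address and the list of zones that had to be asked."""
    # Start at the deepest zone we already know, or at the root.
    zone = next((s for s in suffixes(name) if s in cache), ".")
    asked = []
    while True:
        asked.append(zone)
        rtype, value = ask(zone, name)
        if rtype == "A":
            return value, asked
        cache.add(value)    # remember the referral for later queries
        zone = value


cache = set()
print(resolve_toy("www.fi.muni.cz.", cache))   # walks down from the root
print(resolve_toy("www.fi.muni.cz.", cache))   # starts from the middle
```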

│ DNS Recursion Example
│
│     $ dig www.fi.muni.cz. A +trace
│     .               IN NS j.root-servers.net.
│     cz.             IN NS b.ns.nic.cz.
│     muni.cz.        IN NS ns.muni.cz.
│     fi.muni.cz.     IN NS aisa.fi.muni.cz.
│     www.fi.muni.cz. IN A  147.251.48.1

To observe recursion in practice (and perform other diagnostics on
DNS), we can use the ‹dig› tool, which is part of the ISC (Internet
Systems Consortium) suite of DNS-related tools.

│ DNS Record Types
│
│  • ‹A› is for (IP) Address
│  • ‹AAAA› is for an IPv6 Address
│  • ‹CNAME› is for an alias
│  • ‹MX› is for mail servers
│  • and many more

Besides ‹NS› records, which tell the system whom to ask for further
information, there are many types of DNS records, each carrying a
different type of information about the name in question. Besides
IPv4 and IPv6 addresses, there are free-form TXT records (which are
used, for instance, by spam filtering systems to learn about
authorized mail servers for a domain), SRV records for service
discovery in local networks, and so on.

## Using Networks

In this section, we will briefly look at the socket API which allows
applications to use and provide network services (on POSIX operating
systems, that is) and at a couple of examples of application-level
network services.

│ Sockets Reminder
│
│  • the «socket API» comes from early BSD Unix
│  • socket represents a (possible) «network connection»
│  • you get a «file descriptor» for an open socket
│  • you can ‹read()› and ‹write()› to sockets
│    ◦ but also ‹sendmsg()› and ‹recvmsg()›
│    ◦ and ‹sendto()› and ‹recvfrom()›

Remember that a socket is a file-like object, accessible through a
«file descriptor». On connected stream sockets, programs can use the
usual ‹read› and ‹write› system calls, with semantics akin to pipes.
While these are also possible on datagram sockets, a different API
is often preferred, one of the reasons being that with ‹read›, it is
impossible to distinguish datagrams coming from different sources.

The system calls ‹sendto›, ‹recvfrom› allow the program to specify
(or learn, in case of ‹recvfrom›) the address of the recipient
(sender) of the packet.
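
Python's ‹socket› module is a thin wrapper over these system calls,
which makes it convenient for a short demonstration. The sketch
below sends a single datagram over the loopback interface; the
function name ‹udp_roundtrip› is made up, and port 0 asks the system
to pick any free port.

```python
import socket


def udp_roundtrip(message: bytes):
    """Send one datagram over loopback and report who it came from."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)    # do not wait forever if the datagram is lost
        sender.bind(("127.0.0.1", 0))

        # sendto: the destination is given with each datagram.
        sender.sendto(message, receiver.getsockname())

        # recvfrom: we learn the sender's address along with the data.
        data, source = receiver.recvfrom(4096)
        return data, source, sender.getsockname()


data, source, sender_address = udp_roundtrip(b"ping")
print(data, source)
```

The ‹source› returned by ‹recvfrom› is the transport-layer address
(IP address and port) of the sending socket, which is exactly the
information that a plain ‹read› would not provide.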

│ Socket Types
│
│  • sockets can be «internet» or «unix domain»
│    ◦ internet sockets work across networks
│  • «stream» sockets are like files
│    ◦ you can write a continuous «stream» of data
│    ◦ usually implemented using TCP
│  • «datagram» sockets send individual «messages»
│    ◦ usually implemented using UDP

Communication on IP networks is done using «internet sockets» (with
‹domain› set to ‹AF_INET› or ‹AF_INET6›). If the socket is a «stream
socket» (its ‹type› is ‹SOCK_STREAM›) the communication is executed
using TCP (stream-type sockets must be explicitly «connected» by a
call to the ‹connect› or ‹accept› system call, which in the case of
internet sockets performs the TCP handshake).

Datagram sockets (‹type› set to ‹SOCK_DGRAM›) may be optionally
‘connected’, though this only sets up a default destination for
datagrams to be sent to. Communication is performed using UDP.

│ Creating Sockets
│
│  • a socket is created using the ‹socket()› function
│  • it can be turned into a «server» using ‹listen()›
│    ◦ individual «connections» are established with ‹accept()›
│  • or into a «client» using ‹connect()›

All types of sockets are created using the ‹socket› system call, and
specialize into server and client sockets based on the subsequent
API calls performed on them. A server socket is obtained through
‹bind› and ‹listen›, while a client socket is obtained using
‹connect›. The server then repeatedly calls ‹accept› which returns a
«new file descriptor» which then represents the TCP connection.
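
The sequence of calls can be sketched as follows. This is only an
illustration: the function ‹echo_once› is a made-up name, and both
ends of the connection run in a single process, which works because
the kernel completes the handshake and queues the connection before
‹accept› is called.

```python
import socket

def echo_once(message):
    """Run one TCP exchange over loopback; return (reply, distinct_fds)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Server side: bind to an address, then start listening.
        listener.bind(("127.0.0.1", 0))   # port 0: let the kernel choose
        listener.listen(1)
        listener.settimeout(2.0)

        # Client side: connect triggers the TCP handshake.
        client.settimeout(2.0)
        client.connect(listener.getsockname())

        # accept hands out a new socket (and file descriptor) which
        # represents this one connection; the listener stays as it was.
        connection, peer = listener.accept()
        with connection:
            connection.settimeout(2.0)
            distinct_fds = connection.fileno() != listener.fileno()
            client.sendall(message)
            # A short message is expected to arrive in one piece here; in
            # general, a stream socket may deliver data in smaller chunks.
            connection.sendall(connection.recv(4096))
            reply = client.recv(4096)
        return reply, distinct_fds
    finally:
        client.close()
        listener.close()
```

A real server would keep the listening socket open and call ‹accept›
in a loop, handing each new descriptor to a thread, a process or an
event loop.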

│ Resolver API
│
│  • ‹libc› contains a «resolver»
│    ◦ available as ‹gethostbyname› (and ‹getaddrinfo›)
│    ◦ also ‹gethostbyaddr› for «reverse lookups»
│  • can look in many different places
│    ◦ most systems support at least ‹/etc/hosts›
│    ◦ and DNS-based lookups

The socket API only deals with numeric IP addresses. If an
application needs to be able to connect to computers using their
host names, it needs to use the «resolver API» which, behind the
scenes, uses the appropriate database or protocol to find the
corresponding IP addresses. The exact sequence of steps depends on
system configuration, but usually the resolver consults the
‹/etc/hosts› file and a recursive DNS server (the IP address of
which is again part of system configuration).
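
A hedged sketch of resolver usage, through Python's wrapper around
‹getaddrinfo›: the helper ‹resolve› is a hypothetical name, and which
addresses come back for a given host name depends entirely on the
configuration of the machine where the code runs.

```python
import socket

def resolve(host, port):
    """Return the distinct numeric addresses the resolver reports for host."""
    results = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for family, socktype, proto, canonname, sockaddr in results:
        # sockaddr starts with the numeric address for both
        # AF_INET and AF_INET6 results.
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

On most systems, ‹resolve("localhost", 80)› is answered from
‹/etc/hosts› without any DNS traffic, while other names usually
require a query to the configured recursive DNS server. Passing a
numeric address (e.g. ‹"127.0.0.1"›) simply returns that address.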

│ Network Services
│
│  • servers «listen» on a socket for incoming connections
│    ◦ a client actively establishes a «connection» to a server
│  • the network simply «transfers data» between them
│  • interpretation of the data is a «layer 7» issue
│    ◦ could be «commands», file transfers, ...

Most network services operate in a client-server regime, on top of
TCP: a server passively awaits connections on a particular
transport-layer address (i.e. an IP address coupled with a port
number). The client, on the other hand, actively connects to a
listening server, establishing a bidirectional channel (the TCP
connection) between them. From that point on, the network stack
simply transfers data across that channel. The data usually conforms
to some application-level protocol (SMTP, HTTP, ...) though it does
not need to be standardized or well-known.
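
To illustrate that interpreting the data is purely an application
matter, here is a sketch of a made-up, line-based protocol with two
commands, ‹UPPER› and ‹LEN›. Neither the protocol nor the function
names ‹handle_request› and ‹request› come from any real service. The
established channel is stood in for by ‹socketpair›, which yields two
already connected stream sockets.

```python
import socket

def handle_request(line):
    """Interpret one request line of the made-up protocol."""
    command, _, argument = line.rstrip(b"\r\n").partition(b" ")
    if command == b"UPPER":
        return b"OK " + argument.upper() + b"\n"
    if command == b"LEN":
        return b"OK " + str(len(argument)).encode("ascii") + b"\n"
    return b"ERR unknown command\n"

def request(line):
    """Send one request over a connected channel and return the reply."""
    # socketpair stands in for an established TCP connection.
    server_end, client_end = socket.socketpair()
    with server_end, client_end:
        client_end.sendall(line)
        # The transport only moves bytes; their meaning is decided here.
        server_end.sendall(handle_request(server_end.recv(4096)))
        return client_end.recv(4096)
```

For instance, ‹request(b"UPPER hello\n")› yields ‹b"OK HELLO\n"›,
while an unknown command yields ‹b"ERR unknown command\n"›.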

│ Network Service Examples
│
│  • (secure) remote shell – ‹sshd›
│  • the internet «email suite»
│    ◦ MTA = Mail Transfer Agent, speaks SMTP
│    ◦ SMTP = Simple Mail Transfer Protocol
│  • the «world wide web»
│    ◦ web servers provide content (files)
│    ◦ clients and servers speak HTTP and HTTPS

│ Client Software
│
│  • the ‹ssh› command uses the SSH protocol
│    ◦ a very useful system utility on virtually all UNIXes
│  • a «web browser» is the client for the world wide web
│    ◦ browsers are complex «application» programs
│    ◦ some of them bigger than even operating systems
│  • «email client» is also known as a MUA (Mail User Agent)


## Network File Systems

We have learned earlier that file systems are an important,
ubiquitous abstraction. It is only natural to allow a file system to
be accessed remotely (from another computer) using the API that
is used for local access, making the ‘network’ part almost entirely
transparent to the program.

│ Why Network Filesystems?
│
│  • copying files back and forth is impractical
│    ◦ and also «error-prone» (which is the latest version?)
│  • how about storing data in a «central location»
│  • and «sharing» it with all the computers on the LAN

Perhaps the most compelling case for network file systems arises
from the need to make workstations (desktop computers) at an
institution fungible: that is, allow any user to log in on any of
the available workstations and immediately have all their data and
settings at hand.

│ NAS (Network-Attached Storage)
│
│  • a (small) «computer» dedicated to «storing files»
│  • usually running a cut down operating system
│    ◦ often based on Linux or FreeBSD
│  • provides «file access» to the network
│  • sometimes additional «app-level services»
│    ◦ e.g. photo management, media streaming, ...

Another use case comes from the desire to store data which is shared
by multiple users on a central device, where it is easy to back up
and accessible from all computers (and hence by all users, even when
some of the other computers are powered down).

│ NFS (Network File System)
│
│  • the traditional UNIX «networked filesystem»
│  • hooked quite deep into the kernel
│    ◦ assumes generally reliable network (LAN)
│  • filesystems are «exported» for use over NFS
│  • the client side «mounts» the NFS-exported volume

NFS is one of the first implementations of a network file system. It
is based, essentially, on hooking up the VFS interface and exporting
it over a remote procedure call interface to other kernels on the
network. To create an NFS share, the local file system must be
«exported» on the would-be NFS server; afterwards, it can be
«mounted» by clients, making the share part of their local file
system hierarchy.

│ NFS History
│
│  • originated at «Sun Microsystems» in the 80s
│  • v2 implemented in System V, DOS, ...
│  • v3 appeared in '95 and is «still in use»
│  • v4 arrived in 2000, improving «security»

NFS is a rather old technology (nearly 40 years old), but it saw
significant evolution over its first 20 or so years, with version 4
mainly addressing security concerns.

│ VFS Reminder
│
│  • «implementation mechanism» for multiple FS types
│  • an object-oriented approach
│    ◦ ‹open›: «look up» the file for access
│    ◦ ‹read›, ‹write› – self-explanatory
│    ◦ ‹rename›: rename a file or directory

Recall, from lecture 4, that VFS (virtual file system switch) is a
mechanism inside the kernel that allows multiple file system
implementations to present a unified interface to the rest of the
kernel. NFS takes advantage of this existing interface and makes it
available over the network. Of course, unlike VFS itself, the
semantics of the NFS functions are standardized across
implementations (NFS clients and servers are mostly compatible
across different UNIX-like operating systems).

│ RPC (Remote Procedure Call)
│
│  • any «protocol» for «calling functions» on «remote hosts»
│    ◦ ONC-RPC = Open Network Computing RPC
│    ◦ NFS is based on ONC-RPC (also known as Sun RPC)
│  • NFS basically runs VFS operations using RPC
│    ◦ «easy to implement» on UNIX-like systems

The NFS interface is exposed to the network via a remote procedure
call mechanism, which essentially takes a procedure call (the name
of the function, along with the arguments that it should be called
with), packs it into a byte string and sends it over the network to
another computer, which then actually performs the call and sends
the result back. The protocol has a mechanism to send data buffers,
in addition to primitive values (integers).
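
The principle can be sketched in a few lines. Note that this is a
toy: ONC-RPC encodes calls in a binary format (XDR) and identifies
procedures by numbers, whereas the sketch below uses JSON and names
for readability. The procedure table and all function names are
invented for the example, and the network transfer between the two
sides is left out.

```python
import json

# Procedures that the (hypothetical) remote side is willing to run.
PROCEDURES = {
    "add": lambda a, b: a + b,
    "length": lambda text: len(text),
}

def marshal_call(name, *args):
    """Client side: pack the procedure name and arguments into bytes."""
    return json.dumps({"proc": name, "args": list(args)}).encode("utf-8")

def serve_call(request):
    """Server side: unpack the request, perform the call, pack the result."""
    call = json.loads(request.decode("utf-8"))
    result = PROCEDURES[call["proc"]](*call["args"])
    return json.dumps({"result": result}).encode("utf-8")

def unmarshal_reply(reply):
    """Client side: extract the return value from the reply bytes."""
    return json.loads(reply.decode("utf-8"))["result"]
```

A complete remote call is then the composition
‹unmarshal_reply(serve_call(marshal_call("add", 2, 3)))›, where in a
real system the byte strings would travel over a socket between the
first and the second step, and again between the second and the third.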

│ Port Mapper
│
│  • ONC-RPC is executed over TCP or UDP
│    ◦ but it is more «dynamic» wrt. available services
│  • TCP/UDP «port numbers» are assigned «on demand»
│  • ‹portmap› «translates» from RPC services to port numbers
│    ◦ the port mapper itself listens on port 111

In modern systems, ONC-RPC is implemented exclusively on top of
the TCP/IP stack, using either TCP or UDP as the transport. Since the
protocol can expose multiple services on each machine, the need
arises to translate between those RPC services and TCP/UDP port
numbers. In most cases, an RPC service called ‘portmapper’ takes care
of this need, itself running on a fixed port (number 111).
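
The translation itself amounts to a table lookup, which the following
toy model tries to capture. The class and method names are invented
and no actual RPC traffic is involved. The key used here (program
number, version, transport protocol) and the convention of answering
0 for an unregistered service mirror the real port mapper protocol.
The program number 100003 and port 2049 are the values conventionally
used by NFS.

```python
class PortMapper:
    """Toy model of the port mapper's registry (no networking)."""

    def __init__(self):
        self._table = {}

    def register(self, program, version, protocol, port):
        """Called by a service when it starts listening on a port."""
        self._table[(program, version, protocol)] = port

    def getport(self, program, version, protocol):
        """Called on behalf of a client; 0 means 'not registered'."""
        return self._table.get((program, version, protocol), 0)


NFS_PROGRAM = 100003   # RPC program number conventionally assigned to NFS

mapper = PortMapper()
mapper.register(NFS_PROGRAM, 3, "tcp", 2049)
```

A client which wants to talk to NFSv3 over TCP would first ask the
port mapper (on port 111) for the port of program 100003 and only
then connect to the port it learned.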

│ The NFS Daemon
│
│  • also known as ‹nfsd›
│  • provides NFS access to a «local file system»
│  • can run as a system service
│  • or it can be part of the kernel
│    ◦ this is more typical for «performance» reasons

Given an RPC stack, NFS is provided by a daemon called ‹nfsd›, which
registers itself as a service with the RPC stack. The daemon can be
a proper, user-space daemon, but it can also be part of the kernel
(running as a kernel thread).

│ SMB (Server Message Block)
│
│  • a «network file system» from Microsoft
│  • available in Windows since Windows for Workgroups 3.1 (1992)
│    ◦ originally ran on top of NetBIOS
│    ◦ later versions used TCP/IP
│  • SMB1 accumulated a lot of cruft and «complexity»

SMB is a completely different implementation of a network
transparency layer for file systems. Like NFS, it is not tied to a
particular on-disk format. SMB saw many incremental changes with
each new Microsoft operating system that came along, while at the
same time it was kept backward compatible, so that older operating
systems could interoperate, both as clients and as servers. As a
result, the protocol became extremely complicated and further
extensions impractical.

│ SMB 2.0
│
│  • «simpler» than SMB1 due to «fewer retrofits» and compat
│  • better «performance» and «security»
│  • support for «symbolic links»
│  • available since Windows Vista (2006)

Microsoft designed a new protocol for networked filesystems in their
Windows Vista operating system, under the name SMB 2.0. Like NFSv4 a
few years earlier, SMB 2 addressed many of the security weaknesses
of its predecessor, while also improving performance and extending
the protocol to support new file system features, such as symlinks.

│ Review Questions
│
│  29. What is ARP (Address Resolution Protocol)?
│  30. What is IP (Internet Protocol)?
│  31. What is TCP (Transmission Control Protocol)?
│  32. What is DNS (Domain Name System)?