# Network Stack

In this lecture, we will look at networking from the point of view of the operating system. We will mainly focus on the internet stack: that is TCP/IP and related protocols and host name resolution. We will also look at network file systems (i.e. file systems which are stored by one computer on a network, but can be used by multiple other computers on the same network).

│ Lecture Overview
│
│  1. Networking Intro
│  2. The TCP/IP Stack
│  3. Using Networks
│  4. Network File Systems

We will first do a quick recap of networking terminology and of the basic concepts in general terms. Afterwards we will look at the TCP/IP stack more specifically, and how it matches the more general notions introduced earlier. The next part of the lecture will focus on network-related application programming interfaces. Finally, we will look at file system sharing in a network environment.

## Networking Intro

In this section, we will mostly deal with familiar network-related concepts, so that we have sufficient context down the line, when we delve into a bit more detail and into OS-level specifics.

│ Host and Domain Names
│
│  • «hostname» = human readable computer name
│  • «hierarchical» system, little endian: ‹www.fi.muni.cz›
│  • FQDN = fully-qualified domain name
│  • the «local» «suffix» may be omitted (‹ping aisa›)

The first thing we need to understand is how to identify computers within a network. The primary means to do this is via «hostnames»: human-readable names, which come in two flavours: the name of the computer itself, and a fully-qualified name, which includes the name of the network to which the computer is connected, so to speak.

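To make the ‘little endian’ ordering and the omitted local suffix concrete, here is a minimal sketch. The helper names (‹labels_top_down›, ‹qualify›) are made up for illustration, and the rule ‘a name without a dot gets the local suffix’ is a simplifying assumption: real resolvers follow more elaborate, configurable rules.

```python
def labels_top_down(fqdn):
    """Split a domain name into labels, most significant (rightmost) first."""
    return list(reversed(fqdn.rstrip(".").split(".")))


def qualify(name, local_suffix):
    """Append the local suffix to a bare host name (toy rule, see above)."""
    if "." in name:                    # assume dotted names are already qualified
        return name
    return f"{name}.{local_suffix}"


print(labels_top_down("www.fi.muni.cz"))   # the hierarchy is read right to left
print(qualify("aisa", "fi.muni.cz"))       # what ‹ping aisa› would look up
```

The suffix ‹fi.muni.cz› is only an example value; on a real system it comes from the network configuration.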
│ Network Addresses
│
│  • address = «machine»-friendly and numeric
│  • IPv4 address: 4 octets (bytes): ‹192.168.1.1›
│    ◦ the octets are ordered MSB-first (big endian)
│  • IPv6 address: 16 octets
│  • Ethernet (MAC): 6 octets, ‹c8:5b:76:bd:6e:0b›

While humans prefer to refer to computers using human-readable names, those are not suitable for actual communication. Instead, when computers need to refer to other computers, they use numeric addresses (just like with memory locations or disk sectors). Depending on the protocol, the size and structure of the address may be different: traditional IPv4 uses 4 octets, while the addresses in the newer IPv6 use 16 octets (128 bits). One other type of address that you can commonly encounter is MAC (from media access control), which is best known from the Ethernet protocol.

│ Network Types
│
│  • LAN = Local Area Network
│    ◦ Ethernet: «wired», up to 10Gb/s
│    ◦ WiFi (802.11): «wireless», up to 1Gb/s
│  • WAN = Wide Area Network (the internet)
│    ◦ PSTN, xDSL, PPPoE
│    ◦ GSM, 2G (GPRS, EDGE), 3G (UMTS), 4G (LTE)
│    ◦ also LAN technologies – Ethernet, WiFi

Networks are broadly categorized into two types. Local area networks span an office, a household, maybe a building. A LAN is usually a single «broadcast domain», which means, roughly speaking, that each computer can directly reach any other computer attached to the same LAN. The most common technologies (layers 1 and 2) used in LANs are the wired «ethernet» (the most common variety running at 1Gb/s, less common but still mainstream versions at 10Gb/s) and the wireless «WiFi» (formally known as IEEE 802.11).

Wide-area networks, on the other hand, span large distances and connect a large number of computers. The canonical WAN is the internet, or the network of an ISP (internet service provider). Wide area networks often use a different set of low-level technologies.

│ Networking Layers
│
│  1. Link (Ethernet, WiFi)
│  2. Internet / Network (IP)
│  3. Transport (TCP, UDP, ...)
│  4. Application (HTTP, SMTP, ...)

The standard model of networking (known as Open Systems Interconnection, or OSI for short) splits the stack into 7 layers, but the TCP/IP-centric view of networking often only distinguishes 4, as outlined above. The link layer roughly corresponds to OSI layers 1 (physical) and 2 (data link), the internet layer is OSI layer 3, the transport layer is OSI layer 4 and the rest (OSI layers 5 through 7) is lumped under the application layer. We will follow the simplified TCP/IP model, «but» whenever we refer to layers by number, those are the OSI numbers, as is customary (specifically, IP is layer 3 and TCP is layer 4).

│ Networking and Operating Systems
│
│  • a «network stack» is a standard part of an OS
│  • large part of the stack lives in the «kernel»
│    ◦ although this only applies to «monolithic» kernels
│    ◦ microkernels use «user-space» networking
│  • another chunk is in system «libraries» & «utilities»

For the last two decades or so, networking has been a standard service provided by general-purpose operating systems. In systems with a monolithic kernel, a significant part of the network stack (everything up to and including the transport layer) is part of the kernel and is exposed to user programs via the sockets API.

Additional application-layer functionality is usually available in system libraries: most importantly domain name resolution (DNS) and encryption (TLS, short for transport-layer security, which is confusingly enough an application-layer technology).

│ Kernel-Side Networking
│
│  • device «drivers» for networking «hardware»
│  • network and transport «protocol» layers
│  • «routing» and packet filtering (firewalls)
│  • networking-related «system calls» (sockets)
│  • network «file systems» (SMB, NFS)

The link layer is generally covered by device drivers and the client and server sides of TCP/IP are exposed via the socket API.
There are additional components in TCP/IP networks, though: some of them, like routing and packet filtering, can often be done in software, and if this is the case, they are usually implemented in the kernel. Bridging and switching (which belong to the link layer) can be done in software too, but this is rarely practical. However, many operating systems implement one or both to better support virtualisation.

A few application-layer network services may be implemented in the kernel too, most notably network file systems, but sometimes also other protocols (e.g. kernel-level HTTP acceleration).

│ System Libraries
│
│  • the «socket» and related APIs
│  • host «name resolution» (a DNS client)
│  • «encryption» and data «authentication» (SSL, TLS)
│  • «certificate» handling and validation

Strictly speaking, the socket API is the domain of system libraries (though in most monolithic kernels, the C functions will map 1:1 to system calls; however, in microkernels, the networking stack is split differently and system libraries are likely to pick up a bigger share of the work).

Since nearly all network-related programs need to be able to resolve hostnames (translate the human-readable name to an IP address), this service is usually provided by system libraries. Likewise, encryption is ubiquitous in the modern internet, and most operating systems provide an SSL/TLS stack, including certificate management.

│ System Utilities & Services
│
│  • network «configuration» (‹ifconfig›, ‹dhclient›, ‹dhcpd›)
│  • route management (‹route›, ‹bgpd›)
│  • «diagnostics» (‹ping›, ‹traceroute›)
│  • packet logging and inspection (‹tcpdump›)
│  • other network services (‹ntpd›, ‹sshd›, ‹inetd›)

The last component of the network stack is located in system utilities and services (daemons). Those are concerned with configuration (including assigning addresses to interfaces and autoconfiguration, e.g.
DHCP or SLAAC) and route management (especially important for software-based routers and multi-homed systems). A suite of diagnostic tools is also usually present: at the very least the ‹ping› and ‹traceroute› programs, which are useful for checking connectivity, and perhaps tools like ‹tcpdump›, which allow the operator to inspect packets arriving at an interface.

│ Networking Aspects
│
│  • packet format
│    ◦ what are the «units of communication»
│  • addressing
│    ◦ how are the sender and recipient «named»
│  • packet delivery
│    ◦ how a message is «delivered»

When looking at a network protocol, there are three main aspects to consider. The first is what constitutes the unit of communication, i.e. how the packets look, what information they carry and so on. The second is addressing: how target computers and/or programs are designated. Finally, packet delivery is concerned with how messages are delivered from one address to another: this could involve routing and/or address translation (e.g. between link addresses and IP addresses).

│ Protocol Nesting
│
│  • protocols run «on top» of each other
│  • this is why it is called a network «stack»
│  • higher levels make use of the lower levels
│    ◦ HTTP uses abstractions provided by TCP
│    ◦ TCP uses abstractions provided by IP

Since we are talking about a «protocol stack», it is important to understand how the individual layers of the stack interact with each other. Each of the above aspects cuts through the stack slightly differently – we will discuss each in a bit more detail in the following few slides.

│ Packet Nesting
│
│  • higher-level «packets» are just «data» to the lower level
│  • an Ethernet «frame» can carry an «IP packet» in it
│  • the «IP packet» can carry a «TCP packet»
│  • the «TCP packet» can carry (a fragment of) an «HTTP request»

When we consider packet structure, it is most natural to start with the bottom layers: the packets of the higher layers are simply data for the lower layer.
The overall packet structure looks like a matryoshka: an ethernet frame is wrapped around an IP packet, which is wrapped around a UDP packet, and so on.

From the point of view of the upper layers, packet size is an important consideration: when packet-oriented protocols are nested in other packet-oriented protocols, it is useful if they can match their packet sizes (most protocols have a limit on packet size). With the size limitations in mind, in the view ‘from top’, a packet is handed down to the lower layer as data, the upper layer being oblivious to the additional framing (headers) that the lower layer adds.

│ Stacked Delivery
│
│  • delivery is, in the abstract, «point-to-point»
│    ◦ routing is mostly «hidden» from upper layers
│    ◦ the upper layer requests «delivery» to an «address»
│  • lower-layer protocols are usually «packet-oriented»
│    ◦ packet size mismatches can cause «fragmentation»
│  • a packet can pass through «different» low-level «domains»

When it comes to delivery, the relationships between layers are perhaps the most complicated. In this case, the view from top to bottom is the most appropriate, since lower layers provide delivery as a service to the upper layer. Since the delivery on the internet layer (OSI layers 3 and up) is usually much wider in scope than that of the link layer, it is quite common that a single IP packet will traverse a number of link-layer domains.

│ Layers vs Addressing
│
│  • not as straightforward as packet nesting
│    ◦ address relationships are tricky
│  • «special protocols» exist to translate addresses
│    ◦ DNS for hostname vs IP address mapping
│    ◦ ARP for IP vs MAC address mapping

Finally, since (packet, data) delivery is a service provided by the lower layers to the upper layers, the upper layer must understand and provide correct lower-level addresses.
The easiest way to look at this aspect is pairwise: the link layer and the internet layer obviously need to interact, usually through a special protocol which executes on the link layer, but logically belongs to the internet layer, since it deals with IP addresses.

The situation between the internet and transport layers is much simpler: the address at the transport layer simply contains the internet layer address as a field (e.g. a TCP address is an IP address + a port number).

Finally, the relationship between the application layer and the transport layer is analogous (but not entirely the same) to the internet/link situation. The application layer primarily uses host names to identify computers, and uses a special protocol, known as DNS, which operates using transport-layer addresses, but otherwise belongs to the application layer.

│ ARP (Address Resolution Protocol)
│
│  • finds the MAC that corresponds to an IP
│  • required to allow «packet delivery»
│    ◦ IP uses the «link layer» to deliver its packets
│    ◦ the link layer must be given a «MAC address»
│  • the OS builds a «map» of IP $→$ MAC «translations»

The address resolution protocol, which straddles the link/internet boundary, enables the internet layer to deliver its packets using the services of the link layer. Of course, to request link-layer delivery of a packet, a link address is required, but the IP packet only contains an IP address. The ARP protocol is used to find link addresses of IP nodes which exist in the local network (this includes routers, which operate on the internet layer – in other words, packets destined to leave the local network are sent to a router, using the router's IP address, which is translated into a link-layer address using ARP as usual).

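The lookup described above can be sketched in a few lines. This is a toy model, not how any particular kernel does it: the table contents, the local prefix ‹192.168.1.0/24›, the router address and the function names are all assumptions made up for the example, and a real stack would send an ARP request on a cache miss instead of returning ‹None›.

```python
import ipaddress

# IP -> MAC translations the OS has learned so far (made-up entries)
arp_cache = {
    "192.168.1.1": "c8:5b:76:bd:6e:0b",    # the router
    "192.168.1.20": "02:00:00:00:00:14",   # another host on the LAN
}


def next_hop(dst, local_net, gateway):
    """Pick the IP address whose link address we need for delivery."""
    if ipaddress.ip_address(dst) in ipaddress.ip_network(local_net):
        return dst          # local recipient: deliver directly
    return gateway          # remote recipient: hand the packet to the router


def link_address(dst, local_net="192.168.1.0/24", gateway="192.168.1.1"):
    """Return the MAC to put into the frame, or None on a cache miss."""
    return arp_cache.get(next_hop(dst, local_net, gateway))


print(link_address("192.168.1.20"))    # local host: its own MAC
print(link_address("147.251.48.1"))    # remote host: the router's MAC
```

Note that the IP packet itself still carries the final destination address in both cases; only the link-layer recipient differs.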
│ Ethernet
│
│  • «link-level» communication protocol
│  • largely implemented «in hardware»
│  • the OS uses a well-defined interface
│    ◦ packet receive and submit
│    ◦ using MAC addresses (ARP is part of the OS)

Perhaps the most common link layer protocol is ethernet. Most of the protocol is implemented directly in hardware and the operating system simply uses a unified interface exposed by device drivers to send and receive ethernet frames.

│ Packet Switching
│
│  • «shared media» are inefficient due to «collisions»
│  • ethernet is typically «packet switched»
│    ◦ a «switch» is usually a «hardware device»
│    ◦ but also in software (usually for virtualisation)
│    ◦ physical connections form a «star topology»

High-speed networks are almost exclusively «packet switched», that is, a node sends packets (frames) to a «switch», which has a number of physical ports and keeps track of which MAC addresses are reachable on which physical ports. When a frame arrives at a switch, the recipient MAC address is extracted, and the packet is forwarded to the physical port(s) which are associated with that MAC address.

│ Bridging
│
│  • bridges operate at the «link layer» (layer 2)
│  • a bridge is a two-port device
│    ◦ each port is connected to a «different LAN»
│    ◦ the bridge joins the LANs by «forwarding» frames
│  • can be done in hardware or software
│    ◦ ‹brctl› on Linux, ‹ifconfig› on OpenBSD

Bridges are analogous to switches, with one major difference: the expectation for a switch is that there are many physical ports, but each has only one MAC address attached to it (with perhaps the exception of a special ‘uplink’ port). A bridge, on the other hand, is optimized for the case of two physical ports, but each side will have many MAC addresses associated with it.

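The table of MAC addresses that a switch (or a bridge) maintains can be modelled as a simple dictionary. The sketch below is illustrative only: the class and method names are invented, the table is filled in by observing sender addresses, and frames for unknown recipients are sent to all other ports (this flooding is the usual behaviour of learning switches, though the text above does not go into it).

```python
class LearningSwitch:
    """Toy model of a switch (or bridge) forwarding table."""

    def __init__(self, n_ports):
        self.ports = range(n_ports)
        self.table = {}      # MAC address -> port where it was last seen

    def handle_frame(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame should be sent out of."""
        self.table[src_mac] = in_port       # learn where the sender lives
        out = self.table.get(dst_mac)
        if out is None:                     # unknown recipient: flood
            return [p for p in self.ports if p != in_port]
        return [] if out == in_port else [out]


switch = LearningSwitch(4)
print(switch.handle_frame(0, "c8:5b:76:bd:6e:0b", "02:00:00:00:00:14"))
print(switch.handle_frame(1, "02:00:00:00:00:14", "c8:5b:76:bd:6e:0b"))
```

With two ports and many addresses per port, the same structure describes a bridge; only the expected shape of the table differs.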
│ Tunneling
│
│  • tunnels are «virtual layer 2 or 3 devices»
│  • they «encapsulate» traffic using a higher-level protocol
│  • tunneling can implement «Virtual Private Networks»
│    ◦ a «software bridge» can operate over a UDP tunnel
│    ◦ the tunnel is usually «encrypted»

Tunneling is a technique which allows lower-layer traffic to be nested in the application layer of an existing network. The typical use case is to tie physically distant computers into a single broadcast (link layer) or routing (internet layer) domain. In this case, there are two instances of the network stack: the VPN software implements an application layer protocol running in the outer stack, while also acting as a link-layer interface (or an internet-layer subnet) that is bridged (routed) as if it was just another physical interface.

│ PPP (Point-to-Point Protocol)
│
│  • a «link-layer» protocol for «2-node networks»
│  • available over many «physical connections»
│    ◦ phone lines, cellular connections, DSL, Ethernet
│    ◦ often used to connect endpoints to the ISP
│  • supported by most operating systems
│    ◦ split between the «kernel» and «system utilities»

The point-to-point protocol is another somewhat important and ubiquitous example of a link-layer protocol and is usually found on connections between LANs, or between a LAN and a WAN.

│ Wireless
│
│  • WiFi is mostly like (slow, unreliable) Ethernet
│  • needs «encryption» since anyone can listen
│  • also «authentication» to prevent «rogue connections»
│    ◦ PSK (pre-shared key), EAP / 802.1X
│  • encryption needs «key management»

Finally, WiFi is, from the point of view of the rest of the stack, essentially a slow, unreliable version of ethernet, though internally, the protocol is much more complicated.

## The TCP/IP Stack

In this section, we will look at the TCP/IP stack proper, and we will also discuss DNS in a bit more detail.

│ IP (Internet Protocol)
│
│  • uses 4 byte (v4) or 16 byte (v6) addresses
│    ◦ split into «network» and «host» parts
│  • it is a packet-based protocol
│  • is a «best-effort» protocol
│    ◦ packets may get lost, reordered or corrupted

IP is a low-overhead, packet-oriented protocol in wide use across the internet and most local area networks (whether they are attached to the internet or not). Quite importantly, its low-overhead nature means that it does not guarantee delivery, nor the integrity of the data it transports.

│ IP Networks
│
│  • IP networks roughly correspond to LANs
│    ◦ hosts on the «same network» are located with ARP
│    ◦ «remote» networks are reached via «routers»
│  • a «netmask» splits the address into network/host parts
│  • IP typically runs on top of Ethernet or PPP

Within a single IP network, delivery is handled by the link layer – the local network being identified by a common address prefix (the length of this prefix is part of the network configuration, and is given by the netmask).

│ Routing
│
│  • routers «forward» packets «between networks»
│  • somewhat like «bridges» but «layer 3»
│  • routers act as normal «LAN endpoints»
│    ◦ but represent entire remote IP networks
│    ◦ or even the entire internet

Packets for recipients outside the local network (i.e. those which do not share the network part of the address with the local host) are «routed»: a layer 3 device, analogous to a layer 2 switch, forwards the packet to one of its interfaces (into another link-layer domain). The «routing tables» are, however, much more complex than the information maintained by a switch, and their maintenance across the internet is outside the scope of this subject.

│ ICMP: Internet Control Message Protocol
│
│  • «control» messages (packets)
│    ◦ destination host/network unreachable
│    ◦ time to live exceeded
│    ◦ fragmentation required
│  • «diagnostic» packets, e.g. the ‹ping› command
│    ◦ ‹echo request› and ‹echo reply›
│    ◦ combine with TTL for ‹traceroute›

ICMP is the ‘service protocol’ used for diagnostics, error reporting and network management. The role of ICMP was substantially extended with the introduction of IPv6 (e.g. to include automatic network configuration, via router advertisements and router solicitation packet types). ICMP does not directly provide any services to the application layer.

│ Services and TCP/UDP Port Numbers
│
│  • networks are generally used to «provide services»
│    ◦ each computer can host multiple services
│  • different «services» can run on different «ports»
│  • port is a 16-bit number and some are given names
│    ◦ port 25 is SMTP, port 80 is HTTP, ...

As we have briefly mentioned earlier, transport-layer addresses have two components: the IP address of the destination computer and a «port number», which designates a particular service or application running on the destination node.

│ TCP: Transmission Control Protocol
│
│  • a «stream»-oriented protocol on top of IP
│  • works like a «pipe» (transfers a byte sequence)
│    ◦ must respect «delivery order»
│    ◦ and also «re-transmit» lost packets
│  • must establish «connections»

The two main transport protocols in the TCP/IP protocol family are TCP and UDP, with the former being more common and also considerably more complicated. Since TCP is stream-oriented and reliable, it needs to implement the logic to slice a byte stream into individual packets (for delivery using IP, which is packet-oriented), consistency checks (packet checksums) and retransmission logic (in case IP packets carrying TCP data are lost).

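To illustrate what ‘slicing a byte stream into packets’ involves, the following sketch cuts a stream into fixed-size pieces, shuffles them (as a best-effort network might) and puts the stream back together. It is a deliberately simplified model: the function names and the 8-byte segment size are arbitrary, each piece is tagged with its byte offset, and checksums, acknowledgements and timers are left out entirely.

```python
import random


def segment(stream, mss):
    """Slice a byte stream into (offset, payload) pairs of at most mss bytes."""
    return [(off, stream[off:off + mss]) for off in range(0, len(stream), mss)]


def reassemble(segments):
    """Rebuild the stream from segments that may arrive in any order."""
    out = bytearray()
    for off, payload in sorted(segments):
        if off < len(out):
            continue                  # duplicate (e.g. a retransmission)
        if off > len(out):
            raise ValueError(f"gap at offset {len(out)}: a segment is missing")
        out += payload
    return bytes(out)


data = b"GET / HTTP/1.1\r\nHost: www.fi.muni.cz\r\n\r\n"
packets = segment(data, mss=8)
random.shuffle(packets)               # IP does not guarantee ordering
assert reassemble(packets) == data
```

A missing piece shows up as a gap in the offsets, which is exactly the situation in which the receiving side has to wait for the data to be sent again.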
│ TCP Connections
│
│  • the endpoints must establish a «connection» first
│  • each connection serves as a separate «data stream»
│  • a connection is «bidirectional»
│  • TCP uses a 3-way handshake: SYN, SYN/ACK, ACK

To provide stream semantics to the user, TCP must implement a mechanism which creates the illusion of a byte stream on top of a packet-based foundation. This mechanism is known as a «connection», and essentially consists of some state shared by the two endpoints. To establish this shared state, TCP uses a 3-way handshake.

│ Sequence Numbers
│
│  • TCP packets carry «sequence numbers»
│  • these numbers are used to «re-assemble» the stream
│    ◦ IP packets can arrive «out of order»
│  • they are also used to «acknowledge reception»
│    ◦ and subsequently to manage re-transmission

Sequence numbers are part of the connection state, and allow the byte stream to be reassembled in the correct order, even if IP packets carrying the stream get reordered during delivery.

│ Packet Loss and Re-transmission
│
│  • packets can get «lost» for a variety of reasons
│    ◦ a «link goes down» for an extended period of time
│    ◦ «buffer overruns» on routing equipment
│  • TCP sends «acknowledgments» for received packets
│    ◦ the ACKs use «sequence numbers» to identify packets

Besides packet reordering, TCP also needs to deal with «packet loss»: an event where an IP packet is sent, but vanishes without trace en route to its destination. A lost packet is detected as a gap in sequence numbers. However, it is the «sender» which must learn about a lost packet, so that it can be retransmitted: for this reason, the recipient of the packet must «acknowledge» its receipt, by sending a packet back (or more often, by piggybacking the acknowledgement on a data packet that it would send anyway), carrying the sequence numbers of packets that have been received.
If an acknowledgement is not received within a certain time (dynamically adjusted) from the sending of the original packet, the packet is sent again (retransmitted).

│ UDP: User (Unreliable) Datagram Protocol
│
│  • TCP comes with non-trivial «overhead»
│    ◦ and its guarantees are «not always required»
│  • UDP is a much «simpler» protocol
│    ◦ a very thin wrapper around IP
│    ◦ with «minimal overhead» on top of IP

Not all applications need the comparatively strong guarantees that TCP provides, and some cannot tolerate the additional latency introduced by the algorithms that TCP employs to ensure reliable, in-order delivery. For those cases, UDP presents a very light-weight layer on top of IP, essentially only adding the port number to the addresses, and a 16-bit checksum to the packet header (which is, in its entirety, only 64 bits long).

│ Firewalls
│
│  • the «name» comes from building construction
│    ◦ a fire-proof barrier between parts of a building
│  • the idea is to «separate networks» from each other
│    ◦ making attacks harder from the outside
│    ◦ «limiting damage» in case of compromise

A firewall is a device which separates two networks from each other, typically by acting as the (only) router between them, but also examining the packets and dropping or rejecting them if they appear malicious, or attempt to use services that are not supposed to be visible externally. Often, one of these networks is the internet. Sometimes, the other network is just a single computer.

│ Packet Filtering
│
│  • packet filtering is an «implementation» of a «firewall»
│  • can be done on a «router» or at an «endpoint»
│  • «dedicated» routers + packet filters are «more secure»
│    ◦ a «single» such «firewall» protects the «entire network»
│    ◦ less opportunity for mis-configuration

As with other services, it usually pays off to centralize (within a single network) the responsibility for packet filtering, reducing the administrative burden and the space for misconfigured nodes to endanger the entire network. Of course, it is reasonable to run local firewalls on each node, as a second line of defence.

│ Packet Filter Operation
│
│  • packet filters operate on a set of «rules»
│    ◦ the rules are generally «operator»-provided
│  • each incoming packet is «classified» using the rules
│  • and then «dispatched» accordingly
│    ◦ may be «forwarded», dropped, «rejected» or edited

A packet filter is, essentially, a finite state machine (perhaps with a bit of memory for connection tracking, in which case it is a «stateful» packet filter) which examines each packet and decides what action to take on it. The specific classification rules are usually provided by the network administrator; in simple cases, they match on source and destination IP addresses and port numbers, and on the connection status (which is remembered by the packet filter), for TCP packets.

After they are classified, the packets can be forwarded to their destination (as a standard router would), quietly dropped, rejected (sending an ICMP notification to the sender) or adjusted before being sent along (most commonly for network address translation, or NAT, the details of which are out of scope of this subject).

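A stateless version of the classification step can be sketched as follows. Everything here is an assumption made for the sake of the example: the rule format, the addresses, and in particular the evaluation order, where the first matching rule wins and unmatched packets are dropped (real packet filters differ on this; some use the last matching rule instead). Connection tracking is not modelled.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class Rule:
    action: str                     # "forward", "drop" or "reject"
    src: str = "0.0.0.0/0"          # source network (default: anything)
    dst: str = "0.0.0.0/0"          # destination network
    dport: Optional[int] = None     # destination port, None matches any

    def matches(self, src, dst, dport):
        return (ip_address(src) in ip_network(self.src)
                and ip_address(dst) in ip_network(self.dst)
                and self.dport in (None, dport))


# Hypothetical policy: expose one web server, refuse outgoing SMTP.
RULES = [
    Rule("forward", dst="10.0.0.5/32", dport=80),
    Rule("reject", src="10.0.0.0/8", dport=25),
    Rule("forward", src="10.0.0.0/8"),
]


def classify(src, dst, dport, rules=RULES, default="drop"):
    """Return the action of the first matching rule (assumed semantics)."""
    for rule in rules:
        if rule.matches(src, dst, dport):
            return rule.action
    return default


print(classify("147.251.48.1", "10.0.0.5", 80))   # incoming HTTP request
print(classify("147.251.48.1", "10.0.0.5", 22))   # incoming SSH attempt
```

A stateful filter would additionally consult a table of known connections before (or instead of) walking the rule list.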
│ Packet Filter Examples
│
│  • packet filters are often part of the «kernel»
│  • the rule parser is a system utility
│    ◦ it loads rules from a «configuration file»
│    ◦ and sets up the kernel-side filter
│  • there are multiple «implementations»
│    ◦ ‹iptables›, ‹nftables› in Linux
│    ◦ ‹pf› in OpenBSD, ‹ipfw› in FreeBSD

There are usually two components to a packet filter: one is a system utility which reads a human-readable description of the rules, and based on those, compiles an efficient matcher for use in the kernel component which does the actual classification.

│ Name Resolution
│
│  • users do not want to remember «numeric addresses»
│    ◦ phone numbers are bad enough
│  • host «names» are used instead
│  • can be stored in a file, e.g. ‹/etc/hosts›
│    ◦ not very practical for more than 3 computers
│    ◦ but there are millions of computers on the internet

In the last part of this section, let's have a look at hostname resolution and the DNS protocol. What we need is a directory (a yellow pages sort of thing), but one that can be efficiently updated (many updates are done every hour) and also efficiently queried by computers on the network. The system must be scalable enough to handle many millions of names.

│ DNS: Domain Name System
│
│  • hierarchical «protocol» for name resolution
│    ◦ runs on top of TCP or UDP
│  • domain «names are split» into parts using dots
│    ◦ each domain knows whom to ask for the next bit
│    ◦ the name database is effectively «distributed»

Essentially, at the internet scale, we need some sort of a distributed system (i.e. a distributed database). Unlike relational databases though, delays in update propagation are acceptable, making the design simpler.
The name space of host names is organized hierarchically, and the structure of DNS follows this organisation: going from right to left, starting with the root domain (a single dot, often left out), one of the DNS servers for that domain is consulted about the name immediately to the left, usually resulting in the address of another DNS server which can get us more information. The process is repeated until the entire name is resolved, usually resulting in an IP address of the host.

│ DNS Recursion
│
│  • take ‹www.fi.muni.cz.› as an example domain
│  • resolution starts from the right at «root servers»
│    ◦ the root servers refer us to the ‹cz.› servers
│    ◦ the ‹cz.› servers refer us to ‹muni.cz.›
│    ◦ finally ‹muni.cz.› tells us about ‹fi.muni.cz.›

The process described above is called «recursion» and is usually performed by a special type of DNS server, which performs the recursion on behalf of its clients and caches the results for subsequent queries. This also means that it can, most of the time, start from the middle, since the name servers of the one or two topmost domains are most likely in the cache.

│ DNS Recursion Example
│
│     $ dig www.fi.muni.cz. A +trace
│     .                IN NS  j.root-servers.net.
│     cz.              IN NS  b.ns.nic.cz.
│     muni.cz.         IN NS  ns.muni.cz.
│     fi.muni.cz.      IN NS  aisa.fi.muni.cz.
│     www.fi.muni.cz.  IN A   147.251.48.1

To observe recursion in practice (and perform other diagnostics on DNS), we can use the ‹dig› tool, which is part of the ISC (Internet Systems Consortium) suite of DNS-related tools.

│ DNS Record Types
│
│  • ‹A› is for (IP) Address
│  • ‹AAAA› is for an IPv6 Address
│  • ‹CNAME› is for an alias
│  • ‹MX› is for mail servers
│  • and many more

Besides ‹NS› records, which tell the system whom to ask for further information, there are many types of DNS records, each carrying a different type of information about the name in question.
Besides IPv4 and IPv6 addresses, there are free-form TXT records (which are used, for instance, by spam filtering systems to learn about authorized mail servers for a domain), SRV records for service discovery in local networks, and so on.

## Using Networks

In this section, we will briefly look at the socket API which allows applications to use and provide network services (on POSIX operating systems, that is) and at a couple of examples of application-level network services.

│ Sockets Reminder
│
│  • the «socket API» comes from early BSD Unix
│  • socket represents a (possible) «network connection»
│  • you get a «file descriptor» for an open socket
│  • you can ‹read()› and ‹write()› to sockets
│    ◦ but also ‹sendmsg()› and ‹recvmsg()›
│    ◦ and ‹sendto()› and ‹recvfrom()›

Remember that a socket is a file-like object, accessible through a «file descriptor». On connected stream sockets, programs can use the usual ‹read› and ‹write› system calls, with semantics akin to pipes. While these are also possible on datagram sockets, a different API is often preferred, one of the reasons being that with ‹read›, it is impossible to distinguish datagrams coming from different sources. The system calls ‹sendto› and ‹recvfrom› allow the program to specify (or learn, in case of ‹recvfrom›) the address of the recipient (sender) of the packet.

│ Socket Types
│
│  • sockets can be «internet» or «unix domain»
│    ◦ internet sockets work across networks
│  • «stream» sockets are like files
│    ◦ you can write a continuous «stream» of data
│    ◦ usually implemented using TCP
│  • «datagram» sockets send individual «messages»
│    ◦ usually implemented using UDP

Communication on IP networks is done using «internet sockets» (with ‹domain› set to ‹AF_INET› or ‹AF_INET6›).
If the socket is a «stream socket» (its ‹type› is ‹SOCK_STREAM›) the communication is executed using TCP (stream-type sockets must be explicitly «connected» by a call to the ‹connect› or ‹accept› system call, which in case of internet sockets perform the TCP handshake). Datagram sockets (‹type› set to ‹SOCK_DGRAM›) may be optionally ‘connected’, though this only sets up a default destination for datagrams to be sent to. Communication is performed using UDP.

│ Creating Sockets
│
│  • a socket is created using the ‹socket()› function
│  • it can be turned into a «server» using ‹listen()›
│    ◦ individual «connections» are established with ‹accept()›
│  • or into a «client» using ‹connect()›

All types of sockets are created using the ‹socket› system call, and specialize into server and client sockets based on the subsequent API calls performed on them. A server socket is obtained through ‹bind› followed by ‹listen›, while a client socket is obtained using ‹connect›. The server then repeatedly calls ‹accept›, each call returning a «new file descriptor» which represents one TCP connection.

│ Resolver API
│
│  • ‹libc› contains a «resolver»
│    ◦ available as ‹gethostbyname› (and ‹getaddrinfo›)
│    ◦ also ‹gethostbyaddr› for «reverse lookups»
│  • can look in many different places
│    ◦ most systems support at least ‹/etc/hosts›
│    ◦ and DNS-based lookups

The socket API only deals with numeric IP addresses. If an application needs to be able to connect to computers using their host names, it needs to use the «resolver API» which, behind the scenes, uses the appropriate database or protocol to find the corresponding IP addresses. The exact sequence of steps depends on system configuration, but usually the resolver consults the ‹/etc/hosts› file and a recursive DNS server (the IP address of which is again part of system configuration).

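The following sketch shows how the resolver and the socket calls fit together, using Python's ‹socket› module, which wraps the C functions of the same names. The three helper functions are made up for this example; the addresses and the message are arbitrary, and error handling is kept to the bare minimum.

```python
import socket


def resolve(host, port):
    """Return the (address, port) pairs the system resolver finds for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [sockaddr[:2] for _family, _type, _proto, _canon, sockaddr in infos]


def connect_by_name(host, port):
    """Try each resolved address in turn; return the first connected socket."""
    last_error = None
    for family, kind, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, kind, proto)
        try:
            sock.connect(sockaddr)          # performs the TCP handshake
            return sock
        except OSError as err:
            last_error = err
            sock.close()
    raise last_error or OSError("no addresses found")


def loopback_demo():
    """Server and client in one process, talking over the loopback interface."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))           # port 0: let the OS pick a free port
    server.listen()
    port = server.getsockname()[1]

    client = connect_by_name("127.0.0.1", port)
    conn, _peer = server.accept()           # a new descriptor for this connection
    client.sendall(b"hello")
    reply = conn.recv(16)
    for s in (client, conn, server):
        s.close()
    return reply
```

Calling ‹resolve("www.fi.muni.cz", 80)› would go through whatever sources the system is configured to use, while ‹loopback_demo()› exercises the ‹bind›, ‹listen›, ‹connect› and ‹accept› sequence without leaving the local machine.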
│ Network Services
│
│ • servers «listen» on a socket for incoming connections
│   ◦ a client actively establishes a «connection» to a server
│ • the network simply «transfers data» between them
│ • interpretation of the data is a «layer 7» issue
│   ◦ could be «commands», file transfers, ...

Most network services operate in a client-server regime, on top of TCP: a server passively awaits connections on a particular transport-layer address (i.e. an IP address coupled with a port number). The client, on the other hand, actively connects to a listening server, establishing a bidirectional channel (the TCP connection) between them. From that point on, the network stack simply transfers data across that channel. The data usually conforms to some application-level protocol (SMTP, HTTP, ...), though the protocol does not need to be standardized or well-known.

│ Network Service Examples
│
│ • (secure) remote shell – ‹sshd›
│ • the internet «email suite»
│   ◦ MTA = Mail Transfer Agent, speaks SMTP
│   ◦ SMTP = Simple Mail Transfer Protocol
│ • the «world wide web»
│   ◦ web servers provide content (files)
│   ◦ clients and servers speak HTTP and HTTPS

│ Client Software
│
│ • the ‹ssh› command uses the SSH protocol
│   ◦ a very useful system utility on virtually all UNIXes
│ • «web browser» is the client for the world wide web
│   ◦ browsers are complex «application» programs
│   ◦ some of them bigger than even operating systems
│ • «email client» is also known as a MUA (Mail User Agent)

## Network File Systems

We have learned earlier that file systems are an important, ubiquitous abstraction. It is only natural to allow a file system to be accessed remotely (from another computer) using the API that is used for local access, making the ‘network’ part almost entirely transparent to the program.

│ Why Network Filesystems?
│
│ • copying files back and forth is impractical
│   ◦ and also «error-prone» (which is the latest version?)
│ • how about storing data in a «central location»
│ • and «sharing» it with all the computers on the LAN

Perhaps the most compelling case for network file systems arises from the need to make workstations (desktop computers) at an institution fungible: that is, to allow any user to log in to any of the available workstations and immediately have all their data and settings at hand.

│ NAS (Network-Attached Storage)
│
│ • a (small) «computer» dedicated to «storing files»
│ • usually running a cut down operating system
│   ◦ often based on Linux or FreeBSD
│ • provides «file access» to the network
│ • sometimes additional «app-level services»
│   ◦ e.g. photo management, media streaming, ...

Another use case comes from the desire to store data which is shared by multiple users on a central device, where it is easy to back up and accessible from all computers (and hence by all users, even when some of the other computers are powered down).

│ NFS (Network File System)
│
│ • the traditional UNIX «networked filesystem»
│ • hooked quite deep into the kernel
│   ◦ assumes generally reliable network (LAN)
│ • filesystems are «exported» for use over NFS
│ • the client side «mounts» the NFS-exported volume

NFS is one of the first implementations of a network file system. It is based, essentially, on hooking up the VFS interface and exporting it over a remote procedure call interface to other kernels on the network. To create an NFS share, the local file system must be «exported» on the would-be NFS server; afterwards, it can be «mounted» by clients, making the share part of their local file system hierarchy.

│ NFS History
│
│ • originated at «Sun Microsystems» in the 80s
│ • v2 implemented in System V, DOS, ...
│ • v3 appeared in '95 and is «still in use»
│ • v4 arrives in 2000, improving «security»

NFS is a rather old technology (nearly 40 years old), but it has seen significant evolution over the first 20 or so years, with version 4 mainly addressing security concerns.

│ VFS Reminder
│
│ • «implementation mechanism» for multiple FS types
│ • an object-oriented approach
│   ◦ ‹open›: «look up» the file for access
│   ◦ ‹read›, ‹write› – self-explanatory
│   ◦ ‹rename›: rename a file or directory

Recall, from lecture 4, that VFS (virtual file system switch) is a mechanism inside the kernel that allows multiple file system implementations to present a unified interface to the rest of the kernel. NFS takes advantage of this existing interface and makes it available over the network. Of course, unlike VFS itself, the semantics of the NFS functions are standardized across implementations (NFS clients and servers are mostly compatible across different UNIX-like operating systems).

│ RPC (Remote Procedure Call)
│
│ • any «protocol» for «calling functions» on «remote hosts»
│   ◦ ONC-RPC = Open Network Computing RPC
│   ◦ NFS is based on ONC-RPC (also known as Sun RPC)
│ • NFS basically runs VFS operations using RPC
│   ◦ «easy to implement» on UNIX-like systems

NFS exposes its interface to the network via a remote procedure call mechanism. Such a mechanism takes a procedure call (the name of the function, along with the arguments that it should be called with), packs it into a byte string and sends it over the network to another computer, which then actually performs the call and sends the result back. The protocol has a mechanism to send data buffers, in addition to primitive values (integers).

│ Port Mapper
│
│ • ONC-RPC is executed over TCP or UDP
│   ◦ but it is more «dynamic» wrt.
available services
│ • TCP/UDP «port numbers» are assigned «on demand»
│ • ‹portmap› «translates» from RPC services to port numbers
│   ◦ the port mapper itself listens on port 111

In modern systems, ONC-RPC is implemented exclusively on top of the TCP/IP stack. Since the protocol can expose multiple services on each machine, the need arises to translate between those RPC services and TCP/UDP port numbers. In most cases, an RPC service called ‘portmapper’ takes care of this need, itself running on a fixed port (number 111).

│ The NFS Daemon
│
│ • also known as ‹nfsd›
│ • provides NFS access to a «local file system»
│ • can run as a system service
│ • or it can be part of the kernel
│   ◦ this is more typical for «performance» reasons

Given an RPC stack, NFS is provided by an ‹nfsd›, which registers itself as a service with the RPC stack. The daemon can be a proper, user-space daemon, but it can also be part of the kernel (running as a kernel thread).

│ SMB (Server Message Block)
│
│ • a «network file system» from Microsoft
│ • available in Windows since version 3.1 (1992)
│   ◦ originally ran on top of NetBIOS
│   ◦ later versions used TCP/IP
│ • SMB1 accumulated a lot of cruft and «complexity»

SMB is a completely different implementation of a network transparency layer for file systems. Like NFS, it is not tied to a particular on-disk format. SMB saw many incremental changes with each new Microsoft operating system that came along, while at the same time it was kept backward compatible, so that older operating systems could interoperate, both as clients and as servers. This made the protocol extremely complicated, making further extensions impractical.

│ SMB 2.0
│
│ • «simpler» than SMB1 due to «fewer retrofits» and compat
│ • better «performance» and «security»
│ • support for «symbolic links»
│ • available since Windows Vista (2006)

Microsoft designed a new protocol for networked filesystems in their Windows Vista operating system, under the name SMB 2.0.
Like NFSv4 a few years earlier, SMB 2 addressed many of the security weaknesses of its predecessor, while also improving performance and extending the protocol to support new file system features, such as symlinks.

│ Review Questions
│
│ 29. What is ARP (Address Resolution Protocol)?
│ 30. What is IP (Internet Protocol)?
│ 31. What is TCP (Transmission Control Protocol)?
│ 32. What is DNS (Domain Name System)?