6. Peer-to-peer (P2P) networks I.
PA159: Net-Centric Computing I.
Eva Hladká, Faculty of Informatics, Masaryk University
Autumn 2010

Lecture Overview

1. Client-Server vs. Peer-to-Peer
   • Client-Server Systems
   • P2P Systems
   • Comparison
2. Generic P2P Architecture
   • Overlays and Peer Discovery
   • Service/Resource Discovery
3. Taxonomy of P2P Systems
   • Centralized P2P Systems
   • Decentralized P2P Systems
   • Hybrid P2P Systems

Client-Server vs. Peer-to-Peer

Distributed Applications I.

• a distributed application consists of multiple software modules located on different computers
• the modules interact with each other over a communication network connecting the different computers
  - the communication network is used for synchronisation and communication between the modules
• it is possible that multiple users may use the application concurrently on different computers
• to build a distributed application, it is necessary to decide:
  - how to place those software modules on the different computers in the network
  - how each software module discovers the other modules it needs to communicate with

Distributed Applications II.

• two basic approaches:
  - Client-Server architecture
  - Peer-to-Peer (P2P) architecture
• hybrids are possible and indeed useful

Client-Server Systems

Client-Server Architecture I.
A client-server system comprises two types of software modules:

• server module
  - one centralized instance
    * but might be internally replicated for scaling purposes
  - passively listens for connections from clients
  - multiple client requests may be handled:
    * sequentially
    * concurrently (multithreaded servers)
    * by several replicated servers at different locations
  - pending clients' requests may be queued up
  - servers are assumed to be reliable, often running in a data centre (dedicated/virtualized hardware)

Client-Server Architecture II.

• client module
  - multiple distributed instances, possibly controlled by different users
  - actively initiates a connection to a server
  - no direct communication between clients
  - clients need to know the network address and port number of a server
    * service discovery is typically performed through client configuration
  - clients may be unreliable without affecting overall system stability
• examples of client-server systems:
  - web server/web browsers
  - web server/client applications (web services)
  - SSH/Telnet/FTP server/clients
  - NFS/SMB server/clients
  - ...

P2P Systems

P2P Architecture

• a P2P system consists of many identical software modules (peers) running on different computers
• peers communicate directly with each other
• each peer is a server as well as a client:
  - provides services to other peers
  - requests services from other peers
• unlike dedicated servers, peers tend to be unreliable
• service discovery is more complicated, since there are many servers continuously appearing and disappearing at different network locations
• P2P systems provide natural scalability, because there are multiple servers
• P2P systems can work without allocating dedicated server machinery
Peer-to-Peer Systems

Definition: Peer-to-peer (P2P) systems are distributed systems consisting of interconnected nodes able to self-organize into network topologies with the purpose of sharing resources such as content, CPU cycles, storage and bandwidth, capable of adapting to failures and accommodating transient populations of nodes while maintaining acceptable connectivity and performance, without requiring the intermediation or support of a global centralized server or authority.

P2P Properties

• Symmetric role
  - each participating node typically acts both as a server and as a client
  - however, in many designs this property is relaxed by the use of special peer roles ("super peers" or "relay peers")
• Scalability
  - P2P systems can scale to thousands of nodes
  - the P2P protocols cannot require "all-to-all" communication or coordination
• Heterogeneity
  - a P2P system is (usually) heterogeneous in terms of the hardware capacity of the nodes
• Distributed control (Decentralization)
  - ideally, no centralized structures should exist in P2P systems
• Dynamism
  - the topology of P2P systems may change very fast as new nodes join and existing ones leave
• Resource sharing
  - each peer contributes system resources (computing power, data, bandwidth, presence, etc.) to the operation of the P2P system
• Self-organization
  - the organization of the P2P system increases over time using local knowledge

P2P Applications

Figure: P2P Applications.

Comparison

Client-Server vs. Peer-to-peer Comparison I.
The systems can be compared from several points of view:

• Ease of development
  - C-S is more established and familiar than P2P
  - C-S exhibits simple interaction patterns for clients and server, while P2P involves more complex interaction patterns between peers
• Manageability
  - it is easier to maintain a centralized server in a C-S environment than to keep track of and maintain several distributed peers in a P2P system
• Scalability
  - C-S scalability is limited by fixed server hardware, though scaling can be achieved through load balancing over multiple servers at increased cost
  - P2P is scalable by nature, since as the number of peers grows, so does the "server" capacity

Client-Server vs. Peer-to-peer Comparison II.

• Security
  - responsibility for C-S security lies with the server, which is centrally hosted in a secure environment
  - responsibility for P2P security is distributed across peers in different administrative domains, some of which might be compromised
• Reliability
  - C-S reliability is achieved through the use of multiple redundant servers (possibly hosted at different locations) with automatic fail-over, at additional cost
  - with P2P, resilience comes free of charge, since multiple peers are usually able to provide the same service if some peers fail

Generic P2P Architecture

P2P Architecture

Layers (top to bottom): Application Layer, Middleware Layer, Base Overlay Layer, Underlying Network

• libraries exist that provide reusable P2P functionality (e.g. JXTA)
• some applications integrate all of the above (e.g., Gnutella, BitTorrent, etc.)

P2P Architecture: Base Overlay Layer I.

• the base overlay layer is responsible for:
  - discovering new peers
  - maintaining the P2P overlay (virtual) network
  - forwarding messages between peers
• the overlay network is a virtual network laid over the "physical" network (e.g. TCP/IP)
• overlay network "wires" are implemented using underlying network facilities (e.g. TCP connections or UDP messages)
• overlay network distance is measured in the number of hops from peer to peer
• peers that are distant in the physical network may be neighbours in the overlay network, and vice versa
• the performance of the P2P system is influenced by the structure of the overlay network

P2P Architecture: Base Overlay Layer II.

Figure: Overlay vs. Underlying Network.

P2P Architecture: Middleware Layer

• the middleware layer facilitates P2P application development by hiding overlay and service discovery issues
• it provides access to the services/resources provided by peers, and may be responsible for functions such as:
  - security: controlling access to services/resources
  - service/resource discovery: searching and indexing services/resources distributed across peers
  - peer groups: coordinating peers that provide or consume a particular service/resource
• may provide fault tolerance and persistent state
• e.g., JXTA (Java P2P platform), Windows P2P Networking, P2P.NET, etc.
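The notion of overlay distance from the Base Overlay Layer slide (hops from peer to peer, independent of physical proximity) can be illustrated with a breadth-first search. This is only a minimal sketch: the peer names and the `overlay` link table are made up for illustration, and a real overlay would discover its links at run time.

```python
from collections import deque

def overlay_hops(links, src, dst):
    """Return the number of overlay hops from src to dst, or None if unreachable."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        peer, hops = queue.popleft()
        if peer == dst:
            return hops
        for nxt in links.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None

# Hypothetical overlay: each entry lists the peers a peer holds a "wire" to
# (for example an open TCP connection). Physical location plays no role here.
overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
```

In this example `overlay_hops(overlay, "A", "D")` is 3 even if A and D happened to sit on the same physical LAN, which is the mismatch between overlay and underlying network that the slide points out.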
P2P Architecture: Application Layer

• the middleware services can be used to build complete applications:
  - file sharing, e.g., Napster, Gnutella, Kazaa, ...
  - routing protocols
  - instant messaging, videoconferencing applications, e.g., Skype
  - distributed file systems
  - distributed backup systems
  - distributed computing, e.g., grid computing, SETI@Home, ...
  - and many more

Overlays and Peer Discovery

• a P2P network is typically a "virtual" network overlaid on an existing network (e.g. the Internet)
  - the overlay is used for indexing and peer discovery, and makes the P2P system independent of the physical network topology
  - content is typically exchanged directly over the underlying IP network
• a new peer needs to discover at least one existing peer in order to join a P2P network
  - network location information: IP address, listening port number, etc.
• if no peers are found immediately, the new peer either passively waits for new participants, or proactively looks for potential new participants
• it is hard to locate existing peers in a large network such as the Internet

Overlays and Peer Discovery: Initial Peer Discovery I.

Static configuration:

• each peer is preconfigured with a list of the network locations (IP address and port number) of every other peer in the system
• on startup (and possibly periodically) each peer attempts to connect to some other peers in its list, some of which may be running
• due to the manual configuration, this is only suitable for P2P networks with a small number of peers which do not change frequently
• can alternatively be used to initially contact a small number of "well-known" peers that are guaranteed to be online
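The static-configuration startup step can be sketched as a loop over the preconfigured list. Everything named here is an assumption made for illustration: `bootstrap`, `KNOWN_PEERS`, the documentation-range addresses, and the injected `try_connect` callback, which stands in for a real connection attempt.

```python
def bootstrap(known_peers, try_connect, wanted=2):
    """Try preconfigured peers in order; return those that accepted a connection."""
    connected = []
    for address in known_peers:
        if len(connected) >= wanted:
            break  # a few overlay neighbours are enough
        if try_connect(address):
            connected.append(address)
    return connected

# Hypothetical preconfigured list of (IP address, port) pairs.
KNOWN_PEERS = [("192.0.2.10", 6000), ("192.0.2.11", 6000), ("192.0.2.12", 6000)]

# Stand-in for the network: only these peers are currently running.
online = {("192.0.2.11", 6000), ("192.0.2.12", 6000)}
neighbours = bootstrap(KNOWN_PEERS, lambda addr: addr in online)
```

In a real peer, `try_connect` could wrap something like `socket.create_connection(address, timeout=...)` and return whether it succeeded. Passing it in as a parameter keeps the sketch runnable without a network.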
Overlays and Peer Discovery: Initial Peer Discovery II.

Centralized directory:

• each peer is preconfigured with the network location of a centralized server
• each peer contacts the server on startup (and possibly periodically) to:
  - obtain an updated list of currently active peers
  - indicate to the server that it is active
• most subsequent communications bypass the server, using the P2P overlay network to route messages instead
• occasionally, other services are also provided by the server (e.g. a list of files hosted by each peer)
• peers may go offline
  - cleanly: the peer's shutdown procedure contacts the server to remove it from the active peer list
  - without warning (crash, network or power failure), making the server's active peer list obsolete; entries in the active peer list therefore need to expire, and periodic liveness checks are necessary
• usually, a peer only needs to connect to a few peers on the overlay network
  - the other members can be discovered by the member propagation techniques
• the centralized directory server is a single point of failure

Overlays and Peer Discovery: Initial Peer Discovery III.

Member Propagation Techniques with Initial Member Discovery:

• in general, it is not necessary to discover all of the participating members in the network
  - in many cases, discovering a subset of the participating members is adequate
• after discovering just one existing peer, information about the rest of the P2P network can be obtained from it
• if each peer maintains a full member list, it is easy for any new peer to obtain a full member list from any other peer
• alternatively, each peer can maintain a partial member list, replacing offline peers with new ones from neighbouring peers' lists
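The expiry of active-peer-list entries mentioned for the centralized directory can be sketched as follows. The class name `Directory`, the `ttl` parameter and the explicit `now` argument are assumptions chosen to keep the example deterministic; a real server would read the clock itself and might also probe peers actively.

```python
class Directory:
    """Toy centralized directory: peers announce themselves, stale entries expire."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl          # seconds an announcement stays valid (assumed value)
        self.last_seen = {}     # peer address -> time of last announcement

    def announce(self, peer, now):
        # Called on startup and periodically by each peer ("I am active").
        self.last_seen[peer] = now

    def leave(self, peer):
        # Clean shutdown: the peer removes itself from the active list.
        self.last_seen.pop(peer, None)

    def active_peers(self, now):
        # Peers that crashed never call leave(); drop them once their entry is too old.
        self.last_seen = {
            peer: seen for peer, seen in self.last_seen.items()
            if now - seen <= self.ttl
        }
        return sorted(self.last_seen)
```

A peer that announces every few seconds stays listed, while one that crashed silently disappears from `active_peers` once `ttl` seconds have passed since its last announcement.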
Overlays and Peer Discovery: Overlay Network Topology

• intermediate peers in the overlay network forward messages between indirectly connected peers
• the overlay topology significantly affects P2P system performance
• two key properties determine the effectiveness of the overlay mesh:
  - Diameter: longest distance between any two peers (in overlay hops or latency)
    * should be minimized
  - Average Degree: average number of links per peer (a high average degree increases message load, but improves fault tolerance)
    * should be kept at a moderate level
• it is necessary to avoid linear formations and splits in the mesh
• common topologies:
  - Random Mesh
  - Tiered
  - Ordered Lattice

Overlay Network Topology: Random Mesh

• each peer discovers a number of other peers and attempts to connect to them indiscriminately
• this (hopefully) results in a random structure with uniform degree
• peers that are distant on the underlying network could be overlay neighbours
  - solution: connect to peers with the lowest latency
• a random mesh is suitable for linking a large number of peers with uniform resources and connectivity
• search message flooding can easily be used to discover resources/services on other peers
  - but it generates a lot of traffic

Overlay Network Topology: Tiered Structure

• peers are ordered into tiers of a tree depending on their advertised resources and connectivity (e.g. Kazaa's nodes and supernodes, 2-tier)
• tier 0 is the foundation tier containing (possibly well-known) reliable peers with adequate resources and message forwarding capacity
• at each tier, every peer is linked to a number of peers of a lower tier and forwards messages up and down
• poorly-resourced leaf peers only link to their "super-peer" and do not forward other peers' messages; they are omitted from peer discovery
• the system needs to recover from peers leaving abruptly and disrupting the tree structure
• the hierarchy may be optimized to follow the underlying network's structure (e.g. P2P video streaming)

Overlay Network Topology: Ordered Lattice

• in a two-dimensional lattice, peers organize themselves in a rectangular grid:
  - each node maintains direct connections to 4 neighbouring peers (except edge peers)
  - peers on opposite edges can also link to form a torus
• can be extended to n dimensions
• messages are routed parallel to the lattice axes
• peer additions and deletions must be handled on the fly, possibly distorting the structure
  - after insertions and deletions of nodes, different rows/columns may contain different numbers of members
• peer coordinates in a multi-dimensional lattice may be used as a key to locate resources in content addressable networks (CAN)
  - sometimes also denoted as Distributed Hash Table (DHT)
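The two topology metrics (diameter and average degree) can be computed directly for the ordered lattice, with and without the torus links. This is a rough sketch under assumptions: the helper names `lattice`, `average_degree` and `diameter` are invented here, distance is counted in overlay hops only, and the overlay is assumed to be connected.

```python
from collections import deque
from itertools import product

def lattice(rows, cols, torus=False):
    """Build a 2-D lattice overlay; with torus=True opposite edges are linked."""
    links = {}
    for r, c in product(range(rows), range(cols)):
        neighbours = set()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if torus:
                nr, nc = nr % rows, nc % cols
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) != (r, c):
                neighbours.add((nr, nc))
        links[(r, c)] = neighbours
    return links

def average_degree(links):
    """Average number of overlay links per peer."""
    return sum(len(n) for n in links.values()) / len(links)

def diameter(links):
    """Longest shortest path (in hops) between any two peers of a connected overlay."""
    def eccentricity(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            peer = queue.popleft()
            for nxt in links[peer]:
                if nxt not in dist:
                    dist[nxt] = dist[peer] + 1
                    queue.append(nxt)
        return max(dist.values())
    return max(eccentricity(peer) for peer in links)
```

For a 4 x 4 grid, closing the edges into a torus raises the average degree from 3.0 to 4.0 and shortens the diameter from 6 hops to 4, which illustrates the trade-off between link count and path length described for overlay meshes.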
Service/Resource Discovery

• a peer must advertise its services to enable their discovery and subsequent use by other peers
  - e.g., in file sharing applications, the "service" is a shared file/block
• service discovery is itself a service
  - centralized: a server is asked for the service location
    * Napster, UDDI for web services
  - pure P2P: a request is flooded or hashed through the peers
    * flooding, overlay multicast, CAN/DHT
• when a search message reaches a matching advertisement on a peer, the server's location is returned to the originator
• actual service messages are either routed through the overlay or sent directly via the underlying network by the application
• can be optimized by caching advertisements/data (e.g. file/block) along the search/return path on the overlay

Taxonomy of P2P Systems

Taxonomy of P2P Systems I.

Generally, P2P systems can be divided into two main categories:

• centralized: one or more central servers are available, providing various services
• decentralized: no central servers are employed
  - they have to consider two main design issues:
    * the structure: flat (single-tier) vs. hierarchical (multi-tier)
    * the overlay topology: unstructured vs. structured
• besides these two, hybrid P2P systems also exist
  - they combine the centralized and decentralized approaches to leverage the advantages of both architectures

Taxonomy of P2P Systems II.

Figure: A taxonomy of P2P systems.
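The "hashed" variant of service discovery and the structured overlay topology both rest on a deterministic mapping from a resource key to the peer responsible for it. The sketch below uses a one-dimensional hash ring with successor lookup, which is an assumption chosen for brevity and not the multi-dimensional CAN scheme named in the slides; `ring_position`, `responsible_peer` and the peer names are hypothetical.

```python
import hashlib
from bisect import bisect_left

def ring_position(name, bits=32):
    """Map a peer name or resource key to a position on a ring of size 2**bits."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

def responsible_peer(key, peers):
    """Return the peer whose ring position is the first at or after the key's position."""
    ring = sorted((ring_position(peer), peer) for peer in peers)
    positions = [pos for pos, _ in ring]
    # Wrap around to the first peer when the key hashes past the last position.
    index = bisect_left(positions, ring_position(key)) % len(ring)
    return ring[index][1]

# Hypothetical peer identifiers.
peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
owner = responsible_peer("song.mp3", peers)
```

Any peer that knows the membership computes the same `owner` for a given key, so a lookup can be directed to one peer instead of being flooded. The cost is the extra membership information each peer has to maintain.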
Centralized P2P Systems I.

Centralized P2P Systems

• combine the features of centralized (client-server) and decentralized systems
• like a centralized system, there are one or more central servers, which help peers to locate their desired resources or act as a task scheduler to coordinate actions among them
  - a peer sends messages to the central server to determine the addresses of peers that contain the desired resources
• like a decentralized system, once a peer has its information/data, it can communicate directly with other peers
  - i.e., without going through the server anymore
• drawbacks:
  - susceptible to malicious attacks and a single point of failure
  - the server is a bottleneck for a large number of peers (performance degradation)
  - lacks scalability and robustness
• examples:
  - scientific computation: SETI@home, BOINC, Folding@home, Genome@home
  - digital content sharing: Napster, Openext
  - others: Jabber (IM), Net-Z and StarCraft (entertainment), etc.

Centralized P2P Systems II.

Figure: Centralized P2P Systems: Peer A submits a request to the central server to acquire a list of nodes that satisfy the request. Once it obtains the list (which contains Peers B and C), it communicates directly with them.
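The interaction in the centralized figure (ask the server who has the resource, then talk to those peers directly) can be sketched with a small in-memory index. The class `CentralIndex`, its method names and the file names are assumptions for illustration and do not reproduce any particular system's protocol.

```python
class CentralIndex:
    """Toy central server: maps resource names to the peers that offer them."""

    def __init__(self):
        self.providers = {}  # resource name -> set of peer identifiers

    def register(self, peer, resources):
        # A peer advertises the resources it is willing to share.
        for name in resources:
            self.providers.setdefault(name, set()).add(peer)

    def unregister(self, peer):
        # Remove a departing peer from every resource entry.
        for peers in self.providers.values():
            peers.discard(peer)

    def lookup(self, name):
        # Only locations are returned; the data itself never passes through the server.
        return sorted(self.providers.get(name, ()))

index = CentralIndex()
index.register("B", ["lecture6.pdf"])
index.register("C", ["lecture6.pdf", "notes.txt"])
candidates = index.lookup("lecture6.pdf")  # Peer A's request to the server
```

After the lookup, Peer A would contact the peers in `candidates` directly. If the index server is unavailable, no lookup is possible at all, which is the single point of failure listed among the drawbacks.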
Decentralized (Pure) P2P Systems

• peers have equal rights and responsibilities
• each peer has only a partial view of the P2P network and offers data/services that may be relevant to only some queries/peers
  - => locating peers offering services/data quickly is a critical and challenging issue
• advantages:
  - immune to a single point of failure
  - (usually) provide high performance, scalability, robustness, and other desirable features
• examples: Gnutella, Crescendo, PAST, FreeNet, Canon, etc.

Decentralized P2P Systems II.

Two dimensions in the design of decentralized P2P systems:

• flat (single-tier) vs. hierarchical (multi-tier) network structure
  - flat structure: the functionality and load are uniformly distributed among the participating nodes
  - hierarchical structure: multiple layers of routing structures
    * example: national level (interconnecting states), state level (interconnecting universities), university level (interconnecting departments), etc.
    * offers certain advantages (fault isolation and security, effective caching and bandwidth utilization, hierarchical storage, etc.)
• structured vs. unstructured logical topology
  - unstructured P2P system: each peer is responsible for its own data, and keeps track of a set of neighbours that it may forward queries to
    * no strict mapping between the identifiers of objects and those of peers
    * => locating data is a challenge (it is difficult to predict precisely which peers maintain the queried data)
    * => there is no guarantee on the completeness of answers (unless the entire network is searched)
    * => there is no guarantee on response time (except for the worst case where the entire network is searched)
  - structured P2P system: data placement is under the control of certain predefined strategies (generally, a distributed hash table, DHT)
    * there is a mapping between data and peers
    * => these systems can provide a guarantee (precise or probabilistic) on search cost
    * => however, typically at the expense of maintaining certain additional information
  - (systems employing a mix of structured and unstructured topology also exist)

Decentralized P2P Systems III.

Figure: Decentralized P2P Systems: Peer A requests some data that Peer D and Peer H have. The query is broadcast to the neighbours of Peer A and, gradually, to the other peers in the whole network (Gnutella).
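The flooding query from the decentralized-systems figure, and the reason unstructured search gives no completeness guarantee, can be sketched with a hop-limited flood. The overlay, the `holdings` table and the `ttl` parameter are assumptions made for this example; the sketch also counts a message for every link a query is sent over, including copies that the receiver drops as duplicates, which is a simplification rather than the exact Gnutella behaviour.

```python
def flood_search(links, holdings, origin, wanted, ttl):
    """Flood a query up to `ttl` hops; return (peers found holding `wanted`, messages sent)."""
    seen = {origin}
    frontier = [origin]
    hits = set()
    messages = 0
    while frontier and ttl > 0:
        next_frontier = []
        for peer in frontier:
            for neighbour in links[peer]:
                messages += 1
                if neighbour in seen:
                    continue  # duplicate query: dropped, not forwarded again
                seen.add(neighbour)
                if wanted in holdings.get(neighbour, ()):
                    hits.add(neighbour)
                next_frontier.append(neighbour)
        frontier = next_frontier
        ttl -= 1
    return hits, messages

# Hypothetical overlay in which peers D and H hold the requested data.
links = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
    "D": ["B", "H"], "E": ["C"], "H": ["D"],
}
holdings = {"D": {"data"}, "H": {"data"}}
```

With `ttl=2` the query from A reaches D but not H (6 messages); with `ttl=3` it finds both (9 messages). A limited flood can therefore miss existing copies, and a complete answer costs traffic across the whole network.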
Hybrid P2P Systems

• the main advantage of centralized P2P systems: quick and reliable resource locating
  - BUT limited scalability
• the main advantage of decentralized P2P systems: scalability
  - BUT resource locating takes longer
• => Hybrid P2P systems:
  - to maintain scalability, there are no central servers
  - however, more powerful peer nodes are selected to act as servers for the others
    * => super peers
  - => resource locating can be done by both decentralized and centralized search techniques (asking super peers)

Hybrid P2P Systems III.

Figure: Hybrid P2P Systems: LIGLO servers are used to identify peers independently of their IP address, using a global and unique identifier (thus, even if a peer changes its IP address, the system still recognizes it as a unique peer). (BestPeer)
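The super-peer idea (a centralized lookup within a group of leaf peers, decentralized search among the super peers) can be sketched as below. This is an illustrative assumption and not the BestPeer/LIGLO design from the figure: the class `SuperPeer`, its one-hop query to neighbouring super peers and all names are hypothetical.

```python
class SuperPeer:
    """Toy super peer: indexes its own leaf peers and can ask neighbouring super peers."""

    def __init__(self, name):
        self.name = name
        self.index = {}        # resource name -> set of attached leaf peers
        self.neighbours = []   # other super peers this one is linked to

    def attach(self, leaf, resources):
        # A leaf peer registers its resources with its super peer only.
        for name in resources:
            self.index.setdefault(name, set()).add(leaf)

    def search(self, resource):
        # Centralized step: consult the local index of attached leaves.
        found = set(self.index.get(resource, ()))
        # Decentralized step: ask directly linked super peers (one hop in this sketch).
        for other in self.neighbours:
            found |= other.index.get(resource, set())
        return sorted(found)

sp1, sp2 = SuperPeer("SP1"), SuperPeer("SP2")
sp1.neighbours.append(sp2)
sp2.neighbours.append(sp1)
sp1.attach("leaf-a", ["x.iso"])
sp2.attach("leaf-b", ["x.iso", "y.txt"])
```

A leaf attached to `sp1` that searches for `"x.iso"` learns about both `leaf-a` and `leaf-b` without any global server, and only the super peers carry indexing and forwarding load.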