PA160: Net-Centric Computing II.
Distributed Systems
Luděk Matýska
Slides by: Tomáš Rebok
Faculty of Informatics, Masaryk University
Spring 2012

Luděk Matýska (FI MU) 3. Distributed Systems Spring 2012 1 / 100

Lecture overview
1. Distributed Systems
  • Key characteristics
  • Challenges and Issues
  • Distributed System Architectures
  • Inter-process Communication
2. Middleware
  • Remote Procedure Calls (RPC)
  • Remote Method Invocation (RMI)
  • Common Object Request Broker Architecture (CORBA)
3. Web Services
4. Grid Services
5. Issues Examples
  • Scheduling/Load-balancing in Distributed Systems
  • Fault Tolerance in Distributed Systems
6. Conclusion

Distributed Systems - Definition
A system in which hardware and software components located at networked computers communicate and coordinate their actions only by message passing.

Distributed System by Tanenbaum and van Steen
A collection of independent computers that appears to its users as a single coherent system.
• the independent/autonomous machines are interconnected by communication networks and equipped with software systems designed to produce an integrated and consistent computing environment

Core objective of a distributed system: resource sharing
Distributed Systems - Key characteristics

Key Characteristics of Distributed Systems:
• Autonomicity - there are several autonomous computational entities, each of which has its own local memory
• Heterogeneity - the entities may differ in many ways
  • computer HW (different representations of data types), network interconnection, operating systems (different APIs), programming languages (different data structures), implementations by different developers, etc.
• Concurrency - concurrent (distributed) program execution and resource access
• No global clock - programs (distributed components) coordinate their actions by exchanging messages
  • message communication can be affected by delays, can suffer from a variety of failures, and is vulnerable to security attacks
• Independent failures - each component of the system can fail independently, leaving the others still running (and possibly not informed about the failure)
  • How to tell a failed network from one that has become unusually slow?
  • How to learn immediately that a remote server has crashed?

Distributed Systems - Challenges and Issues
What do we want from a Distributed System (DS)?
• Resource Sharing
• Openness
• Concurrency
• Scalability
• Fault Tolerance
• Security
• Transparency

Distributed Systems - Challenges and Issues
Resource Sharing and Openness

Resource Sharing
• the main motivating factor for constructing DSs
• it should be easy for the users (and applications) to access remote resources, and to share them in a controlled and efficient way
• each resource must be managed by software that provides interfaces which enable the resource to be manipulated by clients
• resource: anything you can imagine (e.g., storage facilities, data, files, Web pages, etc.)

Openness
• whether the system can be extended and re-implemented in various ways, and whether new resource-sharing services can be added and made available for use by a variety of client programs
• the specification and documentation of key software interfaces must be published
  • using an Interface Definition Language (IDL)
• involves HW extensibility as well
  • i.e., the ability to add hardware from different vendors

Distributed Systems - Challenges and Issues
Concurrency and Scalability

Concurrency
• every resource in a DS must be designed to be safe in a concurrent environment
• applies not only to servers, but to objects in applications as well
• ensured by standard techniques, like semaphores

Scalability
• a DS is scalable if the cost of adding a user (or resource) is a constant amount in terms of the resources that must be added
• and it is able to utilize the extra hardware/software efficiently
• and it remains manageable

Date          Computers                    Web servers
1979, Dec.    (value missing in source)    0
1989, July    130,000                      0
1999, July    56,218,000                   5,560,866
2003, Jan.    171,638,297                  35,424,956
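The concurrency requirement above (a resource kept safe by a semaphore) can be illustrated with a minimal Python sketch. The names `SharedCounter` and `run_clients` are hypothetical and chosen only for this illustration; threads stand in for the concurrent clients of a real distributed resource.

```python
import threading


class SharedCounter:
    """A toy shared resource guarded by a binary semaphore (illustrative only)."""

    def __init__(self):
        self._value = 0
        self._sem = threading.Semaphore(1)  # at most one client inside at a time

    def increment(self):
        # The read-modify-write below is the critical section.
        with self._sem:
            self._value += 1

    def value(self):
        with self._sem:
            return self._value


def run_clients(counter, n_clients, n_ops):
    """Start n_clients threads that each increment the counter n_ops times."""
    def client():
        for _ in range(n_ops):
            counter.increment()

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value()
```

With the semaphore in place, `run_clients(SharedCounter(), 4, 500)` accounts for every one of the 2000 increments regardless of how the threads interleave.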
Distributed Systems - Challenges and Issues
Fault Tolerance and Security

Fault Tolerance
• a characteristic where a distributed system handles errors that occur in the system appropriately
• the failures can be detected (sometimes hard or even impossible), masked (made hidden or less severe), or tolerated
• achieved by deploying two approaches: hardware redundancy and software recovery

Security
• involves confidentiality, integrity, authentication, and availability

Distributed Systems - Challenges and Issues
Transparency I.

Transparency
• certain aspects of the DS should be made invisible to the user / application programmer
  • i.e., the system is perceived as a whole rather than a collection of independent components
• several forms of transparency:
  • Access transparency - enables local and remote resources to be accessed using identical operations
  • Location transparency - enables resources to be accessed without knowledge of their location
  • Concurrency transparency - enables several processes to operate concurrently using shared resources without interference between them
  • Replication transparency - enables multiple instances of resources to be used to increase reliability and performance
    • without knowledge of the replicas by users / application programmers

Distributed Systems - Challenges and Issues
Transparency II.
• forms of transparency cont'd.:
  • Failure transparency - enables the concealment of faults, allowing users and application programs to complete their tasks despite a failure of HW/SW components
  • Mobility/migration transparency - allows the movement of resources and clients within a system without affecting the operation of users or programs
  • Performance transparency - allows the system to be reconfigured to improve performance as loads vary
  • Scaling transparency - allows the system and applications to expand in scale without changes to the system structure or application algorithms

Distributed Systems - Architecture Models

An architecture model:
• defines the way in which the components of systems interact with one another, and
• defines the way in which the components are mapped onto an underlying network of computers
• the overall goal is to ensure that the structure will meet present and possibly future demands
• the major concerns are to make the system reliable, manageable, adaptable, and cost-effective
• principal architecture models:
  • client-server model - most important and most widely used
    • a service may be further provided by multiple servers
    • the servers may in turn be clients of other servers
    • proxy servers (caches) may be employed to increase availability and performance
  • peer processes - all the processes play similar roles
    • based either on structured (Chord, CAN, etc.), unstructured, or hybrid architectures

Distributed Systems - Architecture Models
Client-Server model
[Figure: client-server model]
Distributed Systems - Architecture Models
Client-Server model - A Service provided by Multiple Servers
[Figure: a service provided by multiple servers]

Distributed Systems - Inter-process Communication (IPC)
• the processes (components) need to communicate
• the communication may be:
  • synchronous - both send and receive are blocking operations
  • asynchronous - send is non-blocking and receive can have blocking (more common) and non-blocking variants
• the simplest forms of communication: UDP and TCP sockets
[Figure: client and server communicating through bound sockets]

Distributed Systems - Inter-process Communication
UDP and TCP sockets

UDP/TCP sockets
• provide unreliable (UDP) / reliable (TCP) communication services
+ the complete control over the communication lies in the hands of applications
- too primitive to be used in developing distributed system software
  • higher-level facilities (marshalling/unmarshalling data, error detection, error recovery, etc.) must be built from scratch by developers on top of the existing socket primitives
  • they force a read/write mechanism instead of a procedure call
- another problem arises when the software needs to be used on a platform different from the one where it was developed
  • the target platform may provide a different socket implementation
=> these issues are eliminated by the use of a Middleware
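To make the "too primitive" point concrete, the following Python sketch shows how much has to be hand-built on raw sockets just to call a remote "add": message framing, byte-level marshalling, and an explicit read/write exchange. The helper names (`send_msg`, `recv_msg`, `demo`) and the wire layout are assumptions made for this illustration, and `socket.socketpair()` stands in for a real client/server TCP connection.

```python
import socket
import struct


def send_msg(sock, payload: bytes) -> None:
    """Frame a message with a 4-byte big-endian length prefix and send it."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes; a stream socket may deliver them in pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def demo() -> int:
    """Hand-rolled request/reply: the 'client' asks the 'server' to add 2 and 40."""
    client, server = socket.socketpair()
    with client, server:
        send_msg(client, struct.pack("!ii", 2, 40))    # marshal the request
        a, b = struct.unpack("!ii", recv_msg(server))  # unmarshal at the server
        send_msg(server, struct.pack("!i", a + b))     # marshal the reply
        (result,) = struct.unpack("!i", recv_msg(client))
    return result
```

Everything except the `a + b` is plumbing that the application programmer has to write and debug by hand; this is the work a middleware layer is meant to take over.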
Middleware
• a software layer that provides a programming abstraction and masks the heterogeneity of the underlying networks, hardware, operating systems, and programming languages
  • => provides transparency services
• represented by processes/objects that interact with each other to implement communication and resource sharing support
• provides building blocks for the construction of SW components that can work with one another
• middleware examples:
  • Sun RPC (ONC RPC)
  • DCE RPC
  • MS COM/DCOM
  • Java RMI
  • CORBA
  • etc.
[Figure: layers, top to bottom: distributed applications, middleware services, network]

Middleware - Basic Services

Basic services provided:
• Directory services - services required to locate application services and resources, and route messages
  • ≈ service discovery
• Data encoding services - uniform data representation services for dealing with incompatibility problems on remote systems
  • e.g., Sun XDR, ISO's ASN.1, CORBA's CDR, XML, etc.
  • data marshalling/unmarshalling
• Security services - provide inter-application client-server security mechanisms
• Time services - provide a universal format for representing time on different platforms (possibly located in various time zones) in order to keep synchronisation among application processes
• Transaction services - provide transaction semantics to support commit, rollback, and recovery mechanisms
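As an illustration of what a data encoding service does, the following Python sketch marshals a small record into a fixed, platform-independent byte layout and unmarshals it again. The record (`sensor_id`, `value`, `label`) and the wire layout are invented for this example; it is not the XDR, ASN.1, or CDR format, only the same idea in miniature.

```python
import struct


def marshal_reading(sensor_id: int, value: float, label: str) -> bytes:
    """Encode a record in a fixed wire format, independent of the host CPU.

    "!" selects network (big-endian) byte order with standard sizes and no
    padding: a 4-byte unsigned int, an 8-byte double, then a 2-byte length
    followed by the UTF-8 bytes of the label.
    """
    raw = label.encode("utf-8")
    return struct.pack("!Id", sensor_id, value) + struct.pack("!H", len(raw)) + raw


def unmarshal_reading(data: bytes):
    """Decode the record produced by marshal_reading."""
    sensor_id, value = struct.unpack_from("!Id", data, 0)   # bytes 0..11
    (length,) = struct.unpack_from("!H", data, 12)          # bytes 12..13
    label = data[14:14 + length].decode("utf-8")
    return sensor_id, value, label
```

Because both sides agree on the byte order, the field sizes, and the character encoding, the receiver recovers the same values whatever its native representation is.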
Middleware - Basic Services
A Need for Data Encoding Services

Data encoding services are required, because remote machines may have:
• different byte ordering
• different sizes of integers and other types
• different floating point representations
• different character sets
• different alignment requirements

    #include <stdio.h>

    int main(void)
    {
        unsigned int n = 0x11223344;
        /* view the bytes of n in the order they are stored in memory */
        unsigned char *a = (unsigned char *)&n;

        printf("%02x, %02x, %02x, %02x\n", a[0], a[1], a[2], a[3]);
        return 0;
    }

Output on a Pentium: 44, 33, 22, 11
Output on a G4: 11, 22, 33, 44

Remote Procedure Calls (RPC)
• a very simple idea, similar to the well-known procedure call mechanism
  • a client sends a request and blocks until a remote server sends a response
• the goal is to allow distributed programs to be written in the same style as conventional programs for centralised computer systems
  • while being transparent - the programmer need not be aware whether the called procedure is executing on a local or a remote computer
• the idea:
  • the remote procedure is represented as a stub on the client side
    • it behaves like a local procedure, but rather than placing the parameters into registers, it packs them into a message, issues a send primitive, and blocks waiting for a reply
  • the server passes the arrived message to a server stub (also known as a skeleton)
  • the skeleton unpacks the parameters and calls the procedure in a conventional manner
  • the results are returned to the skeleton, which packs them into a message directed to the client stub
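The stub/skeleton idea described above can be sketched in a few lines of Python. This is a toy under stated assumptions: JSON stands in for the marshalling format, the `Transport` class stands in for the network by calling the skeleton directly, and all names (`ClientStub`, `skeleton`, `PROCEDURES`) are hypothetical rather than part of any real RPC library.

```python
import json


def add(a, b):
    """The actual service procedure, living on the server."""
    return a + b


PROCEDURES = {"add": add}  # procedures the server exports


def skeleton(request_bytes: bytes) -> bytes:
    """Server stub: unpack the parameters, call the procedure, pack the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    procedure = PROCEDURES[request["proc"]]
    result = procedure(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


class Transport:
    """Stand-in for the network: hands the request to a handler and waits."""

    def __init__(self, handler):
        self._handler = handler

    def call(self, request_bytes: bytes) -> bytes:
        return self._handler(request_bytes)  # blocks until the reply arrives


class ClientStub:
    """Client stub: looks like a local object, but packs calls into messages."""

    def __init__(self, transport):
        self._transport = transport

    def add(self, a, b):
        request = json.dumps({"proc": "add", "args": [a, b]}).encode("utf-8")
        reply = self._transport.call(request)
        return json.loads(reply.decode("utf-8"))["result"]
```

A caller writes `ClientStub(Transport(skeleton)).add(2, 3)` exactly as it would call a local method; the packing, sending, and unpacking are hidden inside the stub and the skeleton.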
Remote Procedure Calls (RPC)
[Figure: a local procedure call (the calling procedure passes parameters to the called procedure and receives the result) compared with a remote procedure call (on the client, the calling procedure passes parameters to the stub; a request message crosses the network to the skeleton on the server, which calls the called procedure; the result returns in a reply message)]

Remote Procedure Calls (RPC)
The remote procedure call in detail
[Figure: the client program calls the client stub, which packs the parameters, sends the request, and waits in receive; the server stub receives the request, unpacks the parameters, calls the server program, and sends the returned result back over the communication network, where the client stub unpacks it]

Remote Procedure Calls (RPC) - Components
• client program, client stub
• communication modules
• server stub, service procedure
• dispatcher - selects one of the server stub procedures according to the procedure identifier in the request message

Sun RPC: the procedures are identified by:
• program number - can be obtained from a central authority to allow every program to have its own unique number
• procedure number - the identifier of the particular procedure within the program
• version number - changes when a procedure signature changes

Remote Procedure Calls (RPC)
Location Services - Portmapper

Clients need to know the port number of a service provided by the server => Portmapper
• a server registers its program#, version#, and port# with the local portmapper
• a client finds out the port# by sending a request
  • the portmapper listens on a well-known port (111)
• the particular procedure required is identified in the subsequent procedure call
[Figure: the server registers [prog#, v#, port#] with the port mapper; the client queries the port mapper on the well-known port and then contacts the server]
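The register/lookup role of the portmapper can be sketched as a small in-memory table. This Python sketch is only a model of the idea: the class and method names are hypothetical, the program number and port are made-up example values, and no real portmapper protocol messages are involved.

```python
class Portmapper:
    """Toy model of a portmapper: maps (program#, version#) to a port#."""

    WELL_KNOWN_PORT = 111  # where a real portmapper would listen

    def __init__(self):
        self._table = {}

    def register(self, program: int, version: int, port: int) -> None:
        """Called by a server at start-up to announce where it listens."""
        self._table[(program, version)] = port

    def lookup(self, program: int, version: int):
        """Called by a client; returns the port, or None if nothing is registered."""
        return self._table.get((program, version))


# Hypothetical example values, not taken from any real service.
EXAMPLE_PROGRAM = 0x20000001
EXAMPLE_VERSION = 1

portmapper = Portmapper()
portmapper.register(EXAMPLE_PROGRAM, EXAMPLE_VERSION, 40123)
```

A client would first ask `portmapper.lookup(EXAMPLE_PROGRAM, EXAMPLE_VERSION)` and then send the actual procedure call, carrying the procedure number, to the returned port. Returning `None` for an unknown program is a choice made for this sketch.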
Remote Procedure Calls (RPC) - Parameters passing

How to pass parameters to remote procedures?
• pass by value - easy: just copy the data to the network message
• pass by reference - makes no sense without shared memory

Pass by reference: the steps
1. copy the referenced items (marshalled) to a message buffer
2. ship them over, unmarshal the data at the server
3. pass a local pointer to the server stub function
4. send the new values back
• to support complex structures:
  • copy the structure into a pointerless representation
  • transmit
  • reconstruct the structure with local pointers on the server

Remote Procedure Calls (RPC)
Parameters passing - eXternal Data Representation (XDR)

Sun RPC: to avoid compatibility problems, the eXternal Data Representation (XDR) is used
• XDR primitive functions examples:
  • xdr_int(), xdr_char(), xdr_u_short(), xdr_bool(), xdr_long(), xdr_u_int(), xdr_wrapstring(), xdr_short(), xdr_enum(), xdr_void()
• XDR aggregation functions:
  • xdr_array(), xdr_string(), xdr_union(), xdr_vector(), xdr_opaque()
• only a single input parameter is allowed in a procedure call
  • => procedures requiring multiple parameters must include them as components of a single structure

Remote Procedure Calls (RPC) - When Things Go Wrong I.
• local procedure calls do not fail
  • if they core dump, the entire process dies
• there are more opportunities for errors with RPC
  • the server could generate an error
  • problems in the network (lost/delayed requests/replies)
  • server crash
  • the client might crash while the server is still executing code for it
• transparency breaks here
  • applications should be prepared to deal with RPC failures
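One way an application (or its RPC runtime) might prepare for lost replies is to retransmit after a timeout and let the server remember request identifiers, so that a retransmitted request is answered from a cache instead of being executed twice. The Python sketch below simulates this in a single process; `Server`, `LossyChannel`, and `call_with_retry` are hypothetical names, and the "network" simply drops a configurable number of replies.

```python
class Server:
    """Executes deposits; remembers replies so duplicates are not re-run."""

    def __init__(self):
        self.balance = 0
        self.executions = 0
        self._replies = {}  # request id -> cached reply

    def handle(self, request_id: str, amount: int) -> int:
        if request_id in self._replies:     # duplicate (retransmitted) request
            return self._replies[request_id]
        self.balance += amount              # non-idempotent side effect
        self.executions += 1
        self._replies[request_id] = self.balance
        return self.balance


class LossyChannel:
    """Simulated network that loses the first `drop_replies` replies."""

    def __init__(self, server: Server, drop_replies: int):
        self._server = server
        self._drop_replies = drop_replies

    def send(self, request_id: str, amount: int) -> int:
        reply = self._server.handle(request_id, amount)  # request arrives
        if self._drop_replies > 0:
            self._drop_replies -= 1
            raise TimeoutError("reply lost")             # client sees a timeout
        return reply


def call_with_retry(channel: LossyChannel, request_id: str, amount: int,
                    max_attempts: int = 3) -> int:
    """Client side: retransmit the same request id until a reply arrives."""
    for _ in range(max_attempts):
        try:
            return channel.send(request_id, amount)
        except TimeoutError:
            continue
    raise RuntimeError(f"RPC failed after {max_attempts} attempts")
```

If two replies are lost, the client sends the request three times, yet the deposit is applied only once because the second and third arrivals hit the reply cache. Without the cache, the same retry loop would be safe only for idempotent operations.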
Remote Procedure Calls (RPC) - When Things Go Wrong II.

Semantics of local procedure calls: exactly once
• difficult to achieve with RPC

Four remote call semantics are available in RPC:
• at-least-once semantics
  • the client keeps re-sending the message until a reply has been received
    • failure is assumed after n re-sends
  • guarantees that the call has been made "at least once", but possibly multiple times
  • ideal for idempotent operations
• at-most-once semantics
  • the client gives up immediately and reports back a failure
  • guarantees that the call has been made "at most once", but possibly not at all
• exactly-once semantics
  • the most desirable, but the most difficult to implement
• maybe semantics
  • no message delivery guarantees are provided at all
  • (easy to implement)

Remote Procedure Calls (RPC) - When Things Go Wrong III.
[Figure: Message-passing semantics between client and server, with timeouts and retransmitted requests: (a) at-least-once, where the results of executing the repeated requests could be different; (b) exactly-once, where the server keeps a table of current requests and the response is acknowledged]

Remote Procedure Calls (RPC) - When Things Go Wrong IV. - Complications
1. it is necessary to understand the application
  • idempotent functions - in the case of a failure, the message may be retransmitted and re-run without harm
  • non-idempotent functions - have side effects => the retransmission has to be controlled by the server
    • the duplicate request (retransmission) has to be detected
    • once detected, the server procedure is NOT re-run; just the results are resent (if available in a server cache)
2. in the case of a server crash, the order of execution vs. crash matters
[Figure: two server crash scenarios: the request is received and executed and the server crashes before replying; the request is received and the server crashes before executing it. In both cases the client gets no reply]
3. in the case of a client crash, the procedure keeps running on the server
  • it consumes resources (e.g., CPU time), holds resources (e.g., locked files), etc.
  • may be overcome by employing soft-state principles
    • keep-alive messages

Remote Procedure Calls (RPC) - Code Generation I.
• RPC drawbacks:
  • complex API, not easy to debug
  • the use of XDR is difficult
• but, it's often used in a similar way
  • => the server/client code can be automatically generated
    • assumes well-defined interfaces (IDL)
• the application programmer has to supply the following:
  • interface definition file - defines the interfaces (data structures, procedure names, and parameters) of the remote procedures that are offered by the server
  • client program - defines the user interfaces, the calls to the remote procedures of the server, and the client-side processing functions
  • server program - implements the calls offered by the server
• compilers:
  • rpcgen for C/C++, jrpcgen for Java

Remote Procedure Calls (RPC) - Code Generation II.
[Figure: code generation from the interface definition and the client program; the rest of the diagram is not recoverable from the source]

Remote Method Invocation (RMI)
• essentially the same as RPC, except that it operates on objects instead of applications/procedures
• the RMI model represents a distributed object application
  • it allows an object inside a JVM (a client) to invoke a method on an object running on a remote JVM (a server) and have the results returned to the client
  • the server application creates an object and makes it accessible remotely (i.e., registers it)
  • the client application receives a reference to the object on the server and invokes methods on it
    • the reference is obtained by a lookup in the registry
• important: a method invocation on a remote object has the same syntax as a method invocation on a local object

Remote Method Invocation (RMI) - Architecture I.
The interface through which the client and server interact is (similarly to RPC) provided by stubs and skeletons:
[Figure: in the client JVM, the calling object passes parameters to the stub; a request message crosses the network to the skeleton in the server JVM, which invokes the called object; the result returns in a reply message]

Remote Method Invocation (RMI) - Architecture II.
Two fundamental concepts at the heart of the distributed object model:
• remote object reference - an identifier that can be used throughout a distributed system to refer to a particular unique remote object
  • its construction must ensure its uniqueness
  [Layout: Internet address (32 bits) | port number (32 bits) | time (32 bits) | object number (32 bits) | interface of remote object]
• remote interface - specifies which methods of the particular object can be invoked remotely

Remote Method Invocation (RMI) - Architecture III.
• the remote objects can be accessed concurrently
  • the encapsulation allows objects to provide methods for protecting themselves against incorrect accesses
    • e.g., synchronization primitives (condition variables, semaphores, etc.)
• RMI invocation semantics are similar to the RPC ones
  • at-least-once, at-most-once, exactly-once, and maybe semantics
• data encoding services:
  • stubs use Object Serialization to marshal the arguments
    • the values of object arguments are rendered into a stream of bytes that can be transmitted over a network
    • => the arguments must be primitive types or objects that implement the Serializable interface
• parameter passing:
  • local objects are passed by value
  • remote objects are passed by reference

Common Object Request Broker Architecture (CORBA)
• an industry standard developed by the OMG (Object Management Group - a consortium of more than 700 companies) to aid in distributed objects programming
  • OMG was established in 1988
  • the initial CORBA specification came out in 1992
    • but significant revisions have taken place since then
• provides a platform-independent and language-independent architecture (framework) for writing distributed, object-oriented applications
  • i.e., application programs can communicate without restrictions on:
    • programming languages, hardware platforms, software platforms, networks they communicate over
• but CORBA is just a specification for creating and using distributed objects; it is not a piece of software or a programming language
  • several implementations of the CORBA standard exist (e.g., IBM's SOM and DSOM architectures)
CORBA Components

CORBA is composed of five major components:
• Object Request Broker (ORB)
• Interface Definition Language (IDL)
• Dynamic Invocation Interface (DII)
• Interface Repositories (IR)
• Object Adapters (OA)
[Figure: CORBA components, including the interface repository, the implementation repository, and the static or dynamic skeleton]

CORBA Components - Object Request Broker (ORB) I.

Object Request Broker (ORB)
• the heart of CORBA
  • introduced as a part of OMG's Object Management Architecture (OMA), which CORBA is based on
• a distributed service that implements all the requests to the remote object(s)
  • it locates the remote object on the network, communicates the request to the object, waits for the results and (when available) communicates those results back to the client
• implements location transparency
  • exactly the same request mechanism is used regardless of where the object is located
  • it might be in the same process as the client or across the planet
• implements programming language independence
  • the client issuing a request can be written in a different programming language from the implementation of the CORBA object
• both the client and the object implementation are isolated from the ORB by an IDL interface
• Internet Inter-ORB Protocol (IIOP) - the standard communication protocol between ORBs

CORBA Components - Object Request Broker (ORB) II.
[Figure: two object implementations, each connected through its IDL interface to an ORB; the ORBs communicate over the network]

CORBA Components - Interface Definition Language (IDL) I.
Interface Definition Language (IDL)
• as with RMI, CORBA objects have to be specified with interfaces
  • interface ≈ a contract between the client (code using an object) and the server (code implementing the object)
  • indicates a set of operations the object supports and how they should be invoked (but NOT how they are implemented)
• defines modules, interfaces, types, attributes, exceptions, and method signatures
• uses the same lexical rules as C++
  • with additional keywords to support distribution (e.g., interface, any, attribute, in, out, inout, readonly, raises)
• defines language bindings for many different programming languages (e.g., C/C++, Java, etc.)
  • via language mappings, the IDL translates to different constructs in the different implementation languages
  • it allows an object implementor to choose the appropriate programming language for the object, and
  • it allows the developer of the client to choose the appropriate and possibly different programming language for the client

CORBA Components - Interface Definition Language (IDL) II.

Interface Definition Language (IDL) example:

    module StockObjects {
      struct Quote {
        string symbol;
        long at_time;
        double price;
        long volume;
      };

      exception Unknown {};

      interface Stock {
        // Returns the current stock quote.
        Quote get_quote() raises(Unknown);

        // Sets the current stock quote.
        void set_quote(in Quote stock_quote);

        // Provides the stock description, e.g. company name.
        readonly attribute string description;
      };

      interface StockFactory {
        Stock create_stock(in string symbol, in string description);
      };
    };

CORBA Components - Interface Definition Language (IDL) III. - Stubs and Skeletons

The IDL compiler automatically compiles the IDL into client stubs and object skeletons:
[Figure: the client program calls operation signatures through its language mapping; the object implementation exposes entry points through its language mapping; stubs and skeletons, automatically generated from the IDL interfaces, connect both sides to the ORB, which provides the location service, the transport layer, the basic object adapter, and multithreading]

CORBA Components - Interface Definition Language (IDL) IV. - Development Process Using IDL
[Figure: development process using IDL]

CORBA Components
Dynamic Invocation Interface (DII) & Dynamic Skeleton Interface (DSI)

Dynamic Invocation Interface (DII)
• CORBA supports both the dynamic and the static invocation interfaces
  • static invocation interfaces are determined at compile time
  • dynamic interfaces allow client applications to use server objects without knowing the type of those objects at compile time
• DII - an API which allows dynamic construction of CORBA object invocations

Dynamic Skeleton Interface (DSI)
• DSI is the server side's analogue to the client side's DII
• allows an ORB to deliver requests to an object implementation that does not have compile-time knowledge of the type of the object it is implementing

CORBA Components - Interface Repository (IR)

Interface Repository (IR)
• a runtime component used to dynamically obtain information on IDL types (e.g., object interfaces)
• using the IR, a client should be able to locate an object that is unknown at compile time, find information about its interface, and build a request to be forwarded through the ORB
  • this kind of information is necessary when a client wants to use the DII to construct requests dynamically

CORBA Components - Object Adapters (OAs)

Object Adapters (OAs)
• the interface between the ORB and the server process
  • OAs listen for client connections/requests and map the inbound requests to the desired target object instance
• provide an API that object implementations use for:
  • generation and interpretation of object references
  • method invocation
  • security of interactions
  • object and implementation activation and deactivation
  • mapping object references to the corresponding object implementations
  • registration of implementations
• two basic kinds of OAs:
  • basic object adapter (BOA) - leaves many features unsupported, requiring proprietary extensions
  • portable object adapter (POA) - intended to support multiple ORB implementations (of different vendors), allow persistent objects, etc.
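The central job of an object adapter, mapping an inbound request to the right object implementation, can be illustrated with a small Python sketch. It is a conceptual toy rather than the BOA or POA API: `ObjectAdapter`, `activate`, `dispatch`, and the `StockServant` class are names invented here, and object references are reduced to plain string keys.

```python
class ObjectAdapter:
    """Toy adapter: registers servants and routes requests to them by key."""

    def __init__(self):
        self._servants = {}
        self._next_id = 0

    def activate(self, servant) -> str:
        """Register an implementation object and return a key that refers to it."""
        key = f"obj-{self._next_id}"
        self._next_id += 1
        self._servants[key] = servant
        return key

    def deactivate(self, key: str) -> None:
        """Stop routing requests to the object behind this key."""
        self._servants.pop(key, None)

    def dispatch(self, key: str, operation: str, *args):
        """Map an inbound request (key, operation, args) to a method call."""
        servant = self._servants.get(key)
        if servant is None:
            raise LookupError(f"no active object for reference {key!r}")
        return getattr(servant, operation)(*args)


class StockServant:
    """Hypothetical implementation object holding one quote."""

    def __init__(self, symbol: str, price: float):
        self.symbol = symbol
        self.price = price

    def get_price(self) -> float:
        return self.price

    def set_price(self, price: float) -> None:
        self.price = price
```

After `key = adapter.activate(StockServant("ACME", 10.0))`, a request such as `adapter.dispatch(key, "get_price")` reaches that servant, while a request for a deactivated key is rejected. A real adapter would additionally handle reference generation, security, and activation policies, as listed above.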
CORBA Object & Object Reference

CORBA Objects are fully encapsulated
• accessed through well-defined interfaces only
• interfaces & implementations are totally separate
  • for one interface, multiple implementations are possible
  • one implementation may support multiple interfaces
[Figure: a CORBA object behind its IDL interface]

A CORBA Object Reference is the distributed computing equivalent of a pointer
• CORBA defines the Interoperable Object Reference (IOR)
• an IOR contains a fixed object key, containing:
  • the object's fully qualified interface name (repository ID)
  • user-defined data for the instance identifier
• it can also contain transient information:
  • the host and port of its server, metadata about the server's ORB (for potential optimizations), etc.
• => the IOR uniquely identifies one object instance

CORBA Services

CORBA Services (COS)
• the OMG has defined a set of Common Object Services to support the integration and interoperation of distributed objects
  • = frequently used components needed for building robust applications
  • typically supplied by vendors
• the OMG defines interfaces to the services to ensure interoperability

CORBA Services - Popular Services Example
Service | Description
Object life cycle | Defines how CORBA objects are created, removed, moved, and copied
Naming | Defines how CORBA objects can have friendly symbolic names
Events | Decouples the communication between distributed objects
Relationships | Provides arbitrary typed n-ary relationships between CORBA objects
Externalization | Coordinates the transformation of CORBA objects to and from external media
Transactions | Coordinates atomic access to CORBA objects
Concurrency Control | Provides a locking service for CORBA objects in order to ensure serializable access
Property | Supports the association of name-value pairs with CORBA objects
Trader | Supports the finding of CORBA objects based on properties describing the service offered by the object
Query | Supports queries on objects

CORBA Architecture Summary
[Figure: CORBA architecture summary. Components shown: client, object (servant), IDL stubs, DII, ORB interface, IDL skeleton, DSI, object adapter, interface repository, IDL compiler, implementation repository, GIOP/IIOP. The legend distinguishes standard interfaces, standard language mappings, ORB-specific interfaces, and standard protocols.]

Issues Examples - Scheduling/Load-balancing in Distributed Systems

Scheduling/Load-balancing in Distributed Systems I.
For concurrent execution of interacting processes:
• communication and synchronization between processes are the two essential system components
Before the processes can execute, they need to be:
• scheduled, and
• allocated resources
Why is scheduling in distributed systems of special interest?
• because the issues differ from those in traditional multiprocessor systems:
  • the communication overhead is significant
  • the effect of the underlying architecture cannot be ignored
  • the dynamic behaviour of the system must be addressed
• local scheduling (on each node) + global scheduling

Scheduling/Load-balancing in Distributed Systems II.
• let's have a pool of jobs
  • there are some inter-dependencies among them
• let's have a set of nodes (processors) which are able to communicate with each other
Load-balancing
The term load-balancing means assigning the jobs to the processors in a way that minimizes the time/communication overhead necessary to compute them.
• load-balancing - divides the jobs among the processors
• scheduling - defines the order in which the jobs have to be executed (on each processor)
• load-balancing and scheduling are tightly coupled (usually treated as synonyms in DSs)
• objectives:
  • enhance the overall system performance metric
    • process completion time and processor utilization
  • location and performance transparency

Scheduling/Load-balancing in Distributed Systems III.
• the scheduling/load-balancing task can be represented using graph theory:
  • the pool of N jobs with dependencies can be described as a graph G(V, U), where
    • the nodes represent the jobs (processes)
    • the edges represent the dependencies among the jobs/processes (e.g., an edge from i to j requires that process i completes before j can start executing)
  • the graph G has to be split into p parts, so that:
    • $N = N_1 \cup N_2 \cup \dots \cup N_p$
    • which satisfy the condition $|N_i| \approx \frac{N}{p}$, where
      • $|N_i|$ is the number of jobs assigned to processor i, and
      • p is the number of processors, and
    • the number/cost of the edges connecting the parts is minimal
  • the objectives:
    • uniform load-balancing of the jobs
    • minimizing the communication (the minimal number of edges among the parts)
  • the splitting problem is NP-complete

Scheduling/Load-balancing in Distributed Systems III.
An illustration
[Figure: An illustration of splitting 4 jobs onto 2 processors: (a) precedence process model, (b) communication process model, (c) disjoint process model.]

Scheduling/Load-balancing in Distributed Systems IV.
• the "proper" approach to the scheduling/load-balancing problem depends on the following criteria:
  • jobs' cost
  • dependencies among the jobs
  • jobs' locality

Scheduling/Load-balancing in Distributed Systems IV. - Jobs' Cost
• the job's cost may be known:
  • before the whole problem set's execution
  • during the problem's execution, but before the particular job's execution
  • only after the particular job finishes
• cost variability - all the jobs may have (more or less) the same cost, or the costs may differ
• the problem classes based on jobs' cost:
  • all the jobs have the same cost: easy
  • the costs are variable, but known: more complex
  • the costs are unknown in advance: the most complex

Scheduling/Load-balancing in Distributed Systems IV. - Dependencies Among the Jobs
• is the order of the jobs' execution important?
• the dependencies among the jobs may be:
  • known before the whole problem set's execution
  • known during the problem's execution, but before the particular job's execution
  • fully dynamic
• the problem classes based on jobs' dependencies:
  • the jobs are fully independent of each other: easy
  • the dependencies are known or predictable: more complex
    • flooding
    • in-trees, out-trees (balanced or unbalanced)
    • generic directed acyclic graphs (DAG)
  • the dependencies change dynamically: the most complex
    • e.g., searching/lookup problems
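When the dependencies are known in advance and form a DAG, a valid execution order can be derived before any job runs. The following is only a minimal sketch, assuming Python 3.9+ (for the standard `graphlib` module) and a hypothetical four-job dependency graph; the job names and the helper `execution_waves` are illustrative and do not come from the lecture.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each job maps to the set of jobs it depends on.
deps = {
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}

def execution_waves(deps):
    """Group jobs into waves; jobs within one wave do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose predecessors are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = execution_waves(deps)
print(waves)  # [['A'], ['B', 'C'], ['D']]
```

Jobs in the same wave could be placed on different processors; the sketch ignores job costs and communication, which the following criteria address.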

Scheduling/Load-balancing in Distributed Systems IV. - Locality
• do all the jobs communicate in the same/similar way?
• is it suitable/necessary to execute some jobs "close" to each other?
• when are the jobs' communication dependencies known?
• the problem classes based on jobs' locality:
  • the jobs do not communicate (at most during initialization): easy
  • the communications are known/predictable: more complex
    • regular (e.g., a grid) or irregular
  • the communications are unknown in advance: the most complex
    • e.g., a discrete-event simulation

Scheduling/Load-balancing in DSs - Solving Methods
• in general, the "proper" solving method depends on the time when the particular information becomes known
• basic classes of solving algorithms:
  • static - offline algorithms
  • semi-static - hybrid approaches
  • dynamic - online algorithms
• some (but not all) variants:
  • static load-balancing
  • semi-static load-balancing
  • self-scheduling
  • distributed queues
  • DAG planning

Scheduling/Load-balancing in DSs - Solving Methods - Semi-static load-balancing
• suitable for problem sets whose parameters change slowly and for which locality is important
• iterative approach
  • uses a static algorithm
  • the result (from the static algorithm) is used for several steps (a slight imbalance is accepted)
  • after these steps, the assignment is recalculated with the static algorithm again
• often used for:
  • particle simulation
  • calculations of slowly-changing grids (but in a different sense than in the previous lectures)
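The semi-static iteration above can be sketched as follows. This is a sketch under stated assumptions, not the lecture's algorithm: the static step is a greedy "most expensive job to the least loaded processor" heuristic chosen only for brevity, and the names `static_balance`, `semi_static`, `period`, and `drift` are hypothetical.

```python
import heapq

def static_balance(costs, p):
    """Assumed static step: greedy assignment of the costliest job to the least loaded processor."""
    heap = [(0.0, proc) for proc in range(p)]
    heapq.heapify(heap)
    assignment = {}
    for job in sorted(costs, key=costs.get, reverse=True):
        load, proc = heapq.heappop(heap)
        assignment[job] = proc
        heapq.heappush(heap, (load + costs[job], proc))
    return assignment

def loads(costs, assignment, p):
    """Total cost currently assigned to each processor."""
    out = [0.0] * p
    for job, proc in assignment.items():
        out[proc] += costs[job]
    return out

def semi_static(costs, p, steps, period, drift):
    """Reuse one static assignment for `period` steps, then recompute it."""
    assignment = static_balance(costs, p)
    history = []
    for step in range(1, steps + 1):
        costs = drift(costs, step)                 # parameters change slowly
        if step % period == 0:
            assignment = static_balance(costs, p)  # periodic rebalancing
        history.append(max(loads(costs, assignment, p)))
    return assignment, history

costs = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}
grow_d = lambda c, step: {**c, "d": c["d"] + 1.0}  # job "d" slowly gets more expensive
assignment, history = semi_static(costs, p=2, steps=4, period=2, drift=grow_d)
print(history)  # [6.0, 6.0, 7.0, 7.0]
```

In this run the heaviest processor load is 6.0 at step 1 although the total work of 11.0 could be split 5.5/5.5 only with divisible jobs; the point is merely that the imbalance between rebalancing steps is tolerated rather than corrected immediately.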

Scheduling/Load-balancing in DSs - Solving Methods - Self-scheduling I.
• a centralized pool of jobs
• idle processors pick the jobs from the pool
• new (sub)jobs are added to the pool
+ ease of implementation
• suitable for:
  • a set of independent jobs
  • jobs with unknown costs
  • jobs where locality does not matter
• unsuitable for jobs that are too small - due to the communication overhead
  • => coupling jobs into bulks:
    • fixed size
    • controlled coupling
    • tapering
    • weighted distribution

Scheduling/Load-balancing in DSs - Solving Methods - Self-scheduling II. - Fixed size & Controlled coupling
Fixed size
• typical offline algorithm
• requires much information (number and cost of each job, ...)
• it is possible to find the optimal solution
• theoretically important, not suitable for practical solutions
Controlled coupling
• uses bigger bulks at the beginning of the execution, smaller bulks at the end of the execution
  • lower overhead at the beginning, finer coupling at the end
• the bulk's size is computed as $K_i = \lceil R_i / p \rceil$, where:
  • $R_i$ ... the number of remaining jobs
  • $p$ ... the number of processors

Scheduling/Load-balancing in DSs - Solving Methods - Self-scheduling II. - Tapering & Weighted distribution
Tapering
• analogous to controlled coupling, but the bulk size is additionally a function of the variation in job costs
• uses historical information
  • low variance => bigger bulks
  • high variance => smaller bulks
Weighted distribution
• considers the nodes' computational power
• suitable for heterogeneous systems
• uses historical information as well
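To see how controlled coupling shrinks the bulks over time, the sketch below lists the successive bulk sizes. It assumes the bulk-size rule is K_i = ceil(R_i / p) (the rule commonly known as guided self-scheduling); the function name and the sample numbers are illustrative.

```python
def controlled_coupling_chunks(n_jobs, p):
    """Return the successive bulk sizes K_i = ceil(R_i / p) until no jobs remain."""
    remaining = n_jobs
    chunks = []
    while remaining > 0:
        k = (remaining + p - 1) // p  # integer ceiling of remaining / p
        chunks.append(k)
        remaining -= k
    return chunks

print(controlled_coupling_chunks(20, 4))  # [5, 4, 3, 2, 2, 1, 1, 1, 1]
```

With 20 jobs and 4 processors the first idle processor takes 5 jobs at once, while the last requests take single jobs, which is the "lower overhead at the beginning, finer coupling at the end" behaviour.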

Scheduling/Load-balancing in DSs - Solving Methods - Distributed Queues
• ≈ self-scheduling for distributed memory
• instead of a centralized pool, a queue on each node is used (per-processor queues)
• suitable for:
  • distributed systems where locality does not matter
  • both static and dynamic dependencies
  • unknown costs
• an example: the diffusion approach
  • in every step, the cost of the jobs remaining on each processor is computed
  • processors exchange this information and perform the balancing
  • requires that locality is not important

Scheduling/Load-balancing in DSs - Solving Methods - Centralised Pool vs. Distributed Queues
[Figure: Centralised Pool (left) vs. Distributed Queues (right).]

Scheduling/Load-balancing in DSs - Solving Methods - DAG Planning
• another graph model
  • the nodes represent the jobs (possibly weighted)
  • the edges represent the dependencies and/or the communication (may also be weighted)
• e.g., suitable for digital signal processing
• basic strategy - divide the DAG so that the communication and the processors' occupation (time) are minimized
  • an NP-complete problem
  • takes the dependencies among the jobs into account

Scheduling/Load-balancing in DSs - Design Issues I.
When is scheduling/load-balancing necessary?
• for moderately loaded systems
  • lightly loaded systems - jobs rarely wait (there is always an idle processor)
  • highly loaded systems - little benefit (load-balancing cannot help)
What is the performance metric?
• mean response time
What is the measure of load?
• must be easy to measure
• must reflect performance improvement
• example: queue lengths at the CPU, CPU utilization

Scheduling/Load-balancing in DSs - Design Issues I. - Components
Types of policies:
• static (decisions hardwired into the system), dynamic (uses load information), adaptive (policy varies according to load)
Policies:
• Transfer policy: when to transfer a process?
  • threshold-based policies are common and easy
• Selection policy: which process to transfer?
  • prefer new processes
  • the transfer cost should be small compared to the execution cost
    • => select processes with long execution times
• Location policy: where to transfer the process?
  • polling, random, nearest neighbor, etc.
• Information policy: when and from where?
  • demand-driven (only a sender/receiver asks for the information), time-driven (periodic), state-change-driven (send an update if the load changes)

Scheduling/Load-balancing in DSs - Design Issues II. - Sender-initiated Policy
• Transfer policy: [Figure: a new process arrives and the node tries to transfer it.]
• Selection policy: newly arrived process
• Location policy: three variations
  • Random - may generate lots of transfers
    • => necessary to limit the maximum number of transfers
  • Threshold - probe n nodes sequentially
    • transfer to the first node below the threshold; if none, keep the job
  • Shortest - poll Np nodes in parallel
    • choose the least loaded node below T
    • if none, keep the job
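The "Threshold" variant of the sender-initiated location policy can be sketched as below. The transfer condition (a node tries to transfer only when its load is at or above T) is an assumption, since the slide conveys the transfer policy only through a figure; the function name, the `loads` dictionary, and `probe_limit` are hypothetical.

```python
import random

def sender_initiated_threshold(loads, sender, T, probe_limit, rng=random):
    """Return the node that should run a job newly arrived at `sender`."""
    if loads[sender] < T:      # assumed transfer policy: only overloaded nodes transfer
        return sender
    others = [n for n in loads if n != sender]
    # Probe up to `probe_limit` randomly chosen nodes, one after another.
    for node in rng.sample(others, min(probe_limit, len(others))):
        if loads[node] < T:    # the first probed node below the threshold gets the job
            return node
    return sender              # no suitable node found: keep the job locally

loads = {"n0": 5, "n1": 6, "n2": 1}
target = sender_initiated_threshold(loads, "n0", T=3, probe_limit=2)
print(target)  # n2
```

Bounding the number of probes keeps the overhead of an unsuccessful search fixed, at the price of sometimes keeping a job on an overloaded node.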

Scheduling/Load-balancing in DSs - Design Issues II. - Receiver-initiated Policy
• Transfer policy: if a departing process causes load < T, find a process from elsewhere
• Selection policy: newly arrived or partially executed process
• Location policy:
  • Threshold - probe up to Np other nodes sequentially
    • transfer from the first one above the threshold; if none, do nothing
  • Shortest - poll n nodes in parallel
    • choose the node with the heaviest load above T

Scheduling/Load-balancing in DSs - Design Issues II. - Symmetric Policy
• combines the previous two policies without change
  • nodes act as both senders and receivers
• uses the average load as the threshold T
[Figure: sender-initiated and receiver-initiated regions separated by the average load.]

Scheduling/Load-balancing in DSs - Case study - V-System (Stanford)
• state-change-driven information policy
  • a significant change in CPU/memory utilization is broadcast to all other nodes
• the M least loaded nodes are receivers, the others are senders
• sender-initiated with a new-job selection policy
• Location policy:
  • probe a random receiver
  • if it is still a receiver (below the threshold), transfer the job
  • otherwise try another

Scheduling/Load-balancing in DSs - Case study - Sprite (Berkeley) I.
• Centralized information policy: a coordinator keeps the information
  • state-change-driven information policy
  • Receiver: a workstation with no keyboard/mouse activity for the defined time period (30 seconds) and below the limit (active processes < number of processors)
• Selection policy: done manually by the user => the workstation becomes a sender
• Location policy: the sender queries the coordinator
• the workstation hosting a foreign process becomes a sender if its user becomes active

Scheduling/Load-balancing in DSs - Case study - Sprite (Berkeley) II.
• Sprite process migration:
  • facilitated by the Sprite file system
  • state transfer:
    • swap everything out
    • send the page tables and file descriptors to the receiver
    • create/establish the process on the receiver and load the necessary pages
    • pass the control
  • the only problem: communication dependencies
    • solution: redirect the communication from the workstation to the receiver

Scheduling/Load-balancing in DSs - Code and Process Migration
• key reasons: performance and flexibility
• flexibility:
  • dynamic configuration of the distributed system
  • clients don't need preinstalled software (download on demand)
• process migration (strong mobility)
  • process = code + data + stack
  • examples: Condor, DQS
• code migration (weak mobility)
  • the transferred program always starts from its initial state
• migration in heterogeneous systems:
  • only weak mobility is supported in common systems (recompile the code, no run-time information)
  • virtual machines may be used: interpreters (scripts) or intermediate code (Java)

Issues Examples - Fault Tolerance in Distributed Systems

Fault Tolerance in Distributed Systems I.
• single-machine systems
  • failures are all or nothing
    • OS crash, disk failures, etc.
• distributed systems: multiple independent nodes
  • partial failures are also possible (some nodes fail)
  • the probability of failure grows with the number of independent components (nodes) in the system
• fault tolerance: the system should provide services despite faults
  • transient faults
  • intermittent faults
  • permanent faults

Fault Tolerance in Distributed Systems I. - Failure Types
Type of failure | Description
Crash failure | A server halts, but is working correctly until it halts
Omission failure | A server fails to respond to incoming requests
  Receive omission | A server fails to receive incoming messages
  Send omission | A server fails to send messages
Timing failure | A server's response lies outside the specified time interval
Response failure | The server's response is incorrect
  Value failure | The value of the response is wrong
  State transition failure | The server deviates from the correct flow of control
Arbitrary failure | A server may produce arbitrary responses at arbitrary times

Fault Tolerance in Distributed Systems II.
• handling faulty processes: through redundancy
• organize several processes into a group
  • all processes perform the same computation
  • all messages are sent to all the members of the particular group
  • a majority needs to agree on the results of a computation
  • ideally, multiple independent implementations of the application are desirable (to prevent identical bugs)
• use process groups to organize such processes
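The requirement that a majority of the group agrees on a result can be illustrated with a small voting function. This is a minimal sketch; `majority_vote` and the sample replies are hypothetical, and collecting the replies from the group members is assumed to have happened elsewhere.

```python
from collections import Counter

def majority_vote(replies):
    """Return the value reported by a strict majority of the group, or None."""
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count > len(replies) // 2 else None

print(majority_vote([42, 42, 7]))   # 42: two of three replicas agree
print(majority_vote([1, 1, 2, 2]))  # None: no strict majority
```

Returning None when no strict majority exists makes the lack of agreement explicit instead of silently picking one of the conflicting values.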

Fault Tolerance in Distributed Systems III.
[Figure: Flat Groups (a) vs. Hierarchical Groups (b); the hierarchical group consists of a coordinator and workers.]

Fault Tolerance in DSs - Agreement in Faulty Systems
• How should processes agree on the results of a computation?
• k-fault tolerant: the system can survive k faults and still function
• assume processes fail silently
  • => need (k + 1) redundancy to tolerate k faults
• Byzantine failures: processes keep running even if faulty
  • produce erroneous, random or malicious replies
  • Byzantine failures are the most difficult to deal with

Fault Tolerance in DSs - Agreement in Faulty Systems - Byzantine Faults
Byzantine Generals Problem:
• four generals lead their divisions of an army
• the divisions camp on the mountains on the four sides of an enemy-occupied valley
• the divisions can only communicate via messengers
• messengers are totally reliable, but may need an arbitrary amount of time to cross the valley
  • they may even be captured and never arrive
• if the actions taken by each division are not consistent with those of the others, the army will be defeated
• we need a scheme for the generals to agree on a common plan of action (attack or retreat)
  • even if some of the generals are traitors who will do anything to prevent the loyal generals from reaching the agreement
• the problem is nontrivial even if messengers are totally reliable
  • with unreliable messengers, the problem is very complex
  • Fischer, Lynch, Paterson: in asynchronous systems where even a single process may crash, no deterministic algorithm can guarantee reaching a consensus in a finite amount of time

Fault Tolerance in DSs - Agreement in Faulty Systems - Formal Definition I.
Formal definition of the agreement problem in DSs:
• let's have a set of distributed processes with initial states $\in \{0, 1\}$
• the goal: all the processes have to agree on the same value
  • additional requirement: it must be possible to agree on both the 0 and the 1 state
• basic assumptions:
  • the system is asynchronous
    • no bounds on processes' execution delays exist
    • no bounds on messages' delivery delay exist
    • there are no synchronized clocks
  • no communication failures - every process can communicate with its neighbors
  • processes fail by crashing - we do not consider Byzantine failures

Fault Tolerance in DSs - Agreement in Faulty Systems - Formal Definition II.
Formal definition of the agreement problem in DSs (cont'd):
• implication: => there is no deterministic algorithm which solves the consensus problem in an asynchronous system with processes that may fail
  • because it is impossible to distinguish the cases:
    • a process does not react, because it has failed
    • a process does not react, because it is slow
• overcome in practice by establishing timeouts and by ignoring/killing processes that are too slow
  • timeouts are used in so-called Failure Detectors (see later)

Fault Tolerance in DSs - Agreement in Faulty Systems - Fault-tolerant Broadcast
Fault-tolerant Broadcast:
• if there was a proper type of fault-tolerant broadcast, the agreement problem would be solvable
• various types of broadcasts:
  • reliable broadcast
  • FIFO broadcast
  • causal broadcast
  • atomic broadcast - the broadcast which would solve the agreement problem in asynchronous systems

Fault Tolerance in DSs - Agreement in Faulty Systems - Fault-tolerant Broadcast - Reliable Broadcast
Reliable Broadcast:
• basic features:
  • Validity - if a correct process broadcasts m, then it eventually delivers m
  • Agreement - if a correct process delivers m, then all correct processes eventually deliver m
  • (Uniform) Integrity - m is delivered by a process at most once, and only if it was previously broadcast
• possible to implement using send/receive primitives:
  • the process p sending the broadcast message marks the message with its identifier and a sequence number
    • and sends it to all its neighbors
  • once a message is received:
    • if the message has not been previously received (based on the sender's ID and sequence number), the message is delivered
    • if the particular process is not the message's sender, it forwards the message to all its neighbors

Fault Tolerance in DSs - Agreement in Faulty Systems - Fault-tolerant Broadcast - FIFO Broadcast
FIFO Broadcast:
• the reliable broadcast cannot assure the messages' ordering
  • it is possible to receive a subsequent message (from the sender's view) before the previous one is received
• FIFO broadcast: the messages from a single sender have to be delivered in the same order as they were sent
• FIFO broadcast = Reliable broadcast + FIFO ordering
  • if a process p broadcasts a message m before it broadcasts a message m', then no correct process delivers m' unless it has previously delivered m
  • $\mathit{broadcast}_p(m) \rightarrow \mathit{broadcast}_p(m') \Rightarrow \mathit{deliver}_q(m) \rightarrow \mathit{deliver}_q(m')$
• a simple extension of the reliable broadcast
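The flooding implementation of reliable broadcast and its FIFO extension can be sketched in a few lines. This is a single-threaded simulation under simplifying assumptions: message passing is modelled as direct method calls, no process crashes, and the class and attribute names are hypothetical.

```python
class Process:
    def __init__(self, pid):
        self.pid = pid
        self.neighbors = []   # directly connected Process objects
        self.seen = set()     # (sender, seq) pairs already received
        self.next_seq = {}    # sender -> next sequence number to deliver
        self.held = {}        # (sender, seq) -> payload waiting for its turn
        self.delivered = []   # payloads handed to the application, in order
        self.sent = 0         # sequence number of the next own broadcast

    def broadcast(self, payload):
        msg = (self.pid, self.sent, payload)  # tag with sender ID and sequence number
        self.sent += 1
        self.receive(msg)

    def receive(self, msg):
        sender, seq, payload = msg
        if (sender, seq) in self.seen:
            return                            # duplicate: deliver at most once
        self.seen.add((sender, seq))
        for n in self.neighbors:              # relay the message to all neighbors
            n.receive(msg)
        self.held[(sender, seq)] = payload
        self._deliver_in_fifo_order(sender)

    def _deliver_in_fifo_order(self, sender):
        expected = self.next_seq.get(sender, 0)
        while (sender, expected) in self.held:  # hold back messages that arrive early
            self.delivered.append(self.held.pop((sender, expected)))
            expected += 1
        self.next_seq[sender] = expected

# Line topology a - b - c: c has no direct link to a.
a, b, c = Process("a"), Process("b"), Process("c")
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
a.broadcast("m1")
a.broadcast("m2")
print(c.delivered)  # ['m1', 'm2']
```

The `seen` set provides the duplicate suppression of the reliable broadcast, while the `held` buffer is the only addition needed for FIFO ordering: a message with sequence number 1 that arrives before number 0 simply waits.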

Fault Tolerance in DSs - Agreement in Faulty Systems - Fault-tolerant Broadcast - Causal Broadcast
Causal Broadcast:
• the FIFO broadcast is still not sufficient: it is possible to receive a message from a third party that is a reaction to a particular message before receiving that particular message
• => Causal broadcast
• Causal broadcast = Reliable broadcast + causal ordering
  • if the broadcast of a message m happens before the broadcast of a message m', then no correct process delivers m' unless it has previously delivered m
  • $\mathit{broadcast}_p(m) \rightarrow \mathit{broadcast}_q(m') \Rightarrow \mathit{deliver}_r(m) \rightarrow \mathit{deliver}_r(m')$
• can be implemented as an extension of the FIFO broadcast

Fault Tolerance in DSs - Agreement in Faulty Systems - Fault-tolerant Broadcast - Atomic Broadcast
Atomic Broadcast:
• even the causal broadcast is still not sufficient: sometimes, it is necessary to guarantee the same delivery order at all the replicas
  • two bank offices: one of them receives the information about adding interest before the information about adding a particular amount of money to the account, the second one receives these messages in the opposite order
  • => inconsistency
• => Atomic broadcast
• Atomic broadcast = Reliable broadcast + total ordering
  • if correct processes p and q both deliver messages m and m', then p delivers m before m' if and only if q delivers m before m'
  • $\mathit{deliver}_p(m) \rightarrow \mathit{deliver}_p(m') \Rightarrow \mathit{deliver}_q(m) \rightarrow \mathit{deliver}_q(m')$
• does not exist in asynchronous systems
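The bank-office inconsistency is easy to reproduce with concrete numbers. The sketch below uses hypothetical values (a balance of 1000, a deposit of 100, and 10 % interest) and integer arithmetic; the operation encoding is an assumption made for the example.

```python
def apply(balance, op):
    """Apply one account operation and return the new balance."""
    kind, amount = op
    if kind == "deposit":
        return balance + amount
    if kind == "interest":            # amount is a percentage
        return balance * (100 + amount) // 100
    raise ValueError(kind)

def replay(balance, ops):
    """Apply the operations in the order in which they were delivered."""
    for op in ops:
        balance = apply(balance, op)
    return balance

deposit, interest = ("deposit", 100), ("interest", 10)
office_a = replay(1000, [deposit, interest])  # deposit delivered first
office_b = replay(1000, [interest, deposit])  # interest delivered first
print(office_a, office_b)  # 1210 1200
```

Both offices applied the same two messages, yet they end up with different balances; a total order on deliveries removes exactly this divergence.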

Fault Tolerance in DSs - Agreement in Faulty Systems - Fault-tolerant Broadcast - Timed Reliable Broadcast
Timed Reliable Broadcast:
• a way to a practical solution
• introduces an upper limit (time) before which every message has to be delivered
• Timed Reliable broadcast = Reliable broadcast + timeliness
  • there is a known constant $\Delta$ such that if a message m is broadcast at real time t, then no correct process delivers m after real time $t + \Delta$
• feasible in asynchronous systems
• a kind of "approximation" of atomic broadcast

Fault Tolerance in DSs - Agreement in Faulty Systems - Failure Detectors I.
• the impossibility of consensus is caused by the inability to distinguish a slow process from a failed one
• synchronous systems: let's use timeouts to determine whether a process has crashed
• => Failure Detectors
Failure Detectors (FDs):
• a distributed oracle that provides hints about the operational status of processes (which processes have failed)
• FDs communicate via atomic/timed reliable broadcast
• every process maintains its own FD
  • and asks only it to determine whether a process has failed
• however:
  • hints may be incorrect
  • an FD may give different hints to different processes
  • an FD may change its mind (over & over) about the operational status of a process
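A timeout-based failure detector of the kind described above can be sketched as follows. This is a minimal single-node sketch with an explicitly passed (simulated) clock; the class name, the heartbeat mechanism, and the timeout value are assumptions made for the example, not part of the lecture.

```python
class HeartbeatFailureDetector:
    def __init__(self, processes, timeout):
        self.timeout = timeout
        self.last_seen = {p: 0.0 for p in processes}  # time of the last heartbeat

    def heartbeat(self, process, now):
        """Record that a heartbeat from `process` arrived at time `now`."""
        self.last_seen[process] = now

    def suspects(self, now):
        """Return the processes whose last heartbeat is older than the timeout."""
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}

fd = HeartbeatFailureDetector(["p", "q"], timeout=3.0)
fd.heartbeat("p", 1.0)
fd.heartbeat("q", 1.0)
fd.heartbeat("p", 5.0)
print(fd.suspects(5.0))  # {'q'}: q is slow or has crashed, the detector cannot tell
fd.heartbeat("q", 6.0)
print(fd.suspects(6.0))  # set(): q was only slow, the detector changes its mind
```

The run shows the caveats listed above: the hint about q was incorrect, and the detector revised it once the delayed heartbeat arrived.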

Fault Tolerance in DSs - Agreement in Faulty Systems - (Perfect) Failure Detector
Perfect Failure Detector:
• properties:
  • Strong Completeness - eventually every process that has crashed is permanently suspected by all non-crashed processes
  • Strong Accuracy - no correct process is ever suspected
• hard to implement
• is perfect failure detection necessary for consensus? No.
  • => weaker Failure Detector
Weaker Failure Detector:
• properties:
  • Strong Completeness - there is a time after which every faulty process is suspected by every correct process
  • Eventual Strong Accuracy - there is a time after which no correct process is suspected
• can be used to solve the consensus
  • this is the weakest FD that can be used to solve the consensus

Conclusion
Lecture overview
• Distributed Systems
  • Key characteristics
  • Challenges and Issues
  • Distributed System Architectures
  • Inter-process Communication
• Middleware
  • Remote Procedure Calls (RPC)
  • Remote Method Invocation (RMI)
  • Common Object Request Broker Architecture (CORBA)
• Web Services
• Grid Services
• Issues Examples
  • Scheduling/Load-balancing in Distributed Systems
  • Fault Tolerance in Distributed Systems
• Conclusion

Distributed Systems - Further Information
• FI courses:
  • PA150: Advanced Operating Systems Concepts (doc. Staudek)
  • PA053: Distributed Systems and Middleware (doc. Tůma)
  • IA039: Supercomputer Architecture and Intensive Computations (prof. Matýska)
  • PA177: High Performance Computing (LSU, prof. Sterling)
  • IV100: Parallel and distributed computations (doc. Královic)
  • IB109: Design and Implementation of Parallel Systems (dr. Barnat)
  • etc.
• (Used) Literature:
  • W. Jia and W. Zhou. Distributed Network Systems: From concepts to implementations. Springer, 2005.
  • A. S. Tanenbaum and M. van Steen. Distributed Systems: Principles and paradigms. Pearson Prentice Hall, 2007.
  • G. Coulouris, J. Dollimore, and T. Kindberg. Distributed Systems: Concepts and design. Addison-Wesley, 2001.
  • Z. Tari and O. Bukhres. Fundamentals of Distributed Object Systems: The CORBA perspective. John Wiley & Sons, 2001.
  • etc.