Semantic Web, SW Services, Grid, Cloud
Martin Kuba, ÚVT MU
makub@ics.muni.cz

Semantic Web
● idea introduced by Tim Berners-Lee (inventor of the WWW) in 2001
● “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation”
● instead of a platform for distributed presentation, the web would become a platform for distributed knowledge

Semantic Continuum
● semantics = meaning
● "semantic" in SW means machine-processable
● semantic continuum (Uschold 2003)
  a. implicit semantics in the minds of humans
  b. explicit informal semantics (text description in natural language, e.g. the HTML specification)
  c. formal semantics for humans (in a formal language processed by humans)
  d. formal semantics for machine processing
● goal is to create robotic decision-making devices
● metadata - data about data

Expressing Semantics
● folksonomies
● microdata
● RDF triples and RDF Schema vocabularies
● OWL-DL ontologies for automated reasoning

Folksonomies
● keyword metadata as tags
● e.g.
an image of a dog may be tagged with the tags dog, collie, or pet
● (+) low entry barrier, no user training
● (-) no synonym control, flat structure
● tag clouds

Microdata
● competing formats: Microdata, Microformats, RDFa
● they nest semantics within existing content on web pages
● RDFa 1.0 was defined only for XHTML; RDFa 1.1 also supports HTML5
● Microdata provides a JavaScript API
● Microdata uses namespace-qualified vocabularies predefined at data-vocabulary.org or schema.org
● supported by the Google search engine
● the opposite of the vision from 2000:
  ○ XML with CSS or XSLT - semantic markup with presentational metadata
  ○ HTML5 with Microdata - presentational markup with semantic metadata

Comparison of Microdata and others

RDF - Resource Description Framework
● statements about web resources
● triples subject-predicate-object
● subject and predicate are URIs
● object can be a URI or a data value
● reification - an RDF statement is assigned a URI and treated as a resource
● producers and consumers of RDF statements must agree on the semantics of the resource identifiers, conveyed by some controlled vocabulary

RDF Schema
● tool for defining controlled vocabularies
● defines
  ○ classes of things
  ○ properties (binary predicates)
  ○ subsumption relationships (subclasses, subproperties)
  ○ rdf:type - resource is an instance of a class
● SPARQL (SPARQL Protocol and RDF Query Language) is an SQL-like language for querying RDF graphs
● entailment rules make it possible to infer, e.g.,
that when a resource is in a particular class, it is also in all of its superclasses

RDF Schema example
● RDFS can define two classes:
  ○ Person
  ○ Student as a subclass of Person
● an RDF statement may state that a resource representing John Doe is of rdf:type Student
● by entailment, John Doe is also a Person

OWL
● Web Ontology Language, defined by W3C
● "ontology" is a term from artificial intelligence
● an ontology is “an explicit (written) formal conceptualization”, used for capturing knowledge about some domain of interest
● “conceptualization is an abstract simplified view of some selected part of the world, containing the objects, concepts, and other entities that are presumed of interest for some particular purpose and the relationships between them”
● OWL 1 released in 2004, OWL 2 in 2009
● two different (incompatible) semantics
  ○ RDF-based - OWL Full
  ○ DL (Description Logics) based - OWL DL

OWL DL
● Description Logics are a family of logics
● decidable fragment of First Order Predicate Logic (FOL) plus decidable extensions
● reasoners - software able to derive all inferable knowledge in finite time
● OWL DL uses:
  ○ classes
  ○ individuals
  ○ properties (binary relations)
    ■ object properties (between two objects)
    ■ data properties (between an object and a data literal)
● can use SWRL (Sem. Web.
Rule Language)

Prefix(:=)
Prefix(xsd:=
Ontology(
  Declaration( Class( :Person ) )
  Declaration( Class( :MarriedPerson ) )
  Declaration( NamedIndividual( :Martin ) )
  Declaration( NamedIndividual( :Lenka ) )
  Declaration( ObjectProperty( :hasSpouse ) )
  Declaration( DataProperty( :hasEmail ) )
  SymmetricObjectProperty( :hasSpouse )
  FunctionalObjectProperty( :hasSpouse )
  ClassAssertion( :Person :Lenka )
  ClassAssertion( :Person :Martin )
  DifferentIndividuals( :Martin :Lenka )
  ObjectPropertyAssertion( :hasSpouse :Martin :Lenka )
  DataPropertyAssertion( :hasEmail :Martin "makub@ics.muni.cz"^^xsd:string )
  SubClassOf( :MarriedPerson :Person )
  EquivalentClasses( :MarriedPerson ObjectSomeValuesFrom( :hasSpouse :Person ) )

OWL DL Tools
● ontology editor with GUI - Protégé 4.1
  ○ http://protege.stanford.edu/
● reasoners
  ○ Pellet
  ○ HermiT
  ○ FaCT++
  ○ Stardog
● Java API for OWL - OWL API 3
  ○ http://owlapi.sourceforge.net/

Limits of OWL DL
● based on FOL ∀x∃y(P(x)→Q(f(y)))
● cannot express
  ○ fuzzy expressions - “It often rains in autumn.”
  ○ non-monotonicity - “Birds fly; a penguin is a bird, but a penguin does not fly.”
  ○ propositional attitudes - “Eve thinks that 2 is not a prime number.”
  ○ modal logic
    ■ possibility and necessity - “It is possible that it will rain today.”
    ■ epistemic modalities - “Eve knows that 2 is a prime number.”
    ■ temporal logic - “I am always hungry.”
    ■ deontic logic - “You must do this.”
● Transparent Intensional Logic (TIL)
  ○ can express anything that can be said
  ○ has no calculus or reasoning algorithms

Semantic Web Services
● research efforts: OWL-S, WSDL-S, WSMO
● semantics can enhance discovery
  ○ on the semantic continuum, it moves service descriptions from b) to d)
  ○ e.g. a search for "getHardDriveQuote" can also find "getQuoteForHardDrive" (synonym) and "getSCSIDriveQuote" (subsumed term)
● web service semantics
  ○ Data semantics - defines the meaning of the data, i.e. inputs and outputs of operations
  ○ Functional semantics - defines the meaning of the operations, i.e.
how they transform input to output
  ○ QoS semantics - provides meaning for quality aspects such as price, availability, level of trust, etc. Service selection may be based on such characteristics.
  ○ Execution semantics - provides details such as preconditions and effects of service invocation, conversation patterns of service invocation, etc.

Grid
● term introduced in 1998 by Carl Kesselman and Ian Foster in the book "The Grid: Blueprint for a New Computing Infrastructure"
● analogy to the electrical power grid
● "A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities."
● the 2001 article "The Anatomy of the Grid" added Virtual Organizations

What is the Grid?
● coordinates resources that are not subject to centralized control
● using standard, open, general-purpose protocols and interfaces
● to deliver nontrivial qualities of service

Grid Usage
● high performance computing (HPC)
  ○ research on medical drugs
  ○ gravitational wave research
  ○ earthquake prediction
  ○ electronic chip engineering
  ○ ...
● large data
  ○ Large Hadron Collider at CERN
● expensive scientific instruments
  ○ large microscope in Japan
● remote cooperation
  ○ teleconferences, remote surgery, ...

Grid Middleware
● not a single middleware
  ○ in the U.S.A.: Globus
  ○ in Europe: gLite
  ○ in Germany: UNICORE
● services
  ○ information services (Globus: MDS, gLite: BDII)
  ○ GridFTP - striped transfer, third-party transfer
  ○ resource allocation (Globus: GRAM, gLite: WMS)
  ○ virtual organization membership (VOMS)
● Computing Element
  ○ grid gate, batch system, cluster of worker nodes
● Storage Element
  ○ disk servers, disk arrays, tape storage

Grid Security
● based on X.509 certificates and PKI (Public Key Infrastructure)
● list of selected grid CAs (Certification Authorities) maintained by IGTF (International Grid Trust Federation) and, for Europe,
EUGridPMA
● allows so-called proxy certificates
  ○ short-lived (24 hours, 1 week)
  ○ a certificate signed by a user certificate or by another proxy certificate
  ○ can be delegated to a running job
● VOMS (Virtual Organisation Membership Service) issues attribute certificates specifying user privileges on resources

European Grid History
● in 2001-2003 the DataGrid project
  ○ for processing massive data produced by the Large Hadron Collider at CERN
● in 2004-2010 the projects EGEE I, II, III
● in 2010-2014 EGI (European Grid Infrastructure) built in the EGI-InSPIRE project
● EGI in December 2014
  ○ 486514 CPUs
  ○ 296 PB disk storage
● EGI consists of NGIs (National Grid Infrastructures)
● the Czech NGI is MetaCentrum, operated by CESNET, with 10712 CPUs and 10 TB of storage

Cloud Computing
● use of computing resources (hardware and software) that are delivered as a service over a network
● in the 1960s: utility computing
● in 2006 Amazon released AWS (Amazon Web Services)
  ○ EC2 (Elastic Compute Cloud)
  ○ S3 (Simple Storage Service)

Cloud definition
● cloud computing is a general term for anything that involves delivering hosted services over the Internet
● definition by NIST (National Institute of Standards and Technology, U.S. Department of Commerce): Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
● five essential characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service

Cloud Service Models
● Software as a Service (SaaS)
● Platform as a Service (PaaS)
● Infrastructure as a Service (IaaS)

SaaS - Software as a Service
● best known to computer users, the only model they use directly; provides device independence
● examples:
  ○ web mail – GMail, Hotmail
  ○ social networking and messaging – Facebook, Google+, Twitter
  ○ on-line office suites – Google Docs, Microsoft Office 365
  ○ file services – Dropbox, Google Drive, Microsoft OneDrive, ownCloud
  ○ image libraries – Picasa, Flickr
  ○ video libraries – YouTube, Vimeo
  ○ communication tools – Adobe Connect, WebEx
  ○ business software – Salesforce, NetSuite

PaaS - Platform as a Service
● a platform is a software environment used to develop and run applications
● not visible to end users; targeted at application developers and maintainers delivering their SaaS applications
● examples:
  ○ Google App Engine (provides PHP, Python, Java, Go)
  ○ Amazon Elastic Beanstalk (provides Ruby, PHP, Python, .NET, Java, JavaScript)
  ○ Heroku (provides Ruby, PHP, Python, Java, JavaScript, Perl)
  ○ Microsoft Azure Websites (provides PHP, Python, .NET, JavaScript)
  ○ Red Hat OpenShift (provides Ruby, Python, PHP, JavaScript, Perl, Java, Haskell, .NET)

IaaS - Infrastructure as a Service
● provides a virtual data center
● an IaaS provider offers virtual machines (VMs) with complete operating systems
● many VMs can be hosted on a single physical machine running hypervisor software (Xen, KVM, VMware)
● resources hired from an IaaS cloud can be used directly (e.g.
on-demand movie rendering) or as a layer under a PaaS or SaaS cloud

IaaS Providers Provide
● disk images with pre-installed popular operating systems (various versions of Linux, MS-Windows)
● networking services - virtual local area networks, virtual private networks, IP addresses, firewalls, load balancers, domain name service (DNS)
● storage services - virtual block storage, file storage, object storage, relational database storage, NoSQL storage, tape archive storage, content delivery network (CDN)

IaaS Examples
● providers:
  ○ Amazon Elastic Compute Cloud
  ○ Google Compute Engine
  ○ Microsoft Azure
  ○ Rackspace Cloud Servers
● software:
  ○ OpenNebula
  ○ OpenStack
  ○ Eucalyptus
  ○ VMware vCloud Suite

Cloud Service Models Summary
● The Software-as-a-Service model provides on-demand access to software, either as downloadable code executed on client computers, or through remote API calls to code executed on servers.
● The Platform-as-a-Service model provides an on-demand software environment for deploying applications. The environment includes concrete programming languages, their specific libraries, and additional services like SQL and NoSQL storage. A PaaS cloud is usually used as a layer under SaaS cloud services.
● The Infrastructure-as-a-Service model provides on-demand resources from a virtual data center. The resources can be used directly or as a layer under PaaS or SaaS cloud services.

Masaryk University / CERIT-SC Cloud
● private IaaS cloud
● based on OpenNebula with KVM hypervisors
● disk images with Debian Linux, CentOS, SciLinux, and MS-Windows
● archive in the form of HSM (Hierarchical Storage Management) with layers
  ○ RAID disk arrays
  ○ massive arrays of idle disks (MAIDs)
  ○ magnetic tapes
● in April 2015 - 2940 CPUs in 214 machines

That’s it. Thank you for your attention
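
Supplementary sketch for the RDF Schema example slide: the entailment "John Doe is a Student, Student is a subclass of Person, therefore John Doe is a Person" can be illustrated in a few lines of plain Python. This is a minimal sketch, not a real RDF library; the `ex:` identifiers and the function name `entail_types` are hypothetical, and only the single subclass rule is applied, whereas a real RDFS reasoner applies many more.

```python
# Minimal sketch of one RDFS entailment rule over triples (no RDF library used).
# Rule: (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D)
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"


def entail_types(triples):
    """Return the input triples plus every rdf:type triple implied by subclassing."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triple appears (fixed point)
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == RDFS_SUBCLASS_OF]
        type_pairs = [(s, o) for s, p, o in inferred if p == RDF_TYPE]
        for resource, cls in type_pairs:
            for sub, sup in subclass_pairs:
                new_triple = (resource, RDF_TYPE, sup)
                if cls == sub and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


# Hypothetical identifiers standing in for the resources on the slide.
graph = {
    ("ex:Student", RDFS_SUBCLASS_OF, "ex:Person"),
    ("ex:JohnDoe", RDF_TYPE, "ex:Student"),
}
closure = entail_types(graph)
print(("ex:JohnDoe", RDF_TYPE, "ex:Person") in closure)  # prints True
```

Iterating to a fixed point means chains of subclasses (A under B under C) are also followed, which is the behaviour the "all its superclasses" bullet describes.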