Linked data and Topic Maps
Lecture 3 at Masaryk University

Content
–ontology modelling revisited
–interoperability recapitulated
–the data silo problem
–linked data
–Topic Maps
20 NOVEMBER, 2014

Ontology modelling in RDF, things to remember
–classes represent collections of objects or individuals
–classes may contain other classes
–instances can belong to multiple classes
–sometimes things are better represented as properties...
–Start by creating the class hierarchy!

The Semantic Web
–Facilitates
  –data sharing
  –data merging
  –data reuse

Challenges to the Web
–too much noise
–internal systems with poor communication capabilities (data silos)
–high communication costs

Silo problem

Tim Berners-Lee on the next Web

Linked data - bottom-up approach to the Semantic Web
–Solve the silo problem by:
  –making your data available in RDF format (as SPARQL endpoints)
  –linking them to other existing data

The basics of linked data
1. Use URIs to identify things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.
–Tim Berners-Lee (2006). Linked data - design issues

1. Use URIs as names for things
–Unique identifiers are essential for finding and referring to "things"
–URL - Uniform Resource Locator, known as a "web address"; the same content may exist at several addresses
–URN - Uniform Resource Name, less well known. States the unique "name" of a resource. ISBN is a commonly used example

2. Use HTTP URIs so that people can look up those names
–Use the Web's standard protocol for hypertext transfer rather than another naming scheme.
–facilitates access to a resource describing the "thing"
–secures decentralized resource management
–enables "dereferencing"

Dereferencing URIs and content negotiation
1. The client looks up the URI
  a. The URI identifies the wanted information resource
    1. the server sends HTTP response code 200 together with the resource
  b. The URI identifies a non-information resource
    1. the server sends HTTP response code 303 together with the URI of a resource that describes the non-information resource
    2. the client requests that description in a specified format, e.g. RDF/XML
    3. the server sends the client the RDF/XML document
–See more in How to publish Linked Data on the Web

3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
–An Excel spreadsheet or a scanned PDF is better than nothing, but using standards is better

4. Include links to other URIs so that they can discover more things
–If there are no links, the data will remain in their silos...

Levels of linked open data (one to five stars)
1. Data openly available on the web
2. Data available in a machine-readable format
3. Data available in a non-proprietary format
4. Data available in W3C standard formats (RDF and SPARQL)
5. Data linked to other people's data

RDF revisited
–Nils Pharo, Semantic Web
–equals:
  @prefix dc: <...> .
  @prefix rdf: <...> .
  <...> dc:creator "Nils Pharo" ;
        dc:subject "Semantic Web" .
–This is valid RDF, but it does not link with other data sets

Linked data means connecting data sets!
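To make the difference between "valid RDF" and "linked data" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not part of the lecture: triples are modelled as plain tuples, the document URI and the person URI are hypothetical `example.org` / `example.com` names, and the DBpedia URI is used only as a plausible link target. The helper `external_links` is likewise a hypothetical name.

```python
from urllib.parse import urlparse

# Dublin Core element namespace; the other URIs below are illustrative assumptions.
DC = "http://purl.org/dc/elements/1.1/"
DOC = "http://example.org/docs/lecture3"  # hypothetical document URI

# Literal-only description: valid RDF, but nothing points outside the data set.
isolated = [
    (DOC, DC + "creator", "Nils Pharo"),
    (DOC, DC + "subject", "Semantic Web"),
]

# The same description with the literals replaced by URIs from other data sets.
linked = [
    (DOC, DC + "creator", "http://example.com/people/nils-pharo"),  # hypothetical
    (DOC, DC + "subject", "http://dbpedia.org/resource/Semantic_Web"),
]


def is_http_uri(value):
    """True if value looks like an HTTP(S) URI rather than a literal."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def external_links(triples):
    """Return the triples whose object is an HTTP URI on another host than the subject."""
    result = []
    for subject, predicate, obj in triples:
        if is_http_uri(obj) and urlparse(obj).netloc != urlparse(subject).netloc:
            result.append((subject, predicate, obj))
    return result


print(len(external_links(isolated)), "outgoing links in the literal-only description")
print(len(external_links(linked)), "outgoing links in the linked description")
```

Only the second description gives a client something to follow into another data set, which is what rule 4 asks for.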
Example

Three types of RDF links
–relationship links
–identity links
–vocabulary links

Relationship links
–purpose is to reuse other information sources to enrich your data

Identity links
–Coupling resources that speak about the same things, establishing URI aliases
–explicates different opinions
–facilitates traceability
–secures robustness

Vocabulary links
–Basic principle: reuse existing vocabularies instead of creating your own!
–Describe new vocabularies in RDFS/OWL
–create mappings between terms from different vocabularies
–owl:equivalentClass can be used to map between classes

Vocabulary links example

Linked data in practice, challenges
–The linked data restructuring cycle

Conversion problems, example

Philosophy of open data
–certain data should be freely available to anyone
–non-textual data such as:
  –maps, formulae, genomes
–textual data
  –government data, facts, public library records?
Open data arguments
–data belong to the human race
–data was funded by public money
–data was created by government
–facts cannot legally be copyrighted
–openness accelerates progress
–See also Wikipedia

The Linking open data project
–goal: to convert data that are available under open licences to RDF
–The state of the LoD cloud 2014

Exploring linked open data
–Swoogle
–Semantic Web search engines

Library data as linked open data
–descriptive bibliographic metadata (title, edition and document languages)
–authority control (author and title languages)
–content indexing (subject languages)

Using RDF to represent subject languages - SKOS
–"SKOS is an area of work developing specifications and standards to support the use of knowledge organization systems (KOS) such as thesauri, classification schemes, subject heading systems and taxonomies within the framework of the Semantic Web."
–Simple Knowledge Organization System
–vocabulary for expressing controlled vocabularies in RDF
–can be used for modelling ontologies up to thesaurus "level"
–W3C Recommendation, 18 August 2009
–SKOS home page

SKOS concept example
–labels in the example: abattoirs, slaughterhouse, abatoirs, abbatoirs, abbattoirs

SKOS hierarchical relationship example
–concepts in the example: mammals, animals

SKOS associative relationships example
–concepts in the example: birds, ornithology

SKOS is concept-oriented
–In contrast to term-based thesauri, SKOS focuses on concepts, which may have several labels. In a standard thesaurus a term would relate to another term using relationships.
See also the SKOS FAQ
–Examples
  –The Integrated Public Service Vocabulary
  –The Library of Congress has made its subject headings available

SKOS used in a subject portal

SKOS used in Libris

Characteristics of examples
–a rich variety of namespaces
  –foaf, sub, dct, dc, libris, bibo

Using RDF to represent document and work languages
–Many attempts have been made...
  –the bibliographic ontology
  –BIBFRAME
  –Schema.org with extensions

Bibliographic description - from records to graphs?
–MARC has been used for describing bibliographic records for a long time
  –international standard since 1973
  –billions of MARC records exist
  –developed in the pre-web world

BIBO - the bibliographic ontology
–BIBO specification
–primarily developed for handling citations and bibliographic references
–can be used for simple bibliographic description

BIBFRAME Model

BIBFRAME
–bibframe.org
–Library of Congress report
–planned to be the successor of MARC
–reflects the FRBR model
–develops a new namespace
–Transformation tools

Schema.org's BIB extensions model
–Schema.org - initiative for creating structured data embedded in web pages
–supported by Google, Bing, Yahoo and Yandex
–lightweight, cross-sectoral schema which is extensible for new domains
–compatible with RDF
–Report from OCLC

Discussion
–What are the pros and cons of the BIBFRAME approach?

Topic maps
–a standard for organising digital content
–ISO certified in 2002 (ISO standard 13250)
–Used for structuring web sites and a large variety of knowledge management purposes

Background
–inspired by back-of-book indexes
  X
  XML. See Extensible Markup Language (XML)
  XML Topic Maps (XTM) 61, 72, 78
  XML web services. See services
  XPointer 99
  XTM.
  See XML Topic Maps (XTM)
  XUL widgets
    widgets 37
    See also Extensible User interface Language (XUL)
–From Passin (2004). The explorer's guide to the Semantic Web

Two-layer model
–Metadata layer (index)
–Information layer (content)

The information layer
–Contains information (sic) - occurrences
–independent of location, format or form
–not necessarily digital

The metadata layer
–contains topics and associations which tie topics together
[figure: simple topic map]

The basic elements of topic maps
–Topic maps consist of
  –Topics
  –Associations
  –Occurrences
–i.e. the TAO of topic maps (Pepper, 2002)

Occurrences link the two layers

Topic maps and ontologies
–an ontology contains the topic map's input
–minimum requirement: some topics with associations

Topic types
–Topics can be typed; Hamsun is an author, Vågå is a place, Hunger is a title. This is comparable to classes in RDF
–Topic types are also topics

Association types
–Association types: Hamsun was born in Vågå, Hunger was written by Hamsun
–Cf. properties in RDF
–Association types are also topics

Occurrence types
–Occurrence types: [...] is a biography of Hamsun.
[...] is an encyclopedia article
–Occurrence types are also topics

Topic map syntaxes
–HyTM - SGML based
–XTM - XML Topic Maps
–CTM - Compact syntax for Topic Maps

XTM example 1
–Nils Pharo
–Nils has worked at Oslo UC since 1997

XTM example 2
–Oslo University College

XTM example 3

Characteristics of topic maps
–topics have names
–topics are knit together using associations
–a topic may be categorised by an unlimited number of topic types
–topic types are topics on a higher level of abstraction
–association types are associations on a higher level of abstraction
–occurrences may be external or internal to the topic map
–topics can be disambiguated using subject indicators

Topic and topic types
–a topic type defines a class/category of things
–topic types need instances!
–domain dependent
–Choose an appropriate level of generality
  –"Countries" is better than "Countries in South-East Asia"
  –The domain of the topic map tells you which countries it includes
  –Don't make it too general!

Type hierarchies
–topic types can be arranged in hierarchies
–subtype/supertype hierarchies have a specific syntax
–If A is a superclass of B, then
  –Both A and B must be classes
  –If C is an instance of B, it must also be an instance of A
  –If C is a subclass of B, it must also be a subclass of A (in which case an instance of C is also an instance of B and an instance of A)

Topic maps and RDF
–"Surface" level similarities
  –both are standards for ontology modelling
  –both use XML
  –both use URIs for securing identity
  –both have constraint and query languages
–Important differences
  –they are optimized for different purposes; TM for human reading, RDF for machine processing
  –RDF is document centric whereas TM is subject centric

–Thank you!
–Any questions?
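As a supplement to the type hierarchy rules in this lecture (an instance of a subtype is also an instance of every supertype, and the subtype relation is transitive), the following Python sketch shows one way to compute them. It is a simplified illustration, not topic map syntax: it assumes each type has at most one direct supertype, and the types "Novelist" and "Person" as well as the names `SUPERTYPE`, `INSTANCE_OF`, `supertypes` and `types_of` are hypothetical.

```python
# Hypothetical type hierarchy: subtype -> direct supertype (single inheritance assumed).
SUPERTYPE = {
    "Novelist": "Author",
    "Author": "Person",
}

# Hypothetical typing of one topic: topic -> direct topic type.
INSTANCE_OF = {"Hamsun": "Novelist"}


def supertypes(topic_type):
    """Return all supertypes of topic_type, nearest first (transitive closure)."""
    chain = []
    seen = {topic_type}
    while topic_type in SUPERTYPE:
        topic_type = SUPERTYPE[topic_type]
        if topic_type in seen:  # stop if the hierarchy is accidentally cyclic
            break
        seen.add(topic_type)
        chain.append(topic_type)
    return chain


def types_of(topic):
    """Return the direct type of a topic followed by every inherited supertype."""
    direct = INSTANCE_OF[topic]
    return [direct] + supertypes(direct)


print(types_of("Hamsun"))
```

Because the lookup walks the whole chain, only the most specific type has to be recorded for each topic; the more general types follow from the hierarchy.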