Linked data and Topic Maps
Lecture 3 at Masaryk University
Content
–ontology modelling revisited
–interoperability recapitulated
–the data silo problem
–linked data
–Topic Maps
20 NOVEMBER, 2014
Ontology modelling in RDF, things to remember
–classes represent collections of objects or individuals
–classes may contain other classes
–instances can belong to multiple classes
–sometimes things are better represented as properties...
–
–Start with creating the class hierarchy!
The Semantic Web
–Facilitates
–data sharing
–data merging
–data reuse
Challenges to the Web
–too much noise
–internal systems with bad communication capabilities (data silos)
–large costs of communication
Silo problem
Tim Berners-Lee on the next Web
Linked data - bottom up approach to the semantic web
–Solve the silo problem by:
–making your data available in RDF format (for example via SPARQL endpoints)
–linking it to other existing data
The basics of linked data
1. Use URIs to identify things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.
–Tim Berners-Lee (2006). Linked data - design issues
1. Use URIs as names for things
–Unique identifiers are essential for finding and referring to "things"
–URL - Uniform Resource Locator, known as a "web address"; the same content may exist at several
addresses
–URN - Uniform Resource Name, less well known; states the unique "name" of a resource. ISBN is a
commonly used example
2. Use HTTP URIs so that people can look up those names
–Use the Web's standard protocol for hypertext transfer rather than another naming scheme.
–facilitates access to a resource describing the "thing"
–secures decentralized resource management
–enables "dereferencing"
Dereferencing URIs and content negotiation
1. Look up the URI
   a. If the URI identifies an information resource, the server sends HTTP response code 200 together with the resource
   b. If the URI identifies a non-information resource, the server sends HTTP response code 303 together with the URI of a document that describes the non-information resource
2. The client requests that document in a specified format, e.g. RDF/XML
3. The server sends the client the RDF/XML document
–See more on How to publish Linked Data on the Web
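The lookup steps above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the URIs, the `serve` and `dereference` functions and the placeholder document bodies are hypothetical, and no real HTTP traffic is involved.

```python
from dataclasses import dataclass

# Hypothetical URIs, used only for illustration.
THING = "http://example.org/id/nils-pharo"   # non-information resource (a person)
DOC = "http://example.org/doc/nils-pharo"    # information resource describing the person

# Toy server state: the representations available for each document URI.
# The bodies are placeholders, not real documents.
DOCUMENTS = {
    DOC: {
        "application/rdf+xml": "<rdf:RDF>...</rdf:RDF>",
        "text/html": "<html>...</html>",
    }
}
# Non-information resources and the documents that describe them.
DESCRIBED_BY = {THING: DOC}


@dataclass
class Response:
    status: int
    headers: dict
    body: str = ""


def serve(uri, accept):
    """Answer one GET request in the way the steps above describe."""
    if uri in DESCRIBED_BY:
        # A thing cannot be sent over the wire: redirect with 303 See Other.
        return Response(303, {"Location": DESCRIBED_BY[uri]})
    if uri in DOCUMENTS:
        variants = DOCUMENTS[uri]
        if accept in variants:
            # Content negotiation: return the representation the client asked for.
            return Response(200, {"Content-Type": accept}, variants[accept])
        return Response(406, {})
    return Response(404, {})


def dereference(uri, accept, max_redirects=5):
    """Client side: follow 303 redirects until something other than 303 comes back."""
    for _ in range(max_redirects):
        response = serve(uri, accept)
        if response.status != 303:
            return uri, response
        uri = response.headers["Location"]
    raise RuntimeError("too many redirects")


final_uri, response = dereference(THING, "application/rdf+xml")
print(final_uri, response.status, response.headers)
```

Asking for the person's URI ends at the describing document, not at the person, which is the point of the 303 redirect.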
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
–An Excel spreadsheet or a scanned PDF is better than nothing, but using standards is better
4. Include links to other URIs so that they can discover more things
–If there are no links the data will remain in their silos...
Levels of linked open data (one to five stars)
1. Data openly available on the web
2. Data available in a machine-readable format
3. Data available in a non-proprietary format
4. Data available using W3C standards (RDF and SPARQL)
5. Data linked to other people's data
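The star levels are cumulative: a data set only earns a star if it also meets every lower level. A minimal sketch of that rule follows; the function name `lod_stars` and its parameters are hypothetical, chosen to mirror the five levels in the list above.

```python
def lod_stars(open_on_web, machine_readable, non_proprietary, w3c_standards, linked):
    """Count stars cumulatively: stop at the first level that is not met."""
    stars = 0
    for criterion_met in (open_on_web, machine_readable, non_proprietary,
                          w3c_standards, linked):
        if not criterion_met:
            break
        stars += 1
    return stars


# Illustrative data sets; the classification of each is an assumption.
examples = {
    "scanned PDF on a web page": lod_stars(True, False, False, False, False),
    "Excel spreadsheet": lod_stars(True, True, False, False, False),
    "RDF without outgoing links": lod_stars(True, True, True, True, False),
    "RDF linked to other data sets": lod_stars(True, True, True, True, True),
}
for name, stars in examples.items():
    print(f"{name}: {stars} star(s)")
```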
RDF revisited
–
Nils Pharo
Semantic Web
–equals:
–@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
dc:creator "Nils Pharo" ;
dc:subject "Semantic Web" .
–
–This is valid RDF, but it does not link with other data sets
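One way to see why such a description does not link to anything is to look at the objects of its triples: literals are dead ends, whereas URIs in another namespace are links. The sketch below models triples as plain Python tuples. The `Literal` wrapper, the `external_links` function, the `OWN` namespace and the person URI are assumptions made for illustration.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain text value; unlike a URI, it cannot be followed."""
    value: str


DC = "http://purl.org/dc/elements/1.1/"
OWN = "http://example.org/"  # hypothetical namespace of our own data set


def external_links(triples, own_namespace):
    """Return the triples whose object is a URI outside our own namespace."""
    return [
        (s, p, o)
        for s, p, o in triples
        if isinstance(o, str) and not o.startswith(own_namespace)
    ]


# Literal-only description, as on the slide: valid, but isolated.
literal_only = [
    (OWN + "doc/1", DC + "creator", Literal("Nils Pharo")),
    (OWN + "doc/1", DC + "subject", Literal("Semantic Web")),
]

# The same description with URIs as objects (the person URI is made up).
linked = [
    (OWN + "doc/1", DC + "creator", "http://example.net/people/nils-pharo"),
    (OWN + "doc/1", DC + "subject", "http://dbpedia.org/resource/Semantic_Web"),
]

print(len(external_links(literal_only, OWN)))  # no outgoing links
print(len(external_links(linked, OWN)))        # two outgoing links
```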
Linked data means connecting data sets!
Example
Three types of RDF links
–relationship links
–identity links
–vocabulary links
Relationship links
–purpose is to reuse other information sources to enrich your data
Identity links
–Connect resources that describe the same thing by establishing URI aliases
–explicates different opinions
–facilitates traceability
–secures robustness
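Identity links are commonly expressed with owl:sameAs; that choice is an assumption here, since the slide's own example was not preserved. Because aliasing is symmetric and transitive, a consumer can group URIs into sets that name the same thing. The sketch below does this with a small union-find structure; the function name and all URIs are hypothetical.

```python
def alias_groups(same_as_pairs):
    """Group URIs connected by identity links into sets of aliases."""
    parent = {}

    def find(uri):
        # Follow parent pointers to the representative, compressing the path.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


# Hypothetical identity links published by three different data sets.
links = [
    ("http://example.org/id/hamsun", "http://example.net/authors/42"),
    ("http://example.net/authors/42", "http://example.com/person/knut-hamsun"),
    ("http://example.org/id/hunger", "http://example.net/works/7"),
]

for group in alias_groups(links):
    print(sorted(group))
```

Each publisher keeps its own URI and its own statements, which is what lets differing views coexist and remain traceable to their source.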
Vocabulary links
–Basic principle: reuse existing vocabularies instead of creating your own!
–Describe new vocabularies in RDFS/OWL
–create mappings between terms from different vocabularies
–
–owl:equivalentClass can be used to map between classes
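A minimal sketch of what such a mapping buys a consumer: once two classes are declared equivalent, a query for instances of one also finds instances typed with the other. Triples are modelled as tuples of prefixed names; the `instances_of` function and the `ex:` resources are hypothetical, and the foaf/schema pairing is used purely as an illustration.

```python
RDF_TYPE = "rdf:type"
OWL_EQUIVALENT_CLASS = "owl:equivalentClass"


def instances_of(cls, triples):
    """Find instances of cls, also counting classes mapped to it as equivalent."""
    equivalents = {cls}
    changed = True
    while changed:  # equivalence is symmetric and transitive
        changed = False
        for s, p, o in triples:
            if p != OWL_EQUIVALENT_CLASS:
                continue
            if s in equivalents and o not in equivalents:
                equivalents.add(o)
                changed = True
            elif o in equivalents and s not in equivalents:
                equivalents.add(s)
                changed = True
    return {s for s, p, o in triples if p == RDF_TYPE and o in equivalents}


triples = [
    ("schema:Person", OWL_EQUIVALENT_CLASS, "foaf:Person"),  # vocabulary link
    ("ex:nils", RDF_TYPE, "foaf:Person"),     # typed with one vocabulary
    ("ex:knut", RDF_TYPE, "schema:Person"),   # typed with the other
    ("ex:hunger", RDF_TYPE, "bibo:Book"),
]

print(sorted(instances_of("schema:Person", triples)))
```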
Vocabulary links example
Linked data in practice, challenges
–The linked data restructuring cycle
Conversion problems, example
Philosophy of open data
–certain data should be freely available to anyone
–non-textual data such as:
–maps, formulae, genomes
–textual data
–government data, facts, public library records?
Open data arguments
–data belong to the human race
–data were funded by public money
–data were created by government
–facts cannot legally be copyrighted
–openness accelerates progress
–
–See also Wikipedia
The Linking open data project
–goal: to convert data that are available under open licences to RDF
–The state of the LoD cloud 2014
Exploring linked open data
–Swoogle
–Semantic Web search engines
Library data as linked open data
–descriptive bibliographic metadata (title, edition and document languages)
–authority control (author and title languages)
–content indexing (subject languages)
Using RDF to represent subject languages - SKOS
–"SKOS is an area of work developing specifications and standards to support the use of knowledge
organization systems (KOS) such as thesauri, classification schemes, subject heading systems and
taxonomies within the framework of the Semantic Web."
–Simple Knowledge Organization System
–vocabulary for expressing controlled vocabularies in RDF
–can be used for modelling ontologies up to thesaurus "level"
–W3C Recommendation, 18 August 2009
–SKOS home page
SKOS concept example
– abattoirs
– slaughterhouse
– abatoirs
– abbatoirs
– abbattoirs
SKOS hierarchical relationship example
– mammals
– animals
SKOS associative relationships example
– birds
– ornithology
SKOS is concept-oriented
–In contrast to term-based thesauri, SKOS focuses on concepts, which may have several labels. In a
standard thesaurus, a term relates to another term through relationships. See also the SKOS FAQ
–
–Examples
–The Integrated Public Service Vocabulary
–The Library of Congress has made its subject headings available
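The concept-oriented view can be sketched as one concept object carrying several labels, with every label leading back to the same concept. The sketch reuses the abattoirs labels from the concept example. How they are split into preferred, alternative and hidden labels is an assumption, since the example's markup was not preserved, and the class, function and URI are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_label: str                                   # cf. skos:prefLabel
    alt_labels: set = field(default_factory=set)      # cf. skos:altLabel
    hidden_labels: set = field(default_factory=set)   # cf. skos:hiddenLabel

    def all_labels(self):
        return {self.pref_label} | self.alt_labels | self.hidden_labels


def build_label_index(concepts):
    """Map every label, in lower case, to the concepts that carry it."""
    index = {}
    for concept in concepts:
        for label in concept.all_labels():
            index.setdefault(label.lower(), []).append(concept)
    return index


abattoirs = Concept(
    uri="http://example.org/concept/abattoirs",  # hypothetical URI
    pref_label="abattoirs",
    alt_labels={"slaughterhouse"},
    hidden_labels={"abatoirs", "abbatoirs", "abbattoirs"},  # common misspellings
)

index = build_label_index([abattoirs])
# A search for a synonym or a misspelling reaches the same concept.
for query in ("slaughterhouse", "abbatoirs"):
    print(query, "->", index[query][0].pref_label)
```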
SKOS used in a subject portal
SKOS used in Libris
Characteristics of examples
–a rich variety of namespaces
–foaf, sub, dct, dc, libris, bibo
Using RDF to represent document and work languages
–Many attempts have been made...
–the bibliographic ontology
–BIBFRAME
–Schema.org with extensions
Bibliographic description - from records to graphs?
–MARC has been used for describing bibliographic records for a long time
–international standard since 1973
–billions of MARC records exist
–developed in the pre-web world
BIBO - the bibliographic ontology
–BIBO specification
–primarily developed for handling citations and bibliographic references
–can be used for simple bibliographic description
BIBFRAME Model
BIBFRAME
–bibframe.org
–Library of Congress report
–planned as the successor to MARC
–reflects the FRBR model
–develops a new namespace
–Transformation tools
Schema.org's BIB extensions model
–Schema.org - initiative for creating structured data embedded in web pages
–supported by Google, Bing, Yahoo and Yandex
–lightweight, cross-sectoral schema that is extensible to new domains
–compatible with RDF
–Report from OCLC
Discussion
–What are the pros and cons of the BIBFRAME approach?
Topic maps
–a standard for organising digital content
–ISO certified in 2002 (ISO-standard 13250)
–Used for structuring web sites and a large variety of knowledge management purposes
Background
–inspired by back-of-book indexes
–X
–XML. See Extensible Markup Language (XML)
–XML Topic Maps (XTM) 61, 72, 78
–XML web services. See services
–XPointer 99
–XTM. See XML Topic Maps (XTM)
–XUL widgets 37. See also Extensible User interface Language (XUL)
–From Passin (2004). The explorer's guide to the Semantic Web
Two-layer model
–Metadata layer (index)
–Information layer (content)
The information layer
–Contains information (sic) - occurrences
–independent of location, format or form
–not necessarily digital
The metadata layer
–contains topics and the associations that tie topics together
simple topic map
The basic elements of topic maps
–Topic maps consist of
–Topics
–Associations
–Occurrences
–i.e. the TAO of topic maps (Pepper, 2002)
Occurrences link the two layers
Topic maps and ontologies
–an ontology contains the topic map's input
–minimum requirement: some topics with associations
Topic types
–Topics can be typed; Hamsun is an author, Vågå is a place, Hunger is a title. This is comparable
to classes in RDF
–Topic types are also topics
Association types
–Association types: Hamsun was born in Vågå, Hunger was written by Hamsun
–Cf. properties in RDF
–Association types are also topics
Occurrence types
–Occurrence types: a resource can be, for example, a biography of Hamsun or an encyclopedia article
–Occurrence types are also topics
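The idea that topic types, association types and occurrence types are themselves topics can be sketched with three small data classes, using the Hamsun examples from these slides. This is only an illustration, not a Topic Maps implementation: the class layout, the `associations_of` helper and the occurrence locator are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    types: list = field(default_factory=list)  # topic types are Topics too


@dataclass
class Association:
    type: Topic      # the association type is a Topic
    members: tuple   # the topics it ties together


@dataclass
class Occurrence:
    type: Topic      # the occurrence type is a Topic
    topic: Topic     # the topic the resource is about
    locator: str     # where the resource lives in the information layer


# Types, all ordinary topics.
author, place, title = Topic("author"), Topic("place"), Topic("title")
born_in, written_by = Topic("born in"), Topic("written by")
biography = Topic("biography")

# Typed topics.
hamsun = Topic("Hamsun", [author])
vaga = Topic("Vågå", [place])
hunger = Topic("Hunger", [title])

associations = [
    Association(born_in, (hamsun, vaga)),
    Association(written_by, (hunger, hamsun)),
]
# The locator below is a made-up address.
occurrences = [Occurrence(biography, hamsun, "http://example.org/hamsun-biography")]


def associations_of(topic, associations):
    """All associations in which the given topic takes part."""
    return [a for a in associations if any(m is topic for m in a.members)]


for a in associations_of(hamsun, associations):
    print(a.type.name, [m.name for m in a.members])
```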
Topic map syntaxes
–HyTM - SGML based
–XTM - XML Topic Maps
–CTM - Compact syntax for Topic Maps
XTM example 1
–
Nils Pharo
Nils has worked at Oslo UC since 1997
XTM example 2
–
Oslo University College
XTM example 3
Characteristics of topic maps
–topics have names
–topics are knit together using associations
–a topic may be categorised by an unlimited number of topic types
–topic types are topics on a higher level of abstraction
–association types are associations on a higher level of abstraction
–occurrences may be external or internal to the topic map
–topics can be disambiguated using subject indicators
Topics and topic types
–a topic type defines a class/category of things
–topic types need instances!
–domain dependent
–Choose an appropriate level of generality
–"Countries" is better than "Countries in South-East Asia"
–The domain of the topic map tells you which countries it includes
–Don't make it too general!
Type hierarchies
–topic types can be arranged in hierarchies
–subtype/supertype hierarchies have a specific syntax
–If A is a superclass of B, then
–Both A and B must be classes
–If C is an instance of B, it must also be an instance of A
–If C is a subclass of B, it must also be a subclass of A,
(in which case an instance of C is also an instance of B
and an instance of A)
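The rules above say that subclass and instance relationships propagate upwards through the hierarchy. A minimal sketch, using the letters A, B and C from the list; the dictionary layout and function names are assumptions.

```python
def superclasses(cls, subclass_of):
    """All classes above cls, following subclass links transitively."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_instance_of(direct_type, cls, subclass_of):
    """An instance of a class is also an instance of all its superclasses."""
    return direct_type == cls or cls in superclasses(direct_type, subclass_of)


# A is a superclass of B, and C is a subclass of B.
subclass_of = {"B": ["A"], "C": ["B"]}

print(sorted(superclasses("C", subclass_of)))   # C is a subclass of both A and B
print(is_instance_of("C", "A", subclass_of))    # an instance of C is an instance of A
print(is_instance_of("A", "C", subclass_of))    # the reverse does not hold
```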
Topic maps and RDF
–"Surface" level similarities
–both are standards for ontology modelling
–both use XML
–both use URIs for securing identity
–both have constraint and query languages
–
–Important differences
–they are optimized for different purposes; TM for human reading, RDF for machine processing
–RDF is document centric whereas TM is subject centric
–Thank you!
–
–Any questions?