Introduction to Digital Libraries
and their Technologies
Miroslav Bartošek
Institute of Computer Science MU
Library and Information Centre
DLs and their Technologies 2
Picture Quiz: Can you recognise …
All is about INFORMATION!
What do all these systems have
in common?
Topics
1. Introduction to Digital Libraries (DLs)
2. Architecture of DLs
3. Identifiers
4. Metadata
5. Interoperability
6. Searching
7. Economy and Legislation
8. Digital Preservation
9. DLs@MU
1. Introduction to Digital Libraries
Logo from www.ncstrl.org
1. DL Introduction
1.1 DL definition
1.2 DL Examples
1.3 DLs versus WEB
1.4 DLs versus Libraries
1.5 DL’s History
1.6 Literature
1.1 DL “definition”
Computer scientist’s view:
• Digital library is a managed collection of information, with associated
services, where the information is stored in digital formats and accessible
over a network. W.Y.Arms, 2000
‐ maintained collection
‐ services
‐ distant access
• Focused collection of digital objects, including text, video, and audio,
along with methods for access and retrieval, and for selection,
organization, and maintenance. I.W.Witten, 2002
‐ digital content (text, video, audio, 3D, simulation, dynamic visualization)
‐ user (access and retrieval)
‐ „librarian“ (selection, organization, and maintenance)
1.1 DL “definition”
Librarian’s view:
• Digital libraries are organizations that provide the resources, including the
specialized staff, to select, structure, offer intellectual access to, interpret,
distribute, preserve the integrity of, and ensure the persistence over time
of collections of digital works so that they are readily and economically
available for use by a defined community or set of communities
US Digital Library Federation, 1997
‐ DL as an „institution“ (a library, for example)
‐ organization of information and services
‐ aimed at a defined user community
1.1 DL “definition”
Archivist’s view:
• DL = the infrastructure, policies and procedures, and organisational,
political and economic mechanisms necessary to enable access to and
preservation of digital content. Ross, 2003
• DL as a preservation infrastructure / archives
1.1 DL “definition”
• A digital library is an online collection of digital objects, of assured quality,
that are created or collected and managed according to internationally
accepted principles for collection development and made accessible in a
coherent and sustainable manner, supported by services necessary to
allow users to retrieve and exploit the resources.
IFLA/UNESCO Digital Library Manifesto, 2011
International Federation
of Library Associations
and Institutions
1.1 DL General Features
• Organization of information is the key
• Not a single closed entity (‐> DLs)
• Heterogeneous, dynamic, multimedia information resources
• Interconnection of autonomous units
• Transparent interconnection
• Coherent access regardless of
– forms
– formats
– locations
• Long‐term preservation
1.2 DL Example (1)
American Memory (Library of Congress)
• Digitization „Apollo project“
(pilot 1990‐1994)
• 120+ historical collections
• > 10 million digital objects
• Books, photos, manuscripts,
audio, video, maps
http://memory.loc.gov/
1.2 DL Example (2)
JSTOR (Journal Storage)
• DL of academic journals (founded 1995)
• Problems with printed journals in libraries
(cost, space, incompleteness, preservation)
• Idea: digitize all our core journals
from 1st issue to … moving wall
• Non‐profit organization, Mellon Foundation grant
(Ann Arbor, New York)
• 1,900 digitized journals, 900 publishers
• 20 subject collections (arts, sciences, social sciences)
• Sustainable economic model
• 8,000+ institutions from 160 countries
http://www.jstor.org/
1.2 DL Example (3)
Europeana
• EU digital platform for cultural heritage
• Initiated and supported by the EC
• 2010 first version
• 2016:
• 3,500 contributing institutions
Libraries, museums, archives, galleries…
• 54 million objects
– 30 million pictures, 22 million texts, audio, video
• Metadata, thumbnail + link to resource
http://europeana.eu/
1.3 DL x Web
• Why DL?
We have the Web, and „everything“ is there!
• Is the Web a digital library?
1.3 The web is great, but…
• Huge amount of information, easy access to anybody
• Unified technology
• Continuous exciting development
• And much more…
But:
– Advanced and Non‐textual search
– Rights Management
– Permanent Availability
– Authenticity
– Quality Control
?
1.4 DLs x Libraries
Common features
• Systematically built collection of data objects
• Metadata structures (catalogues, indexes)
• Services tailored to designated user community
• Thematic focus
• Quality Control
• Long‐Term Storage (centuries – in libraries)
Library of Alexandria: Ptolemaic Egypt, 295 BC ‐‐ ??
700.000 papyrus scrolls (originals from Euripides, Aeschylus,
Sophocles, Archimedes, Euclid…)
1.4 Advantages of libraries
• Centuries old tradition in organization and access to info
• Worldwide standards
• Elaborated system of libraries
• Established legislation
• Well‐balanced system of all key players
authors – publishers – libraries – users
1.4 Transformation to DLs
• „Paper libraries would disappear by 1984.“
Arthur Samuel (1964, The Banishment of the Paperwork.)
• „Some say that had books been invented after computers
were, they would have been hailed as a great advance.“
Ian H.Witten (2002, How to Build a Digital Library.)
• Transformation to DLs
– It is not just a technological issue
– Human beings & social environment are main obstacles
1.5 DL’s History
• 1945 Vannevar Bush („As We May Think“, Memex)
• 1965 J.C.R. Licklider („Libraries of the Future“)
• 60’s MARC, OPAC (LoC, OCLC)
• 80’s full texts
• 90’s Computing + Communications + Contents
(price, performance, availability)
• 1994 Digital Library Initiative, WWW
1.6 Literature
• William Y. Arms: Digital Libraries. MIT Press, 1999, 2000, 2001
Online edition (2005) available at http://www.cs.cornell.edu/wya/diglib/
• I.H.Witten: How to Build a Digital Library. Morgan Kaufmann Publ. 2002, 2010
• Michael Lesk: Understanding Digital Libraries. Morgan Kaufmann, 2nd ed. 2004
2. Architecture of DL
2. Architecture of DLs
2.1 Reference Models
2.2 Kahn‐Wilensky Framework
2.3 DL.org Model
2.1 Reference Models
Reference Model = general architecture (framework)
– Provision of a unified vocabulary (terms)
– Formalizing components and functions (semantics)
– Understanding important relationships
between entities in a particular environment
• Software implementation
• Development of standards
• Education
2.1 Kahn‐Wilensky Framework
• First informal model/architecture for the DL
• R. Kahn, R. Wilensky: A Framework for Distributed Digital Object
Services, Uni Berkeley, CS‐TR project, ARPA, 1995
http://www.cnri.reston.va.us/home/cstr/arch.html
• Digital object; Identification system (handles); Repository;
Services
• Implemented in FEDORA
2.1.1 Digital Object
Digital object (DO) = metadata + data
2.1.2 Composed DO, Meta DO
Composed DO
Example: DO=book
metadata
…
data
DO for page1
DO for page2
…
Meta DO
Example: DO=music composition
metadata
…
data
id for DO score
id for DO audio recording
id for DO performance TV record
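The two kinds of digital object above can be sketched as a small data structure. This is only an illustrative sketch: the class name, field names and sample identifiers are assumptions made for this example and do not come from the Kahn‐Wilensky framework itself. A composed DO nests its parts, while a meta DO only stores identifiers of other DOs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DigitalObject:
    """A digital object: identifier, metadata and data (hypothetical layout)."""
    identifier: str
    metadata: Dict[str, str]
    data: bytes = b""
    # Composed DO: nested child objects (e.g. the pages of a book).
    parts: List["DigitalObject"] = field(default_factory=list)
    # Meta DO: identifiers of other, separately stored objects.
    references: List[str] = field(default_factory=list)


def count_objects(obj: DigitalObject) -> int:
    """Count a composed object together with all of its nested parts."""
    return 1 + sum(count_objects(part) for part in obj.parts)


# Composed DO: a book that contains one DO per page (invented sample data).
book = DigitalObject(
    identifier="book-1",
    metadata={"title": "Example book"},
    parts=[
        DigitalObject("book-1/page1", {"label": "page 1"}, b"scan of page 1"),
        DigitalObject("book-1/page2", {"label": "page 2"}, b"scan of page 2"),
    ],
)

# Meta DO: a music composition that only points to other DOs by identifier.
composition = DigitalObject(
    identifier="composition-1",
    metadata={"title": "Example composition"},
    references=["score-1", "audio-1", "tv-record-1"],
)

print(count_objects(book))
print(composition.references)
```

The design choice shown here is that a composed DO owns its parts, whereas a meta DO stays small and relies on the identification system to find the referenced objects.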
2.1.3 Repository
• Takes care of the DOs stored in it
• RAP – Repository Access Protocol
2.1.4 DL Components
Step          User interface request      Result
1. search     => Search system            => list of items
2. select     => Item                     => handle
3. retrieve   => Handle system            => repository ID
              => Repository ‐ RAP         => digital object
4. display                                => rendered DO
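The four steps (search, select, retrieve, display) can be walked through in a short sketch. Everything here is an assumption for illustration: the three dictionaries stand in for the search system, the handle system and the repositories, and the handles and repository names are invented. It is not an implementation of RAP.

```python
from typing import Dict, List

# Invented sample data standing in for the three back-end components.
SEARCH_INDEX: Dict[str, List[str]] = {
    "digital libraries": ["hdl:example/1", "hdl:example/2"],
}
HANDLE_SYSTEM: Dict[str, str] = {
    "hdl:example/1": "repo-A",
    "hdl:example/2": "repo-B",
}
REPOSITORIES: Dict[str, Dict[str, dict]] = {
    "repo-A": {"hdl:example/1": {"metadata": {"title": "First item"}, "data": "text of item 1"}},
    "repo-B": {"hdl:example/2": {"metadata": {"title": "Second item"}, "data": "text of item 2"}},
}


def search(query: str) -> List[str]:
    """Step 1: the search system returns a list of items (here: handles)."""
    return SEARCH_INDEX.get(query.lower(), [])


def resolve(handle: str) -> str:
    """Step 3a: the handle system maps a handle to a repository ID."""
    return HANDLE_SYSTEM[handle]


def retrieve(handle: str) -> dict:
    """Step 3b: the repository returns the digital object for the handle."""
    return REPOSITORIES[resolve(handle)][handle]


def display(obj: dict) -> str:
    """Step 4: render the digital object for the user."""
    return f"{obj['metadata']['title']}: {obj['data']}"


handles = search("Digital Libraries")   # 1. search
chosen = handles[0]                     # 2. select
print(display(retrieve(chosen)))        # 3. retrieve, 4. display
```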
2.2 DL.org Model
Formal DL model (2011). Result of several EU‐funded projects:
– DELOS
Network of Excellence on Digital Libraries 2004‐2008
http://www.delos.info
– DL.org
Coordination Action project, EC 2008‐2011
Digital Library Interoperability, Best Practices & Modelling
Foundations
http://www.dlorg.eu
2.2 Documentation
Set of documents
– Digital Library Manifesto
– Digital Library Reference Model
– Digital Library Technology and Methodology Cookbook
– Digital Library Conformance Checklist
Digital Library Reference Model (very extensive document)
http://www.dlorg.eu/index.php/outcomes/reference‐model
Concise booklets:
– DL Reference Model in Nutshell (16 pages only)
http://www.dlorg.eu/uploads/Booklets/booklet21x21_nutshell_web.pdf
– Digital Library Manifesto
http://www.dlorg.eu/uploads/Booklets/booklet21x21_manifesto_web.pdf
– Digital Library Cookbook
http://www.dlorg.eu/uploads/Booklets/booklet21x21_cookbook.pdf
2.2 DELOS – DL vision
• Digital libraries should enable any citizen to access all
human knowledge anytime and anywhere, in a friendly,
multi‐modal, efficient, and effective way, by overcoming
barriers of distance, language, and culture and by using
multiple Internet‐connected devices
2.2 DL Domains
Example: Resource
Example: Information Object
Example: User
Example: Architecture
Design and implementation of DL
3. Identifiers
3. Identifiers
3.1 Introduction, properties of IDs
3.2 Classic Library IDs
ISBN, ISSN, SICI, …, ISTC, ISNI, …
3.3 Digital IDs
URN, PURL, Handles, DOI, ARK
3.1 Identifiers
• If there is one thing that distinguishes a digital library from a mere web
site, it is that libraries do their best to provide reliable, persistent access
through durable links. (J.A.Kunze, California Digital Library)
• Identifiers
– Unique names
– Basic building blocks keeping/binding things together
• Local x Global identifiers
• Eliminating physical contact = higher need for identification
– Precision
– Reliability
– (machine) Linking
3.1 Properties of identifiers
1. Form
(structured, dumb, computable)
2. Uniqueness (global)
(central / distributed assignment)
3. Persistence
(future validity and interoperability)
4. Resolution (action)
(a machine system that returns the DO for a given ID; clickable)
3.1 Hierarchical system of IDs
No one universal ID for everything => hierarchical system
– Organisations (library)
ISIL International Standard Identifier for Libraries and Related Organizations
– Collection, service
ISCI Intl Standard Collection Identifier
– Author
ISNI Intl Standard Name Identifier
– Work
ISTC Intl Standard Text Code
ISWC Intl Standard Musical Work Code
– Manifestation of work
ISBN Intl Standard Book Number
ISSN Intl Standard Serial Number
ISMN Intl Standard Music Number
– Component/article
SICI Serial Item and Contribution Identifier
DOI Digital Object Identifier
International standards
(mostly ISO)
3.2 Examples: ISBN
• International Standard Book Number ISBN 80‐00‐01987‐6
• Classic library identifier, ISO standard since 1972
• Structured id, fixed length, distributed assignment
• Invented for printed environment – very successful, heavily used, useful
(publishers, business, libraries, citations)
• BUT: Serious problems in digital environment
– Web publishing – rapid increase of id‐requirements
– Exhaustion of available number space!
– ISBN‐13: Temporary remedy (quick fix) ‐‐ ISBN 978‐80‐00‐01987‐1
– New ISBN desperately needed
• It takes a long time to agree on a new global standard
• It will be very costly to implement it
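The phrase "structured id, fixed length" can be made concrete with the check digits. The slides do not spell out the rules, so the sketch below assumes the standard ones: ISBN‐10 uses weights 10 down to 1 and a sum divisible by 11, and ISBN‐13 uses alternating weights 1 and 3 with a sum divisible by 10. The function names are made up for this example.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Check an ISBN-10: weighted sum (weights 10..1) must be divisible by 11."""
    chars = [c for c in isbn if c.isalnum()]  # drop hyphens and spaces
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char in "Xx" and position == 9:
            value = 10  # 'X' stands for 10, allowed only as the check digit
        elif char.isdigit():
            value = int(char)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit from the first 12 digits."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix 978, keep the first nine digits, and recompute the check digit."""
    digits = "".join(c for c in isbn10 if c.isalnum())
    core = "978" + digits[:9]
    return core + str(isbn13_check_digit(core))


print(isbn10_is_valid("80-00-01987-6"))
print(isbn10_to_isbn13("80-00-01987-6"))
```

Because the check digit is recomputed, an ISBN‐13 cannot be obtained by simply prepending 978 to an ISBN‐10.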
3.3 Examples: SICI
• Serial Item and Contribution Identifier – components of a journal issue
• 0730‐9295(199206)11:2<168:CRFAOC>2.0.TX;2‐#
M. Needleman. Computing Resources for an on‐line catalog – 10 years later.
Information technology and libraries. 11(2), June 1992, pp. 168‐175
• Computable id, interesting novel approach
• US ANSI standard since 1996
• BUT: Didn’t gain global acceptance
– Replaced by more successful rival – DOI
3.4 Filling gaps – new IDs
• ISNI – International Standard Name Identifier
– Globally unique identifier for authors, ISO standard since 2012; 9.5 million assigned
– ISNI 0000 0000 7988 7687 (Bartošek, Miroslav)
– RA: Registration Authority (ISNI International Agency)
– RAGs: Registration Agencies (currently 12 – British Library, Bibliothèque
nationale de France, ...)
– ISNI metadata set
• ISTC – International Standard Text Code
– Intellectual works/creations (expressed mainly in textual form)
– ISTC 0A9‐2002‐12B4A105‐6
– RA: International ISTC Agency (2008)
– RAGs: currently 8; 0.2 million ISTC IDs assigned
– Huge and costly task to identify all works worldwide – who will do that?
3.5 Digital IDs – URL, URN, PURL
• URL – most frequently used on the Web as an “identifier”, BUT:
– Uniform Resource Locator – identifies a location, not an object!
– Not persistent (broken links – 404 Not Found)
• URN – conceptually known but not deployed
• PURL – Persistent URL
– Pragmatic solution to improve the persistence of URLs (OCLC)
1. A PURL is a URL
2. A PURL refers to a location on the PURL server that holds a second URL, which in turn refers
to the location where the object resides
[Figure: PURL → location on PURL server → URL → location of object → object]
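The indirection behind PURLs can be illustrated with a lookup table. This is a minimal sketch under stated assumptions: the table, the host names and both function names are invented, and a real PURL server would answer with an HTTP redirect rather than a dictionary lookup.

```python
from typing import Dict

# Hypothetical PURL server table: persistent URL -> current URL of the object.
PURL_TABLE: Dict[str, str] = {
    "https://purl.example.org/docs/report": "https://old-host.example.com/files/report.pdf",
}


def resolve_purl(purl: str) -> str:
    """Return the URL currently registered for a PURL."""
    return PURL_TABLE[purl]


def move_object(purl: str, new_url: str) -> None:
    """When the object moves, only the table entry changes; the PURL stays the same."""
    PURL_TABLE[purl] = new_url


purl = "https://purl.example.org/docs/report"
print(resolve_purl(purl))
move_object(purl, "https://new-host.example.net/archive/report.pdf")
print(resolve_purl(purl))
```

Citations keep using the PURL, so a change of location needs one update on the PURL server instead of a fix in every citing document.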
3.6 Handles
• hdl:cnri.dlib/magazine , http://hdl.handle.net/10338.dmlcz/141708
• Implementation of the handle concept (Kahn‐Wilensky)
(CNRI – Corporation for National Research Initiatives, USA, since 1994)
• Used by DSpace (repository software), DOI (identifier), and many others…
• Main features:
– Independent of the URN concept
– Resolvable (own resolution system independent of DNS used by URL/URN)
• Either a direct resolution by plug‐in in the www‐browser
• or indirect resolution using URL‐proxy
– http://www.handle.net/
3.7 DOI – Digital Object Identifier
The most successful identification system for the digital environment today
• DOI:10.1006/123456
• Initiated by the Association of American Publishers
• Built on handles technology
• Self‐financing system – open, but not free (DOI allocation fee)
• System for identifying any entities (books, articles, research data, …)
• In operation since 2000, ISO standard since 2012
• 140 million allocated DOIs, over 20,000 institutions involved
• RA: International DOI Foundation
• RAGs: 10 currently (CrossRef – scientific articles, DataCite – research datasets)
http://www.doi.org , http://www.crossref.org
3.7 DOI – Digital Object Identifier
• DOI:10.1006/123456
• doi:10.1000/ISBN‐1‐900512‐44‐0
• doi:10.5817/AM2013‐1‐17
• structure:
– prefix (globally unique, assigned to registering organization by a RAG)
– suffix (locally unique string assigned by the RO)
• DOI metadata – to be filled in when registering a DOI in the RAG's register
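The prefix/suffix structure can be shown by splitting a DOI name at the first slash. The sketch assumes that a DOI prefix starts with "10." and that the public proxy at https://doi.org/ is used for resolution; the function names are made up for this example, and the hyphens in the sample DOI are typed as plain ASCII hyphens.

```python
from typing import Tuple
from urllib.parse import quote


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first slash."""
    name = doi.strip()
    if name.lower().startswith("doi:"):
        name = name[4:]
    prefix, separator, suffix = name.partition("/")
    if not separator or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Build a resolvable URL via the doi.org proxy (suffix is percent-encoded)."""
    prefix, suffix = split_doi(doi)
    return "https://doi.org/" + prefix + "/" + quote(suffix, safe="/")


print(split_doi("doi:10.5817/AM2013-1-17"))
print(doi_to_url("doi:10.5817/AM2013-1-17"))
```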
4. Metadata
4. Metadata
4.1 Introduction
4.2 Classic Library Metadata
MARC, UNIMARC
4.3 Digital Metadata
Dublin Core, MODS, METS, RDF, …
4.1 Metadata – introduction
• Metadata = (structured) data about resources
• Metadata consists of statements we make about resources to help us find,
identify, use, manage, evaluate, and preserve them
• 3 basic categories of metadata
– Descriptive – resource description (to find, identify, evaluate):
MARC, Dublin Core, MODS
– Administrative – resource management (technical, administrative,
preservation, rights management, …): PREMIS
– Structural – resource internal structure (parts, hierarchy): METS, RDF
• Metadata schema (standard) – selected set of metadata elements with
a defined meaning for use in a particular area (MARC, Dublin Core, TEI,
MODS, MADS, RDF, PREMIS, ...)
• XML – Markup language (encoding structured documents, e.g. metadata records)
[Figure: Metadata Typology – Domains & Communities]
[Figure: Metadata Typology – Functions & Purposes]
http://jennriley.com/metadatamap/
4.2 MARC Standard Family
• MARC = MAchine Readable Cataloguing record (Library of Congress, 1965)
• General structure of the bibliographic record (descriptive metadata for library
materials – books, serials, audio, video, authorities)
– Internal format in Library management systems
– Exchange format for transfer of records between LMSs
• Widely used – collaboration between libraries and different systems
– Record exchange
– Union catalogues
• Very rich structure (hundreds of elements and subelements!)
• The whole family of MARC‐based standards:
– USMARC, CANMARC, UKMARC, … ‐> MARC21
– UNIMARC (IFLA, 1977, first as a bridge between MARCs, later as a full independent format)
4.2 MARC element/subelement
• A MARC record consists of variable‐length elements
• Each element may be subdivided into subelements (repeatable)
Example element: 700 #1 $aNovák$bJan$f1953-
– 700 = element identifier
– #1 = indicators
– $a, $b, $f = subelement identifiers (each one introduces a subelement)
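The anatomy of a MARC element can be checked with a small parser for the textual display form used on these slides. This is a sketch only: it assumes the "tag, optional indicators, $‐prefixed subelements" layout shown above and that "$" never occurs inside a value. It does not read the binary ISO 2709 exchange format, and the names are made up for this example.

```python
import re
from typing import List, NamedTuple, Tuple


class MarcElement(NamedTuple):
    tag: str                             # element identifier, e.g. "700"
    indicators: str                      # e.g. "#1"; empty if none are shown
    subelements: List[Tuple[str, str]]   # (subelement identifier, value)


# Tag, optional one- or two-character indicators, then the "$..." body.
LINE_PATTERN = re.compile(r"(\d{3})\s+(?:([^$\s]{1,2})\s+)?\$(.*)")


def parse_element(line: str) -> MarcElement:
    """Parse one display line such as '700 #1 $aNovák$bJan$f1953-'."""
    match = LINE_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognised element line: {line!r}")
    tag, indicators, body = match.group(1), match.group(2) or "", match.group(3)
    subelements = [(chunk[0], chunk[1:]) for chunk in body.split("$") if chunk]
    return MarcElement(tag, indicators, subelements)


element = parse_element("700 #1 $aNovák$bJan$f1953-")
print(element.tag, element.indicators)
for code, value in element.subelements:
    print(code, value)
```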
4.2 Example – UNIMARC record
001 CASLIN0000001
005 19960312
010 $a80-7050-237-1
100 $a19960305d1996####k##y0czey0103####ba
101 0# $acze
102 $aCZ
200 1# $aZáznam pro souborný katalog$eUNIMARC$iTištěné monografie
$fPracovní skupina CASLIN pro standardizaci a jmenné ...
205 $a1. vyd.
210 $aPraha$cNárodní knihovna České republiky$d1996
215 $a31 s.
225 1# $aStandardizace$vč. 4
675 $a025.3$9v
711 02 $aCASLIN$bPracovní skupina pro standardizaci a ...
801 #0 $aCZ$bABA001$c19960312$gAACR2$91
801 #3 $aCZ$bABA001$c19960515
910 $aABA001
4.2 MARC Summary
• Detailed cataloguing rules – AACR2, RDA (how to use the format)
• Sophisticated set of tools (LCSH, authority files, …)
• Fragmentation into many format variations
• UNIMARC – more advanced, MARC – more successful (LoC)
• Systematic development (responses to changes)
• Hundreds of millions of existing MARC records worldwide
(OCLC WorldCat – 34 mil) – great legacy/burden
• Expensive creation of records (50‐100 USD/record),
only for highly qualified users
• Very successful but complex format ‐‐ too complicated for wider use!
• For most applications, we need something simpler
4.3 Dublin Core
Motto:
• "The association of standardized descriptive metadata with networked objects has the
potential for substantially improving resource discovery capabilities by enabling field‐based
(e.g., author, title) searches, permitting indexing of non‐textual objects, and allowing access
to the surrogate content that is distinct from access to the content of the resource itself."
(Weibel and Lagoze, 1997)
• MARC maximalist approach
• DC minimalist approach
– Simple
(core for description of resources on the Web)
– Universal
(for any kind of resources)
– Easy to use
(self‐cataloguing by web users)
4.3 Dublin Core – 15 elements
Content        Ownership      Instantiation
Title          Creator        Identifier
Subject        Contributor    Date
Description    Publisher      Language
Coverage       Rights         Format
Type
Source
Relation
Audience (Provenance, Rights Holder)
• Each element is optional and repeatable, and the order of the elements does not matter
• General semantics given for each element (Title = name given to a resource)
• Syntax not given by the standard (recommendations – XML, HTML head, …)
• Qualified version of DC for more precise description
– Creator.Illustrator, Date.Created, Date.Updated, Subject.Abstract
– Date=1994‐04‐12:ISO8601, Subject=5.34:UDC
4.3 Dublin Core in HTML
Guidance on expressing the Dublin Core within the RDF
…
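One common way to embed Dublin Core in an HTML page is a set of meta tags in the document head. The sketch below assumes the widely used convention of names prefixed with "DC." plus a schema.DC link to the element set namespace; the function name and the sample record are invented for illustration.

```python
from html import escape
from typing import List, Tuple

# Namespace of the 15-element Dublin Core element set.
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def dublin_core_meta(elements: List[Tuple[str, str]]) -> str:
    """Render (element, value) pairs as HTML <meta> tags for the page head.

    A list of pairs is used because every element is optional and repeatable.
    """
    lines = [f'<link rel="schema.DC" href="{DC_NAMESPACE}">']
    for name, value in elements:
        content = escape(value, quote=True)  # protect quotes and angle brackets
        lines.append(f'<meta name="DC.{name}" content="{content}">')
    return "\n".join(lines)


# Invented sample record; "creator" appears twice to show repeatability.
record = [
    ("title", "An example resource"),
    ("creator", "First Author"),
    ("creator", "Second Author"),
    ("date", "1994-04-12"),
    ("language", "en"),
]
print(dublin_core_meta(record))
```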
4.4 MODS
• Metadata Object Description Schema (LoC, 2002)
• Compromise between MARC complexity and DC simplicity
(19 top‐elements, 64 optional subelements)
• More accurate and more modern syntax
(defined as an XML schema)
• Granularity and Extensibility
(the level of detail in description; embedding of sub‐resource description
into XML tree)
• Set of tools
(MADS – Metadata Authority Description Schema)
4.4 MODS – 19 top elements
titleInfo note
name subject
typeOfResource classification
genre relatedItem
originInfo identifier
language location
physicalDescription accessCondition
abstract extension
tableOfContents recordInfo
targetAudience
Element attributes:
lang, script, transliteration, …
titleInfo
‐‐ title
‐‐ subTitle
‐‐ partNumber
‐‐ partName
‐‐ nonSort
4.4 Example – MODS record
Hiring and recruitment in academic libraries :
The User Guide
Raschke
Gregory K.
Gregory K. Raschke
text
journal article
Baltimore, Md.
Johns Hopkins University Press
2003
eng
15 p.
…
4.4 MARC – MARCXML – MODS
MARC
[245] 10 $aHelsinki :$ba cultural and literary history /$cNeil Kent
MARCXML
Helsinki
a cultural and literary history
Neil Kent
MODS
Helsinki
a cultural and literary history
Neil Kent
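The same title statement can be expressed in all three forms with a few lines of code. This is a hedged sketch, not a complete converter: it handles only MARC field 245, omits XML namespaces, and assumes the usual mapping of $a to title, $b to subTitle and $c to a statement‐of‐responsibility note. The function names are made up for this example.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_subelements(body: str) -> List[Tuple[str, str]]:
    """Split '$aHelsinki :$ba cultural ...' into (identifier, value) pairs."""
    return [(chunk[0], chunk[1:]) for chunk in body.split("$") if chunk]


def to_marcxml(tag: str, indicators: str, body: str) -> ET.Element:
    """MARCXML keeps the MARC structure: one datafield with subfield children."""
    field = ET.Element("datafield", {"tag": tag, "ind1": indicators[0], "ind2": indicators[1]})
    for code, value in parse_subelements(body):
        ET.SubElement(field, "subfield", {"code": code}).text = value
    return field


def title_to_mods(body: str) -> ET.Element:
    """MODS uses named elements and drops the ISBD punctuation (' :' and ' /')."""
    values = dict(parse_subelements(body))
    mods = ET.Element("mods")
    title_info = ET.SubElement(mods, "titleInfo")
    ET.SubElement(title_info, "title").text = values["a"].rstrip(" :/")
    if "b" in values:
        ET.SubElement(title_info, "subTitle").text = values["b"].rstrip(" :/")
    if "c" in values:
        note = ET.SubElement(mods, "note", {"type": "statement of responsibility"})
        note.text = values["c"]
    return mods


body = "$aHelsinki :$ba cultural and literary history /$cNeil Kent"
print(ET.tostring(to_marcxml("245", "10", body), encoding="unicode"))
print(ET.tostring(title_to_mods(body), encoding="unicode"))
```

MARCXML is a one‐to‐one re‐encoding of the MARC element, while MODS replaces numeric tags with readable element names.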
4.5 METS
• Metadata Encoding and Transmission Standard (LoC, 2001)
• Standard for exchanging digital objects between repositories (DLs)
• An XML schema that packs into one "package" all components of a
complex DO:
– the internal structure of the object
– metadata (descriptive, admin, technical, etc.) for the object and all its components
– source files that comprise the object
• The package can be moved and easily integrated into a new repository
• Example: object = one academic journal
– Complex internal structure: Journal – Volumes – Issues ‐ Articles
– Thousands of metadata records for all components
– Thousands of source files (articles)
4.5 METS – diagram
FF MU 2017, M. Bartošek ‐ Digital Libraries
METS record (XML)
– headings
– Structural map
   book (Id:01)
      chapter1 (Id:01.1)
      chap2 (Id:01.2)
         page53 (Id:01.2.1)
         page54 (Id:01.2.2)
      chap3 (Id:01.3)
      …
      chapN (Id:01.n)
– Descriptive metadata section
   descriptive metadata for Id:01, Id:01.1, Id:01.2, …, Id:01.2.1, Id:01.2.2
– Administrative metadata section
   admin metadata for Id:01, Id:01.1, Id:01.2, …, Id:01.2.1, Id:01.2.2
– File inventory
   p52.tiff, p53.tiff, p54.tiff, …, page348.tiff (linked to Id:01.2.1, Id:01.2.2, …)
4.6 Other metadata schemas
TEI – Text Encoding Initiative
• XML standard for marking up documents
and linguistic texts of any kind (books,
articles, poems, dramas, …) (1987)
• Very extensive (2000 pages)
• TEI‐lite
RDF – Resource Description Framework
• W3C standard for describing resources
on the web using simple machine‐readable
(understandable) statements – triples
• Subject – predicate – object
Hamlet – Author – Shakespeare
Hamlet – Type – tragedy
Hamlet – Date – 1599
Shakespeare – Nationality – British
Shakespeare – Occupation – Writer
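The triple model can be tried out with plain tuples. This sketch stores some of the statements listed above as (subject, predicate, object) tuples and answers simple pattern queries; it deliberately avoids any RDF library, and the function name is made up for this example. In real RDF the subjects and predicates would be URIs.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

TRIPLES: List[Triple] = [
    ("Hamlet", "Author", "Shakespeare"),
    ("Hamlet", "Type", "tragedy"),
    ("Shakespeare", "Nationality", "British"),
    ("Shakespeare", "Occupation", "Writer"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return all triples that agree with the given parts; None matches anything."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Everything stated about Hamlet.
print(match(TRIPLES, subject="Hamlet"))

# Chaining statements: the nationality of the author of Hamlet.
author = match(TRIPLES, subject="Hamlet", predicate="Author")[0][2]
print(match(TRIPLES, subject=author, predicate="Nationality"))
```

Because the object of one statement can be the subject of another, simple statements link into a graph that machines can traverse.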
END OF PART 1
To be continued…