Lasaris

NoSQL databases

  • non-relational databases with a flexible schema
  • often used for big data applications, clusters
  • different storage structure than SQL databases
  • give up constraints/transactions to improve performance
  • low-level interface

NoSQL types

  • key-value
    • Redis, Memcached, Amazon SimpleDB…
  • document (JSON, XML…)
    • CouchDB, Elasticsearch, MongoDB…
  • graph / RDF triple
    • Virtuoso, Neo4j…
  • object
    • Caché, GemStone…

RDF databases / triple store

  • standard data model (RDF)
  • standardized interchange format (N-Triples, N-Quads, XML,…)
  • query language (SPARQL)
  • Linked Data
  • native
    • Apache Jena, Sesame/RDF4J…
  • RDF layer on top of a relational database
    • Virtuoso, IBM DB2…

SPARQL

  • SPARQL Protocol and RDF Query Language
  • W3C Recommendation SPARQL 1.1, March 2013
  • SELECT - values as table
  • CONSTRUCT - extract RDF
  • ASK - true/false
  • DESCRIBE - extract RDF graph
  • inferencing

SPARQL example

Ontology
ex1:FullProfessor  rdfs:subClassOf  ex1:Professor .
ex1:AssistantProfessor  rdfs:subClassOf  ex1:Professor .
ex1:Professor  owl:equivalentClass  ex2:Teacher .
Data
ex1:Bob  rdf:type  ex1:FullProfessor .
ex1:Alice  rdf:type  ex1:AssistantProfessor .
ex2:Mary  rdf:type  ex2:Teacher .
SPARQL query
SELECT ?x
 WHERE {
 ?x rdf:type ex1:Professor
 }
  • no one is directly asserted to be an ex1:Professor, but inferencing over the ontology will find Bob, Alice, and Mary
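
The inference step in this example can be imitated in a few lines of plain Java. This is only a toy sketch, not how a triple store or a SPARQL entailment regime is implemented: the class name `TinyReasoner` and its methods are hypothetical, the subclass statements are hard-coded, and owl:equivalentClass is approximated as a subclass relation in both directions.

```java
import java.util.*;

public class TinyReasoner {
    // Ontology edges "class -> direct superclasses".
    static final Map<String, Set<String>> SUPER = new HashMap<>();
    // Asserted rdf:type facts "individual -> class", in insertion order.
    static final Map<String, String> TYPE = new LinkedHashMap<>();

    static void subClassOf(String sub, String sup) {
        SUPER.computeIfAbsent(sub, k -> new HashSet<>()).add(sup);
    }

    static {
        subClassOf("ex1:FullProfessor", "ex1:Professor");
        subClassOf("ex1:AssistantProfessor", "ex1:Professor");
        // Assumption: equivalence modelled as subclass in both directions.
        subClassOf("ex1:Professor", "ex2:Teacher");
        subClassOf("ex2:Teacher", "ex1:Professor");
        TYPE.put("ex1:Bob", "ex1:FullProfessor");
        TYPE.put("ex1:Alice", "ex1:AssistantProfessor");
        TYPE.put("ex2:Mary", "ex2:Teacher");
    }

    // All classes reachable from cls via superclass edges, including cls itself.
    static Set<String> closure(String cls) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(cls));
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (seen.add(c)) {
                todo.addAll(SUPER.getOrDefault(c, Set.of()));
            }
        }
        return seen;
    }

    // Individuals typed as cls; with infer=false only asserted types count.
    static List<String> instancesOf(String cls, boolean infer) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : TYPE.entrySet()) {
            Set<String> classes = infer ? closure(e.getValue()) : Set.of(e.getValue());
            if (classes.contains(cls)) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(instancesOf("ex1:Professor", false)); // []
        System.out.println(instancesOf("ex1:Professor", true));  // [ex1:Bob, ex1:Alice, ex2:Mary]
    }
}
```

Without inference the query matches nothing, because every individual is typed with a subclass or an equivalent class rather than with ex1:Professor itself.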

XML databases, when to use

  • working with documents or metadata in XML format
  • data format/schema changes over time
  • complex and variable schema
  • queries over document structure

XML database concepts

  • basic element = document
  • documents gathered in collections ("tables")
  • query on document structure
  • output is document, document fragment, or constructed XML
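
The idea of querying document structure and getting fragments back can be tried without any database, using the XPath API that ships with the JDK. The sketch below is an illustration only: it uses XPath 1.0 rather than XQuery, the "collection" is just an in-memory list, and the class name `StructureQuery` and the sample books are made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StructureQuery {
    // Hypothetical "collection": two small book documents (invented sample data).
    static final List<String> BOOKS = List.of(
        "<book><title>First</title><year>1990</year></book>",
        "<book><title>Second</title><year>2001</year></book>");

    // Evaluates the XPath expression against every document in the collection
    // and returns the text content of each matched node.
    static List<String> query(String xpathExpr) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> hits = new ArrayList<>();
        try {
            for (String doc : BOOKS) {
                NodeList nodes = (NodeList) xpath.evaluate(
                    xpathExpr, new InputSource(new StringReader(doc)), XPathConstants.NODESET);
                for (int i = 0; i < nodes.getLength(); i++) {
                    hits.add(nodes.item(i).getTextContent());
                }
            }
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("invalid XPath or malformed document", e);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Structural condition (year = 1990) selects a fragment (the title).
        System.out.println(query("/book[year='1990']/title")); // [First]
    }
}
```

A real XML database would evaluate such expressions over indexed collections instead of re-parsing every document for each query.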

XML database types

  • XML-enabled databases
    • mapping XML data to own data model (relational, object…)
      • character large object
      • fragmented to series of tables/objects
      • stored in XML Type
    • ISO SQL/XML - element construction, data mapping, enhanced SQL with XQuery
    • Oracle, IBM DB2, MS SQL, PostgreSQL
  • native XML databases
    • using XML data model directly

XML database examples

  • open-source
    • eXist, Sedna, BaseX, MonetDB, Oracle/Berkeley DB XML
  • commercial
    • MarkLogic, Virtuoso, Qizx

Interface

XQuery
<titles>{
  for $book in collection("books")/book
  where $book/year = "1990"
  return $book/title
}</titles>
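XQuery Update (eXist-db extension syntax)
update delete collection("books")/book/isbn
XQuery Update (W3C XQuery Update Facility syntax)
delete nodes collection("books")/book/isbn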

Interface, 2

SQL/XML
select
   id, vol, xmlquery('$j/name' passing journal as "j") as name
from
   journals
where
   xmlexists('$j[licence="CreativeCommons"]' passing journal as "j")
  • XSLT - output transformation
  • XML Schema - input validation
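
Both roles, XML Schema for validating input and XSLT for transforming output, can be tried with the JDK's built-in javax.xml.validation and javax.xml.transform packages. The following is a minimal sketch under assumptions: the class name `ValidateAndTransform`, the tiny schema, and the stylesheet are invented for illustration and only loosely mirror the journal element used in the SQL/XML query.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class ValidateAndTransform {
    // Assumed schema: a journal has exactly a name followed by a licence.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='journal'><xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='licence' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Assumed stylesheet: output only the journal name as plain text.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/journal'><xsl:value-of select='name'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Input validation: true if the document conforms to XSD.
    static boolean isValid(String xml) {
        try {
            Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // schema violation or malformed XML
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Output transformation: applies XSLT to the document and returns the result.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String doc = "<journal><name>Acta</name><licence>CreativeCommons</licence></journal>";
        System.out.println(isValid(doc));
        System.out.println(transform(doc));
    }
}
```

In a database setting the schema would typically be registered with a collection or column so that documents are checked on insert, and the stylesheet would be applied to query results.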

Interface, 3

  • XQJ - XQuery API for Java
    • unified query layer between the application and the XML data source
    • prepared statements
    • binding variables
// Pick one vendor implementation of XQDataSource:
XQDataSource xqs = new ExistXQDataSource();
// XQDataSource xqs = new SednaXQDataSource();
xqs.setProperty("serverName", "localhost");
XQConnection conn = xqs.getConnection();
XQExpression xqe = conn.createExpression();
String xqueryString = "for $x in doc('books.xml')//book return $x/title/text()";
XQResultSequence rs = xqe.executeQuery(xqueryString);
// Iterate over the result sequence and print each item
while (rs.next()) {
  System.out.println(rs.getItemAsString(null));
}
conn.close();

Interface, 4

  • XML:DB API: similar in concept to JDBC, an abstract interface to XML databases
    • Driver - access to given database
    • Collection - document collection in database
    • Services - support database features, e.g. XPathQueryService, XUpdateQueryService
    • Resource - data stored in database
    • ResourceSet - data as result of query

Benchmark

  • compare database performance
  • mostly XQuery speed, less often Update
  • data generator (up to GBs) and a set of XQueries
  • XMark, XBench, XMach-1
  • TPoX - complex database testing, XQuery and SQL/XML, indexing, XML Schema, XQuery Update
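
The general shape of such a benchmark, a data generator plus a fixed set of queries that are timed, can be sketched with JDK classes alone. This is a toy illustration and not a stand-in for XMark or TPoX: the class name `MiniBench` is hypothetical, the generated data is synthetic, XPath 1.0 is used instead of XQuery, and the measured time includes parsing the document.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class MiniBench {
    // Data generator: one document with n synthetic <book> elements,
    // with years cycling through 1990..1999.
    static String generate(int n) {
        StringBuilder sb = new StringBuilder("<books>");
        for (int i = 0; i < n; i++) {
            sb.append("<book><title>T").append(i).append("</title><year>")
              .append(1990 + i % 10).append("</year></book>");
        }
        return sb.append("</books>").toString();
    }

    // Runs one counting query and returns {result, elapsed nanoseconds}.
    // The elapsed time covers parsing plus evaluation.
    static long[] run(String xml, String countExpr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            long start = System.nanoTime();
            Double count = (Double) xpath.evaluate(
                countExpr, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
            return new long[] { count.longValue(), System.nanoTime() - start };
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = generate(10_000);
        // The "query set": each query is executed once and timed.
        for (String q : List.of("count(/books/book)", "count(/books/book[year=1990])")) {
            long[] r = run(xml, q);
            System.out.printf("%-32s result=%d time=%.1f ms%n", q, r[0], r[1] / 1e6);
        }
    }
}
```

A serious benchmark would add warm-up runs, repeat each query many times, and vary the data size, which is what the published suites listed above standardize.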