NoSQL databases

  • non-relational databases, flexible schema

  • often used for big data applications, clusters

  • different storage structure than SQL databases

  • often relax constraints/transactions to improve performance

  • low-level interface

NoSQL types

  • key-value

    • Redis, Memcached, Amazon SimpleDB…

  • document (JSON, XML…)

    • CouchDB, Elasticsearch, MongoDB…

  • graph / RDF triple

    • Virtuoso, Neo4j…

  • object

    • Caché, GemStone…
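
The difference between the key-value and the document model can be sketched with plain Java collections. This is only an illustration under stated assumptions: the class `NoSqlModelsSketch`, the method `findIdsByField` and the sample data are hypothetical, and in-memory maps stand in for a real store such as Redis or MongoDB.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-ins for a key-value store and a document store.
class NoSqlModelsSketch {

    // Document model: values are structured, so the store can filter on fields inside them.
    static List<String> findIdsByField(Map<String, Map<String, Object>> docs,
                                       String field, Object value) {
        return docs.entrySet().stream()
                .filter(e -> value.equals(e.getValue().get(field)))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Key-value model: the value is opaque, the only access path is the key.
        Map<String, String> keyValue = new HashMap<>();
        keyValue.put("session:42", "alice");
        System.out.println(keyValue.get("session:42"));

        // Document model: the same lookup by id works, plus queries on content.
        Map<String, Map<String, Object>> docs = new HashMap<>();
        docs.put("b1", Map.of("title", "XML Basics", "year", 1990));
        docs.put("b2", Map.of("title", "Graph Data", "year", 2005));
        System.out.println(findIdsByField(docs, "year", 1990));
    }
}
```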

RDF databases / triple stores

  • standard data model (RDF)

  • standardized interchange formats (N-Triples, N-Quads, RDF/XML, …)

  • query language (SPARQL)

  • Linked Data

  • native

    • Apache Jena, Sesame/RDF4J…

  • RDF layer to relational database

    • Virtuoso, IBM DB2…

SPARQL

  • SPARQL Protocol and RDF Query Language

  • W3C Recommendation SPARQL 1.1, March 2013

  • SELECT - values as table

  • CONSTRUCT - build an RDF graph from a template

  • ASK - true/false

  • DESCRIBE - extract an RDF graph describing a resource

  • inferencing

SPARQL example

Ontology
ex1:FullProfessor  rdfs:subClassOf  ex1:Professor .
ex1:AssistantProfessor  rdfs:subClassOf  ex1:Professor .
ex1:Professor  owl:equivalentClass  ex2:Teacher .
Data
ex1:Bob  rdf:type  ex1:FullProfessor .
ex1:Alice  rdf:type  ex1:AssistantProfessor .
ex2:Mary  rdf:type  ex2:Teacher .
SPARQL query
SELECT ?x
 WHERE {
 ?x rdf:type ex1:Professor
 }
  • no one is explicitly typed as ex1:Professor, but inferencing will find Bob, Alice, and Mary
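
How inferencing arrives at Bob, Alice and Mary can be sketched in plain Java. The sketch assumes the triples from this slide, written with the standard `rdfs:subClassOf` property, and handles only subclass and equivalent-class reasoning. The class `SubclassInferenceSketch` and its methods are hypothetical and not the API of any triple store.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SubclassInferenceSketch {
    static final String TYPE = "rdf:type";
    static final String SUBCLASS = "rdfs:subClassOf";
    static final String EQUIVALENT = "owl:equivalentClass";

    // Ontology and data from the slide as {subject, predicate, object} triples.
    static final List<String[]> TRIPLES = List.of(
        new String[] {"ex1:FullProfessor", SUBCLASS, "ex1:Professor"},
        new String[] {"ex1:AssistantProfessor", SUBCLASS, "ex1:Professor"},
        new String[] {"ex1:Professor", EQUIVALENT, "ex2:Teacher"},
        new String[] {"ex1:Bob", TYPE, "ex1:FullProfessor"},
        new String[] {"ex1:Alice", TYPE, "ex1:AssistantProfessor"},
        new String[] {"ex2:Mary", TYPE, "ex2:Teacher"});

    // The target class plus every class whose instances are also instances of it.
    static Set<String> classesImplying(String target, List<String[]> triples) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        seen.add(target);
        todo.add(target);
        while (!todo.isEmpty()) {
            String current = todo.remove();
            for (String[] t : triples) {
                String next = null;
                if (t[1].equals(SUBCLASS) && t[2].equals(current)) next = t[0];
                else if (t[1].equals(EQUIVALENT) && t[2].equals(current)) next = t[0];
                else if (t[1].equals(EQUIVALENT) && t[0].equals(current)) next = t[2];
                if (next != null && seen.add(next)) todo.add(next);
            }
        }
        return seen;
    }

    // Counterpart of: SELECT ?x WHERE { ?x rdf:type <target> }
    static Set<String> instancesOf(String target, List<String[]> triples, boolean infer) {
        Set<String> classes = infer ? classesImplying(target, triples) : Set.of(target);
        Set<String> result = new TreeSet<>();
        for (String[] t : triples) {
            if (t[1].equals(TYPE) && classes.contains(t[2])) result.add(t[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(instancesOf("ex1:Professor", TRIPLES, false)); // []
        System.out.println(instancesOf("ex1:Professor", TRIPLES, true));  // [ex1:Alice, ex1:Bob, ex2:Mary]
    }
}
```

A real triple store would typically apply such rules through a reasoner or an entailment regime instead of hand-written traversal.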

XML databases, when to use

  • working with documents or metadata in XML format

  • data format/schema changes over time

  • complex and variable schema

  • queries on document structure

XML database concepts

  • basic element = document

  • documents gathered in collections ("tables")

  • query on document structure

  • output is document, document fragment, or constructed XML
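
A query on document structure that returns document fragments can be tried without any database, using the XPath API shipped with the JDK. The sketch below is only an approximation of what an XML database does: the class `StructureQuerySketch` and the sample `books` document are hypothetical, and a single in-memory string stands in for a collection.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StructureQuerySketch {
    // Hypothetical sample document standing in for a collection of books.
    static final String BOOKS =
        "<books>"
      + "<book><title>XML Basics</title><year>1990</year><isbn>111</isbn></book>"
      + "<book><title>Graph Data</title><year>2005</year></book>"
      + "<book><title>Old Queries</title><year>1990</year></book>"
      + "</books>";

    // Evaluates an XPath expression and returns the text of each selected node.
    static List<String> select(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad expression or document", e);
        }
    }

    public static void main(String[] args) {
        // The condition is on structure (child element year), the result is a fragment (title).
        System.out.println(select(BOOKS, "/books/book[year='1990']/title")); // [XML Basics, Old Queries]
    }
}
```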

XML database types

  • XML-enabled databases

    • mapping XML data to own data model (relational, object…)

      • character large object

      • fragmented to series of tables/objects

      • stored in a dedicated XML data type

    • ISO SQL/XML - element construction, data mapping, enhanced SQL with XQuery

    • Oracle, IBM DB2, MS SQL Server, PostgreSQL

  • native XML databases

    • using XML data model directly
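
For XML-enabled databases, the "fragmented to a series of tables" mapping can be sketched as follows. This is a simplified illustration: the class `ShreddingSketch`, the target table `BOOK(id, title, year)` and the sample document are assumptions, and a real product would derive the mapping from a schema or annotations.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ShreddingSketch {
    // Turns each <book> element into one row {id, title, year} of a hypothetical BOOK table.
    static List<String[]> shred(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList books = doc.getElementsByTagName("book");
            List<String[]> rows = new ArrayList<>();
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                rows.add(new String[] {
                    book.getAttribute("id"), textOf(book, "title"), textOf(book, "year")});
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    // Text of the first descendant element with the given tag, or null if there is none.
    static String textOf(Element parent, String tag) {
        NodeList hits = parent.getElementsByTagName(tag);
        return hits.getLength() == 0 ? null : hits.item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<books>"
            + "<book id='b1'><title>XML Basics</title><year>1990</year></book>"
            + "<book id='b2'><title>Graph Data</title></book>"
            + "</books>";
        for (String[] row : shred(xml)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The trade-off against character large object storage is visible here: shredded rows are easy to query with plain SQL, but anything the mapping does not cover (element order, mixed content, unexpected elements) is lost.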

XML database examples

  • open-source

    • eXist, Sedna, BaseX, MonetDB/XQuery, Oracle Berkeley DB XML

  • commercial

    • MarkLogic, Virtuoso, Qizx

Interface

XQuery
<titles>{
  for $book in collection("books")/book
  where $book/year = "1990"
  return $book/title
}</titles>
XQuery Update
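(: eXist-db syntax; the W3C XQuery Update Facility form is: delete nodes collection("books")/book/isbn :)
update delete collection("books")/book/isbn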

Interface, 2

SQL/XML
select
   id, vol, xmlquery('$j/name' passing journal as "j") as name
from
   journals
where
   xmlexists('$j[licence="CreativeCommons"]' passing journal as "j")
  • XSLT - output transformation

  • XML Schema - input validation
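
Input validation with XML Schema and output transformation with XSLT can both be tried with the JAXP APIs in the JDK. The sketch below is a minimal illustration under stated assumptions: the class `ValidateAndTransformSketch`, the schema, the stylesheet and the `journal` document structure are made up for this example and only loosely follow the query above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ValidateAndTransformSketch {
    // Hypothetical schema: a journal has exactly one name followed by one licence.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='journal'><xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='licence' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Hypothetical stylesheet: renders a journal as one line of plain text.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/journal'>"
      + "<xsl:value-of select='name'/> (<xsl:value-of select='licence'/>)"
      + "</xsl:template></xsl:stylesheet>";

    // Input validation: true only if the document conforms to XSD.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed or violates the schema (a broken XSD also ends up here)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Output transformation: applies XSLT to the document and returns the text result.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String journal = "<journal><name>Open Data</name><licence>CreativeCommons</licence></journal>";
        System.out.println(isValid(journal));
        System.out.println(isValid("<journal><name>No licence</name></journal>"));
        System.out.println(transform(journal));
    }
}
```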

Interface, 3

  • XQJ - XQuery API for Java

    • unified query layer between application and XML Datasource

    • prepared statements

    • binding variables

import javax.xml.xquery.*;

// Pick one data source; the driver classes come from the database vendor's XQJ library.
XQDataSource xqs = new ExistXQDataSource();
// XQDataSource xqs = new SednaXQDataSource();
xqs.setProperty("serverName", "localhost");
XQConnection conn = xqs.getConnection();
XQExpression xqe = conn.createExpression();
String xqueryString = "for $x in doc('books.xml')//book return $x/title/text()";
XQResultSequence rs = xqe.executeQuery(xqueryString);
// Print each result item serialized as a string.
while (rs.next())
  System.out.println(rs.getItemAsString(null));
conn.close();

Interface, 4

  • XML:DB

  • similar concept to JDBC, abstract interface to XML database

    • Driver - access to given database

    • Collection - document collection in database

    • Services - support database features, e.g. XPathQueryService, XUpdateQueryService

    • Resource - data stored in database

    • ResourceSet - data as result of query

Benchmark

  • compare database performance

  • mostly measure XQuery query speed, less often updates

  • data generator (up to GBs) and a set of XQueries

  • XMark, XBench, XMach-1

  • TPoX (Transaction Processing over XML) - complex database testing: XQuery and SQL/XML, indexing, XML Schema, XQuery Update
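
The basic shape of such a benchmark, a data generator plus a fixed set of queries that are timed, can be sketched in a few lines. This is a toy illustration and not how XMark or TPoX are implemented: the class `MiniBenchmarkSketch`, the generated document structure and the two queries are assumptions, and the JDK XPath engine stands in for a database.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class MiniBenchmarkSketch {
    // Data generator: n books with years cycling through 1980..2019.
    static String generate(int n) {
        StringBuilder sb = new StringBuilder("<books>");
        for (int i = 0; i < n; i++) {
            sb.append("<book><title>Book ").append(i).append("</title><year>")
              .append(1980 + i % 40).append("</year></book>");
        }
        return sb.append("</books>").toString();
    }

    // Runs one counting query; each call also re-parses the document.
    static int count(String xml, String countExpression) {
        try {
            Double n = (Double) XPathFactory.newInstance().newXPath().evaluate(
                countExpression, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad expression or document", e);
        }
    }

    public static void main(String[] args) {
        String data = generate(10_000);
        List<String> queries = List.of(
            "count(/books/book)",
            "count(/books/book[year='1990'])");
        for (String query : queries) {
            long start = System.nanoTime();
            int hits = count(data, query);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(query + " -> " + hits + " hits in " + millis + " ms");
        }
    }
}
```

A single timed run like this is dominated by parsing and JIT warm-up, which is one reason real benchmarks fix the data size, repeat the queries and report aggregated results.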