Document-oriented databases
Martin Hrdlička

A database is not a synonym for a relational database.

NoSQL
* Scalability
* Distribution
* Performance
* Flexibility
* Domain complexity

Stores
Categories: key-value, graph-oriented, column-oriented, document
Examples: SimpleDB, CouchDB, MongoDB, VertexDB, Neo4j, InfoGrid, RDF, Project Voldemort, Lotus Notes, Riak, Cassandra, Redis, XML, HBase, Dynomite, BigTable, Dryad, LucidDB, FluidDB, Tokyo Cabinet, Scalaris

What is a document?
A document (noun) is a bounded physical representation of a body of information designed with the capacity (and usually intent) to communicate.
Source: wikipedia.org

A document is a structure
* Author, Title, Text, ..., Chapters, Paragraphs
* First name, Last name, Recipient, Stamps, Text, ...
* Name, Address, Price, Picture, Postcode, Street, Town, Country, ...

Document-oriented stores
* CouchDB
* MongoDB
* SimpleDB
* Riak
* Lotus Notes
* XML databases

XML, JSON, YAML, ...

XML vs. JSON
The same record as JSON:

    {
      "title": "Presentation of document-oriented database systems",
      "presenter": {
        "uco": 208297,
        "name": "Martin Hrdlicka"
      },
      "session": {
        "start": "26.11.2009 12:00",
        "end": "26.11.2009 13:00"
      },
      "tags": ["couchdb", "mongodb", "document databases"]
    }

and as XML, carrying the same values: Presentation of document-oriented database systems; 208297; Martin Hrdlicka; 26.11.2009 12:00; 26.11.2009 13:00; couchdb, mongodb, document databases

What are document-oriented stores and why should we use them?
* Schema-less
* Distribution
* Scalability
* Replication

CouchDB
* Robust, highly concurrent, fault-tolerant
* HTTP protocol and REST API
* MapReduce system for querying
* Incremental replication
* P2P and multi-master replication
* ACID, MVCC
* Modular system, Futon web interface

Couch Document

    {
      "_id": "e34ae5e9ff56453e81351d7cdf51fd58",
      "_rev": "2-967a00dff5e02add41819138abb3284d",
      "_attachments": {
        "reigatexslt2.xml": {
          "stub": true,
          "content_type": "text/xml",
          "length": 7687,
          "revpos": 2
        }
      },
      "title": "Presentation of document-oriented database systems",
      "presenter": {
        "uco": 208297,
        "name": "Martin Hrdlicka"
      },
      "session": {
        "start": "26.11.2009 12:00",
        "end": "26.11.2009 13:00"
      },
      "tags": ["couchdb", "mongodb", "document databases"]
    }

* Semi-structured data
* JSON
* Revisions
* Binary attachments
* Full updates (an update replaces the whole document)

Couch Views
Map:

    function(doc) {
      if (doc.tags) {
        doc.tags.forEach(function(tag) {
          emit(tag, 1);
        });
      }
    }

Reduce:

    function(keys, values, rereduce) {
      return sum(values);
    }

Documents:

    { ..., "tags": ["mongodb", "couchdb"] },
    { ..., "tags": ["couchdb"] }

Result (grouped by key):

    { "couchdb": 2, "mongodb": 1 }

* Static and predefined
* Incremental indexing
* Stored in design documents (ID prefix _design/)

Close to the Metal
HTTP REST API:

    Create   PUT    /my_db/doc_id
    Read     GET    /my_db/doc_id
    Update   PUT    /my_db/doc_id   (body carries the current _rev)
    Delete   DELETE /my_db/doc_id?rev=<current _rev>

Standard HTTP tools can be used for:
* load balancing
* clustering

Who uses it?

MongoDB
* Between key-value stores and traditional relational databases
* Binary protocol
* Dynamic queries
* Fail-over, replication
* Indexes on inner objects (embedded documents)
* Auto-sharding, GridFS

Binary JSON (BSON)
* JSON-like format
* Data types (String, Date, Integer, ...)
* Binary data (max. 4 MB; larger files are stored with GridFS)
* Dynamic queries

Mongo document

Mongo Queries
* Dynamic
* Based on JSON-like syntax
* Querying inner objects

Conditional Operators: <, <=, >, >=

    // field > value
    db.collection.find({ "field": { $gt: value } });
    // field < value
    db.collection.find({ "field": { $lt: value } });
    // field >= value
    db.collection.find({ "field": { $gte: value } });
    // field <= value
    db.collection.find({ "field": { $lte: value } });

Value in an Embedded Object

    db.collection.find({ "presenter.name": "Martin Hrdlicka" });

limit()

    db.students.find().limit(10);

Who uses it?

CouchDB vs. MongoDB vs. MySQL
Source: http://www.mongodb.org/display/DOCS/MongoDB,+CouchDB,+MySQL+Compare+Grid

Use cases
* Web applications
* Analytics tools
* CRM, warehouse
* Caching
* etc.

Real World?
Source: http://github.com/igal/ruby_datastores/raw/master/2009-11-14%20Non-relational%20data%20stores%20for%20OpenSQL%20Camp.pdf

Your use case is more important here!
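The Couch Views slide can be tried outside CouchDB by running its map and reduce functions over the two example documents. Inside CouchDB, emit() and sum() are supplied by the JavaScript query server; here they are hypothetical stand-ins, as are the helper queryView and the document IDs doc1/doc2. This is a sketch that only approximates a view query with group=true: rows are grouped by key, and keys are sorted with a plain string sort rather than CouchDB's full collation.

```javascript
// Hypothetical stand-ins for helpers that CouchDB's query server provides.
let emitted = [];
let currentDocId = null;

function emit(key, value) {
  emitted.push({ key, id: currentDocId, value });
}

function sum(values) {
  return values.reduce((a, b) => a + b, 0);
}

// Map function from the slide: one row (tag, 1) per tag.
const map = function (doc) {
  if (doc.tags) {
    doc.tags.forEach(function (tag) {
      emit(tag, 1);
    });
  }
};

// Reduce function from the slide. CouchDB passes keys as [key, docId]
// pairs; rereduce is true when it combines earlier reduce results.
const reduce = function (keys, values, rereduce) {
  return sum(values);
};

// Approximates querying the view with group=true: map every document,
// group the emitted rows by key, then reduce each group.
function queryView(docs) {
  emitted = [];
  for (const doc of docs) {
    currentDocId = doc._id;
    map(doc);
  }
  const groups = new Map();
  for (const row of emitted) {
    if (!groups.has(row.key)) groups.set(row.key, []);
    groups.get(row.key).push(row);
  }
  const result = {};
  for (const key of [...groups.keys()].sort()) {
    const rows = groups.get(key);
    result[key] = reduce(
      rows.map((r) => [r.key, r.id]),
      rows.map((r) => r.value),
      false
    );
  }
  return result;
}

// The two documents from the slide (other fields omitted).
const docs = [
  { _id: "doc1", tags: ["mongodb", "couchdb"] },
  { _id: "doc2", tags: ["couchdb"] },
];

console.log(queryView(docs)); // { couchdb: 2, mongodb: 1 }
```

Because the map emits 1 per tag and the reduce only adds numbers, summing partial sums gives the same answer as summing the original values, which is why the slide's reduce can ignore the rereduce flag.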
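The Close to the Metal slide maps each CRUD operation to a single HTTP request. The sketch below only builds those requests, plus a small send() wrapper; the server address http://localhost:5984, the names couchRequest and send, and the reliance on the global fetch() of Node.js 18+ are assumptions for illustration. It also makes one MVCC detail explicit: updates and deletes must name the revision (_rev) they are based on.

```javascript
// Assumed server address; 5984 is CouchDB's default HTTP port.
const COUCH_URL = "http://localhost:5984";

// Build the HTTP request for one CRUD operation on a plain (non-design)
// document. Nothing is sent here, so the mapping can be inspected offline.
function couchRequest(op, db, id, doc = null, rev = null) {
  const url = `${COUCH_URL}/${encodeURIComponent(db)}/${encodeURIComponent(id)}`;
  switch (op) {
    case "create": // PUT /my_db/doc_id with the new document as JSON
      return { method: "PUT", url, body: JSON.stringify(doc) };
    case "read": // GET /my_db/doc_id
      return { method: "GET", url };
    case "update": // PUT again, naming the revision being replaced (MVCC)
      if (!rev) throw new Error("update needs the current _rev");
      return { method: "PUT", url, body: JSON.stringify({ ...doc, _rev: rev }) };
    case "delete": // DELETE also needs the current revision
      if (!rev) throw new Error("delete needs the current _rev");
      return { method: "DELETE", url: `${url}?rev=${encodeURIComponent(rev)}` };
    default:
      throw new Error(`unknown operation: ${op}`);
  }
}

// Send a built request with the global fetch() (Node.js 18+ or a browser)
// and return CouchDB's JSON reply.
async function send(req) {
  const res = await fetch(req.url, {
    method: req.method,
    headers: { "Content-Type": "application/json" },
    body: req.body,
  });
  return res.json();
}
```

For example, `await send(couchRequest("update", "my_db", "doc_id", doc, currentRev))` would write a new revision; if another client has already replaced currentRev, CouchDB answers 409 Conflict instead of silently overwriting the newer document.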
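The Mongo query slides rely on a few rules: dotted paths reach into embedded objects, $gt/$gte/$lt/$lte compare field values, and limit() caps the number of results. The sketch below imitates just those rules over an in-memory array so they can be tried without a server. The functions getPath, matches and find and the two extra sample documents are hypothetical; this is not how the MongoDB server evaluates queries (array matching, type ordering and other operators are left out).

```javascript
// Hypothetical in-memory imitation of the query rules on the slides.
// It is NOT MongoDB: only equality, $gt, $gte, $lt, $lte, dotted paths
// and a limit are supported.

// Follow a dotted path such as "presenter.name" into embedded objects.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((obj, key) => (obj == null ? undefined : obj[key]), doc);
}

const operators = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

// A document matches when every field condition in the query holds.
function matches(doc, query) {
  return Object.entries(query).every(([path, cond]) => {
    const value = getPath(doc, path);
    if (cond !== null && typeof cond === "object") {
      // Operator object, e.g. { $gte: 10, $lt: 20 }; all must hold.
      return Object.entries(cond).every(([op, operand]) => {
        if (!Object.prototype.hasOwnProperty.call(operators, op)) {
          throw new Error(`unsupported operator: ${op}`);
        }
        // A missing field never satisfies a comparison.
        return value !== undefined && operators[op](value, operand);
      });
    }
    return value === cond; // plain equality
  });
}

// Rough equivalent of db.collection.find(query).limit(n).
function find(collection, query, limit = Infinity) {
  return collection.filter((doc) => matches(doc, query)).slice(0, limit);
}

// The presentation record from the slides plus two hypothetical documents.
const talks = [
  {
    title: "Presentation of document-oriented database systems",
    presenter: { uco: 208297, name: "Martin Hrdlicka" },
  },
  { title: "Talk B", presenter: { uco: 100001, name: "Presenter B" } },
  { title: "Talk C", presenter: { uco: 300002, name: "Presenter C" } },
];

// Like db.talks.find({ "presenter.name": "Martin Hrdlicka" })
console.log(find(talks, { "presenter.name": "Martin Hrdlicka" }).length); // 1

// Like db.talks.find({ "presenter.uco": { $gt: 200000 } }).limit(1)
console.log(find(talks, { "presenter.uco": { $gt: 200000 } }, 1)[0].title);
// Presentation of document-oriented database systems
```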
You should start by asking yourself about your:
* Philosophy
* Design
* Strategy
* Requirements/Goals
* Product

Useful links
* Apache CouchDB: http://couchdb.apache.org
* CouchDB: The Definitive Guide: http://books.couchdb.org/relax/
* Interactive CouchDB: http://labs.mudynamics.com/wp-content/uploads/2009/04/icouch.html
* MongoDB: http://www.mongodb.org
* MongoMapper (ODM tool for Ruby): http://github.com/jnunemaker/mongomapper
* Why I think Mongo is to Databases what Rails was to Frameworks: http://railstips.org/2009/12/18/why-i-think-mongo-is-to-databases-what-rails-was-to-frameworks
* NoSQL: http://en.wikipedia.org/wiki/NoSQL
* "Relational Databases", Communications of the ACM 35(4), April 1992, pp. 16, 18: http://home.pipeline.com/~hbaker1/letters/CACM-RelationalDatabases.html
* Anti-RDBMS: A list of distributed key-value stores: http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/
* NoSQL East: http://nosqleast.com
* NoSQL Discussion: http://groups.google.com/group/nosql-discussion
* NoSQL Databases: http://nosql-databases.org/

Questions?

Goodbye!