Principles of NoSQL Databases: Data Model, Distribution & Consistency
Lecture 3 of NoSQL Databases (PA195)
David Novak & Vlastislav Dohnal, Faculty of Informatics, Masaryk University, Brno
http://disa.fi.muni.cz/vlastislav-dohnal/teaching/nosql-databases-fall-2019/

Agenda
● Fundamentals of RDBMS and NoSQL Databases
● Data Model of Aggregates
● Models of Data Distribution
  ○ scalability, sharding
  ○ replication: master-slave, peer-to-peer
  ○ combination
● Consistency
  ○ write-write vs. read-write conflict
  ○ strategies and techniques
  ○ relaxing consistency

Fundamentals of RDBMS
Relational Database Management Systems (RDBMS)
1. Data structures are broken into the smallest units
  ○ normalization of the database schema (3NF, BCNF)
  ○ because the data structure is known in advance
  ○ and users/applications query the data in different ways
  ○ the database schema is rigid
2. Queries merge the data from different tables
3. Write operations are simple, search can be slower
4. Strong guarantees for transactional processing

From RDBMS to NoSQL
Efficient implementations of table joins and of transactional processing require a centralized system.
NoSQL Databases:
● Database schema tailored for a specific application
  ○ keeps together data pieces that are often accessed together
● Write operations might be slower, but reads are fast
● Weaker consistency guarantees => efficiency and horizontal scalability

Data Model
● The model by which the database organizes data
● Each NoSQL DB type has a different data model
  ○ Key-value, document, column-family, graph
  ○ The first three are aggregate-oriented
● Let us have a look at the classic relational model

Example (1): UML Model
source: Holubová, Kosek, Minařík, Novák. Big Data a NoSQL databáze. 2015.

Example (2): Relational Model
source: Holubová, Kosek, Minařík, Novák. Big Data a NoSQL databáze. 2015.

Aggregates
An aggregate
● A data unit with a complex structure
  ○ Not simply a tuple (a table row) as in an RDBMS
● A collection of related objects treated as a unit
  ○ a unit for data manipulation and consistency management
● The relational model is aggregate-ignorant
  ○ This is not a bad thing, it is a feature
  ○ It allows the data to be looked at easily in different ways
  ○ Best choice when there is no primary structure for data manipulation

Example (3): Aggregates
source: Holubová, Kosek, Minařík, Novák. Big Data a NoSQL databáze. 2015.
Example (4): Aggregates
// collection "Order"
{
  "orderNumber": 11,
  "date": "2015-04-01",
  "customerID": 1,
  "orderItems": [
    { "productID": 111, "name": "Vacuum cleaner ETA E1490", "quantity": 1, "price": 1300 },
    { "productID": 112, "name": "Dust bag for ETA E1490", "quantity": 10, "price": 300 }
  ],
  "invoice": { "bankAccount": …, … }
}

// collection "Customer"
{
  "customerID": 1,
  "name": "Jan Novák",
  "address": { "city": "Praha", "street": "Krásná 5", "ZIP": "111 00" }
}

// collection "Invoice"
{
  "invoiceID": 2015003,
  "orderNumber": 11,
  "bankAccount": "64640439/0100",
  "paymentDate": "2015-04-16",
  "address": { "city": "Brno", "street": "Slunečná 7", "ZIP": "602 00" }
}

NoSQL Databases: Aggregate-oriented
Many NoSQL stores are aggregate-oriented:
○ There is no general strategy for setting aggregate boundaries
○ Aggregates give the database information about which bits of data will be manipulated together
  ■ i.e., what should be stored on the same node
○ They minimize the number of nodes accessed during a search
○ Impact on concurrency control:
  ■ NoSQL databases typically support atomic manipulation of a single aggregate at a time
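To make the "unit of manipulation" idea concrete, here is a small illustrative sketch (not part of the original slides): the Order aggregate from Example (4) written and read back as a single document. It assumes a locally running MongoDB accessed through the pymongo driver; the database name "eshop" is made up for the example.

# Sketch: the whole order is one aggregate in a document store (assumed setup).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["eshop"]                      # hypothetical database name

order = {
    "orderNumber": 11,
    "date": "2015-04-01",
    "customerID": 1,
    "orderItems": [
        {"productID": 111, "name": "Vacuum cleaner ETA E1490", "quantity": 1, "price": 1300},
        {"productID": 112, "name": "Dust bag for ETA E1490", "quantity": 10, "price": 300},
    ],
}

# The whole aggregate is written as one document ...
db.orders.insert_one(order)

# ... and read back with a single request, typically from a single node,
# with no joins across tables.
same_order = db.orders.find_one({"orderNumber": 11})

Note that an update touching only this document can be handled atomically, in line with the single-aggregate atomicity mentioned above.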
Scalability of Database Systems
● Scalability = handling growing amounts of data and queries without losing performance
Two general approaches:
● vertical scalability
● horizontal scalability

Vertical Scalability (Scaling Up)
● Involves larger and more powerful machines
  ○ large disk storage using disk arrays
  ○ massively parallel architectures
  ○ large main memories
● Traditional choice
  ○ in favour of strong consistency
  ○ very simple to realize (no handling of data distribution)
● Works in many cases, but…

Vertical Scalability: Drawbacks
● Higher costs
  ○ Large machines cost more than equivalent commodity HW
● Data growth limit
  ○ A large machine works well until the data grows to fill it
  ○ Even the largest of machines has a limit
● Proactive provisioning
  ○ At the beginning, there is no idea of the final scale of the application
  ○ An upfront budget is needed when scaling vertically
● Vendor lock-in
  ○ Large machines are produced by a few vendors
  ○ The customer is dependent on a single vendor (proprietary HW)

Horizontal Scalability (Scaling Out)
The system is distributed across multiple machines/nodes
● Commodity machines, cost-effective
● Provides higher scalability than the vertical approach
  ○ Data is partitioned over many disks
  ○ The application can use the main memory of all machines
  ○ Distributed computation model
● Introduces new problems:
  ○ synchronization, consistency, handling of partial failures, etc.

Horizontal Scalability: Fallacies
● Typical false assumptions of distributed computing:
  ○ The network is reliable
  ○ Latency is zero
  ○ Bandwidth is infinite
  ○ The network is secure
  ○ The network is homogeneous
  ○ The topology of the network does not change
  ○ There is one network administrator
source: https://blogs.oracle.com/jag/resource/Fallacies.html

Distribution Models: Overview
● Horizontal scalability = scaling out
● Two generic ways of data distribution:
  ○ Replication – the same data is copied over multiple nodes
    ■ master-slave vs. peer-to-peer
  ○ Sharding – different data chunks are put on different nodes (data partitioning)
    ■ master-master
● We can use either or combine them
  ○ Distribution models = specific ways to do sharding, replication, or a combination of both

Distribution Model: Single Server
● Running the database on a single machine is always the preferred scenario
  ○ it spares us a lot of problems
● It can make sense to use a NoSQL database on a single server
  ○ Other advantages remain: flexible data model, simplicity
  ○ Graph databases: if the graph is “almost” complete, it is difficult to distribute

Sharding (Data Partitioning)
● Placing different parts of the data (the card suits in the figure) onto different servers
● Applicability: different clients access different parts of the dataset
source: Sadalage & Fowler: NoSQL Distilled, 2012

Distribution Models: Sharding (2)
We should try to ensure that
1. Data accessed together is kept together
  ○ So that a user gets all data from a single server
  ○ The aggregate data model helps achieve this
2. The data is arranged on the nodes so as to:
  ○ Keep the load balanced (the load can change over time)
  ○ Consider the physical location (of the data centers)
● Many NoSQL databases offer auto-sharding
● A node failure makes the shard’s data unavailable
  ○ Sharding is therefore often combined with replication

Master-slave Replication
● We replicate data across multiple nodes
● One node is designated as primary (master), the others as secondary (slaves)
● The master is responsible for processing all updates to the data
● Reads can go to any node
source: Sadalage & Fowler: NoSQL Distilled, 2012

Master-slave Replication (2)
● Suited for scaling a read-intensive application
  ○ More read requests → more slave nodes
  ○ If the master fails → the slaves can still handle read requests
  ○ A slave can become the new master quickly (it is a replica)
● Limited by the ability of the master to process updates
● Masters are selected manually or automatically
  ○ user-defined vs. cluster-elected

Peer-to-peer Replication
● No master, all the replicas are equal
● Every node can handle a write and then spreads the update to the others
source: Sadalage & Fowler: NoSQL Distilled, 2012

Peer-to-peer Replication (2)
● Problem: consistency
  ○ Users can write simultaneously at two different nodes
● Solution:
  ○ When writing, the replicas coordinate to avoid conflicts
    ■ at the cost of network traffic
    ■ the write operation waits until the coordination process is finished
  ○ Not all replicas need to agree on the write, just a majority (details below)

Sharding & Replication (1)
● Sharding and master-slave replication:
  ○ Each data shard is replicated (via a single master)
  ○ A node can be a master for some data and a slave for other data
source: Sadalage & Fowler: NoSQL Distilled, 2012

Sharding & Replication (2)
● Sharding and peer-to-peer replication:
  ○ A common strategy for column-family databases
  ○ A typical default is a replication factor of 3
    ■ each shard is present on three nodes
source: Sadalage & Fowler: NoSQL Distilled, 2012
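The following sketch (an illustration added here, not taken from the slides) shows one simple way a combined sharding + replication scheme could place aggregates: the key is hashed to a logical shard and the shard is kept on three nodes. The node list, shard count, function names and round-robin placement rule are assumptions; production systems typically use consistent hashing with automatic rebalancing.

# Sketch: hash-based sharding combined with a replication factor of 3.
# The placement rule below is illustrative, not the scheme of any particular database.
import hashlib

NODES = ["node-A", "node-B", "node-C", "node-D", "node-E", "node-F"]
NUM_SHARDS = 12          # fixed number of logical shards
REPLICATION_FACTOR = 3   # each shard is kept on three nodes

def shard_of(key):
    """Map an aggregate key (e.g. an order number) to a logical shard."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def replicas_of(shard):
    """Place a shard on REPLICATION_FACTOR successive nodes (round-robin)."""
    start = shard % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

key = "order:11"
shard = shard_of(key)
print(key, "-> shard", shard, "on nodes", replicas_of(shard))

With such a placement, a single node failure still leaves two replicas of every shard it hosted, which is exactly why sharding is combined with replication.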
=> we have to solve consistency issues (let’s first talk more about what consistency means)

Consistency in Databases
● “Consistency is the lack of contradiction in the DB”
● Centralized RDBMSs ensure strong consistency
● Distributed NoSQL databases typically relax consistency (and/or durability)
  ○ strong consistency → eventual consistency
  ○ BASE (basically available, soft state, eventual consistency)
  ○ CAP theorem
  ○ tradeoff between consistency and availability

Write (Update) Consistency
● Problem: two users want to update the same record (write-write conflict)
  ○ Issues: lost update, the second update is based on stale data
(diagram: Write(K, A) and Write(K, B) arrive at the DB concurrently)
● Two general solutions
  ○ Pessimistic approach: prevents conflicts from occurring
    ■ acquiring write locks before the update
  ○ Optimistic approach: lets conflicts occur, but detects them and takes action to resolve them
    ■ conditional update, or saving both updates and recording the conflict
    ■ implemented by, e.g., version stamps (details later in the course)
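As an added illustration of the optimistic approach, the sketch below implements a conditional update (“compare-and-set”) with version stamps on a plain in-memory dictionary; the second writer, whose update is based on a stale version, gets a conflict instead of silently losing the first update. The names (ConflictError, conditional_write) are made up for this example.

# Sketch: conditional update with version stamps (optimistic approach).
class ConflictError(Exception):
    pass

store = {}  # key -> (version, value)

def read(key):
    """Return (version, value); version 0 means the key does not exist yet."""
    return store.get(key, (0, None))

def conditional_write(key, new_value, expected_version):
    """Succeed only if the write is based on the version currently stored."""
    current_version, _ = store.get(key, (0, None))
    if current_version != expected_version:   # somebody wrote in between
        raise ConflictError("stale version %d, store has %d" % (expected_version, current_version))
    store[key] = (current_version + 1, new_value)

# Two clients read the same record (version 0) ...
v_user1, _ = read("K")
v_user2, _ = read("K")

conditional_write("K", "A", v_user1)          # first update succeeds (version 1)
try:
    conditional_write("K", "B", v_user2)      # second update was based on stale data
except ConflictError as e:
    print("write-write conflict detected:", e)

The conflict can then be resolved by the application, e.g. by re-reading, merging, or recording both values.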
Read Consistency
● Problem: one user reads in the middle of another user’s writes (read-write conflict, inconsistent read)
  ○ this leads to logical inconsistency
● Ideal solution: transactions (ACID)
  ○ strong consistency
(diagram: user 1 performs 1. Write(K, A) and 4. Write(K’, B); user 2 performs 2. Read(K) and 3. Read(K’) in between)

Read Consistency in NoSQL
● NoSQL databases inherently support atomic updates only within a single aggregate
  ○ An update that affects multiple aggregates leaves a time slot in which clients could perform an inconsistent read
  ○ Inconsistency window
● Graph databases
  ○ Typically strong consistency (if centralized)

Transaction Processing in NoSQL
● Basically, no problem if the DB is centralized
  ○ ACID can be implemented
  ○ Various levels of isolation (details later in the course)
    ■ read uncommitted
    ■ read committed
    ■ repeatable reads
    ■ serializable
● Distributed transactions (details later in the course)
  ○ X/Open Distributed Transaction Processing Model (X/Open XA)
  ○ Two-phase Commit Protocol (2PC)
  ○ Strong Strict Two-phase Locking (SS2PL)

Replication Consistency
● Consistency among replicas
  ○ Ensuring that the same data item has the same value when read from different replicas
● After some time, the write propagates everywhere
  ○ Eventual consistency; in the meantime: stale data
  ○ Various levels of consistency (e.g. quorums – see below)
● Read-your-writes (session consistency)
  ○ Is violated if one user writes and reads on different replicas
  ○ Solution: sticky session (session affinity)
(diagram: the same user’s read(K) requests are served by node 1 and node 2)

CAP Theorem
CAP = Consistency, Availability, Partition Tolerance
Consistency
● After an update, all readers in a distributed system (assuming replication) see the same data
● Example:
  ○ A single-server database is always consistent
  ○ If the replication factor is > 1, the system must handle the writes and/or reads in a special way

CAP Theorem (2)
Availability
● Every request must result in a response
  ○ If a node (server) is working, it can read and write data
Partition Tolerance
● The system continues to operate even if two sets of servers get isolated
  ○ A connection failure should not shut the system down
It would be great to have all three CAP properties!

CAP Theorem: Formulation
● CAP Theorem: a “shared-data” system cannot have all three CAP properties
  ○ Or: only two of the three CAP properties are possible
    ■ This is the common version of the theorem
● First formulated in 2000 by prof. Eric Brewer
  ○ PODC Conference keynote speech
    ■ www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
● Proven in 2002 by Seth Gilbert & Nancy Lynch
  ○ SIGACT News 33(2) http://dl.acm.org/citation.cfm?id=564601

CAP Theorem: Real Application
● A single-server system is always CA
  ○ As are all ACID systems
● A distributed system practically has to be tolerant of network Partitions (P)
  ○ because it is difficult to detect all network failures
● So, a tradeoff between Consistency and Availability
  ○ in fact, it is not a binary decision

PC: Partition Tolerance & Consistency
Example: two users, two nodes, two write attempts
● Strong consistency:
  ○ Before a write is committed, both nodes have to agree on the order of the writes
(diagram: node 1 and node 2 reach agreement on the order of Write(key, A) and Write(key, B); if the nodes are partitioned, both writes keep waiting forever)
● If the nodes are partitioned, we lose Availability
  ○ (but reads are still available)

PC: Partition Tolerance & Consistency (2)
● Adding some availability:
  ○ Master-slave replication
(diagram: master and slave; under partitioning, Write(key, A) at the master is OK while Write(key, B) on the slave side keeps waiting forever)
● In case of partitioning, the master can commit writes
  ○ Losing some Consistency: data read on the slave will be stale

PA: Partition Tolerance & Availability
● Choosing Availability:
  ○ Peer-to-peer replication
  ○ Eventual consistency
● In case of Partitioning
  ○ All requests are answered (full Availability)
  ○ We risk losing consistency guarantees completely
● But we can do something in the middle: Quorums
(diagram: Write(key, A) arrives at peer 1, Write(key, B) at peer 2)

Quorums
● Peer-to-peer replication with replication factor N
  ○ the number of replicas of each data object
● Write quorum: W
  ○ When writing, at least W replicas have to agree
  ○ Having W > N/2 results in write consistency
    ■ in case of two simultaneous writes, only one can get the majority
Example: replication factor N = 3, write quorum W = 2 (W > N/2)
(diagram: Write(key, A) and Write(key, B) compete for a majority of peers 1–3)

Quorums (2)
● Read quorum: R
  ○ the number of peers contacted for a single read
    ■ assuming that each value has a time stamp (time of write) to tell the older value from the newer
  ○ For strong read consistency: R + W > N
    ■ the reader surely does not read stale data
Example: read quorum R = 2 (R + W > N); 2 nodes are contacted for a read => the newest data is returned
(diagram: after Write(key, A) and Write(key, B) across peers 1–3, Read(key) contacts two peers)
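An added illustrative sketch of quorum writes and reads with N = 3, W = 2, R = 2: every replica stores a (timestamp, value) pair, a write must reach at least W replicas, and a read contacts R replicas and returns the newest value; because R + W > N, at least one contacted replica holds the latest write. Plain in-memory Python with a logical clock as the time stamp, no networking or failure handling.

# Sketch: quorum write/read with replication factor N = 3, W = 2, R = 2.
from itertools import count

N, W, R = 3, 2, 2
replicas = [dict() for _ in range(N)]   # three replica "nodes": key -> (stamp, value)
_clock = count(1)                       # logical "time of write" stamp

def quorum_write(key, value, available):
    """Write to the reachable replicas; succeed only if at least W store it."""
    stamp = next(_clock)
    acks = 0
    for i in available:
        replicas[i][key] = (stamp, value)
        acks += 1
    if acks < W:
        raise RuntimeError("write quorum not reached")

def quorum_read(key, contacted):
    """Read from R replicas and return the value with the newest stamp."""
    versions = [replicas[i].get(key, (0, None)) for i in contacted[:R]]
    return max(versions)[1]             # newest (stamp, value) wins

quorum_write("K", "A", available=[0, 1, 2])   # reaches all three replicas
quorum_write("K", "B", available=[1, 2])      # replica 0 misses this write
print(quorum_read("K", contacted=[0, 1]))     # prints "B": with R + W > N, at least
                                              # one contacted replica has the newest value

Lowering R or W below these bounds trades consistency for latency and availability, which is the “wide range of settings” referred to in the summary below.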
Relaxing Durability
Durability:
● When a write is committed, the change is permanent
● In some cases, strict durability is not essential and can be traded for scalability (write performance)
  ○ e.g., storing session data, collecting sensor data
A simple way to relax durability:
● Store data in memory and flush it to disk regularly
  ○ if the system shuts down, we lose the updates held only in memory

Relaxing Durability II
● Replication durability (of a write operation)
  ○ The writing node can either
    1. acknowledge (answer) the write operation immediately
      ● not waiting until the update has spread to the other replicas
      ● if the writing node crashes before spreading it, durability fails
      ● write-behind (write-back)
    2. or first spread the update to the other replicas
      ● the operation is answered only after acknowledgement from the others
      ● write-through
  ○ both variants are possible for P2P replication, master-slave replication, quorums, …

BASE Concept
BASE is a vague term often used as a contrast to ACID
● Basically Available
  ○ The system works basically all the time
  ○ Partial failures can occur, but without a total system failure
● Soft state
  ○ The system is in flux (an unstable, non-deterministic state)
  ○ Changes occur all the time
● Eventual consistency
  ○ The system will eventually reach some consistent state
  ○ at some time in the future
source: Eric Brewer: Towards Robust Distributed Systems. www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf

Summary of the Lesson
● Aggregate-oriented data modelling
● Sharding vs. replication
  ○ Master-slave vs. peer-to-peer replication
  ○ Combination of sharding & replication
● Database consistency:
  ○ Write/read consistency (write-write & read-write conflicts)
  ○ Replication consistency (also, read-your-own-writes)
● Relaxing consistency:
  ○ CAP (Consistency, Availability, Tolerance to Partitions)
    ■ eventual consistency
  ○ Quorums (write/read quorum)
    ■ can ensure strong replication consistency; a wide range of settings

Conclusions
● There is a wide range of options influencing
  ○ Scalability
    ■ of data storage, of read operations, of update (write) requests
  ○ Availability
    ■ how the system behaves in case of HW (e.g. network) failure
  ○ Consistency
    ■ consistency has many facets, and it depends on how important each of them is
  ○ Durability
    ■ can I rely on confirmed updates (and is it so important)?
  ○ Fault-tolerance
    ■ do I have copies of the data to recover from after a complete HW failure?
● It’s good to know the options and choose wisely

References
● I. Holubová, J. Kosek, K. Minařík, D. Novák. Big Data a NoSQL databáze. Praha: Grada Publishing, 2015. 288 p.
● Sadalage, P. J., & Fowler, M. (2012). NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Addison-Wesley Professional, 192 p.
● RNDr. Irena Holubová, Ph.D. MFF UK course NDBI040: Big Data Management and NoSQL Databases
● Eric Brewer: Towards Robust Distributed Systems. www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf