PA152: Efficient Use of DB
12. Replication and
High Availability
Vlastislav Dohnal
PA152, Vlastislav Dohnal, FI MUNI, 2015 2
Credits
 This presentation is based on:
 Microsoft MSDN library
 Course NoSQL databases and Big Data management
 Irena Holubová
 Charles University, Prague
 http://www.ksi.mff.cuni.cz/~holubova/NDBI040/
PostgreSQL documentation
 http://www.postgresql.org/docs/9.3/static/high-
availability.html
PA152, Vlastislav Dohnal, FI MUNI, 2015 3
Contents
 Availability
 Data distribution & Replication
 High availability
 Failover
 Recommendations
Availability
PA152, Vlastislav Dohnal, FI MUNI, 2015 4
DB Server
DB Server
Source: Microsoft
Determining Availability Requirements
 Hours of Operation
Business hours vs. all of the time
 intranet service vs. web services
 shift workers vs. all-around the world customers
 Connectivity Requirements
Online vs. offline applications
 Tight/Loose coupling of app and DBMS
Synchronous vs. asynchronous data updates
PA152, Vlastislav Dohnal, FI MUNI, 2015 5
Availability
 Definition in operation hours
 Av = “up time” / “total time” = MTTF / (MTTF+MTTR)
 “up time” = the system is up and operating
 More practical def.
 Av = (total time - down time) / total time
 Down time
 Scheduled – reboot, SW/HW upgrade, …
 Unscheduled – HW/SW failure, security breaches,
network unavailability, power outage, disasters, …
 For “true” high-availability, down time is not
distinguished
PA152, Vlastislav Dohnal, FI MUNI, 2015 6
Nines
 Availability as percentage of uptime
Class of nines: 𝑐 = − log10 1 − 𝐴𝑣
 Assuming 24/7 operation:
PA152, Vlastislav Dohnal, FI MUNI, 2015 7
Nine class Availability Downtime per year Downtime per month Downtime per week
1 90% 36.5 days 72 hours 16.8 hours
2 99% 3.65 days 7.20 hours 1.68 hours
3 99.9% 8.76 hours 43.8 minutes 10.1 minutes
4 99.99% 52.56 minutes 4.32 minutes 1.01 minutes
5 99.999% 5.26 minutes 25.9 seconds 6.05 seconds
6 99.9999% 31.5 seconds 2.59 seconds 0.605 seconds
7 99.99999% 3.15 seconds 0.259 seconds 0.0605 seconds
Source: Wikipedia.org
Scalability
 Scalability
 Providing access to a number of concurrent users
 Handling growing amounts of data without losing
performance
 With acceptable latency!
 Scaling Up – vertical scaling  vendor dependence
 Increasing RAM
 Multiprocessing
 Scaling Out – horizontal scaling
 Replication
 Read-only standby servers
 Server federations / clusters / data distribution
PA152, Vlastislav Dohnal, FI MUNI, 2015 8
Horizontal Scaling
 Systems are distributed across multiple
machines or nodes
 Commodity machines  cost effective
 Often surpasses scalability of vertical approach
 Fallacies of distributed computing by Peter Deutsch
 Network
 Is reliable, secure, homogeneous
 Topology does not change
 Latency and transport cost is zero
 Bandwidth is infinite
 One administrator
PA152, Vlastislav Dohnal, FI MUNI, 2015 9
Source: https://blogs.oracle.com/jag/resource/Fallacies.html
Brewer’s CAP Theorem
 Consistency
 After an update, all readers in a distributed system
see the same data
 All nodes are supposed to contain the same data
at all times
 E.g. in multiple instances, all writes must be
duplicated before write operation is completed.
 Availability
 Every request receives a response
 about whether it was successful or failed
 Partition Tolerance
 System continues to operate despite arbitrary
message loss or failure of part of the system.
PA152, Vlastislav Dohnal, FI MUNI, 2015 10
Brewer’s CAP Theorem
 Only 2 of 3 guarantees can be given in a “shareddata”
system.
 Proved by Nancy Lynch in 2002
 ACID
 provides Availability and
Consistency
 E.g. database on a single
machine
 BASE
 provides Availability and Partition
tolerance
 Reality: you can trade a little consistency for some
availability
 E.g. distributed database
PA152, Vlastislav Dohnal, FI MUNI, 2015 11
Source: http://bigdatanerd.wordpress.com
NewSQL
DB
NewSQL
 Distributed database that scales out
 CP system
trades availability for consistency when partition
happens
 MySQL cluster, Google Spanner, VoltDB, …
In fact, master-master replication with data
sharding
PA152, Vlastislav Dohnal, FI MUNI, 2015 12
BASE Properties
 Basically Available
Partial failures can occur, but without total
system failure
 Soft state
System is in flux / non-deterministic
 Changes occur all the time
 Eventual consistency (replica convergence)
is a liveness guarantee
 reads eventually return the same value
is not safety guarantee
 can return any value before it converges
PA152, Vlastislav Dohnal, FI MUNI, 2015 13
Consistency
 Strong (ACID) vs. Eventual (BASE)
consistency
 Example:
PA152, Vlastislav Dohnal, FI MUNI, 2015 14
Server A: read(A)=1 write(A,2) read(A)=2
Server B: read(A)= 1 read(A)=1 read(A)=2
time
Server C: read(A)= 1 read(A)=2
Server A: read(A)=1 write(A,2) read(A)=2
Server B: read(A)= 1 read(A)=2 read(A)=2
Server C: read(A)= 1 read(A)=2
Inconsistent state
EventualStrong
Need for Distributing Data
 Brings data closer to its user
 Allows site independence
 Separates
Online transaction processing
Read-intensive applications
 Can reduce conflicts during user requests
 Process big data
PA152, Vlastislav Dohnal, FI MUNI, 2015 15
Replication / Distribution Model
 Model of distributing data
Replication
 The same data stored in more nodes.
Filtering data (sharding)
 The data is partitioned and stored separately
 Helps avoid replication conflicts when multiple
sites are allowed to update data.
PA152, Vlastislav Dohnal, FI MUNI, 2015 16
Filtering Data
Subscriber
Vertical Filtering
Horizontal Filtering
2
3
4
5
6
7
1
A B C D E F
2
3
4
5
6
7
1
A B C D E F
2
3
4
5
6
7
1
A B E
3
6
2
A B C D E F
Table A
Table BPublisher
Source: Microsoft
PA152, Vlastislav Dohnal, FI MUNI, 2015 17
Distribution Model
 Master-slave model (replication)
 Load-balancing of read-intensive queries
 Master node
 manages data
 distributes changes
to slaves
 Slave node
 stores data
 queries data
 no modifications
to data
PA152, Vlastislav Dohnal, FI MUNI, 2015 18
Slaves
Master
One master / many slaves
Distribution Model
 Master-master model
Typically with filtering data
 Master for a subset of data
 Slave for the rest
Consistency needs resolving of update
conflicts
PA152, Vlastislav Dohnal, FI MUNI, 2015 19
Multiple Masters
Master/Slave
Master/Slave
Master/Slave
Master-master Model
PA152, Vlastislav Dohnal, FI MUNI, 2015 20
Orders (Master A)
Primary Key
Area Id Order_no
1
1
2
2
3
3
1000
3100
1000
2380
1000
1070
~
~
~
~
~
~
Qty
15
22
32
8
7
19
1
1
1000
3100
~
~
15
22
Orders (Master B)
Primary Key
Area Id Order_no
1
1
2
2
3
3
1000
3100
1000
2380
1000
1070
~
~
~
~
~
~
Qty
15
22
32
8
7
19
2
2
1000
2380
~
~
32
8
Orders (Master C)
Primary Key
Area Id Order_no
1
1
2
2
3
3
1000
3100
1000
2380
1000
1070
~
~
~
~
~
~
Qty
15
22
32
8
7
19
3
3
1000
1070
~
~
7
19
Master/Slave
Master/SlaveMaster/Slave
Source: Microsoft
Replication Types
PA152, Vlastislav Dohnal, FI MUNI, 2015 21
Snapshot Replication
Transactional Replication
Distributed Transactions
Lower Autonomy
Lower Latency
Higher Autonomy
Higher Latency
Merge Replication
Source: Microsoft
Replication Types
 Distributed Transactions
For “real” master-master model, ensures
consistency
Low latency, high consistency
 Transactional Replication
Replication of incremental changes
Minimal latency
PA152, Vlastislav Dohnal, FI MUNI, 2015 22
Replication Types
 Snapshot Replication
Periodic bulk transfer of new snapshots of
data
Data changes – substantial but infrequent
Slaves are read-only
High latency is acceptable
PA152, Vlastislav Dohnal, FI MUNI, 2015 23
Replication Types
 Merge Replication
Autonomous changes to replicated data are
later merged
Does not guarantee transactional
consistency, but converges
Default and custom conflict resolution rules
Adv: Nodes can update data offline, sync later
Disadv: Changes to schema needed.
PA152, Vlastislav Dohnal, FI MUNI, 2015 24
Maintaining High-Availability
 Standby server
Shared disk failover (NAS)
File system replication (DRBD)
Transaction log shipping
Trigger-based replication
Statement-Based
Replication
Middleware
PA152, Vlastislav Dohnal, FI MUNI, 2015 25
Clients
Primary
Node
Secondary/
Standby
Node
Cluster
Log-shipping Standby Server
 Also called warm standby
 Primary node
 serves all queries
 in permanent archiving mode
 Continuous sending of WAL records to standby servers
 Standby server
 serves no queries
 in permanent recovery mode
 Continuous processing of WAL records arriving from primary
node
 Log shipping can be synchronous/asynchronous
 Disadvantage: all tables are replicated typically
 Advantage: no schema changes, no trigger
definitions
PA152, Vlastislav Dohnal, FI MUNI, 2015 26
Failover
 If primary fails, standby server begins
failover.
Standby applies all WAL records pending,
marks itself as primary,
starts to serve all queries.
 If standby fails, no action taken.
After becoming online, catch-up procedure is
started.
 Heartbeat mechanism
to continually verify the connectivity between
the two and the viability of the primary server
PA152, Vlastislav Dohnal, FI MUNI, 2015 27
Failover
 Failover by standby succeeded
New standby should be configured
Original primary node becomes available
 inform it that it is no longer the primary
 do so-called STONITH (Shoot The Other Node In
The Head),
 otherwise serious data corruption/loss may occur
Typically old primary becomes new standby
PA152, Vlastislav Dohnal, FI MUNI, 2015 28
Primary and Standby Servers
 Swap primary and standby regularly
To verify recovery steps
To do necessary maintenance on standby
server
 SW/HW upgrades, …
PA152, Vlastislav Dohnal, FI MUNI, 2015 29
Recommended Practices
 Maximize availability at each tier of the
application
 Keep standby servers on a different
subnet
 Different power supply to the primary
server
 Test whether your availability solution
works
PA152, Vlastislav Dohnal, FI MUNI, 2015 30