PA152: Efficient Use of DB
13. Replication and
High Availability
Vlastislav Dohnal
PA152, Vlastislav Dohnal, FI MUNI, 2023 2
Credits
◼ This presentation is based on:
 Microsoft MSDN library
 Course NoSQL databases and Big Data management
◼ Irena Holubová
◼ Charles University, Prague
◼ http://www.ksi.mff.cuni.cz/~holubova/NDBI040/
PostgreSQL documentation
◼ http://www.postgresql.org/docs/9.3/static/high-
availability.html
PA152, Vlastislav Dohnal, FI MUNI, 2023 3
Contents
◼ Availability
◼ Data distribution & replication
◼ High availability
◼ Failover
◼ Recommendations
Availability
PA152, Vlastislav Dohnal, FI MUNI, 2023 4
DB Server
DB ServerSource: Microsoft
High availability (HA) is the ability of a system to operate continuously
without failing for a specified period of time.
Determining Availability
◼ Hours of Operation
Business hours vs. all of the time
◼ intranet service vs. web service
◼ shift workers vs. all-around the world customers
◼ Connectivity Requirements
Tight/Loose coupling of app and DBMS
◼ Synchronous vs. asynchronous data updates
Online vs. offline applications – so response
time can be important!
PA152, Vlastislav Dohnal, FI MUNI, 2023 5
Availability
◼ Definition using operation hours
 Av = “up time” / “total time”
◼ “up time” = the system is up and operating
 More practical def.
◼ Av = (total time - down time) / total time
◼ Down time
 Scheduled – reboot, SW/HW upgrade, …
 Unscheduled – HW/SW failure, security breaches,
network unavailability, power outage, disasters, …
 Non-functional app requirements – response time
◼ For “true” high-availability, down time is not
distinguished
PA152, Vlastislav Dohnal, FI MUNI, 2023 6
Nines
◼ Availability as percentage of uptime
Class of nines: 𝑐 = − log10 1 − 𝐴𝑣
◼ Assuming 24/7 operation:
PA152, Vlastislav Dohnal, FI MUNI, 2023 7
Nine class Availability Downtime per year Downtime per month Downtime per week
1 90% 36.5 days 72 hours 16.8 hours
2 99% 3.65 days 7.20 hours 1.68 hours
3 99.9% 8.76 hours 43.8 minutes 10.1 minutes
4 99.99% 52.56 minutes 4.32 minutes 1.01 minutes
5 99.999% 5.26 minutes 25.9 seconds 6.05 seconds
6 99.9999% 31.5 seconds 2.59 seconds 0.605 seconds
7 99.99999% 3.15 seconds 0.259 seconds 0.0605 seconds
Source: Wikipedia.org
Scalability
◼ Providing access to a number of
concurrent users
◼ Handling growing amounts of data without
losing performance
◼ With acceptable latency!
PA152, Vlastislav Dohnal, FI MUNI, 2023 8
Scalability: Solutions
◼ Scaling Up – vertical scaling → vendor
dependence
Increasing RAM
Multiprocessing
◼ Scaling Out – horizontal scaling
Server federations / clusters
Data partitioning
◼ sharding
Replication
◼ Read-only (standby) servers
PA152, Vlastislav Dohnal, FI MUNI, 2023 9
Need for Distributing Data
◼ Brings data closer to its user
◼ Allows site independence
◼ Separates
Online transaction processing
Read-intensive applications
◼ Can reduce conflicts during user requests
◼ Process large volumes of data
PA152, Vlastislav Dohnal, FI MUNI, 2023 10
Horizontal Scaling
◼ Systems are distributed across multiple
machines or nodes
 Commodity machines → cost effective
 Often surpasses scalability of vertical approach
◼ Fallacies of distributed computing by Peter Deutsch
 Network
◼ Is reliable, secure, homogeneous
◼ Topology does not change
◼ Latency and transport cost is zero
◼ Bandwidth is infinite
 One administrator
PA152, Vlastislav Dohnal, FI MUNI, 2023 11
Source: https://blogs.oracle.com/jag/resource/Fallacies.html
Replication / Distribution Model
◼ Model of distributing data
Replication
◼ The same data stored in more nodes.
Filtering data (sharding)
◼ The data is partitioned and stored separately
◼ Helps avoid replication conflicts when multiple
sites are allowed to update data.
PA152, Vlastislav Dohnal, FI MUNI, 2023 12
Filtering Data (in general)
Subscriber
Horizontal Filtering
2
3
4
5
6
7
1
A B C D E F
3
6
2
A B C D E F
Vertical Filtering
2
3
4
5
6
7
1
A B C D E F
2
3
4
5
6
7
1
A B E
Table
TablePublisher
Source: Microsoft
PA152, Vlastislav Dohnal, FI MUNI, 2023 13
Distribution Model: Replication
◼ Master-slave model
Load-balancing of read-intensive queries
◼ Master node
manages data
distributes changes to slaves
◼ Slave node
stores data
queries data
no modifications
to data
PA152, Vlastislav Dohnal, FI MUNI, 2023 14
Slaves
Master
One master / many slaves
Distribution Model: Replication
◼ Master-master model
Typically, with sharding (filtering) data
◼ Master for a subset of data
◼ Slave for the rest
Consistency needs resolving update conflicts
PA152, Vlastislav Dohnal, FI MUNI, 2023 15
Multiple Masters
Master/Slave
Master/Slave
Master/Slave
Master-Master Model with Sharding
PA152, Vlastislav Dohnal, FI MUNI, 2023 16
Orders (Master A)
Primary Key
Area Id Order_no
1
1
2
2
3
3
1000
3100
1000
2380
1000
1070
~
~
~
~
~
~
Qty
15
22
32
8
7
19
1
1
1000
3100
~
~
15
22
Orders (Master B)
Primary Key
Area Id Order_no
1
1
2
2
3
3
1000
3100
1000
2380
1000
1070
~
~
~
~
~
~
Qty
15
22
32
8
7
19
2
2
1000
2380
~
~
32
8
Orders (Master C)
Primary Key
Area Id Order_no
1
1
2
2
3
3
1000
3100
1000
2380
1000
1070
~
~
~
~
~
~
Qty
15
22
32
8
7
19
3
3
1000
1070
~
~
7
19
Master/Slave
Master/SlaveMaster/Slave
Source: Microsoft
Replication Types (for “real” multi-master model)
PA152, Vlastislav Dohnal, FI MUNI, 2023 17
Snapshot Replication
Transactional Replication
Distributed Transactions
Lower Autonomy
Lower Latency
Higher Autonomy
Higher Latency
Merge Replication
Source: Microsoft
Replication Types
◼ Distributed Transactions
For “real” master-master model, ensures
consistency
Low latency, high consistency
◼ Transactional Replication
Replication of incremental changes
Minimal latency (typically online)
Conflicts solves using shared locks
PA152, Vlastislav Dohnal, FI MUNI, 2023 18
Snapshot Replication
Transactional Replication
Distributed
Transactions
Lower Autonomy
Lower Latency
Higher Autonomy
Higher Latency
Merge Replication
Replication Types
◼ Snapshot Replication
Periodic bulk transfer of new snapshots of
data
◼ Intermediate updates to data might be unnoticed
by “subscribers”
Data changes – substantial but infrequent
Slaves are read-only
High latency is acceptable
PA152, Vlastislav Dohnal, FI MUNI, 2023 19
Snapshot Replication
Transactional Replication
Distributed
Transactions
Lower Autonomy
Lower Latency
Higher Autonomy
Higher Latency
Merge Replication
Replication Types
◼ Merge Replication
Autonomous changes to replicated data are
later merged
◼ Default and custom conflict resolution rules
Does not guarantee transactional consistency,
but converges
Adv: Nodes can update data offline, sync later
Disadv: Changes to schema needed.
▪ E.g., row version, row originator
PA152, Vlastislav Dohnal, FI MUNI, 2023 20
Snapshot Replication
Transactional Replication
Distributed
Transactions
Lower Autonomy
Lower Latency
Higher Autonomy
Higher Latency
Merge Replication
Brewer’s CAP Theorem
◼ Consistency
 After an update, all readers in a distributed system
see the same data
 All nodes are supposed to always contain the
same data
 E.g., in multiple instances, all writes must be
duplicated before write operation is completed.
◼ Availability
 Every request receives a response
◼ about whether it was successful or failed
◼ Partition Tolerance
 System continues to operate despite arbitrary
message loss or failure of part of the system.
PA152, Vlastislav Dohnal, FI MUNI, 2023 21
Brewer’s CAP Theorem
◼ Only 2 of 3 guarantees can be given in a “shareddata”
system.
 Proved by Nancy Lynch in 2002
◼ ACID
 provides Availability and
Consistency
 E.g., replication with
distributed transactions
◼ BASE
 provides Availability and Partition
tolerance
 Reality: you can trade a little consistency for some
availability
 E.g., sharding with merge replication
PA152, Vlastislav Dohnal, FI MUNI, 2023 22
Source: http://bigdatanerd.wordpress.com
NewSQL
DB
NewSQL
◼ Distributed database systems that scale out
◼ CP systems
trade availability for consistency when partition
occurs
◼ MySQL cluster, Google Spanner, VoltDB, …
In fact, master-master replication with data
sharding
PA152, Vlastislav Dohnal, FI MUNI, 2023 23
BASE Properties
◼ Basically Available
Partial failures can occur, but without total
system failure
◼ Soft state
System is in flux / non-deterministic
◼ Changes occur all the time
◼ Eventual consistency (replica convergence)
is a liveness guarantee
◼ reads eventually return the same value
is not safety guarantee
◼ can return any value before it converges
PA152, Vlastislav Dohnal, FI MUNI, 2023 24
Consistency
◼ Strong (ACID) vs. Eventual (BASE)
consistency
◼ Example:
PA152, Vlastislav Dohnal, FI MUNI, 2023 25
Server A: read(A)=1 write(A,2) read(A)=2
Server B: read(A)= 1 read(A)=1 read(A)=2
time
Server C: read(A)= 1 read(A)=2
Server A: read(A)=1 write(A,2) read(A)=2
Server B: read(A)= 1 read(A)=2 read(A)=2
Server C: read(A)= 1 read(A)=2
Inconsistent state
EventualStrong
Maintaining High-Availability of DBMS
◼ Standby server
Shared disk failover (NAS)
File system replication (DRBD)
Transaction log shipping
Trigger-based replication
Statement-Based
Replication
Middleware
PA152, Vlastislav Dohnal, FI MUNI, 2023 26
Clients
Primary
Node
Secondary/
Standby
Node
Cluster
Log-shipping Standby Server
◼ Also called warm standby
◼ Primary node
 serves all queries
 in permanent archiving mode
◼ Continuous sending of WAL records to standby servers
◼ Standby server
 serves no queries
 in permanent recovery mode
◼ Continuous processing of WAL records arriving from primary
node
◼ Log shipping can be synchronous/asynchronous
◼ Disadvantage: all tables are replicated typically
◼ Advantage: no schema changes, no trigger
definitions
PA152, Vlastislav Dohnal, FI MUNI, 2023 27
Failure of a node
◼ If standby fails, no action taken.
After becoming online, catch-up procedure is
started.
◼ If primary fails, standby server begins
failover.
Standby applies all WAL records pending,
marks itself as primary,
starts to serve all queries.
◼ Heartbeat mechanism
to continually verify the connectivity between
the two and the viability of the primary server
PA152, Vlastislav Dohnal, FI MUNI, 2023 28
Failover
◼ Failover by standby succeeded
New standby should be configured
Original primary node becomes available
→ inform it that it is no longer the primary
◼ do so-called STONITH (Shoot The Other Node In
The Head),
◼ otherwise, serious data corruption/loss may occur
Typically, old primary becomes new standby
PA152, Vlastislav Dohnal, FI MUNI, 2023 29
Primary and Standby Servers
◼ Swap primary and standby regularly
To verify recovery steps
To do necessary maintenance on standby
server
◼ SW/HW upgrades, …
PA152, Vlastislav Dohnal, FI MUNI, 2023 30
PostgreSQL: Replication
PA152, Vlastislav Dohnal, FI MUNI, 2023
◼ TPC Benchmark B
31
scale factor 1
1 trans. = 5 queries
2x server
(AMD Opteron 8439 SE,
1024 MB RAM,
20 GB HDD)
Recommended HA Practices
◼ Maximize availability at each tier of the
application
Independent power supply to the primary
server
Keep standby servers on a different subnet
◼ Test whether your availability solution
works
PA152, Vlastislav Dohnal, FI MUNI, 2023 32
Lecture Takeaways
◼ Term of Availability and its classification
◼ Possible techniques (sharding / replication)
◼ CAP Theorem & BASE systems
◼ Know possible implementation in relational
DBMS
PA152, Vlastislav Dohnal, FI MUNI, 2023 33
Lectures ain’t over yet! What next?
◼ Lectures on May 3 and 10 are cancelled. The date May 17 is
not valid since there was no public holiday on Wednesday.
◼ Follow-up courses
 PV003 Architektura relačních databázových systémů
 PA128 Similarity Searching in Multimedia Data
 PA212 Advanced Search Techniques for Large Scale Data
Analytics
 PA195 NoSQL Databases
 PA220 Data Warehouses
◼ Research topics for both master thesis or PhD!
 Research in Motion Capture Data
 Similarity Operators in Databases
 AI/machine-learning for data indexes
◼ Or come and have a chit-chat ((-:
PA152, Vlastislav Dohnal, FI MUNI, 2023 34