PA152: Efficient Use of DB
13. Replication and High Availability
Vlastislav Dohnal, FI MUNI, 2023

Credits
◼ This presentation is based on:
  ▪ Microsoft MSDN library
  ▪ Course NoSQL Databases and Big Data Management
    ▪ Irena Holubová, Charles University, Prague
    ▪ http://www.ksi.mff.cuni.cz/~holubova/NDBI040/
  ▪ PostgreSQL documentation
    ▪ http://www.postgresql.org/docs/9.3/static/high-availability.html

Contents
◼ Availability
◼ Data distribution & replication
◼ High availability
◼ Failover
◼ Recommendations

Availability
◼ High availability (HA) is the ability of a system to operate continuously without failing for a specified period of time.
[Figure: two DB servers. Source: Microsoft]

Determining Availability
◼ Hours of operation
  ▪ Business hours vs. all of the time
    ▪ intranet service vs. web service
    ▪ shift workers vs. customers around the world
◼ Connectivity requirements
  ▪ Tight/loose coupling of application and DBMS
    ▪ synchronous vs. asynchronous data updates
  ▪ Online vs. offline applications – so response time can be important!

Availability
◼ Definition using operation hours
  ▪ Av = "up time" / "total time"
    ▪ "up time" = the system is up and operating
  ▪ More practical definition:
    ▪ Av = (total time - down time) / total time
◼ Down time
  ▪ Scheduled – reboot, SW/HW upgrade, ...
  ▪ Unscheduled – HW/SW failure, security breaches, network unavailability, power outage, disasters, ...
  ▪ Non-functional application requirements – response time
◼ For "true" high availability, the kinds of down time are not distinguished

Nines
◼ Availability as a percentage of uptime
  ▪ Class of nines: c = -log10(1 - Av)
◼ Assuming 24/7 operation:

Nine class | Availability | Downtime per year | Downtime per month | Downtime per week
1 | 90%       | 36.5 days     | 72 hours      | 16.8 hours
2 | 99%       | 3.65 days     | 7.20 hours    | 1.68 hours
3 | 99.9%     | 8.76 hours    | 43.8 minutes  | 10.1 minutes
4 | 99.99%    | 52.56 minutes | 4.32 minutes  | 1.01 minutes
5 | 99.999%   | 5.26 minutes  | 25.9 seconds  | 6.05 seconds
6 | 99.9999%  | 31.5 seconds  | 2.59 seconds  | 0.605 seconds
7 | 99.99999% | 3.15 seconds  | 0.259 seconds | 0.0605 seconds
Source: Wikipedia.org
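The class-of-nines formula and the downtime columns of the table can be reproduced with a small calculation. Below is a minimal Python sketch (the function and variable names are illustrative and not part of the lecture material):

```python
import math

def nines_class(availability: float) -> float:
    """Class of nines: c = -log10(1 - Av)."""
    return -math.log10(1.0 - availability)

def downtime_hours(availability: float, period_hours: float) -> float:
    """Expected downtime (in hours) over a period, assuming 24/7 operation."""
    return (1.0 - availability) * period_hours

HOURS_PER_YEAR = 365 * 24  # 8760 hours

for av in (0.99, 0.999, 0.9999, 0.99999):
    print(f"Av = {av:.5%}  class = {nines_class(av):.0f}  "
          f"downtime/year = {downtime_hours(av, HOURS_PER_YEAR):.2f} h")
```

For example, Av = 99% gives class 2 and 87.6 hours of downtime per year, i.e., the 3.65 days listed in the table above.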
Scalability
◼ Providing access to a number of concurrent users
◼ Handling growing amounts of data without losing performance
◼ With acceptable latency!

Scalability: Solutions
◼ Scaling up – vertical scaling → vendor dependence
  ▪ Increasing RAM
  ▪ Multiprocessing
◼ Scaling out – horizontal scaling
  ▪ Server federations / clusters
  ▪ Data partitioning
    ▪ sharding
  ▪ Replication
    ▪ read-only (standby) servers

Need for Distributing Data
◼ Brings data closer to its users
◼ Allows site independence
◼ Separates
  ▪ online transaction processing
  ▪ read-intensive applications
◼ Can reduce conflicts among user requests
◼ Allows processing large volumes of data

Horizontal Scaling
◼ Systems are distributed across multiple machines or nodes
  ▪ Commodity machines → cost effective
  ▪ Often surpasses the scalability of the vertical approach
◼ Fallacies of distributed computing by Peter Deutsch
  ▪ The network
    ▪ is reliable, secure, homogeneous
    ▪ topology does not change
    ▪ latency and transport cost are zero
    ▪ bandwidth is infinite
  ▪ There is one administrator
Source: https://blogs.oracle.com/jag/resource/Fallacies.html

Replication / Distribution Model
◼ Model of distributing data
  ▪ Replication
    ▪ the same data is stored on more nodes
  ▪ Filtering data (sharding)
    ▪ the data is partitioned and stored separately
    ▪ helps avoid replication conflicts when multiple sites are allowed to update data

Filtering Data (in general)
[Figure: a publisher table is filtered to a subscriber either horizontally (only selected rows, e.g., rows 3 and 6) or vertically (only selected columns, e.g., columns A, B, E). Source: Microsoft]

Distribution Model: Replication
◼ Master-slave model
  ▪ load balancing of read-intensive queries
◼ Master node
  ▪ manages data
  ▪ distributes changes to slaves
◼ Slave node
  ▪ stores data
  ▪ queries data
  ▪ no modifications to data
[Figure: one master / many slaves.]

Distribution Model: Replication
◼ Master-master model
  ▪ typically combined with sharding (filtering) of data
    ▪ a node is the master for a subset of the data
    ▪ and a slave for the rest
  ▪ consistency requires resolving update conflicts
[Figure: multiple masters, each acting as master for its subset and slave for the rest.]

Master-Master Model with Sharding (see the sketch below)
[Figure: the Orders table (columns Area Id, Order_no, ..., Qty) is replicated to masters A, B, and C; each node is the master only for the rows of its own area – A for area 1 (orders 1000 and 3100, qty 15 and 22), B for area 2 (orders 1000 and 2380, qty 32 and 8), C for area 3 (orders 1000 and 1070, qty 7 and 19) – and a slave for the rest. Source: Microsoft]
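The routing rule behind this sharding scheme can be written down in a few lines. This is a minimal sketch assuming, as in the figure, that Area Id is the shard key and that each area is owned by exactly one master; the shard map, node names, and functions are hypothetical:

```python
# Illustrative shard map: each area has exactly one master that accepts writes;
# every node also keeps a read-only replica of the other areas
# (master-master replication combined with sharding).
SHARD_MAP = {1: "master_a", 2: "master_b", 3: "master_c"}

def master_for(area_id: int) -> str:
    """Return the node that is the master for this area's rows."""
    return SHARD_MAP[area_id]

def route_write(order: dict) -> str:
    """Writes must go to the master of the order's area to avoid update conflicts.
    Reads, in contrast, can be served by any replica."""
    return master_for(order["area_id"])

# Example: a new order for area 2 is routed to master_b.
print(route_write({"area_id": 2, "order_no": 2380, "qty": 8}))
```

Because every write for an area goes through that area's single master, update conflicts between the masters are avoided while reads can still be answered locally.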
Snapshot Replication Periodic bulk transfer of new snapshots of data ◼ Intermediate updates to data might be unnoticed by “subscribers” Data changes – substantial but infrequent Slaves are read-only High latency is acceptable PA152, Vlastislav Dohnal, FI MUNI, 2023 19 Snapshot Replication Transactional Replication Distributed Transactions Lower Autonomy Lower Latency Higher Autonomy Higher Latency Merge Replication Replication Types ◼ Merge Replication Autonomous changes to replicated data are later merged ◼ Default and custom conflict resolution rules Does not guarantee transactional consistency, but converges Adv: Nodes can update data offline, sync later Disadv: Changes to schema needed. ▪ E.g., row version, row originator PA152, Vlastislav Dohnal, FI MUNI, 2023 20 Snapshot Replication Transactional Replication Distributed Transactions Lower Autonomy Lower Latency Higher Autonomy Higher Latency Merge Replication Brewer’s CAP Theorem ◼ Consistency  After an update, all readers in a distributed system see the same data  All nodes are supposed to always contain the same data  E.g., in multiple instances, all writes must be duplicated before write operation is completed. ◼ Availability  Every request receives a response ◼ about whether it was successful or failed ◼ Partition Tolerance  System continues to operate despite arbitrary message loss or failure of part of the system. PA152, Vlastislav Dohnal, FI MUNI, 2023 21 Brewer’s CAP Theorem ◼ Only 2 of 3 guarantees can be given in a “shareddata” system.  Proved by Nancy Lynch in 2002 ◼ ACID  provides Availability and Consistency  E.g., replication with distributed transactions ◼ BASE  provides Availability and Partition tolerance  Reality: you can trade a little consistency for some availability  E.g., sharding with merge replication PA152, Vlastislav Dohnal, FI MUNI, 2023 22 Source: http://bigdatanerd.wordpress.com NewSQL DB NewSQL ◼ Distributed database systems that scale out ◼ CP systems trade availability for consistency when partition occurs ◼ MySQL cluster, Google Spanner, VoltDB, … In fact, master-master replication with data sharding PA152, Vlastislav Dohnal, FI MUNI, 2023 23 BASE Properties ◼ Basically Available Partial failures can occur, but without total system failure ◼ Soft state System is in flux / non-deterministic ◼ Changes occur all the time ◼ Eventual consistency (replica convergence) is a liveness guarantee ◼ reads eventually return the same value is not safety guarantee ◼ can return any value before it converges PA152, Vlastislav Dohnal, FI MUNI, 2023 24 Consistency ◼ Strong (ACID) vs. 
Consistency
◼ Strong (ACID) vs. eventual (BASE) consistency
◼ Example: after Server A executes write(A,2) on a value it previously read as 1:
  ▪ strong consistency – every subsequent read(A) on Servers B and C returns 2
  ▪ eventual consistency – Server B may still return read(A)=1 for a while (an inconsistent state) before all replicas converge and return 2

Maintaining High Availability of a DBMS
◼ Standby server
  ▪ shared disk failover (NAS)
  ▪ file system replication (DRBD)
  ▪ transaction log shipping
  ▪ trigger-based replication
  ▪ statement-based replication middleware
[Figure: clients connect to a cluster consisting of a primary node and a secondary/standby node.]

Log-Shipping Standby Server
◼ Also called warm standby
◼ Primary node
  ▪ serves all queries
  ▪ runs in permanent archiving mode
    ▪ continuously sends WAL records to the standby servers
◼ Standby server
  ▪ serves no queries
  ▪ runs in permanent recovery mode
    ▪ continuously processes WAL records arriving from the primary node
◼ Log shipping can be synchronous or asynchronous
◼ Disadvantage: typically, all tables are replicated
◼ Advantage: no schema changes, no trigger definitions

Failure of a Node
◼ If the standby fails, no action is taken
  ▪ after it comes back online, a catch-up procedure is started
◼ If the primary fails, the standby server begins failover
  ▪ the standby applies all pending WAL records, marks itself as primary, and starts to serve all queries
◼ A heartbeat mechanism continually verifies the connectivity between the two nodes and the viability of the primary server
  ▪ a sketch of this heartbeat-and-failover loop is given at the end of these slides

Failover
◼ Failover by the standby succeeded
  ▪ a new standby should be configured
  ▪ when the original primary node becomes available again, inform it that it is no longer the primary
    ▪ do the so-called STONITH (Shoot The Other Node In The Head),
    ▪ otherwise, serious data corruption/loss may occur
  ▪ typically, the old primary becomes the new standby

Primary and Standby Servers
◼ Swap the primary and the standby regularly
  ▪ to verify the recovery steps
  ▪ to do necessary maintenance on the standby server
    ▪ SW/HW upgrades, ...

PostgreSQL: Replication
◼ TPC Benchmark B
  ▪ scale factor 1
  ▪ 1 transaction = 5 queries
  ▪ 2x server (AMD Opteron 8439 SE, 1024 MB RAM, 20 GB HDD)

Recommended HA Practices
◼ Maximize availability at each tier of the application
  ▪ independent power supply for the primary server
  ▪ keep standby servers on a different subnet
◼ Test whether your availability solution works

Lecture Takeaways
◼ The term availability and its classification
◼ Possible techniques (sharding / replication)
◼ CAP theorem & BASE systems
◼ Know possible implementations in a relational DBMS

Lectures ain't over yet! What next?
◼ The lectures on May 3 and 10 are cancelled. The May 17 date does not apply since there was no public holiday on a Wednesday.
◼ Follow-up courses
  ▪ PV003 Architektura relačních databázových systémů (Architecture of Relational Database Systems)
  ▪ PA128 Similarity Searching in Multimedia Data
  ▪ PA212 Advanced Search Techniques for Large Scale Data Analytics
  ▪ PA195 NoSQL Databases
  ▪ PA220 Data Warehouses
◼ Research topics for both master's theses and PhD!
  ▪ research in motion capture data
  ▪ similarity operators in databases
  ▪ AI/machine learning for data indexes
◼ Or come and have a chit-chat ((-:
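As referenced on the "Failure of a Node" slide, the heartbeat-and-failover behaviour can be summarized as a small control loop. This is a hedged sketch only: the callables primary_alive, apply_pending_wal, promote_to_primary, and stonith are placeholders for whatever mechanism the cluster manager actually provides, not a real API and not PostgreSQL's implementation:

```python
import time

HEARTBEAT_INTERVAL = 5   # seconds between heartbeat checks
MISSED_BEATS_LIMIT = 3   # how many missed heartbeats mean "primary is down"

def failover_monitor(primary_alive, apply_pending_wal, promote_to_primary, stonith):
    """Heartbeat loop run on (or next to) the standby node.

    All four arguments are hypothetical callables supplied by the cluster manager.
    """
    missed = 0
    while True:
        if primary_alive():
            missed = 0
        else:
            missed += 1
            if missed >= MISSED_BEATS_LIMIT:
                # Standby begins failover: replay the WAL records it has already
                # received, fence the old primary (STONITH), and take over.
                apply_pending_wal()
                stonith()
                promote_to_primary()
                return
        time.sleep(HEARTBEAT_INTERVAL)
```

Note that the old primary is fenced (STONITH) before the standby promotes itself; as the "Failover" slide warns, skipping this step risks serious data corruption or loss.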