PA152: Efficient Use of DB 10. Failure Recovery Vlastislav Dohnal Contents ◼ Overview of integrity ◼ Transactions ◼ Logging in DBMS PA152, Vlastislav Dohnal, FI MUNI, 2024 2 PA152, Vlastislav Dohnal, FI MUNI, 2024 3 Integrity or correctness of data ◼ Would like data to be “accurate” or “correct” at all times Name Newman Altman Freeman Age 52 3421 1 Employee PA152, Vlastislav Dohnal, FI MUNI, 2024 4 Integrity or correctness of data ◼ Integrity constraints Main approach to consistency of DB Predicates that data must satisfy ◼ Examples: Domain(x) = {red, blue, green} x is a key of relation R A valid value for attribute x of R (foreign key) Functional dependency: x → y PA152, Vlastislav Dohnal, FI MUNI, 2024 5 Integrity or correctness of data ◼ Consistent state satisfies all constraints ◼ Consistent DB DB in consistent state PA152, Vlastislav Dohnal, FI MUNI, 2024 6 Limits of integrity constraints ◼ May not capture “full correctness” ◼ Examples: (Transaction constraints) No employee should make more than twice the average salary. Student scholarship may not exceed 30k per month in total. When a bank account is deleted, balance = 0 PA152, Vlastislav Dohnal, FI MUNI, 2024 7 Limits of integrity constraints ◼ Some could be “emulated” by simple constraints Deletion of account replaced with deletion flag acc.no. … balance deletedaccount PA152, Vlastislav Dohnal, FI MUNI, 2024 8 Limits of integrity constraints ◼ Database should reflect real world. ◼ Continue with constraints  even though some part of “reality” cannot be defined as constraint or DB does not mirror reality ◼ Observation  DB cannot always be consistent. DB Reality PA152, Vlastislav Dohnal, FI MUNI, 2024 9 Example of inconsistent state ◼ Constraint example: a1 + a2 +…. an = TOT ◼ Depositing 100 CZK to account a2 a2  a2 + 100 TOT  TOT + 100 . . 50 . . 1000 . . 150 . . 1000 . . 150 . . 1100 a2 TOT PA152, Vlastislav Dohnal, FI MUNI, 2024 10 Solving inconsistencies ◼ Transaction Collection of actions (updating data) that preserve consistency ◼ the actions are ordered – it’s a sequence. Consistent DB Consistent DB’ T PA152, Vlastislav Dohnal, FI MUNI, 2024 11 Transaction Processing ◼ Assumption If T starts with consistent state and T executes in isolation → T leaves DB in a consistent state ◼ Correctness If we finish running transactions, DB is left consistent Each transaction sees a consistent DB PA152, Vlastislav Dohnal, FI MUNI, 2024 12 Consistency Violation ◼ Possible causes: Transaction bug DBMS bug Hardware failure ◼ E.g., a disk crash during storing updates to accounts Data sharing ◼ E.g., T1: give 10% raise to programmers T2: change programmers → systems analysts PA152, Vlastislav Dohnal, FI MUNI, 2024 13 Prevent Consistency Violations ◼ Failure model Identify possible risks Handle individual component failures CPU M Dmemory processor disk bus PA152, Vlastislav Dohnal, FI MUNI, 2024 14 Prevent Consistency Violations ◼ Failure model Categorize risks Events Desired Undesired Expected Unexpected PA152, Vlastislav Dohnal, FI MUNI, 2024 15 Prevent Consistency Violations ◼ Events Desired ◼ See product manuals… ☺ Undesired expected ◼ Memory lost ◼ CPU halts, resets ◼ Forcible shutdown Undesired Unexpected (Everything else) ◼ Disk data is lost ◼ Memory lost without CPU halt ◼ Disaster – fire, flooding, … PA152, Vlastislav Dohnal, FI MUNI, 2024 16 Failure Model ◼ Approach: Add low-level checks Redundancy to increase probability model holds ◼ E.g., Replicate disk storage (stable store, RAID) Memory parity, ECC CPU checks PA152, Vlastislav Dohnal, FI MUNI, 2024 17 Failure Model ◼ Focusing on memory and disk drive ◼ Key problem Unfinished transactions E.g., Memory x x Disk Constraint: A=B Transaction T1: A  A  2 B  B  2 PA152, Vlastislav Dohnal, FI MUNI, 2024 18 Transaction ◼ Elementary operations Input (x): block containing x → memory Read (x,t): a. Input(x), if necessary, b. t := value of x in block Write (x,t): a. Input(x), if necessary, b. value of x in block := t Output (x): block containing x → disk PA152, Vlastislav Dohnal, FI MUNI, 2024 19 Example: Transaction T1 T1: Read (A,t); t  t  2; Write (A,t); Read (B,t); t  t  2; Write (B,t); Output (A); Output (B); A: 8 B: 8 memory disk 16 A: 8 B: 8 16 Failure! 16 PA152, Vlastislav Dohnal, FI MUNI, 2024 20 Transaction ◼ Atomicity Solution to unfinished transactions Execute all actions of a transaction or none at all ◼ How to implement atomicity? Log changes done to data ◼ i.e., create a journal (file with records about changes) PA152, Vlastislav Dohnal, FI MUNI, 2024 21 Logging ◼ Transaction produces records of changes into journal Start, End, Output, Write, … ◼ Uses: System failure→ redo/undo changes following the journal Recovery from backup → redo changes following the journal PA152, Vlastislav Dohnal, FI MUNI, 2024 22 Logging ◼ During recovery after system failure Some transactions are done again ◼ REDO Some transactions are aborted ◼ UNDO PA152, Vlastislav Dohnal, FI MUNI, 2024 23 Undo logging ◼ Property Changes done in transaction are immediately propagated to disk Original (previous) value is logged. ◼ If not sure (100%) about storing of changes done during finished transaction Undo the changes in the data from journal ◼ i.e., recover last consistent DB → Transaction has not ever been executed PA152, Vlastislav Dohnal, FI MUNI, 2024 24 Undo logging: Transaction T1 Read (A,t); t  t  2; Write (A,t); Read (B,t); t  t  2; Write (B,t); Output (A); Output (B); A: 8 B: 8 memory disk A: 8 B: 8 16 journal 16 16 16 T1: Remark: requiring validity of A=B PA152, Vlastislav Dohnal, FI MUNI, 2024 25 Undo logging ◼ Inconvenience Logging uses buffer manager too → accumulated in memory, stored to disk later. A: 8 16 B: 8 16 Log: A: 8 B: 8 16 Error # 1 memory disk journal PA152, Vlastislav Dohnal, FI MUNI, 2024 26 Undo logging ◼ Inconvenience Logging uses a buffer manager too → accumulated in memory, stored to disk later. A: 8 16 B: 8 16 Log: A: 8 B: 8 16 Error # 2 memory disk journal ... PA152, Vlastislav Dohnal, FI MUNI, 2024 27 Undo logging ◼ Rules 1. For every action write(X,t), generate undo log record containing old value of X 2. Before X is modified on disk (output(X)), log records pertaining to X must be on disk ▪ i.e., write-ahead logging (WAL) 3. Before commit is flushed to log, all writes of transaction must be reflected on disk. PA152, Vlastislav Dohnal, FI MUNI, 2024 28 Undo logging – recovery after failure ◼ For every Ti with in journal: If or is in log, do nothing Else for every in journal: ◼ write(X, v) ◼ output(X) ◼ write to journal Is it correct? PA152, Vlastislav Dohnal, FI MUNI, 2024 29 Undo logging – recovery after failure 1. S = set of transactions  with in log,  but no or in log 2. For each in log  in the reverse order do (latest → earliest)  If Ti  S, then write(X, v) and output (X) 3. For each Ti  S  write to log ◼ after successful writing, all output(X) to disk PA152, Vlastislav Dohnal, FI MUNI, 2024 30 Undo logging – recovery after failure ◼ Failure during recovery No problem ◼ UNDO can be done repeatedly (is idempotent) ◼ Done for unfinished transactions PA152, Vlastislav Dohnal, FI MUNI, 2024 31 Redo logging ◼ Properties Logging of new (updated) values Changes done by transaction are stored later ◼ → after transaction’s commit ◼ i.e., requires storing log records before any change is done to DB. ◼ May save some intermediate writes to disk. Unfinished transactions are skipped during recovery PA152, Vlastislav Dohnal, FI MUNI, 2024 32 Redo logging: Transaction T1 Read (A,t); t  t  2; Write (A,t); Read (B,t); t  t  2; Write (B,t); Output (A); Output (B); A: 8 B: 8 memory disk A: 8 B: 8 16 journal 16 16 16 T1: PA152, Vlastislav Dohnal, FI MUNI, 2024 33 Redo logging ◼ Rules 1. For every action write(X,t), generate log record containing a new value of X 2. Before X is modified on disk (in DB) (output(X)), all log records that modified X (including commit) must be on disk. 3. For transaction modifying X 1. Flush log records to disk 2. Write updated blocks to disk 3. Write end to journal PA152, Vlastislav Dohnal, FI MUNI, 2024 34 Redo logging – recovery after failure ◼ For every Ti with in log, do: For all in log: ◼ write(X, v) ◼ output(X) Is it correct? PA152, Vlastislav Dohnal, FI MUNI, 2024 35 Redo logging – recovery after failure 1. S = set of transactions  with in log,  but no 2. For each in log  Do in forward order (earliest → latest)  If Ti  S, then write(X, v) and output (X) 3. For each Ti  S  write to log Combining Records ◼ Want to delay DB flushes for hot objects PA152, Vlastislav Dohnal, FI MUNI, 2024 36 Say X is branch balance: T1: ... update X... T2: ... update X... T3: ... update X... T4: ... update X... Log actions: write X, v1 output X write X , v2 output X write X , v3 output X write X , v4 output X combined PA152, Vlastislav Dohnal, FI MUNI, 2024 37 Redo logging – recovery after failure ◼ Storing changes by output(X) If there are more transactions changing X, then output(X) can be done for the last log record only end can also be combined for multiple transactions PA152, Vlastislav Dohnal, FI MUNI, 2024 38 Redo logging – recovery after failure ◼ Recovery is very slow if end(T) is not used (or delayed…) ... ... ... Failure First record (1 year ago) T1 updates A,B Committed 1 year ago → STILL needed for recovery! Last record Transaction Journal: Does DB know what transactions are active here? PA152, Vlastislav Dohnal, FI MUNI, 2024 39 Logging – recovery after failure ◼ Solution to slowness → checkpoints ◼ Periodically do: 1. Do not accept new transactions 2. Wait until all transactions finish 3. Flush all log records to disk (log) 4. Flush all buffers to disk (DB) 5. Write “checkpoint” record on disk (log) 6. Resume transaction processing PA152, Vlastislav Dohnal, FI MUNI, 2024 40 Logging – recovery after failure ◼ Procedure during recovery Locate last checkpoint Start recovery from this place ◼ Example for redo logging Checkpoint ... ... ... ... ... ... PA152, Vlastislav Dohnal, FI MUNI, 2024 41 Logging ◼ Key drawbacks Writes to disk are controlled by logging rules and not be accesses to data. Undo logging ◼ cannot bring backup DB copies up to date Redo logging ◼ need to keep all modified blocks in memory until commit ◼ Solution: Undo/Redo logging Log record contains old and new value of X: PA152, Vlastislav Dohnal, FI MUNI, 2024 42 Undo/Redo logging ◼ Rules  Page X can be flushed before or after Ti ‘s commit  Log record flushed before corresponding updated page (WAL)  Flush log records at commit ◼ Recovery  Finished (committed) transactions are re-done from beginning  Unfinished transactions are rolled back (un-done) from end PA152, Vlastislav Dohnal, FI MUNI, 2024 43 Undo/Redo logging – recovery ◼ Example of undo/redo log: ... ... ... ... ... ... PA152, Vlastislav Dohnal, FI MUNI, 2024 44 Checkpoints ◼ Simple checkpoint No transaction can be active during creating checkpoint Transaction throughput considerably lowered! ◼ Solution Non-quiescent Checkpoint ◼ Register active transactions ◼ UNDO/REDO logging:  all modified pages (blocks) are flushed to disk PA152, Vlastislav Dohnal, FI MUNI, 2024 45 Non-quiescent Checkpoint ◼ Store start and end of checkpoint Start-ckpt active TR: T1,T2,... End ckpt .........Log Dirty buffer pages flushed (all, i.e., finished & active (unfinished) ones) Pointers to beginnings of transactions PA152, Vlastislav Dohnal, FI MUNI, 2024 46 Non-quiescent Checkpoint ◼ Recovery 1 ◼ T1 has not been committed → Undo T1 (undo changes to b, a) T1 a ... Start-ckpt T1 ... End ckpt ... T1 b ... ◼ Recovery 2 ◼ T1 has been committed → Redo T1 (redo b,c) PA152, Vlastislav Dohnal, FI MUNI, 2024 47 ... T1 a ... ... T1 b ... ... T1 c ... T1 cmt ... ckpt- end ckpt-s T1 Non-quiescent Checkpoint PA152, Vlastislav Dohnal, FI MUNI, 2024 48 ... ckpt start ... ... T1 b ... ... T1 c ... ckpt- start ckpt end Non-quiescent Checkpoint ◼ Recovery 3 ◼ Unfinished checkpoint → Locate last finished checkpoint Start undo/redo of transactions PA152, Vlastislav Dohnal, FI MUNI, 2024 49 Recovery process ◼ Backwards pass (end of log → latest valid checkpoint start) 1. construct set S of committed transactions 2. undo actions of transactions not in S ◼ Remark: Undo pending transactions  Follow undo chains for transactions in checkpoint active list ◼ Forward pass (latest checkpoint start → end of log)  redo actions of S transactions (without end) backward pass forward pass start check- point PA152, Vlastislav Dohnal, FI MUNI, 2024 50 Real world transaction ◼ Withdraw cash from ATM Info about bank accounts HW of ATM ◼ Implementation Transaction in DB Dispense money ◼ Procedure Do DB transaction, money dispensing after commit. Dispensing should be made idempotent. PA152, Vlastislav Dohnal, FI MUNI, 2024 51 Real world transaction ◼ After DB transaction, a “signal“ for money dispensing is sent $ give(amount) lastTid: time:Give$$(amount, Tid, time) PA152, Vlastislav Dohnal, FI MUNI, 2024 53 Media Failure ◼ RAID ◼ Make copies of data E.g., ◼ Keep 3 copies ◼ Output(X) → three outputs ◼ Input(X) → three inputs + voting X1 X2 X3 PA152, Vlastislav Dohnal, FI MUNI, 2024 54 Media Failure ◼ Make copies of data Other solution ◼ Keep 3 copies ◼ Output(X) → three outputs ◼ Input(X) → read from first (if ok, continue) → read from second, … ◼ Assumption  bad data can be detected X1 X2 X3 PA152, Vlastislav Dohnal, FI MUNI, 2024 55 Media Failure ◼ DB backup (dump) Recover DB backup Apply log ◼ Use redo entries of each transaction not finished at the backup time backup DB active DB log PA152, Vlastislav Dohnal, FI MUNI, 2024 56 Discarding Log ◼ When can log be discarded? In case of UNDO/REDO logging not needed for media recovery not needed for undo after system failure not needed for redo after system failure time check- point db dump last needed undo log check-point check-point PA152, Vlastislav Dohnal, FI MUNI, 2024 57 LOG DATADATADATA STABLE STORAGE UNSTABLE STORAGE WRITE log records before commit WRITE modified pages after commit RECOVERY Pi Pj DATABASE BUFFER LOG BUFFER lri lrj PA152, Vlastislav Dohnal, FI MUNI, 2024 58 Logging in SQLServer 2000 Free Log caches Flush Log caches Current Log caches Flush Log caches db writer Flush queue Waiting processes LOG DATA free Bi free Bj Log entries: - LSN - before and after images or logical log Lazy- writer Synchronous I/O Asynchronous I/O DATABASE BUFFER DB2 v7 uses similar schema PA152, Vlastislav Dohnal, FI MUNI, 2024 59 Logging in Oracle 8i Rollback segments (fixed size) After images (redo entries) Log buffer (default 32 Kb) Bi Bj Free list DATABASE BUFFER LOG DATA Rollback Segments Log File #1 Log File #2 Log Writer Database Writer Before images PA152, Vlastislav Dohnal, FI MUNI, 2024 60 Storing Log ◼ On dedicated disk ◼ Log records are stored sequentially ◼ Sequential writes are much faster than random ones (on a magnetic disk) Disk for logging should not store any other data + sequential I/O + loss of log is not dependent on loss of DB PA152, Vlastislav Dohnal, FI MUNI, 2024 61 Storing Log 0 50 100 150 200 250 300 350 controller cache no controller cache Throughput(tuples/sec) Log on same disk Log on separate disk 300 000 transactions Each transaction = 1x INSERT DB2 v7.1 server 5% improvement when log stored on dedicated disk Controller Cache diminishes negative impact of non-dedicated disk HW: middle server, Adaptec RAID controller (80Mb RAM), 2x18Gb disk. PA152, Vlastislav Dohnal, FI MUNI, 2024 62 Flushing Buffers ◼ Flushing dirty page When a threshold of modified pages is reached (Oracle 8) When the ratio of free pages drops below a threshold (less than 3% in SQLServer 7) After checkpoint Periodically PA152, Vlastislav Dohnal, FI MUNI, 2024 63 Creating Checkpoints 0 0.2 0.4 0.6 0.8 1 1.2 0 checkpoint 4 checkpoints ThroughputRatio Performance influence (decreased throughput) Reduces size of log Shortens time to recover after failure 300 000 transactions Each transaction = one INSERT command Oracle 8i, Windows 2000 PA152, Vlastislav Dohnal, FI MUNI, 2024 64 Lecture Takeaways ◼ Data consistency One source of problems: failures ◼ Solutions: (i) logging; (ii) redundancy Another source of problems: data sharing ◼ Solution: (i) Locking data during transactions ▪ Not done in this course… ◼ Logging Know principles and limitations Understand checkpoints Be able to do recovery