PA152: Efficient Use of DB 6. Query Processing Vlastislav Dohnal PA152, Vlastislav Dohnal, FI MUNI, 2023 2 Query Processing ◼ Procedure: Query Checking syntax and semantics ◼ Parse Tree Logical query plan ◼ Plan modifications Physical query plan Evaluation PA152, Vlastislav Dohnal, FI MUNI, 2023 3 Example ◼ Relation R(A,B,C) S(C,D,E) ◼ Query select B,D from R,S where R.C=S.C and R.A=‘c’ and S.E=2 PA152, Vlastislav Dohnal, FI MUNI, 2023 4 Example R A B C S C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 select B,D from R,S where R.C=S.C and R.A=‘c’ and S.E=2 PA152, Vlastislav Dohnal, FI MUNI, 2023 5 Example R A B C S C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 Result: B D 2 x PA152, Vlastislav Dohnal, FI MUNI, 2023 6 How to evaluate this query? ◼ Cartesian product ◼ Selecting records ◼ Projection 1. way PA152, Vlastislav Dohnal, FI MUNI, 2023 7 R  S R.A R.B R.C S.C S.D S.E a 1 10 10 x 2 a 1 10 20 y 2 . . c 2 10 10 x 2 . . PA152, Vlastislav Dohnal, FI MUNI, 2023 8 This record matches R  S R.A R.B R.C S.C S.D S.E a 1 10 10 x 2 a 1 10 20 y 2 . . c 2 10 10 x 2 . . Output – query result select B,D from R,S where R.C=S.C and R.A=‘c’ and S.E=2 PA152, Vlastislav Dohnal, FI MUNI, 2023 9 Describing Evaluation Plans ◼ Using relational algebra B,D [ sR.A=‘c’ S.E=2  R.C = S.C (RS)] ◼ Example of Plan 1: Query plan B,D sR.A=‘c’ S.E=2  R.C=S.C  R S PA152, Vlastislav Dohnal, FI MUNI, 2023 10 Describing Evaluation Plans ◼ Example of Plan 2: B,D sR.A = ‘c’ sS.E = 2 R S natural join PA152, Vlastislav Dohnal, FI MUNI, 2023 11 Physical Plan ◼ Example of Plan 2: R S A B C s (R) s(S) C D E a 1 10 A B C C D E 10 x 2 b 1 20 c 2 10 10 x 2 20 y 2 c 2 10 20 y 2 30 z 2 d 2 35 30 z 2 40 x 1 e 3 45 50 y 3 PA152, Vlastislav Dohnal, FI MUNI, 2023 12 Describing Evaluation Plans ◼ Plan 3: Assume an index on R.A and on S.C Scan index R.A for looking up records of R satisfying R.A = “c” ◼ For each record found, take value of R.C to scan index on S.C to get matching records of S ◼ Filter out records of S, where S.E ≠ 2 Join the corresponding records of R and S Project on B,D PA152, Vlastislav Dohnal, FI MUNI, 2023 13 R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 A C I1 I2 PA152, Vlastislav Dohnal, FI MUNI, 2023 14 R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 A C I1 I2 =‘c’ PA152, Vlastislav Dohnal, FI MUNI, 2023 15 R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 A C I1 I2 =“c” <10,x,2> PA152, Vlastislav Dohnal, FI MUNI, 2023 16 R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 A C I1 I2 =“c” <10,x,2> Check e=2? Output: <2,x> PA152, Vlastislav Dohnal, FI MUNI, 2023 17 next record: R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 A C I1 I2 =“c” <10,x,2> Check=2? Output: <2,x> … PA152, Vlastislav Dohnal, FI MUNI, 2023 18 Overview on Query Optimization parser translate rules estimate sizes create physical query plans estimate costs of p.q.p. choose best execute {P1,P2,…..} {(P1,C1),(P2,C2)...} Pi result SQL query parse tree logical query plan „improved“ l.q.p. l.q.p. with sizes statistics PA152, Vlastislav Dohnal, FI MUNI, 2023 19 Example: SQL query ◼ Relations  StarsIn(title, year, starName)  MovieStar(name, birthdate) ◼ Query  Select movies with stars born in 1960:  SELECT title FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE extract(year from birthdate) = 1960 ); PA152, Vlastislav Dohnal, FI MUNI, 2023 20 Example: Parse Tree SELECT FROM WHERE IN title StarsIn ( ) starName SELECT FROM WHERE = name MovieStar extract(year from birthdate) 1960 PA152, Vlastislav Dohnal, FI MUNI, 2023 21 Example: Translation to Rel. Algebra ◼ Selection has two arguments Must be transformed… ◼ IN operator i.e., avoiding nested queries (sub-selects) title s StarsIn IN name sextract(year from birthdate) = 1960 starName MovieStar PA152, Vlastislav Dohnal, FI MUNI, 2023 22 Example: Logical Query Plan ◼ IN operator replaced with product title sstarName=name StarsIn name sextract(year from birthdate) = 1960 MovieStar  PA152, Vlastislav Dohnal, FI MUNI, 2023 23 Example: Improving Logical Plan ◼ Substitution of product and selection by join ◼ Next option Push project to relation StarsIn? title starName=name StarsIn name sextract(year from birthdate) = 1960 MovieStar ◼ Before generating physical plans ◼ Influence estimation of evaluation costs PA152, Vlastislav Dohnal, FI MUNI, 2023 24 Example: Estimate Result Sizes Estimate size StarsIn MovieStar name sextract(year from birthdate) = 1960 PA152, Vlastislav Dohnal, FI MUNI, 2023 25 Example: One Physical Plan Parameters: relation order, memory size, project attributes,... Hash join SEQ scan index scan Parameters: select condition, ... StarsIn MovieStar PA152, Vlastislav Dohnal, FI MUNI, 2023 26 Example: Estimate Costs Logical Query Plan Phys. Plan1 Plan2 …. Plann Cost1 Cost2 …. Costn Pick plan with best cost! PA152, Vlastislav Dohnal, FI MUNI, 2023 27 Query Optimization ◼ Relational algebra level ◼ Detailed query plan level Estimate costs ◼ Without indexes ◼ With indexes Generate and compare plans PA152, Vlastislav Dohnal, FI MUNI, 2023 28 Relational Algebra Optimization ◼ Transformation rules Must preserve equivalence What are good transformations? PA152, Vlastislav Dohnal, FI MUNI, 2023 29 Transformation Rules ◼ Natural join Relation order is not important since all attributes are preserved ◼ Example: R S = S R (R S) T = R (S T) R S S T RT PA152, Vlastislav Dohnal, FI MUNI, 2023 30 Transformation Rules ◼ Same for cartesian product and union R  S = S  R (R  S)  T = R  (S  T) R  S = S  R R  (S  T) = (R  S)  T PA152, Vlastislav Dohnal, FI MUNI, 2023 31 Transformation Rules ◼ Selects sp1p2(R) = sp1p2(R) = sp1p2(R) = sp1 [ sp2 (R)] [ sp1 (R)]  [ sp2 (R)] [ sp1 (R)]  [ sp2 (R)] PA152, Vlastislav Dohnal, FI MUNI, 2023 32 Question of Tuple Duplicates ◼ Sets vs. bags?  Relations are bags ◼ Example  R = {a,a,b,b,b,c}  S = {b,b,c,c,d} ◼ R  S = ?  MIN: R  S = {b,b,c} SQL: INTERSECT ALL ◼ R  S = ?  SUM: R  S = {a,a,b,b,b,b,b,c,c,c,d} SQL: UNION ALL  MAX: R  S = {a,a,b,b,b,c,c,d} PA152, Vlastislav Dohnal, FI MUNI, 2023 34 Option MAX: Select Decomposition ◼ Select decomposition: ◼ Example: R={a,a,b,b,b,c} a,b satisfy p1; b,c satisfy p2 sp1p2 (R) = sp1(R)  sp2(R) sp1p2 (R) = {a,a,b,b,b,c} sp1(R) = {a,a,b,b,b} sp2(R) = {b,b,b,c} sp1(R) max sp2(R) = {a,a,b,b,b,c} PA152, Vlastislav Dohnal, FI MUNI, 2023 35 Choice of Correct Option ◼ Pragmatic solution for  Use “SUM” for bag union “MAX” for dividing disjunctive predicates () ◼ Some rules cannot be applied to bags Associativity of except: R – (S – T) Distributivity: R  (S  T) != (R  S)  (R  T) PA152, Vlastislav Dohnal, FI MUNI, 2023 36 Transformation Rules ◼ Notation: X = set of attributes Y = set of attributes XY = X  Y ◼ Project pxy (R) = px [py (R)] PA152, Vlastislav Dohnal, FI MUNI, 2023 37 Transformation Rules ◼ Combining select and natural join ◼ Let p = expr. containing only attrs. of R q = expr. containing only attrs. of S m = expr. containing attrs. of both R, S [sp (R)]  S R  [sq (S)] sp (R  S) = sq (R  S) = PA152, Vlastislav Dohnal, FI MUNI, 2023 38 Transformation Rules ◼ Combining select and natural join Other rules can be derived spq (R  S) = [sp (R)]  [sq (S)] spqm (R  S) = sm [(sp (R))  (sq (S))] spq (R  S) = [(sp (R))  S] max [R (sq (S))] Transformation Rules ◼ Combining select and natural join Example of rule derivation PA152, Vlastislav Dohnal, FI MUNI, 2023 39 spq (R  S) = sp [sq (R  S) ] = sp [ R  sq (S) ] = [sp (R)]  [sq (S)] PA152, Vlastislav Dohnal, FI MUNI, 2023 40 Transformation Rules ◼ Combining select and natural join Example of rule derivation Let m = expr. containing only attrs. common in R and S, but does not compare them [sm (R)]  [sm (S)]sm (R  S) = PA152, Vlastislav Dohnal, FI MUNI, 2023 41 Transformation Rules ◼ Combining project and select ◼ Let x = attribute subset of R z = attributes referenced in expr. P (subset of R) (sp [ px (R) ])px pxz px[sp (R) ] = PA152, Vlastislav Dohnal, FI MUNI, 2023 42 Transformation Rules ◼ Combining project and natural join ◼ Let x = attribute subset of R y = attribute subset S z = attributes common in R and S pxy([pxz (R) ]  [pyz (S) ]) pxy (R  S) = ◼ Combining previous with select PA152, Vlastislav Dohnal, FI MUNI, 2023 43 Transformation Rules pxy (sp (R  S)) = pxy (sp [pxz’ (R)  pyz’ (S)]) z’ = z  {attributes referenced in P} ◼ Combining project, select and Cartesian product PA152, Vlastislav Dohnal, FI MUNI, 2023 44 Transformation Rules pxy (sp (R  S)) = ? PA152, Vlastislav Dohnal, FI MUNI, 2023 45 Transformation Rules ◼ Combining select and union ◼ Combining select and except Select can also be applied to S ◼ May be convenient for shrinking relation before doing except Are there some limits on P ? sp(R sum S) = sp(R) sum sp(S) sp(R – S) = sp(R) – S = sp(R) – sp(S) PA152, Vlastislav Dohnal, FI MUNI, 2023 46 Good Transformations ◼ Select early ◼ Project early Example: ◼ R(A,B,C,D,E,F,G,H,I,J) result={E} ◼ Filter using P: (A=3)  (B=“cat”) pE (sp (R)) vs. pE (sp (pABE(R))) px [sp (R)] → px (sp [pxz (R)]) sp (R  S) → [sp (R)]  S PA152, Vlastislav Dohnal, FI MUNI, 2023 47 Good Transformations sp1p2 (R) → sp1 [sp2 (R)] → sp2 [sp1 (R)] → [ sp1 (R)]  [ sp2 (R)] sp1p2 (R) → sp1(R) max sp2(R) R  S → S  R PA152, Vlastislav Dohnal, FI MUNI, 2023 48 Good Transformations ◼ Assume indexes On A and on B s(A=3)  (B=“cat”)(R) = s(A=3)(R)  s(B=“cat”)(R) A = 3B = ‘cat’ Record pointer: Intersect pointers to get pointers to matching tuples PA152, Vlastislav Dohnal, FI MUNI, 2023 49 Good Transformations: Summary ◼ Recommendations: No transformation is always good. Usually, good to ◼ select early ◼ project early ◼ Eliminate common sub-expressions ◼ Eliminate tuple duplicates Good Transformations: Example ◼ Push select to relations → apparently OK  But: Firstly, move them as far as possible; next push them back ◼ Example:  Relations: StarsIn(title, year, starName) Movie(title, year, studioName)  View: create view MoviesOf1996 as select * from Movie where year = 1996; ◼ Query: select starName, studioName from MoviesOf1996 natural join StarsIn; PA152, Vlastislav Dohnal, FI MUNI, 2023 50 starName,studioName MoviesOf1996 StarsIn syear=1996 starName,studioName Movie StarsIn syear=1996 starName,studioName Movie StarsIn syear=1996 PA152, Vlastislav Dohnal, FI MUNI, 2023 51 Query Evaluation: Overview ◼ Relational algebra level Transformation rules Apply good rules ◼ Detail query plan level Estimate costs Generate and compare plans PA152, Vlastislav Dohnal, FI MUNI, 2023 52 Estimating Cost of Query Plan 1. Estimate size of result 2. Estimate number of IOs PA152, Vlastislav Dohnal, FI MUNI, 2023 53 Estimating Result Size ◼ Keep statistics for relation R T(R) – # tuples in R S(R) – # of bytes in each R tuple ◼ S(R,A) – length in bytes of values of attribute A B(R) – # of blocks to hold all R tuples V(R, A) – # distinct values in R for attribute A ◼ Good estimates need Statistics up to date! PA152, Vlastislav Dohnal, FI MUNI, 2023 54 Statistics Example ◼ Relation R Attribute A – string, max. 20 B ◼ S(R,A) = 3 ← average length Attribute B – integer, 4 B Attribute C – date, 8 B Attribute D – string, 5 B ◼ S(R,D) = 1 ◼ Statistics T(R) = 5 S(R) = 16 V(R,A) = 3 V(R,B) = 1 V(R,C) = 5 V(R,D) = 4 A B C D cat 1 10.2.98 a cat 1 20.3.98 b dog 1 30.4.98 a dog 1 14.6.98 c bat 1 15.6.98 d PA152, Vlastislav Dohnal, FI MUNI, 2023 55 Estimating Result Size ◼ Cartesian product W = R1  R2 T(W) = T(R1)  T(R2) S(W) = S(R1) + S(R2) PA152, Vlastislav Dohnal, FI MUNI, 2023 56 Estimating Result Size ◼ Select W = sZ=val(R) S(W) = S(R) T(W) = ? ◼ W = sA=‘cat’(R) ◼ W2 = sB=2(R) T(W2) = ? V(R,A)=3 V(R,B)=1 V(R,C)=5 V(R,D)=4 T(W) = T(R) V(R,A) = 5/3 A B C D cat 1 10.2.98 a cat 1 20.3.98 b dog 1 30.4.98 a dog 1 14.6.98 c bat 1 15.6.98 d PA152, Vlastislav Dohnal, FI MUNI, 2023 57 Estimating Result Size ◼ Assumption of last estimate Values are uniformly distributed over possible V(R,Z) values! ◼ f(val) = 1 / V(R,Z) ◼ T(sZ=val(R)) = T(R)  f(val) ◼ Alternate assumption Uniform distribution of values over domain with DOM(R,Z) values ◼ # values in domain is denoted as DOM(R,Z) ◼ f(val) = 1 / DOM(R,Z) PA152, Vlastislav Dohnal, FI MUNI, 2023 58 Estimating Result Size: Example ◼ Select W = sZ=val(R) T(W) = ? ◼ Podle DOM(R,*) ◼ Formula derivation W = sC=val(R) ◼ T(W) = f(val)  T(R) = 1/10 * 5 = 0,5 W = sB=val(R) ◼ T(W) = (1/10)*5 W = sA=val(R) ◼ T(W) = 0,5 V(R,A)=3 DOM(R,A)=10 V(R,B)=1 DOM(R,B)=10 V(R,C)=5 DOM(R,C)=10 V(R,D)=4 DOM(R,D)=10 A B C D cat 1 10.2.98 a cat 1 20.3.98 b dog 1 30.4.98 a dog 1 14.6.98 c bat 1 15.6.98 d PA152, Vlastislav Dohnal, FI MUNI, 2023 59 Estimating Result Size ◼ Select W = sZ=val(R) Original solution Alternate solution T(W) = T(R) V(R,Z) V(R,A)=3 DOM(R,A)=10 V(R,B)=1 DOM(R,B)=10 V(R,C)=5 DOM(R,C)=10 V(R,D)=4 DOM(R,D)=10 T(W) = T(R) DOM(R,Z) A B C D cat 1 10.2.98 a cat 1 20.3.98 b dog 1 30.4.98 a dog 1 14.6.98 c bat 1 15.6.98 d PA152, Vlastislav Dohnal, FI MUNI, 2023 60 Estimating Result Size ◼ Select W = sZ≥val(R) Solution 1 ◼ T(W) = T(R) / 2 Solution 2 ◼ T(W) = T(R) / 3 Solution 3 ◼ Estimate values in range PA152, Vlastislav Dohnal, FI MUNI, 2023 61 Estimating Result Size ◼ Select – estimate values in range ◼ Calculate fraction of unique values in range T(W) = f  T(R) Min=1 V(R,Z)=10 W= sz  15 (R) Max=20 ZR f = 20-15+1 20-1+1 6 20 = PA152, Vlastislav Dohnal, FI MUNI, 2023 62 Estimate # values: Histogram ◼ Histogram of attribute values Replacing V(R,A) and DOM(R,A) More precise estimates ◼ Distinct values Few → abs. number per each Many → quantization ◼ Ranges (intervals) of equal size (in recs) ◼ Percentiles ◼ Most frequent ones only  others all together (i.e., uniformly distributed) 10 23 37 42 8 15 28 37 f PA152, Vlastislav Dohnal, FI MUNI, 2023 63 Estimating Result Size ◼ Select W = szval(R) T(W) = T(R)(1 – f(val)) = T(R)(1 – 1/V(R,Z)) Typical solution ◼ T(W) = T(R) = T(R) – T(R) V(R,Z) PA152, Vlastislav Dohnal, FI MUNI, 2023 64 Estimating Result Size ◼ Natural join W = R1  R2 Notation ◼ X – attributes of R1 ◼ Y – attributes of R2 ◼ Case 1 X  Y = Ø Same as R1  R2 ◼ Case 2 X  Y = Z Follows… PA152, Vlastislav Dohnal, FI MUNI, 2023 65 Estimating Result Size: Natural Join ◼ Assumption Z={A} and also: V(R1,A) ≤ V(R2,A) → every A value in R1 is in R2 V(R1,A) ≥ V(R2,A) → every A value in R2 is in R1 R1 A B C R2 A DR1 R2 PA152, Vlastislav Dohnal, FI MUNI, 2023 66 Estimating Result Size: Natural Join ◼ V(R1,A) ≤ V(R2,A) ◼ One record is matched with T(R2) / V(R2,A) records Assumption of uniform distribution ◼ Result: R1 A B C R2 A D Take 1 tuple Match T(W) = T(R2) V(R2,A) T(R1)  PA152, Vlastislav Dohnal, FI MUNI, 2023 67 Estimating Result Size: Natural Join ◼ Overview of both variants V(R1,A) ≤ V(R2,A) V(R2,A) ≤ V(R1,A) ◼ Difference is in denominator T(W) = T(R2) V(R2,A) T(R1)  T(W) = T(R1) V(R1,A) T(R2)  PA152, Vlastislav Dohnal, FI MUNI, 2023 68 Estimating Result Size: Natural Join ◼ In general W = R1 R2 T(W) = T(R1)  T(R2) max { V(R1,A), V(R2,A) } PA152, Vlastislav Dohnal, FI MUNI, 2023 69 Estimating Result Size: Natural Join ◼ Alternate solution Values uniformly distributed over domain ◼ One rec. matches with T(R2)/DOM(R2,A) recs. ◼ Result: R1 A B C R2 A D Take 1 tuple Match T(W) = T(R1)  T(R2) DOM(R2,A) = T(R1)  T(R2) DOM(R1,A) assuming to be same PA152, Vlastislav Dohnal, FI MUNI, 2023 70 Estimating Result Size: Natural Join ◼ W = R1(X), R2(Y), X  Y = Z ◼ Size of record S(W) = S(R1) + S(R2) – S(R1,Z) Valid for all cases ◼ Number of tuples if Z contains more attrs.? Assume they are independent R1 R2 T(W) = T(R1)  T(R2) max{ V(R1,A1), V(R2,A1) }  max{ V(R1,A2), V(R2,A2) } PA152, Vlastislav Dohnal, FI MUNI, 2023 71 Estimating Size: Project, Select ◼ Project W = AB(R) T(W)=T(R) S(W)=S(R, AB) ◼ Select W = sA=aB=b(R) S(W)=S(R) Let n=T(R), T(W) = n(1 – (1-m1/n)(1-m2/n)) ◼ m1=T(R) / V(R,A) m2=T(R) / V(R,B) PA152, Vlastislav Dohnal, FI MUNI, 2023 72 Estimating Size: Set Operations ◼ Union, intersect, except  , , – ◼ T(W) – choose average size  E.g. ◼ T(R  S) = T(R) + T(S) … if  means UNION ALL ◼ T(R  S) = [ max{T(R), T(S)} , T(R) + T(S) ]  So use: T(R  S) = avg{ max{T(R), T(S)} , T(R) + T(S) } ▪ If set union is evaluated ◼ T(R – S) = T(R) – 1/2T(S) ◼ T(R  S) = avg{ 0, min{T(R), T(S)} } ◼ DISTINCT  All attributes ◼ min{ 1/2T(R), (V(R,A)*V(R,B)*…) } PA152, Vlastislav Dohnal, FI MUNI, 2023 73 Estimating Result Size ◼ For complex expressions, intermediate results and their stats are needed ◼ Example W = [sA=a(R1)] R2 denote as U ◼ T(U) = T(R1) / V(R1,A) S(U) = S(R1) Need V(U,*) to estimate costs of W! PA152, Vlastislav Dohnal, FI MUNI, 2023 74 Estimate Number of Values ◼ To estimate V(U,*) U = sA=a(R1) Assume R1(A,B,C,D) PA152, Vlastislav Dohnal, FI MUNI, 2023 75 Estimate # values: Example ◼ Relation R1 ◼ U = sA=a(R1) ◼ Estimates V(U,A) = 1 V(U,B) = 1 V(U,C) = 1 .. (T(R1) / V(R1,A)) V(U,D) = 1 .. (T(R1) / V(R1,A)) V(R,A)=3 V(R,B)=1 V(R,C)=5 V(R,D)=4 A B C D cat 1 10.2.98 a cat 1 20.3.98 b dog 1 30.4.98 a dog 1 14.6.98 c bat 1 15.6.98 d Sth. in between PA152, Vlastislav Dohnal, FI MUNI, 2023 76 Estimate # values: Reality ◼ Common solution U = sA=a(R1) V(U,A) = 1 V(U,K) = T(U) ◼ Primary key K of R1 is an exception V(U,*) = V(R,*), i.e., V(U,*) = T(U) ◼ As a result, original V(R,*) can be used V(U,*) = min { V(R,*), T(U) } PA152, Vlastislav Dohnal, FI MUNI, 2023 77 Estimate # values: Join ◼ U = R1(A,B) R2(A,C) ◼ Estimates: V(U,A) = min{ V(R1,A), V(R2,A) } V(U,B) = min { V(R1,B), T(U) } V(U,C) = min { V(R2,C), T(U) } PA152, Vlastislav Dohnal, FI MUNI, 2023 78 Estimate # values: Join ◼ Example Z = R1(A,B) R2(B,C) R3(C,D) T(R1) = 1000 V(R1,A)=50 V(R1,B)=100 T(R2) = 2000 V(R2,B)=200 V(R2,C)=300 T(R3) = 3000 V(R3,C)=90 V(R3,D)=500 PA152, Vlastislav Dohnal, FI MUNI, 2023 79 Estimate # values: Join ◼ Intermediate result U = R1(A,B) R2(B,C) Estimates: ◼ T(U) = T(R1)  T(R2) / max{ V(R1,B), V(R2,B) } = = 1000  2000 / 200 = 10 000 ◼ V(U,A) = 50 ◼ V(U,B) = min{ V(R1,B), V(R2,B) } = 100 ◼ V(U,C) = 300 PA152, Vlastislav Dohnal, FI MUNI, 2023 80 Estimate # values: Join ◼ Final result Z = U R3(C,D) ◼ U(A,B,C) Estimates: ◼ T(Z) = 10 000  3 000 / 300 = 100 000 ◼ V(Z,A) = 50 ◼ V(Z,B) = 100 ◼ V(Z,C) = 90 ◼ V(Z,D) = 500 PA152, Vlastislav Dohnal, FI MUNI, 2023 81 Example of Stats in PostgreSQL ◼ Connect to student instance of PostgreSQL See the first lecture for instructions ◼ Check the schema xdohnal Relations: predmet, skupina, hotel ◼ Observe both the relation and attribute statistics Explanation of individual items in doc ◼ https://www.postgresql.org/docs/15/view-pg-stats.html PA152, Vlastislav Dohnal, FI MUNI, 2023 82 Example of Stats in PostgreSQL ◼ Relation hotel PA152, Vlastislav Dohnal, FI MUNI, 2023 83 Example of Stats in PostgreSQL ◼ Attribute hotel.id ◼ Attribute hotel.name PA152, Vlastislav Dohnal, FI MUNI, 2023 84 Example of Stats in PostgreSQL ◼ Attribute hotel.state ◼ Attribute hotel.distance_to_center PA152, Vlastislav Dohnal, FI MUNI, 2023 85 Summary ◼ Estimating size of results is an “art” ◼ Do not forget: Statistics must be kept up to date for precise estimates → necessity to maintain them during table updates What are the costs to update stats? PA152, Vlastislav Dohnal, FI MUNI, 2023 86 Statistics Update ◼ Stats do not change rapidly in short time ◼ Inaccurate stats may also help ◼ Instant stats update Can become a bottleneck ◼ Stats are used very often ◼ → Do not update often PA152, Vlastislav Dohnal, FI MUNI, 2023 87 Statistics Update ◼ Run periodically After some time period elapses After some number of updates are made ◼ Slow for V(R,A) Especially if histograms are kept → Use a data sample ◼ If almost all are distinct → V(R,A)T(R) ◼ If not many are distinct → we likely saw most of them PA152, Vlastislav Dohnal, FI MUNI, 2023 88 Estimating Costs of Query Plan: Outline ◼ Estimating the size of results Already done ◼ Estimating # of IOs Next lecture… ◼ Generate and compare plans