PA152: Efficient Use of DB 6. Query Processing Vlastislav Dohnal PA152, Vlastislav Dohnal, FI MUNI, 2021 2 Query Processing  Procedure: Query Checking syntax and semantics  Parse Tree Logical query plan  Plan modifications Physical query plan Evaluation PA152, Vlastislav Dohnal, FI MUNI, 2021 3 Example  Relation R(A,B,C) S(C,D,E)  Query select B,D from R,S where R.C=S.C and R.A=‘c’ and S.E=2 PA152, Vlastislav Dohnal, FI MUNI, 2021 4 Example R A B C S C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 select B,D from R,S where R.C=S.C and R.A=‘c’ and S.E=2 PA152, Vlastislav Dohnal, FI MUNI, 2021 5 Example R A B C S C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 Result: B D 2 x PA152, Vlastislav Dohnal, FI MUNI, 2021 6 How to evaluate this query?  Cartesian product  Selecting records  Projection 1. way PA152, Vlastislav Dohnal, FI MUNI, 2021 7 R  S R.A R.B R.C S.C S.D S.E a 1 10 10 x 2 a 1 10 20 y 2 . . c 2 10 10 x 2 . . PA152, Vlastislav Dohnal, FI MUNI, 2021 8 This record matches R  S R.A R.B R.C S.C S.D S.E a 1 10 10 x 2 a 1 10 20 y 2 . . c 2 10 10 x 2 . . Output – query result select B,D from R,S where R.C=S.C and R.A=‘c’ and S.E=2 PA152, Vlastislav Dohnal, FI MUNI, 2021 9 Describing Evaluation Plans  Using relational algebra B,D [ sR.A=‘c’ S.E=2  R.C = S.C (RS)]  Example of Plan 1: Query plan B,D sR.A=‘c’ S.E=2  R.C=S.C  R S PA152, Vlastislav Dohnal, FI MUNI, 2021 10 Describing Evaluation Plans  Example of Plan 2: B,D sR.A = ‘c’ sS.E = 2 R S natural join PA152, Vlastislav Dohnal, FI MUNI, 2021 11 Physical Plan  Example of Plan 2: R S A B C s (R) s(S) C D E a 1 10 A B C C D E 10 x 2 b 1 20 c 2 10 10 x 2 20 y 2 c 2 10 20 y 2 30 z 2 d 2 35 30 z 2 40 x 1 e 3 45 50 y 3 PA152, Vlastislav Dohnal, FI MUNI, 2021 12 Describing Evaluation Plans  Plan 3: Assume an index on R.A and on S.C Scan index R.A for looking up records of R satisfying R.A = “c”  For each record found, take value of R.C to scan index on S.C to get matching records of S  Filter out records of S, where S.E ≠ 2 Join the corresponding records of R and S Project on B,D PA152, Vlastislav Dohnal, FI MUNI, 2021 13 R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 A C I1 I2 PA152, Vlastislav Dohnal, FI MUNI, 2021 14 R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 A C I1 I2 =‘c’ PA152, Vlastislav Dohnal, FI MUNI, 2021 15 R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 A C I1 I2 =“c” <10,x,2> PA152, Vlastislav Dohnal, FI MUNI, 2021 16 R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 A C I1 I2 =“c” <10,x,2> Check e=2? Output: <2,x> PA152, Vlastislav Dohnal, FI MUNI, 2021 17 next record: R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 A C I1 I2 =“c” <10,x,2> Check=2? Output: <2,x> … PA152, Vlastislav Dohnal, FI MUNI, 2021 18 Overview on Query Optimization parser translate rules estimate sizes create physical query plans estimate costs of p.q.p. choose best execute {P1,P2,…..} {(P1,C1),(P2,C2)...} Pi result SQL query parse tree logical query plan „improved“ l.q.p. l.q.p. with sizes statistics PA152, Vlastislav Dohnal, FI MUNI, 2021 19 Example: SQL query  Relations  StarsIn(title, year, starName)  MovieStar(name, birthdate)  Query  Select movies with stars born in 1960:  SELECT title FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE extract(year from birthdate) = 1960 ); PA152, Vlastislav Dohnal, FI MUNI, 2021 20 Example: Parse Tree SELECT FROM WHERE IN title StarsIn ( ) starName SELECT FROM WHERE = name MovieStar extract(year from birthdate) 1960 PA152, Vlastislav Dohnal, FI MUNI, 2021 21 Example: Translation to Rel. Algebra  Selection has two arguments Must be transformed…  IN operator i.e., avoiding nested queries (sub-selects) title s StarsIn IN name sextract(year from birthdate) = 1960 starName MovieStar PA152, Vlastislav Dohnal, FI MUNI, 2021 22 Example: Logical Query Plan  IN operator replaced with product title sstarName=name StarsIn name sextract(year from birthdate) = 1960 MovieStar  PA152, Vlastislav Dohnal, FI MUNI, 2021 23 Example: Improving Logical Plan  Substitution of product and selection by join  Next option Push project to relation StarsIn? title starName=name StarsIn name sextract(year from birthdate) = 1960 MovieStar  Before generating physical plans  Influence estimation of evaluation costs PA152, Vlastislav Dohnal, FI MUNI, 2021 24 Example: Estimate Result Sizes Estimate size StarsIn MovieStar name sextract(year from birthdate) = 1960 PA152, Vlastislav Dohnal, FI MUNI, 2021 25 Example: One Physical Plan Parameters: relation order, memory size, project attributes,... Hash join SEQ scan index scan Parameters: select condition, ... StarsIn MovieStar PA152, Vlastislav Dohnal, FI MUNI, 2021 26 Example: Estimate Costs Logical Query Plan Plan1 Plan2 …. Plann Cost1 Cost2 …. Costn Pick plan with best cost! PA152, Vlastislav Dohnal, FI MUNI, 2021 27 Query Optimization  Relational algebra level  Detailed query plan level Estimate costs  Without indexes  With indexes Generate and compare plans PA152, Vlastislav Dohnal, FI MUNI, 2021 28 Relational Algebra Optimization  Transformation rules Must preserve equivalence What are good transformations? PA152, Vlastislav Dohnal, FI MUNI, 2021 29 Transformation Rules  Natural join Relation order is not important since all attributes are preserved  Example: R S = S R (R S) T = R (S T) R S S T RT PA152, Vlastislav Dohnal, FI MUNI, 2021 30 Transformation Rules  Same for cartesian product and union R  S = S  R (R  S)  T = R  (S  T) R  S = S  R R  (S  T) = (R  S)  T PA152, Vlastislav Dohnal, FI MUNI, 2021 31 Transformation Rules  Selects sp1p2(R) = sp1p2(R) = sp1p2(R) = sp1 [ sp2 (R)] [ sp1 (R)]  [ sp2 (R)] [ sp1 (R)]  [ sp2 (R)] PA152, Vlastislav Dohnal, FI MUNI, 2021 32 Question of Tuple Duplicates  Sets vs. bags?  Relations are bags  Example  R = {a,a,b,b,b,c}  S = {b,b,c,c,d}  R  S = ?  MIN: R  S = {b,b,c} SQL: INTERSECT ALL  R  S = ?  SUM: R  S = {a,a,b,b,b,b,b,c,c,c,d} SQL: UNION ALL  MAX: R  S = {a,a,b,b,b,c,c,d} PA152, Vlastislav Dohnal, FI MUNI, 2021 33 Option SUM: Union  Union two relations R  S SQL: UNION ALL  Example  Representatives(id, year, party, …)  Senators(id, year, party, …)  R = pyear,party(Senators) S = pyear,party(Representatives) year party 1997 ODS 2003 ČSSD 2007 SZ year party 1997 ODS 1998 KDU 1996 ČSSD PA152, Vlastislav Dohnal, FI MUNI, 2021 34 Option MAX: Select Decomposition  Select decomposition:  Example: R={a,a,b,b,b,c} a,b satisfy p1; b,c satisfy p2 sp1p2 (R) = sp1(R)  sp2(R) sp1p2 (R) = {a,a,b,b,b,c} sp1(R) = {a,a,b,b,b} sp2(R) = {b,b,b,c} sp1(R) max sp2(R) = {a,a,b,b,b,c} PA152, Vlastislav Dohnal, FI MUNI, 2021 35 Choice of Correct Option  Pragmatic solution for  Use “SUM” for bag union “MAX” for dividing disjunctive predicates ()  Some rules cannot be applied to bags Associativity of except: R – (S – T) Distributivity: R  (S  T) != (R  S)  (R  T) PA152, Vlastislav Dohnal, FI MUNI, 2021 36 Transformation Rules  Notation: X = set of attributes Y = set of attributes XY = X  Y  Project pxy (R) = px [py (R)] PA152, Vlastislav Dohnal, FI MUNI, 2021 37 Transformation Rules  Combining select and natural join  Let p = expr. containing only attrs. of R q = expr. containing only attrs. of S m = expr. containing attrs. of both R, S [sp (R)]  S R  [sq (S)] sp (R  S) = sq (R  S) = PA152, Vlastislav Dohnal, FI MUNI, 2021 38 Transformation Rules  Combining select and natural join Other rules can be derived spq (R  S) = [sp (R)]  [sq (S)] spqm (R  S) = sm [(sp (R))  (sq (S))] spq (R  S) = [(sp (R))  S] max [R (sq (S))] Transformation Rules  Combining select and natural join Example of rule derivation PA152, Vlastislav Dohnal, FI MUNI, 2021 39 spq (R  S) = sp [sq (R  S) ] = sp [ R  sq (S) ] = [sp (R)]  [sq (S)] PA152, Vlastislav Dohnal, FI MUNI, 2021 40 Transformation Rules  Combining select and natural join Example of rule derivation Let m = expr. containing only attrs. common in R and S, but does not compare them [sm (R)]  [sm (S)]sm (R  S) = PA152, Vlastislav Dohnal, FI MUNI, 2021 41 Transformation Rules  Combining project and select  Let x = attribute subset of R z = attributes referenced in expr. P (subset of R) (sp [ px (R) ])px pxz px[sp (R) ] = PA152, Vlastislav Dohnal, FI MUNI, 2021 42 Transformation Rules  Combining project and natural join  Let x = attribute subset of R y = attribute subset S z = attributes common in R and S pxy([pxz (R) ]  [pyz (S) ]) pxy (R  S) =  Combining previous with select PA152, Vlastislav Dohnal, FI MUNI, 2021 43 Transformation Rules pxy (sp (R  S)) = pxy (sp [pxz’ (R)  pyz’ (S)]) z’ = z  {attributes referenced in P}  Combining project, select and Cartesian product PA152, Vlastislav Dohnal, FI MUNI, 2021 44 Transformation Rules pxy (sp (R  S)) = ? PA152, Vlastislav Dohnal, FI MUNI, 2021 45 Transformation Rules  Combining select and union  Combining select and except Select can also be applied to S  May be convenient for shrinking relation before doing except Are there some limits on P ? sp(R sum S) = sp(R) sum sp(S) sp(R – S) = sp(R) – S = sp(R) – sp(S) PA152, Vlastislav Dohnal, FI MUNI, 2021 46 Good Transformations  Select early  Project early Example:  R(A,B,C,D,E,F,G,H,I,J) result={E}  Filter using P: (A=3)  (B=“cat”) pE (sp (R)) vs. pE (sp (pABE(R))) px [sp (R)]  px (sp [pxz (R)]) sp (R  S)  [sp (R)]  S PA152, Vlastislav Dohnal, FI MUNI, 2021 47 Good Transformations sp1p2 (R)  sp1 [sp2 (R)]  sp2 [sp1 (R)]  [ sp1 (R)]  [ sp2 (R)] sp1p2 (R)  sp1(R) max sp2(R) R  S  S  R PA152, Vlastislav Dohnal, FI MUNI, 2021 48 Good Transformations  Assume indexes On A and on B s(A=3)  (B=“cat”)(R) = s(A=3)(R)  s(B=“cat”)(R) A = 3B = ‘cat’ Record pointer: Intersect pointers to get pointers to matching tuples PA152, Vlastislav Dohnal, FI MUNI, 2021 49 Good Transformations  Recommendations: No transformation is always good. Usually, good to  select early  project early  Eliminate common sub-expressions  Eliminate tuple duplicates Good Transformations: Example  Push select to relations  apparently OK  But: Firstly, move them as far as possible; next push them back  Example:  Relations: StarsIn(title, year, starName) Movie(title, year, studioName)  View: create view MoviesOf1996 as select * from Movie where year = 1996;  Query: select starName, studioName from MoviesOf1996 natural join StarsIn; PA152, Vlastislav Dohnal, FI MUNI, 2021 50 starName,studioName MoviesOf1996 StarsIn syear=1996 starName,studioName Movie StarsIn syear=1996 starName,studioName Movie StarsIn syear=1996 PA152, Vlastislav Dohnal, FI MUNI, 2021 51 Query Evaluation: Overview  Relational algebra level Transformation rules Apply good rules  Detail query plan level Estimate costs Generate and compare plans PA152, Vlastislav Dohnal, FI MUNI, 2021 52 Estimating Cost of Query Plan 1. Estimate size of result 2. Estimate number of IOs PA152, Vlastislav Dohnal, FI MUNI, 2021 53 Estimating Result Size  Keep statistics for relation R T(R) – # tuples in R S(R) – # of bytes in each R tuple  S(R,A) – length in bytes of values of attribute A B(R) – # of blocks to hold all R tuples V(R, A) – # distinct values in R for attribute A  Good estimates need Statistics up to date! PA152, Vlastislav Dohnal, FI MUNI, 2021 54 Statistics Example  Relation R Attribute A – string, max. 20 B  S(R,A) = 3 ← average length Attribute B – integer, 4 B Attribute C – date, 8 B Attribute D – string, 5 B  S(R,D) = 1  Statistics T(R) = 5 S(R) = 16 V(R,A) = 3 V(R,B) = 1 V(R,C) = 5 V(R,D) = 4 A B C D cat 1 10.2.98 a cat 1 20.3.98 b dog 1 30.4.98 a dog 1 14.6.98 c bat 1 15.6.98 d PA152, Vlastislav Dohnal, FI MUNI, 2021 55 Estimating Result Size  Cartesian product W = R1  R2 T(W) = T(R1)  T(R2) S(W) = S(R1) + S(R2) PA152, Vlastislav Dohnal, FI MUNI, 2021 56 Estimating Result Size  Select W = sZ=val(R) S(W) = S(R) T(W) = ?  W = sA=‘cat’(R)  W2 = sB=2(R) T(W2) = ? V(R,A)=3 V(R,B)=1 V(R,C)=5 V(R,D)=4 T(W) = T(R) V(R,A) = 5/3 A B C D cat 1 10.2.98 a cat 1 20.3.98 b dog 1 30.4.98 a dog 1 14.6.98 c bat 1 15.6.98 d PA152, Vlastislav Dohnal, FI MUNI, 2021 57 Estimating Result Size  Assumption of last estimate Values are uniformly distributed over possible V(R,Z) values!  f(val) = 1 / V(R,Z)  T(sZ=val(R)) = T(R)  f(val)  Alternate assumption Uniform distribution of values over domain with DOM(R,Z) values  # values in domain is denoted as DOM(R,Z)  f(val) = 1 / DOM(R,Z) PA152, Vlastislav Dohnal, FI MUNI, 2021 58 Estimating Result Size: Example  Select W = sZ=val(R) T(W) = ?  Podle DOM(R,*)  Formula derivation W = sC=val(R)  T(W) = f(val)  T(R) = 1/10 * 5 = 0,5 W = sB=val(R)  T(W) = (1/10)*5 W = sA=val(R)  T(W) = 0,5 V(R,A)=3 DOM(R,A)=10 V(R,B)=1 DOM(R,B)=10 V(R,C)=5 DOM(R,C)=10 V(R,D)=4 DOM(R,D)=10 A B C D cat 1 10.2.98 a cat 1 20.3.98 b dog 1 30.4.98 a dog 1 14.6.98 c bat 1 15.6.98 d PA152, Vlastislav Dohnal, FI MUNI, 2021 59 Estimating Result Size  Select W = sZ=val(R) Original solution Alternate solution T(W) = T(R) V(R,Z) V(R,A)=3 DOM(R,A)=10 V(R,B)=1 DOM(R,B)=10 V(R,C)=5 DOM(R,C)=10 V(R,D)=4 DOM(R,D)=10 T(W) = T(R) DOM(R,Z) A B C D cat 1 10.2.98 a cat 1 20.3.98 b dog 1 30.4.98 a dog 1 14.6.98 c bat 1 15.6.98 d PA152, Vlastislav Dohnal, FI MUNI, 2021 60 Estimating Result Size  Select W = sZ≥val(R) Solution 1  T(W) = T(R) / 2 Solution 2  T(W) = T(R) / 3 Solution 3  Estimate values in range PA152, Vlastislav Dohnal, FI MUNI, 2021 61 Estimating Result Size  Select – estimate values in range  Calculate fraction of unique values in range T(W) = f  T(R) Min=1 V(R,Z)=10 W= sz  15 (R) Max=20 ZR f = 20-15+1 20-1+1 6 20 = PA152, Vlastislav Dohnal, FI MUNI, 2021 62 Estimate # values: Histogram  Histogram of attribute values Replacing V(R,A) and DOM(R,A) More precise estimates  Distinct values Few  abs. number per each Many  quantization  Ranges (intervals) of equal size (in recs)  Percentiles  Most frequent ones only  others all together (i.e., uniformly distributed) 10 23 37 42 8 15 28 37 f PA152, Vlastislav Dohnal, FI MUNI, 2021 63 Estimating Result Size  Select W = szval(R) T(W) = T(R)(1 – f(val)) = T(R)(1 – 1/V(R,Z)) Typical solution  T(W) = T(R) = T(R) – T(R) V(R,Z) PA152, Vlastislav Dohnal, FI MUNI, 2021 64 Estimating Result Size  Natural join W = R1  R2 Notation  X – attributes of R1  Y – attributes of R2  Case 1 X  Y = Ø Same as R1  R2  Case 2 X  Y = Z Follows… PA152, Vlastislav Dohnal, FI MUNI, 2021 65 Estimating Result Size: Natural Join  Assumption Z={A} and also: V(R1,A) ≤ V(R2,A)  every A value in R1 is in R2 V(R1,A) ≥ V(R2,A)  every A value in R2 is in R1 R1 A B C R2 A DR1 R2 PA152, Vlastislav Dohnal, FI MUNI, 2021 66 Estimating Result Size: Natural Join  V(R1,A) ≤ V(R2,A)  One record is matched with T(R2) / V(R2,A) records Assumption of uniform distribution  Result: R1 A B C R2 A D Take 1 tuple Match T(W) = T(R2) V(R2,A) T(R1)  PA152, Vlastislav Dohnal, FI MUNI, 2021 67 Estimating Result Size: Natural Join  Overview of both variants V(R1,A) ≤ V(R2,A) V(R2,A) ≤ V(R1,A)  Difference is in denominator T(W) = T(R2) V(R2,A) T(R1)  T(W) = T(R1) V(R1,A) T(R2)  PA152, Vlastislav Dohnal, FI MUNI, 2021 68 Estimating Result Size: Natural Join  In general W = R1 R2 T(W) = T(R1)  T(R2) max { V(R1,A), V(R2,A) } PA152, Vlastislav Dohnal, FI MUNI, 2021 69 Estimating Result Size: Natural Join  Alternate solution Values uniformly distributed over domain  One rec. matches with T(R2)/DOM(R2,A) recs.  Result: R1 A B C R2 A D Take 1 tuple Match T(W) = T(R1)  T(R2) DOM(R2,A) = T(R1)  T(R2) DOM(R1,A) assuming to be same PA152, Vlastislav Dohnal, FI MUNI, 2021 70 Estimating Result Size: Natural Join  W = R1(X), R2(Y), X  Y = Z  Size of record S(W) = S(R1) + S(R2) – S(R1,Z) Valid for all cases  Number of tuples if Z contains more attrs.? Assume they are independent R1 R2 T(W) = T(R1)  T(R2) max{ V(R1,A1), V(R2,A1) }  max{ V(R1,A2), V(R2,A2) } PA152, Vlastislav Dohnal, FI MUNI, 2021 71 Estimating Size: Project, Select  Project W = AB(R) T(W)=T(R) S(W)=S(R, AB)  Select W = sA=aB=b(R) S(W)=S(R) Let n=T(R), T(W) = n(1 – (1-m1/n)(1-m2/n))  m1=T(R) / V(R,A) m2=T(R) / V(R,B) PA152, Vlastislav Dohnal, FI MUNI, 2021 72 Estimating Size: Set Operations  Union, intersect, except  , , –  T(W) – choose average size  E.g.  T(R  S) = T(R) + T(S) … if  means UNION ALL  T(R  S) = [ max{T(R), T(S)} , T(R) + T(S) ]  So use: T(R  S) = avg{ max{T(R), T(S)} , T(R) + T(S) }  If set union is evaluated  T(R – S) = T(R) – 1/2T(S)  T(R  S) = avg{ 0, min{T(R), T(S)} }  DISTINCT  All attributes  min{ 1/2T(R), (V(R,A)*V(R,B)*…) } PA152, Vlastislav Dohnal, FI MUNI, 2021 73 Estimating Result Size  For complex expressions, intermediate results and their stats are needed  Example W = [sA=a(R1)] R2 denote as U  T(U) = T(R1) / V(R1,A) S(U) = S(R1) Need V(U,*) to estimate costs of W! PA152, Vlastislav Dohnal, FI MUNI, 2021 74 Estimate Number of Values  To estimate V(U,*) U = sA=a(R1) Assume R1(A,B,C,D) PA152, Vlastislav Dohnal, FI MUNI, 2021 75 Estimate # values: Example  Relation R1  U = sA=a(R1)  Estimates V(U,A) = 1 V(U,B) = 1 V(U,C) = 1 .. (T(R1) / V(R1,A)) V(U,D) = 1 .. (T(R1) / V(R1,A)) V(R,A)=3 V(R,B)=1 V(R,C)=5 V(R,D)=4 A B C D cat 1 10.2.98 a cat 1 20.3.98 b dog 1 30.4.98 a dog 1 14.6.98 c bat 1 15.6.98 d Sth. in between PA152, Vlastislav Dohnal, FI MUNI, 2021 76 Estimate # values: Reality  Common solution U = sA=a(R1) V(U,A) = 1 V(U,K) = T(U)  Primary key K of R1 is an exception V(U,*) = V(R,*), ie. V(U,*) = T(U)  As a result, original V(R,*) can be used V(U,*) = min { V(R,*), T(U) } PA152, Vlastislav Dohnal, FI MUNI, 2021 77 Estimate # values: Join  U = R1(A,B) R2(A,C)  Estimates: V(U,A) = min{ V(R1,A), V(R2,A) } V(U,B) = V(R1,B)  ie. min { V(R1,B), T(U) } V(U,C) = V(R2,C) PA152, Vlastislav Dohnal, FI MUNI, 2021 78 Estimate # values: Join  Example Z = R1(A,B) R2(B,C) R3(C,D) T(R1) = 1000 V(R1,A)=50 V(R1,B)=100 T(R2) = 2000 V(R2,B)=200 V(R2,C)=300 T(R3) = 3000 V(R3,C)=90 V(R3,D)=500 PA152, Vlastislav Dohnal, FI MUNI, 2021 79 Estimate # values: Join  Intermediate result U = R1(A,B) R2(B,C) Estimates:  T(U) = T(R1)  T(R2) / max{ V(R1,B), V(R2,B) } = = 1000  2000 / 200 = 10 000  V(U,A) = 50  V(U,B) = min{ V(R1,B), V(R2,B) } = 100  V(U,C) = 300 PA152, Vlastislav Dohnal, FI MUNI, 2021 80 Estimate # values: Join  Final result Z = U R3(C,D)  U(A,B,C) Estimates:  T(Z) = 10 000  3 000 / 300 = 100 000  V(Z,A) = 50  V(Z,B) = 100  V(Z,C) = 90  V(Z,D) = 500 PA152, Vlastislav Dohnal, FI MUNI, 2021 81 Example of Stats in PostgreSQL  Connect to student instance of PostgreSQL See the first lecture for instructions  Check the schema xdohnal Relations: predmet, skupina, hotel  Observe both the relation and attribute statistics Explanation of individual items in doc  https://www.postgresql.org/docs/13/view-pg-stats.html PA152, Vlastislav Dohnal, FI MUNI, 2021 82 Example of Stats in PostgreSQL  Relation hotel PA152, Vlastislav Dohnal, FI MUNI, 2021 83 Example of Stats in PostgreSQL  Attribute hotel.id  Attribute hotel.name PA152, Vlastislav Dohnal, FI MUNI, 2021 84 Example of Stats in PostgreSQL  Attribute hotel.state  Attribute hotel.distance_to_center PA152, Vlastislav Dohnal, FI MUNI, 2021 85 Summary  Estimating size of results is an “art”  Do not forget: Statistics must be kept up to date for precise estimates  necessity to maintain them during table updates What are the costs to update stats? PA152, Vlastislav Dohnal, FI MUNI, 2021 86 Statistics Update  Stats do not change rapidly in short time  Inaccurate stats may also help  Instant stats update Can become a bottleneck  Stats are used very often   Do not update often PA152, Vlastislav Dohnal, FI MUNI, 2021 87 Statistics Update  Run periodically After some time period elapses After some number of updates are made  Slow for V(R,A) Especially if histograms are kept  Use a data sample  If almost all are distinct  V(R,A)T(R)  If not many are distinct  we likely saw most of them PA152, Vlastislav Dohnal, FI MUNI, 2021 88 Estimating Costs of Query Plan: Outline  Estimating the size of results Already done  Estimating # of IOs Next lecture…  Generate and compare plans