PA152: Efficient Use of DB
9. Query Tuning
Vlastislav Dohnal
PA152, Vlastislav Dohnal, FI MUNI, 2022 2
Credits
 Sources of materials for this lecture:
Courses CS245, CS345, CS345
 Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer
Widom
 Stanford University, California
Database Tuning (slides)
 Dennis Shasha, Philippe Bonnet
 Morgan Kaufmann, 1st edition, 440 pages, 2002
 ISBN-13: 978-1558607538
 http://www.databasetuning.org/
PA152, Vlastislav Dohnal, FI MUNI, 2022 3
Query Tuning
SELECT s.RESTAURANT_NAME, t.TABLE_SEATING, to_char(t.DATE_TIME,'Dy, Mon FMDD') AS THEDATE,
to_char(t.DATE_TIME,'HH:MI PM') AS THETIME,to_char(t.DISCOUNT,'99') || '%' AS AMOUNTVALUE,t.TABLE_ID,
s.SUPPLIER_ID, t.DATE_TIME, to_number(to_char(t.DATE_TIME,'SSSSS')) AS SORTTIME
FROM TABLES_AVAILABLE t, SUPPLIER_INFO s,
(SELECT s.SUPPLIER_ID, t.TABLE_SEATING, t.DATE_TIME, max(t.DISCOUNT) AMOUNT, t.OFFER_TYPE
FROM TABLES_AVAILABLE t, SUPPLIER_INFO s
WHERE t.SUPPLIER_ID = s.SUPPLIER_ID
and (TO_CHAR(t.DATE_TIME, 'MM/DD/YYYY') != TO_CHAR(sysdate, 'MM/DD/YYYY')
or TO_NUMBER(TO_CHAR(sysdate, 'SSSSS')) < s.NOTIFICATION_TIME - s.TZ_OFFSET)
and t.NUM_OFFERS > 0 and t.DATE_TIME > SYSDATE and s.CITY = 'SF'
and t.TABLE_SEATING = '2’ and t.DATE_TIME between sysdate and (sysdate + 7)
and to_number(to_char(t.DATE_TIME, 'SSSSS')) between 39600 and 82800
and t.OFFER_TYPE = 'Discount‘
GROUP BY s.SUPPLIER_ID, t.TABLE_SEATING, t.DATE_TIME, t.OFFER_TYP) u
WHERE t.SUPPLIER_ID=s.SUPPLIER_ID and u.SUPPLIER_ID=s.SUPPLIER_ID and t.SUPPLIER_ID=u.SUPPLIER_ID
and t.TABLE_SEATING = u.TABLE_SEATING and t.DATE_TIME = u.DATE_TIME
and t.DISCOUNT = u.AMOUNT and t.OFFER_TYPE = u.OFFER_TYPE
and (TO_CHAR(t.DATE_TIME, 'MM/DD/YYYY') != TO_CHAR(sysdate, 'MM/DD/YYYY')
or TO_NUMBER(TO_CHAR(sysdate, 'SSSSS')) < s.NOTIFICATION_TIME - s.TZ_OFFSET)
and t.NUM_OFFERS > 2 and t.DATE_TIME > SYSDATE and s.CITY = 'SF'
and t.TABLE_SEATING = '2' and t.DATE_TIME between sysdate and (sysdate + 7)
and to_number(to_char(t.DATE_TIME, 'SSSSS')) between 39600 and 82800 and t.OFFER_TYPE = 'Discount'
ORDER BY AMOUNTVALUE DESC, t.TABLE_SEATING ASC, upper(s.RESTAURANT_NAME) ASC,
SORTTIME ASC, t.DATE_TIME ASC
Execution is too slow …
1) How is the query evaluated?
2) How can we speed it up?
PA152, Vlastislav Dohnal, FI MUNI, 2022 4
Query Execution Plan
Output of EXPLAIN command in Oracle
Operator
Access method Evaluation cost
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=165 Card=1 Bytes=106)
1 0 SORT (ORDER BY) (Cost=165 Card=1 Bytes=106)
2 1 NESTED LOOPS (Cost=164 Card=1 Bytes=106)
3 2 NESTED LOOPS (Cost=155 Card=1 Bytes=83)
4 3 TABLE ACCESS (FULL) OF 'TABLES_AVAILABLE' (Cost=72 Card=1 Bytes=28)
5 3 VIEW
6 5 SORT (GROUP BY) (Cost=83 Card=1 Bytes=34)
7 6 NESTED LOOPS (Cost=81 Card=1 Bytes=34)
8 7 TABLE ACCESS (FULL) OF 'TABLES_AVAILABLE' (Cost=72 Card=1 Bytes=24)
9 7 TABLE ACCESS (FULL) OF 'SUPPLIER_INFO' (Cost=9 Card=20 Bytes=200)
10 2 TABLE ACCESS (FULL) OF 'SUPPLIER_INFO' (Cost=9 Card=20 Bytes=460)
PA152, Vlastislav Dohnal, FI MUNI, 2022 5
Monitoring Queries
 What is slow query?
Needs to many disk IOs
 high costs in execution plan (explain)
 E.g., query for one row (exact-match query) uses
table-scan.
Inconvenient query plan
 Existing indexes are not used
 How to reveal?
DBMS can log “long-lasting” queries
…
PA152, Vlastislav Dohnal, FI MUNI, 2022 6
Query Tuning
 Local tuning = query rewrite
First approach to speed up a query
Influences only the query
 Global tuning
Index creation
Schema modification
Transaction splitting
…
Potentially harmful
PA152, Vlastislav Dohnal, FI MUNI, 2022 7
Query Rewriting
 Example:
Employee(ssnum, name, manager, dept,
salary, numfriends)
 Clustering index on ssnum
 i.e., relation is sorted by this attribute in the file
 Non-clustering indexes: (i) name; (ii) dept
Student(ssnum, name, degree_sought, year)
 Clustering index on ssnum
 Non-clustering index on name
Tech(dept, manager, location)
 Clustering index on dept
PA152, Vlastislav Dohnal, FI MUNI, 2022 8
Query Rewriting
 Techniques
Index usage
DISTINCTs elimination
(Correlated) subqueries
Use of temporaries
Use of having
Use of views
Materialized views
PA152, Vlastislav Dohnal, FI MUNI, 2022 9
Index Usage
 Many query optimizers will not use indexes in
the presence of :
 Arithmetic expressions
WHERE salary/12 >= 4000;
WHERE inserted + 1 = current_date;
 Functions
SELECT * FROM employee
WHERE SUBSTR(name, 1, 1) = ‘G’;
… WHERE to_char(inserted, 'YYYYMM') = '201704'
 Numerical comparisons of fields with different
types
 Multi-attribute indexes
 Comparison with NULL
Index Usage
 = vs. like
 SELECT * FROM hotel WHERE city='city174’
 SELECT * FROM hotel WHERE city LIKE 'city174’
 SELECT * FROM hotel WHERE city like 'city174%'
PA152, Vlastislav Dohnal, FI MUNI, 2022 10
"Bitmap Heap Scan on hotel (cost=4.31..14.26 rows=5 width=59)“
" Filter: ((city)::text ~~ 'city174'::text)“
" -> Bitmap Index Scan on hotel_city (cost=0.00..4.31 rows=5 width=0)“
" Index Cond: ((city)::text = 'city174'::text)"
"Seq Scan on hotel (cost=0.00..17.25 rows=5 width=59)“
" Filter: ((city)::text ~~ 'city174%'::text)"
Index Usage (cont.)
 Aggregate functions MAX(A), MIN(A)
 resp. ORDER BY A LIMIT 1
 using functions on A
 E.g.,
 conn_log ( log_key, sim_imsi, time, car_key, pda_imei,
gsmnet_id, method, program_ver )
A. SELECT max(time AT TIME ZONE 'UTC') AS time
FROM conn_log
WHERE sim_imsi=‘23001234567890123’ AND
time>'2016-02-28 10:50:00.122 UTC' AND
method='U' AND program_ver IS NOT NULL;
B. SELECT time AT TIME ZONE 'UTC‘
FROM (SELECT max(time) AS time
FROM conn_log
WHERE sim_imsi=‘23001234567890123’ AND
time>'2016-02-28 10:50:00.122 UTC' AND
method='U' AND program_ver IS NOT NULL) AS x;
C. SELECT max(time) AT TIME ZONE 'UTC' AS time …
(cont. from A.)
PA152, Vlastislav Dohnal, FI MUNI, 2022 11
Plus a secondary index on
(sim_imsi,time)
Index Usage (cont.)
PA152, Vlastislav Dohnal, FI MUNI, 2022 12
QUERY PLAN (QUERY A.)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=19412.69..19412.70 rows=1 width=8) (actual time=36.415..36.415 rows=1 loops=1)
-> Append (cost=0.00..19385.45 rows=5448 width=8) (actual time=36.410..36.410 rows=0 loops=1)
-> Seq Scan on conn_log (cost=0.00..0.00 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=1)
Filter: ((program_ver IS NOT NULL) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone) AND (sim_imsi = '23001234567890123'::bpchar) AND
(method = 'U'::bpchar))
-> Index Scan using conn_log_imsi_time_y2016m02 on conn_log_y2016m02 (cost=0.56..8.58 rows=1 width=8) (actual time=28.464..28.464 rows=0 loops=1)
Index Cond: ((sim_imsi = '23001234567890123'::bpchar) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone))
Filter: ((program_ver IS NOT NULL) AND (method = 'U'::bpchar))
-> Bitmap Heap Scan on conn_log_y2016m03 (cost=194.11..14125.36 rows=3969 width=8) (actual time=2.586..2.586 rows=0 loops=1)
Recheck Cond: ((sim_imsi = '23001234567890123'::bpchar) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone))
Filter: ((program_ver IS NOT NULL) AND (method = 'U'::bpchar))
-> Bitmap Index Scan on conn_log_imsi_time_y2016m03 (cost=0.00..193.12 rows=4056 width=0) (actual time=2.584..2.584 rows=0 loops=1)
Index Cond: ((sim_imsi = '23001234567890123'::bpchar) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone))
-> Bitmap Heap Scan on conn_log_y2016m04 (cost=71.87..5243.35 rows=1476 width=8) (actual time=5.346..5.346 rows=0 loops=1)
Recheck Cond: ((sim_imsi = '23001234567890123'::bpchar) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone))
Filter: ((program_ver IS NOT NULL) AND (method = 'U'::bpchar))
-> Bitmap Index Scan on conn_log_imsi_time_y2016m04 (cost=0.00..71.50 rows=1507 width=0) (actual time=5.342..5.342 rows=0 loops=1)
Index Cond: ((sim_imsi = '23001234567890123'::bpchar) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone))
-> Index Scan using conn_log_imsi_time_y2016m05 on conn_log_y2016m05 (cost=0.14..8.16 rows=1 width=8) (actual time=0.009..0.009 rows=0 loops=1)
Index Cond: ((sim_imsi = '23001234567890123'::bpchar) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone))
Filter: ((program_ver IS NOT NULL) AND (method = 'U'::bpchar))
Planning time: 4.159 ms
Execution time: 36.535 ms
Index Usage (cont.)
PA152, Vlastislav Dohnal, FI MUNI, 2022 13
QUERY PLAN (QUERY B.)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on x (cost=5.98..6.01 rows=1 width=8) (actual time=0.162..0.163 rows=1 loops=1)
-> Result (cost=5.98..5.99 rows=1 width=0) (actual time=0.159..0.160 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=1.87..5.98 rows=1 width=8) (actual time=0.158..0.158 rows=0 loops=1)
-> Merge Append (cost=1.87..22424.61 rows=5449 width=8) (actual time=0.156..0.156 rows=0 loops=1)
Sort Key: conn_log."time"
-> Index Scan Backward using conn_log_imsi_time on conn_log (cost=0.12..8.15 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: ((sim_imsi = '23001234567890123'::bpchar) AND ("time" IS NOT NULL) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone))
Filter: ((program_ver IS NOT NULL) AND (method = 'U'::bpchar))
-> Index Scan Backward using conn_log_imsi_time_y2016m02 on conn_log_y2016m02 (cost=0.56..8.58 rows=1 width=8)
(actual time=0.069..0.069 rows=0 loops=1)
Index Cond: ((sim_imsi = '23001234567890123'::bpchar) AND ("time" IS NOT NULL) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone))
Filter: ((program_ver IS NOT NULL) AND (method = 'U'::bpchar))
-> Index Scan Backward using conn_log_imsi_time_y2016m03 on conn_log_y2016m03 (cost=0.56..16225.91 rows=3969 width=8)
(actual time=0.046..0.046 rows=0 loops=1)
Index Cond: ((sim_imsi = '23001234567890123'::bpchar) AND ("time" IS NOT NULL) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone))
Filter: ((program_ver IS NOT NULL) AND (method = 'U'::bpchar))
-> Index Scan Backward using conn_log_imsi_time_y2016m04 on conn_log_y2016m04 (cost=0.43..6033.60 rows=1477 width=8)
(actual time=0.035..0.035 rows=0 loops=1)
Index Cond: ((sim_imsi = '23001234567890123'::bpchar) AND ("time" IS NOT NULL) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone))
Filter: ((program_ver IS NOT NULL) AND (method = 'U'::bpchar))
-> Index Scan Backward using conn_log_imsi_time_y2016m05 on conn_log_y2016m05 (cost=0.14..8.17 rows=1 width=8)
(actual time=0.002..0.002 rows=0 loops=1)
Index Cond: ((sim_imsi = '23001234567890123'::bpchar) AND ("time" IS NOT NULL) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone))
Filter: ((program_ver IS NOT NULL) AND (method = 'U'::bpchar))
Planning time: 3.137 ms
Execution time: 0.317 ms
Index Usage (cont.)
PA152, Vlastislav Dohnal, FI MUNI, 2022 14
QUERY PLAN (QUERY C.)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=5.98..5.99 rows=1 width=0) (actual time=0.186..0.186 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=1.87..5.98 rows=1 width=8) (actual time=0.182..0.182 rows=0 loops=1)
-> Merge Append (cost=1.87..22424.63 rows=5450 width=8) (actual time=0.181..0.181 rows=0 loops=1)
Sort Key: conn_log."time"
-> Index Scan Backward using conn_log_imsi_time on conn_log (cost=0.12..8.15 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=1)
Index Cond: ((sim_imsi = '23001234567890123'::bpchar) AND ("time" IS NOT NULL) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone))
Filter: ((program_ver IS NOT NULL) AND (method = 'U'::bpchar))
-> Index Scan Backward using conn_log_imsi_time_y2016m02 on conn_log_y2016m02 (cost=0.56..8.58 rows=1 width=8)
(actual time=0.070..0.070 rows=0 loops=1)
Index Cond: ((sim_imsi = '23001234567890123'::bpchar) AND ("time" IS NOT NULL) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone))
Filter: ((program_ver IS NOT NULL) AND (method = 'U'::bpchar))
-> Index Scan Backward using conn_log_imsi_time_y2016m03 on conn_log_y2016m03 (cost=0.56..16225.91 rows=3969 width=8)
(actual time=0.064..0.064 rows=0 loops=1)
Index Cond: ((sim_imsi = '23001234567890123'::bpchar) AND ("time" IS NOT NULL) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone))
Filter: ((program_ver IS NOT NULL) AND (method = 'U'::bpchar))
-> Index Scan Backward using conn_log_imsi_time_y2016m04 on conn_log_y2016m04 (cost=0.43..6033.60 rows=1478 width=8)
(actual time=0.037..0.037 rows=0 loops=1)
Index Cond: ((sim_imsi = '23001234567890123'::bpchar) AND ("time" IS NOT NULL) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone))
Filter: ((program_ver IS NOT NULL) AND (method = 'U'::bpchar))
-> Index Scan Backward using conn_log_imsi_time_y2016m05 on conn_log_y2016m05 (cost=0.14..8.17 rows=1 width=8)
(actual time=0.003..0.003 rows=0 loops=1)
Index Cond: ((sim_imsi = '23001234567890123'::bpchar) AND ("time" IS NOT NULL) AND ("time" > '2016-02-28 11:50:00.122+01'::timestamp with time zone))
Filter: ((program_ver IS NOT NULL) AND (method = 'U'::bpchar))
Planning time: 3.094 ms
Execution time: 0.309 ms
PA152, Vlastislav Dohnal, FI MUNI, 2022 15
Eliminate unneeded DISTINCTs
 Query:
Find employees who work in the information
systems department. There should be no
duplicates.
SELECT DISTINCT ssnum
FROM employee
WHERE dept = ‘information systems’
 DISTINCT is unnecessary
ssnum is a prim. key in employee
PA152, Vlastislav Dohnal, FI MUNI, 2022 16
Eliminate unneeded DISTINCTs
 Query:
Find social security numbers of employees in
the technical departments. There should be
no duplicates.
SELECT DISTINCT ssnum
FROM employee, tech
WHERE employee.dept = tech.dept
 Is DISTINCT needed?
Employee(ssnum, name, manager, dept, salary, numfriends)
Tech(dept, manager, location)
PA152, Vlastislav Dohnal, FI MUNI, 2022 17
Eliminate unneeded DISTINCTs
 Query:
SELECT DISTINCT ssnum
FROM employee, tech
WHERE employee.dept = tech.dept
 Is DISTINCT needed?
ssnum is a key in employee
dept is a key in tech
 each employee record will join with at most
one record in tech.
 DISTINCT is unnecessary
Employee(ssnum, name, manager, dept, salary, numfriends)
Tech(dept, manager, location)
PA152, Vlastislav Dohnal, FI MUNI, 2022 18
Eliminate unneeded DISTINCTs
 The relationship among DISTINCT, keys
and joins can be generalized:
Definition of “privileged“
 Call a table T privileged if the fields returned by the
select contain a key of T.
Definition of relationship “reaches“
 Let R be an unprivileged table.
 Suppose that R is joined on equality by its key field
to some other table S, then we say R reaches S.
Relationship “reaches“ is transitive:
 If R1 reaches R2 and R2 reaches R3,
then R1 reaches R3.
PA152, Vlastislav Dohnal, FI MUNI, 2022 19
Eliminate unneeded DISTINCTs
 Main Theorem:
There will be no duplicates among the records
returned by a selection, even in the absence
of DISTINCT
if one of the two following conditions hold:
Every table mentioned in the FROM clause is
privileged.
Every unprivileged table reaches at least one
privileged table.
PA152, Vlastislav Dohnal, FI MUNI, 2022 20
Unneeded DISTINCT (1)
 Query:
SELECT DISTINCT ssnum
FROM employee, tech
WHERE employee.manager = tech.manager
 Employee is privileged
 Is tech privileged?
No.
 Does tech reach employee?
No. Attribute manager is not a key in tech.
Employee(ssnum, name, manager, dept, salary, numfriends)
Tech(dept, manager, location)
PA152, Vlastislav Dohnal, FI MUNI, 2022 21
Unneeded DISTINCT (2)
 Query:
SELECT DISTINCT ssnum, tech.dept
FROM employee, tech
WHERE employee.manager = tech.manager
 Employee is privileged
 Is tech privileged?
Yes.
 Result does not have duplicates
Employee(ssnum, name, manager, dept, salary, numfriends)
Tech(dept, manager, location)
PA152, Vlastislav Dohnal, FI MUNI, 2022 22
Unneeded DISTINCT (3)
 Query:
SELECT DISTINCT student.ssnum
FROM student, employee, tech
WHERE student.name = employee.name
AND employee.dept = tech.dept;
 Student is privileged
 Employee is not privileged and does not
reach any other relation.
  DISTINCT is needed.
Employee(ssnum, name, manager, dept, salary, numfriends)
Student(ssnum, name, degree_sought, year)
Tech(dept, manager, location)
Nested Queries
 SELECT containing another SELECT as its part
 SELECT employee_number, name
FROM employees AS X
WHERE salary > (
SELECT AVG(salary)
FROM employees
WHERE department = X.department );
 SELECT employee_number, name,
(SELECT AVG(salary) FROM employees
WHERE department = X.department ) AS department_average
FROM employees AS X;
PA152, Vlastislav Dohnal, FI MUNI, 2022 23
Rewriting Nested Queries
 Reason:
Query optimizer may not correctly handle
some nested queries
Usually:
 Uncorrelated subqueries without aggregate
 Correlated subqueries
PA152, Vlastislav Dohnal, FI MUNI, 2022 24
PA152, Vlastislav Dohnal, FI MUNI, 2022 25
Types of Nested Queries
 Uncorrelated subqueries with aggregates
SELECT ssnum FROM employee
WHERE salary >
(SELECT avg(salary) FROM
employee)
 Uncorrelated subqueries without
aggregate
SELECT ssnum FROM employee
WHERE dept in (SELECT dept FROM
tech)
So-called “semi-join”
PA152, Vlastislav Dohnal, FI MUNI, 2022 26
Types of Nested Queries
 Correlated subqueries with aggregates
SELECT ssnum FROM employee e1
WHERE salary >=
(SELECT avg(e2.salary)
FROM employee e2, tech
WHERE e2.dept = e1.dept
AND e2.dept = tech.dept)
PA152, Vlastislav Dohnal, FI MUNI, 2022 27
Types of Nested Queries
 Correlated subqueries without aggregates
Unusual for derived tables
 i.e., can rewrite with join
 Semi-join queries may be evaluated inefficiently
Example of two semi-join queries:
 SELECT ssnum FROM employee
WHERE dept in
(SELECT dept FROM tech
WHERE tech.manager=employee.manager)
 SELECT ssnum FROM employee
WHERE EXISTS (SELECT 1 FROM tech WHERE
employee.manager = tech.manager)
PA152, Vlastislav Dohnal, FI MUNI, 2022 28
Rewriting Uncorrel. Subq. without Aggregates
1. Combine the arguments of the two FROM
clauses
2. Replace IN with =
3. Retain the SELECT clause
SELECT ssnum FROM employee
WHERE dept in (select dept from tech)
SELECT DISTINCT ssnum
FROM employee, tech
WHERE employee.dept = tech.dept
PA152, Vlastislav Dohnal, FI MUNI, 2022 29
Rewriting Uncorrel. Subq. without Aggregates
 Potential problem with duplicates:
SELECT avg(salary) FROM employee
WHERE manager in (select manager from tech)
SELECT avg(salary) FROM employee, tech
WHERE employee.manager = tech.manager
 The rewritten query may include an
employee record several times
if that employee’s manager manages several
departments.
 The solution is to create a temporary table
(using DISTINCT) to eliminate duplicates.
PA152, Vlastislav Dohnal, FI MUNI, 2022 30
Rewriting Correlated Subqueries
 Query:
Find the employees of tech departments who
earn at least the average salary in their
department.
SELECT ssnum
FROM employee e1
WHERE salary >= (SELECT avg(e2.salary)
FROM employee e2, tech
WHERE e2.dept = tech.dept
AND e2.dept = e1.dept);
PA152, Vlastislav Dohnal, FI MUNI, 2022 31
Rewriting Correlated Subqueries
CREATE TEMPORARY TABLE temp ( … ) ON COMMIT DROP;
INSERT INTO temp
SELECT avg(salary) as avsalary, tech.dept
FROM tech, employee
WHERE tech.dept = employee.dept
GROUP BY tech.dept;
SELECT ssnum
FROM employee, temp
WHERE salary >= avsalary
AND employee.dept = temp.dept
PA152, Vlastislav Dohnal, FI MUNI, 2022 32
Rewriting Correlated Subqueries
SELECT ssnum
FROM employee as E,
(SELECT avg(salary) as avsalary, tech.dept
FROM tech, employee
WHERE tech.dept = employee.dept
GROUP BY tech.dept) as AVG
WHERE salary >= avsalary AND E.dept = AVG.dept
PA152, Vlastislav Dohnal, FI MUNI, 2022 33
Rewriting Correlated Subqueries
 Query:
Find employees of technical departments
whose number of friends equals the number
of employees in their department.
SELECT ssnum
FROM employee e1
WHERE numfriends = (
SELECT COUNT(e2.ssnum)
FROM employee e2, tech
WHERE e2.dept = tech.dept
AND e2.dept = e1.dept);
PA152, Vlastislav Dohnal, FI MUNI, 2022 34
Rewriting Correlated Subqueries
INSERT INTO temp
SELECT COUNT(ssnum) as numworkers,
employee.dept
FROM tech, employee
WHERE tech.dept = employee.dept
GROUP BY tech.dept;
SELECT ssnum
FROM employee, temp
WHERE numfriends = numworkers
AND employee.dept = temp.dept;
Can you spot the infamous COUNT bug?
PA152, Vlastislav Dohnal, FI MUNI, 2022 35
The Infamous COUNT Bug
 Example:
 Helene who is not in a technical department.
 In the original query, Helene’s number of friends
would be compared to COUNT(Ø)=0.
 In case Helene has no friends, she would survive the
selection.
 In the transformed query, Helene’s record would
not appear.
 The temporary table will contain counts for tech
departments only.
 This is a limitation of the correlated subquery
rewriting technique when COUNT is involved.
Rewriting Correlated Subqueries
 Anti-joins
SELECT * FROM Tech WHERE dept NOT IN
(SELECT dept FROM employee)
 Problem with NULLs in employee.dept
SELECT * FROM Tech WHERE NOT EXISTS
(SELECT 1 FROM employee.dept=tech.dept)
 Issues
Not using join algorithm
Using too many index lookups in index join
PA152, Vlastislav Dohnal, FI MUNI, 2022 36
Rewriting Correlated Subqueries
 Test these in student’s Pg:
PA152, Vlastislav Dohnal, FI MUNI, 2022 37
explain verbose select * from hotel
where id not in (select hotel_id from room);
"Seq Scan on xdohnal.hotel (cost=0.00..2190904.75 rows=250 width=59)“
" Output: hotel.id, hotel.name, hotel.street, …“
" Filter: (NOT (SubPlan 1))“
" SubPlan 1“
" -> Materialize (cost=0.00..7974.90 rows=315460 width=4)“
" Output: room.hotel_id“
" -> Seq Scan on xdohnal.room (cost=0.00..5164.60 rows=315460 width=4)“
" Output: room.hotel_id"
explain verbose select * from hotel
where id not in (select hotel_id from room
where hotel_id is not null);
explain verbose select * from hotel
where not exists(select 1 from room
where room.hotel_id=hotel.id);
PA152, Vlastislav Dohnal, FI MUNI, 2022 38
Query Rewriting
 Techniques
Index usage
DISTINCTs elimination
(Correlated) subqueries
Use of temporaries
Use of having
Use of views
Materialized views
PA152, Vlastislav Dohnal, FI MUNI, 2022 39
Abuse of Temporaries
 Query:
 Find all information about department employees with
their locations who earn at least > 40000.
 INSERT INTO temp
SELECT *
FROM employee
WHERE salary >= 40000
 SELECT ssnum, location
FROM temp
WHERE temp.dept = ‘information systems’
 This solution will not be optimal (should have been
done in the reverse order)
 Cannot use on dept in employee
 There is no index on temp table.
PA152, Vlastislav Dohnal, FI MUNI, 2022 40
Use of Having
 Reason for having:
Shortens queries that filter on aggregation
results
Cannot use aggregations in WHERE clause
Use HAVING clause then
 Example
SELECT avg(salary), dept
FROM employee
GROUP BY dept
HAVING avg(salary) > 10 000;
PA152, Vlastislav Dohnal, FI MUNI, 2022 41
Use of Having
 Another example
SELECT avg(salary), dept
FROM employee
GROUP BY dept
HAVING count(ssnum) > 100;
PA152, Vlastislav Dohnal, FI MUNI, 2022 42
Use of Having
 Don’t use HAVING
when WHERE is enough.
SELECT avg(salary) as avgsalary, dept
FROM employee
WHERE dept= ‘information systems’
GROUP BY dept;
SELECT avg(salary) as avgsalary, dept
FROM employee
GROUP BY dept
HAVING dept = ‘information systems’;
PA152, Vlastislav Dohnal, FI MUNI, 2022 43
Use of Views
 Query optimizer replaces the view
with its definition
CREATE VIEW techlocation AS
SELECT ssnum, tech.dept, location
FROM employee, tech
WHERE employee.dept = tech.dept;
SELECT location
FROM techlocation
WHERE ssnum = 43253265;
PA152, Vlastislav Dohnal, FI MUNI, 2022 44
Use of Views
 Resulting query:
SELECT location
FROM employee, tech
WHERE employee.dept = tech.dept
AND ssnum = 43253265;
PA152, Vlastislav Dohnal, FI MUNI, 2022 45
Use of Views
 Example for PostgreSQL:
CREATE VIEW hotels_in_city AS
SELECT city, COUNT(*) AS count
FROM hotel
GROUP BY city;
 Using view
SELECT * FROM hotels_in_city
WHERE count > 8
SELECT * FROM hotels_in_city
WHERE city='city174'
PA152, Vlastislav Dohnal, FI MUNI, 2022 46
Use of Views
 Output of EXPLAIN
EXPLAIN SELECT * FROM hotels_in_city;
EXPLAIN SELECT * FROM hotels_in_city
WHERE city='city174’;
Use of functions:
 Compare:
EXPLAIN SELECT * FROM
(SELECT lower(city) as city, COUNT(*) AS cnt
FROM hotel GROUP BY city HAVING COUNT(*) > 3) x
WHERE city='city174';
EXPLAIN SELECT lower(city), cnt FROM
(SELECT city, COUNT(*) AS cnt FROM hotel
GROUP BY city HAVING COUNT(*) > 3) x
WHERE city='city174';
PA152, Vlastislav Dohnal, FI MUNI, 2022 47
Query Rewriting: Performance Impact
-10
0
10
20
30
40
50
60
70
80
Throughputratio(%)
SQLServer 2000
Oracle 8i
DB2 V7.1
100k Employees, 100k Students, 10 tech. depts
>10 000
PA152, Vlastislav Dohnal, FI MUNI, 2022 48
Aggregate Maintenance
 Example:
Orders of a store chain
 Order(ordernum, itemnum, quantity, purchaser,
vendor)
 Item(itemnum, description, price)
 Clustered indexes on itemnum of Order and Item
Queries issues every five minutes :
 The total dollar amount on order from a particular
vendor.
 The total dollar amount on order by a particular
store outlet (purchaser).
PA152, Vlastislav Dohnal, FI MUNI, 2022 49
Aggregate Maintenance
 Queries:
 SELECT vendor, sum(quantity*price)
FROM order, item
WHERE order.itemnum = item.itemnum
GROUP BY vendor;
 SELECT purchaser, sum(quantity*price)
FROM order, item
WHERE order.itemnum = item.itemnum
GROUP BY purchaser;
Query costs?
  expensive
PA152, Vlastislav Dohnal, FI MUNI, 2022 50
Aggregate Maintenance
 Ways to speed up?
Use of views?
  no impact
Use of temporaries?
  helps
PA152, Vlastislav Dohnal, FI MUNI, 2022 51
Aggregate Maintenance
 Add temporaries
OrdersByVendor(vendor, amount)
OrdersByPurchaser(purchaser, amount)
 These redundant tables must be updated
When to update?
 After each update to order, or item?
 triggers can be used to implement this explicitly
 Recreate from scratch periodically
Costs of update
 Update overhead must be less than original costs.
PA152, Vlastislav Dohnal, FI MUNI, 2022 52
Materialized Views
 View data content stored in a table
Automatic updates by DBMS
 Typical…
Transparent expansion performed by the
optimizer based on cost
 It is the optimizer and not the programmer that
performs query rewriting
PA152, Vlastislav Dohnal, FI MUNI, 2022 53
Materialized Views
 In Oracle
CREATE MATERIALIZED VIEW OrdersByVendor
BUILD IMMEDIATE REFRESH COMPLETE
ENABLE QUERY REWRITE
AS
SELECT vendor, sum(quantity*price) AS amount
FROM order, item
WHERE order.itemnum = item.itemnum
GROUP BY vendor;
PA152, Vlastislav Dohnal, FI MUNI, 2022 54
Materialized Views
 Example
QUERY REWRITE
Query:
 SELECT vendor, sum(quantity*price) AS amount
FROM order, item
WHERE order.itemnum = item.itemnum
AND vendor=‘Apple’;
 OrdersByVendor view will be substituted:
 SELECT vendor, amount FROM OrdersByVendor
WHERE vendor=‘Apple’;
PA152, Vlastislav Dohnal, FI MUNI, 2022 55
Materialized Views
 Example
SQLServer, using triggers for maintenance
1m orders – 5 purchasers and 20 vendors
10k items
- 62.2
21900
31900
-5000
0
5000
10000
15000
20000
25000
30000
35000
insert vendor total purchaser total
gain with aggregate maintenance (%)
PA152, Vlastislav Dohnal, FI MUNI, 2022 60
Database Triggers
 A trigger is a stored procedure
Collection of SQL statements that executes as
a result of an event.
 Events:
DML – insert, update, delete
Timing events (not common)
DDL – definition of tables, …
PA152, Vlastislav Dohnal, FI MUNI, 2022 61
Database Triggers
 Independent of an application/API
Executed as part of the transaction containing
the enabling event by DBMS.
 Not using triggers requires implementation
of constraints in app
 Induce overhead
May insert to other tables, …
Firing can be conditional
 E.g., after price update, number of ordered items
 Not on updates to item description, …
PA152, Vlastislav Dohnal, FI MUNI, 2022 62
Global (Schema) Changes
 Materialized views
If refreshed automaticly…
 Creating indexes
 Schema change
See next slides
 Relation partitioning
See next slides
 …
PA152, Vlastislav Dohnal, FI MUNI, 2022 63
Using Indexes
 Small table
Indexes created
But not used
 Example
courses(id, title, credits)
SELECT COUNT(*) FROM courses;
 Result: 5
SELECT * FROM courses
WHERE id=‘MA102’;
 Table-scan is used
PA152, Vlastislav Dohnal, FI MUNI, 2022 64
Using Indexes
 Relation read sequentially (table scan / seq scan)
All records are checked
 slow
 Creating index (index scan)
Speeds up SELECTs
Slows down INSERTs, UPDATEs, DELETEs
 Indexes must be updated
PA152, Vlastislav Dohnal, FI MUNI, 2022 65
Influence of Indexes on Costs
 False friends
More indexes, faster evaluation!
 In theory, valid only for SELECT queries
 Each index increases update costs
Necessary to update both relation and index
Exception:
 INSERT INTO table SELECT …
 DELETE FROM table WHERE …
PA152, Vlastislav Dohnal, FI MUNI, 2022 66
Influence of Indexes: Example
 Relation
 StarsIn(id, movieTitle, movieYear, starName)
 Qmovies
 SELECT movieTitle, movieYear FROM StarsIn
WHERE starName=‘name’;
 Qstars
 SELECT starName FROM StarsIn
WHERE movieTitle=‘title’ AND movieYear=year;
 Insert
 INSERT INTO StarsIn (movieTitle, movieYear,starName)
VALUES (‘title’, year, ‘name’);
PA152, Vlastislav Dohnal, FI MUNI, 2022 67
Influence of Indexes: Example
 Assumptions
 B(StarsIn) = 10 blocks
 Each actor stars in 3 movies on average
 Each movie has 3 stars on average
 Relation is not sorted
 If index is present, 3 reads of disk (3 records).
 Searching in index
 1 block read
 Index update
 1 block read and 1 block write
 Insert to relation
 1 block read and 1 block write
 i.e., not locating any free block
PA152, Vlastislav Dohnal, FI MUNI, 2022 68
Influence of Indexes: Example
 Costs in blocks for individual operations
 Probability of individual operations
 Qmovies=p1, Qstars=p2, Insert=1 - p1 - p2
Ope-
ration
No
indexes
Index
starName
Index
movieTitle, movieYear
Both
indexes
Qmovies 10 4 10 4
Qstars 10 10 4 4
Insert 2 4 4 6
Avg.
costs
2 + 8p1 + 8p2 4 + 6p2 4 + 6p1 6 - 2p1 - 2p2
 Scenario 1: p1 = p2 = 0.1  no indexes
 Scenario 2: p1 = p2 = 0.4  both indexes
PA152, Vlastislav Dohnal, FI MUNI, 2022 69
Optimizing Indexes
1. Define a batch of operations
 i.e., composition of load
 Analyze log files to find out query types,
updates and their frequencies
2. Suggest different indexes
 Optimizer estimates costs to evaluate the
batch
 Choose a configuration with least costs
 Create corresponding indexes
PA152, Vlastislav Dohnal, FI MUNI, 2022 70
Optimizing Indexes
 Point 2 in detail:
 A set of possible indexes
 Initially without any index
 Repeat
 Estimate costs of batch for each possible index
 Create the index offering the greatest decrease of costs
 Use it in next iterations
 Repeat until an index has been created
 The process can be done automatically
 MS AutoAdmin (http://research.microsoft.com/en-us/projects/autoadmin/default.aspx)
 MS Index Tuning Wizard (S. Chaudhuri, V. Narasayya: An efficient, Cost-Driven Index Selection Tool for Microsoft
SQL Server. Proceedings of VLDB Conference, 1997) & the best 10-year paper in 2007!
 Oracle 10g (http://www.oracle-base.com/articles/10g/AutomaticSQLTuning10g.php)
PA152, Vlastislav Dohnal, FI MUNI, 2022 71
Referential Integrity
 Creating foreign key may not induce an
index on the key’s attributes
 Example in PostgreSQL (db.fi.muni.cz)
Hotel – primary key id
Room – primary key id, foreign key hotel_id
 V(Room, hotel_id) = 6
 Queries (check EXPLAIN plans)
SELECT * FROM hotel WHERE id=2;
SELECT * FROM room WHERE hotel_id=2 AND number=1;
PA152, Vlastislav Dohnal, FI MUNI, 2022 72
Referential Integrity
 Query
 No indexes (output of EXPLAIN SELECT…)
 Create an index on hotel_id
Seq Scan on room (cost=0.00..8750.89 rows=105 width=22)
Filter: ((hotel_id = 2) AND (number = 1))
CREATE INDEX room_hotel_id_fkey ON room (hotel_id);
Bitmap Heap Scan on room (cost=974.87..5782.99 rows=105 width=22)
Recheck Cond: (hotel_id = 2)
Filter: (number = 1)
-> Bitmap Index Scan on room_hotel_id_fkey (cost=0.00..974.84 rows=52608 width=0)
Index Cond: (hotel_id = 2)
SELECT * FROM room WHERE hotel_id=2 AND number=1;
PA152, Vlastislav Dohnal, FI MUNI, 2022 73
Referential Integrity
 Foreign keys may slow down deletions
drastically
 Example
DELETE FROM hotel WHERE id=500;
 Foreign key in room references table hotel
 During deletion room must be checked for
existence of records hotel_id=500
 Recommendation
Create indexes on foreign keys
Combining Indexes
 Query
 Index only on hotel_id
 Index only on number
PA152, Vlastislav Dohnal, FI MUNI, 2022 74
SELECT * FROM room WHERE hotel_id=2 AND number=1;
"Bitmap Heap Scan on room (cost=960.80..5756.77 rows=103 width=22)"
" Recheck Cond: (hotel_id = 2)"
" Filter: (number = 1)"
" -> Bitmap Index Scan on room_hotel_id_fkey (cost=0.00..960.77 rows=51798 width=0)"
" Index Cond: (hotel_id = 2)"
"Bitmap Heap Scan on room (cost=13.02..1688.30 rows=103 width=22)"
" Recheck Cond: (number = 1)"
" Filter: (hotel_id = 2)"
" -> Bitmap Index Scan on room_number_idx (cost=0.00..12.99 rows=628 width=0)"
" Index Cond: (number = 1)"
Combining Indexes
 Query
 Index on hotel_id, number
 Two indexes on hotel_id and number
PA152, Vlastislav Dohnal, FI MUNI, 2022 75
SELECT * FROM room WHERE hotel_id=2 AND number=1;
"Bitmap Heap Scan on room (cost=5.34..366.14 rows=103 width=22)"
" Recheck Cond: ((hotel_id = 2) AND (number = 1))"
" -> Bitmap Index Scan on room_hotel_id_number_fkey (cost=0.00..5.31 rows=103 width=0)"
" Index Cond: ((hotel_id = 2) AND (number = 1))"
"Bitmap Heap Scan on room (cost=974.07..1334.86 rows=103 width=22)"
" Recheck Cond: ((number = 1) AND (hotel_id = 2))"
" -> BitmapAnd (cost=974.07..974.07 rows=103 width=0)"
" -> Bitmap Index Scan on room_number_idx (cost=0.00..12.99 rows=628 width=0)"
" Index Cond: (number = 1)"
" -> Bitmap Index Scan on room_hotel_id_fkey (cost=0.00..960.77 rows=51798 width=0)"
" Index Cond: (hotel_id = 2)"
Reversed-key Index
 Specialty by Oracle
 Increases index updates throughput
Number of insertions / updates per second
 Idea
Key values are reversed in index
 sequence-generated values are scattered
 E.g., 12345 and 12346  54321 and 64321
 diminishes collisions in concurrent index
updates
 CREATE INDEX idx ON tab(attr) REVERSE;
PA152, Vlastislav Dohnal, FI MUNI, 2022 76
PA152, Vlastislav Dohnal, FI MUNI, 2022 77
Global (Schema) Changes
 Creating indexes
 Schema change
See next slides
 Relation partitioning
See next slides
Lecture Takeaways
 Pure predicates vs functional indexes
Time with time zone issues
 Avoid unnecessary statements
 Do not overuse temp tables
 Mind impacts of new indexes
PA152, Vlastislav Dohnal, FI MUNI, 2022 80