Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use Chapter 6: Formal Relational Query Languages ©Silberschatz, Korth and Sudarshan6.2Database System Concepts - 6th Edition Chapter 6: Formal Relational Query Languages  Relational Algebra  Tuple Relational Calculus  Domain Relational Calculus ©Silberschatz, Korth and Sudarshan6.3Database System Concepts - 6th Edition Relational Algebra  Procedural language  Six basic operators  select:   project:   union:   set difference: –  Cartesian product:   rename:   The operators take one or two relations as inputs and produce a new relation as a result. ©Silberschatz, Korth and Sudarshan6.4Database System Concepts - 6th Edition Select Operation – Example  Relation r  A=B  D > 5 (r) ©Silberschatz, Korth and Sudarshan6.5Database System Concepts - 6th Edition Select Operation  Notation:  p(r)  p is called the selection predicate  Defined as: p(r) = {t | t  r and p(t)} Where p is a formula in propositional calculus consisting of terms connected by conjunctions:  (and),  (or),  (not) formula := term term term ( term ) term := expr expr expr ( expr ) expr := attribute constant is one of: =, , >, , <,   Example of selection:  dept_name=„Physics‟ (instructor) ©Silberschatz, Korth and Sudarshan6.6Database System Concepts - 6th Edition Project Operation – Example  Relation r:  A,C (r) ©Silberschatz, Korth and Sudarshan6.7Database System Concepts - 6th Edition Project Operation  Notation: where A1, A2 are attribute names and r is a relation name.  The result is defined as the relation of k columns obtained by erasing the columns that are not listed  Duplicate rows removed from result, since relations are sets  Example: To eliminate the dept_name attribute of instructor ID, name, salary (instructor) )(,,2,1 r kAAA  ©Silberschatz, Korth and Sudarshan6.8Database System Concepts - 6th Edition Union Operation – Example  Relations r, s:  r  s: ©Silberschatz, Korth and Sudarshan6.9Database System Concepts - 6th Edition Union Operation  Notation: r  s  Defined as: r  s = {t | t  r or t  s}  For r  s to be valid. 1. r, s must have the same arity (same number of attributes) 2. The attribute domains must be compatible (example: 2nd column of r deals with the same type of values as does the 2nd column of s)  Example: to find all courses taught in the Fall 2009 semester, or in the Spring 2010 semester, or in both course_id ( semester=“Fall” Λ year=2009 (section))  course_id ( semester=“Spring” Λ year=2010 (section)) ©Silberschatz, Korth and Sudarshan6.10Database System Concepts - 6th Edition Set difference of two relations  Relations r, s:  r – s: ©Silberschatz, Korth and Sudarshan6.11Database System Concepts - 6th Edition Set Difference Operation  Notation r – s  Defined as: r – s = {t | t  r and t  s}  Set differences must be taken between compatible relations.  r and s must have the same arity  attribute domains of r and s must be compatible  Example: to find all courses taught in the Fall 2009 semester, but not in the Spring 2010 semester course_id ( semester=“Fall” Λ year=2009 (section)) − course_id ( semester=“Spring” Λ year=2010 (section)) ©Silberschatz, Korth and Sudarshan6.12Database System Concepts - 6th Edition Cartesian-Product Operation – Example  Relations r, s:  r  s: ©Silberschatz, Korth and Sudarshan6.13Database System Concepts - 6th Edition Cartesian-Product Operation  Notation r  s  Defined as: r  s = {t q | t  r and q  s}  Assume that attributes of r(R) and s(S) are disjoint.  That is, R  S = .  If attributes of r(R) and s(S) are not disjoint, then renaming must be used. ©Silberschatz, Korth and Sudarshan6.14Database System Concepts - 6th Edition Composition of Operations  Can build expressions using multiple operations  Example: A=C(r  s)  r  s  A=C(r  s) ©Silberschatz, Korth and Sudarshan6.15Database System Concepts - 6th Edition Rename Operation  Allows us to name, and therefore to refer to, the results of relationalalgebra expressions.  Allows us to refer to a relation by more than one name.  Example:  x (E) returns the expression E under the name X  If a relational-algebra expression E has arity n, then returns the result of expression E under the name X, and with the attributes renamed to A1 , A2 , …., An . )(),...,2,1( E nAAAx ©Silberschatz, Korth and Sudarshan6.16Database System Concepts - 6th Edition Example Query  Find the largest salary in the university  Step 1: find instructor salaries that are less than some other instructor salary (i.e. not maximum) – using a copy of instructor under a new name d  instructor.salary ( instructor.salary < d.salary (instructor  d (instructor)))  Step 2: Find the largest salary  salary (instructor) – instructor.salary ( instructor.salary < d.salary (instructor  d (instructor))) ©Silberschatz, Korth and Sudarshan6.17Database System Concepts - 6th Edition Example Queries  Find the names of all instructors in the Physics department, along with the course_id of all courses they have taught  Query 1 instructor.ID,course_id (dept_name=„ Physics‟ (  instructor.ID=teaches.ID (instructor  teaches)))  Query 2 instructor.ID,course_id (instructor.ID=teaches.ID (  dept_name=„ Physics‟ (instructor)  teaches)) ©Silberschatz, Korth and Sudarshan6.18Database System Concepts - 6th Edition Formal Definition  A basic expression in the relational algebra consists of either one of the following:  A relation in the database  A constant relation  Let E1 and E2 be relational-algebra expressions; the following are all relational-algebra expressions:  E1  E2  E1 – E2  E1  E2  p (E1), P is a predicate on attributes in E1  s(E1), S is a list consisting of some of the attributes in E1   x (E1), x is the new name for the result of E1 ©Silberschatz, Korth and Sudarshan6.19Database System Concepts - 6th Edition Additional Operations We define additional operations that do not add any power to the relational algebra, but that simplify common queries.  Set intersection  Natural join  Outer join  Assignment ©Silberschatz, Korth and Sudarshan6.20Database System Concepts - 6th Edition Set-Intersection Operation  Notation: r  s  Defined as:  r  s = { t | t  r and t  s }  Assume:  r, s have the same arity  attributes of r and s are compatible  Note: r  s = r – (r – s) = s – (s – r) ©Silberschatz, Korth and Sudarshan6.21Database System Concepts - 6th Edition Set-Intersection Operation – Example  Relation r, s:  r  s ©Silberschatz, Korth and Sudarshan6.22Database System Concepts - 6th Edition  Notation: r s Natural-Join Operation  Let r and s be relations on schemas R and S respectively. Then, r s is a relation on schema R  S obtained as follows:  Consider each pair of tuples tr from r and ts from s.  If tr and ts have the same value on each of the attributes in R  S, add a tuple t to the result, where  t has the same value as tr on r  t has the same value as ts on s  Example: R = (A, B, C, D) S = (E, B, D)  Result schema = (A, B, C, D, E)  r s is defined as: r.A, r.B, r.C, r.D, s.E (r.B = s.B  r.D = s.D (r  s)) ©Silberschatz, Korth and Sudarshan6.23Database System Concepts - 6th Edition Natural Join Example  Relations r, s:  r s ©Silberschatz, Korth and Sudarshan6.24Database System Concepts - 6th Edition Natural Join and Theta Join  Find the names of all instructors in the Comp. Sci. department together with the course titles of all the courses that the instructors teach   name, title ( dept_name=„Comp. Sci.‟ (instructor teaches course))  Natural join is associative  (instructor teaches) course is equivalent to instructor (teaches course)  Natural join is commutative  instructor teaches is equivalent to teaches instructor  The theta join operation r  s is defined as  r  s =  (r x s) ©Silberschatz, Korth and Sudarshan6.25Database System Concepts - 6th Edition Outer Join  An extension of the join operation that avoids loss of information.  Computes the join and then adds tuples form one relation that does not match tuples in the other relation to the result of the join.  Uses null values:  null signifies that the value is unknown or does not exist  All comparisons involving null are (roughly speaking) false by definition.  We shall study precise meaning of comparisons with nulls later ©Silberschatz, Korth and Sudarshan6.26Database System Concepts - 6th Edition Outer Join – Example  Relation instructor1  Relation teaches1 ID course_id 10101 12121 76766 CS-101 FIN-201 BIO-101 Comp. Sci. Finance Music ID dept_name 10101 12121 15151 name Srinivasan Wu Mozart ©Silberschatz, Korth and Sudarshan6.27Database System Concepts - 6th Edition  Left Outer Join instructor teaches Outer Join – Example  Join instructor teaches ID dept_name 10101 12121 Comp. Sci. Finance course_id CS-101 FIN-201 name Srinivasan Wu ID dept_name 10101 12121 15151 Comp. Sci. Finance Music course_id CS-101 FIN-201 null name Srinivasan Wu Mozart ©Silberschatz, Korth and Sudarshan6.28Database System Concepts - 6th Edition Outer Join – Example  Full Outer Join instructor teaches  Right Outer Join instructor teaches ID dept_name 10101 12121 76766 Comp. Sci. Finance null course_id CS-101 FIN-201 BIO-101 name Srinivasan Wu null ID dept_name 10101 12121 15151 76766 Comp. Sci. Finance Music null course_id CS-101 FIN-201 null BIO-101 name Srinivasan Wu Mozart null ©Silberschatz, Korth and Sudarshan6.29Database System Concepts - 6th Edition Outer Join using Joins  Outer join can be expressed using basic operations  e.g. r s can be written as (r s) U ( r – ∏R(r s) )  {(null, …, null)} ©Silberschatz, Korth and Sudarshan6.30Database System Concepts - 6th Edition Null Values  It is possible for tuples to have a null value, denoted by null, for some of their attributes  null signifies an unknown value or that a value does not exist.  The result of any arithmetic expression involving null is null.  Aggregate functions simply ignore null values (as in SQL)  For duplicate elimination and grouping, null is treated like any other value, and two nulls are assumed to be the same (as in SQL) ©Silberschatz, Korth and Sudarshan6.31Database System Concepts - 6th Edition Null Values  Comparisons with null values return the special truth value: unknown  If false was used instead of unknown, then not (A < 5) would not be equivalent to A >= 5  Three-valued logic using the truth value unknown:  OR: (unknown or true) = true, (unknown or false) = unknown (unknown or unknown) = unknown  AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown  NOT: (not unknown) = unknown  In SQL there is a special operator “is null”, so “P is null” evaluates to true if predicate P evaluates to unknown  Result of select predicate is treated as false if it evaluates to unknown ©Silberschatz, Korth and Sudarshan6.32Database System Concepts - 6th Edition Extended Relational-Algebra-Operations  Generalized Projection  Aggregate Functions ©Silberschatz, Korth and Sudarshan6.33Database System Concepts - 6th Edition Generalized Projection  Extends the projection operation by allowing arithmetic functions to be used in the projection list.  E is any relational-algebra expression  Each of F1, F2, …, Fn are are arithmetic expressions involving constants and attributes in the schema of E.  Given relation instructor(ID, name, dept_name, salary) where salary is annual salary, get the same information but with monthly salary ID, name, dept_name, salary/12 (instructor) )(,...,, 21 EnFFF ©Silberschatz, Korth and Sudarshan6.34Database System Concepts - 6th Edition Aggregate Functions and Operations  Aggregation function takes a collection of values and returns a single value as a result. avg: average value min: minimum value max: maximum value sum: sum of values count: number of values  Aggregate operation in relational algebra E is any relational-algebra expression  G1, G2 …, Gn is a list of attributes on which to group (can be empty)  Each Fi is an aggregate function  Each Ai is an attribute name  Note: Some books/articles use  instead of (Calligraphic G) )()(,),(),(,,, 221121 Emmn AFAFAFGGG  ©Silberschatz, Korth and Sudarshan6.35Database System Concepts - 6th Edition Aggregate Operation – Example  Relation r: A B         C 7 7 3 10  sum(c) (r) sum(c ) 27 ©Silberschatz, Korth and Sudarshan6.36Database System Concepts - 6th Edition Aggregate Operation – Example  Find the average salary in each department dept_name avg(salary) (instructor) avg ©Silberschatz, Korth and Sudarshan6.37Database System Concepts - 6th Edition Aggregate Functions (Cont.)  Result of aggregation does not have a name  Can use rename operation to give it a name  For convenience, we permit renaming as part of aggregate operation dept_name avg(salary) as avg_sal (instructor) ©Silberschatz, Korth and Sudarshan6.38Database System Concepts - 6th Edition Modification of the Database  The content of the database may be modified using the following operations:  Deletion  Insertion  Updating  All these operations can be expressed using the assignment operator () ©Silberschatz, Korth and Sudarshan6.39Database System Concepts - 6th Edition Deletion  A delete request is expressed similarly to a query, except instead of displaying tuples to the user, the selected tuples are removed from the database.  Can delete only whole tuples; cannot delete values on only particular attributes  A deletion is expressed in relational algebra by: r  r – E where r is a relation and E is a relational algebra query.  Example:  Delete all account records in the Perryridge branch. account  account – branch_name = “Perryridge” (account ) ©Silberschatz, Korth and Sudarshan6.40Database System Concepts - 6th Edition Insertion  To insert data into a relation, we either:  specify a tuple to be inserted  write a query whose result is a set of tuples to be inserted  in relational algebra, an insertion is expressed by: r  r  E where r is a relation and E is a relational algebra expression.  The insertion of a single tuple is expressed by letting E be a constant relation containing one tuple.  Example:  Insert information in the database specifying that Smith has $1200 in account A-973 at the Perryridge branch. account  account  {(“A-973”, “Perryridge”, 1200)} depositor  depositor  {(“Smith”, “A-973”)} ©Silberschatz, Korth and Sudarshan6.41Database System Concepts - 6th Edition Updating  A mechanism to change a value in a tuple without charging all values in the tuple  Use the generalized projection operator to do this task  Each Fi is either  the I th attribute of r, if the I th attribute is not updated, or,  if the attribute is to be updated Fi is an expression, involving only constants and the attributes of r, which gives the new value for the attribute  Example:  Make interest payments by increasing all balances by 5 percent. )(,,,, 21 rr lFFF   account   account_number, branch_name, balance * 1.05 (account) ©Silberschatz, Korth and Sudarshan6.44Database System Concepts - 6th Edition Multi-set Relational Algebra  Pure relational algebra removes all duplicates  e.g. after projection  Multi-set relational algebra retains duplicates, to match SQL semantics  SQL duplicate retention was initially for efficiency, but is now a feature  Multi-set relational algebra defined as follows  selection: has as many duplicates of a tuple as in the input, if the tuple satisfies the selection  projection: one tuple per input tuple, even if it is a duplicate  cross product: If there are m copies of t1 in r, and n copies of t2 in s, there are m  n copies of t1.t2 in r  s  Other operators similarly defined  E.g. union: m + n copies, intersection: min(m, n) copies difference: max(0, m – n) copies ©Silberschatz, Korth and Sudarshan6.45Database System Concepts - 6th Edition Relational Algebra and SQL  Assume the following expressions in multi-set relational algebra:   A1, .., An ( P (r1  r2  …  rm)) is equivalent to the following expression in SQL  select A1, A2, .. An from r1, r2, …, rm where P  A1, A2 sum(A3) ( P (r1  r2  …  rm))) is equivalent to the following expression in SQL  select A1, A2, sum(A3) from r1, r2, …, rm where P group by A1, A2 ©Silberschatz, Korth and Sudarshan6.46Database System Concepts - 6th Edition SQL and Relational Algebra  More generally, the non-aggregated attributes in the select clause may be a subset of the group by attributes, in which case the equivalence is as follows: select A1, sum(A3) from r1, r2, …, rm where P group by A1, A2 is equivalent to the following expression in multiset relational algebra  A1,sumA3( A1,A2 sum(A3) as sumA3( P (r1  r2  …  rm))) ©Silberschatz, Korth and Sudarshan6.47Database System Concepts - 6th Edition Tuple Relational Calculus ©Silberschatz, Korth and Sudarshan6.48Database System Concepts - 6th Edition Tuple Relational Calculus  A nonprocedural query language, where each query is of the form {t | P (t ) }  It is the set of all tuples t such that predicate P is true for t  t is a tuple variable, t [A ] denotes the value of tuple t on attribute A  t  r denotes that tuple t is in relation r  P is a formula similar to that of the predicate calculus ©Silberschatz, Korth and Sudarshan6.49Database System Concepts - 6th Edition Predicate Calculus Formula 1. Set of attributes and constants 2. Set of comparison operators: (e.g., , , , , , ) 3. Set of connectives: and (), or ()‚ not () 4. Implication (): x  y, if x if true, then y is true x  y x  y 5. Set of quantifiers:  t r (Q (t )) ”there exists” a tuple t in relation r such that predicate Q (t ) is true  t r (Q (t )) Q is true “for all” tuples t in relation r ©Silberschatz, Korth and Sudarshan6.50Database System Concepts - 6th Edition Example Queries  Find the ID, name, dept_name, salary for instructors whose salary is greater than $80,000  As in the previous query, but output only the ID attribute value {t |  s instructor (t [ID ] = s [ID ]  s [salary ]  80000)} Notice that a relation on schema (ID) is implicitly defined by the query {t | t  instructor  t [salary ]  80000} ©Silberschatz, Korth and Sudarshan6.51Database System Concepts - 6th Edition Example Queries  Find the names of all instructors whose department is in the Watson building {t | s  section (t [course_id ] = s [course_id ]  s [semester] = “Fall”  s [year] = 2009 v u  section (t [course_id ] = u [course_id ]  u [semester] = “Spring”  u [year] = 2010)}  Find the set of all courses taught in the Fall 2009 semester, or in the Spring 2010 semester, or both {t | s  instructor (t [name ] = s [name ]  u  department (u [dept_name ] = s[dept_name] “  u [building] = “Watson” ))} ©Silberschatz, Korth and Sudarshan6.52Database System Concepts - 6th Edition Safety of Expressions  It is possible to write tuple calculus expressions that generate infinite relations.  For example, { t |  t r } results in an infinite relation if the domain of any attribute of relation r is infinite  To guard against the problem, we restrict the set of allowable expressions to safe expressions.  An expression {t | P (t )} in the tuple relational calculus is safe if every component of t appears in one of the relations, tuples, or constants that appear in P  NOTE: this is more than just a syntax condition.  E.g. { t | t [A] = 5  true } is not safe --- it defines an infinite set with attribute values that do not appear in any relation or tuples or constants in P. ©Silberschatz, Korth and Sudarshan6.53Database System Concepts - 6th Edition Universal Quantification  Find all students who have taken all courses offered in the Biology department  {t |  r  student (t [ID] = r [ID])  ( u  course (u [dept_name]=“Biology”   s  takes (t [ID] = s [ID ]  s [course_id] = u [course_id]))}  Note that without the existential quantification on student, the above query would be unsafe if the Biology department has not offered any courses. ©Silberschatz, Korth and Sudarshan6.54Database System Concepts - 6th Edition Domain Relational Calculus ©Silberschatz, Korth and Sudarshan6.55Database System Concepts - 6th Edition Domain Relational Calculus  A nonprocedural query language equivalent in power to the tuple relational calculus  Each query is an expression of the form: {  x1, x2, …, xn  | P (x1, x2, …, xn)}  x1, x2, …, xn represent domain variables  P represents a formula similar to that of the predicate calculus ©Silberschatz, Korth and Sudarshan6.56Database System Concepts - 6th Edition Example Queries  Find the ID, name, dept_name, salary for instructors whose salary is greater than $80,000  {< i, n, d, s> | < i, n, d, s>  instructor  s  80000}  As in the previous query, but output only the ID attribute value  {< i> | < i, n, d, s>  instructor  s  80000}  Find the names of all instructors whose department is in the Watson building {< n > |  i, d, s (< i, n, d, s >  instructor   b, a (< d, b, a>  department  b = “Watson” ))} ©Silberschatz, Korth and Sudarshan6.57Database System Concepts - 6th Edition Example Queries { |  a, s, y, b, r, t (  section  s = “Fall”  y = 2009 ) v  a, s, y, b, r, t (  section ]  s = “Spring”  y = 2010)}  Find the set of all courses taught in the Fall 2009 semester, or in the Spring 2010 semester, or both This case can also be written as { |  a, s, y, b, r, t (  section  ( (s = “Fall”  y = 2009 ) v (s = “Spring”  y = 2010))}  Find the set of all courses taught in the Fall 2009 semester, and in the Spring 2010 semester { |  a, s, y, b, r, t (  section  s = “Fall”  y = 2009 )   a, s, y, b, r, t (  section ]  s = “Spring”  y = 2010)} ©Silberschatz, Korth and Sudarshan6.58Database System Concepts - 6th Edition Safety of Expressions The expression: {  x1, x2, …, xn  | P (x1, x2, …, xn )} is safe if all of the following hold: 1. All values that appear in tuples of the expression are values from dom (P ) (that is, the values appear either in P or in a tuple of a relation mentioned in P ). 2. For every “there exists” subformula of the form  x (P1(x )), the subformula is true if and only if there is a value of x in dom (P1) such that P1(x ) is true. 3. For every “for all” subformula of the form x (P1 (x )), the subformula is true if and only if P1(x ) is true for all values x from dom (P1). ©Silberschatz, Korth and Sudarshan6.59Database System Concepts - 6th Edition Universal Quantification  Find all students who have taken all courses offered in the Biology department  {< i > |  n, d, tc ( < i, n, d, tc >  student  ( ci, ti, dn, cr ( < ci, ti, dn, cr >  course  dn =“Biology”   si, se, y, g (  takes ))}  Note that without the existential quantification on student, the above query would be unsafe if the Biology department has not offered any courses. * Above query fixes bug in page 246, last query Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use End of Chapter 6