BKM_DATS: Database Systems
8. Relational DB Design
Vlastislav Dohnal
BKM_DATS, Vlastislav Dohnal, FI MUNI, 2022 2
Relational Database Design
Features of Good Relational Design
Atomic Domains and First Normal Form
Decomposition Using Functional Dependencies
Functional Dependency Theory
Algorithms for Functional Dependencies
Combine Schemas?
Suppose we combine instructor(ID, name, salary, dept_name) and
department(dept_name, building, budget) into inst_dept
(This inst_dept has no connection to a relationship set named inst_dept.)
The result is possible repetition of information.
What About Smaller Schemas?
Suppose we had started with
inst_dept (ID, name, salary, dept_name, building, budget)
How would we know to split up (decompose) it into instructor and
department?
Write a rule “if there were a schema (dept_name, building, budget), then
dept_name would be a candidate key”
Denote as a functional dependency:
dept_name → building, budget
In inst_dept, because dept_name is not a candidate key, the building and
budget of a department may have to be repeated.
This indicates the need to decompose inst_dept
What About Smaller Schemas? (cont.)
inst_dept (ID, name, salary, dept_name, building, budget)
Not all decompositions are good.
Suppose we decompose inst_dept into
instructor(ID, name, salary) and department(dept_name, building, budget)
Do we lose information?
The two schemas share no attribute, so we cannot reconstruct the original inst_dept relation.
This is a lossy decomposition.
A Lossy Decomposition
Example of Lossless Decomposition
Lossless decomposition
Decomposition of
R = (A, B, C) into R1 = (A, B), R2 = (B, C)
Is r = Π_{A,B}(r) ⋈ Π_{B,C}(r)?

r:
A  B  C
α  1  A
β  2  B

Π_{A,B}(r):
A  B
α  1
β  2

Π_{B,C}(r):
B  C
1  A
2  B

Π_{A,B}(r) ⋈ Π_{B,C}(r):
A  B  C
α  1  A
β  2  B
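The lossless-decomposition check for R = (A, B, C) can be replayed on a concrete instance. The following is a minimal Python sketch, not part of the original slides; the helper names `row`, `project` and `natural_join` and the encoding of tuples as sets of (attribute, value) pairs are assumptions made for illustration.

```python
def row(**values):
    """Encode one tuple as a frozenset of (attribute, value) pairs."""
    return frozenset(values.items())

def project(relation, attrs):
    """Projection onto attrs; using a set removes duplicate tuples."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    joined = set()
    for l in left:
        l_map = dict(l)
        for r in right:
            # Attributes missing from l impose no restriction.
            if all(l_map.get(a, v) == v for a, v in r):
                joined.add(l | r)
    return joined

# Instance from the example: B differs between the two tuples.
r = {row(A="α", B=1, C="A"), row(A="β", B=2, C="B")}
rejoined = natural_join(project(r, {"A", "B"}), project(r, {"B", "C"}))
print(rejoined == r)        # True: nothing is lost on this instance

# Hypothetical instance in which both tuples share B = 1.
s = {row(A="α", B=1, C="A"), row(A="β", B=1, C="B")}
s_rejoined = natural_join(project(s, {"A", "B"}), project(s, {"B", "C"}))
print(len(s_rejoined))      # 4: the join produces spurious tuples
```

The second instance shows that the same split can lose information when B does not determine A or C, which is why the lossless property has to be argued from functional dependencies rather than from one instance.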
Goal: Devise a Theory for the Following
Decide whether a particular relation R is in a “good” form.
In the case that a relation R is not in “good” form, decompose it into a
set of relations {R1, R2, ..., Rn} such that
each relation is in good form
the decomposition is a lossless decomposition
Our theory is based on:
functional dependencies
Functional Dependencies
Constraints on the set of legal relations.
Require that the value for a particular set of attributes determines the
value for another set of attributes uniquely.
E.g., employee_id determines employee name and address.
A functional dependency is a generalization of the notion of a key.
Functional Dependencies (Cont.)
Let R be a relation schema, and let α ⊆ R and β ⊆ R be non-empty.
The functional dependency
α → β
holds on R if and only if for any legal relation r(R), whenever any
two tuples t1 and t2 of r agree on the attributes α, they also agree
on the attributes β. That is,
t1[α] = t2[α] ⇒ t1[β] = t2[β]
Read α → β as “β depends on α”.
Example:
Consider r(A,B) with the following instance of r:
A B
1 4
1 5
3 7
On this instance, A → B does NOT hold, but B → A does hold.
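Whether a given instance satisfies a functional dependency can be checked mechanically. The sketch below is illustrative only; the function name `satisfies` and the list-of-dicts encoding of the instance are assumptions, not something the slides prescribe.

```python
def satisfies(rows, lhs, rhs):
    """Return True if the instance `rows` satisfies lhs -> rhs.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for t in rows:
        key = tuple(t[a] for a in lhs)
        value = tuple(t[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True

r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
print(satisfies(r, ["A"], ["B"]))   # False: A = 1 occurs with B = 4 and B = 5
print(satisfies(r, ["B"], ["A"]))   # True
```

Note that this only tests one instance; it cannot establish that a dependency holds on the schema.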
Use of Functional Dependencies
We use functional dependencies to:
test relations to see if they are legal under a given set of functional
dependencies.
If a relation r is legal under a set F of functional dependencies,
we say that r satisfies F.
specify constraints on the set of legal relations
We say that F holds on R if all legal relations on R satisfy the
set of functional dependencies F.
Note
A specific instance of a relation schema may satisfy a functional
dependency even if the functional dependency does not hold on
all legal instances.
For example, a specific instance of instructor(ID, name, salary)
may, by chance, satisfy
name → ID.
Use of Functional Dependencies (Cont.)
K is a superkey for a relation schema R if and only if K → R
Meaning: each value of K determines exactly one value of every attribute of R.
K is a candidate key for R if and only if
K → R, and
for no α ⊂ K, α → R
Functional dependencies allow us to express constraints that cannot
be expressed using superkeys.
Consider the schema:
inst_dept (ID, name, salary, dept_name, building, budget)
We expect these functional dependencies to hold:
dept_name → building (there is only one building for each department)
ID → building
ID → dept_name
but would not expect the following to hold:
dept_name → salary
Functional Dependencies (Cont.)
A functional dependency is trivial if it is satisfied by all instances of a
relation
Example:
ID, name → ID
name → name
In general, α → β is trivial if β ⊆ α
Closure of a Set of Functional Dependencies
Given a set F of functional dependencies, there are certain other
functional dependencies that are logically implied by F.
Example
If A → B and B → C, then we can infer that A → C
The set of all functional dependencies logically implied by F is the
closure of F.
We denote the closure of F by F+.
F+ is a superset of F.
Closure of a Set of Functional Dependencies
We can find F+, the closure of F, by repeatedly applying
Armstrong’s Axioms:
if β ⊆ α, then α → β (reflexivity)
if α → β, then γα → γβ (augmentation)
if α → β and β → γ, then α → γ (transitivity)
These rules are
sound (generate only functional dependencies that actually hold),
and
complete (generate all functional dependencies that hold).
Example
R = (A, B, C, G, H, I)
F = { A → B
A → C
CG → H
CG → I
B → H}
some members of F+
A → H
by transitivity from A → B and B → H
AG → I
by augmenting A → C with G, to get AG → CG
and then transitivity with CG → I
CG → HI
by augmenting CG → I with CG to infer CG → CGI,
and augmenting CG → H with I to infer CGI → HI,
and then transitivity
Closure of Functional Dependencies (Cont.)
Additional rules:
If α → β holds and α → γ holds, then α → βγ holds
(union)
If α → βγ holds, then α → β holds and α → γ holds
(decomposition)
If α → β holds and γβ → δ holds, then αγ → δ holds
(pseudotransitivity)
The above rules can be inferred from Armstrong’s axioms.
Closure of Attribute Sets
Given a set of attributes α, define the closure of α under F as the set of
attributes that are functionally determined by α under F
Denoted by α+
Algorithm to compute α+, the closure of α under F
result := α;
while (changes to result) do
  for each β → γ in F do
  begin
    if β ⊆ result then result := result ∪ γ
  end
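The attribute-closure algorithm above translates almost line by line into code. This is a minimal Python sketch under stated assumptions: a functional dependency is encoded as a (left side, right side) pair of attribute collections, and single-letter attributes are written as plain strings such as "CG"; the function name `closure` is chosen here for illustration.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under the dependencies `fds`.

    fds is a list of (lhs, rhs) pairs. Each side is any iterable of
    attribute names; a string like "CG" means the attributes C and G.
    """
    result = set(attrs)
    changed = True
    while changed:                      # "while (changes to result)"
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(sorted(closure("AG", F)))   # ['A', 'B', 'C', 'G', 'H', 'I']
```

Each pass either adds at least one attribute or terminates, so the loop runs at most as many passes as there are attributes in the schema.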
Example of Attribute Set Closure
R = (A, B, C, G, H, I)
F = {A → B
A → C
CG → H
CG → I
B → H}
(AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
Testing for superkey:
To test if α is a superkey, we compute α+, and check if α+ contains
all attributes of R.
Testing functional dependencies
To check if a functional dependency α → β holds (or, in other
words, is in F+), just check if β ⊆ α+.
That is, we compute α+ by using attribute closure, and then
check if it contains β.
It is a simple and cheap test, and very useful.
Computing closure of F (F+)
For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we
output a functional dependency γ → S.
Example of Test for Candidate Key
R = (A, B, C, G, H, I)
F = {A → B
A → C
CG → H
CG → I
B → H}
Is AG a candidate key?
1. Is AG a superkey?
   Does AG → R hold, i.e., is (AG)+ ⊇ R?
   (AG)+ = ABCGHI, so yes.
2. Is any proper subset of AG a superkey?
   Does A → R hold, i.e., is (A)+ ⊇ R? (A)+ = ABCH, so no.
   Does G → R hold, i.e., is (G)+ ⊇ R? (G)+ = G, so no.
Hence AG is a candidate key.
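The two-step candidate-key test can be sketched with the attribute closure. The code below is an illustration, not the slides' own implementation; the names `is_superkey` and `is_candidate_key` are assumptions, and attributes are single letters written as strings.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """Step 1: attrs is a superkey if its closure covers the schema."""
    return closure(attrs, fds) >= set(schema)

def is_candidate_key(attrs, schema, fds):
    """Step 2: a superkey is a candidate key if no proper subset is a superkey.

    Dropping one attribute at a time is enough, because every proper
    subset is contained in one of these and supersets of superkeys
    are superkeys.
    """
    attrs = set(attrs)
    if not is_superkey(attrs, schema, fds):
        return False
    return not any(is_superkey(attrs - {a}, schema, fds) for a in attrs)

R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(is_candidate_key("AG", R, F))    # True
print(is_candidate_key("ABG", R, F))   # False: superkey, but not minimal
```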
Design Goals
Goal for a relational database design is:
BCNF, and
Lossless, and
Dependency preservation.
If we cannot achieve this, we accept one of
Lack of dependency preservation
Redundancy due to use of 3NF
Interestingly, SQL does not provide a direct way of specifying
functional dependencies other than super-keys.
Functional dependencies can be specified using assertions, but these are
expensive to test, and currently not supported by any of the widely
used databases!
Even if we had a dependency preserving decomposition, using SQL
we would not be able to efficiently test a functional dependency whose
left hand side is not a key.
Lossless Decomposition
For the case of R = (R1, R2), we require that for all possible relations r
on schema R
r = Π_{R1}(r) ⋈ Π_{R2}(r)
A decomposition of R into R1 and R2 is lossless if at least one of the
following dependencies is in F+:
R1 ∩ R2 → R1
R1 ∩ R2 → R2
The above functional dependencies are a sufficient condition for
lossless decomposition.
The dependencies are a necessary condition only if all constraints are
functional dependencies.
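The sufficient condition for a two-way lossless decomposition reduces to one attribute closure of the common attributes. A minimal Python sketch follows; the function name `is_lossless_binary` is an assumption for illustration, and the example schema R = (A, B, C) with F = {A → B, B → C} is chosen to match the small examples used in this lecture.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is implied by fds."""
    r1, r2 = set(r1), set(r2)
    determined = closure(r1 & r2, fds)
    return r1 <= determined or r2 <= determined

F = [("A", "B"), ("B", "C")]
print(is_lossless_binary("AB", "BC", F))   # True:  B -> BC
print(is_lossless_binary("AB", "AC", F))   # True:  A -> AB
print(is_lossless_binary("AC", "BC", F))   # False: C determines only C
```

A `False` result means only that the sufficient condition fails; as the slide notes, it implies a lossy decomposition only when all constraints are functional dependencies.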
Dependency Preservation
Let Fi be the set of dependencies in F+ that include only attributes in Ri.
A decomposition is dependency preserving if
(F1 ∪ F2 ∪ … ∪ Fn)+ = F+
If it is not, then checking updates for violation of functional
dependencies may require computing joins, which is
expensive.
Example
R = (A, B, C )
F = { A → B
B → C }
Key = {A}
R is not in BCNF
Decomposition R1 = (A, B), R2 = (B, C)
R1 and R2 in BCNF
Lossless decomposition
Dependency preserving
Alternative decomposition R1 = (A, B), R2 = (A, C)
Lossless decomposition? Yes:
R1 ∩ R2 = {A} and A → AB
Dependency preserving? No:
We cannot check B → C without computing R1 ⋈ R2
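Dependency preservation can be tested without materializing F+. The sketch below uses a closure-based test: for each α → β in F, grow α using only what can be derived inside individual schemas Ri, and check whether β is reached. The function name `preserves_dependencies` and the string encoding of schemas are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves_dependencies(fds, decomposition):
    """True if every dependency in fds can be checked on single schemas."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                ri = set(ri)
                # Attributes of Ri derivable from what we know inside Ri.
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True

F = [("A", "B"), ("B", "C")]
print(preserves_dependencies(F, ["AB", "BC"]))   # True
print(preserves_dependencies(F, ["AB", "AC"]))   # False: B -> C is lost
```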
First Normal Form
Domain is atomic if its elements are indivisible units
Examples of non-atomic domains:
Set of names, composite attributes
Identification numbers like CS101 that can be broken up into
parts (department code and course id)
A relational schema R is in first normal form if the domains of all
attributes of R are atomic
Non-atomic values complicate storage and encourage redundant
(repeated) storage of data
Example
Set of accounts stored with each customer, and set of owners
stored with each account
We assume all relations are in first normal form
First Normal Form (Cont.)
Atomicity is a property of how the elements of the domain are used.
Example
Strings would normally be considered indivisible
Suppose that students are given roll numbers which are strings of
the form CS0012 or EE1127
If the first two characters are extracted to find the department,
the domain of roll numbers is not atomic.
Doing so is a bad idea:
leads to encoding of information in application program
rather than in the database.
Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a set F of functional
dependencies if for all functional dependencies in F+ of the form
α → β
where α ⊆ R and β ⊆ R, at least one of the following holds:
α → β is trivial (i.e., β ⊆ α)
α is a superkey for R (i.e., α → R)
Example schema not in BCNF:
inst_dept (ID, name, salary, dept_name, building, budget)
because dept_name → building, budget holds on inst_dept,
but dept_name is not a superkey.
Decomposing a Schema into BCNF
Suppose we have a schema R
A non-trivial dependency α → β causes a violation of BCNF, so
we decompose R into:
R1 = (α ∪ β)
R2 = (R − (β − α))
In our example, inst_dept (ID, name, salary, dept_name, building, budget)
with dept_name → building, budget:
α = dept_name
β = building, budget
and inst_dept is replaced by
R1 = (α ∪ β) = (dept_name, building, budget)
R2 = (R − (β − α)) = (ID, name, salary, dept_name)
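A single decomposition step is plain set arithmetic. The sketch below is illustrative; the function name `bcnf_split` is an assumption, and schemas are represented as Python sets of attribute names.

```python
def bcnf_split(schema, alpha, beta):
    """Split `schema` on a violating dependency alpha -> beta.

    Returns (R1, R2) with R1 = alpha ∪ beta and R2 = R − (beta − alpha).
    """
    schema, alpha, beta = set(schema), set(alpha), set(beta)
    r1 = alpha | beta
    r2 = schema - (beta - alpha)
    return r1, r2

inst_dept = {"ID", "name", "salary", "dept_name", "building", "budget"}
r1, r2 = bcnf_split(inst_dept, {"dept_name"}, {"building", "budget"})
print(sorted(r1))   # ['budget', 'building', 'dept_name']
print(sorted(r2))   # ['ID', 'dept_name', 'name', 'salary']
```

The two results share exactly α, and α → R1 holds by construction, so the split meets the lossless condition R1 ∩ R2 → R1.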
BCNF and Dependency Preservation
Constraints, including functional dependencies, are costly to check in
practice unless they pertain to only one relation
A decomposition is dependency preserving
If it is sufficient to test only dependencies on each individual
relation of the decomposition in order to ensure that all functional
dependencies hold.
Because it is not always possible to achieve both BCNF and
dependency preservation, we consider a weaker normal form, known
as third normal form.
Third Normal Form
A relation schema R is in third normal form (3NF) if for all
α → β in F+
where α ⊆ R and β ⊆ R, at least one of the following holds:
α → β is trivial (i.e., β ⊆ α)
α is a superkey for R
Each attribute A in β – α is contained in a candidate key for R.
(NOTE: each attribute may be in a different candidate key)
If a relation is in BCNF, it is in 3NF,
since in BCNF one of the first two conditions above must hold.
The third condition is the minimal relaxation of BCNF that ensures
dependency preservation.
BCNF and Dependency Preservation
It is not always possible to get a BCNF decomposition that is
dependency preserving.
Relation dept_study_advisor (s_ID, a_ID, dept_name)
F = { s_ID, dept_name → a_ID,
a_ID → dept_name }
Two candidate keys: {s_ID, dept_name} and {s_ID, a_ID}
dept_study_advisor is not in BCNF
Any decomposition of dept_study_advisor will fail to preserve
s_ID, dept_name → a_ID
This implies that testing for s_ID, dept_name → a_ID
requires a join.
3NF Example
Relation dept_study_advisor:
dept_study_advisor (s_ID, a_ID, dept_name)
F = {s_ID, dept_name → a_ID,
a_ID → dept_name}
Two candidate keys:
s_ID, dept_name,
a_ID, s_ID
dept_study_advisor is in 3NF
s_ID, dept_name → a_ID
s_ID, dept_name is a superkey
a_ID → dept_name
a_ID is not a superkey
dept_name is contained in a candidate key
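The 3NF argument for dept_study_advisor can be replayed in code. This is a brute-force Python sketch for small schemas only: candidate keys are found by enumerating attribute subsets, which is exponential in the number of attributes. The function names `candidate_keys`, `is_bcnf` and `is_3nf` are assumptions for illustration, and both tests check only the dependencies in F on the original schema.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal superkeys, found by trying subsets in order of size."""
    schema, keys = set(schema), []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            k = set(combo)
            if any(key <= k for key in keys):
                continue                      # contains a smaller key
            if closure(k, fds) >= schema:
                keys.append(k)
    return keys

def is_bcnf(schema, fds):
    return all(set(r) <= set(l) or closure(l, fds) >= set(schema)
               for l, r in fds)

def is_3nf(schema, fds):
    prime = set().union(*candidate_keys(schema, fds))
    return all(set(r) <= set(l)                       # trivial
               or closure(l, fds) >= set(schema)      # superkey
               or set(r) - set(l) <= prime            # in some candidate key
               for l, r in fds)

R = {"s_ID", "a_ID", "dept_name"}
F = [({"s_ID", "dept_name"}, {"a_ID"}), ({"a_ID"}, {"dept_name"})]
print(is_3nf(R, F), is_bcnf(R, F))   # True False
```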
Redundancy in 3NF
There is some redundancy in this schema
Example of problems due to redundancy in 3NF
dept_study_advisor (s_ID, a_ID, dept_name)
F = {s_ID, dept_name → a_ID,
a_ID → dept_name}

s_ID   a_ID   dept_name
Adam   Jane   FI
Bob    Jane   FI
Joe    Jane   FI
null   Karol  ESF

repetition of information (e.g., the (a_ID, dept_name) relationship Jane, FI
is stored three times)
need to use null values (e.g., to represent the relationship
Karol, ESF where there is no corresponding value for s_ID);
such a tuple must exist to record (a_ID, dept_name) if there is no other
separate relation mapping instructors to departments
Second Normal Form
A functional dependency α → β is called a partial dependency
if there is a proper subset γ of α, i.e., γ ⊂ α, such that γ → β.
We say that β is partially dependent on α.
A relation R is in second normal form (2NF) if it is in 1NF and
each attribute A in R meets one of the following:
A appears in a candidate key;
A is not partially dependent on any candidate key,
i.e., A is dependent on a complete candidate key, but it may be
a transitive dependency.
Every relation in 3NF is also in 2NF.
Testing for BCNF
To check if a non-trivial dependency α → β causes a violation of BCNF
1. compute α+ (the attribute closure of α), and
2. verify that it includes all attributes of R, that is, it is a superkey of R.
Simplified test: To check if a relation schema R is in BCNF, it suffices to
check only the dependencies in the given set F for violation of BCNF, rather
than checking all dependencies in F+.
If none of the dependencies in F causes a violation of BCNF, then none of
the dependencies in F+ will cause a violation of BCNF either.
However, the simplified test using only F is incorrect when testing a relation
in a decomposition of R
Consider R = (A, B, C, D, E), with F = { A → B, BC → D}
Decompose R into R1 = (A,B) and R2 = (A,C,D,E)
Neither of the dependencies in F contains only attributes from
(A,C,D,E), so we might be misled into thinking R2 satisfies BCNF.
In fact, the dependency AC → D in F+ shows R2 is not in BCNF.
Testing Decomposition for BCNF
To check if a relation Ri in a decomposition of R is in BCNF,
either test Ri for BCNF with respect to the restriction of F+ to Ri
(that is, all dependencies in F+ that contain only attributes from Ri)
or use the original set of dependencies F that hold on R, but with
the following test:
for every set of attributes α ⊆ Ri, check that α+ (the
attribute closure of α) either includes no attribute of Ri − α
(so every dependency α → β on Ri is trivial),
or includes all attributes of Ri (so α is a superkey of Ri).
If the condition is violated by some set of attributes α ⊆ Ri,
the dependency
α → (α+ − α) ∩ Ri
can be shown to hold on Ri, and Ri violates BCNF.
We use the above dependency to decompose Ri
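The subset-based test for a schema Ri inside a decomposition can be sketched as follows. The function name `bcnf_violation` and its return convention (the violating left side together with (α+ − α) ∩ Ri, or `None`) are assumptions for illustration; the enumeration of subsets is exponential and meant for small schemas only.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violation(ri, fds):
    """Return (alpha, beta) witnessing that Ri is not in BCNF, else None.

    fds is the ORIGINAL set F on R; closures are computed under F and
    then restricted to the attributes of Ri.
    """
    ri = set(ri)
    for size in range(1, len(ri)):
        for combo in combinations(sorted(ri), size):
            alpha = set(combo)
            alpha_plus = closure(alpha, fds)
            gained = (alpha_plus - alpha) & ri
            # Violation: alpha determines something new in Ri,
            # yet alpha is not a superkey of Ri.
            if gained and not ri <= alpha_plus:
                return alpha, gained
    return None

F = [("A", "B"), ("BC", "D")]
print(bcnf_violation("AB", F))     # None
print(bcnf_violation("ACDE", F))   # AC -> D, reported as a pair of sets
```

On R2 = (A, C, D, E) the sketch finds α = {A, C} with (α+ − α) ∩ R2 = {D}, matching the dependency AC → D that the simplified test on F alone misses.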
BCNF Decomposition Algorithm
result := {R}; -- a set of relational schemata
done := false;
compute F+;
while (not done) do
  if (there is a schema Ri in result that is not in BCNF)
  then begin
    let α → β be a nontrivial functional dependency that
    holds on Ri such that α → Ri is not in F+,
    and α ∩ β = ∅;
    result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
  end
  else done := true;
Note: each Ri is in BCNF, and decomposition is lossless.
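The loop above can be sketched in Python. This is one possible reading of the algorithm, not a definitive implementation: instead of computing F+ explicitly, it searches attribute subsets of each Ri with the closure under F, and the helper names `bcnf_violation` and `bcnf_decompose` are assumptions. The resulting decomposition can depend on which violating dependency is picked first.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violation(ri, fds):
    """Return (alpha, beta), alpha ∩ beta = ∅, violating BCNF on Ri, or None."""
    ri = set(ri)
    for size in range(1, len(ri)):
        for combo in combinations(sorted(ri), size):
            alpha = set(combo)
            alpha_plus = closure(alpha, fds)
            beta = (alpha_plus - alpha) & ri
            if beta and not ri <= alpha_plus:
                return alpha, beta
    return None

def bcnf_decompose(schema, fds):
    """Repeatedly split a violating schema Ri into (Ri − beta) and (alpha ∪ beta)."""
    result = [set(schema)]
    done = False
    while not done:
        done = True
        for ri in result:
            violation = bcnf_violation(ri, fds)
            if violation:
                alpha, beta = violation
                result.remove(ri)
                result += [ri - beta, alpha | beta]
                done = False
                break               # restart the scan on the new result
    return result

F = [("A", "B"), ("B", "C")]
print([sorted(s) for s in bcnf_decompose("ABC", F)])   # [['A', 'B'], ['B', 'C']]
```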
Example of BCNF Decomposition
class (course_id, title, dept_name, credits, sec_id, semester, year,
building, room_number, capacity, time_slot_id)
Functional dependencies:
course_id → title, dept_name, credits
building, room_number → capacity
course_id, sec_id, semester, year → building, room_number,
time_slot_id
A candidate key {course_id, sec_id, semester, year}.
BCNF Decomposition:
course_id → title, dept_name, credits holds
but course_id is not a superkey.
We replace class by:
course(course_id, title, dept_name, credits)
class-1 (course_id, sec_id, semester, year, building,
room_number, capacity, time_slot_id)
BCNF Decomposition (Cont.)
course(course_id, title, dept_name, credits)
class-1 (course_id, sec_id, semester, year, building,
room_number, capacity, time_slot_id)
course is in BCNF
How do we know this?
building, room_number → capacity holds on class-1
but {building, room_number} is not a superkey for class-1.
We replace class-1 by:
classroom (building, room_number, capacity)
section (course_id, sec_id, semester, year, building,
room_number, time_slot_id)
classroom and section are in BCNF.
Functional dependencies (repeated for reference):
course_id → title, dept_name, credits
building, room_number → capacity
course_id, sec_id, semester, year → building, room_number, time_slot_id
Testing for 3NF
Optimization
Need to check only dependencies in F.
Need not check all dependencies in F+.
Use attribute closure to check, for each dependency α → β, whether α is a
superkey.
If α is not a superkey, we have to verify whether each attribute in β − α
is contained in a candidate key of R
This test is rather more expensive, since it involves finding
candidate keys.
Testing for 3NF has been shown to be NP-hard.
Interestingly, decomposition into third normal form (described
shortly) can be done in polynomial time.