BKM_DATS: Database Systems (Databázové systémy)
8. Relational DB Design
Vlastislav Dohnal
BKM_DATS, Vlastislav Dohnal, FI MUNI, 2023 2
Relational Database Design
Features of Good Relational Design
Atomic Domains and First Normal Form
Decomposition Using Functional Dependencies
Functional Dependency Theory
Algorithms for Functional Dependencies
Combine Schemas?
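Suppose we combine instructor(ID, name, salary, dept_name) and
department(dept_name, building, budget) into inst_dept
(this inst_dept is a combined schema, not a relationship set of the same name)
The result is possible repetition of information: the building and budget of a department are stored once per instructor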
What About Smaller Schemas?
Suppose we had started with
inst_dept (ID, name, salary, dept_name, building, budget)
How would we know to split up (decompose) it into instructor and
department?
Write a rule “if there were a schema (dept_name, building, budget), then
dept_name would be a candidate key”
Denote as a functional dependency:
dept_name → building, budget
In inst_dept, because dept_name is not a candidate key, the building and
budget of a department may have to be repeated.
This indicates the need to decompose inst_dept
What About Smaller Schemas? (cont.)
inst_dept (ID, name, salary, dept_name, building, budget)
Not all decompositions are good.
Suppose we decompose inst_dept into
instructor(ID, name, salary) and department(dept_name, building, budget)
Do we lose information?
The two schemas share no attribute, so we no longer know which instructor belongs to which department.
We cannot reconstruct the original inst_dept relation.
This is a lossy decomposition.
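The effect can be checked on a concrete instance by projecting it onto both schemas and joining the projections back. The following is a minimal sketch; the helper names (project, natural_join, is_lossless_on) and the two sample rows are hypothetical and only serve as an illustration.

```python
def project(rows, attrs):
    """Projection onto attrs; duplicates vanish because the result is a set."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Natural join of two sets of tuples stored as frozensets of (attr, value)."""
    joined = set()
    for lt in left:
        for rt in right:
            ld, rd = dict(lt), dict(rt)
            # Tuples combine only if they agree on every shared attribute.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(frozenset({**ld, **rd}.items()))
    return joined

def is_lossless_on(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly this instance."""
    original = {frozenset(row.items()) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

# Hypothetical sample instance of inst_dept.
inst_dept = [
    {"ID": 1, "name": "Ann", "salary": 50, "dept_name": "FI",
     "building": "A", "budget": 100},
    {"ID": 2, "name": "Bob", "salary": 60, "dept_name": "ESF",
     "building": "B", "budget": 200},
]
instructor = ["ID", "name", "salary"]
department = ["dept_name", "building", "budget"]

# No shared attribute: the join is a Cartesian product with 4 tuples.
print(is_lossless_on(inst_dept, instructor, department))                  # False
# Keeping dept_name in instructor restores the link.
print(is_lossless_on(inst_dept, instructor + ["dept_name"], department))  # True
```

Note that this only tests one instance; a decomposition is lossless only if the equality holds for all legal instances.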
A Lossy Decomposition
Example of Lossless Decomposition
Lossless decomposition
Decomposition of R = (A, B, C) into R1 = (A, B) and R2 = (B, C)
r =? Π_{A,B}(r) ⋈ Π_{B,C}(r)

r:
A | B | C
α | 1 | A
β | 2 | B

Π_{A,B}(r):
A | B
α | 1
β | 2

Π_{B,C}(r):
B | C
1 | A
2 | B

Π_{A,B}(r) ⋈ Π_{B,C}(r):
A | B | C
α | 1 | A
β | 2 | B
Goal: Devise a Theory for the Following
Decide whether a particular relation R is in a “good” form.
In the case that a relation R is not in “good” form, decompose it into a
set of relations {R1, R2, ..., Rn} such that
each relation is in good form
the decomposition is a lossless decomposition
Our theory is based on:
functional dependencies
Functional Dependencies
Constraints on the set of legal relations.
Require that the value for a particular set of attributes determines the
value for another set of attributes uniquely.
E.g., employee_id determines employee name and address.
A functional dependency is a generalization of the notion of a key.
Functional Dependencies (Cont.)
Let R be a relation schema, and let α ⊆ R and β ⊆ R be non-empty sets of attributes
The functional dependency
α → β
holds on R if and only if for any legal relation r(R), whenever any
two tuples t1 and t2 of r agree on the attributes α, they also agree
on the attributes β. That is,
t1[α] = t2[α] ⇒ t1[β] = t2[β]
Read α → β as “β depends on α”
Example:
Consider r(A,B) with the following instance of r:
A | B
1 | 4
1 | 5
3 | 7
On this instance, A → B does NOT hold, but B → A does hold.
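The definition translates directly into a check on a single instance. The sketch below is illustrative only; the function name satisfies and the list-of-dicts representation of r are assumptions, not part of the lecture.

```python
def satisfies(rows, lhs, rhs):
    """Return True if the instance `rows` satisfies the dependency lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value and
        # returns the stored one; a mismatch means two tuples agree on lhs
        # but differ on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True

# The instance r(A, B) from the example above.
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
print(satisfies(r, ["A"], ["B"]))  # False: A = 1 appears with B = 4 and B = 5
print(satisfies(r, ["B"], ["A"]))  # True
```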
Use of Functional Dependencies
We use functional dependencies to:
test relations to see if they are legal under a given set of functional
dependencies.
If a relation r is legal under a set F of functional dependencies,
we say that r satisfies F.
specify constraints on the set of legal relations
We say that F holds on R if all legal relations on R satisfy the
set of functional dependencies F.
Note
A specific instance of a relation schema may satisfy a functional
dependency even if the functional dependency does not hold on
all legal instances.
For example, a specific instance of instructor(ID, name, salary)
may, by chance, satisfy
name → ID.
Use of Functional Dependencies (Cont.)
K is a super key for a relation schema R if and only if K → R
K is a candidate key for R if and only if
K → R, and
for no α ⊂ K, α → R
Meaning: each value of K determines exactly one value of every attribute of R, i.e., at most one tuple.
Functional dependencies allow us to express constraints that cannot
be expressed using super keys.
Consider the schema:
inst_dept (ID, name, salary, dept_name, building, budget)
We expect these functional dependencies to hold:
dept_name → building (there is only one building for each department)
ID → building
ID → dept_name
but would not expect the following to hold:
dept_name → salary
Functional Dependencies (Cont.)
A functional dependency is trivial if it is satisfied by all instances of a
relation
Example:
ID, name → ID
name → name
In general, α → β is trivial if β ⊆ α
Example
Design a university system for managing departments, their
instructors, offered courses and enrolled students.
departments have name, building, address,
instructors have ID, name and affiliation with the home department
and courses they teach,
courses have ID, title, number of credits,
students have ID, name, enrolled semester,
students sign up for courses and receive a grade and a date of
passing.
Is this schema OK? What are the functional dependencies?
sys(dept_name, building, dept_address, instr_id, instr_name,
instr_dept, course_id, course_title, credits, stud_id, stud_name,
stud_sem, grading, passed_on)
Example
dept_name | building | dept_address | instr_id | instr_name | instr_dept | course_id | course_title | credits | stud_id | stud_name | stud_sem | grading | passed_on
ESF | Lipová | Lipová 41a, Brno | 2952 | Dohnal | FI | BKM_DATS | Databázové systémy | 6 | 402874 | Niki Lauda | 1/2020 | D | 2021-12-14
FI | Botanická | Botanická 68a, Brno | 2952 | Dohnal | FI | PB168 | DB a IS | 4 | 581623 | Max Verstapen | 2/2021 | B | 2022-01-10
ESF | Lipová | Lipová 41a, Brno | 2952 | Dohnal | FI | BKM_DATS | Databázové systémy | 6 | 340265 | Keke Rosberg | 1/2020 | A | 2021-12-19
Functional dependencies expected to hold on sys:
dept_name → building, dept_address
instr_id → instr_name, instr_dept
stud_id → stud_name, stud_sem
course_id → course_title, credits
course_id → instr_id
stud_id, course_id → grading, passed_on
Closure of a Set of Functional Dependencies
Given a set F of functional dependencies, there are certain other
functional dependencies that are logically implied by F.
Example
If A → B and B → C, then we can infer that A → C
The set of all functional dependencies logically implied by F is the
closure of F.
We denote the closure of F by F+.
F+ is a superset of F.
Closure of a Set of Functional Dependencies
We can find F+, the closure of F, by repeatedly applying
Armstrong’s Axioms:
if β ⊆ α, then α → β (reflexivity)
if α → β, then γα → γβ (augmentation)
if α → β, and β → γ, then α → γ (transitivity)
These rules are
sound (generate only functional dependencies that actually hold),
and
complete (generate all functional dependencies that hold).
Example
R = (A, B, C, G, H, I)
F = { A → B
A → C
CG → H
CG → I
B → H}
some members of F+
A → H
by transitivity from A → B and B → H
AG → I
by augmenting A → C with G, to get AG → CG
and then transitivity with CG → I
CG → HI
by augmenting CG → I with CG to infer CG → CGI,
augmenting CG → H with I to infer CGI → HI,
and then applying transitivity
Closure of Attribute Sets
Given a set of attributes α, define the closure of α under F as the set of
attributes that are functionally determined by α under F
Denoted by α+
Algorithm to compute α+, the closure of α under F
result := α;
while (changes to result) do
    for each β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ
    end
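The algorithm can be sketched in Python as follows. This is one possible rendering, not the lecture's own code; it assumes each dependency is given as an (lhs, rhs) pair and that attribute names are single characters, so that a string such as "CG" stands for the set {C, G}.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under the dependencies `fds`.

    fds is a list of (lhs, rhs) pairs; each side is a collection of attributes.
    """
    result = set(attrs)
    changed = True
    while changed:                      # while (changes to result)
        changed = False
        for lhs, rhs in fds:            # for each beta -> gamma in F
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # result := result U gamma
                changed = True
    return result

# F over R = (A, B, C, G, H, I), as used in the examples of this lecture.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print("".join(sorted(closure("AG", F))))  # ABCGHI
```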
Example of Attribute Set Closure
R = (A, B, C, G, H, I)
F = {A → B
A → C
CG → H
CG → I
B → H}
(AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
Testing for super key:
To test if α is a super key, we compute α+, and check if α+ contains
all attributes of R.
Testing functional dependencies
To check if a functional dependency α → β holds (or, in other
words, is in F+), just check if β ⊆ α+.
That is, we compute α+ by using attribute closure, and then
check if it contains β.
It is a simple and cheap test, and very useful.
Computing closure of F (F+)
For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we
output a functional dependency γ → S.
Example of Test for Candidate Key
R = (A, B, C, G, H, I)
F = {A → B
A → C
CG → H
CG → I
B → H}
Is AG a candidate key?
1. Is AG a super key?
   Does AG → R? ⇔ Is (AG)+ ⊇ R?
   (AG)+ = ABCGHI, so AG is a super key.
2. Is any proper subset of AG a super key?
   Does A → R? ⇔ Is (A)+ ⊇ R? (A)+ = ABCH, so no.
   Does G → R? ⇔ Is (G)+ ⊇ R? (G)+ = G, so no.
Hence AG is a candidate key.
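The two steps of the test can be sketched on top of attribute closure. The function names is_superkey and is_candidate_key are hypothetical, and single-character attribute names are assumed. Because every superset of a super key is a super key, it is enough to try removing one attribute at a time.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    while True:
        gained = {a for lhs, rhs in fds if set(lhs) <= result for a in rhs} - result
        if not gained:
            return result
        result |= gained

def is_superkey(attrs, schema, fds):
    """attrs is a super key if its closure contains every attribute of the schema."""
    return closure(attrs, fds) >= set(schema)

def is_candidate_key(attrs, schema, fds):
    """A candidate key is a super key from which no attribute can be dropped."""
    attrs = set(attrs)
    if not is_superkey(attrs, schema, fds):
        return False
    return not any(is_superkey(attrs - {a}, schema, fds) for a in attrs)

R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(is_candidate_key("AG", R, F))   # True
print(is_candidate_key("ABG", R, F))  # False: a super key, but B is redundant
```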
Design Goals
The goal of relational database design is:
BCNF, and
Lossless, and
Dependency preservation.
If we cannot achieve this, we accept one of
Lack of dependency preservation
Redundancy due to the use of 3NF
Interestingly, SQL does not provide a direct way of specifying
functional dependencies other than super-keys.
Functional dependencies can be specified using assertions, but these
are expensive to test and currently not supported by any widely
used database system!
Even if we had a dependency preserving decomposition, using SQL,
we would not be able to efficiently test a functional dependency whose
left-hand side is not a key.
Lossless Decomposition
For the case of R = (R1, R2), we require that for all possible relations r
on schema R
r = Π_{R1}(r) ⋈ Π_{R2}(r)
A decomposition of R into R1 and R2 is lossless if at least one of the
following dependencies is in F+:
R1 ∩ R2 → R1
R1 ∩ R2 → R2
The above functional dependencies are a sufficient condition for
lossless decomposition.
The dependencies are a necessary condition only if all constraints are
functional dependencies.
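With attribute closure, the condition reduces to one closure computation on the common attributes. A minimal sketch, assuming single-character attribute names; the function name lossless_by_fds and the sample schema R = (A, B, C) with F = {A → B, B → C} are chosen here for illustration.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    while True:
        gained = {a for lhs, rhs in fds if set(lhs) <= result for a in rhs} - result
        if not gained:
            return result
        result |= gained

def lossless_by_fds(r1, r2, fds):
    """True if (R1 n R2) -> R1 or (R1 n R2) -> R2 is implied by fds."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

F = [("A", "B"), ("B", "C")]
print(lossless_by_fds("AB", "BC", F))  # True:  B -> BC
print(lossless_by_fds("AB", "AC", F))  # True:  A -> AB
print(lossless_by_fds("AC", "BC", F))  # False: C determines only itself
```

A False result means only that the functional dependencies do not guarantee a lossless join.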
Dependency Preservation
Let Fi be the set of dependencies in F+ that include only attributes in Ri.
A decomposition is dependency preserving, if
(F1 ∪ F2 ∪ … ∪ Fn)+ = F+
If it is not, then checking updates for violation of functional
dependencies may require computing joins, which is
expensive.
Example
R = (A, B, C )
F = { A → B
B → C }
Key = {A}
R is not in BCNF
Decomposition R1 = (A, B), R2 = (B, C)
R1 and R2 in BCNF
Lossless decomposition
Dependency preserving
Alternative decomposition R1 = (A, B), R2 = (A, C)
Lossless decomposition?
Yes: R1 ∩ R2 = {A} and A → AB
Dependency preserving?
No: we cannot check B → C without computing R1 ⋈ R2
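Both decompositions can be checked mechanically. The sketch below does not compute F+; it uses the closure-based test commonly given in database textbooks, which is not part of these slides: for each α → β in F, grow a result set from α using only what each Ri can see, and check that β ends up inside. The function name preserves_dependencies is hypothetical, and single-character attribute names are assumed.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    while True:
        gained = {a for lhs, rhs in fds if set(lhs) <= result for a in rhs} - result
        if not gained:
            return result
        result |= gained

def preserves_dependencies(fds, decomposition):
    """True if every dependency in fds can be enforced on the schemas alone."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in decomposition:
                ri = set(schema)
                # Attributes derivable from result while staying inside Ri.
                gained = (closure(result & ri, fds) & ri) - result
                if gained:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True

F = [("A", "B"), ("B", "C")]
print(preserves_dependencies(F, ["AB", "BC"]))  # True
print(preserves_dependencies(F, ["AB", "AC"]))  # False: B -> C is lost
```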
First Normal Form
Domain is atomic if its elements are indivisible units
Examples of non-atomic domains:
Set of names, composite attributes
Identification numbers like CS101 that can be broken up into
parts (department code and course id)
A relational schema R is in first normal form if the domains of all
attributes of R are atomic
Non-atomic values complicate storage and encourage redundant
(repeated) storage of data
Example
Set of accounts stored with each customer, and set of owners
stored with each account
We assume all relations are in first normal form
First Normal Form (Cont.)
Atomicity is a property of how the elements of the domain are used.
Example
Strings would normally be considered indivisible
Suppose that students are given roll numbers which are strings of
the form CS0012 or EE1127
If the first two characters are extracted to find the department,
the domain of roll numbers is not atomic.
Doing so is a bad idea:
leads to encoding of information in application program
rather than in the database.
Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a set F of functional
dependencies if for all functional dependencies in F+ of the form
α → β
where α ⊆ R and β ⊆ R, at least one of the following holds:
α → β is trivial (i.e., β ⊆ α)
α is a super key for R (i.e., α → R)
Example schema not in BCNF:
inst_dept (ID, name, salary, dept_name, building, budget)
because dept_name → building, budget holds on inst_dept,
but dept_name is not a super key.
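A BCNF test for the example can be sketched with attribute closure. To test whether a schema R itself is in BCNF, it is sufficient to inspect the dependencies in F rather than all of F+ (this shortcut does not carry over to the schemas produced by a decomposition). The function name bcnf_violations is hypothetical, and the dependency ID → name, salary, dept_name is an assumption that ID is the key of instructor.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    while True:
        gained = {a for lhs, rhs in fds if set(lhs) <= result for a in rhs} - result
        if not gained:
            return result
        result |= gained

def bcnf_violations(schema, fds):
    """Dependencies in fds that are non-trivial and whose lhs is not a super key."""
    schema = set(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)             # non-trivial
        and not closure(lhs, fds) >= schema     # lhs is not a super key
    ]

inst_dept = ("ID", "name", "salary", "dept_name", "building", "budget")
F = [
    (("dept_name",), ("building", "budget")),
    (("ID",), ("name", "salary", "dept_name")),  # assumption: ID is a key
]
print(bcnf_violations(inst_dept, F))
# [(('dept_name',), ('building', 'budget'))]
```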
Decomposing a Schema into BCNF
Suppose we have a schema R
A non-trivial dependency α → β causes a violation of BCNF, so
we decompose R into:
R1 = (α ∪ β)
R2 = (R − (β − α))
In our example, inst_dept (ID, name, salary, dept_name, building, budget)
with dept_name → building, budget:
α = dept_name
β = building, budget
and inst_dept is replaced by
R1 = (α ∪ β) = (dept_name, building, budget)
R2 = (R − (β − α)) = (ID, name, salary, dept_name)
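One decomposition step is plain set arithmetic. The sketch below shows a single step only, not a complete BCNF decomposition algorithm; the function name bcnf_split is hypothetical.

```python
def bcnf_split(schema, lhs, rhs):
    """Split a schema on a violating dependency lhs -> rhs.

    Returns (R1, R2) with R1 = alpha U beta and R2 = R - (beta - alpha).
    """
    alpha, beta = set(lhs), set(rhs)
    r1 = alpha | beta
    r2 = set(schema) - (beta - alpha)
    return r1, r2

inst_dept = ("ID", "name", "salary", "dept_name", "building", "budget")
r1, r2 = bcnf_split(inst_dept, ("dept_name",), ("building", "budget"))
print(sorted(r1))  # ['budget', 'building', 'dept_name']
print(sorted(r2))  # ['ID', 'dept_name', 'name', 'salary']
```

The two schemas share exactly α, and α → R1 holds, so each step is a lossless decomposition.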
BCNF and Dependency Preservation
Constraints, including functional dependencies, are costly to check in
practice unless they pertain to only one relation
A decomposition is dependency preserving
if it is sufficient to test only the dependencies on each individual
relation of the decomposition in order to ensure that all functional
dependencies hold.
Because it is not always possible to achieve both BCNF and
dependency preservation, we consider a weaker normal form, known
as third normal form.
Third Normal Form
A relation schema R is in third normal form (3NF) if for all
α → β in F+
where α ⊆ R and β ⊆ R, at least one of the following holds:
α → β is trivial (i.e., β ⊆ α)
α is a super key for R
Each attribute A in β − α is contained in a candidate key for R.
(NOTE: each attribute may be in a different candidate key)
If a relation is in BCNF, it is in 3NF
Since in BCNF one of the first two conditions above must hold.
Third condition is the minimal relaxation of BCNF to ensure
dependency preservation.
BCNF and Dependency Preservation
It is not always possible to get a BCNF decomposition that is
dependency preserving.
Relation dept_study_advisor (s_ID, a_ID, dept_name)
F = { s_ID, dept_name → a_ID,
a_ID → dept_name }
Two candidate keys: {s_ID, dept_name} and
{s_ID, a_ID}
dept_study_advisor is not in BCNF
Any decomposition of dept_study_advisor will fail to preserve
s_ID, dept_name → a_ID
This implies that testing for s_ID, dept_name → a_ID
requires a join.
3NF Example
Relation dept_study_advisor:
dept_study_advisor (s_ID, a_ID, dept_name)
F = {s_ID, dept_name → a_ID,
a_ID → dept_name}
Two candidate keys:
s_ID, dept_name,
a_ID, s_ID
dept_study_advisor is in 3NF
s_ID, dept_name → a_ID
s_ID, dept_name is a superkey
a_ID → dept_name
a_ID is not a superkey
dept_name is contained in a candidate key
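The reasoning above can be reproduced with a brute-force sketch: enumerate the candidate keys, then apply the three 3NF conditions to each dependency in F (checking F instead of F+ is sufficient for this test). The enumeration is exponential in the number of attributes and is meant for small schemas only; the function names candidate_keys and is_3nf are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    while True:
        gained = {a for lhs, rhs in fds if set(lhs) <= result for a in rhs} - result
        if not gained:
            return result
        result |= gained

def candidate_keys(schema, fds):
    """All minimal super keys, found by trying subsets in order of size."""
    attrs = sorted(set(schema))
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(subset, fds) >= set(attrs):
                keys.append(subset)
    return keys

def is_3nf(schema, fds):
    prime = set().union(*candidate_keys(schema, fds))  # attributes in some key
    for lhs, rhs in fds:
        alpha, beta = set(lhs), set(rhs)
        if beta <= alpha or closure(alpha, fds) >= set(schema):
            continue  # trivial, or alpha is a super key
        if not (beta - alpha) <= prime:
            return False
    return True

schema = ("s_ID", "a_ID", "dept_name")
F = [(("s_ID", "dept_name"), ("a_ID",)), (("a_ID",), ("dept_name",))]
print(candidate_keys(schema, F))  # the two keys {a_ID, s_ID} and {dept_name, s_ID}
print(is_3nf(schema, F))          # True
```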
Redundancy in 3NF
There is some redundancy in this schema
Example of problems due to redundancy in 3NF
dept_study_advisor (s_ID, a_ID, dept_name)
F = {s_ID, dept_name → a_ID,
a_ID → dept_name}
repetition of information, e.g., the (a_ID, dept_name) relationship (Jane, FI) is stored once per advised student
need to use null values, e.g., to represent the relationship (Karol, ESF), for which there is no corresponding value of s_ID
this is necessary if there is no other separate relation (a_ID, dept_name) mapping instructors to departments

s_ID | a_ID | dept_name
Adam | Jane | FI
Bob | Jane | FI
Joe | Jane | FI
null | Karol | ESF
Second Normal Form
A functional dependency α → β is called a partial dependency
if there is a proper subset γ of α, i.e., γ ⊂ α, such that γ → β.
We say that β is partially dependent on α.
A relation R is in second normal form (2NF) if it is in 1NF and
each attribute A in R meets one of the following:
A appears in a candidate key;
A is not partially dependent on any candidate key.
i.e., A is dependent on a complete candidate key, but it may be
a transitive dependence.
Every relation in 3NF is also in 2NF.
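The 2NF condition can be sketched in the same brute-force style. If some proper subset of a candidate key determines an attribute, then so does a subset obtained by dropping exactly one key attribute, so only those subsets are tried. The function names and the two small schemas are hypothetical, and single-character attribute names are assumed.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    while True:
        gained = {a for lhs, rhs in fds if set(lhs) <= result for a in rhs} - result
        if not gained:
            return result
        result |= gained

def candidate_keys(schema, fds):
    """All minimal super keys, found by trying subsets in order of size."""
    attrs = sorted(set(schema))
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            subset = set(combo)
            if not any(key <= subset for key in keys) and closure(subset, fds) >= set(attrs):
                keys.append(subset)
    return keys

def is_2nf(schema, fds):
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    for attr in set(schema) - prime:          # attributes outside every key
        for key in keys:
            # Partial dependency: the key minus one attribute still determines attr.
            if any(attr in closure(key - {k}, fds) for k in key):
                return False
    return True

# C depends on A alone, which is only part of the key AB: not in 2NF.
print(is_2nf("ABC", [("AB", "C"), ("A", "C")]))  # False
# C depends on the key A transitively via B: in 2NF, although not in 3NF.
print(is_2nf("ABC", [("A", "B"), ("B", "C")]))   # True
```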