MDA104: Tutorial 3 Functional Dependencies Vlastislav Dohnal Functional dependency: Definition ◼ What does functional dependence express? faculty → address ◼ Is this dependency satisfied in the following relations? ◼ If the dependency holds on to the relational schema, then each instance satisfies that dependency. ❑ i.e., the definition of integrity constraint. MDA104, Vlastislav Dohnal, FI MUNI, 2024 faculty address FI Botanická 68a, 602 00 Brno PřF Kotlářská 267/2, 611 37 Brno department faculty address area Geogr PřF Kotlářská 267/2, 611 37 Brno 250 KIT FI Botanická 68a, 602 00 Brno 200 KPGD FI Botanická 68a, 602 00 Brno 180 KTP FI Botanická 68a, 602 00 Brno 210 Creation of func. deps and their validity ◼ Consider the registry of enrolled courses with the following requirements: ❑ The student has a name and his or her own unique ID; ❑ The course has a title, number of credits obtained when passed, and its own unique code; ❑ The student can enroll in more courses; ❑ More students can enroll in one course; ❑ The student has a certain completion selected for the course he or she has signed up for. ◼ Task: Formulate functional dependencies. MDA104, Vlastislav Dohnal, FI MUNI, 2024 Creation of func. deps and their validity (solution) ◼ Functional dependencies: ❑ student_id → first_name, last_name ❑ course_code → title, credits ❑ student_id, course_code → completion_type ◼ Commentary: ❑ There is many-to-many relationship between students and courses. ❑ student_id → course_code ◼ It models “one course can have multiple students”, but a student cannot enroll into multiple courses. ❑ course_code → student_id ◼ vice versa ( “one students can enroll more courses, but just one student can join a course!) MDA104, Vlastislav Dohnal, FI MUNI, 2024 Creation of func. deps and … (solution) (cont.) ◼ The issue is how to handle many-to-many relationship using functional dependencies: ❑ student_id, course_code → completion_type ❑ If there is no “completion_type”, then we cannot have the left-hand side only, ◼ i.e. “student_id, course_code → Ø” ❑ Instead, we may not write any dep. ◼ and the process of decomposition will leave “student_id, course_code” in a table. MDA104, Vlastislav Dohnal, FI MUNI, 2024 Inference of additional dependencies ◼ Armstrong axioms ❑ Reflexivity: ❑ Augmentation: ❑ Transitivity: ◼ Consider the following dependencies F = { student_id → first_name student_id → last_name student_id → avg_grade course_code → title, credits student_id, course_code → completion_type study_program, avg_grade → scholarship_amount } ◼ Task: Prove that the following dependencies are satisfied: ❑ student_id → first_name, last_name ❑ course_code → title ❑ student_id, study_program → scholarship_amount MDA104, Vlastislav Dohnal, FI MUNI, 2024 Inference of add…: Solution ◼ Solution: ❑ student_id → first_name, last_name ◼ augmentation of student_id → first_name with last_name: student_id, last_name → first_name, last_name ◼ augmentation of student_id → last_name with student_id: student_id → last_name, student_id ◼ transitivity of both these gets the needed. ❑ course_code → title ◼ reflexivity: title, credits → title ◼ transitivity: course_code → title, credits and title, credits → title gets the needed. ❑ student_id, study_program → scholarship_amount ◼ augmentation of student_id → avg_grade with study_program: student_id, study_program → avg_grade, study_program ◼ transitivity of the augmented f.d. and study_program, avg_grade → scholarship_amount gets the needed. MDA104, Vlastislav Dohnal, FI MUNI, 2024 Schema keys ◼ Revision: ❑ K is the super-key for the relation schema R, if and only if K → R ◼ K is the candidate key R if and only if it holds ❑ K → R, and at the same time ❑ for none α ⊂ K, the functional dep. α → R holds ◼ Consider the following relation schema and functional dependencies: r(student_id, first_name, last_name, course_code, title, credits, completion_type) F = { student_id → first_name, last_name course_code → title, credits student_id, course_code → completion_type } ❑ Task: Resolve the primary key. MDA104, Vlastislav Dohnal, FI MUNI, 2024 Revision: Attribute set closure algorithm ◼ For a set of attributes , we define the closure of it, denoted as +, as a set of attributes that are functionally dependent according to the set of functional dependencies F. ◼ Algorithm to get +: result := ; previousResult := null; while (result != previousResult ) do { previousResult := result; for each  →  in F do { if   result then result := result   ; } } ◼ Use this algorithm to verify the choice of keys from the previous task. MDA104, Vlastislav Dohnal, FI MUNI, 2024 Schema keys: Solution ◼ Solution to “Resolve the primary key”. ❑ We need to identify all candidate keys, so then we may pick one as the primary key. ◼ Candidate key is the minimal super key, so we start with individual attributes. ❑ No individual attribute “implies” R. ◼ So, a pair of attributes… Possible pairs can be formed by the attributes on the left-hand sides of F.D.s. -> There is only one: ❑ {student_id, course_code}+ = R, so this holds: ❑ student_id, course_code → student_id, first_name, last_name, course_code, title, credits, completion_type) MDA104, Vlastislav Dohnal, FI MUNI, 2024 First normal form (1NF) ◼ Definition: ❑ The relation schema R is in first normal form if the domains of all attributes are atomic. ◼ Task: Are the following relations in 1NF? ❑ If not, then convert them to 1NF. MDA104, Vlastislav Dohnal, FI MUNI, 2024 course_code title credits FI:PB154 Základy databázových systémů 3 PřF:Bi8150 Evoluční biologie 3 address course employee address phone_numbers Zdeněk Plotní 30, Brno {549 49 1810, 736 09 1809} Petr Horní 12, Brno {727 49 1911, 549 49 1911} First normal form (1NF) ◼ Solution: ❑ Attributes may not contain more than “one” value. ◼ Address(employee, street, house_no, city) ❑ address decomposed ◼ Phone(employee, phone_number) ❑ phone_numbers (originally an array) took into a new table (one row per phone number) ◼ Course(faculty_code, cource_code, title, credits) ❑ the composed value of course_code decomposed and faculty code extracted into a new attribute. MDA104, Vlastislav Dohnal, FI MUNI, 2024 Second normal form (2NF) ◼ Definition: A relational schema R is in 2NF if it is in 1NF, and for each attribute A of R, any of the following statements are true: ❑ A is part of a candidate key (CK) (so trivially CK → A), or ❑ A is not partially dependent on any candidate key. ◼  →  is partial dependency, if   ⊂  such that  →  ◼ Task: Decide if the relation thesis is in 2NF This relation records data necessary for the application for the defense of the final thesis. A student can study more than one program, then he or she has their own final thesis for each study. MDA104, Vlastislav Dohnal, FI MUNI, 2024 thesis F = { student_id, program → spec, supervisor, thesis_title spec → guarantor, program supervisor → supervisor_dept student_id → first_name, last_name } student_id first_name last_name program spec thesis_title supervisor supervisor_dept guarantor 12345 Jan Novák Inf. Mat. inf. Teorie sčítání Brázdil KTP Hliněný 67890 Lenka Šťastná Apl. inf. Bioinf. Život brouka Lexa KTP Brim Second normal form (2NF) ◼ Solution: Decompose into 2NF ❑ State candidate keys: A. student_id, program B. student_id, spec ❑ first_name, last_name depend on a part of a candidate key, so break 2NF ◼ since student_id → first_name, last_name ❑ guarantor depends on a part of a candidate key, so breaks 2NF. ◼ since spec → guarantor, program ❑ supervisor_dept is OK ◼ since supervisor → supervisor_dept and supervisor depends on a whole candidate key. ◼ by analogy thesis_title ❑ the others are part of a candidate key ◼ Resulting relations: ❑ student(student_id, first_name, last_name) ❑ specialization(spec, program, guarantor) ❑ thesis(student_id, program, spec, thesis_title, supervisor, supervisor_dept) MDA104, Vlastislav Dohnal, FI MUNI, 2024 Third normal form (3NF) ◼ Definition: The relation schema R is in 3NF, if it is in 2NF and for each A in R, any of the statements hold: ❑ A is part of a candidate key, or ❑ A is not transitively dependent on any candidate key. ◼ A is transitively dependent on , if  → ,  → A, and  → , where A is not part of  either . MDA104, Vlastislav Dohnal, FI MUNI, 2024 Third normal form (3NF) ◼ Task: Decide if the following relations are in 3NF MDA104, Vlastislav Dohnal, FI MUNI, 2024 thesis student_id first_name last_name 12345 Jan Novák 67890 Lenka Šťastná student F2 = { student_id → first_name, last_name } student_id program spec thesis_title supervisor supervisor_de pt 12345 Inf. Mat. inf. Teorie sčítání Brázdil KTP 67890 Apl. Inf. Bioinf. Život brouka Lexa KTP F1 = { student_id, program → spec, supervisor, thesis_title spec → program supervisor → supervisor_dept } spec program guarantor Mat. inf. Inf. Hliněný Bioinf. Apl. Inf. Brim specialization F3 = { spec → program, guarantor } Third normal form (3NF) ◼ Solution: Decide if the following relations are in 3NF ◼ student(student_id, first_name, last_name) ❑ trivially in 3NF, since student_id → first_name, last_name ◼ specialization(spec, program, guarantor) ❑ trivially in 3NF, since spec → program, guarantor ◼ thesis(student_id, program, spec, thesis_title, supervisor, supervisor_dept) ❑ not in 3NF because of supervisor_dept, so decompose into: ❑ thesis(student_id, program, spec, thesis_title, supervisor) ❑ supervisor(supervisor, supervisor_dept) MDA104, Vlastislav Dohnal, FI MUNI, 2024 * final relations are in bold Boyce-Codd normal form (BCNF) ◼ Definition: The relation schema R is in BCNF w.r.t. F, if it is in 1NF and for each f. dep.  →  in F+ the following holds: ❑  →  is a trivial dependency (i.e.,  ⊆ ), or ❑  is a super-key in R. Usage: If we verify that a relation is in BCNF, it is sufficient to verify the listed properties only for  →  in F. After the decomposition, it is necessary to consider F+ to verify the resulting relations. ◼ Task: Decide whether the following relation is in BCNF MDA104, Vlastislav Dohnal, FI MUNI, 2024 thesis student_id program spec thesis_title supervisor 12345 Inf. Mat. inf. Teorie sčítání Brázdil 67890 Apl. Inf. Bioinf. Život brouka Lexa F = { student_id, program → spec, supervisor, thesis_title spec → program } Decomposition to BCNF ◼ Algorithm: Let be R not in BCNF. Then there is at least one non-trivial functional dependence  →  such that  is not a super-key of R. Relation R gets replaced by: ❑ R1 = ( ∪ ) ❑ R2 = (R – ( – )) Caution: After executing the algorithm, it is necessary to check whether R1, R2 already meet BCNF – see the previous procedure, this time it is necessary to verify for all dependencies in F+! ◼ Task: Convert the following relation to BCNF. Does the result preserve all functional dependencies? MDA104, Vlastislav Dohnal, FI MUNI, 2024 thesis student_id program spec thesis_title supervisor 12345 Inf. Mat. inf. Teorie sčítání Brázdil 67890 Apl. Inf. Bioinf. Život brouka Lexa F = { student_id, program → spec, supervisor, thesis_title spec → program } Decomposition to BCNF ◼ Solution: Convert the following relation to BCNF. ❑ Check each f.d. ❑ student_id, program → spec, supervisor, thesis_title ◼ ok, since student_id, program is a candidate key. ❑ spec → program ◼ break BNCF, since spec is not a super-key. ◼ Extract program for the relation and create a new table. ◼ thesis(student_id, spec, thesis_title, supervisor) ❑ F={student_id, spec → thesis_title, supervisor} ◼ spec_program(spec, program) (Ensures no redundancy in storing spec and program pair, as was in original thesis.) ❑ F={spec → program} ◼ Solution: Does the result preserve all functional dependencies? ❑ No, student_id, program → spec cannot be verify unless join of thesis and spec_program is done. MDA104, Vlastislav Dohnal, FI MUNI, 2024 Final result in BCNF ◼ student(student_id, first_name, last_name) ❑ F={student_id → first_name, last_name} ◼ supervisor(supervisor, supervisor_dept) ❑ F={supervisor → supervisor_dept} ◼ thesis(student_id, spec, thesis_title, supervisor) ❑ F={student_id, spec → thesis_title, supervisor} ◼ specialization(spec, program, guarantor) (It removes redundancy by factoring out dependencies related to spec.) ❑ F={spec → program, guarantor) ◼ spec_program(spec, program) (Ensures no redundancy in storing spec and program pair, as was in original thesis.) ❑ F={spec → program} ◼ Mind student_id, program → spec is not preserved! (Attributes are not part of one relation.) MDA104, Vlastislav Dohnal, FI MUNI, 2024 Example (voluntary) ◼ Design a database in BCNF/3NF according to the following requirements: ❑ We keep records of books and their physical copies in the library. ❑ Each book has its own unique ISBN, title, one or more authors (the order of authors matters), publication number and category. ❑ We keep records of the first and last names of the authors of the books. ❑ Each book can be present in the library in the form of several copies. For each copy, the library keeps track of its condition ("new", "damaged", "under repair", ...). ❑ Books in the "new" category can be borrowed for a maximum of a week, books in the "archival materials" category are not borrowed, others can be borrowed for a month. ◼ i.e., there are special categories of books that have a loan period limited to a specific period. MDA104, Vlastislav Dohnal, FI MUNI, 2024