PA152: Efficient Use of DB
4. Indexing
Vlastislav Dohnal
PA152, Vlastislav Dohnal, FI MUNI, 2024 2
Indexing
◼ Reason: faster access to records
than sequential (table) scan
◼ Variants:
Conventional indexes
B-tree
Hashing
? recordkey value
file
key value
……
PA152, Vlastislav Dohnal, FI MUNI, 2024 3
Terminology
◼ Sequential file
 Index-sequential file
◼ Index
 Primary index
 Secondary index
 Dense index
 Sparse index
 Multilevel index
◼ Search key
 Primary key
PA152, Vlastislav Dohnal, FI MUNI, 2024 4
File
Sequential file
20
10
40
30
60
50
80
70
100
90
Index-sequential file
20
10
40
30
60
50
80
70
100
90
10
30
50
70
90
PA152, Vlastislav Dohnal, FI MUNI, 2024 5
Index
◼ Collection of items:
<key value, pointer to record/block>
20
10
40
30
60
50
80
70
Sparse index
10
30
50
70
Dense index
10
20
30
40
50
60
70
80
PA152, Vlastislav Dohnal, FI MUNI, 2024 9
Problem of Duplicate Keys
◼ Index type
dense index?
sparse index?
10
10
20
10
30
20
30
30
45
40
File
PA152, Vlastislav Dohnal, FI MUNI, 2024 27
Secondary Index
◼ File ordered by another key
i.e., index created for different key than the
primary file
Or the file is not ordered at all
◼ Which type:
Dense or sparse?
PA152, Vlastislav Dohnal, FI MUNI, 2024 28
Secondary Index
◼ Must use pointers to records!
Search key
50
30
70
20
40
80
10
100
60
90
Dense index
10
20
30
40
50
60
70
...
10
50
90
...
Sparse
for higher levels
PA152, Vlastislav Dohnal, FI MUNI, 2024 29
Secondary Index: Duplicate Keys
◼ Replicated in index
Increases
◼ space
requirements
◼ access time
10
20
40
20
40
10
40
10
40
30
10
10
10
20
20
30
40
40
40
40
...
PA152, Vlastislav Dohnal, FI MUNI, 2024 30
Secondary Index: Duplicate Keys
◼ Index item contains list of pointers
But the item is of variable length
10
20
40
20
40
10
40
10
40
30
10
40
30
20
PA152, Vlastislav Dohnal, FI MUNI, 2024 31
Secondary Index: Duplicate Keys
◼ Shift the variable-length list to “buckets”
10
20
40
20
40
10
40
10
40
30
10
20
30
40
50
60
...
buckets
index blocks
file blocks
PA152, Vlastislav Dohnal, FI MUNI, 2024 32
Secondary Index: Buckets with Pointers
◼ Advantage: a list of records for querying
Evaluate more selection constrains without
accessing records
◼ Example:
Relation
◼ employee(name, department, room)
Indexes:
◼ name – primary index
◼ department – secondary index
◼ room – secondary index
PA152, Vlastislav Dohnal, FI MUNI, 2024 33
Secondary Index: Duplicate Keys
◼ Query employees of Toys dept. in room G243
◼ Intersect buckets
 To get pointers to matching employee records
 Also used in text information retrieval
Toys G243
File
(employee)Index (department) Index (room)
Example: Text Information Retrieval
◼ “Full-text” index for documents
◼ Split documents into words
◼ Build an inverted file
over all documents
i.e., a file of records <word; [docId, docId2, …]>
PA152, Vlastislav Dohnal, FI MUNI, 2024 34
Document 1 Word List for Doc1
Caesar and
Brutus are
ambitious.
•Caesar
•Brutus
•ambitious
1. Split to words
2. Stemming
3. Ignore words in stop-list
Example: Text Information Retrieval
◼ Inverted file
◼ Retrieve docs containing Brutus & Caesar
Read posting lists for Brutus and Caesar
Intersect them
PA152, Vlastislav Dohnal, FI MUNI, 2024 35
Term docID
ambitious 1
brutus 1
brutus 3
capitol 2
caesar 1
caesar 2
Term Posting list of docIDs
ambitious 1
brutus 1, 3
capitol 1
caesar 1, 2
Relational
view
Inverted file
PA152, Vlastislav Dohnal, FI MUNI, 2024 37
B-trees
◼ Another index type
 Sequential order not necessary
 Balanced – max I/Os guarantee
◼ More variants
 B-tree, B+-tree, B*-tree, …
◼ Typically, by saying “B-tree” we mean “B+-tree”!
◼ Origin
 Rudolf Bayer and Ed McCreight invented the B-tree while
working at Boeing Research Labs in 1971 (Bayer & McCreight
1972)
◼ They did not explain what, if anything, the B stands for.
◼ Douglas Comer explains:
 The origin of "B-tree" has never been explained by the authors. As we
shall see, "balanced," "broad," or "bushy" might apply. Others suggest
that the "B" stands for Boeing. Because of his contributions, however, it
seems appropriate to think of B-trees as "Bayer"-trees.
* Source: Wikipedia
PA152, Vlastislav Dohnal, FI MUNI, 2024 38
B+-tree
◼ Example n=4
100
120
150
180
30
3
5
11
30
35
100
101
110
120
130
150
156
179
180
200
… pointers to record in file …
PA152, Vlastislav Dohnal, FI MUNI, 2024 65
B+-tree
◼ B+-tree as file
Leaves store the records themselves.
◼ Duplicate keys
Pointers in leaves = pointers to buckets
◼ i.e., blocks with a list of record pointers with the same
key value
◼ Variable-length key values (e.g., strings)
Store completely → low arity, varying arity, …
Use prefixes (prefix compression)
Lecture’s Takeaways
◼ Principle of indexing
Use of record pointers and their utilization
Handling duplicate keys
◼ Efficiency of B+ trees
also, with respect to query types
◼ Revision of terminology
Dense / sparse index
Primary / secondary index
◼ Clustered / non-clustered index
Covering index
PA152, Vlastislav Dohnal, FI MUNI, 2024 66