PUBLIC Radim Benek, SAP Labs Czech Republic 2022 Introduction to In-memory Column-based Databases 2PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ SAP Labs Czech Republic 3PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ SAP Labs Czech Republic 4PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Introduction ▪ Changes in Hardware ▪ Data Layout ▪ Dictionary Encoding ▪ Compression ▪ Delete, Insert, Update ▪ Tuple Reconstruction ▪ Scan Performance ▪ Demo Agenda Introduction 7PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Achievements of column store in-memory database: ▪ 150 sensors ▪ 2GB of data in one lap ▪ 3TB in a single race ▪ „SAP HANA enables McLaren existing systems to process these data 14 000 times faster then before.“ ▪ „Analysis that previously took almost a week to process, can be completed in a span of a pit stop.“ Introduction Changes in Hardware 9PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Multi-core CPU introduction (32 cores/CPU) ▪ Multi-CPU boards massively used (8 CPUs/board) ▪ CPU cache grows ▪ RAM capacity grows ▪ RAM speed grows ▪ New interfaces (QPI, HT) → Enormous bandwidth and performance potential HDDs still dominated overall performance and design mindset Changes in Hardware Evolution Data Layout 11PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Data Layout Database Data Layouts ▪ What are the most common layouts of relational data in main memory? – For each layout we present the pros and cons of their approach 12PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Data Layout Row Data Layouts ▪ Data is stored tuple-wise ▪ Leverage co-location of attributes for a single tuple ▪ Low cost for reconstruction, but higher cost for sequential scan of a single attribute 13PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Data Layout Columnar Data Layouts ▪ Data is stored attribute-wise ▪ Leverage sequential scan-speed in main memory ▪ Tuple reconstruction is expensive Dictionary Encoding 15PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ 8 billion humans ▪ Attributes: – first name – last name – gender – country – city – birthday → 200 byte per tuple ▪ Each attribute is dictionary encoded Dictionary Encoding Example 16PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Main memory access is the new bottleneck ▪ Compression reduces number of I/O operations to main memory ▪ Operation directly on compressed data ▪ Offsetting with bit-encoded fixed-length data types ▪ Based on limited value domain Dictionary Encoding Motivation 17PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Dictionary Encoding Sample Data Table: world_population 18PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ A column is split into a dictionary and an attribute vector ▪ Dictionary stores all distinct values with implicit valueID ▪ Attribute vector stores valueIDs for all entries in the column ▪ Position is stored implicitly ▪ Enables offsetting with bit-encoded fixed-length data types Dictionary Encoding Dictionary Encoding a Column 19PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Search for Attribute Value (i.e. retrieve all persons with fname “Mary”) 1. Search valueIDs for requested value (“Mary”) 2. Scan Attribute Vector for valueID (“24”) 3. Replace valueIDs in result with corresponding dictionary value Dictionary Encoding Querying Data using Dictionaries 20PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Dictionary entries are sorted either by their numeric value or lexicographically – Dictionary lookup complexity: O(log(n)) instead of O(n) ▪ Dictionary entries can be compressed to reduce the amount of required storage Dictionary Encoding Sorted Dictionary 21PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Dictionary Encoding Data Size Examples Column Cardi-nality Bits Needed Item Size Plain Size Size with Dictionary (Dictionary + Column) Compression Factor First names 5 millions 23 bit 50 Byte 400GB 250MB + 23GB ≈ 17 Last names 8 millions 23 bit 50 Byte 400GB 400MB + 23GB ≈ 17 Gender 2 1 bit 1 Byte 8GB 2b + 1GB ≈ 8 City 1 million 20 bit 50 Byte 400GB 50MB + 20GB ≈ 20 Country 200 8 bit 47 Byte 376GB 9.4kB + 8GB ≈47 Birthday 40000 16 bit 2 Byte 16GB 80kB + 16GB ≈ 1 Totals 200 Byte ≈ 1.6TB ≈ 92GB ≈ 17 Compression 23PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Heavy weight vs. light weight techniques ▪ Focus on light weight techniques for databases ▪ For attribute vector – Prefix encoding – Run length encoding – Cluster encoding – Sparse encoding – Indirect encoding ▪ For dictionary – Delta compression for strings – Other data types are stored as sorted arrays Compression Compression Techniques 24PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ 200 countries = 8 bit ▪ 1 million cities = 20 bit ▪ 100 different 2nd nationalities = 7 bit ▪ 5 million first names = 23 bit Compression Example Table 25PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Used if the column starts with a long sequence of the same value ▪ One predominant value in a column and the remaining values are mostly unique or have low redundancy Example: country column, table sorted by population of country Compression Prefix Encoding 26PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Replace sequence of the same value with a single instance of the value and a. Its number of occurrences b. Its start position (shown below) ▪ Variant b) speeds up access compared to a) Compression Run Length Encoding Direct access! 27PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Attribute vector is partitioned into N blocks of fixed size (typically 1024) ▪ If a cluster contains only a single value, it is replaced by a single occurrence of this value ▪ A bit vector of length N indicates which clusters were replaced by a single value ▪ Example: city column, table sorted by country, city – Cluster size: 1024 elements →7.8 mio blocks – Worst case assumption: 1 uncompressible block per city – Uncompressible blocks: 1 mio × 1024 × 20 bit 2441 MB – Compressible blocks: (7.8 - 1) mio × 20 bit + 16 MB – Bit vector: 7.8 million × 1 bit + 1 MB ≈2.4 GB Compression Cluster Encoding No direct access! Compute position via bit vector. 28PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Remove the value v that appears most often ▪ A bit vector indicates at which positions v was removed from the original sequence Example: 2nd nationality column, regardless of sorting order of table Compression Sparse Encoding 29PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Sequence is partitioned into N blocks of size S (typically 1024) ▪ If a block contains only a few distinct values an additional dictionary is used to encode the values in that block ▪ Additionally: links to the new dictionaries + blocks that have a dictionary Example: fname column, table sorted by country Direct access! Compression Indirect Encoding 30PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ For sorted string values ▪ Block-‐wise compression (typically 16 strings per block) Compression Delta Encoding for Dictionary Dictionary: 1 million cities à 49 byte ≈ 46.7 MB 31PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Most compression techniques require sorted sets, but a table can only be sorted by one column or cascading ▪ No direct access to rows in some cases, but offset has to be computed Compression Keep in Mind Tuple Reconstruction 33PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Tuple Reconstruction Accessing a record in a row store ▪ All attributes are stored consecutively ▪ 200 byte → 4 cache accesses à 64 byte → 256 byte ▪ Read with 4MB/ms/core ▪ → ≈ 0.064 μs with 1 core Data loaded and used Data loaded but not used 34PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Tuple Reconstruction Virtual record IDs ▪ All attributes are stored in separate columns ▪ Implicit record Ids are used to reconstruct rows 35PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Tuple Reconstruction Virtual record IDs ▪ 1 cache access for each attribute ▪ 6 cache accesses à 64 byte → 384 byte ▪ Read with 4MB/ms/core ▪ → ≈ 0.096 μs with 1 core Data loaded and used Data loaded but not used Scan Performance 37PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Scan Performance ▪ 8 billion humans ▪ Attributes: – first name – last name – gender – country – city – birthday → 200 byte per tuple ▪ Question: How many women, how many men? ▪ Assumed scan speed: 4MB/ms/core 38PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Scan Performance Row Store – Layout 39PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Scan Performance Row Store – Layout ▪ Table size: 8 billion tuples × 200 bytes per tuple ≈ 1.6 TB 40PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Scan Performance ▪ Table size: 8 billion tuples × 200 bytes per tuple ≈ 1.6 TB ▪ Scan through all rows with 4MB/ms/core → 400 s with 1 core Row Store – Full Table Scan Data loaded and used Data loaded but not used 41PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Scan Performance Row Store – Stride Access „Gender“ Data loaded and used Data loaded but not used ▪ 8 billion cache accesses à 64 byte ≈ 512 GB ▪ Read with 4MB/ms/core → 128 s with 1 core 42PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Scan Performance Column Store – Full Column Scan „Gender“ Data loaded and used Data loaded but not used ▪ Size of attribute vector “Gender”: 8 billion tuples × 1 bit per tuple ≈ 1 GB ▪ Scan through column with 4MB/ms/core →0.25 s with 1 core 43PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Scan Performance How many women, how many men? Row Store Column Store Full table scan Stride access Time in seconds 400 128 0.25 Delete, Insert, Update DELETE 46PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Physical DELETE – Removed tuple is removed from database and you cannot access it anymore ▪ Logical DELETE – Validity of this tuple is set to non-valid and this tuple can be accessed in historic queries or reporting ▪ Operation DELETE is very expensive to perform ▪ SQL-Syntax: DELETE FROM table_name WHERE attribute_name = some_value Database Operations DELETE 47PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Remove Jane Doe from the database table Database Operations DELETE - example 48PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Remove Jane Doe from the database table Database Operations DELETE - example 49PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Remove Jane Doe from the database table Database Operations DELETE - example 50PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Remove Jane Doe from the database table Database Operations DELETE - example INSERT 52PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ INSERT without new dictionary entry – New entry is already in dictionary, new valueID is appended to the attribute vector ▪ INSERT with new dictionary entry – New entry is added to the dictionary, dictionary is sorted, valueIDs are updated in attribute vector, new valueID is appended to the attribute vector ▪ SQL-Syntax: INSERT INTO table_name VALUES (value1,value2) Database Operations INSERT 53PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 06-20-2014) Database Operations INSERT – example (Without New Dictionary Entry) AV – Attribute Vector D – Dictionary 54PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 06-20-2014) Database Operations INSERT – example (Without New Dictionary Entry) 1. Look-up on dictionary → entry found AV – Attribute Vector D – Dictionary 55PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 06-20-2014) Database Operations INSERT – example (Without New Dictionary Entry) 1. Look-up on dictionary → entry found 2. Append valueID to attribute vector AV – Attribute Vector D – Dictionary 56PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 06-20-2014) Database Operations INSERT – example (With New Dictionary Entry) AV – Attribute Vector D – Dictionary 57PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 06-20-2014) Database Operations INSERT – example (With New Dictionary Entry) 1. Look-up on dictionary → no entry found AV – Attribute Vector D – Dictionary 58PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 06-20-2014) Database Operations INSERT – example (With New Dictionary Entry) 1. Look-up on dictionary → no entry found 2. Append new value to dictionary AV – Attribute Vector D – Dictionary 59PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 06-20-2014) Database Operations INSERT – example (With New Dictionary Entry) 1. Look-up on dictionary → no entry found 2. Append new value to dictionary 3. Append valueID to attribute vector AV – Attribute Vector D – Dictionary 60PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 06-20-2014) Database Operations INSERT – example (With New Dictionary Entry) AV – Attribute Vector D – Dictionary 61PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 06-20-2014) Database Operations INSERT – example (With New Dictionary Entry) 1. Look-up on dictionary → no entry found AV – Attribute Vector D – Dictionary 62PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 06-20-2014) Database Operations INSERT – example (With New Dictionary Entry) 1. Look-up on dictionary → no entry found 2. Append new value to dictionary AV – Attribute Vector D – Dictionary 63PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 06-20-2014) Database Operations INSERT – example (With New Dictionary Entry) 1. Look-up on dictionary → no entry found 2. Append new value to dictionary 3. Sort Dictionary AV – Attribute Vector D – Dictionary 64PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 06-20-2014) Database Operations INSERT – example (With New Dictionary Entry) 1. Look-up on dictionary → no entry found 2. Append new value to dictionary 3. Sort Dictionary 4. Change valueIDs in attribute vector AV – Attribute Vector D – Dictionary 65PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 06-20-2014) Database Operations INSERT – example (With New Dictionary Entry) 1. Look-up on dictionary → no entry found 2. Append new value to dictionary 3. Sort Dictionary 4. Change valueIDs in attribute vector 5. Append new valueID to attribute vector AV – Attribute Vector D – Dictionary UPDATE 67PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Combination of DELETE and INSERT operation ▪ SQL-Syntax: UPDATE world_population SET city = „Bamberg“ WHERE fname = „Hanna“ AND lname = „Schulze“ Database Operations UPDATE 68PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ UPDATE world_population SET city = „Bamberg“ WHERE lname = „Schulze“ Database Operations UPDATE – example 1. Look-up „Bamberg“ in dictionary → entry not found 69PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ UPDATE world_population SET city = „Bamberg“ WHERE lname = „Schulze“ Database Operations UPDATE – example 1. Look-up „Bamberg“ in dictionary → entry not found 2. Append new value „Bamberg“ to dictionary 70PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ UPDATE world_population SET city = „Bamberg“ WHERE lname = „Schulze“ Database Operations UPDATE – example 1. Look-up „Bamberg“ in dictionary → entry not found 2. Append new value „Bamberg“ to dictionary 3. Reorganize dictionary 71PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ UPDATE world_population SET city = „Bamberg“ WHERE lname = „Schulze“ Database Operations UPDATE – example 1. Look-up „Bamberg“ in dictionary → entry not found 2. Append new value „Bamberg“ to dictionary 3. Reorganize dictionary 4. Replace old values with new values in attribute vector (expensive) Demo 73PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ System QM0 – 48 TB / 1100 CPUs ▪ System HANA Express edition (VM) – 16 GB / 4 CPUs Demo Performance Table Store Rows Size Time ACDOCA_C Column 110 million 5 GB Table Store Rows Size Time 74PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ System QM0 – 48 TB / 1100 CPUs ▪ System HANA Express edition (VM) – 16 GB / 4 CPUs Demo Performance Table Store Rows Size Time ACDOCA_C Column 110 million 5 GB 1,8 s Table Store Rows Size Time 75PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ System QM0 – 48 TB / 1100 CPUs ▪ System HANA Express edition (VM) – 16 GB / 4 CPUs Demo Performance Table Store Rows Size Time ACDOCA_C Column 110 million 5 GB 1,8 s ACDOCA_R Row 110 million 240 GB Table Store Rows Size Time 76PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ System QM0 – 48 TB / 1100 CPUs ▪ System HANA Express edition (VM) – 16 GB / 4 CPUs Demo Performance Table Store Rows Size Time ACDOCA_C Column 110 million 5 GB 1,8 s ACDOCA_R Row 110 million 240 GB 22,5 s Table Store Rows Size Time 77PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ System QM0 – 48 TB / 1100 CPUs ▪ System HANA Express edition (VM) – 16 GB / 4 CPUs Demo Performance Table Store Rows Size Time ACDOCA_C Column 110 million 5 GB 1,8 s ACDOCA_R Row 110 million 240 GB 22,5 s ACDOCA Column 19,5 billion 1,3 TB Table Store Rows Size Time 78PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ System QM0 – 48 TB / 1100 CPUs ▪ System HANA Express edition (VM) – 16 GB / 4 CPUs Demo Performance Table Store Rows Size Time ACDOCA_C Column 110 million 5 GB 1,8 s ACDOCA_R Row 110 million 240 GB 22,5 s ACDOCA Column 19,5 billion 1,3 TB 139 s Table Store Rows Size Time 79PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ System QM0 – 48 TB / 1100 CPUs ▪ System HANA Express edition (VM) – 16 GB / 4 CPUs Demo Performance Table Store Rows Size Time ACDOCA_C Column 110 million 5 GB 1,8 s ACDOCA_R Row 110 million 240 GB 22,5 s ACDOCA Column 19,5 billion 1,3 TB 139 s ACDOCA_sm Column 5 million 140 MB 0,5 s CDHR Column 31 million 1,3 GB 12,4 s CDPOS Column 730 million 44 GB Table Store Rows Size Time ACDOCA_sm Column 5 million 140 MB 0,9 s 80PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Demo Columns compression 81PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ SAP HANA, express edition is a database and application development platform. You can run it for free (up to 32GB of RAM) on your laptop and start building new apps. SAP HANA, Express Edition 82PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ SAP HANA Cloud trial is a trial version of HANA DB. You can run it for free with following resources: 32GB of RAM, 120GB Storage, 2vCPU. SAP HANA Cloud 83PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ SAP HANA Cloud Resources 85PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ ▪ Plattner, Hasso. "In-Memory Data Management 2015" OpenHPI. Hasso-Plattner-Institute, 07 Sept. 2015. Web. 13 July 2017. https://open.hpi.de/courses/imdb2015 ▪ SAP HANA Cloud https://developers.sap.com/topics/hana.html ▪ SAP HANA trial: https://www.sap.com/products/hana/express-trial.html ▪ SAP HANA Academy Videos: https://www.youtube.com/user/saphanaacademy ▪ SAP Help Portal - SAP HANA Platform: https://help.sap.com/viewer/product/SAP_HANA_PLATFORM/ Resources 86PUBLIC© 2021 SAP SE or an SAP affiliate company. All rights reserved. ǀ Appendix Radim Benek Development Expert, AIS Financials Brno, SAP Labs Czech Republic SAP CR, spol. s r.o. Holandská 2/4 639 00 Brno radim.benek@sap.com linkedin.com/in/radimbenek/ Thank you.