Column-family Stores, Cassandra
Lecture 7 of NoSQL Databases (PA195)
David Novak & Vlastislav Dohnal
Faculty of Informatics, Masaryk University, Brno
NoSQL Databases and Data Types
1. Key-value stores:
○ Can store any (text or binary) data
■ often, if using JSON data, additional functionality is available
2. Document databases
○ Structured text data - Hierarchical tree data structures
■ typically JSON, XML
3. Column-family stores
○ Rows that have many columns associated with a row key
■ can be written as JSON
2
Agenda
● Data Model
○ Column families, super columns, two points of view
● Column-family Stores
○ Google BigTable, Cassandra, HBase
● Apache Cassandra in Detail
○ Data model 1.0 vs. 2.0
○ Cassandra Query Language (CQL)
○ Data partitioning, replication
○ Local Data Persistence
○ Query processing, Indexes, Lightweight Transactions
3
Column-family Stores: Basics
● AKA: wide-column, columnar
○ not to confuse with column-oriented RDBMS
● Data model: rows that have many columns
associated with a row key
● Column families are groups of related data
(columns) that are often accessed together
○ e.g., for a customer we typically access all profile
information at the same time, but not customer’s orders
4
Data Model: Column
● Column = the basic data item
○ a 3-tuple consisting of
■ column name
■ value
■ timestamp
○ Can be modeled as follows
{ name: "firstName",
value: "Martin",
timestamp: 12345667890 }
● In the following, we will ignore the timestamp 5
Data Model: Row
● Row: a collection of columns attached to a row key
○ Columns can be added to any row at any time
■ without having to add it to other rows
// row
"martin-fowler" : {
firstName: "Martin",
lastName: "Fowler",
location: "Boston"
}
http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/ 6
Data Model: Column Family
● CF = a set of columns containing “related” data
user_id (row key)
column key column key ...
column value column value ...
1
login first_name ...
honza Jan ...
4
login age ...
david 35 ...
5
first_name last_name ...
Irena Holubová ...
...
I. Holubová, J. Kosek, K. Minařík, D. Novák. Big Data a NoSQL databáze. Praha: Grada Publishing, 2015. 288 p. 7
Data Model: Column Family (2)
● Column family - example as JSON
// row (cols from the same CF)
"martin-fowler" : {
firstName: "Martin",
lastName: "Fowler",
location: "Boston",
active: "true" }
}
{
// row (columns from a CF)
"pramod-sadalage" : {
firstName: "Pramod",
lastName: "Sadalage",
lastVisit: "2012/12/12"
},
source: Sadalage & Fowler: NoSQL Distilled, 2012 8
Data Model: Super Column Family
● Super column
○ A column whose value is composed of a map of columns
○ Used in some column-family stores (Cassandra 1.0)
● Super column family
○ A column family consisting of super columns
http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/ 9
Super Column Family: Addresses
user_id
(row key)
super column key super column key ...
subcolumn key subcolumn key ... subcolumn key subcolumn key ... ...
subcolumn value subcolumn value ... subcolumn value subcolumn value ...
1
home_address work_address
city street ... city street ...
Brno Krásná 5 ... Praha Pracovní 13 ...
4
home_address temporary_address
city street ... city PSČ
Plzeň
sady
Pětatřicátníků 35
... Praha 111 00
...
I. Holubová, J. Kosek, K. Minařík, D. Novák. Big Data a NoSQL databáze. Praha: Grada Publishing, 2015. 288 p. 10
Super Column Family: Generic
{ // row
”Cath”: {
”username”: { ”firstname”: ”Cath”, ”lastname”: ”Yoon” },
”address”: { ”city”: ”Seoul”, ”postcode”: ”1234” }
}
// row
”Terry”: {
”username”: { ”firstname”: ”Terry”, ”lastname”: ”Cho” },
”account”: { ”bank”: ”Hana”, ”accounted”: 1234 },
”preferences”: { ”color”: ”blue”, ”style”: ”simple” }
}
}
11
Data Model: Interpretation 1
1. Each column family = a relational table
○ with (a lot of) null values
http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/ 12
Data Model: Interpretation 2
2. Column family = a map of maps (nested map)
Map<RowKey, Map<ColumnKey, ColumnValue>>
● Super column family:
Map<RowKey, Map<SuperColumnKey,
Map<ColumnKey, ColumnValue>>>
● The column-family data model can be viewed as
JSON-like documents with restrictions on the format
13
Example: Visualization
source: Sadalage & Fowler: NoSQL Distilled, 201214
Column Family Stores
15
Agenda
● Data Model
○ Column families, super columns, two points of view
● Column-family Stores
○ Google BigTable, Cassandra, HBase
● Apache Cassandra in Detail
○ Data model 1.0 vs. 2.0
○ Cassandra Query Language (CQL)
○ Data partitioning, replication
○ Local Data Persistence
○ Query processing, Indexes, Lightweight Transactions
16
Column Family Stores: Features
● Data model: Column families
● System architecture
○ data partitioning
● Local persistence
○ update log, memory, disk...
● Data replication
○ balancing of the data
● Query processing
○ query language
● Indexes
17
Representatives
Ranked list: http://db-engines.com/en/ranking/wide+column+store 18
BigTable
● Google’s paper:
○ Chang, F. et al. (2008). Bigtable: A Distributed Storage
System for Structured Data. ACM TOCS, 26(2), pp 1–26.
● Proprietary, not distributed outside Google
○ used in Google Cloud Platform
● Data model: column families as defined above
“A table in Bigtable is a sparse, distributed, persistent
multidimensional sorted map. “
(row:string, column:string, time:int64) → string
http://en.wikipedia.org/wiki/BigTable 19
”com.ccn.www”
column family
BigTable: Example
● “BigTable = sparse, distributed, persistent, multi-dimensional
sorted map indexed by (row_key, column_key, timestamp)”
row key row
column column column column column
“contents:html” “param:lang” “param:enc” “a:cnnsi.com” “a:ihned.cz”
column names
<html>...
<html>...
t2
t6
t8
<html>... EN UTF-8 CNN.com CNN
t2 t2 t3 t7
20
HBase
“Open source, non-relational, distributed database modeled
after Google's BigTable. “
● Initial release: 2008
● Implementation: Java
○ Based on Apache Hadoop (HDFS)
● Open source: Apache Software License 2.0
● Systems: Linux, Unix, Windows (only via Cygwin)
“If you have hundreds of millions or billions of rows, then
HBase is a good candidate. “
http://hbase.apache.org/ 21
Cassandra
● Developed at Facebook
○ now under Apache Software License 2.0
● Initial release: 2008 (stable release 3.11 in 2017)
● Written in: Java
● OS: cross-platform
● Operations:
○ CQL (Cassandra Query Language)
○ MapReduce support (can cooperate with Hadoop)
● Professional support by DataStax
○ http://www.datastax.com/
http://cassandra.apache.org/ 22
Agenda
● Data Model
○ Column families, super columns, two points of view
● Column-family Stores
○ Google BigTable, Cassandra, HBase
● Apache Cassandra in Detail
○ Data model 1.0 vs. 2.0
○ Cassandra Query Language (CQL)
○ Data partitioning, replication
○ Local Data Persistence
○ Query processing, Indexes, Lightweight Transactions
23
Cassandra 1.0: Data Model
● Column families, super column families
○ Can define metadata about columns
○ Now denoted as: Thrift API
● Static – similar to a relational database table
○ Rows have the same set of columns
■ Not required to have all of the columns set
● Dynamic – takes advantage of Cassandra's ability
to use arbitrary column names
http://cassandra.apache.org/ 25
● A key must be specified
● Data types for columns can be specified
● Options can be specified
CREATE COLUMNFAMILY Fish (key blob PRIMARY KEY);
CREATE COLUMNFAMILY FastFoodEatings (user text PRIMARY KEY)
WITH comparator=timestamp AND default_validation=int;
CREATE COLUMNFAMILY MonkeyTypes (
key uuid PRIMARY KEY,
species text, alias text,
population varint
) WITH comment='Important biological records'
AND read_repair_chance = 1.0;
Cassandra 1.0: Column Families
27
● Comparator = a data type for a column name
● Validator = a data type of a column value
○ or content of a row key
● Data types do not need to be defined
○ Default: BytesType, i.e., arbitrary hexadecimal bytes
● Basic operations: GET, SET, DEL
Cassandra 1.0: Column Families (2)
28
create column family users
with key_validation_class = Int32Type
and comparator = UTF8Type
and default_validation_class = UTF8Type;
Cassandra 1.0: Data Manipulation
// set column values in row with key 7
set users[7]['login'] = utf8('honza');
set users[7]['name'] = utf8('Jan Novák');
set users[7]['email'] = utf8('jan@novak.name');
set users[13]['login'] = utf8('fantomas');
set users[13]['name'] = utf8('incognito');
I. Holubová, J. Kosek, K. Minařík, D. Novák. Big Data a NoSQL databáze. Praha: Grada Publishing, 2015. 288 p. 29
get users[7]['login'];
=> (name=login, value=honza, timestamp=1429268223462000)
get users[13];
=> (name=login, value=fantomas, timestamp=1429268224554000)
=> (name=name, value=incognito, timestamp=1429268224555000)
Cassandra 1.0: Data Manipulation (2)
list users;
RowKey: 7
=> (name=email, value=jan@novak.name, timestamp=14292682...)
=> (name=login, value=honza, timestamp=1429268223462000)
=> (name=name, value=Jan Novák, timestamp=1429268223471000)
-------------------
RowKey: 13
=> (name=login, value=fantomas, timestamp=1429268224554000)
I. Holubová, J. Kosek, K. Minařík, D. Novák. Big Data a NoSQL databáze. Praha: Grada Publishing, 2015. 288 p. 30
● CQL: Cassandra Query Language
○ SQL-like commands
■ CREATE, ALTER, UPDATE, DROP, DELETE, TRUNCATE, INSERT, …
○ Simpler than SQL
● Since CQL 3 (Cassandra 1.2)
○ Column -> cell
○ Column family -> table
● Dynamic columns (wide rows) still supported
○ CQL supports everything that was possible before
○ “Old” approach (Thrift API) can be used as well
Cassandra: Sparse Tables
http://www.datastax.com/documentation/cql/3.1/ 31
CREATE TABLE users (
user_id int PRIMARY KEY,
login text,
name text,
email text );
INSERT INTO users (user_id, login, name)
VALUES (3, 'honza', 'Jan Novák');
SELECT * FROM users;
user_id | email | login | name
---------+-------+-------+-----------
3 | null | honza | Jan Novák
Working with Tables
I. Holubová, J. Kosek, K. Minařík, D. Novák. Big Data a NoSQL databáze. Praha: Grada Publishing, 2015. 288 p. 32
● Values can use “collection” types:
○ set – unordered unique values
○ list – ordered list of elements
○ map – name + value pairs
■ a way to realize super-columns
● Realization of the original idea of free columns
○ Internally, all values in collections as individual columns
○ Cassandra can handle “unlimited” number of columns well
Tables: Dynamic Columns
33
CREATE TABLE users (
login text PRIMARY KEY,
name text,
emails set<text>, // column of type “set”
profile map<text, text> // column of type “map”
)
INSERT INTO users (login, name, emails, profile)
VALUES ( 'honza', 'Jan Novák', { 'honza@novak.cz' },
{ 'colorschema': 'green', 'design': 'simple' }
);
UPDATE users
SET emails = emails + { 'jn@firma.cz' }
WHERE login = 'honza';
Tables: Dynamic Columns (2)
I. Holubová, J. Kosek, K. Minařík, D. Novák. Big Data a NoSQL databáze. Praha: Grada Publishing, 2015. 288 p. 34
● Compound primary key
CREATE TABLE mytable (
row_id int, column_name text, column_value text,
PRIMARY KEY (row_id, column_name)
);
INSERT INTO mytable (row_id, column_name, column_value)
VALUES ( 3, 'login', 'honza');
INSERT INTO mytable (row_id, column_name, column_value)
VALUES ( 3, 'name', 'Jan Novák');
INSERT INTO mytable (row_id, column_name, column_value)
VALUES ( 3, 'email', 'honza@novak.cz');
Dynamic Columns: Another Way
I. Holubová, J. Kosek, K. Minařík, D. Novák. Big Data a NoSQL databáze. Praha: Grada Publishing, 2015. 288 p. 35
Cassandra: Working with Data
36
Data Sharding in Columnar Systems
System Terminology
BigTable tablets
HBase regions
Cassandra partitions
I. Holubová, J. Kosek, K. Minařík, D. Novák. Big Data a NoSQL databáze. Praha: Grada Publishing, 2015. 288 p. 37
● Entries in each table are split by partition key
○ Which is a selected column (or a set of columns)
○ Specifically, the first column (or columns) from the primary
key is the partition key of the table
CREATE TABLE tab ( a int, b text, c text, d text,
PRIMARY KEY ( a, b, c)
);
CREATE TABLE tab ( a int, b text, c text, d text,
PRIMARY KEY ( (a, b), c)
);
Data Sharding in Cassandra
38
● All entries with the same partition key
○ Will be stored on the same physical node
○ => efficient processing of queries on one partition key
Data Sharding in Cassandra (2)
CREATE TABLE mytable (
row_id int, column_name text, column_value text,
PRIMARY KEY (row_id, column_name) );
● The rest of the columns in the primary key
Are so called clustering columns
○ Rows are locally sorted by values in the clustering columns
■ the order for physical storing rows
39
● Cassandra adopts peer-to-peer replication
○ The same principles like in key-value stores & document DB
○ Read/Write quora to balance between
availability and consistency guarantees
Data Replication
● HBase (and Google BigTable)
○ Physical data distribution & replication is done by the
underlying distributed file system
○ HDFS, GFS (see below)
40
BigTable: Architecture
41
HBase: Architecture
http://bigdatariding.blogspot.cz/2013/10/hbase-overview-of-architecture-and-data.html 42
Local Persistence
● Organization of local data store at nodes
● Objectives:
○ Persistent, durable (ensure persistence after commit)
○ High performant in reads & writes
● Approach:
○ Memory tables
○ Append-only update log
○ SSTable disk-storage format: immutable
○ Compaction
43
Local Persistence
● Persistency + durability
● but also high throughput of write operations
I. Holubová, J. Kosek, K. Minařík, D. Novák. Big Data a NoSQL databáze. Praha: Grada Publishing, 2015. 288 p.
write op. read op.
main memory
disk
Write Ahead Log
SSTable files
44
Compaction, Consolidation
● Data in SSTables are immutable
● A regular process of data compaction
I. Holubová, J. Kosek, K. Minařík, D. Novák. Big Data a NoSQL databáze. Praha: Grada Publishing, 2015. 288 p.
main memory
disk
SSTable files
merge
the data
delete
records
files + free disk
space
45
Cassandra: Querying
46
● The syntax of CQL is similar to SQL
○ But search just in one table (no joins)
SELECT <selectExpr>
FROM [<keyspace>.]<table>
[WHERE <clause>]
[ORDER BY <clustering_colname> [DESC]]
[LIMIT m];
Cassandra Query Language (CQL)
SELECT column_name, column_value
FROM mytable
WHERE row_id=3
ORDER BY column_name;
47
CREATE TABLE mytable (
row_id int,
column_name text,
column_value text,
PRIMARY KEY (row_id, column_name)
);
● The search condition can be:
○ on columns in the partition key
■ And only using operators = and IN
... WHERE row_id IN (3, 4, 5)
■ Therefore, the query hits only one or several physical nodes (not all)
CQL: Limitations on “Where” Part
CREATE TABLE mytable (
row_id int,
column_name text,
column_value text,
PRIMARY KEY
(row_id, column_name)
);
○ on columns from the clustering key
■ Especially, if there is also condition on the partition key
... WHERE row_id=3 AND column_name='login'
■ If it is not, the system must filter all entries
SELECT * FROM mytable
WHERE column_name IN ('login', 'name') ALLOW FILTERING;
48
● Other columns can be filtered on
○ if there is an index built on the column(s).
● Indexes can also be built on collection columns
(set, list, map)
○ And then queried by CONTAINS like this
SELECT login FROM users
WHERE emails CONTAINS 'jn@firma.cz';
SELECT * FROM users
WHERE profile CONTAINS KEY 'colorschema';
CQL: Limitations on “Where” Part (1)
49
CREATE INDEX ON users (emails);
● Secondary indexes on any column
○ B+-Tree indexes
○ User-defined implementation of indexes
CREATE INDEX ON users (emails);
Indexes
50
Queries: HBase and Cassandra
● Cassandra: Direct support for secondary indexes
● HBase: Indirect ways to build secondary indexes
● In general, all systems support MapReduce
○ HBase & Cassandra: Hadoop MapReduce
● Comparison of HBase and Cassandra:
http://www.infoworld.com/article/2610656/database/big-data-showdown--cassandra-vs--hbase.html
51
Transactions
● Cassandra 2.x supports “lightweight transactions”
○ compare and set operations
○ using Paxos consensus protocol
■ nodes agree on proposed data additions/modifications
■ faster than Two-phase commit (2PC) protocol
INSERT INTO users (login, name, emails)
VALUES ('honza', 'Jan Novák', { 'honza@novak.cz' })
IF NOT EXISTS;
UPDATE mytable SET column_value = 'honza@firma.cz'
WHERE row_id = 3 AND column_name = 'email'
IF column_value = 'honza@firm.cz';
52
Summary
● Column-family stores
○ are worth only for large data and large query throughput
○ two ways to see the data model:
■ large sparse tables or multidimensional (nested) maps
○ data distribution is via row key
■ analogue of document ID or key in document or key-value stores
○ efficient disk + memory local data storage
● Cassandra
○ CQL: structured after SQL, easy transition from RDBMS
53
Questions?
Please, ANY questions?
54
References
● I. Holubová, J. Kosek, K. Minařík, D. Novák. Big Data a
NoSQL databáze. Praha: Grada Publishing, 2015. 288 p.
● RNDr. Irena Holubova, Ph.D. MMF UK course NDBI040:
Big Data Management and NoSQL Databases
● Chang, F. et al. (2008). Bigtable: A Distributed Storage
System for Structured Data. ACM TOCS, 26(2), pp 1–26.
● http://www.datastax.com/documentation/cassandra/1.2/
● http://www.datastax.com/documentation/cassandra/2.0/
● http://wiki.apache.org/cassandra/
● http://hbase.apache.org/
55