DynamoDB
Daniel Charvát, Denisa Šrámková, Viliam Juríček, Šimon Berka

Introduction to DynamoDB
● Key-value and document database developed by Amazon
  ○ Amazon found that about 90 % of its operations queried only a single table
  ○ The relational features of SQL databases (such as joins across tables) were thus mostly unnecessary
● Fully managed, durable database; multi-master replication across regions (global tables) and optional in-memory caching (DynamoDB Accelerator, DAX)
● Partitions data by hashing the partition key (consistent hashing) to spread it across storage nodes
● Throughput is defined through read and write capacity units
  ○ Allowed number of operations per second
  ○ Cost scales with the capacity used, so lightly used tables are cheap

Companies that are using DynamoDB
Source: https://www.featuredcustomers.com/vendor/amazon-dynamodb/customers

Ranking
Source: https://db-engines.com/en/ranking/

Features
Strengths
● Seamless scalability: tables are automatically split across more partitions as data and traffic grow
● Data backed up to Amazon S3
● Easy integration with other AWS services
Drawbacks
● Limited ACID transactions
  ○ Only through the dedicated transactional operations (TransactWriteItems, TransactGetItems), with a limit on items per transaction
  ○ Reads are eventually consistent by default; strongly consistent reads must be requested explicitly
● Not suitable for large binary objects (the maximum item size is 400 KB)
● Cross-region replication is not automatic; it has to be configured through global tables

Setting up DynamoDB
There are several options:
● Local (DynamoDB Local, for development and testing)
  ○ Windows, Linux, macOS (downloadable version)
  ○ Apache Maven (POM file)
  ○ Docker
● Web service
  1. Sign up for AWS
  2. Get an AWS access key
  3. Configure credentials

Accessing DynamoDB
Again, there are several options:
● AWS Management Console
● AWS Command Line Interface (AWS CLI)
● DynamoDB API, through the AWS SDKs for Java, JavaScript, .NET, Node.js, PHP, Python (Boto3), Ruby, C++, Go, Android and iOS

Core components
● A table is a collection of items
● Each item is a collection of attributes
● Primary keys uniquely identify each item in a table
● Secondary indexes provide more querying flexibility
Other than the primary key, the ‘People’ table is schemaless, which means that neither the attributes nor their data types need to be defined beforehand.
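The introduction mentions that data is spread across nodes by consistent hashing of the partition key. The sketch below shows the general technique, a hash ring with virtual nodes in the style of Amazon's original Dynamo design, using only the Python standard library. It is an illustration under stated assumptions, not DynamoDB's actual internals (which are not exposed): the node names, the MD5-based ring position and the 64 virtual nodes per node are all hypothetical choices.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Stable 64-bit position on the ring (Python's built-in hash() is randomized per run)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent hashing with virtual nodes: a generic sketch, not DynamoDB's internals."""

    def __init__(self, nodes=(), vnodes_per_node: int = 64):
        self.vnodes_per_node = vnodes_per_node
        self._positions = []  # sorted ring positions of all virtual nodes
        self._owner = {}      # ring position -> physical node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns several points on the ring to even out the load.
        for i in range(self.vnodes_per_node):
            pos = ring_position(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def node_for(self, partition_key: str) -> str:
        # A key belongs to the first virtual node clockwise from its own position.
        if not self._positions:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_left(self._positions, ring_position(partition_key))
        return self._owner[self._positions[idx % len(self._positions)]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"person#{i}" for i in range(1000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-d")
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    # Only keys that now fall on node-d's positions move; all other keys stay put.
    print(len(moved), "keys moved, all to node-d:",
          all(ring.node_for(k) == "node-d" for k in moved))
```

Adding a node only relocates the keys that land on its new ring positions (roughly a quarter of them with four equally weighted nodes), whereas a naive `hash(key) % N` scheme would move most keys when N changes.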
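To make the core components concrete, here is a toy in-memory model in Python: a table holds items, each item is a dict of attributes, a composite primary key (Artist + SongTitle, as in the ‘Music’ table) identifies an item, and a secondary index on Genre and AlbumTitle answers queries the primary key cannot. The class `ToyTable` and its methods are hypothetical and only loosely echo PutItem and Query; the sketch models what each query can express, not how DynamoDB stores or indexes data, and the item values are placeholders.

```python
class ToyTable:
    """Hypothetical in-memory model of a DynamoDB table with a composite primary key."""

    def __init__(self, partition_key: str, sort_key: str):
        self.key_attrs = (partition_key, sort_key)
        self.items = {}    # (partition key value, sort key value) -> item
        self.indexes = {}  # index name -> (index partition key, index sort key)

    def create_index(self, name: str, partition_key: str, sort_key: str) -> None:
        self.indexes[name] = (partition_key, sort_key)

    def put_item(self, item: dict) -> None:
        # Only the key attributes are mandatory; all other attributes are schemaless.
        key = tuple(item[attr] for attr in self.key_attrs)  # KeyError if a key attribute is missing
        self.items[key] = dict(item)  # same primary key: the old item is replaced

    def query(self, pk_value, sk_value=None) -> list:
        """Partition key value is required, sort key value is optional."""
        return self._select(*self.key_attrs, pk_value, sk_value)

    def query_index(self, name: str, pk_value, sk_value=None) -> list:
        return self._select(*self.indexes[name], pk_value, sk_value)

    def _select(self, pk_attr, sk_attr, pk_value, sk_value):
        matches = [
            item for item in self.items.values()
            # Items lacking the key attributes are simply absent from the index ("sparse" index).
            if pk_attr in item and sk_attr in item
            and item[pk_attr] == pk_value
            and (sk_value is None or item[sk_attr] == sk_value)
        ]
        return sorted(matches, key=lambda item: item[sk_attr])  # ordered by sort key


if __name__ == "__main__":
    music = ToyTable("Artist", "SongTitle")
    music.create_index("GenreAlbumTitle", "Genre", "AlbumTitle")
    music.put_item({"Artist": "No One You Know", "SongTitle": "My Dog Spot",
                    "AlbumTitle": "Hey Now", "Genre": "Country"})
    music.put_item({"Artist": "No One You Know", "SongTitle": "Somewhere Down The Road",
                    "AlbumTitle": "Somewhat Famous", "Genre": "Country", "Year": 1984})
    music.put_item({"Artist": "The Acme Band", "SongTitle": "Still in Love",
                    "AlbumTitle": "The Buck Starts Here", "Genre": "Rock"})
    music.put_item({"Artist": "The Acme Band", "SongTitle": "Look Out, World"})  # no Genre: not indexed
    print([i["SongTitle"] for i in music.query("No One You Know")])
    # ['My Dog Spot', 'Somewhere Down The Road']
    print([i["SongTitle"] for i in music.query_index("GenreAlbumTitle", "Rock", "The Buck Starts Here")])
    # ['Still in Love']
```

Note how the last item carries fewer attributes than the others: only Artist and SongTitle are required, and because it has no Genre it does not appear in the index.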
Source: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html

Core components
The primary key for the ‘Music’ table consists of two attributes (Artist and SongTitle). Each item in the table must have these two attributes. The combination of Artist and SongTitle distinguishes each item in the table from all of the others.
Source: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html

Primary key
● Must be specified when creating a table
● DynamoDB supports two kinds of primary keys:
  ○ Partition key: composed of one attribute (PersonID)
  ○ Partition key and sort key (composite key): composed of two attributes (Artist, SongTitle)
● Each primary key attribute must be a scalar of type string, number, or binary
Source: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html

Secondary indexes
● Let us query the data in the table using an alternate key, in addition to queries against the primary key
● Two kinds of indexes:
  ○ Global: both the partition key and the sort key can differ from those of the table
  ○ Local: same partition key as the table, but a different sort key
● Default limit per table: 20 global and 5 local secondary indexes
Source: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html

Secondary indexes
We can query data items by Artist (partition key) or by Artist and SongTitle (partition key and sort key). If we also wanted to query the data by Genre and AlbumTitle, we would:
1. Create an index on Genre and AlbumTitle (a global secondary index, since its partition key differs from the table's)
2. Query the index
Source: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html

RDBMS vs. DynamoDB
Source: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.html
Characteristic | Relational Database Management System (RDBMS) | Amazon DynamoDB
Optimal workloads | Ad hoc queries; data warehousing; OLAP | Web-scale applications
Data model | Requires a well-defined schema (data is normalized into tables, rows, and columns) | Schemaless apart from the primary key; can manage structured or semistructured data
Performance | Optimized for storage | Optimized for compute
Scaling | Scales up through faster hardware; tables can span multiple hosts in a distributed system, but with upper limits on scalability | Designed to scale out using distributed clusters of hardware (no upper limit)

Creating a table - schema example
{
  "TableName": "TestCertificates",
  "KeySchema": [
    { "AttributeName": "cert_info_link", "KeyType": "HASH" },
    { "AttributeName": "cert_authority", "KeyType": "RANGE" }
  ],
  "GlobalSecondaryIndexes": [
    {
      "IndexName": "log_type_index",
      "KeySchema": [
        { "AttributeName": "log_type", "KeyType": "HASH" },
        { "AttributeName": "cert_authority", "KeyType": "RANGE" }
      ],
      "Projection": { "ProjectionType": "ALL" },
      "ProvisionedThroughput": { "ReadCapacityUnits": 5, "WriteCapacityUnits": 5 }
    }
  ],
  "AttributeDefinitions": [
    { "AttributeName": "cert_info_link", "AttributeType": "S" },
    { "AttributeName": "cert_authority", "AttributeType": "S" },
    { "AttributeName": "log_type", "AttributeType": "S" }
  ],
  "ProvisionedThroughput": { "ReadCapacityUnits": 5, "WriteCapacityUnits": 5 }
}
HASH marks the partition key and RANGE the sort key; "S" is the string type. The global secondary index log_type_index reuses cert_authority as its sort key.

Example item:
cert_info_link | not_valid_before | not_valid_after | cert_common_name | cert_authority | log_type
http://ct.google... | 1573257600 | 1581119999 | thesmartlocal0... | cPanel, Inc. | X509LogEntry

LIVE DEMO
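Before sending the TestCertificates create-table request, its key definitions can be sanity-checked locally. The Python sketch below encodes rules that DynamoDB's CreateTable enforces: every key attribute of the table and of its indexes must be declared in AttributeDefinitions, every declared attribute must be used in some key schema, key attributes are of type S, N or B, and each key schema has exactly one HASH and at most one RANGE element. The function `check_key_schema` is hypothetical; the real service does its own validation and answers a bad request with a ValidationException.

```python
import copy

# The TestCertificates schema from the slides, as a Python dict.
SCHEMA = {
    "TableName": "TestCertificates",
    "KeySchema": [
        {"AttributeName": "cert_info_link", "KeyType": "HASH"},
        {"AttributeName": "cert_authority", "KeyType": "RANGE"},
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "log_type_index",
            "KeySchema": [
                {"AttributeName": "log_type", "KeyType": "HASH"},
                {"AttributeName": "cert_authority", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        }
    ],
    "AttributeDefinitions": [
        {"AttributeName": "cert_info_link", "AttributeType": "S"},
        {"AttributeName": "cert_authority", "AttributeType": "S"},
        {"AttributeName": "log_type", "AttributeType": "S"},
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}


def check_key_schema(schema: dict) -> list:
    """Return a list of problems; an empty list means the key definitions are consistent."""
    problems = []
    defined = {d["AttributeName"]: d["AttributeType"] for d in schema["AttributeDefinitions"]}
    key_schemas = [("table", schema["KeySchema"])] + [
        (index["IndexName"], index["KeySchema"])
        for index in schema.get("GlobalSecondaryIndexes", []) + schema.get("LocalSecondaryIndexes", [])
    ]
    used = set()
    for owner, keys in key_schemas:
        key_types = [k["KeyType"] for k in keys]
        if key_types.count("HASH") != 1 or key_types.count("RANGE") > 1 or len(keys) > 2:
            problems.append(f"{owner}: needs exactly one HASH and at most one RANGE key")
        for k in keys:
            used.add(k["AttributeName"])
            if k["AttributeName"] not in defined:
                problems.append(f"{owner}: {k['AttributeName']!r} is missing from AttributeDefinitions")
    for name, attr_type in defined.items():
        if name not in used:
            problems.append(f"{name!r} is defined but not used in any key schema")
        if attr_type not in ("S", "N", "B"):
            problems.append(f"{name!r} has type {attr_type!r}; key attributes must be S, N or B")
    return problems


if __name__ == "__main__":
    print(check_key_schema(SCHEMA))  # []
    broken = copy.deepcopy(SCHEMA)
    # Declaring a non-key attribute (here not_valid_before) is a common mistake.
    broken["AttributeDefinitions"].append({"AttributeName": "not_valid_before", "AttributeType": "N"})
    print(check_key_schema(broken))
```

Non-key attributes such as not_valid_before or cert_common_name are simply written with each item; because the table is schemaless, they never appear in AttributeDefinitions.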
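The ProvisionedThroughput values in the schema (5 read and 5 write capacity units) translate into request rates through DynamoDB's sizing rules: one read capacity unit covers one strongly consistent read per second (or two eventually consistent reads) of an item up to 4 KB, one write capacity unit covers one write per second of an item up to 1 KB, and larger items consume proportionally more units, rounded up. The helpers below are a hypothetical planning sketch in Python; they ignore transactional requests (which cost twice as much) and on-demand capacity mode.

```python
import math

KB = 1024
READ_UNIT_BYTES = 4 * KB   # one read capacity unit covers an item of up to 4 KB
WRITE_UNIT_BYTES = 1 * KB  # one write capacity unit covers an item of up to 1 KB


def read_capacity_units(item_bytes: int, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed for a steady rate of reads of items of the given size."""
    units_per_read = max(1, math.ceil(item_bytes / READ_UNIT_BYTES))  # round up to 4 KB blocks
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = math.ceil(total / 2)  # eventually consistent reads cost half
    return total


def write_capacity_units(item_bytes: int, writes_per_second: int) -> int:
    """WCUs needed for a steady rate of writes of items of the given size."""
    units_per_write = max(1, math.ceil(item_bytes / WRITE_UNIT_BYTES))  # round up to 1 KB blocks
    return units_per_write * writes_per_second


if __name__ == "__main__":
    print(read_capacity_units(6 * KB, 10))                             # 20
    print(read_capacity_units(6 * KB, 10, strongly_consistent=False))  # 10
    print(write_capacity_units(2560, 10))                              # 30
```

Under these rules, the table's 5 RCU / 5 WCU would sustain about 5 strongly consistent (or 10 eventually consistent) reads and 5 writes per second, provided each certificate item stays under 1 KB; log_type_index has its own separate 5 / 5 setting.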