Challenges of quality management in cloud applications
Mgr. David Gešvindr
MCSE: Data Platform | MCT | MSP
gesvindr@mail.muni.cz

Motto
• A classical web application designed for an on-premise environment cannot utilize the full potential of the cloud
• Higher operation costs
• A cloud application has to be designed in a different way, applying different tactics and patterns
  - The cloud platform offers a wide portfolio of services
  - Anything can fail at any time
  - Need to optimize based on multiple conflicting criteria:
    high operation costs for a relational database versus lower development costs

Outline
1. Introduction to a cloud environment and its foundations
2. Identification of relevant software quality attributes in the cloud
3. Frequent mistakes in cloud application design
4. Cloud-specific architectural tactics and guidelines for their application

Cloud definition
"Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."
- National Institute of Standards and Technology

Characteristics of the cloud
• On-demand self-service
• Broad network access
• Resource pooling
• Rapid elasticity
• Measured service

Service model
• Software as a Service (SaaS)
• Platform as a Service (PaaS)
• Infrastructure as a Service (IaaS)

Deployment model
• Public cloud
• Private cloud
• Hybrid cloud

Outline
1. Introduction to a cloud environment and its foundations
2. Identification of relevant software quality attributes in the cloud
3. Frequent mistakes in cloud application design
4.
Cloud-specific architectural tactics and guidelines for their application

Availability
• Cloud platforms offer service availability of 99.95-99.99%
• Be aware of transient errors
  - Need to implement fault detection and a retry policy so that transient faults do not surface as random failures
• User error: damaged data, recovery
• Data center outage
  - Designing a cloud application that spans data centers requires applying complex tactics to work properly

[Figure: two-region deployment. Azure Traffic Manager routes incoming service requests to the active datacenter (Region 1); the passive datacenter (Region 2) receives one-way storage replication. Each service deployment contains a load balancer, web role servers, application services and storage services.]

Throughput
• A measure of the amount of work an application must perform in a unit of time
• Throughput is quantified as the number of transactions, operations or requests that the system can handle per second or other time unit
• Strongly dependent on the throughput of the application components involved in request processing
• Identify the bottleneck early
• Be aware of the difference between average and peak throughput

[Figure: one incoming GET Products Page request to the web server triggers 2 database requests and 1 blob storage request. A database throughput of 100 requests/s supports 50 web server requests/s; a blob storage throughput of 500 requests/s supports 500 web server requests/s. Resulting service throughput: 50 requests/s.]

Response time
• Response time is a measure of the latency an application exhibits in processing a business transaction
• It is determined by
  - Communication latency
  - Request processing time

[Figure: sequence diagram between web server, database and blob storage. To serve GET Products Page, the web server loads the product from the DB, loads the categories from the DB, and loads precomputed product recommendations from blob storage. Timings shown: 170 ms, 255 ms, 380 ms, 900 ms.]

Scalability
• Scalability characterizes how well a solution to some problem will work when the size of the problem increases
• Be aware of the difference:
  - Performance-related attributes: specify application behavior for a static instance of the cloud environment configuration
  - Elasticity: the degree to which a system is able to adapt to workload changes by provisioning and deprovisioning resources in an autonomic manner
• If the application is not scalable, it cannot effectively utilize the potentially unlimited amount of processing resources that the cloud platform offers

Operation costs
• It is necessary to evaluate all operation costs precisely
• Problem:
  - Services are billed based on real resource usage. How can operation costs be predicted effectively?

Development costs
• Several services offer similar functionality, but the differences are critically important
  - Storage services differ in offered functionality (SQL vs. NoSQL database)
  - Differences in scalability
• Integration costs depend on the functionality and on the available libraries and tools
• For instance:
  - Azure Storage vs. Azure SQL Database vs. DocumentDB

Outline
1. Introduction to a cloud environment and its foundations
2. Identification of relevant software quality attributes in the cloud
3. Frequent mistakes in cloud application design
4.
Cloud-specific architectural tactics and guidelines for their application

Case study: US elections
• This case study was presented at TechEd 2014 by Azure CTO Mark Russinovich
• Video recording of the session: http://channel9.msdn.com/Events/TechEd/Europe/2014/CDP-B337
• System for presenting the results of US elections

Service architecture
• Election results are uploaded to Azure Storage
• A worker role continuously processes the results
• Processed results are stored in a relational database
• Web servers load the results from the DB

[Figure: service architecture. Frontend layer: built-in Azure web role load balancer and web role virtual servers (1..n). Storage layer: Azure SQL Database (Main DB, Results DB). Processing layer: worker role virtual servers (1..n). Staging layer: Azure Blob Storage containers for precincts (1..n).]

Expected load
• Every user request results in 10 database requests
• Expected service load
• Problems:
  - Azure SQL throughput is limited to 1,000 requests per second

Expected load scenarios

Scenario    Expected page views   Time window (hrs)   Page views/sec   DB calls/sec (10x page views/sec)
Average     10,000,000            4                   694              6,944
Peak hour   6,000,000             1                   1,667            16,667

Update of the application's architecture
1. Addition of a cache layer composed of worker servers hosting an in-memory cache
• Throughput of the storage increased by an order of magnitude

[Figure: updated architecture. Web role virtual servers in the frontend layer send cache requests through a load balancer to virtual servers hosting the in-memory cache, which sits in front of Azure SQL Database (Main DB, Results DB) in the storage layer. The rest of the architecture remains the same.]

Real load
• Allocated capacity
  - With database
  - With in-memory cache

In both tables, Difference is the allocated capacity minus Calls/sec.

With database:

Time          Actual page views   Time window (sec)   Page views/sec   Calls/sec   Difference (calls/sec)   Requests served
8pm+10 secs   448932              10                  44893            448932      -447932                  0.22%
8pm+30 secs   206925              20                  10346            103463      -102463                  0.97%
8:01 pm       171231              30                  5708             57077       -56077                   1.75%
8:03 pm       37835               120                 3153             31529       -30529                   3.17%
8:10 pm       494423              420                 1177             11772       -10772                   8.49%
8:30 pm       416379              1200                347              3470        -2470                    28.82%

With in-memory cache:

Time          Actual page views   Time window (sec)   Page views/sec   Calls/sec   Difference (calls/sec)   Requests served
8pm+10 secs   448932              10                  44893            448932      -288932                  35.64%
8pm+30 secs   206925              20                  10346            103463      56537                    100.00%
8:01 pm       171231              30                  5708             57077       102923                   100.00%
8:03 pm       37835               120                 3153             31529       128471                   100.00%
8:10 pm       494423              420                 1177             11772       148228                   100.00%
8:30 pm       416379              1200                347              3470        156530                   100.00%

Case study 2: Shopping on Amazon
• Do you know how a product page is rendered in the Amazon e-shop?
• The product page, including recommended products for a specific user, is pre-rendered as a fragment and stored in S3 storage
• The page transmitted to the user is just a simple composition of pre-rendered fragments

Indirect dependencies
• Compute operations are performed asynchronously, independently of the synchronous processing of the user request
• The compute operation is stored in a scalable queue service, and worker processes load task definitions from the queue

[Figure: a load balancer distributes incoming requests to the incoming request queues of Web Servers 1-3; the queues hold requests taking 5 s, 5 s, 2.5 s, 2.5 s, 2 s and 2 s. Infrastructure state 5 seconds later: one queue still holds a 5 s request, the other two servers are idle.]

Case study 3: Smart thermostats

[Figure: HVAC, thermostats, building router, web role (synchronous HTTP), web role, mobile user interface, Azure On-Prem Gateway, weather WCF service, Azure Queue, email queue, SMTP relay.]

Addition of asynchronous dependencies

[Figure: HVAC, thermostats, building router, web role (synchronous HTTP), web role, mobile user interface, Service Bus, email queue, Azure On-Prem Gateway, SMTP relay, weather WCF service, worker role (write-leveling).]

Case study 3: Conclusion
• Initial tests failed with 35,000 connected thermostats
• The goal was 100,000 (150,000) thermostats
• Main issues:
  - Synchronous HTTP handler
  - Row-level updates of the DB (instead of batch updates)
  - Database tuning
  - Queue scalability issues, resolved by applying partitioning

[Figure: two variants of the message processor application running on a worker role virtual server and reading from an Azure Service Bus queue. First: synchronous message retrieval and synchronous message processing. Second: batch message retrieval and asynchronous message processing.]

Cloud-specific architectural tactics
1. Multi-tiered Storage Tactic
2. Indirect Dependency Tactic
3. Results Pre-computation Tactic
4. Component Co-deployment Tactic

1. Multi-tiered Storage Tactic
• Goal: Combine various storage services to exploit their advantages and mitigate their weaknesses
• Relational database
  + Transaction processing, integrity constraints, complex querying
  - Limited scalability, expensive operation
• NoSQL database (Azure Table Storage)
  + Good scalability, cheap operation
  - Complex key design
  - Ability to query data based only on a combination of partition and row key
• In-memory cache (Redis Cache)
  + Extremely scalable, high throughput, very low response time
  - Key/value store only, very expensive

Storage comparison in Microsoft Azure

[Figure: comparison of Database, Redis and Azure Storage. Surviving annotation: +$0.0036 per 100,000 transactions.]

2. Indirect Dependency Tactic
• User requests are not processed synchronously by the web server
• Task requests are stored in a scalable queue service
• The number of running workers is variable (cost effective)
• Results are stored in a scalable storage service
• Problem: How to notify the user about changes or errors in processing?
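
The Indirect Dependency Tactic can be illustrated with a minimal in-process sketch. It is not the architecture from the slides: `queue.Queue` stands in for a scalable queue service, a dictionary stands in for a scalable storage service, and all names (`submit_request`, `worker`, `get_status`, and so on) are hypothetical. One possible answer to the notification problem, assumed here, is to record a status per task that the client can poll.

```python
import queue
import threading
import uuid

task_queue = queue.Queue()      # stand-in for a scalable queue service
results = {}                    # stand-in for a scalable storage service
results_lock = threading.Lock()


def submit_request(payload):
    """Web-server side: enqueue the task and return a ticket immediately."""
    task_id = str(uuid.uuid4())
    with results_lock:
        results[task_id] = {"status": "pending", "value": None}
    task_queue.put((task_id, payload))
    return task_id


def worker():
    """Worker side: process tasks until a None sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is None:
            task_queue.task_done()
            break
        task_id, payload = item
        try:
            value = sum(payload)  # placeholder for the real compute operation
            outcome = {"status": "done", "value": value}
        except Exception as exc:  # errors are recorded, not lost
            outcome = {"status": "failed", "value": str(exc)}
        with results_lock:
            results[task_id] = outcome
        task_queue.task_done()


def run_workers(count):
    """Start a variable number of workers (the cost-related knob)."""
    threads = [threading.Thread(target=worker) for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads


def stop_workers(threads):
    """Send one sentinel per worker and wait for all of them to exit."""
    for _ in threads:
        task_queue.put(None)
    for thread in threads:
        thread.join()


def get_status(task_id):
    """Client side: poll the stored status of a previously submitted task."""
    with results_lock:
        return dict(results[task_id])
```

A caller would start some workers with `run_workers(3)`, keep the ticket returned by `submit_request([1, 2, 3])`, and later call `get_status(ticket)`. The web-facing function never waits for the computation, which is the point of the tactic.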
[Figure: Indirect Dependency architecture. Frontend layer: built-in Azure web role load balancer and web role virtual servers (1..n). Asynchronous operations pass through the Azure Storage Queue (high priority queue, low priority queue) to the processing layer with worker role virtual servers (1..n). Storage layer: Azure SQL DB, Azure Blob and Table Storage.]

3. Results Pre-computation Tactic
• Effective combination of the previously mentioned tactics:
  - Multi-tiered Storage Tactic: provides cost-effective, scalable storage
  - Indirect Dependency Tactic: provides spare compute resources
• Goal: Increase the cost effectiveness of reserved compute resources
• Workers are billed based on their uptime, independently of their CPU load
• Results are stored in a form that does not require additional processing

4. Component Co-deployment Tactic
• Goal: Minimize inter-role communication latency
• In the cloud, communication latency between different services may be significantly higher than in an on-premise environment
• It is necessary to minimize service calls
• Apply Affinity Groups

[Figure: separate deployment (web role virtual servers running web servers, worker role virtual servers running app servers, IaaS SQL Server) compared with co-deployment (web role virtual servers running web + app servers, IaaS SQL Server).]
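
The Results Pre-computation Tactic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation from any of the case studies: `fragment_store` is a hypothetical in-memory stand-in for blob storage, the recommendation logic is a toy, and all function names are invented for the example. Workers with spare capacity fill the store in the background; the request path only reads a finished fragment.

```python
fragment_store = {}  # hypothetical stand-in for blob storage


def compute_recommendations(product_id, catalog):
    """Expensive step run by background workers (toy logic: same category)."""
    category = catalog[product_id]["category"]
    return sorted(
        pid for pid, product in catalog.items()
        if product["category"] == category and pid != product_id
    )


def precompute_fragments(catalog):
    """Worker side: render every product fragment ahead of time."""
    for product_id, product in catalog.items():
        recs = compute_recommendations(product_id, catalog)
        items = "".join(f"<li>{rec}</li>" for rec in recs)
        fragment_store[product_id] = f"<h1>{product['name']}</h1><ul>{items}</ul>"


def render_product_page(product_id):
    """Request path: one key lookup, no computation at request time."""
    fragment = fragment_store.get(product_id)
    if fragment is None:
        return "<p>Product page is being prepared.</p>"
    return f"<html><body>{fragment}</body></html>"
```

Because the stored fragment needs no additional processing, the cost of serving a page does not depend on how expensive the recommendation step is; that cost is paid by workers that are billed for their uptime anyway.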