Challenges of quality management in cloud applications
Mgr. David Gešvindr
MCSE: Data Platform | MCT | MSP
gesvindr@mail.muni.cz
Motto
 A classical web application designed for an on-premise environment
cannot utilize the full potential of the cloud
 It also incurs higher operation costs
 A cloud application has to be designed differently, applying
different tactics and patterns
• The cloud platform offers a wide portfolio of services
• Anything can fail at any time
• The design must be optimized against multiple conflicting criteria:
 e.g. high operation costs of a relational database vs. lower development costs
Outline
1. Introduction to a cloud environment and its foundations
2. Identification of relevant software quality attributes in the cloud
3. Frequent mistakes in cloud application design
4. Cloud-specific architectural tactics and guidelines for their application
Cloud definition
"Cloud computing is a model for enabling ubiquitous, convenient, on-demand
network access to a shared pool of configurable computing
resources (e.g., networks, servers, storage, applications, and services)
that can be rapidly provisioned and released with minimal
management effort or service provider interaction."
- National Institute of Standards and Technology
Characteristics of the cloud
 On-demand self-service
 Broad network access
 Resource pooling
 Rapid elasticity
 Measured service
Service model
 Software as a Service (SaaS)
 Platform as a Service (PaaS)
 Infrastructure as a Service (IaaS)
Deployment model
 Public cloud
 Private cloud
 Hybrid cloud
Availability
 Cloud platforms offer service availability of 99.95-99.99%
 Be aware of transient errors
• Implement fault detection and a retry policy so that random
transient faults do not surface as failures
 User error: damaged data and the need for recovery
 Data center outage
• Designing a cross-data-center cloud application requires
complex tactics to work properly
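The retry policy mentioned above can be sketched as follows. This is a minimal illustration, not the API of any particular cloud SDK: `TransientError`, `call_with_retry` and the default delays are assumed names and values chosen for the example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for faults worth retrying (timeouts, throttling)."""


def call_with_retry(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up: the fault is no longer treated as transient
            # Exponential backoff with jitter to avoid synchronized retries.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

Only errors classified as transient are retried; any other exception propagates immediately, so permanent faults are not hidden behind repeated attempts.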
[Figure: Cross-data-center deployment. Azure Traffic Manager routes incoming service requests to an active datacenter (Region 1) and a passive datacenter (Region 2). Each service deployment contains a load balancer, web role servers, application services and storage services. One-way storage replication links the two regions.]
Throughput
 A measure of the amount of work an application must perform in
a unit of time
 Quantified as the number of transactions, operations or requests
that the system can handle per second or other time unit
 Strongly dependent on the throughput of the application
components involved in request processing
 Identify the bottleneck early
 Be aware of the difference between average and peak throughput
[Figure: One incoming GET Products Page request to the web server causes 2 database requests and 1 blob storage request. The database (throughput 100 requests/s) sustains 50 web server requests/s; blob storage (throughput 500 requests/s) sustains 500 web server requests/s. Resulting service throughput: 50 requests/s.]
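The bottleneck arithmetic from the figure can be reproduced with a short sketch. The function name and data layout are assumptions made for illustration; the numbers are the ones from the slide (2 database calls and 1 storage call per page, component throughputs of 100 and 500 requests/s).

```python
def service_throughput(components):
    """components maps name -> (component throughput in requests/s, calls per user request)."""
    per_component = {
        name: capacity / calls_per_request
        for name, (capacity, calls_per_request) in components.items()
    }
    # The slowest component, after accounting for fan-out, limits the whole service.
    bottleneck = min(per_component, key=per_component.get)
    return per_component[bottleneck], bottleneck, per_component


limit, bottleneck, detail = service_throughput({
    "database": (100, 2),
    "blob_storage": (500, 1),
})
print(limit, bottleneck)  # 50.0 database
```

The database is the bottleneck here even though its raw throughput is only five times lower than that of blob storage, because each page view calls it twice.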
Response Time
 Response time is a measure of the
latency an application exhibits in
processing a business transaction
 Is determined by
• Communication latency
• Request processing time
[Figure: Sequence diagram for GET Products Page across web server, database and blob storage. The web server loads the product from the DB, loads categories from the DB, and loads precomputed product recommendations from blob storage; each call ends with a return. Timings shown: 170 ms, 255 ms, 380 ms, 900 ms.]
Scalability
 Scalability characterizes how well a solution to some problem will
work when the size of the problem increases
 Be aware of the difference:
• Performance-related attributes: specify application behavior for a static
instance of the cloud environment configuration
• Elasticity: the degree to which a system is able to adapt to workload
changes by provisioning and deprovisioning resources in an autonomic
manner
 If the application is not scalable, it cannot effectively utilize the
potentially unlimited amount of processing resources that the cloud
platform offers
Operation costs
 It is necessary to precisely evaluate all operation costs
 Problem:
• Service is billed based on real resource usage – how to effectively predict
operation costs?
Development costs
 Multiple services offer similar functionality, but the differences are
critically important
• Storage services differ in offered functionality (SQL vs. NoSQL database)
• Differences in scalability
 Integration costs depend on the functionality, available libraries
and tools
 For instance:
• Azure Storage vs. Azure SQL Database vs. DocumentDB
Case study: Elections in the USA
 This case study was presented at TechEd 2014 by Azure CTO Mark
Russinovich
 Video recording of the session:
http://channel9.msdn.com/Events/TechEd/Europe/2014/CDP-B337
 System for presenting results of US elections
Service architecture
 Election results are
uploaded to Azure
Storage
 Worker role continuously
processes the results
 Processed results are
stored in relational
database
 Web servers load results
from DB
[Figure: Service architecture in four layers. Frontend layer: Web Role virtual servers 1..n behind the built-in Azure Web Role load balancer. Storage layer: Azure SQL Database (Main DB, Results DB). Processing layer: Worker Role virtual servers. Staging layer: Azure Blob Storage containers 1..n, fed by precincts 1..n.]
Expected load
 Every user request results in 10 database requests
 Expected service load: see the scenarios below
 Problem:
• Azure SQL throughput is limited to 1,000 requests per second

Expected load scenarios
Scenario    Expected page views   Time window (hrs)   Page views/sec   DB calls/sec (10 x page views/sec)
Average     10,000,000            4                   694              6,944
Peak hour   6,000,000             1                   1,667            16,667
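The expected-load figures follow directly from the page views, the time window and the 10 database calls per request. A minimal sketch of that arithmetic (function and constant names are illustrative):

```python
DB_CALLS_PER_PAGE_VIEW = 10   # from the slide: every user request -> 10 database requests
AZURE_SQL_LIMIT = 1000        # requests/s, the limit quoted on the slide


def db_calls_per_second(page_views, window_hours):
    """Return (page views per second, database calls per second)."""
    page_views_per_second = page_views / (window_hours * 3600)
    return page_views_per_second, page_views_per_second * DB_CALLS_PER_PAGE_VIEW


scenarios = {"average": (10_000_000, 4), "peak_hour": (6_000_000, 1)}
for name, (views, hours) in scenarios.items():
    pvs, calls = db_calls_per_second(views, hours)
    print(f"{name}: {pvs:.0f} page views/s, {calls:.0f} DB calls/s, "
          f"{calls / AZURE_SQL_LIMIT:.1f}x the database limit")
```

Even the average scenario exceeds the quoted database limit several times over, which is what motivates the architecture change on the following slides.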
Update of application’s architecture
1. Addition of a cache layer composed of worker servers hosting an
in-memory cache
 Throughput of the storage increased by an order of magnitude
[Figure: Updated architecture. Frontend layer: Web Role virtual servers behind the built-in Azure Web Role load balancer. Storage layer: in-memory cache virtual servers behind a cache requests load balancer, and Azure SQL Database (Main DB, Results DB).]
The rest of the architecture remains the same
Real load

Allocated capacity with database
Time          Actual page views   Time window (sec)   Page views/sec   Calls/sec   Difference (calls/sec)   Requests served
8pm+10 secs   448932              10                  44893            448932      -447932                  0.22%
8pm+30 secs   206925              20                  10346            103463      -102463                  0.97%
8:01 pm       171231              30                  5708             57077       -56077                   1.75%
8:03 pm       37835               120                 3153             31529       -30529                   3.17%
8:10 pm       494423              420                 1177             11772       -10772                   8.49%
8:30 pm       416379              1200                347              3470        -2470                    28.82%

Allocated capacity with in-memory cache
Time          Actual page views   Time window (sec)   Page views/sec   Calls/sec   Difference (calls/sec)   Requests served
8pm+10 secs   448932              10                  44893            448932      -288932                  35.64%
8pm+30 secs   206925              20                  10346            103463      56537                    100.00%
8:01 pm       171231              30                  5708             57077       102923                   100.00%
8:03 pm       37835               120                 3153             31529       128471                   100.00%
8:10 pm       494423              420                 1177             11772       148228                   100.00%
8:30 pm       416379              1200                347              3470        156530                   100.00%
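The "requests served" column can be reproduced as allocated capacity divided by the observed calls per second, capped at 100%. In the sketch below the database capacity of 1,000 calls/s is the limit quoted earlier in the deck; the cache capacity of 160,000 calls/s is not stated on the slide and is inferred here from the table (calls/sec plus the difference column), so treat it as an assumption.

```python
def served_share(calls_per_second, capacity):
    """Fraction of calls that fit into the allocated capacity, capped at 100%."""
    return min(1.0, capacity / calls_per_second)


DB_CAPACITY = 1_000        # calls/s, Azure SQL limit quoted earlier
CACHE_CAPACITY = 160_000   # calls/s, inferred from the cache table (assumption)

observed_calls = {"8pm+10 secs": 448_932, "8pm+30 secs": 103_463, "8:30 pm": 3_470}
for label, calls in observed_calls.items():
    print(label,
          f"database={served_share(calls, DB_CAPACITY):.2%}",
          f"cache={served_share(calls, CACHE_CAPACITY):.2%}")
```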
Case study 2: Shopping at Amazon
 Do you know how a product page is rendered in the Amazon
e-shop?
 The product page, including recommended products for a specific user,
is pre-rendered as a fragment and stored in S3 storage
 The page transmitted to the user is just a simple composition of
pre-rendered fragments
Indirect dependencies
 Compute operations are done asynchronously, independently of the
synchronous processing of the user request
 A compute operation is stored in a scalable queue service, and worker
processes load task definitions from the queue
[Figure: A load balancer distributes incoming requests across Web Servers 1-3, each with its own incoming requests queue. Queued request durations: 5 s and 5 s on Web Server 1, 2.5 s and 2.5 s on Web Server 2, 2 s and 2 s on Web Server 3. Infrastructure state 5 seconds later: Web Server 1 still holds a 5 s request, while Web Servers 2 and 3 are idle.]
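A shared queue avoids the situation where one server still has work queued while others sit idle. The sketch below uses Python's in-process `queue.Queue` and threads as a stand-in for a cloud queue service and worker instances; `run_workers` and its parameters are hypothetical names for illustration.

```python
import queue
import threading


def run_workers(tasks, worker_count, handler):
    """Process tasks from one shared queue with a configurable number of workers."""
    task_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()
            if task is None:          # sentinel: no more work for this worker
                task_queue.task_done()
                return
            outcome = handler(task)
            with results_lock:
                results.append((task, outcome))
            task_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for task in tasks:                # the web tier only enqueues and returns
        task_queue.put(task)
    for _ in threads:                 # one sentinel per worker
        task_queue.put(None)
    task_queue.join()
    for thread in threads:
        thread.join()
    return results
```

Because every worker pulls its next task from the same queue, a free worker always picks up pending work, and the number of workers can be changed without touching the producers.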
Case study 3: Smart thermostats
[Figure: Initial architecture spanning Azure and On-Prem zones. Components: HVAC, thermostats, building router, gateway, Web Role (synchronous HTTP), Web Role serving the mobile user interface, Weather WCF Service, Azure Queue, Email Queue, SMTP Relay.]
Addition of asynchronous dependencies
[Figure: Revised architecture spanning Azure and On-Prem zones. Components: HVAC, thermostats, building router, gateway, Web Role (synchronous HTTP), Web Role serving the mobile user interface, Service Bus, Worker Role (write-leveling), Email Queue, SMTP Relay, Weather WCF Service.]
Case study 3: Conclusion
 Initial tests failed with 35,000 connected thermostats
 Goal was 100,000 (150,000) thermostats
 Main issues:
• Synchronous HTTP handler
• Row-level updates of the DB (instead of batch updates)
• Database tuning
• Queue scalability issues, resolved by applying partitioning
[Figure: Two variants of the message processor application running on a Worker Role virtual server and reading from an Azure Service Bus Queue. Before: synchronous message retrieval and synchronous message processing. After: batch message retrieval and asynchronous message processing.]
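The change from synchronous, one-at-a-time handling to batch retrieval with asynchronous processing can be sketched with `asyncio`. An in-memory `asyncio.Queue` stands in for the Service Bus queue here, and `receive_batch`, `process_all` and `store_reading` are hypothetical names; this is not the Service Bus client API.

```python
import asyncio


async def receive_batch(source: asyncio.Queue, max_batch: int) -> list:
    """Take one message (waiting if needed), then drain up to max_batch without waiting."""
    batch = [await source.get()]
    while len(batch) < max_batch and not source.empty():
        batch.append(source.get_nowait())
    return batch


async def process_all(messages, handler, max_batch=32):
    """Retrieve messages in batches and process each batch concurrently."""
    source = asyncio.Queue()
    for message in messages:
        source.put_nowait(message)
    results = []
    while not source.empty():
        batch = await receive_batch(source, max_batch)
        # gather() runs the handlers concurrently and keeps the input order.
        results.extend(await asyncio.gather(*(handler(m) for m in batch)))
    return results


async def store_reading(reading):
    """Hypothetical handler standing in for an I/O-bound database write."""
    await asyncio.sleep(0.01)
    return reading["id"]
```

With 100 messages and a batch size of 25, the waits inside each batch overlap instead of adding up, which is the effect the "after" variant in the figure aims for.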
Cloud-specific architectural tactics
1. Multi-tiered Storage Tactic
2. Indirect Dependency Tactic
3. Results Pre-computation Tactic
4. Component Co-deployment Tactic
1. Multi-tiered Storage Tactic
 Goal: Combine various storage services to exploit their advantages and
mitigate their weaknesses
 Relational database
+ Transaction processing, integrity constraints, complex querying
– Limited scalability, expensive operation
 NoSQL database (Azure Table Storage)
+ Good scalability, cheap operation
– Complex key design
– Data can be queried only by a combination of partition and row key
 In-memory cache (Redis Cache)
+ Extremely scalable, high throughput, very low response time
– Key/value store only, very expensive
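One common way to combine the tiers is a read-through lookup: ask the fast in-memory tier first and fall back to the durable tier on a miss. The sketch below uses plain dictionaries as stand-ins for both tiers; `TieredStore` is a hypothetical class for illustration, not part of any Azure or Redis client library.

```python
class TieredStore:
    """Read-through lookup: in-memory cache first, then the slower durable store."""

    def __init__(self, durable_store):
        self._cache = {}                # stand-in for an in-memory cache tier
        self._durable = durable_store   # stand-in for a NoSQL or relational tier
        self.durable_reads = 0          # counts how often the expensive tier is hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.durable_reads += 1
        value = self._durable[key]      # raises KeyError if the key is unknown
        self._cache[key] = value
        return value

    def put(self, key, value):
        self._durable[key] = value      # write to the durable tier first
        self._cache.pop(key, None)      # invalidate so the next read refreshes
```

Repeated reads of the same key reach the durable tier only once, which is where the throughput and cost benefit of the cache tier comes from.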
Storage Comparison in Microsoft Azure
[Figure: Comparison of Database, Redis and Azure Storage. Azure Storage: +$0.0036 per 100,000 transactions.]
2. Indirect Dependency Tactic
 User requests are not processed synchronously by the web server
 Task requests are stored in a scalable queue service
 The number of running workers is variable (cost effective)
 Results are stored in a scalable storage service
 Problem: How to notify the user about changes or errors in
processing?
[Figure: Indirect dependency architecture. Frontend layer: Web Role virtual servers 1..n behind the built-in Azure Web Role load balancer. Azure Storage Queue with a high priority queue and a low priority queue. Processing layer: Worker Role virtual servers 1..n performing asynchronous operations. Storage layer: Azure SQL DB and Azure Blob and Table Storage.]
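The high and low priority queues in the figure suggest a worker loop that always prefers high priority tasks. A minimal sketch under that assumption, using two in-process `queue.Queue` objects as stand-ins for the cloud queues (`next_task` and `drain` are hypothetical names):

```python
import queue


def next_task(high_priority: queue.Queue, low_priority: queue.Queue):
    """Return the next task, always preferring the high priority queue."""
    for source in (high_priority, low_priority):
        try:
            return source.get_nowait()
        except queue.Empty:
            continue
    return None  # both queues are empty


def drain(high_priority, low_priority, handler):
    """Process everything currently queued, high priority tasks first."""
    processed = []
    while (task := next_task(high_priority, low_priority)) is not None:
        processed.append(handler(task))
    return processed
```

Strict preference is the simplest policy; under sustained high priority load it can starve the low priority queue, so a real deployment may need a fairness rule on top.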
3. Results Pre-computation Tactic
 Effective combination of the previously mentioned tactics:
• Multi-tiered Storage Tactic: provides cost-effective scalable storage
• Indirect Dependency Tactic: provides spare compute resources
 Goal: Increase the cost effectiveness of reserved compute resources
 Workers are billed based on their uptime, independently of their
CPU load
 Results are stored in a form that does not require additional
processing
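The idea can be sketched as two functions: a worker-side job that renders ready-to-serve fragments ahead of time, and a request handler that only composes them. The dictionary stands in for a scalable storage service, and the function names, key scheme and "recommendation" logic are placeholders invented for the example.

```python
def precompute_recommendations(purchases, fragment_store):
    """Worker-side job: render one ready-to-serve fragment per user."""
    for user, items in purchases.items():
        top = sorted(set(items))[:3]  # placeholder for real recommendation logic
        items_html = "".join(f"<li>{item}</li>" for item in top)
        fragment_store[f"recs/{user}"] = f"<ul>{items_html}</ul>"


def render_page(user, fragment_store):
    """Web-side request: compose stored fragments, no computation on the hot path."""
    recs = fragment_store.get(f"recs/{user}", "<ul></ul>")
    return f"<html><body>{recs}</body></html>"
```

The expensive step runs on workers that are billed for uptime anyway, while the request path is reduced to a key lookup and string composition.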
4. Component Co-deployment Tactic
 Goal: Minimize inter-role communication latency
 In the cloud, communication latency between different services may
be significantly higher than in an on-premise environment
 It is necessary to minimize the number of service calls
 Apply Affinity Groups
[Figure: Two deployment variants. Separate deployment: Web Role virtual servers (web servers), Worker Role virtual servers (app servers) and an IaaS SQL Server. Co-deployment: Web Role virtual servers each hosting web + app server, and an IaaS SQL Server.]