Models of Streaming Applications Filip Nálepa 2 Outline ● Motivation ● Components of a streaming application ● Operator placement problem ● Performance models 3 Motivation ● Infinite sequence of data ● Processing data in motion ● Scenarios – Event detection – Image stream processing – Surveillance video analysis Staticflickr.com 4 Streaming Application as a Graph ● Node – operator/task ● Edge – stream ● Stream – infinite sequence of data items/events 5 Operator Placement ● Assignment of operators to computational resources ● Metrics: throughput, latency CPU 1 CPU 2 CPU 3 6 Needed Models ● Model of operator placement problem – Purpose: place operators on resources ● Performance model – Purpose: retrieve metrics of the system OK FINISH Improve 7 Models Use Cases ● Initial operator placement – Placement and measurement ● Dynamic adaption to changes – Change/problem detection – proactive/reactive – New placement and verification 8 Model of Operator Placement Problem ● Computational resources, underlying network ● Streaming graph, operators, streams ● Optimization criteria ● Purpose: place operators on resources 9 Performance Analysis ● Accuracy vs efficiency ● Simulation and experiments ● Formal methods 10 Performance model Processing model Load model (input) Load model (output) Resource model (input) Resource model (output) 11 Standard Event Models ● Periodic ● Periodic with jitter ● Burst – period, maximal number of items, minimal distance between items ● Sporadic – minimal distance between items ● Advantages: simple, easy to analyze ● Disadvantages: too restrictive, unrealistic assumptions 12 Real-Time Calculus ● Arrival function α(Δ) – maximal number of data items that can arrive in any time interval of length Δ ● Service function β(Δ) – minimal number of data items that can be processed in any time interval of length Δ 13 Arrival and Service Function # data items Δ Arrival function Service function Latency Backlog 14 Real-Time Calculus ● Arrival function α(Δ) – maximal number of data items arrived in any time interval of length Δ ● Service function β(Δ) – minimal number of data items that can be processed in any time interval of length Δ ● Analysis based on algebraic computations ● Advantage: efficient ● Disadvantage: no state dependencies 15 Event Count Automata [0, 4] [0, 4] X1 ≤ 4 ∧ x2 ≤ 7; x2 := 0 X2 ≤ 4 ∧ x1 ≤ 7; x1 := 0 [1, 3] [1, 3] y1 ≥ 1 ∧ y2 ≥ 5; y2 := 0 y2 ≥ 1 ∧ y1 ≥ 5; y1 := 0 Buffer Arrival Service 16 Event Count Automata ● Arrival and service function represented as automata ● Automata connected by buffers ● Network of automata described as a Colored Petri Net for analysis ● Advantage: very accurate ● Disadvantage: state-space explosion → inneficient 17 Performance Analysis Summary ● Simulation – easy to use, no guarantees ● Standard event models – simple, not accurate ● Real-time calculus – efficient, captures burstiness, no state dependencies ● Event count automata – accurate, not efficient ● Combinations – tradeoffs 18 Summary ● Streaming application – directed graph of operators and streams ● Operator placement problem ● Performance models Thank you for your attention.