ARCHITECTURES OF REAL-WORLD
MACHINE LEARNING
SYSTEMS IN THE CLOUD
Lukas Grolig
• WE WILL COVER SOME BASICS
• IN A CASE STUDY WE WILL SHOW THE PROCESS OF
DEVELOPING AN ML SYSTEM
• WE WILL DISCUSS SOME PROCESSES AND GOTCHAS
STRUCTURE OF THIS TALK
• Every big company wants to apply ML
• No one wants to stay behind
• They don’t know where to start
• Some have data, others will start collecting it
HYPE AROUND MACHINE
LEARNING
WHAT CAN THEY SOLVE?
• Predict sales - units / profits
• Motivate users to buy something more
• Segment clients
• Find patterns in user workflows and move users to the next step
• Automatic form handling
• Chatbots
• …
So you want to do
machine learning?
FIRST: WHAT ARE THE NECESSARY
SKILLS?
• You must know the technologies used in the client’s systems
(Java, .NET)
• DB knowledge is a must (MS SQL, Oracle, Postgres,
Mongo, Exadata, Greenplum, Vertica)
• Python & JavaScript
• Infrastructure (Kubernetes plus any cloud of your choice)
HOW WE STARTED
• Oriflame wanted to do machine learning
• There were no use cases, so we suggested
what to look at
• As a first step we wanted to predict sales
REQUIREMENTS
• Especially in ML, your client has only a rough idea of what they want when they
approach you. You don’t receive any requirements.
• Communication is key. Discuss what options there are, where you can
start, and describe all the steps. This way you can get a rough basic plan.
• Always set the right prices. In the beginning, get a flexible budget for exploration or offer
exploration as a service.
• When at least part of the system is clear, formalise the requirements and agree on
a budget for that part.
• Never promise high accuracy!
STORY CONTINUES
• We were looking for a contact person who had
some data for our case.
• Data was available, so we started building models.
• We also extended the scope to automatic
catalogue planning.
COMMUNICATION IN FIRST
STAGE
• This is the most critical part. You have to build
your client’s confidence in you.
• Get direct contacts. It is better to get information
directly than through a mediator.
PROTOTYPE FAST
• Mock as many things as possible. Upload CSVs to blob storage. Make HTTP
calls to storage.
• Use Python to prototype things fast.
• Before you get into ML, explore the data first (you have probably heard of
histograms, median, standard deviation). Use some automatic visualisations, such as heatmaps
using SOM (self-organising maps), and do clustering. You must understand the data you
have.
• Sell results to your client. Even those visualisations can have high value.
Always describe precisely the business case you can build on what you
have.
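A minimal sketch of the first look at the data mentioned above, using only the Python standard library. The `weekly_units` values are made up for illustration; in a prototype they would come from the CSVs uploaded to blob storage.

```python
import statistics
from collections import Counter

# Hypothetical weekly unit sales for one product (illustrative values only).
weekly_units = [12, 15, 14, 90, 13, 16, 15, 14, 17, 13]


def summarize(values):
    """Return the basic statistics worth checking before any modelling."""
    return {
        "count": len(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def histogram(values, bin_width):
    """Count values per bin; keys are the lower edge of each bin."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))


summary = summarize(weekly_units)
bins = histogram(weekly_units, bin_width=10)

print(summary)
for lower_edge, count in bins.items():
    print(f"{lower_edge:>3}-{lower_edge + 9:<3} {'#' * count}")
```

A mean far above the median, as in this made-up series, is a quick hint that an outlier or a data error deserves a closer look before any model is trained.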
START WITH FIRST MODELS
• In the beginning, never develop a custom ML
model.
• Often linear regression is enough. If you
don’t think it will be enough, use AutoML.
• Important note: double-check that you don’t measure
accuracy on training data. High percentages at this
stage are suspicious.
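A sketch of a linear regression baseline that is scored on held-out rows rather than on its own training data. The data is synthetic and the 80/20 split is an assumption chosen for illustration, not a figure from the project.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Average absolute difference between predictions and true values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Synthetic data: a linear trend plus Gaussian noise.
rng = random.Random(42)
xs = [float(i) for i in range(100)]
ys = [3.0 * x + 20.0 + rng.gauss(0, 5) for x in xs]

# Shuffle, then hold out 20 rows that the model never sees while fitting.
indices = list(range(len(xs)))
rng.shuffle(indices)
test_idx, train_idx = indices[:20], indices[20:]

train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
test_x = [xs[i] for i in test_idx]
test_y = [ys[i] for i in test_idx]

model = fit_line(train_x, train_y)
train_mae = mean_abs_error(model, train_x, train_y)
test_mae = mean_abs_error(model, test_x, test_y)
print(f"train MAE {train_mae:.2f}, hold-out MAE {test_mae:.2f}")
```

Only the hold-out number says anything about how the model behaves on data it has not seen.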
SIDE NOTE
• In most business cases you don’t do
image recognition.
• Regression does not reach such high accuracies.
SO WE STARTED BUILDING
SYSTEM
• We had a service that forecasted catalogue sales.
• We started catalogue planning based on a genetic
algorithm.
• Note: there are algorithms like differential evolution and
particle swarm optimisation, but they are mainly tested
on continuous spaces. Often your space is discrete and
values are not ordinal.
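A toy genetic algorithm over a discrete, non-ordinal space, to illustrate the note above. The categories, the fitness function and all parameters are hypothetical; the real catalogue planner used forecasted sales as its objective.

```python
import random

# Hypothetical product categories; they have no natural order.
CATEGORIES = ["skincare", "makeup", "fragrance", "wellness"]
SLOTS = 8


def fitness(chromosome):
    """Toy score: reward variety and penalise identical neighbouring slots."""
    variety = len(set(chromosome))
    repeats = sum(a == b for a, b in zip(chromosome, chromosome[1:]))
    return variety - repeats


def mutate(chromosome, rng, rate=0.1):
    # Categories are not ordinal, so a gene is resampled instead of nudged up or down.
    return [rng.choice(CATEGORIES) if rng.random() < rate else g for g in chromosome]


def crossover(a, b, rng):
    """One-point crossover: head of a, tail of b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(generations=50, population_size=30, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.choice(CATEGORIES) for _ in range(SLOTS)] for _ in range(population_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]  # truncation selection keeps the elite
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

Differential evolution and particle swarm optimisation move candidates by arithmetic on their coordinates, which has no meaning for values like these; resampling and crossover do not need an order.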
TRY DIFFERENT APPROACHES
• Forecasting
• We tried a single model for everything, a different model per
country, and multiple models per country.
• Catalogue Planning
• In genetic algorithms, a good approach is to use coevolution
algorithms. They work well for many dimensions. Some
implementations like C3 work on paper (developed on
continuous space) but reality is harder.
JUPYTER NOTEBOOK IS YOUR
BEST FRIEND
APPROACH TO ML
FINE-TUNING
• Different algorithms differ in accuracy by only a few
percentage points. Rather focus on getting more data
columns, modelling relations, or transforming the data. Often
applying a logarithmic or exponential function
will give you much more than trying different
algorithms.
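A small illustration of why a logarithmic transform can matter more than the choice of algorithm. The data is synthetic (sales growing multiplicatively with a hypothetical discount level); the same straight-line model is scored on the raw and on the log-transformed target.

```python
import math


def r_squared(xs, ys):
    """R^2 of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical data: sales grow exponentially with the discount level.
discount = [float(d) for d in range(0, 50, 5)]
sales = [100.0 * math.exp(0.06 * d) for d in discount]

raw_fit = r_squared(discount, sales)
log_fit = r_squared(discount, [math.log(s) for s in sales])
print(f"R^2 on raw sales: {raw_fit:.3f}, on log(sales): {log_fit:.3f}")
```

The model did not change between the two runs, only the representation of the target did.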
APPROACH TO ML
FINE-TUNING
• In optimisation problems, research papers are only a
starting point. You have to work with a discrete
space. You are looking for ways to make
ordinal values from categorical attributes.
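One possible way to derive ordinal values from a categorical attribute is to rank the categories by the mean of a target variable. This is an assumption for illustration; the talk does not say which encoding was used, and `page_type` and `units_sold` are made-up names and values.

```python
from collections import defaultdict


def ordinal_by_mean_target(categories, targets):
    """Map each category to its rank when categories are sorted by mean target."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    ranked = sorted(sums, key=lambda c: sums[c] / counts[c])
    return {category: rank for rank, category in enumerate(ranked)}


# Hypothetical observations: where a product was placed and how it sold.
page_type = ["cover", "inner", "back", "inner", "cover", "back"]
units_sold = [900.0, 200.0, 500.0, 300.0, 1100.0, 400.0]

encoding = ordinal_by_mean_target(page_type, units_sold)
print(encoding)
```

With such an encoding, "one step up" means "a category that historically sold more", which gives optimisers designed for ordered spaces something to work with. The ranks should be computed on training data only, otherwise the target leaks into the features.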
STORY CONTINUES
• We started replacing mocked parts with
connections to real systems. This sometimes
involves creating tunnels between cloud
and on-premise systems.
• The forecasting payload for a simplified
version of a catalogue was 1.5 MB. A population of 100
chromosomes takes time to forecast.
PYTHON IS SINGLE
THREADED
• Python becomes a pain point at a later stage. You have to start doing
multithreading.
• Option 1: a parallel map function spawning processes.
• Option 2: use Celery plus Redis
• Option 3: migrate to Python 3.7 and use async/await
• Option 4: you can spawn many Kubernetes pods.
• Does not work: lambda functions - startup is too slow, calculation is slow,
response times are high. We easily took down Azure Functions; AWS works so-so.
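A minimal sketch of Option 1, a parallel map over worker processes, using `concurrent.futures` from the standard library. The `forecast` function is a hypothetical stand-in for an expensive model call, and the worker count is arbitrary.

```python
from concurrent.futures import ProcessPoolExecutor


def forecast(chromosome):
    """Stand-in for an expensive model evaluation (hypothetical)."""
    return sum(gene * gene for gene in chromosome)


def parallel_map(fn, items, executor_cls=ProcessPoolExecutor, workers=4):
    """Apply fn to every item using a pool of workers, preserving input order."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(fn, items))


if __name__ == "__main__":
    # The guard is required on platforms that start workers by re-importing this module.
    population = [[i, i + 1, i + 2] for i in range(100)]
    scores = parallel_map(forecast, population)
    print(scores[:3])
```

Processes sidestep the global interpreter lock, but every argument and result is pickled and copied between processes, so with payloads in the megabyte range the transfer cost can eat much of the gain.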
YOU HAVE TO START SCALING
CLUSTERS AND OPTIMISE
• In Azure, a cluster of DS2_v2 machines is pricey. 10 machines that
handled the running system in reasonable time (getting a catalogue
in tens of minutes) cost over 3000 euro per month.
• Forecasting training times were in tens of hours. Algorithms
like LSTM don’t work well on GPU. Use GRU or TNN or
XGBoost.
• Evaluation of a bigger ML model takes 1 s or more. Account for
that.
START BUILDING REAL
SYSTEM
• You know the requirements.
• Define technologies - Python is not the best choice at this stage.
• Connect to live systems = replace mocks. Your mocks
probably have some rough contract already. Time to refine it.
• Learn how to set up VPN tunnels. Never forget to cache or
store data on your side (pull it once per day).
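A sketch of the "cache on your side" advice: a small wrapper that calls a slow upstream fetch at most once per day and serves the stored copy otherwise. `DailyCache` and `fetch_from_on_premise` are hypothetical names; a real system would more likely persist the data in a database or blob storage than in memory.

```python
import time


class DailyCache:
    """Keep the last fetched value and refresh it at most once per max_age_seconds."""

    def __init__(self, fetch, max_age_seconds=24 * 60 * 60, clock=time.time):
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._max_age:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


calls = []


def fetch_from_on_premise():
    """Stand-in for a slow query through the VPN tunnel (hypothetical)."""
    calls.append(1)
    return {"SKU-0001": 9.99}


prices = DailyCache(fetch_from_on_premise)
prices.get()
prices.get()
print("upstream calls:", len(calls))
```

The injectable `clock` parameter is there only to make the expiry logic testable without waiting a day.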
RESPONSES TAKE A LOT OF
TIME
• Azure or AWS terminates your connection when you
don’t send a response within a given time.
• Build asynchronous APIs. Start the task and return HTTP 201
Created with the location of the result. Poll the location.
• Or use real-time communication like WebSockets. This
will lead to a custom protocol that is harder to use.
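A framework-free sketch of the asynchronous API pattern described above. The handlers return `(status, headers, body)` tuples instead of real HTTP responses, and the `/tasks/<id>` paths and `plan_catalogue` are hypothetical. The initial 201 Created follows the slide; many APIs answer 202 Accepted at that point instead.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
tasks = {}  # task id -> Future; a real service would persist this


def plan_catalogue(payload):
    """Stand-in for the long-running computation (hypothetical)."""
    return {"pages": sorted(payload["products"])}


def start_task(payload):
    """Handler for POST /tasks: start the work, answer at once with the result location."""
    task_id = str(uuid.uuid4())
    tasks[task_id] = executor.submit(plan_catalogue, payload)
    return 201, {"Location": f"/tasks/{task_id}"}, None


def get_task(task_id):
    """Handler for GET /tasks/<id>: the client polls this until the result is ready."""
    future = tasks.get(task_id)
    if future is None:
        return 404, {}, None
    if not future.done():
        return 202, {"Retry-After": "5"}, None
    return 200, {}, future.result()


status, headers, _ = start_task({"products": ["b", "a"]})
task_id = headers["Location"].rsplit("/", 1)[-1]
tasks[task_id].result()  # the demo simply waits; a real client would poll instead
print(status, get_task(task_id))
```

Because every request returns quickly, the connection timeouts of the cloud load balancer no longer depend on how long the computation takes.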
OPTIMIZE TRANSFERS
• It is easy to cause traffic spikes of hundreds of MB.
• Start gzipping requests (not only responses!)
• Move to a binary protocol if possible. WebSockets
or gRPC handle those situations better than HTTP.
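A sketch of gzipping a request body with the standard library. The payload is made up, and the receiving service is assumed to understand `Content-Encoding: gzip` on requests, which many servers do not do by default.

```python
import gzip
import json


def compress_request(payload):
    """Serialise payload to JSON and gzip it; returns (body, headers)."""
    raw = json.dumps(payload).encode("utf-8")
    body = gzip.compress(raw)
    headers = {"Content-Type": "application/json", "Content-Encoding": "gzip"}
    return body, headers


def decompress_request(body, headers):
    """Server side: undo the compression when the header says so."""
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))


# Hypothetical catalogue payload with a lot of repetition, as JSON usually has.
payload = {
    "catalogue": [{"page": i, "product": "SKU-0001", "price": 9.99} for i in range(5000)]
}
body, headers = compress_request(payload)
raw_size = len(json.dumps(payload).encode("utf-8"))
print(f"{raw_size} bytes raw, {len(body)} bytes gzipped")
```

Repetitive JSON like this compresses very well, which is why compressing the request direction pays off when the client uploads whole catalogues.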
BUILD ML-OPS
• In machine learning you must also create pipelines.
• When new data arrives, start training. Evaluate
the new model and the historical models.
• Deploy the appropriate model to the production system.
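A sketch of the evaluation step of such a pipeline: the newly trained model and the historical ones are scored on the same fresh data, and the best one is selected for deployment. The models are plain functions and the names `model-v1` to `model-v3` are hypothetical; mean absolute error is an assumed metric.

```python
def mean_abs_error(model, rows):
    """Average absolute error of model over (input, expected) pairs."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


def select_model(candidates, validation_rows):
    """Score every candidate on the same data; return the best name and all scores."""
    scores = {
        name: mean_abs_error(model, validation_rows) for name, model in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical models: two historical ones and one trained on the newest data.
historical = {"model-v1": lambda x: 2.0 * x, "model-v2": lambda x: 2.5 * x + 1.0}
newly_trained = {"model-v3": lambda x: 3.0 * x}
latest_data = [(1.0, 3.1), (2.0, 5.9), (3.0, 9.2)]

best_name, scores = select_model({**historical, **newly_trained}, latest_data)
print("deploy:", best_name, scores)
```

Including the historical models in the comparison guards against a new model that is worse than what is already running in production.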
CREATE FRONTEND
• We suggest building an Excel plugin first. It is written in React
and can be easily converted to a standalone site or widget.
• Excel will give users many functions that you would otherwise have to
develop in the application. They will provide feedback faster.
• Notes:
• You have to know JavaScript and React
• Debugging Excel add-ins (especially on Mac) is a brutal pain.
HOW IT CONTINUED
• Because the number of constraints on catalogue planning grew,
Oriflame wanted an automated way to handle them.
• Getting patterns from data was hard. It was easier to do
anomaly detection, so we integrated that into the fitness function.
• We started taking on side projects like OCR of customer IDs
or passports, sentiment analysis of messages on FB or
support chat…
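A sketch of how an anomaly score can be folded into a fitness function as a penalty. The talk does not say which anomaly detection method was used; a simple z-score against historical catalogues is assumed here, and the feature (products per page), threshold and weight are all hypothetical.

```python
import statistics


def anomaly_penalty(value, history, threshold=3.0):
    """Grow once value lies more than threshold standard deviations from history."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    z_score = abs(value - mean) / stdev
    return max(0.0, z_score - threshold)


def fitness(forecasted_sales, products_on_page, historical_products_per_page, weight=100.0):
    """Forecasted sales minus a penalty for layouts unlike anything seen before."""
    penalty = sum(
        anomaly_penalty(n, historical_products_per_page) for n in products_on_page
    )
    return forecasted_sales - weight * penalty


# Hypothetical history: products per page in past catalogues.
history = [4, 5, 6, 5, 4, 6, 5, 5]
typical = fitness(1000.0, [5, 4, 6], history)
odd = fitness(1000.0, [5, 4, 20], history)
print(typical, odd)
```

This replaces hand-written constraints with "stay close to what past catalogues looked like", so the optimiser is steered away from layouts no planner would accept.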
DON’T FORGET TO PRESENT
RESULTS
• Present results and progress often.
• Always have the business case in mind and sell it again and again!
The customer will come back with additional projects.
• Even a 1 or 2 percent improvement will make your client happy.
• In optimisation, the number of constraints will grow with each presentation.
• Present results across the whole company. There will be other people who
have a business case for ML.
WELL, THAT’S IT FROM MY
SIDE.
ANY QUESTIONS?
THANK YOU AND HAVE A
NICE DAY