Machine Learning in ECG Data
Bc. Barbora Škrabalová, Bc. Ronald Luc, Bc. Martin Kozlovský,
Bc. Adam Žitňanský, Sunil Aryal
What does the data look like?
ECG data are a kind of time series. We distinguish four types based on the number of leads (recording channels derived from electrodes placed on the body):
single-lead,
2-lead,
4-lead,
12-lead.
In publicly available datasets, single-lead, 2-lead and 12-lead recordings are the most commonly used.
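Viewed as a multichannel time series, one recording is naturally stored as a 2D array with one row per lead and one column per time step. The sketch below assumes a 500 Hz sampling rate and a 10 s recording (illustrative values, not taken from any particular dataset) and fills the array with random noise as a placeholder for real voltages.

```python
import numpy as np

# Assumed recording parameters (illustrative, not tied to any particular dataset).
FS = 500            # sampling rate in Hz
DURATION_S = 10     # recording length in seconds
LEAD_NAMES = ["I", "II", "III", "aVR", "aVL", "aVF",
              "V1", "V2", "V3", "V4", "V5", "V6"]   # standard 12-lead set

n_samples = FS * DURATION_S
t = np.arange(n_samples) / FS   # time axis in seconds

# Placeholder voltages (mV): random noise stands in for a real recording.
rng = np.random.default_rng(0)
ecg = rng.normal(scale=0.05, size=(len(LEAD_NAMES), n_samples))

# One row per lead, one column per time step.
# A single-lead record would have shape (1, n_samples).
lead_ii = ecg[LEAD_NAMES.index("II")]
print(ecg.shape, lead_ii.shape)
```

Keeping leads on the first axis makes it easy to select a subset of leads, for example to train the same model on single-lead and 12-lead data.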
[Figure: Example of a 12-lead ECG signal, with the PP, RR, PR and QT intervals marked]
For ML, features derived from these signals are usually used, for example:
amplitude features such as the R peak (the averaged maximal amplitude of the signal),
time intervals between key parts of the signal (PP, RR, PR, QT).
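A minimal sketch of deriving such features, assuming a single-lead signal sampled at 250 Hz: R peaks are located with `scipy.signal.find_peaks`, and the R-peak amplitude and RR intervals are computed from them. The synthetic signal (one narrow Gaussian "R wave" per beat plus noise) and the detection thresholds are assumptions made for illustration; real ECGs need filtering and a proper QRS detector, and the PR and QT intervals additionally require delineating the P, QRS and T waves, which is not shown.

```python
import numpy as np
from scipy.signal import find_peaks

FS = 250          # assumed sampling rate in Hz
R_WIDTH_S = 0.01  # width (std) of the toy R wave in seconds


def synthetic_ecg(duration_s=10.0, rr_s=0.8, fs=FS, seed=0):
    """Toy single-lead signal: one Gaussian 'R wave' per beat plus noise.
    NOT a realistic ECG; it only has sharp R peaks at known times."""
    t = np.arange(int(duration_s * fs)) / fs
    beat_times = rr_s / 2 + rr_s * np.arange(int(duration_s / rr_s))
    signal = np.zeros_like(t)
    for bt in beat_times:
        signal += np.exp(-((t - bt) ** 2) / (2 * R_WIDTH_S ** 2))
    rng = np.random.default_rng(seed)
    return signal + rng.normal(scale=0.02, size=t.size)


def rr_features(signal, fs=FS):
    # R peaks: taller than 0.5 mV and at least 0.3 s apart (at most 200 bpm).
    peaks, props = find_peaks(signal, height=0.5, distance=int(0.3 * fs))
    rr = np.diff(peaks) / fs  # RR intervals in seconds
    return {
        "mean_r_amplitude": float(np.mean(props["peak_heights"])),
        "mean_rr_s": float(np.mean(rr)),
        "sdnn_s": float(np.std(rr)),  # variability of the RR intervals
        "heart_rate_bpm": float(60.0 / np.mean(rr)),
    }


print(rr_features(synthetic_ecg()))
```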
Another feature-extraction option is the Gabor filter, a linear filter used for texture analysis. Statistical features such as mean, variance, standard deviation, skewness, kurtosis, entropy and energy are then computed from its output.
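Gabor filters are most often used in 2D on images; the sketch below assumes a 1D variant applied directly to a single-lead signal. The kernel is a cosine carrier under a Gaussian envelope, and the statistics are computed on the filtered output. The carrier frequency, envelope width, histogram bin count and the choice of Shannon entropy over a histogram are all illustrative assumptions, not the setup of any specific paper.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def gabor_kernel_1d(fs, freq_hz=10.0, sigma_s=0.05, half_width_s=0.15):
    """Real 1D Gabor kernel: cosine carrier times a Gaussian envelope.
    Parameter defaults are illustrative assumptions."""
    n = int(half_width_s * fs)
    t = np.arange(-n, n + 1) / fs
    return np.exp(-t ** 2 / (2 * sigma_s ** 2)) * np.cos(2 * np.pi * freq_hz * t)


def gabor_features(signal, fs, bins=32, **kernel_kwargs):
    # mode="same" keeps the filtered signal aligned with the input.
    filtered = np.convolve(signal, gabor_kernel_1d(fs, **kernel_kwargs), mode="same")
    counts, _ = np.histogram(filtered, bins=bins)
    p = counts[counts > 0] / counts.sum()  # empirical amplitude distribution
    return {
        "mean": float(np.mean(filtered)),
        "variance": float(np.var(filtered)),
        "std": float(np.std(filtered)),
        "skewness": float(skew(filtered)),
        "kurtosis": float(kurtosis(filtered)),      # Fisher definition: normal -> 0
        "entropy": float(-np.sum(p * np.log2(p))),  # Shannon entropy in bits
        "energy": float(np.sum(filtered ** 2)),
    }


# Usage on a toy signal: a slow sine plus noise, sampled at an assumed 250 Hz.
fs = 250
t = np.arange(2500) / fs
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 1.2 * t) + 0.1 * rng.normal(size=t.size)
feats = gabor_features(signal, fs)
print(feats)
```

In practice a bank of kernels with different frequencies is often used, producing one set of statistics per kernel.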
Algorithms used
ML algorithms can be used for both preprocessing and classification, but we are interested in applying them to classification (or categorization).
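Popular deep learning methods include Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) such as Long Short-Term Memory (LSTM) networks, Bidirectional Recurrent Neural Networks (BRNN) and Multilayer Perceptrons (MLP). Gradient-boosted decision trees such as XGBoost, which are not a deep learning method, are also popular, as are ensembles of these models.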
In our project, we will use both deep learning methods and classical ML algorithms such as Random Forest, Gaussian Naive Bayes and k-Nearest Neighbours.
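A minimal sketch of comparing the classical algorithms named above (plus a small MLP) with scikit-learn. It assumes the ECG features have already been extracted into a table with one row per recording; here `make_classification` generates a synthetic stand-in for that table, so the printed accuracies say nothing about real ECG performance. Hyperparameters are arbitrary illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a feature table (rows = recordings, columns = features
# such as mean RR interval or R-peak amplitude); labels 0 = normal, 1 = abnormal.
X, y = make_classification(n_samples=600, n_features=12, n_informative=6, random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gaussian_nb": GaussianNB(),
    # Distance-based and gradient-trained models are sensitive to feature scale.
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "mlp": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
    ),
}

# Mean accuracy over 5 cross-validation folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Wrapping the scaler in a pipeline ensures it is fitted only on the training folds, so no information from the test fold leaks into preprocessing.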
Performance evaluation
Commonly used quantitative metrics include accuracy, area under the ROC curve (AUC), sensitivity (recall), specificity, precision (positive predictive value), negative predictive value and F-measure.
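For a binary task (disease vs. no disease), all of these metrics follow from the confusion matrix, except AUC, which needs the model's continuous scores rather than hard labels. The labels and scores below are a hand-made toy example, not results from any model.

```python
from sklearn.metrics import confusion_matrix, roc_auc_score


def binary_metrics(y_true, y_pred, y_score):
    # labels=[0, 1] fixes the layout [[TN, FP], [FN, TP]].
    # Note: no guard against empty classes (division by zero).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)  # recall, true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value (PPV)
    npv = tn / (tn + fn)          # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "auc": roc_auc_score(y_true, y_score),  # uses scores, not hard labels
    }


# Toy example: 4 diseased and 6 healthy recordings.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1, 0.3, 0.35, 0.6, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # threshold at 0.5
print(binary_metrics(y_true, y_pred, y_score))
```

With these numbers, accuracy is 0.8, sensitivity and precision are 0.75, specificity and NPV are 5/6, and AUC is 23/24.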
Nowadays, DL models can reach accuracy above 99% on some datasets, so it will be hard to beat them in our project. Judged by the metrics above, models can in some cases even outperform cardiologists.
For diseases other than arrhythmia, the models can suffer from low specificity. Even so, if their sensitivity is high, they can still be suitable for ruling the disease out.
Why is it not widely used in practice yet?
Models can perform well on a specific dataset (e.g. from one hospital) but poorly on another.
Despite their high accuracy, ML models are still seen as black boxes, which undermines trust in them and hinders their adoption in medical practice.
To address this, we can perform a saliency analysis. It should reveal which parts of the input influence the model's output the most (thus, which features are the most important).
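A minimal sketch of gradient-based saliency, the magnitude of the derivative of the predicted probability with respect to each input sample. To keep it self-contained, it assumes a logistic regression on raw toy signals, where the gradient has the closed form p(1 - p)w; for deep networks the same gradient would be obtained by automatic differentiation (e.g. in PyTorch or JAX), which is not shown. The toy data and the "diagnostic" window at samples 40 to 49 are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
N, LENGTH = 1000, 100
WINDOW = np.arange(40, 50)  # hypothetical "diagnostic" segment of the toy signal

# Toy data: class-1 signals carry an extra bump inside WINDOW, class-0 signals do not.
y = rng.integers(0, 2, size=N)
X = rng.normal(size=(N, LENGTH))
X[np.ix_(y == 1, WINDOW)] += 1.0

model = LogisticRegression(max_iter=1000).fit(X, y)


def saliency(model, x):
    """|d p(y=1 | x) / d x| for logistic regression, via the closed form p(1 - p) w."""
    p = model.predict_proba(x[None, :])[0, 1]
    return np.abs(p * (1 - p) * model.coef_[0])


s = saliency(model, X[0])
inside = s[WINDOW].mean()
outside = np.delete(s, WINDOW).mean()
print(f"mean saliency inside window: {inside:.4g}, outside: {outside:.4g}")
```

For a linear model the saliency map has the same shape for every input and is only rescaled by p(1 - p); for nonlinear models such as CNNs it varies from recording to recording, which is what makes per-patient explanations possible.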
Application
ML has many potential uses in medicine. From ECG signals, it can help predict heart attacks or classify different diseases, such as multiple forms of arrhythmia (atrial fibrillation, right/left bundle branch block, bradycardia, tachycardia, flutter), ischaemia, cardiomyopathy, valvulopathy or even Covid-19.
Resources will be available in the State-of-the-art paper of this project group.