Machine Learning in ECG Data

Bc. Barbora Škrabalová, Bc. Ronald Luc, Bc. Martin Kozlovský, Bc. Adam Žitňanský, Sunil Aryal

How does the data look?

The ECG data are a kind of time series. We distinguish four types based on the number of leads recorded from the electrodes placed on the body: single-lead, 2-lead, 4-lead, and 12-lead. In publicly available datasets, the single-, 2- and 12-lead types are usually used.

Figure: Example of 12-lead ECG signal. Marked intervals: PP interval, RR interval, PR interval, QT interval.

For ML usage, features derived from those graphs are usually used: R-peak features (averaged maximal amplitude of the signals, etc.) and time intervals between key parts of the signal. Another feature extraction possibility is to apply a Gabor filter, a linear filter used for texture analysis, and compute statistics of its output such as variance, skewness, mean, standard deviation, kurtosis, entropy, and energy.

Algorithms used

ML algorithms can be used for both preprocessing and classification, but we are interested in applying those methods for classification (or categorization). Popular Deep Learning methods are Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), Recurrent Neural Networks (RNN), Bidirectional Recurrent Neural Networks (BRNN), and Multilayer Perceptrons (MLP). Gradient-boosted trees (XGBoost) and ensembles of these models are also used. In our project, we will use both Deep Learning methods and classical ML algorithms like Random Forest, Gaussian Naive Bayes, k-Nearest Neighbours, etc.

Performance evaluation

Common quantitative metrics such as accuracy, area under the curve, specificity, precision, recall, F-measure, and positive/negative predictive value are used. Nowadays, DL models reach accuracy above 99%, so it will be hard to beat them in our project. Based on the metrics above, a model may in some cases perform even better than cardiologists themselves. For diseases other than arrhythmia, the models can suffer from low sensitivity. Even so, they can be suitable for excluding the illness from the patient's diagnosis.
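The metrics listed under Performance evaluation can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch of that derivation, not the evaluation code of this project: the names `BinaryMetrics` and `binary_metrics`, the 0/1 label encoding (1 = disease present), and the example labels are assumptions made for illustration. Area under the curve is left out because it needs predicted scores rather than hard labels.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    recall: float       # also called sensitivity
    specificity: float
    precision: float    # also called positive predictive value
    npv: float          # negative predictive value
    f_measure: float    # F1: harmonic mean of precision and recall


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a denominator is zero
    # (for example, when one class is absent from the data).
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true, y_pred):
    """Compute metrics from 0/1 labels (assumed encoding: 1 = disease present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, len(pairs)),
        recall=recall,
        specificity=_ratio(tn, tn + fp),
        precision=precision,
        npv=_ratio(tn, tn + fn),
        f_measure=_ratio(2 * precision * recall, precision + recall),
    )


# Made-up labels for illustration only: 4 diseased and 6 healthy recordings.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting recall and specificity next to accuracy matters when the disease class is rare, because a model can then reach high accuracy while still missing many diseased patients.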
Why is it not widely used in practice yet?

Sometimes, models perform well on a specific dataset (from one hospital, etc.) but poorly on another. Despite their high accuracy scores, ML models are still seen as a black box, which negatively affects trust and adoption in medical practice. To address this, we can perform a saliency analysis. It should reveal which parts of the input the model weights the most (and thus which features are the most important).

Application

ML can be widely used in medicine. It can help predict a heart attack or classify different diseases from ECG signals, such as multiple forms of arrhythmia (atrial fibrillation, right/left bundle branch block, bradycardia, tachycardia, flutter), ischaemia, cardiomyopathy, valvulopathy or even Covid-19.

Resources

Resources will be available in the State-of-the-art paper of this project group.