Glycemia Forecasting
Andrej Kubanda

Sugars, insulin, energy
- humans need sugar for energy
- insulin hormone regulates blood glucose levels (glycemia)
- hypo- and hyperglycemia

Diabetes
Type 1
- autoimmune condition
- insulin-producing cells in pancreas attacked
- lifetime insulin therapy required for survival
Type 2
- little production or resistance to insulin
- often caused by obesity
- treatment: lifestyle changes

BGLP Challenge: OhioT1DM dataset
- 8 weeks of data of 12 patients
- glycemia measured every 5 mins
- insulin doses (bolus & basal)
- self-reported meal times & estimates
- physiological data
- exercise, sleep, stress, work

BGLP Challenge
Task: predict glycemia 30 and 60 minutes into the future
- 1st cohort available for training
- 2nd cohort split into train & test set
- per-patient evaluation on 2nd cohort test sets
- RMSE & MAE

BGLP Challenge: Results

Forecasting as Supervised Problem

Convolutional RNN
Preprocessing
- ffill
- Gaussian filter smoothing
- glycemia differencing
Method
- multi-step forecast
- 2h history
- transfer learning
Features
- glycemia
- insulin basal
- insulin bolus
- meal carbs
Model Architecture

Convolutional RNN: Evaluation

LSTM Attention
Preprocessing
- interpolation
- standardization
- discard or 0-replace missing values
Method
- non-personalized
- single-step forecast
- 30min history
Features
- glycemia
Model Architecture

LSTM Attention: Evaluation

LSTM Attention: Cont.
- longer history leads to worse performance
- long patterns are more personalized

Dicatil Project
- data collection & domain knowledge
- data storage & administration
- data analysis & predictor

Dicatil Project: Task
- 30-90 min glycemia forecasts
- long glycemia forecasts are useless
- forecasts using planned meals, steps & activities
- forecasts of morning glycemia

Dicatil+ Dataset
- 12 patients (data quality & quantity varies)
- dirty, outliers, data mixed from multiple sources
- different glycemia sensor → irregular & longer intervals
- no insulin, little sleep, no stress data
- richer nutrition data
Glycemia, Nutrition, Physical Activity

Dicatil+ Glycemia Data Volume

Infrastructure & Time Series Framework
- all data & compute in Kubernetes

Dataset Pipeline
1. raw data extraction from DB
2. anomaly detection
3. resampling & aggregation
4. feature engineering
   - moving averages, moving sums, datetime features
5. dataset file & HTML report

Dataset Pipeline: Anomaly Detection
1. smooth data using a filter
2. fit spline & compute distances to raw data points
3. fit IsolationForest & predict outliers

ML Pipeline
1. train-validation-test split
2. method-specific data preprocessing & windowing
   - missing values, standardization / normalization / scaling, …
3. training (k times)
   - transfer learning, sampling strategy
4. evaluation
5. explainability

Dataset Predictability
OhioT1DM, Dicatil+

CRNN Experiments

Single vs Multi Horizon Models
- one model per horizon
- slightly better performance?
- model capacity focused
- classic approach & metrics
- single model
- more coherent forecast
- model capacity divided
- specialized metrics

Predictions Example

Model Personalization
- improves performance in general
- short-term patterns ~ more general
- long-term patterns ~ more personalized

XAI: Permutation Feature Importance
- different patients react to different features

Future plans
- ensembles
- window sampling strategies
- Time2Vec
- additional XAI methods (e.g. SHAP)
- morning glycemia forecasts (need more data)
- better feature engineering
- XGBoost, LightGBM

Sources
- https://cgi.csc.liv.ac.uk/~frans/PostScriptFiles/bglp_final_2020.pdf
- https://arxiv.org/pdf/1807.03043.pdf
- http://smarthealth.cs.ohio.edu/bglp/bglp-results.html
- https://is.muni.cz/auth/th/j0gda/Thesis.pdf
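
Backup sketch for the permutation feature importance slide: a minimal, standard-library-only illustration of the general technique, not the project's actual implementation. Each feature column is shuffled in turn and the resulting increase in RMSE is recorded; a larger increase suggests the model relies more on that feature. The names `permutation_importance`, `toy_predict`, and the toy data are hypothetical and exist only for this example.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean RMSE increase per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j; all other columns stay in place.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy data: feature 0 drives the target, feature 1 is ignored.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]


def toy_predict(row):
    # Stand-in for a trained forecaster; it uses only feature 0.
    return 2.0 * row[0]


importances = permutation_importance(toy_predict, X, y)
```

Running the same procedure separately on each patient's test set would be one way to compare which features matter for which patient.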