Mining temporal data

Samuel Gazda, Dominik Macko, Šárka Ščavnická and Katarína Švecová
Faculty of Informatics, Masaryk University, Brno

1. Abstract

Data can be represented in various formats, and storing it as temporal data lets researchers organise observations chronologically. Mining temporal data can give us important insight into real-life phenomena. Machine learning models can find patterns in such data that would be non-trivial to identify otherwise. These patterns help us classify the data, discover interesting properties of sequences, or even make predictions. Time series prediction is among the most interesting tasks, as reducing uncertainty about the future is valuable in areas such as stock trading or infrastructure load planning [1]. In this review, we explore approaches to temporal data mining: we characterise this type of data, discuss several parametric and non-parametric models, and introduce libraries used for time series mining.

2. Data Characteristics

Temporal data, often called a time series, represents chronologically ordered observations of a variable; when more than one variable is observed, the series is multivariate. Time series can contain several patterns: a trend (a long-term increase or decrease), seasonality (cyclic changes at constant intervals), and a residual component (short-term fluctuations).

3. LSTM

First introduced in 1997, long short-term memory (LSTM) is a deep-learning recurrent neural network used to learn and predict sequential data. Each LSTM cell has an input gate, an output gate, and a forget gate, which control how the current input and the previous state enter the computation. Compared to a classic recurrent network, these extra gates help to mitigate the vanishing gradient problem. Several transformer models, which rely on efficient matrix computation, have also been designed specifically for time series, such as the probabilistic Informer or the deterministic Query Selector. Although LSTM is often outperformed by other models on smaller datasets, it remains one of the most common solutions for time series to this day, either as the whole model or as part of a larger one [2, 3].

4. SCINet

The Sample Convolution and Interaction Network (SCINet) is a neural network architecture designed specifically for time series forecasting. Its hierarchical structure iteratively extracts and exchanges information at different temporal resolutions and learns an effective representation with enhanced predictability. SCINet is composed of SCI-Blocks, each of which downsamples the input into two sub-sequences, processes them with distinct convolutional filters, and then performs interactive learning between the two resulting feature sequences. Experiments on real-world datasets show that SCINet achieved, on average, a relative improvement of more than 40% over contemporary state-of-the-art approaches [4].

5. SARIMA

Seasonal Autoregressive Integrated Moving Average (SARIMA) is an extension of ARIMA that supports univariate time series data with a seasonal component. The results in [5] show that SARIMA is the only statistical method able to outperform (although without a statistically significant difference) the machine learning algorithms ANN, SVM, and kNN-TSPI. In time series prediction, it is very important to search for the parameter setting that best fits a model to a given dataset. The main parameter estimation methods are holdout validation, cross-validation, and the Box-Jenkins method. The parameters of SARIMA can be selected using the Box-Jenkins method, which also minimises the Akaike Information Criterion (AIC) [5].
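To make the parameter-selection step above concrete, the following sketch fits a few SARIMA configurations with the SARIMAX class from statsmodels (one of the libraries introduced in Section 6) and keeps the one with the lowest AIC. The synthetic monthly series and the small search grid are illustrative assumptions, not part of the cited study.

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Illustrative monthly series with a trend and yearly seasonality (assumed data).
rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
y = pd.Series(0.5 * np.arange(96) + 10 * np.sin(2 * np.pi * np.arange(96) / 12)
              + rng.normal(0, 1, 96), index=idx)

# Small illustrative grid over (p, d, q) and (P, D, Q, s); keep the fit with the lowest AIC.
best_aic, best_fit = float("inf"), None
for order in [(1, 1, 0), (0, 1, 1), (1, 1, 1)]:
    for seasonal_order in [(0, 1, 1, 12), (1, 1, 0, 12)]:
        fit = SARIMAX(y, order=order, seasonal_order=seasonal_order).fit(disp=False)
        if fit.aic < best_aic:
            best_aic, best_fit = fit.aic, fit

print("best AIC:", best_aic)
print(best_fit.forecast(steps=12))  # 12-month-ahead forecast from the selected model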
6. Libraries

Because temporal data is very specific, there are many Python libraries intended for working with it. Tsfresh is a library that can automatically extract a wide range of features. Statistical models and tests for time series are available in statsmodels. There are also libraries with machine learning models, like sktime and darts, whose syntax is similar to scikit-learn. Furthermore, libraries such as PyTorch Forecasting and tsai, the latter built on top of PyTorch and fastai, contain various deep learning models useful for time series. AutoML can also be leveraged with temporal data through dedicated libraries like AutoTS, AtsPy, and PyCaret. Additionally, Kats, a library by Facebook, aims to be a lightweight framework covering time series analysis end to end, including forecasting, detection, feature engineering, and even utilities such as time series simulation. A minimal usage sketch with one of these libraries is given after the references.

7. References

[1] Mancuso, Piccialli, and Sudoso. A machine learning approach for forecasting hierarchical time series. Expert Systems with Applications, 182:115102, November 2021.
[2] Gers, Schmidhuber, and Cummins. Learning to forget: Continual prediction with LSTM. Volume 2, pages 850–855, 1999.
[3] Yong Yu, Xiaosheng Si, Changhua Hu, and Jianxun Zhang. A review of recurrent neural networks: LSTM cells and network architectures. Neural Computation, 31(7):1235–1270, 2019.
[4] Liu et al. Time series is a special sequence: Forecasting with sample convolution and interaction. 2021.
[5] Antonio Rafael Sabino Parmezan, Vinicius M. A. Souza, and Gustavo Batista. Evaluation of statistical and machine learning models for time series prediction: Identifying the state-of-the-art and the best conditions for the use of each model. Information Sciences, 484:302–337, 2019.
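The usage sketch referenced in Section 6: it fits a seasonal naive forecaster from sktime to a synthetic monthly series to illustrate the scikit-learn-like fit/predict interface. The data and the choice of forecaster are illustrative assumptions only, not taken from the cited works.

import numpy as np
import pandas as pd
from sktime.forecasting.naive import NaiveForecaster

# Illustrative monthly series with a mild trend and yearly seasonality (assumed data).
y = pd.Series(np.sin(np.arange(48) * 2 * np.pi / 12) + 0.1 * np.arange(48),
              index=pd.period_range("2020-01", periods=48, freq="M"))

# scikit-learn-like fit/predict workflow.
forecaster = NaiveForecaster(strategy="last", sp=12)  # repeat the last seasonal cycle
forecaster.fit(y)
y_pred = forecaster.predict(fh=list(range(1, 13)))    # forecast the next 12 months
print(y_pred)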