| Title: | Time Series Prediction with Integrated Tuning |
|---|---|
| Description: | Time series prediction is a critical task in data analysis, requiring not only the selection of appropriate models, but also suitable data preprocessing and tuning strategies. TSPredIT (Time Series Prediction with Integrated Tuning) is a framework that provides a seamless integration of data preprocessing, decomposition, model training, hyperparameter optimization, and evaluation. Unlike other frameworks, TSPredIT emphasizes the co-optimization of both preprocessing and modeling steps, improving predictive performance. It supports a variety of statistical and machine learning models, filtering techniques, outlier detection, data augmentation, and ensemble strategies. More information is available in Salles et al. <doi:10.1007/978-3-662-68014-8_2>. |
| Authors: | Eduardo Ogasawara [aut, ths, cre] (ORCID: <https://orcid.org/0000-0002-0466-0626>), Cristiane Gea [aut], Diego Carvalho [ctb], Diogo Santos [aut], Arthur Garcia [aut], Eduardo Bezerra [ctb], Esther Pacitti [ctb], Fabio Porto [ctb], Fernando Alexandrino [aut], Rebecca Salles [aut], Vitoria Birindiba [aut], CEFET/RJ [cph] |
| Maintainer: | Eduardo Ogasawara <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 2.0.707 |
| Built: | 2026-05-22 19:09:05 UTC |
| Source: | https://github.com/cefet-rj-dal/tspredit |
Extracts a subset of a time series object based on specified rows and columns. The function allows for flexible indexing and subsetting of time series data.
    ## S3 method for class 'ts_data'
    x[i, j, drop = FALSE]

| Argument | Description |
|---|---|
| x | |
| i | row i or linear index when a single subscript is supplied |
| j | column j |
| drop | Ignored. |

A new ts_data object with preserved metadata and column names.
    data(tsd)
    data10 <- ts_data(tsd$y, 10)
    ts_head(data10)

    # single line
    data10[12,]

    # range of lines
    data10[12:13,]

    # single column
    data10[,1]

    # range of columns
    data10[,1:2]

    # range of rows and columns
    data10[12:13,1:2]

    # single line and a range of columns
    data10[12,1:2]

    # range of lines and a single column
    data10[12:13,1]

    # single observation
    data10[12,1]
ts_data
Convert a compatible dataset to a ts_data object by setting
column names, class, and the sw attribute consistently.
    adjust_ts_data(data)

| Argument | Description |
|---|---|
| data | Matrix or data.frame to adjust. |

An adjusted ts_data.
ts_data_mv
Restore the multivariate time-series metadata after subsetting or data manipulation.
    adjust_ts_data_mv(
      data,
      y,
      x = NULL,
      sw = 1,
      variables = NULL,
      lags = NULL,
      representation = c("aligned", "windowed")
    )
| Argument | Description |
|---|---|
| data | Matrix or data.frame. |
| y | Character scalar. Target variable name. |
| x | Character vector. Auxiliary variable names. |
| sw | Integer. Temporal width of the representation. |
| variables | Character vector. Variables represented in the object. |
| lags | Named list of lag positions per variable. |
| representation | Character. Either "aligned" or "windowed". |

This helper mirrors adjust_ts_data() from the univariate workflow. It
preserves whether the multivariate object is aligned (sw = 1) or lagged
(sw > 1).
A ts_data_mv object.
Bioenergy data from FAOSTAT. Data Type: Bioenergy consumption and production. Category: Environment. Creation Date 2024.
    data(bioenergy)
A list of time series.
Series are named as <country>_<bio_consumption|bio_production> and contain annual values.
FAO 2024. FAOSTAT Bioenergy, FAO, Rome, Italy. United Nations Statistics Division (UNSD), 2011; International Recommendations for Energy Statistics (IRES).
    # Load bioenergy list and plot one series
    data(bioenergy)
    # bioenergy <- loadfulldata(bioenergy)
    series <- bioenergy[[1]]
    ts.plot(series, ylab = "TJ", xlab = "Year", main = "Bioenergy example")
Univariate time series from the CATS (Competition on Artificial Time Series) benchmark. Data Type: Artificial time series with missing blocks. Category: Benchmark. Observations: 5,000 (4,900 known, 100 missing). The dataset contains five non-consecutive blocks of 20 missing values each. Competitors were asked to predict these 100 unknown points, and performance was evaluated using MSE (E1 for all unknowns and E2 for the first 80 points).
    data(CATS)
A data frame with five columns and 980 rows. Each column represents a known segment of the time series.
The CATS benchmark contains artificial series with five nonconsecutive missing blocks of 20 points each. Models must impute or forecast the missing blocks; evaluation typically uses MSE over all missing points.
Lendasse, A., Oja, E., Simula, O., Verleysen, M., et al. (2004). Time Series Prediction Competition: The CATS Benchmark. In IJCNN'2004 - International Joint Conference on Neural Networks. Lendasse, A., Oja, E., Simula, O., Verleysen, M. (2007). Time Series Prediction Competition: The CATS Benchmark. Neurocomputing, 70(13-15), 2325–2329.
    # Load CATS dataset
    data(CATS)
    # CATS <- loadfulldata(CATS)
Statistics of surface temperature anomalies on land, based on NASA-GISS GISTEMP data. Data Type: Temperature Anomalies. Category: Environment. Creation Date 2024.
    data(climate)
A list of time series.
FAO, 2024. FAOSTAT Land, Inputs and Sustainability; Climate Change Indicators; Temperature change on land. GISTEMP Team, 2024: GISS Surface Temperature Analysis. NASA Goddard Institute for Space Studies. Hansen, J. et al., 1981–2019: Multiple foundational studies on global temperature analysis.
    # Load climate list and plot one series
    data(climate)
    # climate <- loadfulldata(climate)
    series <- climate[[1]]
    ts.plot(series, ylab = "Temperature change (°C)", xlab = "Year", main = "Temperature change on land")
Generic for fitting a time series model.
Descendants should implement do_fit.<class>.
    do_fit(obj, x, y = NULL)

| Argument | Description |
|---|---|
| obj | Model object to be fitted. |
| x | Matrix or data.frame with input features. |
| y | Vector or matrix with target values. |

A fitted object (same class as obj).
Generic for predicting with a fitted time series model.
Descendants should implement do_predict.<class>.
    do_predict(obj, x)

| Argument | Description |
|---|---|
| obj | Fitted model object. |
| x | Matrix or data.frame with input features to predict. |

Numeric vector with predicted values.
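The do_fit/do_predict contract described above can be illustrated with a minimal S3 sketch. The two generics below are local stand-ins defined only so the snippet runs on its own (tspredit provides the real ones), and the `mean_model` class is hypothetical, not part of the package.

```r
# Local stand-in generics so the sketch is self-contained;
# in practice tspredit supplies do_fit() and do_predict().
do_fit <- function(obj, x, y = NULL) UseMethod("do_fit")
do_predict <- function(obj, x) UseMethod("do_predict")

# Hypothetical model class: always predicts the mean of the training target.
mean_model <- function() structure(list(mu = NA_real_), class = "mean_model")

# Descendant implementation of do_fit.<class>
do_fit.mean_model <- function(obj, x, y = NULL) {
  obj$mu <- mean(y)  # store the fitted parameter
  obj                # return the fitted object (same class as obj)
}

# Descendant implementation of do_predict.<class>
do_predict.mean_model <- function(obj, x) {
  rep(obj$mu, nrow(x))  # one numeric prediction per input row
}

x <- matrix(1:6, ncol = 2)
y <- c(10, 20, 30)
fitted_model <- do_fit(mean_model(), x, y)
pred <- do_predict(fitted_model, x)
```

The fitted object keeps the class of the input object, and the prediction is a plain numeric vector, matching the return values documented for the two generics.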
National and global estimates of greenhouse gas (GHG) emissions. Data Type: Greenhouse gas emissions. Category: Environment. Creation Date 2023.
    data(emissions)
A list of time series.
FAO, 2023. FAOSTAT Climate Change: Agrifood systems emissions, Emissions Totals. IPCC Guidelines and Reports: 1996, 2000, 2006, 2014, 2019. PRIMAP-hist dataset v2.4.2: Gütschow et al., 2023.
    # Load emissions list and plot one series
    data(emissions)
    # emissions <- loadfulldata(emissions)
    series <- emissions[[1]]
    ts.plot(series, ylab = "kt CO2e", xlab = "Year", main = "Emissions example (CH4/N2O)")
Half-hourly electrical load time series from the EUNITE forecasting competition. Data Type: Electrical load measurements. Category: Benchmark. Observations: 730 days, 48 intervals per day. This dataset contains univariate time series with half-hour resolution covering 1997–1998. It was used to forecast daily maximum loads in January 1999. Competitors were evaluated using MAPE and MAXIMAL prediction errors. Regressors such as temperature and calendar variables were also provided.
    data(EUNITE.Loads)
A data frame with 730 rows and 48 numeric columns. Each column corresponds to one half-hour interval, from 00:00 to 24:00.
The EUNITE competition focused on forecasting maximum daily electrical loads for January 1999 using half-hourly load profiles and auxiliary regressors. Series are provided in a wide format with 48 half-hour intervals as columns.
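Because the competition target is the daily maximum load, a wide table of half-hour readings has to be reduced to one value per day. The sketch below shows one way to do that in base R. It uses a small synthetic data frame as a stand-in for EUNITE.Loads (the values and the `forecast_max` vector are made up for illustration) and computes MAPE, one of the competition metrics.

```r
# Synthetic stand-in for EUNITE.Loads: 3 days x 48 half-hour columns (made-up values).
set.seed(1)
loads <- as.data.frame(matrix(runif(3 * 48, min = 400, max = 800), nrow = 3))

# Daily maximum load: the largest of the 48 half-hour readings in each row.
daily_max <- unname(apply(loads, 1, max))

# MAPE (in percent) between the observed daily maxima and a hypothetical forecast.
forecast_max <- daily_max * c(1.02, 0.97, 1.00)
mape <- mean(abs((daily_max - forecast_max) / daily_max)) * 100
```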
EUNITE Competition 2001 dataset (original competition website currently unavailable).
Chen, B.-J., Chang, M.-W., & Lin, C.-J. (2004). Load forecasting using support vector machines: a study on EUNITE competition 2001. IEEE Transactions on Power Systems, 19(4), 1821-1830.
    # Load the dataset
    data(EUNITE.Loads)
    # EUNITE.Loads <- loadfulldata(EUNITE.Loads)

    # Inspect the first few half-hourly columns (00:00 to 24:00 by 30 minutes)
    head(names(EUNITE.Loads))

    # Plot a single half-hour interval across days
    ts.plot(EUNITE.Loads[["X24.00"]], ylab = "Load (MW)", xlab = "Day", main = "EUNITE: Half-hour interval 24:00")
Daily holiday and weekday indicators used as regressors in the EUNITE load forecasting competition.
Data Type: Categorical indicators. Category: Benchmark. Observations: 730 (1997–1998).
This dataset provides binary holiday flags and weekday identifiers to support the prediction of daily maximum electrical loads.
It complements the datasets EUNITE.Loads and EUNITE.Temp.
A test set with corresponding regressors for January 1999 is available.
    data(EUNITE.Reg)
A data frame with 730 rows and 3 columns:
Binary indicator (1 = holiday, 0 = regular day).
Integer encoding (1 = Sunday, ..., 7 = Saturday).
Split into train and test
Regressors complement the load profiles by providing daily-level covariates (e.g., holidays and weekdays), which are known to improve forecast accuracy when used with temperature.
EUNITE Competition 2001 dataset (original competition website currently unavailable).
Chen, B.-J., Chang, M.-W., & Lin, C.-J. (2004). Load forecasting using support vector machines: a study on EUNITE competition 2001. IEEE Transactions on Power Systems, 19(4), 1821-1830.
    # Load EUNITE regressors
    data(EUNITE.Reg)
    # EUNITE.Reg <- loadfulldata(EUNITE.Reg)

    # Peek at the first rows
    head(EUNITE.Reg)
Average daily temperatures collected for the EUNITE load-forecasting competition. Data Type: Meteorological measurements. Category: Benchmark. Observations: 1,461. The series covers 1995-1998 and was used as an exogenous regressor for predicting maximum daily electrical loads. Participants were asked to forecast January 1999 values.
    data(EUNITE.Temp)
A data frame with one numeric column and 1,461 rows (average daily temperature).
Daily temperatures are commonly used as exogenous variables for load forecasting due to strong weather dependence. This series covers 1995–1998, which includes the 1997–1998 period of EUNITE.Loads.
EUNITE Competition 2001 dataset (original competition website currently unavailable).
Chen, B.-J., Chang, M.-W., & Lin, C.-J. (2004). Load forecasting using support vector machines: a study on EUNITE competition 2001. IEEE Transactions on Power Systems, 19(4), 1821-1830.
    # Load daily temperature series
    data(EUNITE.Temp)
    # EUNITE.Temp <- loadfulldata(EUNITE.Temp)

    # Plot temperature over time
    ts.plot(EUNITE.Temp$Temperature, ylab = "Temperature (°C)", xlab = "Day", main = "EUNITE: Daily Temperature")
Statistics on agricultural use, production, and trade of chemical and mineral fertilizers. Data Type: Fertilizers use, production and trade. Category: Environment. Creation Date 2024.
    data(fertilizers)
A list of time series.
FAOSTAT Fertilizers by Nutrient.
FAO, 2024. FAOSTAT: Fertilizers by Nutrient. FAO & UNSD (2017). System of Environmental-Economic Accounting for Agriculture, Forestry and Fisheries (SEEA AFF). UNSD (2017). Framework for the Development of Environment Statistics (FDES).
    # Load fertilizers list and plot one series
    data(fertilizers)
    # fertilizers <- loadfulldata(fertilizers)
    series <- fertilizers[[1]]
    ts.plot(series, ylab = "tonnes", xlab = "Year", main = "Fertilizers example")
Summary of global and regional trends in GDP and agriculture value. Data Type: macroeconomic indicators. Category: Economy. Creation Date 2024.
    data(gdp)
A list of time series.
FAOSTAT Macro Indicators Database
FAO. 2024. Gross domestic product and agriculture value added 2013–2022 – Global and regional trends. FAOSTAT Analytical Briefs, No. 85. Rome. doi:10.4060/cd0763en
    # Load GDP list and plot one series
    data(gdp)
    # gdp <- loadfulldata(gdp)
    series <- gdp[[1]]
    ts.plot(series, ylab = "US$", xlab = "Year", main = "GDP example")
Daily economic time series from Ipea (Institute for Applied Economic Research, Brazil).
Data Type: Macroeconomic indicators. Category: Public data. Observations: 901 to 8,154 per series, 12 series.
This dataset contains the most requested time series provided by Ipea with daily frequency, including exchange rates, stock index, interest rates, imports and exports.
The series span from 1962 to September 2017. Missing values were removed using na.omit.
The last 30 observations are reserved as the test set.
    data(ipeadata.d)
A data frame with up to 8,154 rows and 12 columns. Each column corresponds to a different univariate daily time series.
Contains daily macroeconomic indicators frequently used in empirical forecasting. Series are cleaned with na.omit.
Ipea - Ipeadata Portal, section "Most Requested Series", filtered by frequency "Daily".
Ipea (2017). Ipeadata – Macroeconomic and Regional Data. Technical Report. http://www.ipeadata.gov.br
    # Load Ipea daily dataset and plot the first series
    data(ipeadata.d)
    # ipeadata.d <- loadfulldata(ipeadata.d)
    series <- ipeadata.d[[1]]
    ts.plot(series, ylab = "Value", xlab = "Day", main = "Ipea daily example")
Monthly economic time series from Ipea (Institute for Applied Economic Research, Brazil).
Data Type: Macroeconomic indicators. Category: Public data. Observations: 156 to 1019 per series, 23 series.
This dataset contains the most requested time series provided by Ipea, including exchange rates, inflation indices, unemployment rates, interest rates, minimum wage, and GDP.
The series span from 1930 to September 2017. Missing values were removed using na.omit.
The last 12 observations are reserved as the test set.
    data(ipeadata.m)
A data frame with up to 1019 rows and 23 columns. Each column corresponds to a different univariate monthly time series.
Contains monthly macroeconomic indicators; the last 12 observations are intended as a test set.
Ipea - Ipeadata Portal, section "Most Requested Series", filtered by frequency "Monthly".
Ipea (2017). Ipeadata – Macroeconomic and Regional Data. Technical Report. http://www.ipeadata.gov.br
    # Load Ipea monthly dataset and plot the first series
    data(ipeadata.m)
    # ipeadata.m <- loadfulldata(ipeadata.m)
    series <- ipeadata.m[[1]]
    ts.plot(series, ylab = "Value", xlab = "Month", main = "Ipea monthly example")
Downloads and loads the full .RData object referenced by attr(x, "url")
from a mini dataset object loaded from data/.
    loadfulldata(x)

| Argument | Description |
|---|---|
| x | A mini dataset object that contains a url attribute (attr(x, "url")). |

The full dataset object loaded from the remote .RData file.
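The download-and-load behavior described above can be sketched in base R as follows. This is a hypothetical re-implementation for illustration, not the package source: `load_full_sketch` is an invented name, the sketch assumes the .RData file holds a single object, and the demo uses a local `file://` URL in place of a remote one.

```r
# Hypothetical sketch of the mechanism; not the tspredit implementation.
load_full_sketch <- function(x) {
  url <- attr(x, "url")
  stopifnot(is.character(url), length(url) == 1)
  tmp <- tempfile(fileext = ".RData")
  on.exit(unlink(tmp))
  utils::download.file(url, tmp, mode = "wb", quiet = TRUE)
  env <- new.env()
  obj_names <- load(tmp, envir = env)  # load() returns the names of the restored objects
  get(obj_names[1], envir = env)       # assumption: the file contains one object
}

# Demo: a local .RData file stands in for the remote full dataset.
full <- list(a = 1:10, b = 11:20)
path <- tempfile(fileext = ".RData")
save(full, file = path)

# The mini object carries only a sample plus the url attribute.
mini <- list(a = 1:3)
attr(mini, "url") <- paste0("file://", normalizePath(path, winslash = "/"))

restored <- load_full_sketch(mini)
```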
Time series data from the first Makridakis forecasting competition (M1), held in 1982. Data Type: Forecasting benchmark dataset. Category: Forecasting. Creation Date: 1982.
    data(m1)
A list of data frames containing time series.
Consolidated list with frequencies as keys (e.g., monthly, quarterly, yearly). Each element is a list of series.
See Makridakis et al. (1982) for competition design and evaluation.
The accuracy of extrapolation (time series) methods: Results of a forecasting competition
Makridakis et al. (1982). The accuracy of extrapolation (time series) methods: Results of a forecasting competition. Journal of Forecasting, 1(2), 111–153.
    # Load consolidated M1 list
    data(m1)
    # m1 <- loadfulldata(m1)

    # List available frequency keys
    names(m1)

    # Plot one series from a frequency bucket
    series <- m1$monthly[[1]]
    ts.plot(series, main = "M1 monthly series")
Time series data from the third Makridakis forecasting competition (M3), held in 2000. Data Type: Forecasting benchmark dataset. Category: Forecasting. Creation Date: 2000.
    data(m3)
A list of lists containing time series.
Consolidated list keyed by frequency (e.g., monthly, other, quarterly, yearly). Each holds a list of numeric vectors.
See Makridakis & Hibon (2000) for competition results and implications.
doi:10.1016/S0169-2070(00)00057-1
Makridakis and Hibon (2000). The M3-Competition: Results, conclusions and implications. International Journal of Forecasting, 16(4), 451–476.
    # Load consolidated M3 list and plot one monthly series
    data(m3)
    # m3 <- loadfulldata(m3)
    series <- m3$monthly$M1
    ts.plot(series, main = "M3 monthly series: M1")
Time series data from the fourth Makridakis forecasting competition (M4), held in 2018. Data Type: Forecasting benchmark dataset. Category: Forecasting. Creation Date: 2018.
    data(m4)
A list of lists containing time series.
Consolidated list keyed by frequency (e.g., daily, hourly, monthly, ...). Each holds a list of numeric vectors.
See Makridakis et al. (2020) for an overview of M4 findings.
Makridakis et al. (2020). The M4 Competition: Results, findings, conclusion and way forward. International Journal of Forecasting, 36(1), 54–74.
    # Load consolidated M4 list and plot one available series
    data(m4)
    # m4 <- loadfulldata(m4)
    freq_name <- names(m4)[1]
    series_name <- names(m4[[freq_name]])[1]
    series <- m4[[freq_name]][[series_name]]
    ts.plot(series, main = paste("M4", freq_name, "series:", series_name))
Compute mean squared error (MSE) between actual and predicted values.
    MSE.ts(actual, prediction)

| Argument | Description |
|---|---|
| actual | Numeric vector of observed values. |
| prediction | Numeric vector of predicted values. |

MSE = mean((actual - prediction)^2).
Numeric scalar with the MSE.
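As a quick check of the formula, the following base R sketch computes the same quantity by hand. The function name `mse_sketch` is illustrative and is not the package implementation.

```r
# Illustrative base R version of the MSE formula.
mse_sketch <- function(actual, prediction) {
  mean((actual - prediction)^2)
}

actual <- c(3, 5, 7)
prediction <- c(2, 5, 9)

# Squared errors are 1, 0 and 4, so the mean is 5 / 3.
mse_value <- mse_sketch(actual, prediction)
```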
Monthly time series from the NN3 forecasting competition. Data Type: Empirical business time series. Category: Benchmark. Observations: 50 to 126 per series, 111 series. The dataset contains 111 univariate monthly time series from real business processes. Each series has between 50 and 126 observations. Participants were asked to forecast the next 18 values, and performance was evaluated using the mean sMAPE across all series.
    data(NN3)
A data frame with up to 126 rows and 111 columns. Each column corresponds to a different univariate monthly time series.
NN3 comprises monthly business time series with varying lengths. Forecast accuracy is typically evaluated using sMAPE across a fixed holdout horizon.
NN3 Time Series Forecasting Competition
Crone, S.F., Hibon, M., & Nikolopoulos, K. (2011). Advances in forecasting with neural networks? Empirical evidence from the NN3 competition on time series prediction. International Journal of Forecasting, 27(3), 635–660. NN3 Competition (2007). http://www.neural-forecasting-competition.com/NN3/index.htm
    # Load NN3 dataset
    data(NN3)
    # NN3 <- loadfulldata(NN3)

    # Select one series by name and plot
    series <- NN3[["NN3_111"]]
    ts.plot(series, ylab = "Value", xlab = "Month", main = "NN3 example series")
Daily time series from the NN5 forecasting competition. Data Type: ATM withdrawal amounts. Category: Benchmark. Observations: 735 per series, 111 series. The dataset contains 111 univariate time series representing daily cash withdrawals from ATMs in England. Each series includes 735 observations and may contain missing values and multiple seasonal patterns. Participants were asked to forecast the next 56 values for each series, and performance was evaluated using the mean sMAPE across all series.
    data(NN5)
A data frame with 735 rows and 111 columns. Each column corresponds to a different univariate daily time series.
NN5 consists of daily ATM withdrawal amounts with complex multiple seasonalities and occasional missing values. Forecasts are evaluated via sMAPE on a 56-day horizon.
NN5 Time Series Forecasting Competition
Crone, S.F. (2008). Results of the NN5 Time Series Forecasting Competition. IEEE WCCI 2008, Hong Kong. NN5 Competition (2008). http://www.neural-forecasting-competition.com/NN5/index.htm
    # Load NN5 dataset
    data(NN5)
    # NN5 <- loadfulldata(NN5)

    # Select one series and plot
    series <- NN5[["NN5.111"]]
    ts.plot(series, ylab = "Withdrawals", xlab = "Day", main = "NN5 example series")
Statistics on the use of major pesticide groups and relevant chemical families. Data Type: Pesticides use. Category: Environment. Creation Date 2024.
    data(pesticides)
A list of time series.
Series are named by country with _pesticides suffix; values are annual usage amounts.
FAO. 2024. FAOSTAT: Pesticides Use. RP_e_README_Domain_Information_2024. FAOSTAT Pesticides Use Database
    # Load pesticides list and plot one series
    data(pesticides)
    # pesticides <- loadfulldata(pesticides)
    series <- pesticides[[1]]
    ts.plot(series, ylab = "tonnes", xlab = "Year", main = "Pesticides example")
Plot observed and forecast trajectories for the target series and auxiliary variables returned by the multivariate workflow.
    plot_ts_pred_mv(
      history,
      future = NULL,
      prediction,
      variable = NULL,
      label_x = "",
      label_y = "Value",
      color = "black",
      color_adjust = "blue",
      color_prediction = "green"
    )

| Argument | Description |
|---|---|
| history | A |
| future | Optional |
| prediction | Multivariate forecast returned by |
| variable | Optional character scalar. Name of a single variable to plot. When omitted, plots are returned for every variable in the prediction. |
| label_x | x-axis label. |
| label_y | y-axis label prefix. The variable name is appended when several plots are returned. |
| color | observed series color. |
| color_adjust | history color. |
| color_prediction | prediction color. |

plot_ts_pred_mv() extends the visual logic already used in the univariate examples. It reuses daltoolbox::plot_ts_pred() variable by variable and returns either:

- one plot when variable is provided
- a named list of plots when variable = NULL

The intended workflow is:

1. fit a ts_regsw_mv model
2. call predict(..., return_all = TRUE)
3. compare the predicted paths against the held-out aligned multivariate data

A single ggplot object or a named list of ggplot objects.
    library(daltoolbox)
    data(tsd)

    x1 <- c(tsd$y[-1], tail(tsd$y, 1))
    x2 <- stats::filter(tsd$y, rep(1/3, 3), sides = 1)
    x2[is.na(x2)] <- tsd$y[is.na(x2)]

    mv <- ts_data_mv(data.frame(y = tsd$y, x1 = x1, x2 = as.numeric(x2)), y = "y")
    samp <- ts_sample(mv, test_size = 5)

    model <- ts_regsw_mv(
      model_y = ts_mv_spec(ts_mlp(ts_norm_gminmax(), input_size = 4), variables = c("y", "x1", "x2")),
      models_x = list(
        x1 = ts_mv_spec(ts_arima()),
        x2 = ts_mv_spec(ts_rf(ts_norm_gminmax(), input_size = 4, ntree = 10), variables = c("x2", "y"))
      ),
      window_size = 5
    )

    model <- daltoolbox::fit(model, samp$train)
    pred <- predict(model, steps_ahead = 5, return_all = TRUE)
    plots <- plot_ts_pred_mv(samp$train, samp$test, pred)
Compute coefficient of determination (R-squared).
    R2.ts(actual, prediction)

| Argument | Description |
|---|---|
| actual | Numeric vector of observed values. |
| prediction | Numeric vector of predicted values. |

R-squared is computed as 1 - SSE / SST, where SSE is the sum of squared residuals and SST is the total sum of squares around the mean of actual. If actual is constant, the statistic is undefined and NA_real_ is returned.

Interpretation:

- R2 = 1 means perfect predictions.
- R2 = 0 means the predictor is no better than always using mean(actual).
- R2 < 0 means the predictions are worse than that mean baseline.

In forecasting, negative R2 values are common when the horizon is difficult or when a recursive predictor accumulates error over several future steps.
Numeric scalar with R-squared.
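The definition and its three interpretation cases can be reproduced with a short base R sketch. The function name `r2_sketch` is illustrative; it follows the formula stated above and is not the package source.

```r
# Illustrative base R version of R-squared as 1 - SSE / SST.
r2_sketch <- function(actual, prediction) {
  sst <- sum((actual - mean(actual))^2)  # total sum of squares around the mean
  if (sst == 0) return(NA_real_)         # constant actual: statistic undefined
  sse <- sum((actual - prediction)^2)    # sum of squared residuals
  1 - sse / sst
}

actual <- c(1, 2, 3, 4)

r2_perfect <- r2_sketch(actual, actual)                          # perfect predictions
r2_mean <- r2_sketch(actual, rep(mean(actual), length(actual)))  # mean baseline
r2_bad <- r2_sketch(actual, rev(actual))                         # worse than the baseline
r2_const <- r2_sketch(rep(5, 4), c(4, 5, 6, 5))                  # constant actual
```

Here the reversed predictions give SSE = 20 against SST = 5, so the statistic is -3, which illustrates how quickly R2 turns negative once predictions are worse than the mean baseline.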
Univariate time series A from the Santa Fe Time Series Competition. Data Type: Laser-generated nonlinear time series. Category: Benchmark. Observations: 1,100. This benchmark dataset consists of a low-dimensional nonlinear and stationary series recorded from a Far-Infrared-Laser in a chaotic regime. Competitors were asked to predict the last 100 observations, and performance was evaluated using NMSE.
    data(SantaFe.A)
A data frame with one column and 1,100 rows, containing numeric time series values.
Series A is a classic nonlinear laser dataset used to assess forecasting methods under chaotic dynamics.
Santa Fe Time Series Competition dataset (original archive URL unavailable).
Weigend, A.S. (1993). Time Series Prediction: Forecasting the Future and Understanding the Past. Reading, MA: Westview Press.
    # Load Santa Fe A series and plot
    data(SantaFe.A)
    # SantaFe.A <- loadfulldata(SantaFe.A)
    series <- SantaFe.A$V1
    ts.plot(series, ylab = "Value", xlab = "Index", main = "Santa Fe A")
Univariate time series D from the Santa Fe Time Series Competition. Data Type: Simulated nonlinear time series. Category: Benchmark. Observations: 100,500. This benchmark dataset is composed of a four-dimensional nonlinear and non-stationary series. Competitors were asked to predict the last 500 observations, and performance was evaluated using NMSE.
    data(SantaFe.D)
A data frame with one column and 100,500 rows, containing numeric time series values.
Santa Fe Time Series Competition dataset (original archive URL unavailable).
Weigend, A.S. (1993). Time Series Prediction: Forecasting the Future and Understanding the Past. Reading, MA: Westview Press.
    # Load Santa Fe D series and plot a subset
    data(SantaFe.D)
    # SantaFe.D <- loadfulldata(SantaFe.D)
    series <- SantaFe.D$V1
    ts.plot(series[1:2000], ylab = "Value", xlab = "Index", main = "Santa Fe D (first 2000)")
Identifies the optimal hyperparameters by minimizing the error from a dataset of hyperparameters. The function selects the hyperparameter configuration that results in the lowest average error. It wraps the dplyr library.
    ## S3 method for class 'ts_tune'
    select_hyper(obj, hyperparameters)

| Argument | Description |
|---|---|
| obj | a ts_tune object |
| hyperparameters | hyperparameters dataset |

Returns the key of the hyperparameter configuration with the lowest average error.
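The selection rule (lowest average error per configuration) can be sketched in base R. The package itself relies on dplyr; this sketch swaps in `aggregate()` so that it runs without extra dependencies. The column names `key` and `error` and the sample values are assumptions made for illustration.

```r
# Hypothetical tuning results: two evaluation folds per configuration key.
hyperparameters <- data.frame(
  key = c(1, 1, 2, 2, 3, 3),
  error = c(0.30, 0.34, 0.21, 0.25, 0.28, 0.40)
)

# Average the error of each configuration over its folds.
avg <- aggregate(error ~ key, data = hyperparameters, FUN = mean)

# Pick the key whose average error is lowest.
best_key <- avg$key[which.min(avg$error)]
```

With these values the averages are 0.32, 0.23 and 0.34, so configuration 2 is selected.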
Compute symmetric mean absolute percent error (sMAPE).
    sMAPE.ts(actual, prediction)

| Argument | Description |
|---|---|
| actual | Numeric vector of observed values. |
| prediction | Numeric vector of predicted values. |

sMAPE = mean( |a - p| / ((|a| + |p|)/2) ), excluding zero denominators.
Numeric scalar with the sMAPE.
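The formula, including the exclusion of zero denominators, can be written out in base R as below. The function name `smape_sketch` is illustrative and the code is a sketch of the stated formula rather than the package source.

```r
# Illustrative base R version of sMAPE with zero denominators excluded.
smape_sketch <- function(actual, prediction) {
  denom <- (abs(actual) + abs(prediction)) / 2
  keep <- denom > 0  # drop points where both actual and prediction are zero
  mean(abs(actual - prediction)[keep] / denom[keep])
}

actual <- c(100, 0, 50)
prediction <- c(110, 0, 50)

# The second point has a zero denominator and is excluded;
# the remaining terms are 10 / 105 and 0.
smape_value <- smape_sketch(actual, prediction)
```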
S. Makridakis and M. Hibon (2000). The M3-Competition: results, conclusions and implications. International Journal of Forecasting, 16(4).
Historical daily data for the 50 most traded stocks in B3 (IBOVESPA), including opening, high, low, and closing prices, as well as trading volume. Data Type: Financial Time Series. Category: Finance. Creation Date: 2025.
    data(stocks)
A list of data frames containing time series.
Each entry is a data frame with columns date, open, high, low, close, and volume.
B3 - Brasil, Bolsa, Balcão. 2025. Historical stock trading data. B3 Official Website
    # Load stocks list and plot closing prices for a ticker (if present)
    data(stocks)
    # stocks <- loadfulldata(stocks)
    if ("VALE3" %in% names(stocks)) {
      series <- stocks$VALE3$close
      ts.plot(series, ylab = "Close", xlab = "Index", main = "VALE3 close price")
    }
Create a time series prediction object based on the AutoRegressive Integrated Moving Average (ARIMA) family.
This constructor sets up an S3 time series regressor that leverages the
forecast package to either automatically select orders via
auto.arima() or fit a user-specified (p, d, q) structure, and provide
one-step and multi-step forecasts.
ts_arima(p = NULL, d = NULL, q = NULL)
p |
Optional integer autoregressive order. Leave |
d |
Optional integer differencing order. Leave |
q |
Optional integer moving-average order. Leave |
ARIMA models combine autoregressive (AR), differencing (I), and
moving average (MA) components to model temporal dependence in a univariate
time series. The fit() method uses forecast::auto.arima() to select
orders using information criteria when p, d, and q are left as
NULL; otherwise it fits the user-specified order directly.
predict() supports both rolling one-step-ahead prediction over a horizon
and direct multi-step forecasting.
Assumptions include (after differencing) approximate stationarity and homoskedastic residuals. Always inspect residual diagnostics for adequacy.
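A residual check of this kind might look like the following sketch. It uses `stats::arima()` from base R as a stand-in for the forecast-based fit, and a simulated series instead of real data; both are assumptions made to keep the example self-contained.

```r
set.seed(1)
# Simulated AR(1) series standing in for real data.
y <- arima.sim(model = list(ar = 0.6), n = 200)

# Fit a fixed ARIMA(1,0,0) with base R (not the forecast package).
fit <- stats::arima(y, order = c(1, 0, 0))

# Ljung-Box test on the residuals; fitdf discounts the estimated AR coefficient.
lb <- stats::Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
lb$p.value

# Five-step-ahead point forecasts from the fitted model.
fc <- predict(fit, n.ahead = 5)$pred
fc
```

A large p-value is consistent with residuals that show no strong remaining autocorrelation; a small one suggests the chosen orders are inadequate.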
A ts_arima object (S3), which inherits from ts_reg.
G. E. P. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung (2015). Time Series Analysis: Forecasting and Control. Wiley.
R. J. Hyndman and Y. Khandakar (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 27(3), 1–22. doi:10.18637/jss.v027.i03
# Example: rolling-origin evaluation with multi-step prediction
# Load package and dataset
library(daltoolbox)
library(tspredit)
data(tsd)
# 1) Wrap the raw vector as `ts_data` with `sw = 1`
ts <- ts_data(tsd$y, 1)
ts_head(ts, 3)
# 2) Split into train/test using the last 5 observations as test
samp <- ts_sample(ts, test_size = 5)
# 3) Fit a user-specified ARIMA(5,0,0)
model <- ts_arima(p = 5, d = 0, q = 0)
model <- daltoolbox::fit(model, x = samp$train)
# 4) Predict 5 steps ahead from the most recent observed point
prediction <- predict(model, x = samp$test[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(samp$test)
# 5) Evaluate forecast accuracy
ev_test <- daltoolbox::evaluate(model, output, prediction)
ev_test
Create a target-centered multivariate regressor based on ARIMA with external regressors.
ts_arimax(models_x = NULL, p = NULL, d = NULL, q = NULL)
models_x |
Optional named list with one univariate model per auxiliary variable. |
p |
Optional integer autoregressive order. Leave |
d |
Optional integer differencing order. Leave |
q |
Optional integer moving-average order. Leave |
ts_arimax() is the singular multivariate counterpart of ts_arima().
The model keeps one target variable y as the main forecasting objective and
uses the aligned auxiliary variables x1, ..., xn as regressors through the
xreg mechanism of the forecast package.
This is the natural choice when the user thinks in the following way:
there is one main series y
the remaining variables help explain or anticipate y
the primary output is the future path of y
In other words, ts_arimax() is a target-centered multivariate model, not a
symmetric system model like ts_var().
For multi-step forecasting, future auxiliary values can be supplied directly
or generated by the auxiliary univariate models stored in models_x.
This makes ts_arimax() a natural member of the new ts_reg_mv branch:
the target forecast remains central
the aligned multivariate system is still available when return_all = TRUE
auxiliary series can be modeled separately when their future path is not known beforehand
The current implementation follows the same philosophy as ts_arima():
if p, d, and q are supplied, fit that specific order
otherwise use forecast::auto.arima() on the target with xreg
This keeps the singular multivariate branch consistent with the existing raw univariate branch of the package.
A ts_arimax object inheriting from ts_reg_mv.
Box GEP, Jenkins GM, Reinsel GC, Ljung GM (2015). Time Series Analysis: Forecasting and Control. Wiley.
Hyndman RJ, Athanasopoulos G (2021). Forecasting: Principles and Practice. Third Edition. OTexts. https://otexts.com/fpp3/
data(tsd)
x1 <- c(tsd$y[-1], tail(tsd$y, 1))
x2 <- stats::filter(tsd$y, rep(1/3, 3), sides = 1)
x2[is.na(x2)] <- tsd$y[is.na(x2)]
mv <- ts_data_mv(data.frame(y = tsd$y, x1 = x1, x2 = as.numeric(x2)), y = "y")
samp <- ts_sample(mv, test_size = 5)
model <- ts_arimax(models_x = list(x1 = ts_arima(), x2 = ts_arima()))
model <- daltoolbox::fit(model, samp$train)
predict(model, steps_ahead = 5)
Bias the augmentation to emphasize more recent points in each window (recency awareness), increasing their contribution to the augmented sample.
ts_aug_awareness(factor = 1)
factor |
Numeric factor controlling the recency weighting. |
A ts_aug_awareness object.
Q. Wen et al. (2021). Time Series Data Augmentation for Deep Learning: A Survey. IJCAI Workshop on Time Series.
# Recency-aware augmentation over sliding windows
# Load package and example dataset
library(daltoolbox)
library(tspredit)
data(tsd)
# Convert to 10-lag sliding windows and preview
xw <- ts_data(tsd$y, 10)
ts_head(xw)
# Apply awareness augmentation (bias toward recent rows)
augment <- ts_aug_awareness()
augment <- daltoolbox::fit(augment, xw)
xa <- transform(augment, xw)
ts_head(xa)
Recency-aware augmentation that also progressively smooths noise before applying the weighting, producing cleaner augmented samples.
ts_aug_awaresmooth(factor = 1)
factor |
Numeric factor controlling the recency weighting. |
A ts_aug_awaresmooth object.
Q. Wen et al. (2021). Time Series Data Augmentation for Deep Learning: A Survey. IJCAI Workshop on Time Series.
# Recency-aware augmentation with progressive smoothing
# Load package and example dataset
library(daltoolbox)
library(tspredit)
data(tsd)
# Convert to 10-lag sliding windows and preview
xw <- ts_data(tsd$y, 10)
ts_head(xw)
# Apply awareness+smooth augmentation and inspect result
augment <- ts_aug_awaresmooth()
augment <- daltoolbox::fit(augment, xw)
xa <- transform(augment, xw)
ts_head(xa)
Time series augmentation by mirroring sliding-window observations around their mean to increase diversity and reduce overfitting.
ts_aug_flip()
This transformation preserves the window mean while flipping the deviations, effectively generating a symmetric variant of the local pattern.
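The mirroring idea can be illustrated in base R. This is a sketch under the assumption that every column of a window is flipped; the helper `flip_window` is hypothetical, and the package may treat the target column differently.

```r
# Mirror each window (row) around its own mean: x' = 2 * mean(x) - x.
flip_window <- function(w) 2 * rowMeans(w) - w

w <- matrix(c(1, 2, 6,
              4, 4, 10), nrow = 2, byrow = TRUE)
flipped <- flip_window(w)
flipped

# The window means are unchanged, while deviations change sign.
rowMeans(flipped)
```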
A ts_aug_flip object.
Q. Wen et al. (2021). Time Series Data Augmentation for Deep Learning: A Survey. IJCAI Workshop on Time Series.
# Flip augmentation around the window mean
# Load package and example dataset
library(daltoolbox)
library(tspredit)
data(tsd)
# Convert to sliding windows and preview
xw <- ts_data(tsd$y, 10)
ts_head(xw)
# Apply flip augmentation and inspect augmented windows
augment <- ts_aug_flip()
augment <- daltoolbox::fit(augment, xw)
xa <- transform(augment, xw)
ts_head(xa)
Time series augmentation by adding low-amplitude random noise to each point to increase robustness and reduce overfitting.
ts_aug_jitter()
Noise scale is estimated from within-window deviations.
A ts_aug_jitter object.
T. T. Um et al. (2017). Data augmentation of wearable sensor data for Parkinson’s disease monitoring using convolutional neural networks.
Q. Wen et al. (2021). Time Series Data Augmentation for Deep Learning: A Survey. IJCAI Workshop on Time Series.
# Jitter augmentation with noise estimated from windows
# Load package and example dataset
library(daltoolbox)
library(tspredit)
data(tsd)
# Convert to sliding windows and preview
xw <- ts_data(tsd$y, 10)
ts_head(xw)
# Apply jitter (adds small noise; keeps target column unchanged)
augment <- ts_aug_jitter()
augment <- daltoolbox::fit(augment, xw)
xa <- transform(augment, xw)
ts_head(xa)
Identity augmentation that returns the original windows while preserving the augmentation interface and indices.
ts_aug_none()
A ts_aug_none object.
# Identity augmentation (no changes to windows)
# Load package and example dataset
library(daltoolbox)
library(tspredit)
data(tsd)
# Convert to sliding windows and preview
xw <- ts_data(tsd$y, 10)
ts_head(xw)
# No augmentation; returns the same windows with indices preserved
augment <- ts_aug_none()
augment <- daltoolbox::fit(augment, xw)
xa <- transform(augment, xw)
ts_head(xa)
Decrease within-window deviation magnitude by a scaling factor to generate lower-variance variants while preserving the mean.
ts_aug_shrink(scale_factor = 0.8)
scale_factor |
Numeric factor used to scale deviations. |
A ts_aug_shrink object.
Q. Wen et al. (2021). Time Series Data Augmentation for Deep Learning: A Survey. IJCAI Workshop on Time Series.
# Shrink augmentation reduces within-window deviations
# Load package and example dataset
library(daltoolbox)
library(tspredit)
data(tsd)
# Convert to sliding windows and preview
xw <- ts_data(tsd$y, 10)
ts_head(xw)
# Apply shrink augmentation and inspect augmented windows
augment <- ts_aug_shrink()
augment <- daltoolbox::fit(augment, xw)
xa <- transform(augment, xw)
ts_head(xa)
Increase within-window deviation magnitude by a scaling factor to produce higher-variance variants.
ts_aug_stretch(scale_factor = 1.2)
scale_factor |
Numeric factor used to scale deviations. |
A ts_aug_stretch object.
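Scaling deviations around the window mean can be sketched in base R as follows. The helper `scale_deviation` is hypothetical and assumes all columns of a window are scaled; a factor above 1 stretches and a factor below 1 shrinks.

```r
# Scale each window's deviations from its own mean by `scale_factor`.
scale_deviation <- function(w, scale_factor) {
  m <- rowMeans(w)
  m + scale_factor * (w - m)
}

w <- matrix(c(1, 2, 6), nrow = 1)        # one window with mean 3
stretched <- scale_deviation(w, 1.2)     # deviations grow by 20%
shrunk <- scale_deviation(w, 0.8)        # deviations shrink by 20%
stretched
shrunk
```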
Q. Wen et al. (2021). Time Series Data Augmentation for Deep Learning: A Survey. IJCAI Workshop on Time Series.
# Stretch augmentation increases within-window deviations
# Load package and example dataset
library(daltoolbox)
library(tspredit)
data(tsd)
# Convert to sliding windows and preview
xw <- ts_data(tsd$y, 10)
ts_head(xw)
# Apply stretch augmentation and inspect augmented windows
augment <- ts_aug_stretch()
augment <- daltoolbox::fit(augment, xw)
xa <- transform(augment, xw)
ts_head(xa)
Generate augmented windows by selectively replacing lag terms with older lagged values, creating plausible alternative trajectories.
ts_aug_wormhole()
This combinatorial replacement preserves overall scale while introducing temporal permutations of lag content.
A ts_aug_wormhole object.
Q. Wen et al. (2021). Time Series Data Augmentation for Deep Learning: A Survey. IJCAI Workshop on Time Series.
# Wormhole augmentation replaces some lags with older values
# Load package and example dataset
library(daltoolbox)
library(tspredit)
data(tsd)
# Convert to sliding windows and preview
xw <- ts_data(tsd$y, 10)
ts_head(xw)
# Apply wormhole augmentation and inspect augmented windows
augment <- ts_aug_wormhole()
augment <- daltoolbox::fit(augment, xw)
xa <- transform(augment, xw)
ts_head(xa)
Create a delegated-differencing ARIMA-like regressor for the
sliding-window workflow of tspredit.
ts_darima(
  preprocess = ts_norm_none(),
  input_size = NA,
  input_map = ts_lagmap(),
  intercept = TRUE
)
preprocess |
Preprocessing object. This is where delegated differencing
and adaptive normalization usually live. Defaults to |
input_size |
Integer. Number of lagged inputs used by the model. |
input_map |
Lag-selection strategy object created by |
intercept |
Logical. Whether to include an intercept in the linear model fitted over the lagged inputs. |
ts_darima() is a univariate model in the ts_regsw lineage. It was
designed as an elegant tspredit adaptation of classical ARIMA ideas to the
supervised sliding-window world already used by the package.
The key design decision is that the integration component is delegated to the preprocessing pipeline rather than embedded inside the model itself. In practice, this means that:
autoregressive structure is learned directly from lagged windows
the d of the ARIMA logic is handled by preprocessors such as
ts_norm_diff() or ts_norm_an()
multi-step forecasting reuses the standard recursive engine of ts_regsw
This keeps the model computationally light and naturally compatible with the target-centered multivariate workflow, where each endogenous auxiliary variable may need its own univariate learner.
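The delegated-differencing idea can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: differencing is done by hand, the lagged model is a plain `lm()`, and the toy series is simulated.

```r
set.seed(42)
y <- cumsum(rnorm(120, mean = 0.2))   # non-stationary toy series

# 1) Delegated "I" step: first differencing happens outside the model.
dy <- diff(y)

# 2) AR step: linear model over p lagged differences (sliding windows).
p <- 3
windows <- embed(dy, p + 1)           # column 1 = target, columns 2..p+1 = lags 1..p
fit <- lm(windows[, 1] ~ windows[, -1])

# 3) Recursive multi-step forecast on the differenced scale.
steps_ahead <- 5
history <- dy
for (h in seq_len(steps_ahead)) {
  lags <- rev(tail(history, p))       # lag 1 first, matching embed() column order
  history <- c(history, sum(coef(fit) * c(1, lags)))
}

# 4) Undo the differencing to return to the original scale.
y_hat <- tail(y, 1) + cumsum(tail(history, steps_ahead))
y_hat
```

Keeping steps 1 and 4 outside the fitted model is what allows the same learner to be paired with different preprocessors.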
ts_darima() is therefore best understood as a tspredit adaptation,
inspired by ARIMA but intentionally expressed in the package's own
object-oriented pipeline.
In particular, the class is meant to be read together with the package's preprocessing abstractions:
use ts_norm_none() when no integration-like step is desired
use ts_norm_diff() when first differencing should be delegated to the
pipeline
use ts_norm_an() when an adaptive normalization view is preferred
A ts_darima object inheriting from ts_regsw.
Box GEP, Jenkins GM, Reinsel GC, Ljung GM (2015). Time Series Analysis: Forecasting and Control. Wiley.
Hyndman RJ, Athanasopoulos G (2021). Forecasting: Principles and Practice. Third Edition. OTexts. https://otexts.com/fpp3/
Ogasawara E, Martinez LC, de Oliveira D, Zimbrão G, Pappa GL, Mattoso M (2010). Adaptive normalization: A novel data normalization approach for non-stationary time series. IJCNN.
data(tsd)
ts <- ts_data(tsd$y, 8)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_darima(ts_norm_diff(), input_size = 5)
model <- daltoolbox::fit(model, io_train$input, io_train$output)
prediction <- predict(model, io_test$input[1, ], steps_ahead = 5)
prediction
Construct a time series data object used throughout the DAL Toolbox.
Accepts either a vector (raw time series) or a matrix/data.frame already
organized in sliding windows. Internally, a ts_data is stored as a matrix
with sw lag columns named t{lag} (e.g., t9, t8, ..., t0). When
sw = 1, the series is stored as a single column (t0).
ts_data(y, sw = 1)
y |
Numeric vector or matrix-like. Time series values or sliding windows. |
sw |
Integer. Sliding-window size (number of lag columns). Use |
A ts_data object (matrix with attributes and column names).
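The sliding-window layout described above can be reproduced with base R's `embed()`. This sketch only mimics the column naming; it does not create a real ts_data object or its attributes.

```r
y <- 1:6
sw <- 3

# embed() puts the most recent value first, so reverse the columns
# to get the oldest lag on the left (t2, t1, t0).
windows <- embed(y, sw)[, sw:1, drop = FALSE]
colnames(windows) <- paste0("t", (sw - 1):0)
windows
```

Each row is one window; the last column (t0) holds the most recent value of that window.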
# Example: building sliding windows
data(tsd)
head(tsd)
# 1) Single-column ts_data (no windows)
data <- ts_data(tsd$y)
ts_head(data)
# 2) 10-lag sliding windows (t9 ... t0)
data10 <- ts_data(tsd$y, 10)
ts_head(data10)
Construct a multivariate time-series object used throughout the target-centered multivariate workflow.
ts_data_mv(
  data,
  y = NULL,
  x = NULL,
  sw = 1,
  variables = NULL,
  lags = NULL,
  transforms = NULL
)
data |
data.frame or matrix with one column per variable and one row per
time index. It can also be an existing |
y |
Optional character scalar. Name of the target variable. When |
x |
Optional character vector. Names of the auxiliary variables. By
default, all columns except |
sw |
Integer. Temporal width of the representation. Use |
variables |
Optional character vector. Variables to include when
|
lags |
Optional named list with one integer vector per variable. When
omitted, every variable uses all lags from |
transforms |
Optional named list of raw-series transformations applied per variable before the lagged blocks are built. Each entry can be a single transform object or a list of transforms. |
ts_data_mv() follows the same design principle as ts_data() in the
univariate path:
with sw = 1, it stores aligned multivariate observations
with sw > 1, it materializes multivariate lagged windows
This keeps a single data abstraction for both the aligned and the lagged representations.
In aligned mode (sw = 1):
each row is a time instant
each column is one variable
In lagged mode (sw > 1):
each row is a forecasting origin
each variable contributes one lag block
column names follow the pattern var_tk
Optional variables, lags, and transforms let the caller inspect a
specific multivariate feature space while staying inside the ts_data_mv
abstraction.
A ts_data_mv object.
data(tsd)
x1 <- c(tsd$y[-1], tail(tsd$y, 1))
x2 <- stats::filter(tsd$y, rep(1/3, 3), sides = 1)
x2[is.na(x2)] <- tsd$y[is.na(x2)]
mv <- ts_data_mv(
  data.frame(y = tsd$y, x1 = x1, x2 = as.numeric(x2)),
  y = "y"
)
ts_head(mv, 3)
mv_sw <- ts_data_mv(mv, sw = 5)
ts_head(mv_sw, 3)
Forecast a univariate series using a deterministic law of formation instead of a statistical learner.
ts_deterministic(
  mode = c("periodic", "persist"),
  period = NULL,
  context_size = NULL
)
mode |
Character. Deterministic mode. Supported values are
|
period |
Optional integer. Required when |
context_size |
Optional integer. Number of most recent values used to identify the next state in a periodic cycle. When omitted, the smallest non-ambiguous context is inferred from the learned cycle. |
ts_deterministic() defines a small family of rule-based predictors that
can operate either on raw time series or on sliding-window inputs.
The current deterministic modes are:
"periodic": repeat a learned cycle of fixed length
"persist": repeat the most recent observed value
This family is useful for variables whose future behavior is structurally determined, such as:
day-of-week codes
weekend indicators
fixed operational calendars
slowly changing auxiliary variables
Because the forecasting rule is deterministic, the same object can be used in two contexts:
direct raw-series prediction, in the lineage of ts_arima()
sliding-window prediction, in the lineage of ts_regsw
In other words, ts_deterministic() unifies both views for cases where the
predictive mechanism is a rule, not a learner over lagged attributes.
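The two rules can be sketched in base R. The helpers `predict_periodic` and `predict_persist` are hypothetical simplifications: they assume the series ends on a complete cycle and skip the context matching that the package performs.

```r
# "periodic": repeat the last observed cycle of fixed length.
predict_periodic <- function(series, period, steps_ahead) {
  cycle <- tail(series, period)
  cycle[((seq_len(steps_ahead) - 1) %% period) + 1]
}

# "persist": repeat the most recent observed value.
predict_persist <- function(series, steps_ahead) {
  rep(tail(series, 1), steps_ahead)
}

series <- c(4, 5, 6, 7, 1, 2, 3)       # e.g. day-of-week codes
periodic <- predict_periodic(series, period = 7, steps_ahead = 5)
persist <- predict_persist(series, steps_ahead = 3)
periodic
persist
```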
A ts_deterministic object.
series <- c(4, 5, 6, 7, 1, 2, 3)
model <- ts_deterministic("periodic", period = 7)
model <- daltoolbox::fit(model, x = series)
predict(model, steps_ahead = 5)
sw <- ts_data(series, sw = 4)
io <- ts_projection(sw)
model <- daltoolbox::fit(ts_deterministic("persist"), io$input, io$output)
predict(model, io$input[1:2, ], steps_ahead = 1)
Create a time series prediction object that uses Extreme Learning Machine (ELM) regression.
It wraps the elmNNRcpp package to train single-hidden-layer networks with
randomly initialized hidden weights and closed-form output weights.
ts_elm(
  preprocess = NA,
  input_size = NA,
  input_map = ts_lagmap(),
  nhid = NA,
  actfun = "purelin"
)
preprocess |
Normalization preprocessor (e.g., |
input_size |
Integer. Number of lagged inputs used by the model. |
input_map |
Lag-selection strategy object created by |
nhid |
Integer. Hidden layer size. |
actfun |
Character. One of 'sig', 'radbas', 'tribas', 'relu', 'purelin'. |
ELMs are efficient to train and can perform well with appropriate
hidden size and activation choice. Consider normalizing inputs and tuning
nhid and the activation function.
A ts_elm object (S3) inheriting from ts_regsw.
G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew (2006). Extreme Learning Machine: Theory and Applications. Neurocomputing, 70(1–3), 489–501.
# Example: ELM with sliding-window inputs
# Load package and toy dataset
library(daltoolbox)
library(tspredit)
data(tsd)
# Create sliding windows of length 10 (t9 ... t0)
ts <- ts_data(tsd$y, 10)
ts_head(ts, 3)
# Split last 5 rows as test set
samp <- ts_sample(ts, test_size = 5)
# Project to inputs (X) and outputs (y)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
# Define ELM with global min-max normalization and fit
imap <- ts_lagmap("acf")
model <- ts_elm(ts_norm_gminmax(), input_size = 4, input_map = imap, nhid = 3, actfun = "purelin")
model <- daltoolbox::fit(model, x = io_train$input, y = io_train$output)
# Forecast 5 steps ahead starting from the last known window
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
# Evaluate forecast error on the test horizon
ev_test <- daltoolbox::evaluate(model, output, prediction)
ev_test
Smooth a series by exponentially decaying weights that give more importance to recent observations.
ts_fil_ema(ema = 3)
ema |
Integer. Window size of the exponential moving average. |
EMA is related to simple exponential smoothing; it reacts faster to level changes than a simple moving average while reducing noise.
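A hand-written EMA recursion illustrates the decaying weights. The helper `ema_sketch` is hypothetical, and the mapping `alpha = 2 / (n + 1)` is a common convention assumed here; the package may derive its weights differently.

```r
ema_sketch <- function(y, n = 3) {
  alpha <- 2 / (n + 1)          # assumed convention, not taken from the package
  s <- numeric(length(y))
  s[1] <- y[1]                  # initialize with the first observation
  for (t in seq_along(y)[-1]) {
    # New level = weighted mix of the current value and the previous level.
    s[t] <- alpha * y[t] + (1 - alpha) * s[t - 1]
  }
  s
}

y <- c(1, 1, 1, 5, 1, 1)        # flat series with one spike
smoothed <- ema_sketch(y, n = 3)
smoothed
```

The spike is damped when it occurs and its influence decays geometrically over the following points.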
A ts_fil_ema object.
C. C. Holt (1957). Forecasting trends and seasonals by exponentially weighted moving averages. O.N.R. Research Memorandum.
# Exponential moving average smoothing on a noisy series
# Load package and example data
library(daltoolbox)
library(tspredit)
data(tsd)
# Inject an outlier to illustrate smoothing effect
tsd$y[9] <- 2 * tsd$y[9]
# Define EMA filter, fit and transform the series
filter <- ts_fil_ema(ema = 3)
filter <- daltoolbox::fit(filter, tsd$y)
y <- transform(filter, tsd$y)
# Compare original vs smoothed series
plot_ts_pred(y = tsd$y, yadj = y)
Empirical Mode Decomposition (EMD) filter that decomposes a signal into intrinsic mode functions (IMFs) and reconstructs a smoothed component.
ts_fil_emd(noise = 0.1, trials = 5)
noise |
noise |
trials |
trials |
A ts_fil_emd object.
N. E. Huang et al. (1998). The Empirical Mode Decomposition and the Hilbert Spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society A.
# EMD-based smoothing: remove first IMF as noise
# Load package and example data
library(daltoolbox)
library(tspredit)
data(tsd)
tsd$y[9] <- 2 * tsd$y[9] # inject an outlier
# Fit EMD filter and reconstruct without the first (noisiest) IMF
filter <- ts_fil_emd()
filter <- daltoolbox::fit(filter, tsd$y)
y <- transform(filter, tsd$y)
# Compare original vs smoothed series
plot_ts_pred(y = tsd$y, yadj = y)
Frequency-domain smoothing using the Fast Fourier Transform (FFT) to attenuate high-frequency components.
ts_fil_fft()
The implementation keeps the lowest frequencies that explain most of the spectral energy and reconstructs the series from that low-pass spectrum.
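A low-pass reconstruction of this kind can be sketched with base R's `fft()`. The helper `fft_lowpass` and its `energy` threshold are assumptions for illustration; the package's own cutoff rule is not documented here and may differ.

```r
fft_lowpass <- function(y, energy = 0.9) {
  n <- length(y)
  spec <- fft(y)
  half <- floor(n / 2)
  # Spectral power of the non-negative frequencies 0..half.
  power <- Mod(spec[1:(half + 1)])^2
  # Smallest cutoff k such that frequencies 0..k hold `energy` of the total.
  k <- which(cumsum(power) / sum(power) >= energy)[1] - 1
  # Keep frequencies 0..k together with their conjugate mirrors.
  keep <- rep(FALSE, n)
  keep[1:(k + 1)] <- TRUE
  if (k > 0) keep[(n - k + 1):n] <- TRUE
  spec[!keep] <- 0
  Re(fft(spec, inverse = TRUE)) / n
}

x <- 2 * pi * (0:127) / 128
y <- sin(2 * x) + 0.25 * sin(24 * x)   # slow wave plus a fast ripple
yhat <- fft_lowpass(y)
max(abs(yhat - sin(2 * x)))            # the ripple is removed
```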
A ts_fil_fft object.
J. W. Cooley and J. W. Tukey (1965). An algorithm for the machine calculation of complex Fourier series. Math. Comput.
# Frequency-domain smoothing via FFT low-pass reconstruction
# Load package and example data
library(daltoolbox)
library(tspredit)
x <- seq(0, 4 * pi, length.out = 128)
y <- sin(x) + 0.25 * sin(12 * x)
# Fit FFT-based filter and reconstruct the low-frequency signal
filter <- ts_fil_fft()
filter <- daltoolbox::fit(filter, y)
yhat <- transform(filter, y)
# Compare original vs frequency-smoothed series
plot_ts_pred(y = y, yadj = yhat)
Decompose a series into trend and cyclical components using the Hodrick–Prescott (HP) filter and optionally blend with the original series.
This filter removes short-term fluctuations by penalizing changes in the growth rate of the trend component.
ts_fil_hp(lambda = 100, preserve = 0.9)
lambda |
Smoothing parameter of the Hodrick-Prescott filter, given by lambda = 100 * frequency^2. Correspondence between frequency and lambda: annual (frequency = 1): lambda = 100; quarterly (frequency = 4): lambda = 1600; monthly (frequency = 12): lambda = 14400; weekly (frequency = 52): lambda = 270400; daily, 7 days a week (frequency = 365): lambda = 13322500; daily, 5 days a week (frequency = 261): lambda = 6812100. |
preserve |
Value between 0 and 1 that balances the original observations against the HP-filtered values. Values close to 1 preserve the original values; values close to 0 adopt the HP filter values. |
The filter strength is governed by lambda = 100 * frequency^2.
Use preserve in (0, 1] to convex-combine the raw series and the HP trend.
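The lambda rule and the blending step can be sketched in base R. The helpers `lambda_for` and `hp_sketch` are hypothetical; the trend is obtained from the standard HP penalized least-squares solution, which may differ in detail from the routine the package calls.

```r
# Rule of thumb from the text: lambda = 100 * frequency^2.
lambda_for <- function(frequency) 100 * frequency^2
lambda_for(c(1, 4, 12))

hp_sketch <- function(y, lambda = 100, preserve = 0.9) {
  n <- length(y)
  D <- diff(diag(n), differences = 2)   # second-difference operator
  # HP trend: minimizes squared fit error plus lambda times squared curvature.
  trend <- solve(diag(n) + lambda * crossprod(D), y)
  # Convex combination: preserve = 1 keeps the raw series.
  preserve * y + (1 - preserve) * trend
}

y <- c(1, 2, 3, 10, 5, 6, 7, 8)         # roughly linear series with one spike
blended <- hp_sketch(y, lambda = lambda_for(1), preserve = 0.5)
blended
```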
A ts_fil_hp object.
R. J. Hodrick and E. C. Prescott (1997). Postwar U.S. business cycles: An empirical investigation. Journal of Money, Credit and Banking, 29(1).
# time series with noise
library(daltoolbox)
library(tspredit)
data(tsd)
tsd$y[9] <- 2*tsd$y[9]
# filter
filter <- ts_fil_hp(lambda = 100*(26)^2) # frequency assumed to be 26
filter <- daltoolbox::fit(filter, tsd$y)
y <- transform(filter, tsd$y)
# plot
plot_ts_pred(y=tsd$y, yadj=y)
Estimate a latent trend via a state-space model using the
Kalman Filter (KF), wrapping the KFAS package.
ts_fil_kalman(H = 0.1, Q = 1)
H |
variance or covariance matrix of the measurement noise. This noise pertains to the relationship between the true system state and actual observations. Measurement noise is added to the measurement equation to account for uncertainties or errors associated with real observations. The higher this value, the higher the level of uncertainty in the observations. |
Q |
variance or covariance matrix of the process noise. This noise follows a zero-mean Gaussian distribution. It is added to the equation to account for uncertainties or unmodeled disturbances in the state evolution. The higher this value, the greater the uncertainty in the state transition process. |
A ts_fil_kalman object.
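To make the roles of H and Q concrete, here is a minimal scalar local-level Kalman filter in base R. It is a sketch under stated assumptions, not the KFAS-based implementation: `kalman_local_level()` is a hypothetical name, the state is a random-walk level, and the initial variance is an arbitrary large value.

```r
# Hypothetical illustration; the package delegates to KFAS instead.
kalman_local_level <- function(y, H = 0.1, Q = 1) {
  n <- length(y)
  a <- y[1]   # initial state estimate
  P <- 1e6    # large initial variance (vague prior, assumed)
  out <- numeric(n)
  for (t in seq_len(n)) {
    P_pred <- P + Q               # predict: process noise inflates uncertainty
    K <- P_pred / (P_pred + H)    # Kalman gain: trust in the new observation
    a <- a + K * (y[t] - a)       # update the level with the innovation
    P <- (1 - K) * P_pred
    out[t] <- a
  }
  out
}

y <- c(rep(1, 10), 5, rep(1, 10))  # flat series with one spike
f_low_H  <- kalman_local_level(y, H = 0.1, Q = 1)
f_high_H <- kalman_local_level(y, H = 10,  Q = 1)
```

A larger H lowers the gain, so the filtered level reacts less to the spike; a larger Q has the opposite effect.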
R. E. Kalman (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1), 35–45.
# State-space smoothing with Kalman Filter (KF)
# Load package and example data
library(daltoolbox)
library(tspredit)
data(tsd)
tsd$y[9] <- 2 * tsd$y[9] # inject an outlier
# Fit KF (H = obs noise, Q = process noise) and transform
filter <- ts_fil_kalman()
filter <- daltoolbox::fit(filter, tsd$y)
y <- transform(filter, tsd$y)
# Plot original vs KF-smoothed series
plot_ts_pred(y = tsd$y, yadj = y)
Locally Weighted Scatterplot Smoothing (LOWESS) fits local regressions to capture the primary trend while reducing noise and spikes.
ts_fil_lowess(f = 0.2)
f |
smoothing parameter. The larger this value, the smoother the series will be. This provides the proportion of points on the plot that influence the smoothing. |
A ts_fil_lowess object.
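The effect of the span f can be checked directly with `stats::lowess()`. The sketch below assumes a synthetic noisy sine wave and a hypothetical `roughness()` helper; it illustrates the parameter and is not the package's code path.

```r
set.seed(42)
x <- seq_len(100)
y <- sin(2 * pi * x / 50) + rnorm(100, sd = 0.2)

# Small span: follows local wiggles; large span: smoother curve
smooth_small <- stats::lowess(x, y, f = 0.1)$y
smooth_large <- stats::lowess(x, y, f = 0.8)$y

# Hypothetical roughness measure: sum of squared second differences
roughness <- function(v) sum(diff(v, differences = 2)^2)
c(small = roughness(smooth_small), large = roughness(smooth_large))
```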
W. S. Cleveland (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association.
# time series with noise
library(daltoolbox)
library(tspredit)
data(tsd)
tsd$y[9] <- 2*tsd$y[9]
# filter
filter <- ts_fil_lowess(f = 0.2)
filter <- daltoolbox::fit(filter, tsd$y)
y <- transform(filter, tsd$y)
# plot
plot_ts_pred(y=tsd$y, yadj=y)
Smooth out fluctuations and reduce noise by averaging over a fixed-size rolling window.
ts_fil_ma(ma = 3)
ma |
moving average size |
Larger windows produce smoother series but may lag turning points.
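A trailing moving average can be sketched with `stats::filter()`. Whether the package uses a trailing or a centered window is not stated here, so the `sides = 1` choice and the `moving_average()` name are assumptions made for illustration.

```r
# Hypothetical helper: trailing window of size `ma` (assumed alignment)
moving_average <- function(y, ma = 3) {
  as.numeric(stats::filter(y, rep(1 / ma, ma), sides = 1))
}

y <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y_ma <- moving_average(y, ma = 3)
y_ma
```

The first `ma - 1` values are NA because the window is not yet full, and each later value trails the series by roughly half the window, which is the lag at turning points mentioned above.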
A ts_fil_ma object.
# time series with noise
library(daltoolbox)
library(tspredit)
data(tsd)
tsd$y[9] <- 2*tsd$y[9]
# filter
filter <- ts_fil_ma(3)
filter <- daltoolbox::fit(filter, tsd$y)
y <- transform(filter, tsd$y)
# plot
plot_ts_pred(y=tsd$y, yadj=y)
Identity filter that returns the original series unchanged.
ts_fil_none()
A ts_fil_none object.
# Identity filter (returns original series)
# Load package and example series
library(daltoolbox)
library(tspredit)
data(tsd)
tsd$y[9] <- 2 * tsd$y[9] # inject an outlier for comparison
# Fit identity filter and transform (no change expected)
filter <- ts_fil_none()
filter <- daltoolbox::fit(filter, tsd$y)
y <- transform(filter, tsd$y)
# Plot original vs (identical) filtered series
plot_ts_pred(y = tsd$y, yadj = y)
Double/triple exponential smoothing capturing level, trend, and optionally seasonality components.
ts_fil_qes(gamma = FALSE)
gamma |
If TRUE, enables the gamma seasonality component. |
A ts_fil_qes object.
The transformed series is aligned to the input length and may contain leading
NA values while the Holt-Winters state is being initialized.
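The leading NA values follow from how Holt-Winters fitted values are produced. The sketch below uses `stats::HoltWinters()` with fixed smoothing constants (an assumption made to keep the example deterministic; the package may estimate them) and a hypothetical `pad_to_length()` helper to align the result with the input.

```r
# Hypothetical helper: left-pad fitted values with NA to match the input length
pad_to_length <- function(fitted, n) c(rep(NA_real_, n - length(fitted)), fitted)

set.seed(7)
y <- 10 + cumsum(rnorm(40))

# Level + trend smoothing: the first two points initialize the state
fit_des <- stats::HoltWinters(y, alpha = 0.5, beta = 0.3, gamma = FALSE)
y_des <- pad_to_length(as.numeric(fit_des$fitted[, "xhat"]), length(y))

# Level-only smoothing: only the first point initializes the state
fit_ses <- stats::HoltWinters(y, alpha = 0.5, beta = FALSE, gamma = FALSE)
y_ses <- pad_to_length(as.numeric(fit_ses$fitted[, "xhat"]), length(y))

c(des_leading_na = sum(is.na(y_des)), ses_leading_na = sum(is.na(y_ses)))
```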
P. R. Winters (1960). Forecasting sales by exponentially weighted moving averages. Management Science.
# time series with noise
library(daltoolbox)
library(tspredit)
data(tsd)
tsd$y[9] <- 2*tsd$y[9]
# filter
filter <- ts_fil_qes()
filter <- daltoolbox::fit(filter, tsd$y)
y <- transform(filter, tsd$y)
# plot
plot_ts_pred(y=tsd$y, yadj=y)
Apply recursive linear filtering (ARMA-style recursion) to a univariate series or each column of a multivariate series. Useful for smoothing and mitigating autocorrelation.
ts_fil_recursive(filter)
filter |
smoothing parameter. The larger the value, the greater the smoothing. The smaller the value, the less smoothing, and the resulting series shape is more similar to the original series. |
A ts_fil_recursive object.
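The recursion itself can be illustrated with `stats::filter(..., method = "recursive")`, where each output is the current input plus the coefficient times the previous output. This is a sketch of the recursion only; `recursive_filter()` is a hypothetical name and the package may post-process the result differently.

```r
# Hypothetical helper: y_out[t] = y[t] + filter * y_out[t - 1]
recursive_filter <- function(y, filter = 0.05) {
  as.numeric(stats::filter(y, filter = filter, method = "recursive"))
}

# An impulse decays geometrically at the rate given by the coefficient
impulse_response <- recursive_filter(c(1, 0, 0), filter = 0.5)
impulse_response
```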
# time series with noise
library(daltoolbox)
library(tspredit)
data(tsd)
tsd$y[9] <- 2*tsd$y[9]
# filter
filter <- ts_fil_recursive(filter = 0.05)
filter <- daltoolbox::fit(filter, tsd$y)
y <- transform(filter, tsd$y)
# plot
plot_ts_pred(y=tsd$y, yadj=y)
Ensemble/robust EMD-based denoising using CEEMD to separate noise-dominated IMFs and reconstruct the signal.
ts_fil_remd(noise = 0.1, trials = 5)
noise |
Amplitude of the white noise added to the series in each CEEMD trial. |
trials |
Number of noise-perturbed CEEMD trials to run and average. |
A ts_fil_remd object.
Z. Wu and N. E. Huang (2009). Ensemble Empirical Mode Decomposition: a noise-assisted data analysis method. Advances in Adaptive Data Analysis.
# time series with noise
library(daltoolbox)
library(tspredit)
data(tsd)
tsd$y[9] <- 2*tsd$y[9]
# filter
filter <- ts_fil_remd()
filter <- daltoolbox::fit(filter, tsd$y)
y <- transform(filter, tsd$y)
# plot
plot_ts_pred(y=tsd$y, yadj=y)
Remove the seasonal component from a time series while preserving level and trend, using STL decomposition.
ts_fil_seas_adj(frequency = NULL)
frequency |
Frequency of the time series. It is an optional parameter. It can be configured when the frequency of the time series is known. |
A ts_fil_seas_adj object.
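Seasonal adjustment with STL amounts to subtracting the estimated seasonal component from the series. The base R sketch below uses `stats::stl()` with `s.window = "periodic"`, which is an assumed setting; the options used inside the package are not documented here.

```r
set.seed(123)
x <- seq_len(120)
y <- x / 100 + sin(2 * pi * x / 12) + rnorm(120, sd = 0.05)

# STL needs a ts object with a known frequency (12 = monthly seasonality)
y_ts <- stats::ts(y, frequency = 12)
dec <- stats::stl(y_ts, s.window = "periodic")

# Seasonally adjusted series = original minus seasonal component
seasonal <- as.numeric(dec$time.series[, "seasonal"])
y_adj <- y - seasonal
```

The adjusted series keeps the level and trend, so its step-to-step variation is much smaller than that of the raw series.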
R. B. Cleveland, W. S. Cleveland, J. E. McRae, and I. Terpenning (1990). STL: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics, 6(1), 3–73.
# Seasonal adjustment using STL at known frequency
# Load package and build a seasonal signal
library(daltoolbox)
library(tspredit)
x <- seq_len(120)
y <- x / 100 + sin(2 * pi * x / 12) + rnorm(120, sd = 0.05)
# Fit seasonal adjustment (set frequency if known) and transform
filter <- ts_fil_seas_adj(frequency = 12)
filter <- daltoolbox::fit(filter, y)
yhat <- transform(filter, y)
# Plot original vs seasonally adjusted series
plot_ts_pred(y = y, yadj = yhat)
Exponential smoothing focused on the level component, with optional extensions to trend/seasonality via Holt–Winters variants.
ts_fil_ses(gamma = FALSE)
gamma |
If TRUE, enables the gamma seasonality component. |
A ts_fil_ses object.
The transformed series is aligned to the input length and may contain a
leading NA while the Holt-Winters state is being initialized.
R. G. Brown (1959). Statistical Forecasting for Inventory Control.
# time series with noise
library(daltoolbox)
library(tspredit)
data(tsd)
tsd$y[9] <- 2*tsd$y[9]
# filter
filter <- ts_fil_ses()
filter <- daltoolbox::fit(filter, tsd$y)
y <- transform(filter, tsd$y)
# plot
plot_ts_pred(y=tsd$y, yadj=y)
Remove or reduce randomness (noise) using a robust smoothing strategy that first mitigates outliers and then smooths residual variation.
ts_fil_smooth()
A ts_fil_smooth object.
# Robust smoothing with iterative outlier mitigation
# Load package and example data
library(daltoolbox)
library(tspredit)
data(tsd)
tsd$y[9] <- 2 * tsd$y[9] # inject an outlier
# Fit smoother and transform to reduce spikes/noise
filter <- ts_fil_smooth()
filter <- daltoolbox::fit(filter, tsd$y)
y <- transform(filter, tsd$y)
# Compare original vs smoothed series
plot_ts_pred(y = tsd$y, yadj = y)
Fit a cubic smoothing spline to a time series for smooth trend extraction with a tunable roughness penalty.
ts_fil_spline(spar = NULL)
spar |
smoothing parameter. When spar is specified, the coefficient of the integral of the squared second derivative in the fitting criterion (penalized log-likelihood) is a monotone function of spar. |
A ts_fil_spline object.
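The influence of spar can be seen with `stats::smooth.spline()` directly. The sketch assumes a synthetic noisy sine wave and a hypothetical `rss()` helper; it illustrates the parameter rather than reproducing the package internals.

```r
set.seed(3)
x <- seq_len(80)
y <- sin(2 * pi * x / 40) + rnorm(80, sd = 0.2)

# Low spar: light penalty, curve follows the data closely
fit_light <- stats::smooth.spline(x, y, spar = 0.2)
# High spar: heavy roughness penalty, curve is much smoother
fit_heavy <- stats::smooth.spline(x, y, spar = 0.9)

# Hypothetical helper: residual sum of squares at the observed points
rss <- function(fit) sum((y - stats::predict(fit, x)$y)^2)
c(light = rss(fit_light), heavy = rss(fit_heavy))
```

Because a larger spar means a larger penalty, the heavier fit trades a higher residual sum of squares for a smoother trend.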
P. Craven and G. Wahba (1978). Smoothing noisy data with spline functions. Numerische Mathematik.
# Smoothing splines with adjustable roughness penalty
# Load package and example data
library(daltoolbox)
library(tspredit)
data(tsd)
tsd$y[9] <- 2 * tsd$y[9] # inject an outlier
# Fit spline smoother (spar controls smoothness) and transform
filter <- ts_fil_spline(spar = 0.5)
filter <- daltoolbox::fit(filter, tsd$y)
y <- transform(filter, tsd$y)
# Compare original vs smoothed series
plot_ts_pred(y = tsd$y, yadj = y)
Denoise a series using discrete wavelet transforms and selected wavelet families.
ts_fil_wavelet(filter = "haar")
filter |
Available wavelet filters: 'haar', 'd4', 'la8', 'bl14', 'c6'. |
A ts_fil_wavelet object.
S. Mallat (1989). A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
# Denoising with discrete wavelets (optionally selecting best filter)
# Load package and example data
library(daltoolbox)
library(tspredit)
data(tsd)
tsd$y[9] <- 2 * tsd$y[9] # inject an outlier
# Fit wavelet filter ("haar" by default; can pass a list to select best)
filter <- ts_fil_wavelet()
filter <- daltoolbox::fit(filter, tsd$y)
y <- transform(filter, tsd$y)
# Compare original vs wavelet-denoised series
plot_ts_pred(y = tsd$y, yadj = y)
Apply Winsorization to limit extreme values by replacing them with nearer order statistics, reducing the influence of outliers.
ts_fil_winsor()
A ts_fil_winsor object.
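Winsorization can be sketched in base R by clamping values to two quantiles. The 5th and 95th percentile limits and the `winsorize()` name are assumptions chosen for illustration; the limits used by the package are not documented here.

```r
# Hypothetical helper: clamp values outside the chosen quantiles
winsorize <- function(y, probs = c(0.05, 0.95)) {
  limits <- stats::quantile(y, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(y, limits[1]), limits[2])
}

y <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)  # 100 is an extreme value
y_w <- winsorize(y)
y_w
```

Unlike trimming, the series keeps its length: extreme points are pulled toward the limits instead of being dropped.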
J. W. Tukey (1962). The future of data analysis. Annals of Mathematical Statistics. (Winsorization discussed in robust summaries.)
# Winsorization: cap extreme values to reduce outlier impact
# Load package and example data
library(daltoolbox)
library(tspredit)
data(tsd)
tsd$y[9] <- 2 * tsd$y[9] # inject an outlier
# Fit Winsor filter and transform series
filter <- ts_fil_winsor()
filter <- daltoolbox::fit(filter, tsd$y)
y <- transform(filter, tsd$y)
# Plot original vs Winsorized series
plot_ts_pred(y = tsd$y, yadj = y)
Return the first n observations from a ts_data object.
ts_head(x, n = 6L, ...)
x |
|
n |
number of rows to return |
... |
optional arguments |
The first n observations of a ts_data (as a matrix/data.frame).
data(tsd)
data10 <- ts_data(tsd$y, 10)
ts_head(data10)
Integrated tuning over input sizes, preprocessing, augmentation, and model hyperparameters for time series.
ts_integtune(
  input_size,
  base_model,
  folds = 10,
  ranges = NULL,
  preprocess = list(ts_norm_gminmax()),
  augment = list(ts_aug_none())
)
input_size |
Integer vector. Candidate input window sizes. |
base_model |
Base model object for tuning. |
folds |
Integer. Number of cross-validation folds. |
ranges |
Named list of hyperparameter ranges to explore. |
preprocess |
List of preprocessing objects to compare. |
augment |
List of augmentation objects to apply during training. |
A ts_integtune object.
Salles, R., Pacitti, E., Bezerra, E., Marques, C., Pacheco, C., Oliveira, C., Porto, F., Ogasawara, E. (2023). TSPredIT: Integrated Tuning of Data Preprocessing and Time Series Prediction Models. Lecture Notes in Computer Science.
# Integrated search over input size, preprocessing and model hyperparameters
library(daltoolbox)
library(tspredit)
data(tsd)
# Build windows and split into train/test, then project to (X, y)
ts <- ts_data(tsd$y, 10)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
# Configure integrated tuning: ranges for input_size, ELM (nhid, actfun), and preprocessors
tune <- ts_integtune(
  input_size = 3:5,
  base_model = ts_elm(),
  ranges = list(nhid = 1:5, actfun = c('purelin')),
  preprocess = list(ts_norm_gminmax())
)
# Run search; augmentation (if provided) is applied during training internally
model <- daltoolbox::fit(tune, x = io_train$input, y = io_train$output)
# Forecast and evaluate on the held-out window
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- daltoolbox::evaluate(model, output, prediction)
ev_test
Create a prediction object that uses the K-Nearest Neighbors regression for time series via sliding windows.
ts_knn(preprocess = NA, input_size = NA, input_map = ts_lagmap(), k = NA)
preprocess |
Normalization preprocessor (e.g., |
input_size |
Integer. Number of lagged inputs. |
input_map |
Lag-selection strategy object created by |
k |
Integer. Number of neighbors. |
KNN regression predicts a value as the average (or weighted average) of the outputs of the k most similar windows in the training set. Similarity is computed in the feature space induced by lagged inputs. Consider normalization for distance-based methods.
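The prediction rule described above can be sketched in a few lines of base R. This is an illustration under stated assumptions: `knn_predict()` is a hypothetical helper, distances are Euclidean, neighbors are averaged with equal weights, and no normalization is applied because the toy series is already on one scale.

```r
# Hypothetical helper: average the outputs of the k nearest training windows
knn_predict <- function(X, y, query, k = 3) {
  d <- sqrt(rowSums(sweep(X, 2, query)^2))  # Euclidean distance to each window
  mean(y[order(d)[seq_len(k)]])
}

# Build sliding windows of 3 lagged inputs plus 1 output from a toy series
series <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
W <- stats::embed(series, 4)[, 4:1]  # columns in chronological order
X <- W[, 1:3]
y <- W[, 4]

pred_k1 <- knn_predict(X, y, query = c(8, 9, 10), k = 1)
pred_k3 <- knn_predict(X, y, query = c(8, 9, 10), k = 3)
```

On this trending series the prediction cannot exceed the largest output seen in training, which shows why KNN on raw windows extrapolates poorly and why preprocessing matters.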
A ts_knn object (S3) inheriting from ts_regsw.
T. M. Cover and P. E. Hart (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27.
# Example: distance-based regression on sliding windows
# Load tools and example series
library(daltoolbox)
library(tspredit)
data(tsd)
# Build 10-lag windows and preview a few rows
ts <- ts_data(tsd$y, 10)
ts_head(ts, 3)
# Split end of series as test and project (X, y)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
# Define KNN regressor and fit (distance-based; normalization recommended)
model <- ts_knn(ts_norm_gminmax(), input_size = 4, input_map = ts_lagmap("pacf"), k = 3)
model <- daltoolbox::fit(model, x = io_train$input, y = io_train$output)
# Predict multiple steps ahead and evaluate
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- daltoolbox::evaluate(model, output, prediction)
ev_test
Configure how a sliding-window predictor chooses the input_size lagged
attributes that will be fed to the underlying regression model.
ts_lagmap(
  method = c("recent", "even", "geom", "acf", "pacf", "peaks", "seasonal", "acf_seasonal", "pacf_seasonal", "blocks", "mi", "mrmr"),
  seasonality = NULL,
  peak_basis = c("acf", "pacf"),
  block_radius = 1,
  bins = 8
)
method |
Character. Lag-selection strategy:
|
seasonality |
Optional integer. Seasonal period used by the seasonal
lag selectors. If |
peak_basis |
Character. Correlation profile used by |
block_radius |
Integer. Radius around each selected center when
|
bins |
Integer. Number of quantile bins used by the mutual-information criteria. |
The lag mapper is fitted on the training data before the base predictor is
trained. During fit(), the mapper stores a vector of selected lag columns.
The default "recent" method reproduces the historical behavior of the
package: it keeps the most recent input_size observations available in the
sliding window.
When ts_lagmap() is used inside a ts_regsw model, the mapper is fitted
after the model preprocessor has transformed the training windows. So the
selected positions refer to the representation actually seen by the backend,
not necessarily to the raw pre-transform window geometry.
Correlation-based methods operate on the raw training series reconstructed
from the input windows and aligned outputs. Supervised methods ("mi" and
"mrmr") inspect the relationship between each lagged attribute and the
training target.
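Two of the strategies can be sketched in base R to show what selecting lag columns means. Both helpers are hypothetical: `select_recent()` assumes the most recent lags sit in the last columns of the window, and `select_pacf()` assumes that the lags with the largest absolute partial autocorrelation are kept, which may not match the package's exact ranking rule.

```r
# Hypothetical: keep the last `input_size` columns (most recent observations)
select_recent <- function(n_lags, input_size) {
  seq(n_lags - input_size + 1, n_lags)
}

# Hypothetical: keep the lags with the largest absolute partial autocorrelation
select_pacf <- function(series, max_lag, input_size) {
  p <- as.numeric(stats::pacf(series, lag.max = max_lag, plot = FALSE)$acf)
  sort(order(abs(p), decreasing = TRUE)[seq_len(input_size)])
}

set.seed(1)
series <- as.numeric(stats::arima.sim(list(ar = 0.8), n = 500))

recent_cols <- select_recent(n_lags = 9, input_size = 4)
pacf_lags <- select_pacf(series, max_lag = 9, input_size = 4)
```

For the simulated AR(1) series, lag 1 dominates the partial autocorrelation, so it is always among the selected lags.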
A ts_lagmap object.
Box GEP, Jenkins GM, Reinsel GC, Ljung GM (2015). Time Series Analysis: Forecasting and Control. Fifth Edition. Wiley.
Peng H, Long F, Ding C (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226-1238. doi:10.1109/TPAMI.2005.159
Leites J, Cerqueira V, Soares C (2024). Selecting time lags for time series forecasting: an empirical study. arXiv:2405.11237.
library(daltoolbox)
library(tspredit)
data(tsd)
ts <- ts_data(tsd$y, 10)
io <- ts_projection(ts)
mapper <- ts_lagmap(method = "pacf")
mapper <- daltoolbox::fit(mapper, io$input, io$output, input_size = 4)
mapper$lags
mapper$columns
Create a target-centered singular multivariate regressor based on
stats::lm.
ts_lm_mv(models_x = NULL, formula = NULL, features = NULL)
models_x |
Optional named list with one univariate model per auxiliary variable. |
formula |
Optional regression formula. When omitted, the target variable
from |
features |
Optional character vector of feature names used when |
ts_lm_mv() is a linear-regression member of the ts_reg_mv family.
It is inspired by the formula-based design already used in daltoolbox,
but adapted to the aligned multivariate time-series abstraction of
tspredit.
This makes it a very transparent baseline for the singular multivariate branch:
- the target variable remains explicit
- the auxiliary variables are declared in the formula
- the analyst can read the structural assumption directly from the model
The most common usage patterns are:
- provide a full formula such as y ~ x1 + x2
- omit the formula and let the model regress y on all auxiliary variables
The target variable is forecast from the synchronized auxiliary variables.
When future auxiliary values are not known, they can be generated by the
univariate models supplied in models_x.
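The two usage patterns correspond to the familiar `stats::lm()` formula idioms. The sketch below uses synthetic data and plain `lm()` to show that an explicit formula and the all-variables shorthand give the same fit; it does not call `ts_lm_mv()` and makes no claim about how the package builds its formula internally.

```r
set.seed(10)
n <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
df <- data.frame(y = 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.1), x1 = x1, x2 = x2)

# Pattern 1: explicit formula naming the auxiliary variables
fit_explicit <- stats::lm(y ~ x1 + x2, data = df)
# Pattern 2: regress the target on every other column
fit_all <- stats::lm(y ~ ., data = df)

# Forecasting the target requires values of the auxiliary variables
newdata <- data.frame(x1 = c(0.1, 0.2), x2 = c(1, 1))
pred <- stats::predict(fit_explicit, newdata)
```

The last step is where the univariate models in models_x come in: when future auxiliary values are unknown, something has to supply the rows of `newdata`.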
A ts_lm_mv object inheriting from ts_reg_mv.
Montgomery DC, Peck EA, Vining GG (2021). Introduction to Linear Regression Analysis. Wiley.
data(tsd)
x1 <- c(tsd$y[-1], tail(tsd$y, 1))
x2 <- stats::filter(tsd$y, rep(1/3, 3), sides = 1)
x2[is.na(x2)] <- tsd$y[is.na(x2)]
mv <- ts_data_mv(data.frame(y = tsd$y, x1 = x1, x2 = as.numeric(x2)), y = "y")
samp <- ts_sample(mv, test_size = 5)
model <- ts_lm_mv(
  models_x = list(x1 = ts_arima(), x2 = ts_arima()),
  formula = y ~ x1 + x2
)
model <- daltoolbox::fit(model, samp$train)
predict(model, steps_ahead = 5)
Create a time series prediction object based on a Multilayer Perceptron (MLP) regressor.
It wraps the nnet package to train a single-hidden-layer neural network
on sliding-window inputs. Use ts_regsw utilities to project inputs/outputs.
ts_mlp(
  preprocess = NA,
  input_size = NA,
  input_map = ts_lagmap(),
  size = NA,
  decay = 0.01,
  maxit = 1000
)
preprocess |
Normalization preprocessor (e.g., |
input_size |
Integer. Number of lagged inputs used by the model. |
input_map |
Lag-selection strategy object created by |
size |
Integer. Number of hidden neurons. |
decay |
Numeric. L2 weight decay (regularization) parameter. |
maxit |
Integer. Maximum number of training iterations. |
The MLP is a universal function approximator capable of learning
non-linear mappings from lagged inputs to next-step values. For stability,
consider normalizing inputs (e.g., ts_norm_gminmax()). Hidden size and
weight decay control capacity and regularization respectively.
A ts_mlp object (S3) inheriting from ts_regsw.
D. E. Rumelhart, G. E. Hinton, and R. J. Williams (1986). Learning representations by back-propagating errors. Nature 323, 533–536.
W. N. Venables and B. D. Ripley (2002). Modern Applied Statistics with S.
Fourth Edition. Springer. (for the nnet package)
# Example: MLP on sliding windows with min–max normalization
# Load package and dataset
library(daltoolbox)
library(tspredit)
data(tsd)
ts <- ts_data(tsd$y, 10)
ts_head(ts, 3)
# Prepare projection (X, y)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
# Define and fit the MLP
imap <- ts_lagmap("even")
model <- ts_mlp(ts_norm_gminmax(), input_size = 4, input_map = imap, size = 4, decay = 0)
model <- daltoolbox::fit(model, x=io_train$input, y=io_train$output)
# Predict 5 steps ahead
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
# Evaluate
ev_test <- daltoolbox::evaluate(model, output, prediction)
ev_test
Wrap a univariate forecasting model so it can be orchestrated inside a multivariate workflow.
ts_mv_spec(model, variables = NULL, lags = NULL, transforms = NULL)
model |
Base model object. It can be a sliding-window regressor such as
|
variables |
Optional character vector. Variables used as predictors for this submodel. When omitted, defaults depend on the context: target model uses all variables, auxiliary models use their own variable. |
lags |
Optional named list with one integer vector per variable. When
omitted, each variable uses all lags from |
transforms |
Optional named list of raw-series transformations applied per variable before the multivariate windows are built. Each entry can be a single transform object or a list of transforms. These transformations act as variable-specific feature engineering and are orchestrated by the multivariate wrapper. |
ts_mv_spec() is the object-oriented contract that describes how one
variable-specific predictive pipeline should be assembled inside
ts_regsw_mv().
Each specification can declare:
- model: the learner responsible for the variable
- variables: which synchronized series are allowed as inputs to this learner
- lags: which lag positions are extracted from each variable block
- transforms: optional raw-series transformations applied per variable before the multivariate windows are built
This design lets different variables use different forecasting strategies while preserving a single orchestration contract. For example:
- the target y may use ts_lstm(ts_norm_an(), ...)
- x1 may use ts_mlp(ts_norm_diff(), ...)
- x2 may use ts_rf(ts_norm_gminmax(), ...) plus a smoothing filter
- deterministic auxiliary variables may use ts_deterministic(), ts_periodic(), or ts_persist()
In other words, the multivariate layer coordinates the pipelines, but the behavior of each variable still lives inside its own object.
A ts_mv_spec object.
spec_y <- ts_mv_spec(ts_mlp(ts_norm_gminmax()), variables = c("y", "x1"))
spec_x1 <- ts_mv_spec(ts_deterministic("periodic", period = 7), variables = "x1")
Transform data to a common scale while adapting to changes in distribution over time (optionally over a trailing window).
ts_norm_an(
  outliers = outliers_boxplot(),
  nw = 0,
  average = c("mean", "ema"),
  operation = c("divide", "subtract", "softdivide", "asinh"),
  scale = c("sd", "mad", "none"),
  lambda = 1,
  epsilon = 1e-08
)
outliers |
Outlier-handling transformation class. Use NULL to skip outlier removal. |
nw |
integer: window size. |
average |
Character. Adaptive reference statistic: |
operation |
Character. Adaptive normalization operator:
|
scale |
Character. Local scale estimator used by the hybrid operators:
|
lambda |
Numeric. Weight assigned to the adaptive level term inside the hybrid reference scale. |
epsilon |
Numeric. Positive floor used to stabilize near-zero denominators and local scales. |
ts_norm_an() supports a family of adaptive window-wise transformations:
- "divide" rescales a window by its adaptive reference level.
- "subtract" recenters the window by subtracting the adaptive reference level.
- "softdivide" computes a stabilized relative deviation.
- "asinh" applies an inverse-hyperbolic-sine contrast around the adaptive reference level using the same stabilized scale.
The concrete operators are implemented in tsanutils(), while
ts_norm_an() focuses on estimating the adaptive references and applying
the chosen transformation consistently during fit, transform, and inverse
transform.
In the current contract, the adaptive reference is estimated from the full
supervised window passed to fit() or transform(). So when the input is a
sliding window produced by ts_data(), the terminal t0 position is part of
the same window-wise reference used for the transformation.
The adaptive reference is estimated either by a simple mean or by an
exponentially weighted mean (average = "ema"). The hybrid operators
additionally use a local scale estimate s based on either the standard
deviation or the MAD.
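As a rough illustration of the operator family, the base-R sketch below applies the four operations to a single window. It does not call tspredit. The helper name `an_sketch` is hypothetical, and the blended scale `s + lambda * |m| + epsilon` used for "softdivide" and "asinh" is an assumed form chosen for illustration, not necessarily the formula implemented in tsanutils().

```r
# Illustrative only: assumed forms of the adaptive operators for one window.
an_sketch <- function(w, operation = "divide", lambda = 1, epsilon = 1e-8) {
  m <- mean(w)                           # adaptive reference level (simple mean)
  s <- stats::sd(w)                      # local scale estimate
  ref <- s + lambda * abs(m) + epsilon   # ASSUMED blended reference scale
  switch(operation,
    divide     = w / m,                  # no near-zero protection in this sketch
    subtract   = w - m,
    softdivide = (w - m) / ref,
    asinh      = asinh((w - m) / ref),
    stop("unknown operation")
  )
}

w <- c(10, 12, 11, 13, 14)
an_sketch(w, "divide")      # values relative to the window mean
an_sketch(w, "subtract")    # values centered on the window mean
an_sketch(w, "softdivide")  # centered values on the blended scale
an_sketch(w, "asinh")       # compressed version of the soft division
```

The sketch omits the EMA reference, the MAD scale, and the stabilization of near-zero levels that the package documents for the real preprocessor.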
A ts_norm_an object.
Ogasawara, E., Martinez, L. C., De Oliveira, D., Zimbrão, G., Pappa, G. L., Mattoso, M. (2010). Adaptive Normalization: A novel data normalization approach for non-stationary time series. Proceedings of the International Joint Conference on Neural Networks (IJCNN). doi:10.1109/IJCNN.2010.5596746
Huber PJ (1964). Robust Estimation of a Location Parameter. Annals of Mathematical Statistics, 35(1), 73-101. doi:10.1214/aoms/1177703732
Burbidge JB, Magee L, Robb AL (1988). Alternative Transformations to Handle Extreme Values of the Dependent Variable. Journal of the American Statistical Association, 83(401), 123-127.
Bellemare MF, Wichman CJ (2020). Elasticities and the Inverse Hyperbolic Sine Transformation. Oxford Bulletin of Economics and Statistics, 82(1), 50-61. doi:10.1111/obes.12325
# time series to normalize
library(daltoolbox)
library(tspredit)
data(tsd)
# convert to sliding windows
ts <- ts_data(tsd$y, 10)
ts_head(ts, 3)
summary(ts[,10])
# divisive adaptive normalization (default)
preproc <- ts_norm_an()
preproc <- daltoolbox::fit(preproc, ts)
tst <- transform(preproc, ts)
ts_head(tst, 3)
# subtractive adaptive normalization
preproc <- ts_norm_an(operation = "subtract")
preproc <- daltoolbox::fit(preproc, ts)
tst <- transform(preproc, ts)
ts_head(tst, 3)
# EMA-based soft division
preproc <- ts_norm_an(average = "ema", operation = "softdivide", scale = "mad")
preproc <- daltoolbox::fit(preproc, ts)
tst <- transform(preproc, ts)
ts_head(tst, 3)
Transform a series by first differences to remove level and highlight changes; normalization is then applied to the differenced series.
In sliding-window mode, this transformation reduces the window width by one:
a window with columns t9 ... t0 becomes a differenced window with
t8 ... t0 expressed as consecutive first differences. Any downstream lag
selection must therefore be learned on the transformed representation.
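The loss of one column can be seen in a small base-R sketch. It builds sliding windows with stats::embed() instead of ts_data(), so the window construction and column names here are illustrative stand-ins rather than the package's own objects.

```r
# Illustrative only: first differences inside each sliding window.
x <- c(5, 7, 6, 9, 12, 11, 15)
sw <- 4

# stats::embed() returns each window newest-first; flip the columns so that
# every row reads oldest -> newest (t3, t2, t1, t0).
windows <- stats::embed(x, sw)[, sw:1]
colnames(windows) <- paste0("t", (sw - 1):0)

# Differencing each row drops one column: t3..t0 becomes t2..t0.
diffed <- t(apply(windows, 1, diff))
colnames(diffed) <- paste0("t", (sw - 2):0)

dim(windows)  # 4 rows, 4 columns
dim(diffed)   # 4 rows, 3 columns
diffed[1, ]   # 2 -1 3, the consecutive differences of 5 7 6 9
```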
ts_norm_diff(outliers = outliers_boxplot())
| Argument | Description |
|---|---|
| outliers | Outlier transformation class. Pass NULL to skip outlier removal. |
A ts_norm_diff object.
Salles, R., Assis, L., Guedes, G., Bezerra, E., Porto, F., Ogasawara, E. (2017). A framework for benchmarking machine learning methods using linear models for univariate time series prediction. Proceedings of the International Joint Conference on Neural Networks (IJCNN). doi:10.1109/IJCNN.2017.7966139
# Differencing + global min–max normalization
# Load package and example data
library(daltoolbox)
library(tspredit)
data(tsd)
# Convert to sliding windows and preview raw last column
ts <- ts_data(tsd$y, 10)
ts_head(ts, 3)
summary(ts[,10])
# Fit differencing preprocessor and transform; note one fewer lag column
preproc <- ts_norm_diff()
preproc <- daltoolbox::fit(preproc, ts)
tst <- transform(preproc, ts)
ts_head(tst, 3)
summary(tst[,9])
Rescale values so the global minimum maps to 0 and the global maximum maps to 1 over the training set.
ts_norm_gminmax(outliers = outliers_boxplot())
| Argument | Description |
|---|---|
| outliers | Outlier transformation class. Pass NULL to skip outlier removal. |
The same scaling is applied to inputs and inverted on predictions
via inverse_transform.
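A minimal base-R sketch of the idea, assuming the usual min-max formula: the minimum and maximum are learned once from the training values, applied to inputs, and inverted on predictions. The function names `scale_fwd` and `scale_inv` are hypothetical and stand in for the package's transform and inverse_transform methods.

```r
# Illustrative only: global min-max fitted on training data and inverted.
train <- c(20, 35, 50, 80)
gmin <- min(train)
gmax <- max(train)

scale_fwd <- function(v) (v - gmin) / (gmax - gmin)
scale_inv <- function(z) z * (gmax - gmin) + gmin

scaled <- scale_fwd(train)       # 0.00 0.25 0.50 1.00
pred_scaled <- 0.5               # a prediction expressed on the scaled axis
pred <- scale_inv(pred_scaled)   # back on the original axis: 50
```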
A ts_norm_gminmax object.
Ogasawara, E., Murta, L., Zimbrão, G., Mattoso, M. (2009). Neural networks cartridges for data mining on time series. Proceedings of the International Joint Conference on Neural Networks (IJCNN). doi:10.1109/IJCNN.2009.5178615
# Global min–max normalization across the full training set
# Load package and example data
library(daltoolbox)
library(tspredit)
data(tsd)
# Build 10-lag windows and preview raw scale
ts <- ts_data(tsd$y, 10)
ts_head(ts, 3)
summary(ts[,10])
# Fit global min–max and transform; inspect post-scale values
preproc <- ts_norm_gminmax()
preproc <- daltoolbox::fit(preproc, ts)
tst <- transform(preproc, ts)
ts_head(tst, 3)
summary(tst[,10])
Identity transform that leaves data unchanged but aligns with the pre/post-processing interface.
ts_norm_none()
A ts_norm_none object.
# Identity normalization (no scaling applied)
# Load package and example data
library(daltoolbox)
library(tspredit)
data(tsd)
# Convert to sliding windows
xw <- ts_data(tsd$y, 10)
# No data normalization: transform returns inputs unchanged
normalize <- ts_norm_none()
normalize <- daltoolbox::fit(normalize, xw)
xa <- transform(normalize, xw)
ts_head(xa)
Create an object for normalizing each window by its own min and max, preserving local contrast while standardizing scales.
ts_norm_swminmax(outliers = outliers_boxplot())
| Argument | Description |
|---|---|
| outliers | Outlier transformation class. Pass NULL to skip outlier removal. |
A ts_norm_swminmax object.
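To make the contrast with global scaling concrete, here is a base-R sketch of per-window min-max on a plain matrix. The helper `sw_minmax` is hypothetical, and which columns the package actually uses to compute each window's minimum and maximum is not shown in this section, so treat the sketch as the general idea only.

```r
# Illustrative only: each row (window) is scaled by its own min and max.
windows <- rbind(c(1, 2, 3, 5),
                 c(100, 200, 300, 500))

sw_minmax <- function(w) (w - min(w)) / (max(w) - min(w))
scaled <- t(apply(windows, 1, sw_minmax))

scaled
# Both rows become 0.00 0.25 0.50 1.00: the shape is kept, the level is removed.
```

A constant window would divide by zero in this sketch and would need a guard in practice.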
Ogasawara, E., Murta, L., Zimbrão, G., Mattoso, M. (2009). Neural networks cartridges for data mining on time series. Proceedings of the International Joint Conference on Neural Networks (IJCNN). doi:10.1109/IJCNN.2009.5178615
# Per-window min–max normalization for sliding windows
# Load package and example data
library(daltoolbox)
library(tspredit)
data(tsd)
# Build 10-lag windows and preview raw scale
ts <- ts_data(tsd$y, 10)
ts_head(ts, 3)
summary(ts[,10])
# Fit per-window min–max and transform; inspect post-scale values
preproc <- ts_norm_swminmax()
preproc <- daltoolbox::fit(preproc, ts)
tst <- transform(preproc, ts)
ts_head(tst, 3)
summary(tst[,10])
Forecast a univariate series by repeating a learned periodic cycle.
ts_periodic(period, context_size = NULL)
| Argument | Description |
|---|---|
| period | Integer. Cycle length to repeat. |
| context_size | Optional integer. Number of most recent values used to identify the next state within the cycle. When omitted, the smallest non-ambiguous value is inferred automatically. |
A ts_periodic object, inheriting from ts_deterministic.
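The core idea of repeating a cycle can be sketched in base R. This is a simplification that assumes the last `period` observations form one complete cycle; it ignores `context_size` and is not the package's implementation. The function name `periodic_forecast` is hypothetical.

```r
# Illustrative only: continue a series by replaying its last full cycle.
periodic_forecast <- function(series, period, steps_ahead) {
  stopifnot(length(series) >= period)
  last_cycle <- utils::tail(series, period)
  # Future step h maps to position ((h - 1) mod period) + 1 of that cycle.
  last_cycle[((seq_len(steps_ahead) - 1) %% period) + 1]
}

periodic_forecast(c(4, 5, 6, 7, 1, 2, 3), period = 7, steps_ahead = 5)
# 4 5 6 7 1
```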
library(daltoolbox)
library(tspredit)
# Fit a periodic model on one observed cycle and forecast the next 5 values
series <- c(4, 5, 6, 7, 1, 2, 3)
model <- ts_periodic(7)
model <- daltoolbox::fit(model, x = series)
predict(model, steps_ahead = 5)
Forecast a univariate series by repeating its most recent observed value.
ts_persist()
A ts_persist object, inheriting from ts_deterministic.
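For reference, the persistence rule itself is a one-liner in base R. The function name `persist_forecast` is hypothetical; the sketch shows the rule described above, not the package code.

```r
# Illustrative only: the persistence (naive) forecast.
persist_forecast <- function(series, steps_ahead) {
  rep(utils::tail(series, 1), steps_ahead)
}

persist_forecast(c(10, 11, 11, 11), steps_ahead = 3)
# 11 11 11
```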
library(daltoolbox)
library(tspredit)
# Fit a persistence model and repeat the last observed value 3 times
series <- c(10, 11, 11, 11)
model <- ts_persist()
model <- daltoolbox::fit(model, x = series)
predict(model, steps_ahead = 3)
Split a ts_data (sliding windows) into input features and
output targets for modeling.
ts_projection(ts)
| Argument | Description |
|---|---|
| ts | Matrix or data.frame containing a |
For a multi-column ts_data, returns all but the last column as
inputs and the last column as the output. For a single-row matrix, returns
ts_data-wrapped inputs/outputs preserving names and window size.
A ts_projection object with two elements: $input and $output.
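The multi-column case amounts to a column split, sketched here on a plain matrix. The windows are built with stats::embed() rather than ts_data(), and the names `input` and `output` simply mirror the element names listed above.

```r
# Illustrative only: split windows into lagged inputs and the final target.
windows <- stats::embed(1:8, 4)[, 4:1]   # rows read oldest -> newest
colnames(windows) <- paste0("t", 3:0)

input  <- windows[, -ncol(windows), drop = FALSE]  # t3, t2, t1
output <- windows[,  ncol(windows), drop = FALSE]  # t0

dim(input)    # 5 rows, 3 columns
dim(output)   # 5 rows, 1 column
```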
# Setting up a ts_data and projecting (X, y)
library(daltoolbox)
library(tspredit)
# Load example dataset and create windows
data(tsd)
ts <- ts_data(tsd$y, 10)
io <- ts_projection(ts)
# Input data (features)
ts_head(io$input)
# Output data (target)
ts_head(io$output)
Base class for time series regression models that operate directly on time series (non-sliding-window specialization).
ts_reg()
This class is intended to be subclassed by modeling backends that
do not require the sliding-window interface. Methods such as fit(),
predict(), and evaluate() dispatch on this class.
A ts_reg object (S3) to be extended by concrete models.
# Abstract base class: instantiate concrete subclasses instead
# Examples: ts_mlp(), ts_rf(), ts_svm(), ts_arima()
Base class for singular multivariate time-series models that
operate on aligned observations (sw = 1).
ts_reg_mv(models_x = NULL)
| Argument | Description |
|---|---|
| models_x | Optional named list with one univariate model per auxiliary variable. These models are used to generate future paths for |
ts_reg_mv() is the multivariate counterpart of the raw-series branch of
tspredit.
It is intended for models that consume aligned multivariate observations
directly, without first materializing explicit lagged windows in
ts_data_mv(..., sw > 1).
This branch is appropriate when the multivariate relationship is naturally expressed at the aligned-observation level, for example:
- target-centered linear regression over synchronized covariates
- ARIMA with external regressors (ARIMAX)
- vector autoregression over the whole system

The design remains target-centered:
- the multivariate object still declares one target variable y
- predict() returns the forecast of y by default
- descendants may also expose the forecast path of the remaining variables when return_all = TRUE

Typical descendants are:
- ts_arimax(): target-centered dynamic regression with ARIMA errors
- ts_lm_mv(): target-centered multivariate linear regression
- ts_var(): vector autoregression, still exposed through a target-centered interface

The interface keeps a distinguished target variable y, but models may also return the forecast path of the remaining variables when requested.
A ts_reg_mv object.
Base class for time series regression models built on sliding-window representations.
ts_regsw(preprocess = NA, input_size = NA, input_map = ts_lagmap())
| Argument | Description |
|---|---|
| preprocess | Normalization preprocessor (e.g., |
| input_size | Integer. Number of lagged inputs per example. |
| input_map | Lag-selection strategy object created by |
This class provides helpers to map ts_data matrices into the
input window expected by ML backends and to apply pre/post processing
(e.g., normalization) consistently during fit and predict.
The preprocessing stage runs before input_map is fitted. So lag selection
is learned on the transformed representation actually delivered to the
backend model, not on the raw pre-transform window. This matters for
preprocessors such as ts_norm_diff() that change the effective window
geometry.
A ts_regsw object (S3) to be extended by concrete models.
# Abstract base class for sliding-window regressors
# Use concrete subclasses such as ts_mlp(), ts_rf(), ts_svm(), ts_elm()
Orchestrate one target model and one auxiliary model per
covariate, while reusing the existing univariate learners from tspredit.
ts_regsw_mv(model_y, models_x, window_size = 30)
| Argument | Description |
|---|---|
| model_y | A |
| models_x | Named list with one |
| window_size | Integer. Base window size available to each variable. |
ts_regsw_mv() is the first multivariate forecasting orchestrator in
tspredit. It keeps the package centered on a target variable y, while
allowing every auxiliary variable x1, ..., xn to be forecast by its own
pipeline.
The workflow is:
1. store aligned multivariate observations in ts_data_mv()
2. define one ts_mv_spec() for y
3. define one ts_mv_spec() for each x
4. fit the composed system with fit()
5. forecast recursively with predict(..., steps_ahead = h)

The current implementation keeps a single window_size as the base temporal memory available to every variable. After that, each specification decides which variables and which lag positions are actually used by its learner.
This means the multivariate extension does not replace the existing univariate models. It reuses them as polymorphic building blocks.
Supported configurations in this first version:
- the target model must inherit from ts_regsw
- auxiliary models may inherit from ts_regsw or from ts_reg
- raw-series auxiliary models such as ts_arima() currently use only their own variable as input

The method returns the forecast of y as a numeric vector. The recursive path of y and all auxiliary predictions is attached to that vector as attributes, so the interface stays target-centered without discarding the system forecast.
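The recursive, target-centered pattern can be sketched without the package. In this base-R illustration a linear model stands in for the target learner, a replayed cycle stands in for the auxiliary pipeline, and the system path is attached under an attribute named "system" to mirror the example for this function. The synthetic data, the function name `forecast_system`, and the single-lag design are all assumptions made for brevity.

```r
# Illustrative only: recursive target-centered forecasting in base R.
set.seed(1)
n  <- 60
x1 <- rep(c(1, 3, 2), length.out = n)           # auxiliary series with period 3
y  <- numeric(n)
y[1] <- 1
for (i in 2:n) y[i] <- 0.5 * y[i - 1] + x1[i - 1] + rnorm(1, sd = 0.05)

# Target model: y at time i explained by the previous y and the previous x1.
train <- data.frame(y = y[-1], y_lag = y[-n], x1_lag = x1[-n])
model_y <- stats::lm(y ~ y_lag + x1_lag, data = train)

forecast_system <- function(h, period = 3) {
  # Auxiliary pipeline: replay the last observed cycle of x1.
  x1_future <- utils::tail(x1, period)[((seq_len(h) - 1) %% period) + 1]
  y_prev  <- y[n]
  x1_prev <- x1[n]
  y_future <- numeric(h)
  for (k in seq_len(h)) {
    newdata <- data.frame(y_lag = y_prev, x1_lag = x1_prev)
    y_future[k] <- unname(stats::predict(model_y, newdata))
    # Feed the forecasts back in as the next step's lagged inputs.
    y_prev  <- y_future[k]
    x1_prev <- x1_future[k]
  }
  # Return y only, but keep the whole system path as an attribute.
  attr(y_future, "system") <- data.frame(y = y_future, x1 = x1_future)
  y_future
}

pred <- forecast_system(5)
as.numeric(pred)       # forecast of the target
attr(pred, "system")   # forecast path of y and x1
```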
A ts_regsw_mv object.
library(daltoolbox)
library(tspredit)
data(tsd)
# Build two auxiliary series from y: a one-step lead and a 3-point trailing mean
x1 <- c(tsd$y[-1], tail(tsd$y, 1))
x2 <- stats::filter(tsd$y, rep(1/3, 3), sides = 1)
x2[is.na(x2)] <- tsd$y[is.na(x2)]
# Aligned multivariate data with y as the target; hold out the last 5 rows
mv <- ts_data_mv(data.frame(y = tsd$y, x1 = x1, x2 = x2), y = "y")
samp <- ts_sample(mv, test_size = 5)
# Target model uses y, x1, and x2; each auxiliary gets its own periodic model
model <- ts_regsw_mv(
  model_y = ts_mv_spec(
    ts_mlp(ts_norm_an(), input_size = 4, size = 4, decay = 0),
    variables = c("y", "x1", "x2"),
    transforms = list(y = ts_fil_ma(3))
  ),
  models_x = list(
    x1 = ts_mv_spec(ts_deterministic("periodic", period = 7)),
    x2 = ts_mv_spec(ts_deterministic("periodic", period = 7))
  ),
  window_size = 10
)
model <- daltoolbox::fit(model, samp$train)
predict(model, steps_ahead = 1)
predict(model, steps_ahead = 5)
# The recursive system path is attached to the forecast as an attribute
pred <- predict(model, steps_ahead = 5)
attr(pred, "system")
Create a time series prediction object that uses Random Forest regression on sliding-window inputs.
It wraps the randomForest package to fit an ensemble of decision trees.
ts_rf(preprocess = NA, input_size = NA, input_map = ts_lagmap(), nodesize = 1, ntree = 100, mtry = NULL)
| Argument | Description |
|---|---|
| preprocess | Normalization preprocessor (e.g., |
| input_size | Integer. Number of lagged inputs used by the model. |
| input_map | Lag-selection strategy object created by |
| nodesize | Integer. Minimum terminal node size. |
| ntree | Integer. Number of trees in the forest. |
| mtry | Integer. Number of variables randomly sampled at each split. |
Random Forests reduce variance by averaging many decorrelated trees.
For tabular sliding-window features, they can capture nonlinearities and
interactions without heavy feature engineering. Consider normalizing inputs
for comparability across windows and tuning mtry, ntree, and nodesize.
In recursive multi-step forecasting, very small forests can be unstable, so
the default uses a moderately larger ensemble.
A ts_rf object (S3) inheriting from ts_regsw.
L. Breiman (2001). Random forests. Machine Learning, 45(1), 5–32.
# Example: sliding-window Random Forest
# Load tools and data
library(daltoolbox)
library(tspredit)
data(tsd)
# Turn series into 10-lag windows and preview
ts <- ts_data(tsd$y, 10)
ts_head(ts, 3)
# Train/test split and (X, y) projection
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
# Define Random Forest and fit
model <- ts_rf(ts_norm_gminmax(), input_size = 9, nodesize = 1, ntree = 100)
model <- daltoolbox::fit(model, x = io_train$input, y = io_train$output)
# Forecast multiple steps and assess error
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- daltoolbox::evaluate(model, output, prediction)
ev_test
Split a time-series representation into train and test sets.
Extracts test_size rows from the end (minus an optional offset) as the
test set. The remaining initial rows form the training set. The offset
is useful to reproduce experiments with different forecast origins.
For sliding-window workflows, the most coherent usage is to materialize the lagged representation first and split it afterwards. This preserves the lag context required by the earliest rows of the test partition, mirroring the package's univariate forecasting examples.
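The split rule can be sketched on a plain window matrix. The helper `split_windows` is hypothetical and encodes one reading of the description above: the test block ends `offset` rows before the end of the data, and the training set contains only the rows before the test block.

```r
# Illustrative only: temporal split of a window matrix with an offset.
split_windows <- function(w, test_size = 1, offset = 0) {
  last_train <- nrow(w) - test_size - offset
  stopifnot(last_train >= 1)
  list(
    train = w[seq_len(last_train), , drop = FALSE],
    test  = w[last_train + seq_len(test_size), , drop = FALSE]
  )
}

w <- stats::embed(1:12, 4)[, 4:1]   # 9 windows of width 4, oldest -> newest

a <- split_windows(w, test_size = 3)              # train rows 1-6, test rows 7-9
b <- split_windows(w, test_size = 3, offset = 2)  # train rows 1-4, test rows 5-7
```

Because the windows are materialized before splitting, the first test row still carries the lagged values that precede it in time.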
ts_sample(ts, test_size = 1, offset = 0)
| Argument | Description |
|---|---|
| ts | A |
| test_size | Integer. Number of rows in the test split (default = 1). |
| offset | Integer. Offset from the end before the test split (default = 0). |
A list with $train and $test (both ts_data).
# Setting up a ts_data and making a temporal split
library(daltoolbox)
library(tspredit)
# Load example dataset and build windows
data(tsd)
ts <- ts_data(tsd$y, 10)
# Separating into train and test
test_size <- 3
samp <- ts_sample(ts, test_size)
# First five rows from training data
ts_head(samp$train, 5)
# Last five rows from training data
ts_head(samp$train[-c(1:(nrow(samp$train)-5)),])
# Testing data
ts_head(samp$test)
Create a time series prediction object that uses Support Vector Regression (SVR) on sliding-window inputs.
It wraps the e1071 package to fit epsilon-insensitive regression with
linear, radial, polynomial, or sigmoid kernels.
ts_svm(preprocess = NA, input_size = NA, input_map = ts_lagmap(), kernel = c("radial", "linear", "polynomial", "sigmoid"), epsilon = 0, cost = 10)
| Argument | Description |
|---|---|
| preprocess | Normalization preprocessor (e.g., |
| input_size | Integer. Number of lagged inputs used by the model. |
| input_map | Lag-selection strategy object created by |
| kernel | Character. One of 'linear', 'radial', 'polynomial', 'sigmoid'. |
| epsilon | Numeric. Epsilon-insensitive loss width. |
| cost | Numeric. Regularization parameter controlling margin violations. |
SVR aims to find a function with at most epsilon deviation from
each training point while being as flat as possible. The cost parameter
controls the trade-off between margin width and violations; epsilon
controls the insensitivity tube width. RBF kernels often work well for
nonlinear series; tune cost, epsilon, and kernel hyperparameters.
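The role of epsilon is easiest to see in the loss itself. The base-R sketch below implements the standard epsilon-insensitive loss, max(0, |y - y_hat| - epsilon); the function name `eps_insensitive` is hypothetical and the sketch is independent of e1071.

```r
# Illustrative only: the epsilon-insensitive loss used by SVR.
eps_insensitive <- function(y, y_hat, epsilon) {
  pmax(0, abs(y - y_hat) - epsilon)
}

errors <- c(-0.30, -0.05, 0.00, 0.08, 0.50)

# Errors inside the tube of half-width 0.1 cost nothing.
eps_insensitive(errors, 0, epsilon = 0.1)   # 0.2 0.0 0.0 0.0 0.4

# With epsilon = 0, as in the usage above, the loss is the absolute error.
eps_insensitive(errors, 0, epsilon = 0)     # 0.30 0.05 0.00 0.08 0.50
```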
A ts_svm object (S3) inheriting from ts_regsw.
C. Cortes and V. Vapnik (1995). Support-Vector Networks. Machine Learning, 20, 273–297.
# Example: SVR with min–max normalization
# Load package and dataset
library(daltoolbox)
library(tspredit)
data(tsd)
# Create sliding windows and preview
ts <- ts_data(tsd$y, 10)
ts_head(ts, 3)
# Temporal split and (X, y) projection
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
# Define SVM regressor and fit to training data
model <- ts_svm(
  ts_norm_gminmax(),
  input_size = 4,
  input_map = ts_lagmap("seasonal", seasonality = 4)
)
model <- daltoolbox::fit(model, x = io_train$input, y = io_train$output)
# Multi-step forecast and evaluation
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- daltoolbox::evaluate(model, output, prediction)
ev_test
Create a ts_tune object for hyperparameter tuning of a
time series model.
Sets up a cross-validated search over hyperparameter ranges and input sizes for a base model. Results include the evaluated configurations and the selected best configuration.
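To get a feel for the size of such a search, the candidate configurations can be enumerated with base R. This sketch assumes an exhaustive grid over the ranges used in the example for this function; whether ts_tune() evaluates every combination or a sample of them is not stated in this section.

```r
# Illustrative only: enumerating candidate configurations for a search.
grid <- expand.grid(
  input_size = 3:5,
  nhid = 1:5,
  actfun = "purelin",
  stringsAsFactors = FALSE
)

nrow(grid)     # 15 candidate configurations
head(grid, 3)

# With k folds, an exhaustive search would fit nrow(grid) * k models.
folds <- 10
nrow(grid) * folds   # 150
```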
ts_tune(input_size, base_model, folds = 10, ranges = NULL)
| Argument | Description |
|---|---|
| input_size | Integer vector. Candidate input window sizes. |
| base_model | Base model object to tune (e.g., |
| folds | Integer. Number of cross-validation folds. |
| ranges | Named list of hyperparameter ranges to explore. |
A ts_tune object.
R. Kohavi (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI.
Salles, R., Pacitti, E., Bezerra, E., Marques, C., Pacheco, C., Oliveira, C., Porto, F., Ogasawara, E. (2023). TSPredIT: Integrated Tuning of Data Preprocessing and Time Series Prediction Models. Lecture Notes in Computer Science.
# Example: grid search over input_size and ELM hyperparameters
# Load library and example data
library(daltoolbox)
library(tspredit)
data(tsd)
# Prepare 10-lag windows and split into train/test
ts <- ts_data(tsd$y, 10)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
# Define tuning: vary input_size and ELM hyperparameters (nhid, actfun)
tune <- ts_tune(
  input_size = 3:5,
  base_model = ts_elm(ts_norm_gminmax()),
  ranges = list(nhid = 1:5, actfun = c('purelin'))
)
# Run CV-based search and get the best fitted model
model <- daltoolbox::fit(tune, x = io_train$input, y = io_train$output)
# Forecast and evaluate on the held-out horizon
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- daltoolbox::evaluate(model, output, prediction)
ev_test
Create a target-centered vector autoregression over aligned multivariate observations.
ts_var(target = NULL, p = NULL, p_max = 5, intercept = TRUE)
| Argument | Description |
|---|---|
| target | Optional target variable name. When omitted, use the |
| p | Optional lag order. When |
| p_max | Maximum lag order considered in the automatic search. |
| intercept | Logical. Whether to include an intercept in each equation. |
ts_var() models the multivariate system directly, but keeps the tspredit
interface centered on a distinguished target variable y.
This means:
- the full system is learned jointly
- predict() returns the target forecast by default
- predict(..., return_all = TRUE) exposes the forecast path of all system variables

The current implementation uses ordinary least squares over lagged aligned observations and can choose the lag order automatically by minimizing AICc over 1:p_max.
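A VAR fitted by ordinary least squares can be sketched in a few lines of base R. The example below fixes the lag order at 1 and skips the AICc search; the simulated data, the coefficient matrix `A`, and the function name `forecast_var` are assumptions for illustration and do not come from the package.

```r
# Illustrative only: VAR(1) fitted by ordinary least squares in base R.
set.seed(42)
n <- 500
A <- matrix(c(0.5, 0.2,
              0.1, 0.4), nrow = 2, byrow = TRUE)   # true lag-1 coefficients
Y <- matrix(0, nrow = n, ncol = 2, dimnames = list(NULL, c("y", "x1")))
for (i in 2:n) Y[i, ] <- A %*% Y[i - 1, ] + rnorm(2)

# Regress every variable at time i on an intercept and all variables at i - 1.
X <- cbind(intercept = 1, Y[-n, ])
B <- solve(crossprod(X), crossprod(X, Y[-1, ]))   # 3 x 2: one column per equation
A_hat <- t(B[-1, ])                                # estimated lag-1 matrix

# Recursive forecast: each step feeds the previous system forecast back in.
forecast_var <- function(h) {
  state <- Y[n, ]
  path <- matrix(NA_real_, nrow = h, ncol = 2,
                 dimnames = list(NULL, colnames(Y)))
  for (k in seq_len(h)) {
    state <- drop(c(1, state) %*% B)
    path[k, ] <- state
  }
  path
}

path <- forecast_var(5)
path[, "y"]   # target forecast; the full `path` is the system forecast
```

Returning only the `y` column while keeping `path` available mirrors the target-centered default and the return_all = TRUE option described above.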
This makes ts_var() conceptually different from ts_arimax():
- ts_arimax() treats the auxiliaries as regressors for one main target
- ts_var() treats all variables as part of the dynamic system

Even so, tspredit still lets the user mark one variable as the main target for evaluation and default return behavior.
A ts_var object inheriting from ts_reg_mv.
Lütkepohl H (2005). New Introduction to Multiple Time Series Analysis. Springer.
Tsay RS (2014). Multivariate Time Series Analysis with R and Financial Applications. Wiley.
library(daltoolbox)
library(tspredit)
data(tsd)
# Build two auxiliary series from y: a one-step lead and a 3-point trailing mean
x1 <- c(tsd$y[-1], tail(tsd$y, 1))
x2 <- stats::filter(tsd$y, rep(1/3, 3), sides = 1)
x2[is.na(x2)] <- tsd$y[is.na(x2)]
# Aligned multivariate data with y as the target; hold out the last 5 rows
mv <- ts_data_mv(data.frame(y = tsd$y, x1 = x1, x2 = as.numeric(x2)), y = "y")
samp <- ts_sample(mv, test_size = 5)
# Fit a VAR with automatic lag selection up to order 3 and forecast y
model <- ts_var(p_max = 3)
model <- daltoolbox::fit(model, samp$train)
predict(model, steps_ahead = 5)
Create a window-based ARMA-inspired regressor with local
stepwise normalization for the ts_regsw workflow.
ts_warma(preprocess = ts_norm_none(), input_size = NA, input_map = ts_lagmap(), steps = NA, intercept = TRUE)
| Argument | Description |
|---|---|
| preprocess | External preprocessing object applied before the WARMA local steps. Defaults to |
| input_size | Integer. Number of lagged inputs used by the model. |
| input_map | Lag-selection strategy object created by |
| steps | Integer in |
| intercept | Logical. Whether to include an intercept in the linear model fitted over the locally normalized lagged inputs. |
ts_warma() is a tspredit implementation inspired by the WARMA proposal:
a window-based view of non-stationary series in which local preprocessing is
interpreted in steps.
In this adaptation:
- step 0 leaves the local window unchanged
- step 1 subtracts the local mean of each window
- step 2 subtracts the local mean and scales by the local standard deviation

The implementation follows the package's sliding-window lineage, so it uses the fully overlapping window regime naturally induced by ts_data(..., sw) and ts_regsw. The resulting representation is then modeled with a linear regressor over the normalized lagged inputs.
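The three local steps can be written down directly for a matrix of windows. This base-R sketch only illustrates the step definitions listed above; the function name `local_step` is hypothetical, and which columns of the window the package uses to compute the local statistics is an assumption here (all of them).

```r
# Illustrative only: the three local steps applied row-wise to windows.
local_step <- function(windows, step = 0) {
  stopifnot(step %in% 0:2)
  if (step == 0) return(windows)                        # step 0: unchanged
  centered <- sweep(windows, 1, rowMeans(windows), "-")
  if (step == 1) return(centered)                       # step 1: remove local mean
  # step 2: remove local mean, then divide by the local standard deviation
  sweep(centered, 1, apply(windows, 1, stats::sd), "/")
}

windows <- rbind(c(1, 2, 3, 4),
                 c(10, 30, 20, 40))

local_step(windows, 0)
local_step(windows, 1)   # every row now has mean 0
local_step(windows, 2)   # every row now has mean 0 and standard deviation 1
```

A constant window has zero standard deviation, so step 2 would need a guard against division by zero in practice.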
This makes ts_warma() a computationally light competitor to ts_darima()
and a practical univariate block for the multivariate target-centered
workflow.
When steps = NA, the model chooses the smallest step in {0, 1, 2} whose
locally transformed reconstructed series reaches integration order zero
according to forecast::ndiffs().
The current implementation should be understood as the tspredit
interpretation of WARMA inside the package's object-oriented
sliding-window pipeline. In other words, it is an adaptation aligned with
ts_regsw, not a separate estimation framework detached from the rest of the
library.
A ts_warma object inheriting from ts_regsw.
Box GEP, Jenkins GM, Reinsel GC, Ljung GM (2015). Time Series Analysis: Forecasting and Control. Wiley.
Hyndman RJ, Athanasopoulos G (2021). Forecasting: Principles and Practice. Third Edition. OTexts. https://otexts.com/fpp3/
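Ogasawara, E., Martinez, L. C., De Oliveira, D., Zimbrão, G., Pappa, G. L., Mattoso, M. (2010). Adaptive Normalization: A novel data normalization approach for non-stationary time series. Proceedings of the International Joint Conference on Neural Networks (IJCNN). doi:10.1109/IJCNN.2010.5596746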
Local WARMA manuscript used as implementation reference: 2026_04_SBBD_WARMA.pdf.
library(daltoolbox)
library(tspredit)
data(tsd)
# Windows of size 8, temporal split, and (X, y) projection
ts <- ts_data(tsd$y, 8)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
# steps = NA lets the model choose the local normalization step
model <- ts_warma(input_size = 5, steps = NA)
model <- daltoolbox::fit(model, io_train$input, io_train$output)
# Forecast 5 steps ahead from the first test window
prediction <- predict(model, io_test$input[1, ], steps_ahead = 5)
prediction
Utility object that groups helper functions used by the adaptive
normalization family implemented in ts_norm_an().
tsanutils()
These helpers separate the mathematical operators from the training flow of the preprocessor itself.
Stabilization helpers
an_stabilize_level() avoids unstable divisive normalization when the
adaptive reference is close to zero.
an_reference_scale() blends local dispersion and local level to create
a smooth transition between additive and relative normalization regimes.
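The two stabilization ideas above can be illustrated with a short base R sketch. The function names, the epsilon floor, and the blending rule `dispersion + lambda * abs(level)` are assumptions chosen for illustration; they are not the formulas used by an_stabilize_level() or an_reference_scale().

```r
# Hypothetical helper (not the package implementation): keep a reference
# level away from zero so a later division stays numerically stable.
# The sign of the level is preserved; an exact zero is treated as positive.
sketch_stabilize_level <- function(level, epsilon = 1e-8) {
  sgn <- ifelse(level < 0, -1, 1)
  sgn * pmax(abs(level), epsilon)
}

# Hypothetical helper (not the package implementation): blend local
# dispersion and local level into a single positive scale.
# lambda = 0 yields a purely dispersion-based (additive) scale;
# a larger lambda moves toward a level-based (relative) scale.
sketch_reference_scale <- function(dispersion, level, lambda = 1, epsilon = 1e-8) {
  pmax(dispersion + lambda * abs(level), epsilon)
}

level <- c(0, 1e-12, -2, 5)
stable <- sketch_stabilize_level(level)
scale_ref <- sketch_reference_scale(dispersion = c(0.2, 0.2, 0.5, 0.5), level = level)

print(stable)
print(scale_ref)
```

With such a floor, levels that are already far from zero pass through unchanged, while near-zero levels are replaced by a small value of the same sign.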
Adaptive normalization operators
an_divide() and an_divide_inverse() implement divisive adaptive
normalization.
an_subtract() and an_subtract_inverse() implement subtractive adaptive
normalization.
an_softdivide() and an_softdivide_inverse() implement the stabilized
hybrid operator based on a blended reference scale.
an_asinh() and an_asinh_inverse() implement the inverse-hyperbolic-sine
adaptive contrast around the local reference level.
This organization makes it easier to keep ts_norm_an() readable and to
compare operators as explicit members of the same adaptive-normalization
family.
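To make the operator family concrete, the following base R sketch defines simplified divisive, subtractive, and inverse-hyperbolic-sine operators together with their inverses, and checks that each pair round-trips. All function names and formulas here are illustrative assumptions; the helpers exposed by tsanutils() take additional arguments and may use different stabilized formulas. The soft-divide operator is omitted because its blended scale is not specified in this text.

```r
# Hypothetical operators (not the package implementation). Each maps values
# relative to a local reference (center) and, for asinh, a local scale.
sketch_divide           <- function(x, center) x / center
sketch_divide_inverse   <- function(z, center) z * center
sketch_subtract         <- function(x, center) x - center
sketch_subtract_inverse <- function(z, center) z + center
sketch_asinh            <- function(x, center, scale) asinh((x - center) / scale)
sketch_asinh_inverse    <- function(z, center, scale) center + scale * sinh(z)

values <- c(0.15, 2.3)
center <- c(0.1, 2)
scale_value <- c(0.2, 0.5)

# Forward transforms
z_div   <- sketch_divide(values, center)
z_sub   <- sketch_subtract(values, center)
z_asinh <- sketch_asinh(values, center, scale_value)

# Inverse transforms should recover the original values
back_div   <- sketch_divide_inverse(z_div, center)
back_sub   <- sketch_subtract_inverse(z_sub, center)
back_asinh <- sketch_asinh_inverse(z_asinh, center, scale_value)

print(rbind(z_div, z_sub, z_asinh))
print(rbind(back_div, back_sub, back_asinh))
```

Writing every operator as a forward/inverse pair with the same arguments is what allows them to be swapped and compared inside one preprocessing step.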
A tsanutils object exposing the helper functions.
Ogasawara E, Martinez LC, De Oliveira D, Zimbrão G, Pappa GL, Mattoso M (2010). Adaptive Normalization: A novel data normalization approach for non-stationary time series. Proceedings of the International Joint Conference on Neural Networks (IJCNN). doi:10.1109/IJCNN.2010.5596746
Huber PJ (1964). Robust Estimation of a Location Parameter. Annals of Mathematical Statistics, 35(1), 73-101. doi:10.1214/aoms/1177703732
Burbidge JB, Magee L, Robb AL (1988). Alternative Transformations to Handle Extreme Values of the Dependent Variable. Journal of the American Statistical Association, 83(401), 123-127.
Bellemare MF, Wichman CJ (2020). Elasticities and the Inverse Hyperbolic Sine Transformation. Oxford Bulletin of Economics and Statistics, 82(1), 50-61. doi:10.1111/obes.12325
utils <- tsanutils()
center <- c(0.1, 2)
scale_value <- c(0.2, 0.5)
values <- c(0.15, 2.3)
utils$an_divide(list(epsilon = 1e-8), values, center, scale_value)
utils$an_softdivide(list(lambda = 1, epsilon = 1e-8), values, center, scale_value)
A synthetic univariate time series used throughout the introductory
tspredit examples.
x: regular time index from 0 to 10.
y: smooth sine-based signal used as the forecasting target.
data(tsd)
A data frame with 100 rows and 2 columns:
x: Numeric time index.
y: Numeric response series used in forecasting demonstrations.
tsd is the smallest dataset distributed with tspredit and acts as the
didactic entry point for the package. It is intentionally simple so the reader
can focus on the mechanics of sliding windows, train/test splitting,
preprocessing, and prediction workflows before moving to larger benchmark
collections documented in R/tspredbench.R, including EUNITE.Loads,
EUNITE.Reg, EUNITE.Temp, ipeadata.d, ipeadata.m, NN3, NN5,
CATS, SantaFe.A, SantaFe.D, bioenergy, climate, emissions,
fertilizers, gdp, m1, m3, m4, pesticides, and stocks.
Generated for package documentation and examples.
# Load dataset and inspect the first rows
data(tsd)
head(tsd)
# Plot the target series used in the examples
ts.plot(tsd$y, ylab = "Value", xlab = "Index", main = "Synthetic example series")
Utility object that groups helper functions used to select lag subsets for sliding-window predictors.
tslagutils()
These helpers are organized by the type of evidence they use to choose lags.
Positional mappings
lag_recent() keeps the most recent lags and reproduces the package's
original behavior.
lag_even() spreads the selected lags evenly across the available window.
lag_geom() emphasizes recent lags while still sampling older history on
a geometric scale.
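The three positional strategies can be sketched in base R as follows. This is an illustration under stated assumptions, not the package code: the function names are hypothetical, lags are numbered so that lag 1 is the most recent observation, and the package helpers may instead return column positions within the window or use different rounding rules.

```r
# Hypothetical positional lag selectors (not the package implementation).
# Convention assumed here: lag 1 is the most recent observation.

# Keep the input_size most recent lags.
sketch_lag_recent <- function(total, input_size) {
  seq_len(min(input_size, total))
}

# Spread the selected lags evenly between lag 1 and lag total.
sketch_lag_even <- function(total, input_size) {
  unique(round(seq(1, total, length.out = input_size)))
}

# Place lags on a geometric grid: dense near lag 1, sparse near lag total.
# Rounding can create duplicates, so fewer than input_size lags may remain.
sketch_lag_geom <- function(total, input_size) {
  unique(round(total^seq(0, 1, length.out = input_size)))
}

recent <- sketch_lag_recent(total = 9, input_size = 4)
even   <- sketch_lag_even(total = 9, input_size = 4)
geom   <- sketch_lag_geom(total = 9, input_size = 4)

print(recent)
print(even)
print(geom)
```

None of these selectors look at the data, which makes them useful baselines for the correlation-driven and supervised mappings.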
Correlation-driven mappings
lag_acf() ranks lags by the absolute autocorrelation of the reconstructed
training series.
lag_pacf() ranks lags by the absolute partial autocorrelation.
lag_peaks() keeps local maxima of the ACF or PACF profile to avoid
selecting many redundant neighboring lags.
lag_seasonal() prioritizes multiples of an estimated or user-provided
seasonal period.
lag_acf_seasonal() and lag_pacf_seasonal() combine seasonal lags with
correlation-based completion.
lag_blocks() expands neighborhoods around the strongest correlation peaks.
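As a rough illustration of correlation-driven selection, the base R sketch below ranks lags by absolute sample autocorrelation using stats::acf() and lists seasonal multiples of a known period. The function names are hypothetical and the logic is a simplified assumption; the package helpers may add tie-breaking, peak detection, or period estimation that is not shown here.

```r
# Hypothetical ACF-based lag ranking (not the package implementation).
# Returns the input_size lags with the largest absolute autocorrelation.
sketch_rank_acf <- function(series, lag_max, input_size) {
  # acf() returns lag 0 first; drop it so that position k corresponds to lag k
  rho <- stats::acf(series, lag.max = lag_max, plot = FALSE)$acf[-1]
  order(abs(rho), decreasing = TRUE)[seq_len(min(input_size, lag_max))]
}

# Hypothetical seasonal selector: multiples of a user-provided period.
sketch_seasonal_lags <- function(period, lag_max) {
  if (period > lag_max) return(integer(0))
  seq(period, lag_max, by = period)
}

# Noise-free sine with period 12: lag 6 is strongly negatively correlated,
# lag 12 strongly positively correlated.
series <- sin(2 * pi * (1:120) / 12)

top <- sketch_rank_acf(series, lag_max = 12, input_size = 2)
seasonal <- sketch_seasonal_lags(period = 12, lag_max = 36)

print(top)
print(seasonal)
```

Because the ranking uses the absolute value, a strongly negative lag (here the half-period) counts as informative in the same way as a strongly positive one.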
Supervised mappings
lag_mi() ranks lags by discretized mutual information with the target.
lag_mrmr() greedily maximizes relevance to the target while reducing
redundancy among already selected lags.
The mutual-information criteria use quantile discretization and therefore provide deterministic approximations suitable for lightweight, dependency-free lag selection inside the package.
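The supervised criteria can be sketched in base R as a quantile-binned mutual-information estimate followed by a greedy selection in the spirit of mRMR (relevance minus mean redundancy, as in Peng et al. 2005). The function names, the choice of five bins, and the difference-based score are assumptions for illustration and need not match lag_mi() or lag_mrmr().

```r
# Hypothetical quantile discretization: equal-frequency bins, no randomness.
discretize_quantile <- function(x, bins = 5) {
  probs <- seq(0, 1, length.out = bins + 1)
  breaks <- unique(quantile(x, probs = probs, names = FALSE))
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

# Plug-in mutual information (in nats) between two discretized variables.
mutual_info <- function(x, y, bins = 5) {
  joint <- table(discretize_quantile(x, bins), discretize_quantile(y, bins)) / length(x)
  expected <- outer(rowSums(joint), colSums(joint))
  nz <- joint > 0
  sum(joint[nz] * log(joint[nz] / expected[nz]))
}

set.seed(1)
x <- sin(2 * pi * (1:200) / 12) + rnorm(200, sd = 0.1)

# embed(): column 1 is the target x[t]; column j + 1 holds lag j, x[t - j]
lag_max <- 8
e <- embed(x, lag_max + 1)
target <- e[, 1]
lags <- e[, -1, drop = FALSE]

# Relevance of each lag to the target
relevance <- sapply(seq_len(lag_max), function(j) mutual_info(lags[, j], target))

# Greedy mRMR-style selection: relevance minus mean redundancy with chosen lags
k <- 3
selected <- integer(0)
candidates <- seq_len(lag_max)
while (length(selected) < k) {
  score <- sapply(candidates, function(j) {
    redundancy <- if (length(selected) > 0) {
      mean(sapply(selected, function(s) mutual_info(lags[, j], lags[, s])))
    } else 0
    relevance[j] - redundancy
  })
  best <- candidates[which.max(score)]
  selected <- c(selected, best)
  candidates <- setdiff(candidates, best)
}

print(round(relevance, 3))
print(selected)
```

The random seed only affects the simulated series; given a fixed series, the binning and the scores are fully deterministic.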
A tslagutils object exposing the helper functions.
Box GEP, Jenkins GM, Reinsel GC, Ljung GM (2015). Time Series Analysis: Forecasting and Control. Fifth Edition. Wiley.
Hyndman RJ, Athanasopoulos G (2021). Forecasting: Principles and Practice. Third Edition. OTexts. https://otexts.com/fpp3/
Peng H, Long F, Ding C (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226-1238. doi:10.1109/TPAMI.2005.159
Leites J, Cerqueira V, Soares C (2024). Selecting time lags for time series forecasting: an empirical study. arXiv:2405.11237.
utils <- tslagutils()
# Positional baselines
utils$lag_recent(total = 9, input_size = 4)
utils$lag_even(total = 9, input_size = 4)
# Reconstruct a raw series from sliding windows and aligned outputs
data(tsd)
ts <- ts_data(tsd$y, 10)
io <- ts_projection(ts)
series <- utils$reconstruct_series(io$input, io$output)
head(series)
# Correlation profile over available lags
utils$score_acf(series, lag_max = 9)