| Title: | Drift Adaptable Models |
|---|---|
| Description: | In streaming data analysis, it is crucial to detect significant shifts in the data distribution or the accuracy of predictive models over time, a phenomenon known as concept drift. The package aims to identify when concept drift occurs and provide methodologies for adapting models in non-stationary environments. It offers a range of state-of-the-art techniques for detecting concept drift and maintaining model performance. Additionally, the package provides tools for adapting models in response to these changes, ensuring continuous and accurate predictions in dynamic contexts. Methods for concept drift detection are described in Tavares (2022) <doi:10.1007/s12530-021-09415-z>. |
| Authors: | Lucas Tavares [aut], Leonardo Carvalho [aut], Rodrigo Machado [aut], Diego Carvalho [ctb], Esther Pacitti [ctb], Fabio Porto [ctb], Eduardo Ogasawara [aut, ths, cre] (ORCID: <https://orcid.org/0000-0002-0466-0626>), CEFET/RJ [cph] |
| Maintainer: | Eduardo Ogasawara <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.2.727 |
| Built: | 2026-05-13 06:39:10 UTC |
| Source: | https://github.com/cefet-rj-dal/heimdall |
ADWIN (Adaptive Windowing) is a sequential change detector that maintains a variable-length window and tests whether the means of two subwindows differ significantly. In this package, the implementation is primarily used for virtual concept drift when it monitors a numeric feature stream, although the same mechanism can also detect real concept drift if applied to an error or loss stream. The theoretical basis follows Bifet and Gavaldà (2007) https://doi.org/10.1137/1.9781611972771.42.
dfr_adwin(target_feat = NULL, delta = 2e-05)

| Argument | Description |
|---|---|
| target_feat | Feature to be monitored. |
| delta | The significance parameter for the ADWIN algorithm. |

dfr_adwin object
Bifet, A., and Gavaldà, R. (2007). Learning from time-changing data with adaptive windowing. In Proceedings of the 2007 SIAM International Conference on Data Mining, 443-448. https://doi.org/10.1137/1.9781611972771.42
# Use the same example as dfr_cusum, changing the constructor to:
# model <- dfr_adwin(target_feat='serie')
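The subwindow comparison described for ADWIN can be illustrated with a small base R sketch. This is not the heimdall implementation: `adwin_cut()` is a hypothetical helper, it checks every split point rather than using the bucket compression of the original algorithm, and it assumes values scaled to [0, 1]. The cut threshold follows the Hoeffding-style bound given by Bifet and Gavaldà (2007).

```r
# Hypothetical helper: returns TRUE if any split of the window shows a
# significant difference between the means of the two subwindows.
# Assumes values in [0, 1]; not the heimdall implementation.
adwin_cut <- function(window, delta = 2e-05) {
  n <- length(window)
  if (n < 2) return(FALSE)
  delta_prime <- delta / n
  for (split in 1:(n - 1)) {
    w0 <- window[1:split]
    w1 <- window[(split + 1):n]
    # m = 1 / (1/n0 + 1/n1), as defined in the ADWIN paper
    m <- 1 / (1 / length(w0) + 1 / length(w1))
    eps_cut <- sqrt(log(4 / delta_prime) / (2 * m))
    if (abs(mean(w0) - mean(w1)) > eps_cut) return(TRUE)
  }
  FALSE
}

stable  <- rep(0.5, 200)                 # no change in the mean
shifted <- c(rep(0, 100), rep(1, 100))   # abrupt change at position 101
adwin_cut(stable)
adwin_cut(shifted)
```

A smaller `delta` widens the bound, so the detector needs more evidence before it cuts the window.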
AEDD is an unsupervised multivariate detector that compares reconstruction errors produced by an autoencoder on reference and recent windows. Because it monitors changes in the input distribution rather than classifier performance, this implementation is primarily aimed at virtual concept drift. The method follows Kaminskyi, Li, and Muller (2022) https://doi.org/10.1109/ICDMW58026.2022.00109.
dfr_aedd(
  encoding_size,
  ae_class = autoenc_encode_decode,
  batch_size = 32,
  num_epochs = 1000,
  learning_rate = 0.001,
  window_size = 100,
  monitoring_step = 1700,
  criteria = "mann_whitney",
  alpha = 0.01,
  reporting = FALSE
)

| Argument | Description |
|---|---|
| encoding_size | Encoding size |
| ae_class | Autoencoder class |
| batch_size | Batch size for batch learning |
| num_epochs | Number of epochs for training |
| learning_rate | Learning rate |
| window_size | Size of the most recent data to be used |
| monitoring_step | The number of rows the drifter waits before it is updated |
| criteria | The method used to check for drift. May be mann_whitney (default), kolmogorov_smirnov, levene, parametric_threshold, or nonparametric_threshold |
| alpha | The significance threshold for the statistical test used in criteria |
| reporting | If TRUE, some data are returned as norm_x_oh, drift_input, hist_proj, and recent_proj. |

dfr_aedd object
Kaminskyi, D., Li, B., and Muller, E. (2022). Reconstruction-based unsupervised drift detection over multivariate streaming data. In 2022 IEEE International Conference on Data Mining Workshops (ICDMW). https://doi.org/10.1109/ICDMW58026.2022.00109
# See an example of using `dfr_aedd` at:
# https://github.com/cefet-rj-dal/heimdall/blob/main/multivariate/dfr_aedd.md
CUSUM is a sequential analysis procedure that accumulates deviations in a monitored signal and raises an alarm when the cumulative evidence exceeds a threshold. In this package, the detector is implemented as an error-based monitor, so it is primarily intended for real concept drift affecting predictive performance. The concept-drift adaptation follows the sequential change-detection literature discussed by Muthukrishnan, Berg, and Wu (2007) https://doi.org/10.1109/ICDMW.2007.89.
dfr_cusum(lambda = 100)

| Argument | Description |
|---|---|
| lambda | Necessary level for warning zone (2 standard deviations) |

dfr_cusum object
Muthukrishnan, S., Berg, E., and Wu, Y. (2007). Sequential change detection on data streams. In Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007). https://doi.org/10.1109/ICDMW.2007.89
library(daltoolbox)
library(heimdall)
# This example uses an error-based drift detector with a synthetic
# model residual where 1 is an error and 0 is a correct prediction.
data(st_drift_examples)
data <- st_drift_examples$univariate
data$event <- NULL
data$prediction <- st_drift_examples$univariate$serie > 4
model <- dfr_cusum()
detection <- NULL
output <- list(obj=model, drift=FALSE)
for (i in seq_along(data$prediction)){
  output <- update_state(output$obj, data$prediction[i])
  if (output$drift){
    type <- 'drift'
    output$obj <- reset_state(output$obj)
  }else{
    type <- ''
  }
  detection <- rbind(detection, data.frame(idx=i, event=output$drift, type=type))
}
detection[detection$type == 'drift',]
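The accumulation idea behind CUSUM can be sketched in base R as follows. This is a textbook one-sided CUSUM, not the code used by `dfr_cusum`: the function name and the `target`, `slack`, and `threshold` parameters are hypothetical, and how they relate to the package's `lambda` is not stated in the source.

```r
# Hypothetical one-sided CUSUM: accumulates deviations above target + slack
# and reports the first index at which the sum exceeds the threshold.
cusum_first_alarm <- function(x, target, slack = 0.05, threshold = 5) {
  g <- 0
  for (i in seq_along(x)) {
    g <- max(0, g + (x[i] - target - slack))  # never let the sum go negative
    if (g > threshold) return(i)
  }
  NA_integer_
}

# 25% error rate for 100 observations, then every prediction is wrong
errors <- c(rep(c(0, 0, 0, 1), 25), rep(1, 20))
cusum_first_alarm(errors, target = 0.25)
```

The `slack` term keeps ordinary fluctuations from accumulating, so only a persistent increase in the error rate drives the sum over the threshold.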
DDM monitors the online error rate of a predictive model under the PAC-learning assumption that, in a stationary environment, the error should decrease or remain stable as more samples are observed. Because it operates on the classifier error stream, it is primarily a detector of real concept drift. The method follows Gama et al. (2004) https://doi.org/10.1007/978-3-540-28645-5_29.
dfr_ddm(min_instances = 30, warning_level = 2, out_control_level = 3)

| Argument | Description |
|---|---|
| min_instances | The minimum number of instances before detecting change |
| warning_level | Necessary level for warning zone (2 standard deviations) |
| out_control_level | Necessary level for a positive drift detection |

dfr_ddm object
Gama, J., Medas, P., Castillo, G., and Rodrigues, P. P. (2004). Learning with drift detection. In Advances in Artificial Intelligence - SBIA 2004, 286-295. https://doi.org/10.1007/978-3-540-28645-5_29
library(daltoolbox)
library(heimdall)
# This example uses an error-based drift detector with a synthetic
# model residual where 1 is an error and 0 is a correct prediction.
data(st_drift_examples)
data <- st_drift_examples$univariate
data$event <- NULL
data$prediction <- st_drift_examples$univariate$serie > 4
model <- dfr_ddm()
detection <- NULL
output <- list(obj=model, drift=FALSE)
for (i in seq_along(data$prediction)){
  output <- update_state(output$obj, data$prediction[i])
  if (output$drift){
    type <- 'drift'
    output$obj <- reset_state(output$obj)
  }else{
    type <- ''
  }
  detection <- rbind(detection, data.frame(idx=i, event=output$drift, type=type))
}
detection[detection$type == 'drift',]
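As a rough illustration of the rule in Gama et al. (2004), the sketch below tracks the running error rate p and its standard deviation s, remembers the point where p + s was lowest, and flags a warning or drift when p + s rises a given number of standard deviations above that minimum. `ddm_scan()` is a hypothetical name and the code is a simplified base R reading of the method, not the heimdall source.

```r
# Hypothetical DDM-style scan over a 0/1 error stream.
ddm_scan <- function(errors, min_instances = 30,
                     warning_level = 2, out_control_level = 3) {
  p_min <- Inf
  s_min <- Inf
  n_err <- 0
  first_warning <- NA_integer_
  for (i in seq_along(errors)) {
    n_err <- n_err + errors[i]
    p <- n_err / i                  # running error rate
    s <- sqrt(p * (1 - p) / i)      # its binomial standard deviation
    if (i < min_instances) next
    if (p + s <= p_min + s_min) {   # remember the best point seen so far
      p_min <- p
      s_min <- s
    }
    if (p + s > p_min + out_control_level * s_min) {
      return(list(warning = first_warning, drift = i))
    }
    if (is.na(first_warning) && p + s > p_min + warning_level * s_min) {
      first_warning <- i
    }
  }
  list(warning = first_warning, drift = NA_integer_)
}

errors <- c(rep(c(0, 0, 0, 1), 25), rep(1, 30))  # error rate jumps after 100
result <- ddm_scan(errors)
result$warning
result$drift
```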
ECDD applies an exponentially weighted moving average (EWMA) control chart to the online classification error stream. Since it monitors predictive errors directly, it is primarily designed to detect real concept drift. The method follows Ross et al. (2012), who adapted EWMA charts for concept-drift detection in streaming classifiers https://doi.org/10.1016/j.patrec.2011.08.019.
dfr_ecdd(lambda = 0.2, min_run_instances = 30, average_run_length = 100)

| Argument | Description |
|---|---|
| lambda | EWMA smoothing parameter |
| min_run_instances | The minimum number of instances before detecting change |
| average_run_length | Desired Average Run Length (ARL) |

dfr_ecdd object
Ross, G. J., Adams, N. M., Tasoulis, D. K., and Hand, D. J. (2012). Exponentially weighted moving average charts for detecting concept drift. Pattern Recognition Letters, 33(2), 191-198. https://doi.org/10.1016/j.patrec.2011.08.019
library(daltoolbox)
library(heimdall)
# This example uses an error-based drift detector where 1 is an error and 0 is a correct prediction.
data(st_drift_examples)
data <- st_drift_examples$univariate
data$event <- NULL
data$prediction <- st_drift_examples$univariate$serie > 4
model <- dfr_ecdd()
detection <- NULL
output <- list(obj=model, drift=FALSE)
for (i in seq_along(data$prediction)){
  output <- update_state(output$obj, data$prediction[i])
  if (output$drift){
    type <- 'drift'
    output$obj <- reset_state(output$obj)
  }else{
    type <- ''
  }
  detection <- rbind(detection, data.frame(idx=i, event=output$drift, type=type))
}
detection[detection$type == 'drift',]
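The EWMA chart on an error stream can be sketched in base R as below. This is a simplification under stated assumptions: `ewma_first_alarm()` is a hypothetical name, and the control limit is a fixed multiplier (`control_limit`), whereas Ross et al. (2012) derive the limit from the desired average run length. How `dfr_ecdd` maps `average_run_length` to a limit is not shown in the source.

```r
# Hypothetical EWMA chart on a 0/1 error stream with a fixed control limit.
ewma_first_alarm <- function(errors, lambda = 0.2,
                             min_run_instances = 30, control_limit = 3) {
  z <- 0       # EWMA of the error indicator
  p_hat <- 0   # running estimate of the error rate
  for (i in seq_along(errors)) {
    p_hat <- p_hat + (errors[i] - p_hat) / i
    z <- (1 - lambda) * z + lambda * errors[i]
    # standard deviation of the EWMA statistic after i observations
    sigma_z <- sqrt(p_hat * (1 - p_hat) * lambda / (2 - lambda) *
                      (1 - (1 - lambda)^(2 * i)))
    if (i >= min_run_instances && z > p_hat + control_limit * sigma_z) {
      return(i)
    }
  }
  NA_integer_
}

errors <- c(rep(c(0, 0, 0, 1), 25), rep(1, 30))  # error rate jumps after 100
ewma_first_alarm(errors)
```

Because the EWMA weights recent observations more heavily than the running mean does, it reacts within a few observations of an abrupt change.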
EDDM extends DDM by monitoring the distance between classification errors instead of only the error rate, which makes it more sensitive to gradual degradation. Because it operates on the model error stream, it is primarily intended for real concept drift. The method follows Baena-Garcia et al. (2006), who proposed EDDM for improved detection of gradual drift.
dfr_eddm(
  min_instances = 30,
  min_num_errors = 30,
  warning_level = 0.95,
  out_control_level = 0.9
)

| Argument | Description |
|---|---|
| min_instances | The minimum number of instances before detecting change |
| min_num_errors | The minimum number of errors before detecting change |
| warning_level | Necessary level for warning zone |
| out_control_level | Necessary level for a positive drift detection |

dfr_eddm object
Baena-Garcia, M., del Campo-Avila, J., Fidalgo, R., Bifet, A., Gavaldà, R., and Morales-Bueno, R. (2006). Early drift detection method. In Fourth International Workshop on Knowledge Discovery from Data Streams.
library(daltoolbox)
library(heimdall)
# This example uses an error-based drift detector with a synthetic
# model residual where 1 is an error and 0 is a correct prediction.
data(st_drift_examples)
data <- st_drift_examples$univariate
data$event <- NULL
data$prediction <- st_drift_examples$univariate$serie > 4
model <- dfr_eddm()
detection <- NULL
output <- list(obj=model, drift=FALSE)
for (i in seq_along(data$prediction)){
  output <- update_state(output$obj, data$prediction[i])
  if (output$drift){
    type <- 'drift'
    output$obj <- reset_state(output$obj)
  }else{
    type <- ''
  }
  detection <- rbind(detection, data.frame(idx=i, event=output$drift, type=type))
}
detection[detection$type == 'drift',]
HDDM_A is a sequential detector based on Hoeffding's inequality that tests whether the mean of the monitored error stream has increased beyond statistically expected fluctuations. Because this implementation is error-based, it is primarily targeted at real concept drift. The theoretical basis follows Frias-Blanco et al. (2015) https://doi.org/10.1109/TKDE.2014.2345382.
dfr_hddm(
  drift_confidence = 0.001,
  warning_confidence = 0.005,
  two_side_option = TRUE
)

| Argument | Description |
|---|---|
| drift_confidence | Confidence level for drift detection |
| warning_confidence | Confidence level for the warning zone |
| two_side_option | Option to monitor error increments and decrements (two-sided) or only increments (one-sided) |

dfr_hddm object
Frias-Blanco, I., del Campo-Avila, J., Ramos-Jimenez, G., Morales-Bueno, R., Ortiz-Diaz, A., and Caballero-Mota, Y. (2015). Online and nonparametric drift detection methods based on Hoeffding's bounds. IEEE Transactions on Knowledge and Data Engineering, 27(3), 810-823. https://doi.org/10.1109/TKDE.2014.2345382
library(daltoolbox)
library(heimdall)
# This example uses an error-based drift detector with a synthetic
# model residual where 1 is an error and 0 is a correct prediction.
data(st_drift_examples)
data <- st_drift_examples$univariate
data$event <- NULL
data$prediction <- st_drift_examples$univariate$serie > 4
model <- dfr_hddm()
detection <- NULL
output <- list(obj=model, drift=FALSE)
for (i in seq_along(data$prediction)){
  output <- update_state(output$obj, data$prediction[i])
  if (output$drift){
    type <- 'drift'
    output$obj <- reset_state(output$obj)
  }else{
    type <- ''
  }
  detection <- rbind(detection, data.frame(idx=i, event=output$drift, type=type))
}
detection[detection$type == 'drift',]
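The role of Hoeffding's inequality can be illustrated with a one-sided comparison of two sample means. The sketch below is not the full HDDM_A procedure and not the heimdall code: `hoeffding_mean_increase()` is a hypothetical helper that assumes independent values in [0, 1] and applies the generic Hoeffding bound for a difference of means.

```r
# Hypothetical check: has the mean of `recent` increased beyond what
# Hoeffding's inequality allows for bounded [0, 1] values?
hoeffding_mean_increase <- function(reference, recent, confidence = 0.001) {
  n <- length(reference)
  m <- length(recent)
  # P(difference of means exceeds eps by chance) <= confidence
  eps <- sqrt((1 / n + 1 / m) * log(1 / confidence) / 2)
  (mean(recent) - mean(reference)) > eps
}

reference <- rep(c(0, 0, 0, 1), 25)                      # 25% error rate
hoeffding_mean_increase(reference, rep(1, 20))           # all errors
hoeffding_mean_increase(reference, rep(c(0, 0, 0, 1), 5))  # unchanged rate
```

Smaller values of `confidence` enlarge `eps`, which matches the usual pattern of a stricter drift level and a looser warning level.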
Implements Inactive Dummy Detector
dfr_inactive()
Drifter object
# See ?hcd_ddm for an example of DDM drift detector
This detector compares consecutive reference and recent windows through the Kullback-Leibler divergence estimated from their empirical distributions. In this package, it is primarily used for virtual concept drift, since it monitors changes in the distribution of a numeric feature stream rather than predictive error. The statistical foundation is the Kullback-Leibler divergence introduced by Kullback and Leibler (1951).
dfr_kldist(target_feat = NULL, window_size = 100, p_th = 0.05, data = NULL)

| Argument | Description |
|---|---|
| target_feat | Feature to be monitored. |
| window_size | Size of the sliding window |
| p_th | Drift threshold applied to the KL divergence |
| data | Already collected data to avoid cold start. |

dfr_kldist object
Kullback, S., and Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79-86. https://doi.org/10.1214/aoms/1177729694
library(daltoolbox)
library(heimdall)
# This example uses a dist-based drift detector with a synthetic dataset.
data(st_drift_examples)
data <- st_drift_examples$univariate
data$event <- NULL
model <- dfr_kldist(target_feat='serie')
detection <- NULL
output <- list(obj=model, drift=FALSE)
for (i in seq_along(data$serie)){
  output <- update_state(output$obj, data$serie[i])
  if (output$drift){
    type <- 'drift'
    output$obj <- reset_state(output$obj)
  }else{
    type <- ''
  }
  detection <- rbind(detection, data.frame(idx=i, event=output$drift, type=type))
}
detection[detection$type == 'drift',]
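One way to estimate the Kullback-Leibler divergence between two windows is to bin both on a shared grid and compare the resulting empirical distributions. The base R sketch below does this; the function names, the number of bins, and the smoothing constant `eps` are assumptions made for illustration, and the binning used inside `dfr_kldist` may differ.

```r
# Hypothetical helper: count observations per bin on a shared grid.
bin_counts <- function(x, breaks) {
  idx <- findInterval(x, breaks, all.inside = TRUE)
  tabulate(idx, nbins = length(breaks) - 1)
}

# Hypothetical KL estimate between a reference and a recent window.
kl_divergence <- function(reference, recent, bins = 10, eps = 1e-10) {
  lo <- min(reference, recent)
  hi <- max(reference, recent)
  breaks <- seq(lo, hi, length.out = bins + 1)
  p <- bin_counts(reference, breaks) + eps  # eps avoids log(0) and division by 0
  q <- bin_counts(recent, breaks) + eps
  p <- p / sum(p)
  q <- q / sum(q)
  sum(p * log(p / q))
}

reference <- seq(0, 1, length.out = 100)
kl_divergence(reference, reference)      # identical windows
kl_divergence(reference, reference + 2)  # shifted window
```

A detector built on this statistic would flag drift when the divergence exceeds a threshold such as `p_th`.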
KSWIN applies a Kolmogorov-Smirnov test between a recent window and a reference sample drawn from older observations. In this package, the method is primarily used for virtual concept drift, because it monitors distributional changes in a numeric feature stream. The method follows Raab et al. (2020) https://doi.org/10.1016/j.neucom.2019.11.111.
dfr_kswin(
  target_feat = NULL,
  window_size = 1500,
  stat_size = 500,
  alpha = 1e-07,
  data = NULL
)

| Argument | Description |
|---|---|
| target_feat | Feature to be monitored. |
| window_size | Size of the sliding window (must be > 2*stat_size) |
| stat_size | Size of the statistic window |
| alpha | Significance level for the Kolmogorov-Smirnov test statistic. The detector is very sensitive to alpha, so it should be set below 0.01. |
| data | Already collected data to avoid cold start. |

dfr_kswin object
Raab, C., Heusinger, M., and Schleif, F.-M. (2020). Reactive soft prototype computing for concept drift streams. Neurocomputing, 416, 340-351. https://doi.org/10.1016/j.neucom.2019.11.111
library(daltoolbox)
library(heimdall)
# This example uses a dist-based drift detector with a synthetic dataset.
data(st_drift_examples)
data <- st_drift_examples$univariate
data$event <- NULL
model <- dfr_kswin(target_feat='serie')
detection <- NULL
output <- list(obj=model, drift=FALSE)
for (i in seq_along(data$serie)){
  output <- update_state(output$obj, data$serie[i])
  if (output$drift){
    type <- 'drift'
    output$obj <- reset_state(output$obj)
  }else{
    type <- ''
  }
  detection <- rbind(detection, data.frame(idx=i, event=output$drift, type=type))
}
detection[detection$type == 'drift',]
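The window comparison described for KSWIN can be sketched with `stats::ks.test()`. The helper below is hypothetical and only mirrors the description above: the most recent `stat_size` values are compared against a random sample of the same size drawn from the older part of the window. The synthetic stream is an assumption chosen to avoid ties.

```r
# Hypothetical KSWIN-style check on one full window.
kswin_drift <- function(window, stat_size = 500, alpha = 1e-07) {
  n <- length(window)
  recent <- window[(n - stat_size + 1):n]
  older <- window[1:(n - stat_size)]
  reference <- sample(older, stat_size)   # random reference sample
  ks.test(reference, recent)$p.value < alpha
}

set.seed(42)
# Deterministic, roughly uniform values in [0, 1) without ties
stable <- ((1:1500) * 0.618034) %% 1
# Same stream, but the last 500 values are shifted by 5
drifted <- c(stable[1:1000], stable[1001:1500] + 5)
kswin_drift(stable)
kswin_drift(drifted)
```

Because the reference is a random sample, results near the significance boundary can vary between runs. A very small `alpha` keeps false alarms rare.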
LBDD is a window-based detector that compares the variability of reference and recent samples using Levene's test. Because it monitors changes in the distribution of an observed feature rather than model performance, it is primarily aimed at virtual concept drift. In this package, the detector follows the statistical-testing approach discussed by Giusti et al. (2021) for drift analysis, using Levene's variance test as its core mechanism.
dfr_lbdd(target_feat = NULL, alpha = 0.01, window_size = 1500)

| Argument | Description |
|---|---|
| target_feat | Feature to be monitored |
| alpha | Probability threshold for the test statistic |
| window_size | Size of the sliding window |

dfr_lbdd object
Giusti, L., Carvalho, L., Gomes, A. T., Coutinho, R., Soares, J., and Ogasawara, E. (2021). Analysing flight delay under concept drift. Evolving Systems. https://doi.org/10.1007/s12530-021-09415-z
library(daltoolbox)
library(heimdall)
# This example uses a dist-based drift detector with a synthetic dataset.
data(st_drift_examples)
data <- st_drift_examples$univariate
data$event <- NULL
# The monitored column in this dataset is 'serie'
model <- dfr_lbdd(target_feat='serie')
detection <- NULL
output <- list(obj=model, drift=FALSE)
for (i in seq_along(data$serie)){
  output <- update_state(output$obj, data$serie[i])
  if (output$drift){
    type <- 'drift'
    output$obj <- reset_state(output$obj)
  }else{
    type <- ''
  }
  detection <- rbind(detection, data.frame(idx=i, event=output$drift, type=type))
}
detection[detection$type == 'drift',]
MCDD is a window-based detector that compares the location of reference and recent samples by means of hypothesis tests on their central tendency. Because it monitors the distribution of observed features rather than predictive errors, it is primarily intended for virtual concept drift. In this package, the detector follows the statistical-testing perspective adopted by Giusti et al. (2021) for drift analysis.
dfr_mcdd(target_feat = NULL, alpha = 1e-08, window_size = 1500)

| Argument | Description |
|---|---|
| target_feat | Feature to be monitored |
| alpha | Probability threshold for all test statistics |
| window_size | Size of the sliding window |

dfr_mcdd object
Giusti, L., Carvalho, L., Gomes, A. T., Coutinho, R., Soares, J., and Ogasawara, E. (2021). Analysing flight delay under concept drift. Evolving Systems. https://doi.org/10.1007/s12530-021-09415-z
library(daltoolbox)
library(heimdall)
# This example uses a dist-based drift detector with a synthetic dataset.
data(st_drift_examples)
data <- st_drift_examples$univariate
data$event <- NULL
# The monitored column in this dataset is 'serie'
model <- dfr_mcdd(target_feat='serie')
detection <- NULL
output <- list(obj=model, drift=FALSE)
for (i in seq_along(data$serie)){
  output <- update_state(output$obj, data$serie[i])
  if (output$drift){
    type <- 'drift'
    output$obj <- reset_state(output$obj)
  }else{
    type <- ''
  }
  detection <- rbind(detection, data.frame(idx=i, event=output$drift, type=type))
}
detection[detection$type == 'drift',]
Implements Multi Criteria drift detectors
dfr_multi_criteria(drifter_list, combination = "or", fuzzy_window = 10)

| Argument | Description |
|---|---|
| drifter_list | List of drifters to combine. |
| combination | How the drifters will be combined. Possible values: 'fuzzy', 'or', 'and'. |
| fuzzy_window | Sets the fuzzy window size. Only if combination = 'fuzzy'. |

Drifter object
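A minimal sketch of how per-detector flags might be combined is shown below, assuming that 'or' and 'and' mean element-wise logical combination of the drift flags raised by each detector. `combine_votes()` is a hypothetical helper, not part of heimdall, and the 'fuzzy' mode is not sketched because the source does not define its semantics.

```r
# Hypothetical combination of drift flags from several detectors.
# `votes` is a list of logical vectors, one per detector, aligned by index.
combine_votes <- function(votes, combination = c("or", "and")) {
  combination <- match.arg(combination)
  op <- if (combination == "or") `|` else `&`
  Reduce(op, votes)
}

votes <- list(
  c(FALSE, TRUE, FALSE, TRUE),   # flags from detector A
  c(FALSE, FALSE, TRUE, TRUE)    # flags from detector B
)
combine_votes(votes, "or")    # FALSE TRUE TRUE TRUE
combine_votes(votes, "and")   # FALSE FALSE FALSE TRUE
```

With 'or', any single detector can trigger adaptation, while 'and' requires agreement and therefore trades sensitivity for fewer false alarms.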
The Page-Hinkley test is a sequential change-point detector that monitors cumulative deviations from a running mean and signals a change when those deviations grow persistently. In this package, the implementation is primarily used for virtual concept drift when it monitors a numeric feature stream, although the same statistic can also be applied to error streams to detect real concept drift. The method is based on Page (1954) and the later streaming adaptation popularized in data-stream mining.
dfr_page_hinkley(
  target_feat = NULL,
  min_instances = 30,
  delta = 0.005,
  threshold = 50,
  alpha = 1 - 1e-04
)

| Argument | Description |
|---|---|
| target_feat | Feature to be monitored. |
| min_instances | The minimum number of instances before detecting change |
| delta | The delta factor for the Page Hinkley test |
| threshold | The change detection threshold (lambda) |
| alpha | The forgetting factor, used to weight the observed value and the mean |

dfr_page_hinkley object
Page, E. S. (1954). Continuous inspection schemes. Biometrika, 41(1/2), 100-115. https://doi.org/10.2307/2333009
library(daltoolbox)
library(heimdall)
# This example monitors a numeric feature stream from a synthetic dataset.
data(st_drift_examples)
data <- st_drift_examples$univariate
data$event <- NULL
model <- dfr_page_hinkley(target_feat='serie')
detection <- NULL
output <- list(obj=model, drift=FALSE)
for (i in seq_along(data$serie)){
  output <- update_state(output$obj, data$serie[i])
  if (output$drift){
    type <- 'drift'
    output$obj <- reset_state(output$obj)
  }else{
    type <- ''
  }
  detection <- rbind(detection, data.frame(idx=i, event=output$drift, type=type))
}
detection[detection$type == 'drift',]
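The cumulative statistic behind the Page-Hinkley test can be sketched in base R as follows. This is one common streaming formulation for detecting an increase in the mean, written under the assumption that the arguments play the same roles as in `dfr_page_hinkley`. The function name is hypothetical and the package's exact update rule may differ.

```r
# Hypothetical Page-Hinkley scan for an increase in the mean of a stream.
page_hinkley_first_alarm <- function(x, min_instances = 30, delta = 0.005,
                                     threshold = 50, alpha = 1 - 1e-04) {
  x_mean <- 0   # running mean of the stream
  cum <- 0      # cumulative deviation above the mean, with forgetting
  for (i in seq_along(x)) {
    x_mean <- x_mean + (x[i] - x_mean) / i
    cum <- max(0, alpha * cum + (x[i] - x_mean - delta))
    if (i >= min_instances && cum > threshold) return(i)
  }
  NA_integer_
}

stream <- c(rep(0, 200), rep(10, 50))   # level shift at position 201
page_hinkley_first_alarm(stream)
```

The `delta` term tolerates small deviations, and `threshold` controls how much accumulated deviation is needed before an alarm is raised.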
Implements Passive Dummy Detector
dfr_passive()
Drifter object
# See ?hcd_ddm for an example of DDM drift detector
Implements Distribution Based drift detectors
dist_based(target_feat)

| Argument | Description |
|---|---|
| target_feat | Feature to be monitored. |

Drifter object
Ancestor class for drift detection
drifter()
Drifter object
# See ?dd_ddm for an example of DDM drift detector
Implements Error Based drift detectors
error_based()
Drifter object
# See ?hcd_ddm for an example of DDM drift detector
Process Batch
## S3 method for class 'drifter'
fit(obj, data, prediction, ...)

| Argument | Description |
|---|---|
| obj | Drifter object |
| data | Data batch in data frame format |
| prediction | Prediction batch in vector format |
| ... | Optional arguments |

Updated Drifter object
Ancestor class for metric calculation
metric()
Metric object
# See ?metric for an example of DDM drift detector
Class for accuracy calculation
mt_accuracy()
Metric object
# See ?mt_accuracy for an example of Accuracy Calculator
Class for FScore calculation
mt_fscore(f = 1)

| Argument | Description |
|---|---|
| f | The F parameter for the F-Score metric |

Metric object
# See ?mt_fscore for an example of FScore Calculator
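For orientation, the usual F-beta definition can be computed in base R as sketched below, assuming that the `f` argument plays the role of beta. `f_score()` is a hypothetical helper and not the code behind `mt_fscore`.

```r
# Hypothetical F-score for logical vectors of actual and predicted labels.
f_score <- function(actual, predicted, f = 1) {
  tp <- sum(predicted & actual)     # true positives
  fp <- sum(predicted & !actual)    # false positives
  fn <- sum(!predicted & actual)    # false negatives
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  (1 + f^2) * precision * recall / (f^2 * precision + recall)
}

actual    <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
predicted <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
f_score(actual, predicted)   # precision = recall = 2/3, so F1 = 2/3
```

Values of `f` above 1 weight recall more heavily, and values below 1 favor precision.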
Class for precision calculation
mt_precision()
Metric object
# See ?mt_precision for an example of Precision Calculator
Class for recall calculation
mt_recall()
Metric object
# See ?mt_recall for an example of Recall Calculator
Class for ROC AUC calculation
mt_rocauc()
Metric object
# See ?mt_rocauc for an example of ROC AUC Calculator
Implements Multivariate Distribution Based drift detectors
mv_dist_based()
Drifter object
Ancestor class for normalization techniques
norm(norm_class)

| Argument | Description |
|---|---|
| norm_class | Normalizer class |

Norm object
# See ?norm for an example of DDM drift detector
Normalizer that keeps its own memory
nrm_memory(norm_class = minmax())

| Argument | Description |
|---|---|
| norm_class | Normalizer class |

Norm object
# See ?nrm_mimax for an example of Memory Normalizer
Reset Drifter State
reset_state(obj)

| Argument | Description |
|---|---|
| obj | Drifter object |

Updated Drifter object
# See ?hcd_ddm for an example of DDM drift detector
A list of multivariate time series for drift detection
example1: a bivariate dataset with one multivariate concept drift example
data(st_drift_examples)
A list of time series.
data(st_drift_examples)
dataset <- st_drift_examples$example1
Ancestor class for drift adaptive models
stealthy(
  model,
  drift_method,
  monitored_features = NULL,
  norm_class = daltoolbox::zscore(),
  warmup_size = 100,
  th = 0.5,
  target_uni_drifter = FALSE,
  incremental_memory = TRUE,
  verbose = FALSE,
  reporting = FALSE
)

| Argument | Description |
|---|---|
| model | The algorithm object to be used for predictions |
| drift_method | The algorithm object used to detect drifts |
| monitored_features | List of features that will be monitored by the drifter |
| norm_class | Class used to perform normalization |
| warmup_size | Number of rows used to warm up the drifter. No drift will be detected during this phase |
| th | The threshold to be used with classification algorithms |
| target_uni_drifter | Passes the prediction target to the drifter as the target feature when the drifter is univariate and dist_based. |
| incremental_memory | If TRUE, the model retrains with all available data whenever fit is called. If FALSE, it only retrains when a drift is detected. |
| verbose | If TRUE, shows drift messages |
| reporting | If TRUE, some data are returned as norm_x_oh, drift_input, hist_proj, and recent_proj. |

Stealthy object
# See ?dd_ddm for an example of DDM drift detector
Update Drifter State
update_state(obj, value)

| Argument | Description |
|---|---|
| obj | Drifter object |
| value | A value that represents a processed batch |

Updated Drifter object
# See ?hcd_ddm for an example of DDM drift detector