skika.hyper_parameter_tuning.drift_detectors.AutoDDM#

Classes

AutoDDM([min_num_instances, warning_level, ...])

Description :

class skika.hyper_parameter_tuning.drift_detectors.AutoDDM.AutoDDM(min_num_instances=30, warning_level=2.0, out_control_level=3.0, default_prob=1, ts_length=20, confidence=0.95, tolerance=1000, c=0.05)#
Description :

AutoDDM is a drift detector that adjusts its drift thresholds based on prior information. It exploits periodicity in the data stream, when present, so that it is more sensitive to true concept drifts while reducing false-positive detections.

Parameters :
min_num_instances: int

The minimum number of samples that must be analyzed before a change can be detected. This avoids false detections during the early moments of the detector, when the weight of a single sample is large.

warning_level: float

The warning level: the number of standard deviations above the minimum error rate at which the detector enters the warning zone. Default value 2.0.

out_control_level: float

The out-of-control level: the number of standard deviations above the minimum error rate at which a concept drift is signaled. Default value 3.0.

default_prob: float (0 to 1)

The probability the detector resets to after a drift is detected. Default value 1.

ts_length: int

The length of the drift-location buffer. Default value 20.

confidence: float (0 to 1)

The default confidence level. Default value 0.95.

tolerance: int

The tolerance range used when matching drift locations. Default value 1000. E.g., a location within 500 ± tolerance will match to 500.

c: float

A Laplacian constant used in the threshold function. Default value 0.05.

Example

>>> from skika.hyper_parameter_tuning.drift_detectors.AutoDDM import AutoDDM
>>> import warnings
>>> import time
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from skmultiflow.trees import HoeffdingTreeClassifier
>>> from guppy import hpy
>>> import arff
>>> import pandas
>>> from skmultiflow.data import DataStream
>>>
>>> warnings.filterwarnings('ignore')
>>> plt.style.use("seaborn-whitegrid")
>>>
>>> # Global variable
>>> TRAINING_SIZE = 1
>>> grace = 1000
>>> tolerance = 500
>>>
>>> elec_data = arff.load("elecNormNew.arff")
>>> elec_df = pandas.DataFrame(elec_data)
>>> elec_df.columns = ['date', 'day', 'period', 'nswprice', 'nswdemand', 'vicprice', 'vicdemand', 'transfer', 'class']
>>> mapping = {"day":{"1":1, "2":2, "3":3, "4":4, "5":5, "6":6, "7":7}, "class": {"UP": 0, "DOWN": 1}}
>>> elec_df = elec_df.replace(mapping)
>>> elec_full_df = pandas.concat([elec_df] * 200)
>>> STREAM_SIZE = elec_full_df.shape[0]
>>> elec_stream = DataStream(elec_full_df, name="elec")
>>> elec_stream.prepare_for_use()
>>> X_train, y_train = elec_stream.next_sample(TRAINING_SIZE)
>>> ht = HoeffdingTreeClassifier()
>>> ht.partial_fit(X_train, y_train)
>>> n_global = TRAINING_SIZE  # Cumulative Number of observations
>>> d_ddm = 0
>>> w_ddm = 0
>>> TP_ddm = []
>>> FP_ddm = []
>>> RT_ddm = []
>>> DIST_ddm = []
>>> mem_ddm = []
>>> retrain = False
>>> grace_end = n_global
>>> detect_end = n_global
>>> pred_grace_ht = []
>>> pred_grace_ht_p = []
>>> ht_p = None
>>> ML_accuracy = 0
>>> acc_x = []
>>> acc_y = []
>>> drift_x = []
>>> drift_y = []
>>>
>>> ddm = AutoDDM(tolerance=tolerance)
>>> h = hpy()
>>> while elec_stream.has_more_samples():
...     n_global += 1
...
...     X_test, y_test = elec_stream.next_sample()
...     y_predict = ht.predict(X_test)
...     ddm_start_time = time.time()
...     ddm.add_element(y_test != y_predict, n_global)
...     ML_accuracy += 1 if y_test == y_predict else 0
...     if (n_global % 100 == 0):
...         acc_x.append(n_global)
...         acc_y.append(ML_accuracy/n_global)
...     ddm_running_time = time.time() - ddm_start_time
...     RT_ddm.append(ddm_running_time)
...     if (n_global > grace_end):
...         if (n_global > detect_end):
...             if ht_p is not None:
...                 drift_point = detect_end - 2 * grace
...                 print("Accuracy of ht: " + str(np.mean(pred_grace_ht)))
...                 print("Accuracy of ht_p: " + str(np.mean(pred_grace_ht_p)))
...                 if (np.mean(pred_grace_ht_p) > np.mean(pred_grace_ht)):
...                     print("TP detected at: " + str(drift_point))
...                     TP_ddm.append(drift_point)
...                     ddm.detect_TP(drift_point)
...                     ht = ht_p
...                     drift_x.append(n_global)
...                     drift_y.append(ML_accuracy/n_global)
...                 else:
...                     print("FP detected at: " + str(drift_point))
...                     FP_ddm.append(drift_point)
...                     ddm.detect_FP(n_global)
...                 ht_p = None
...                 pred_grace_ht = []
...                 pred_grace_ht_p = []
...             if ddm.detected_warning_zone():
...                 w_ddm += 1
...             if ddm.detected_change():
...                 d_ddm += 1
...                 ht_p = HoeffdingTreeClassifier()
...                 grace_end = n_global + grace
...                 detect_end = n_global + 2 * grace
...         else:
...             pred_grace_ht.append(y_test == y_predict)
...             pred_grace_ht_p.append(y_test == ht_p.predict(X_test))
...     if ht_p is not None:
...         ht_p.partial_fit(X_test, y_test)
...     ht.partial_fit(X_test, y_test)
>>> x = h.heap()
>>> mem_ddm.append(x.size)
>>> print("Number of drifts detected by ddm: " + str(d_ddm))
>>> print("TP by ddm:" + str(len(TP_ddm)))
>>> print("FP by ddm:" + str(len(FP_ddm)))
>>> print("Mean RT %s seconds" % np.mean(RT_ddm))
>>> print("Mean Memory by ddm:" + str(mem_ddm))
>>> print("Accuracy by DDM:" + str(ML_accuracy / STREAM_SIZE))
>>> plt.plot(acc_x, acc_y, color='black')
>>> plt.scatter(drift_x, drift_y, edgecolors='red')
>>> plt.show()
add_element(prediction, n)#

Add a new element to the statistics

Parameters
  • prediction (int (either 0 or 1)) – This parameter indicates whether the last sample analyzed was correctly classified: 1 indicates an error (misclassification), 0 a correct prediction.

  • n (int) – This parameter indicates the current timestamp t.

Notes

After calling this method, to verify whether a change was detected or whether the learner is in the warning zone, call the super methods detected_change (which returns True if concept drift was detected and False otherwise) and detected_warning_zone. Once a detected concept drift has been confirmed as a true or false positive, call detect_TP or detect_FP respectively.
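The workflow described in the note above can be sketched as follows. So that the snippet runs without skika installed, `StubDetector` is a hypothetical stand-in exposing AutoDDM's interface (add_element, detected_change, detect_TP, detect_FP); the validation rule deciding TP vs. FP is a placeholder for the learner-comparison check shown in the Example.

```python
class StubDetector:
    """Hypothetical stand-in with AutoDDM's interface; it flags a change
    every 50 samples so the TP/FP feedback path below is exercised."""
    def __init__(self):
        self._change = False
    def add_element(self, prediction, n):
        self._change = (n % 50 == 0)
    def detected_warning_zone(self):
        return False
    def detected_change(self):
        return self._change
    def detect_TP(self, n):
        pass  # AutoDDM uses confirmed drifts to adjust future thresholds
    def detect_FP(self, n):
        pass  # rejected detections likewise feed back into the thresholds

detector = StubDetector()
confirmed, rejected = [], []
for n, error in enumerate((0, 1) * 100, start=1):
    detector.add_element(error, n)   # 1 = misclassification at time n
    if detector.detected_change():
        # In practice the candidate drift is validated first, e.g. by
        # comparing a freshly trained learner against the current one.
        if n % 100 == 0:             # placeholder validation rule
            detector.detect_TP(n)
            confirmed.append(n)
        else:
            detector.detect_FP(n)
            rejected.append(n)
```

With a real AutoDDM instance the same loop applies; only the validation step (here `n % 100 == 0`) changes.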

detect_FP(n)#

Signals that a detected drift was a false positive.

Parameters

n (int) – The timestamp at which the false positive was identified.

detect_TP(n)#

Signals that a detected drift was a true concept drift.

Parameters

n (int) – The timestamp at which the true concept drift was confirmed.

detected_change()#

This function returns whether concept drift was detected or not.

Returns

Whether concept drift was detected or not.

Return type

bool

detected_warning_zone()#

If the change detector supports the warning zone, this function will return whether it’s inside the warning zone or not.

Returns

Whether the change detector is in the warning zone or not.

Return type

bool

get_info()#

Collects and returns information about the configuration of the estimator.

Returns

Configuration of the estimator.

Return type

string

get_length_estimation()#

Returns the length estimation.

Returns

The length estimation

Return type

int

get_params(deep=True)#

Get parameters for this estimator.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

reset()#

Resets the change detector parameters.

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Return type

self
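The `<component>__<parameter>` convention described above is the scikit-learn estimator pattern. The following is a minimal self-contained sketch of how that routing works; the `Pipeline` and `Threshold` classes and their parameter names are purely illustrative, not part of skika.

```python
class Threshold:
    """Illustrative leaf estimator with one parameter."""
    def __init__(self, level=2.0):
        self.level = level
    def get_params(self, deep=True):
        return {"level": self.level}
    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self

class Pipeline:
    """Illustrative nested estimator holding a Threshold component."""
    def __init__(self, threshold=None):
        self.threshold = threshold if threshold is not None else Threshold()
    def get_params(self, deep=True):
        params = {"threshold": self.threshold}
        if deep:
            # Expose nested parameters under the <component>__<parameter> key.
            for key, value in self.threshold.get_params().items():
                params["threshold__" + key] = value
        return params
    def set_params(self, **params):
        for key, value in params.items():
            if "__" in key:
                # Route "component__parameter" to the nested component.
                component, _, sub_key = key.partition("__")
                getattr(self, component).set_params(**{sub_key: value})
            else:
                setattr(self, key, value)
        return self

pipe = Pipeline()
pipe.set_params(threshold__level=3.0)  # <component>__<parameter> form
```

After the call, `pipe.threshold.level` is 3.0, and `pipe.get_params()` reports it under the key `"threshold__level"`.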