skika.hyper_parameter_tuning.drift_detectors.AutoDDM#

Classes

AutoDDM([min_num_instances, warning_level, ...])

Description :

class skika.hyper_parameter_tuning.drift_detectors.AutoDDM.AutoDDM(min_num_instances=30, warning_level=2.0, out_control_level=3.0, default_prob=1, ts_length=20, confidence=0.95, tolerance=1000, c=0.05)#
Description :

AutoDDM is a drift detector that adjusts its drift thresholds based on prior information. It exploits periodicity in the data stream, when present, so that it is more sensitive to true concept drifts while reducing false-positive detections.

Parameters :
min_num_instances: int

The minimum number of samples that must be analyzed before a change can be detected. This avoids false detections during the early moments of the detector, when the weight of a single sample is large.

warning_level: float

The warning level: the number of standard deviations above the minimum error rate at which the detector enters the warning zone. Default value 2.0.

out_control_level: float

The out-of-control level: the number of standard deviations above the minimum error rate at which a concept drift is signaled. Default value 3.0.

default_prob: float (0 to 1)

The probability the detector resets to after a drift is detected. Default value 1.

ts_length: int

The length of the drift-location buffer. Default value 20.

confidence: float (0 to 1)

The default confidence level. Default value 0.95.

tolerance: int

The tolerance range used when matching drift locations. Default value 1000. E.g., a location within 500 ± tolerance will match to 500.

c: float

A Laplacian constant used in the threshold function. Default value 0.05.

Example

>>> from skika.hyper_parameter_tuning.drift_detectors.AutoDDM import AutoDDM
>>> import warnings
>>> import time
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from skmultiflow.trees import HoeffdingTreeClassifier
>>> from guppy import hpy
>>> import arff
>>> import pandas
>>> from skmultiflow.data import DataStream
>>>
>>> warnings.filterwarnings('ignore')
>>> plt.style.use("seaborn-whitegrid")
>>>
>>> # Global variable
>>> TRAINING_SIZE = 1
>>> grace = 1000
>>> tolerance = 500
>>>
>>> elec_data = arff.load("elecNormNew.arff")
>>> elec_df = pandas.DataFrame(elec_data)
>>> elec_df.columns = ['date', 'day', 'period', 'nswprice', 'nswdemand', 'vicprice', 'vicdemand', 'transfer', 'class']
>>> mapping = {"day":{"1":1, "2":2, "3":3, "4":4, "5":5, "6":6, "7":7}, "class": {"UP": 0, "DOWN": 1}}
>>> elec_df = elec_df.replace(mapping)
>>> elec_full_df = pandas.concat([elec_df] * 200)
>>> STREAM_SIZE = elec_full_df.shape[0]
>>> elec_stream = DataStream(elec_full_df, name="elec")
>>> elec_stream.prepare_for_use()
>>> X_train, y_train = elec_stream.next_sample(TRAINING_SIZE)
>>> ht = HoeffdingTreeClassifier()
>>> ht.partial_fit(X_train, y_train)
>>> n_global = TRAINING_SIZE  # Cumulative Number of observations
>>> d_ddm = 0
>>> w_ddm = 0
>>> TP_ddm = []
>>> FP_ddm = []
>>> RT_ddm = []
>>> DIST_ddm = []
>>> mem_ddm = []
>>> retrain = False
>>> grace_end = n_global
>>> detect_end = n_global
>>> pred_grace_ht = []
>>> pred_grace_ht_p = []
>>> ht_p = None
>>> ML_accuracy = 0
>>> acc_x = []
>>> acc_y = []
>>> drift_x = []
>>> drift_y = []
>>>
>>> ddm = AutoDDM(tolerance=tolerance)
>>> h = hpy()
>>> while elec_stream.has_more_samples():
...     n_global += 1
...
...     X_test, y_test = elec_stream.next_sample()
...     y_predict = ht.predict(X_test)
...     ddm_start_time = time.time()
...     ddm.add_element(y_test != y_predict, n_global)
...     ML_accuracy += 1 if y_test == y_predict else 0
...     if (n_global % 100 == 0):
...         acc_x.append(n_global)
...         acc_y.append(ML_accuracy/n_global)
...     ddm_running_time = time.time() - ddm_start_time
...     RT_ddm.append(ddm_running_time)
...     if (n_global > grace_end):
...         if (n_global > detect_end):
...             if ht_p is not None:
...                 drift_point = detect_end - 2 * grace
...                 print("Accuracy of ht: " + str(np.mean(pred_grace_ht)))
...                 print("Accuracy of ht_p: " + str(np.mean(pred_grace_ht_p)))
...                 if (np.mean(pred_grace_ht_p) > np.mean(pred_grace_ht)):
...                     print("TP detected at: " + str(drift_point))
...                     TP_ddm.append(drift_point)
...                     ddm.detect_TP(drift_point)
...                     ht = ht_p
...                     drift_x.append(n_global)
...                     drift_y.append(ML_accuracy/n_global)
...                 else:
...                     print("FP detected at: " + str(drift_point))
...                     FP_ddm.append(drift_point)
...                     ddm.detect_FP(n_global)
...                 ht_p = None
...                 pred_grace_ht = []
...                 pred_grace_ht_p = []
...             if ddm.detected_warning_zone():
...                 w_ddm += 1
...             if ddm.detected_change():
...                 d_ddm += 1
...                 ht_p = HoeffdingTreeClassifier()
...                 grace_end = n_global + grace
...                 detect_end = n_global + 2 * grace
...         else:
...             pred_grace_ht.append(y_test == y_predict)
...             pred_grace_ht_p.append(y_test == ht_p.predict(X_test))
...     if ht_p is not None:
...         ht_p.partial_fit(X_test, y_test)
...     ht.partial_fit(X_test, y_test)
>>> x = h.heap()
>>> mem_ddm.append(x.size)
>>> print("Number of drifts detected by ddm: " + str(d_ddm))
>>> print("TP by ddm:" + str(len(TP_ddm)))
>>> print("FP by ddm:" + str(len(FP_ddm)))
>>> print("Mean RT %s seconds" % np.mean(RT_ddm))
>>> print("Mean Memory by ddm:" + str(mem_ddm))
>>> print("Accuracy by DDM:" + str(ML_accuracy / STREAM_SIZE))
>>> plt.plot(acc_x, acc_y, color='black')
>>> plt.scatter(drift_x, drift_y, edgecolors='red')
>>> plt.show()
add_element(prediction, n)#

Add a new element to the statistics

Parameters
  • prediction (int (either 0 or 1)) – This parameter indicates whether the last sample analyzed was correctly classified: 1 indicates an error (misclassification), 0 a correct prediction.

  • n (int) – This parameter indicates the current timestamp t.

Notes

After calling this method, to verify whether a change was detected or whether the learner is in the warning zone, call the super methods detected_change (which returns True if concept drift was detected and False otherwise) and detected_warning_zone. Once a detected concept drift has been confirmed as a true or false positive, call detect_TP or detect_FP respectively.
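The workflow described in the note above can be sketched as follows. So that the snippet runs without skika installed, `StubDetector` is a hypothetical stand-in exposing AutoDDM's interface (add_element, detected_change, detect_TP, detect_FP); the validation rule deciding TP vs. FP is a placeholder for the learner-comparison check shown in the Example.

```python
class StubDetector:
    """Hypothetical stand-in with AutoDDM's interface; it flags a change
    every 50 samples so the TP/FP feedback path below is exercised."""
    def __init__(self):
        self._change = False
    def add_element(self, prediction, n):
        self._change = (n % 50 == 0)
    def detected_warning_zone(self):
        return False
    def detected_change(self):
        return self._change
    def detect_TP(self, n):
        pass  # AutoDDM uses confirmed drifts to adjust future thresholds
    def detect_FP(self, n):
        pass  # rejected detections likewise feed back into the thresholds

detector = StubDetector()
confirmed, rejected = [], []
for n, error in enumerate((0, 1) * 100, start=1):
    detector.add_element(error, n)   # 1 = misclassification at time n
    if detector.detected_change():
        # In practice the candidate drift is validated first, e.g. by
        # comparing a freshly trained learner against the current one.
        if n % 100 == 0:             # placeholder validation rule
            detector.detect_TP(n)
            confirmed.append(n)
        else:
            detector.detect_FP(n)
            rejected.append(n)
```

With a real AutoDDM instance the same loop applies; only the validation step (here `n % 100 == 0`) changes.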

detect_FP(n)#

Signals that a detected drift was a false positive.

Parameters

n (int) – The timestamp at which the false positive was identified.

detect_TP(n)#

Signals that a detected drift was a true concept drift.

Parameters

n (int) – The timestamp at which the true concept drift was confirmed.

detected_change()#

This function returns whether concept drift was detected or not.

Returns

Whether concept drift was detected or not.

Return type

bool

detected_warning_zone()#

If the change detector supports the warning zone, this function will return whether it’s inside the warning zone or not.

Returns

Whether the change detector is in the warning zone or not.

Return type

bool

get_info()#

Collects and returns information about the configuration of the estimator.

Returns

Configuration of the estimator.

Return type

string

get_length_estimation()#

Returns the length estimation.

Returns

The length estimation

Return type

int

get_params(deep=True)#

Get parameters for this estimator.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

reset()#

Resets the change detector parameters.

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Return type

self
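The `<component>__<parameter>` convention described above is the scikit-learn estimator pattern. The following is a minimal self-contained sketch of how that routing works; the `Pipeline` and `Threshold` classes and their parameter names are purely illustrative, not part of skika.

```python
class Threshold:
    """Illustrative leaf estimator with one parameter."""
    def __init__(self, level=2.0):
        self.level = level
    def get_params(self, deep=True):
        return {"level": self.level}
    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self

class Pipeline:
    """Illustrative nested estimator holding a Threshold component."""
    def __init__(self, threshold=None):
        self.threshold = threshold if threshold is not None else Threshold()
    def get_params(self, deep=True):
        params = {"threshold": self.threshold}
        if deep:
            # Expose nested parameters under the <component>__<parameter> key.
            for key, value in self.threshold.get_params().items():
                params["threshold__" + key] = value
        return params
    def set_params(self, **params):
        for key, value in params.items():
            if "__" in key:
                # Route "component__parameter" to the nested component.
                component, _, sub_key = key.partition("__")
                getattr(self, component).set_params(**{sub_key: value})
            else:
                setattr(self, key, value)
        return self

pipe = Pipeline()
pipe.set_params(threshold__level=3.0)  # <component>__<parameter> form
```

After the call, `pipe.threshold.level` is 3.0, and `pipe.get_params()` reports it under the key `"threshold__level"`.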