skika.hyper_parameter_tuning.drift_detectors.AutoDDM#
Classes
- class skika.hyper_parameter_tuning.drift_detectors.AutoDDM.AutoDDM(min_num_instances=30, warning_level=2.0, out_control_level=3.0, default_prob=1, ts_length=20, confidence=0.95, tolerance=1000, c=0.05)#
- Description :
AutoDDM is a drift detector that adjusts its drift thresholds based on prior information. It exploits the periodicity in the data stream, when it exists, so that the detector is more sensitive to true concept drifts while reducing false-positive detections.
- Parameters :
- min_num_instances: int
The minimum required number of analyzed samples so change can be detected. This is used to avoid false detections during the early moments of the detector, when the weight of one sample is important.
- warning_level: float
Warning level: the number of standard deviations above the minimum error rate at which the warning zone is entered.
- out_control_level: float
Out-control level: the number of standard deviations above the minimum error rate at which a drift is signalled.
- default_prob: float (0 to 1)
The initial error probability assigned after a drift is detected and the detector is reset. Default value 1.
- ts_length: int
The length of the location buffer. Default value 20.
- confidence: float (0 to 1)
The default confidence level. Default value 0.95.
- tolerance: int
The tolerance range used when matching drift locations. Default value 1000. E.g. a drift location within 500 plus/minus tolerance is matched to 500.
- c: float
A Laplacian constant used in the threshold function. Default value 0.05.
Example
>>> import warnings
>>> import time
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import arff
>>> import pandas
>>> from skmultiflow.trees import HoeffdingTreeClassifier
>>> from skmultiflow.data import DataStream
>>> from guppy import hpy
>>> from skika.hyper_parameter_tuning.drift_detectors.AutoDDM import AutoDDM
>>>
>>> warnings.filterwarnings('ignore')
>>> plt.style.use("seaborn-whitegrid")
>>>
>>> # Global variables
>>> TRAINING_SIZE = 1
>>> grace = 1000
>>> tolerance = 500
>>>
>>> # Load the Electricity data set and repeat it to build a long stream
>>> elec_data = arff.load("elecNormNew.arff")
>>> elec_df = pandas.DataFrame(elec_data)
>>> elec_df.columns = ['date', 'day', 'period', 'nswprice', 'nswdemand', 'vicprice', 'vicdemand', 'transfer', 'class']
>>> mapping = {"day": {"1": 1, "2": 2, "3": 3, "4": 4, "5": 5, "6": 6, "7": 7}, "class": {"UP": 0, "DOWN": 1}}
>>> elec_df = elec_df.replace(mapping)
>>> elec_full_df = pandas.concat([elec_df] * 200)
>>> STREAM_SIZE = elec_full_df.shape[0]
>>> elec_stream = DataStream(elec_full_df, name="elec")
>>> elec_stream.prepare_for_use()
>>>
>>> # Pre-train a Hoeffding tree classifier
>>> X_train, y_train = elec_stream.next_sample(TRAINING_SIZE)
>>> ht = HoeffdingTreeClassifier()
>>> ht.partial_fit(X_train, y_train)
>>>
>>> n_global = TRAINING_SIZE  # Cumulative number of observations
>>> d_ddm = 0        # Number of drifts detected
>>> w_ddm = 0        # Number of warnings
>>> TP_ddm = []      # True-positive drift points
>>> FP_ddm = []      # False-positive drift points
>>> RT_ddm = []      # Running times
>>> mem_ddm = []     # Memory usage
>>> retrain = False
>>> grace_end = n_global
>>> detect_end = n_global
>>> pred_grace_ht = []
>>> pred_grace_ht_p = []
>>> ht_p = None      # Candidate tree trained after a detection
>>> ML_accuracy = 0
>>> acc_x, acc_y = [], []
>>> drift_x, drift_y = [], []
>>>
>>> ddm = AutoDDM(tolerance=tolerance)
>>> h = hpy()
>>> while elec_stream.has_more_samples():
>>>     n_global += 1
>>>     X_test, y_test = elec_stream.next_sample()
>>>     y_predict = ht.predict(X_test)
>>>     ddm_start_time = time.time()
>>>     ddm.add_element(y_test != y_predict, n_global)
>>>     ML_accuracy += 1 if y_test == y_predict else 0
>>>     if n_global % 100 == 0:
>>>         acc_x.append(n_global)
>>>         acc_y.append(ML_accuracy / n_global)
>>>     ddm_running_time = time.time() - ddm_start_time
>>>     RT_ddm.append(ddm_running_time)
>>>     if n_global > grace_end:
>>>         if n_global > detect_end:
>>>             if ht_p is not None:
>>>                 # Compare the old and candidate trees over the grace period
>>>                 drift_point = detect_end - 2 * grace
>>>                 print("Accuracy of ht: " + str(np.mean(pred_grace_ht)))
>>>                 print("Accuracy of ht_p: " + str(np.mean(pred_grace_ht_p)))
>>>                 if np.mean(pred_grace_ht_p) > np.mean(pred_grace_ht):
>>>                     print("TP detected at: " + str(drift_point))
>>>                     TP_ddm.append(drift_point)
>>>                     ddm.detect_TP(drift_point)
>>>                     ht = ht_p
>>>                     drift_x.append(n_global)
>>>                     drift_y.append(ML_accuracy / n_global)
>>>                 else:
>>>                     print("FP detected at: " + str(drift_point))
>>>                     FP_ddm.append(drift_point)
>>>                     ddm.detect_FP(n_global)
>>>                 ht_p = None
>>>                 pred_grace_ht = []
>>>                 pred_grace_ht_p = []
>>>             if ddm.detected_warning_zone():
>>>                 w_ddm += 1
>>>             if ddm.detected_change():
>>>                 d_ddm += 1
>>>                 ht_p = HoeffdingTreeClassifier()
>>>                 grace_end = n_global + grace
>>>                 detect_end = n_global + 2 * grace
>>>         else:
>>>             pred_grace_ht.append(y_test == y_predict)
>>>             pred_grace_ht_p.append(y_test == ht_p.predict(X_test))
>>>     if ht_p is not None:
>>>         ht_p.partial_fit(X_test, y_test)
>>>     ht.partial_fit(X_test, y_test)
>>> x = h.heap()
>>> mem_ddm.append(x.size)
>>> print("Number of drifts detected by ddm: " + str(d_ddm))
>>> print("TP by ddm: " + str(len(TP_ddm)))
>>> print("FP by ddm: " + str(len(FP_ddm)))
>>> print("Mean RT %s seconds" % np.mean(RT_ddm))
>>> print("Mean memory by ddm: " + str(np.mean(mem_ddm)))
>>> print("Accuracy by DDM: " + str(ML_accuracy / STREAM_SIZE))
>>> plt.plot(acc_x, acc_y, color='black')
>>> plt.scatter(drift_x, drift_y, edgecolors='red')
>>> plt.show()
- add_element(prediction, n)#
Add a new element to the statistics
- Parameters
prediction (int (either 0 or 1)) – This parameter indicates whether the last sample analyzed was correctly classified. 1 indicates an error (misclassification).
n (int) – This parameter indicates the current timestamp t.
Notes
After calling this method, to verify if change was detected or if the learner is in the warning zone, one should call the super methods detected_change (which returns True if concept drift was detected and False otherwise) or detected_warning_zone. Once an identified concept drift is confirmed to be a TP or an FP, one should call the method detect_TP or detect_FP respectively.
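The call protocol described above can be sketched as follows. MockDetector is a hypothetical stand-in exposing the same methods (add_element, detected_change, detect_TP, detect_FP) so the sketch is self-contained; in practice an AutoDDM instance takes its place, and the TP/FP decision comes from evaluating a candidate model over a grace period, as in the full example above.

```python
# Hypothetical stand-in detector, used only to show the call sequence;
# replace with a real AutoDDM instance in practice.
class MockDetector:
    def __init__(self):
        self._change = False
        self.tp, self.fp = [], []

    def add_element(self, prediction, n):
        # Pretend a change is flagged at timestamp 100 (illustration only)
        self._change = (n == 100)

    def detected_change(self):
        return self._change

    def detect_TP(self, n):
        self.tp.append(n)

    def detect_FP(self, n):
        self.fp.append(n)

detector = MockDetector()
pending = None  # timestamp of a detection awaiting TP/FP confirmation
for n in range(1, 201):
    error = 0  # in practice: int(y_true != y_pred)
    detector.add_element(error, n)
    if detector.detected_change():
        pending = n  # train a candidate model, then evaluate it
    # After an evaluation window, confirm or reject the detection:
    if pending is not None and n == pending + 50:
        candidate_is_better = True  # outcome of the evaluation (assumed here)
        if candidate_is_better:
            detector.detect_TP(pending)
        else:
            detector.detect_FP(n)
        pending = None
```

The key point is the order: add_element on every sample, then detected_change, and only later, once the detection is confirmed, detect_TP or detect_FP.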
- detect_FP(n)#
A false positive is detected.
- Parameters
n (int) – The timestamp at which the false positive is detected.
- detect_TP(n)#
A true concept drift is detected.
- Parameters
n (int) – The timestamp at which the true concept drift is detected.
- detected_change()#
This function returns whether concept drift was detected or not.
- Returns
Whether concept drift was detected or not.
- Return type
bool
- detected_warning_zone()#
If the change detector supports the warning zone, this function will return whether it’s inside the warning zone or not.
- Returns
Whether the change detector is in the warning zone or not.
- Return type
bool
- get_info()#
Collects and returns the information about the configuration of the estimator
- Returns
Configuration of the estimator.
- Return type
string
- get_length_estimation()#
Returns the length estimation.
- Returns
The length estimation
- Return type
int
- get_params(deep=True)#
Get parameters for this estimator.
- Parameters
deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
mapping of string to any
- reset()#
Resets the change detector parameters.
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it’s possible to update each component of a nested object.
- Return type
self
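The component__parameter naming convention can be illustrated with a minimal, hypothetical set_params implementation (mirroring the scikit-learn convention that this estimator inherits; the Detector and Pipeline classes below are illustrative only, not part of skika):

```python
# Hypothetical classes illustrating the <component>__<parameter> convention.
class Detector:
    def __init__(self, warning_level=2.0):
        self.warning_level = warning_level

class Pipeline:
    def __init__(self, detector):
        self.detector = detector

    def set_params(self, **params):
        for key, value in params.items():
            if "__" in key:
                # "detector__warning_level" -> component "detector",
                # parameter "warning_level" on that component
                component, _, name = key.partition("__")
                setattr(getattr(self, component), name, value)
            else:
                setattr(self, key, value)
        return self

pipe = Pipeline(Detector())
pipe.set_params(detector__warning_level=2.5)
```

After the call, the nested detector's warning_level is updated to 2.5 without replacing the detector object itself.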