skika.hyper_parameter_tuning.trees_arf.evaluate_prequential_and_adapt
Classes
Prequential evaluation method with adaptive hyper-parameter tuning of the number of trees in ARF.
- class skika.hyper_parameter_tuning.trees_arf.evaluate_prequential_and_adapt.EvaluatePrequentialAndAdaptTreesARF(n_wait=200, max_samples=100000, batch_size=1, pretrain_size=200, max_time=inf, metrics=None, output_file=None, show_plot=False, restart_stream=True, data_points_for_classification=False, metaKB=None)
Prequential evaluation method that adaptively tunes the number of trees in an Adaptive Random Forest (ARF).
- Description :
This code is based on the scikit-multiflow evaluate_prequential implementation. Copyright (c) 2017, scikit-multiflow. All rights reserved. We modified it to include adaptive tuning of hyper-parameters.
scikit-multiflow description: An alternative to the traditional holdout evaluation, inherited from batch setting problems.
The prequential evaluation is designed specifically for stream settings, in the sense that each sample serves two purposes, and that samples are analysed sequentially, in order of arrival, and become immediately inaccessible.
This method uses each sample first to test the model (that is, to make a prediction) and then to train it (partial fit). This way the model is always tested on samples that it hasn’t seen yet.
Additional scikit-ika features: This method implements an adaptive tuning process to adapt the number of trees in an Adaptive Random Forest, depending on the number of redundant features in the stream.
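The test-then-train loop described above can be sketched in plain Python. This is only an illustration of the principle, not the evaluator's implementation: `MajorityClassModel` and `prequential_accuracy` are hypothetical names, and the toy model simply predicts the most frequent label seen so far.

```python
from collections import Counter


class MajorityClassModel:
    """Hypothetical toy learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Before any training there is no majority class, so return None.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def partial_fit(self, x, y):
        # Incremental update with a single labelled sample.
        self.counts[y] += 1


def prequential_accuracy(stream, model, pretrain_size=0):
    """Test-then-train: predict on each sample, then train on that same sample."""
    correct = 0
    tested = 0
    for i, (x, y) in enumerate(stream):
        if i >= pretrain_size:
            # Test first: the model has not seen this sample's label yet.
            correct += int(model.predict(x) == y)
            tested += 1
        # Then train on the sample.
        model.partial_fit(x, y)
    return correct / tested if tested else 0.0


# Five (features, label) pairs; the first one is used only for the warm start.
stream = [([0.1], 1), ([0.4], 1), ([0.3], 0), ([0.9], 1), ([0.5], 1)]
accuracy = prequential_accuracy(stream, MajorityClassModel(), pretrain_size=1)
print(accuracy)
```

The first `pretrain_size` samples are used for training only, which mirrors the ‘warm’ start controlled by the pretrain_size parameter.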
- Parameters :
- n_wait:int (Default: 200)
The number of samples to process between each test. Also defines when to update the plot if show_plot=True. Note that setting n_wait too small can significantly slow the evaluation process.
- max_samples:int (Default: 100000)
The maximum number of samples to process during the evaluation.
- batch_size:int (Default: 1)
The number of samples to pass at a time to the model(s).
- pretrain_size:int (Default: 200)
The number of samples to use to train the model before starting the evaluation. Used to enforce a ‘warm’ start.
- max_time:float (Default: float(“inf”))
The maximum duration of the simulation (in seconds).
- metrics:list, optional (Default: [‘accuracy’, ‘kappa’])
The list of metrics to track during the evaluation. Also defines the metrics that will be displayed in plots and/or logged into the output file. Valid options are:
Classification: ‘accuracy’, ‘kappa’, ‘kappa_t’, ‘kappa_m’, ‘true_vs_predicted’, ‘precision’, ‘recall’, ‘f1’, ‘gmean’
Multi-target Classification: ‘hamming_score’, ‘hamming_loss’, ‘exact_match’, ‘j_index’
Regression: ‘mean_square_error’, ‘mean_absolute_error’, ‘true_vs_predicted’
Multi-target Regression: ‘average_mean_squared_error’, ‘average_mean_absolute_error’, ‘average_root_mean_square_error’
Experimental: ‘running_time’, ‘model_size’, ‘ram_hours’
- output_file: string, optional (Default: None)
File name to save the summary of the evaluation.
- show_plot: bool (Default: False)
If True, a plot will show the progress of the evaluation. Warning: Plotting can slow down the evaluation process.
- restart_stream: bool, optional (default: True)
If True, the stream is restarted once the evaluation is complete.
- data_points_for_classification: bool (Default: False)
If True, the visualization used is a cloud of data points (only works for classification) and default performance metrics are ignored. If specific metrics are required, then they must be explicitly set using the metrics attribute.
- metaKB: dict (Default: None)
The meta model linking the meta-features to the hyper-parameter configuration. It is a dictionary that maps each percentage of redundant features to the number of trees to use. This model is built by running multiple ARF configurations (with different numbers of trees) on multiple streams with different percentages of redundant features, and using the build_pareto_knowledge_trees module to choose the number of trees. E.g.: dictMeta = {0.0:60 ,0.1:30, 0.2:30, 0.3:30, 0.4:60, 0.5:70, 0.6:60, 0.7:30, 0.8:30, 0.9:30}. If metaKB is None, the class performs only the prequential evaluation.
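As an illustration of how such a dictionary could be queried, the sketch below picks the entry whose key is closest to a measured proportion of redundant features. The helper `choose_n_trees` and the nearest-key rule are assumptions made for this example; the evaluator may round or bin the measured value differently.

```python
def choose_n_trees(meta_kb, redundancy_ratio):
    """Hypothetical helper: return the tree count stored under the nearest key."""
    if not meta_kb:
        raise ValueError("meta_kb must contain at least one entry")
    # Assumption: the closest key wins; the library may use another rule.
    nearest_key = min(meta_kb, key=lambda k: abs(k - redundancy_ratio))
    return meta_kb[nearest_key]


# Same meta knowledge as in the parameter description above.
dictMeta = {0.0: 60, 0.1: 30, 0.2: 30, 0.3: 30, 0.4: 60,
            0.5: 70, 0.6: 60, 0.7: 30, 0.8: 30, 0.9: 30}

n_trees_low = choose_n_trees(dictMeta, 0.04)  # closest key is 0.0
n_trees_mid = choose_n_trees(dictMeta, 0.52)  # closest key is 0.5
print(n_trees_low, n_trees_mid)
```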
- Notes
If the adaptive hyper-parameter tuning is not used, this evaluator can process a single learner to track its performance; or multiple learners at a time, to compare different models on the same stream.
If the adaptive hyper-parameter tuning is used, this evaluator can process only a single learner at the moment.
This class can only be used with ARF as the classifier. Further development is needed to generalise it to other tasks and classifiers.
The metric ‘true_vs_predicted’ is intended to be informative only. It corresponds to evaluations at a specific moment which might not represent the actual learner performance across all instances.
Example
>>> # NOTE: the AdaptiveRandomForest and StreamGeneratorRedund import paths are assumptions;
>>> # check them against the installed scikit-multiflow and scikit-ika versions.
>>> from skmultiflow.meta import AdaptiveRandomForest
>>> from skika.data.random_rbf_generator_redund import RandomRBFGeneratorRedund
>>> from skika.data.stream_generator_redundancy_drift import StreamGeneratorRedund
>>> from skika.hyper_parameter_tuning.trees_arf.evaluate_prequential_and_adapt import EvaluatePrequentialAndAdaptTreesARF
>>>
>>> # Set the stream
>>> stream = StreamGeneratorRedund(base_stream=RandomRBFGeneratorRedund(n_classes=2, n_features=30, n_centroids=50, noise_percentage=0.0),
...                                random_state=None, n_drifts=100, n_instances=100000)
>>> stream.prepare_for_use()
>>>
>>> # Set the model
>>> arf = AdaptiveRandomForest(n_estimators=10)
>>>
>>> # Set the meta knowledge: {percentage of redundant features: best number of trees}
>>> dictMeta = {0.0:60, 0.1:30, 0.2:30, 0.3:30, 0.4:60, 0.5:70, 0.6:60, 0.7:30, 0.8:30, 0.9:30}
>>>
>>> # Set the evaluator; pass metaKB to enable the adaptive tuning
>>> evaluator = EvaluatePrequentialAndAdaptTreesARF(metrics=['accuracy', 'kappa', 'running_time', 'ram_hours'],
...                                                 max_samples=100000,
...                                                 n_wait=500,
...                                                 pretrain_size=200,
...                                                 show_plot=True,
...                                                 metaKB=dictMeta)
>>>
>>> # Run evaluation with adaptive tuning
>>> evaluator.evaluate(stream=stream, model=arf, model_names=['ARF'])
- evaluate(stream, model, model_names=None)
Evaluates a model on samples from a stream and adapts the tuning.
- Parameters
stream (Stream) – The stream from which to draw the samples.
model (skmultiflow.core.BaseStreamModel or sklearn.base.BaseEstimator or list) – The model or list of models to evaluate. NOTE : Only ARF is usable with this current version of the adaptive tuning.
model_names (list, optional (Default=None)) – A list with the names of the models.
- Returns
The trained model(s).
- Return type
StreamModel or list
- get_current_measurements(model_idx=None)
Get current measurements from the evaluation (measured on last n_wait samples).
- Parameters
model_idx (int, optional (Default=None)) – Indicates the index of the model as defined in evaluate(model). If None, returns a list with the measurements for each model.
- Returns
measurements or list
Current measurements. If model_idx is None, returns a list with the measurements for each model.
- Raises
IndexError – If the index is invalid.
- get_info()
Collects and returns the information about the configuration of the estimator.
- Returns
Configuration of the estimator.
- Return type
string
- get_mean_measurements(model_idx=None)
Get mean measurements from the evaluation.
- Parameters
model_idx (int, optional (Default=None)) – Indicates the index of the model as defined in evaluate(model). If None, returns a list with the measurements for each model.
- Returns
measurements or list
Mean measurements. If model_idx is None, returns a list with the measurements for each model.
- Raises
IndexError – If the index is invalid.
- get_measurements(model_idx=None)
Get measurements from the evaluation.
- Parameters
model_idx (int, optional (Default=None)) – Indicates the index of the model as defined in evaluate(model). If None, returns a list with the measurements for each model.
- Returns
tuple (mean, current)
Mean and current measurements. If model_idx is None, each member of the tuple is a list with the measurements for each model.
- Raises
IndexError – If the index is invalid.
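The difference between mean and current measurements can be illustrated with a small accuracy tracker. This is a sketch under stated assumptions, not the evaluator's code: `RunningAccuracy` is a hypothetical class, and "current" is modelled here as a sliding window over the last n_wait samples.

```python
from collections import deque


class RunningAccuracy:
    """Hypothetical tracker: overall accuracy plus accuracy on the last n_wait samples."""

    def __init__(self, n_wait):
        # deque(maxlen=...) silently drops the oldest entry once the window is full.
        self.window = deque(maxlen=n_wait)
        self.correct = 0
        self.total = 0

    def add(self, y_true, y_pred):
        hit = int(y_true == y_pred)
        self.window.append(hit)
        self.correct += hit
        self.total += 1

    def mean(self):
        # Accuracy over every sample seen so far.
        return self.correct / self.total if self.total else 0.0

    def current(self):
        # Accuracy over the most recent window only.
        return sum(self.window) / len(self.window) if self.window else 0.0


tracker = RunningAccuracy(n_wait=2)
for y_true, y_pred in [(1, 1), (0, 1), (1, 1), (1, 1)]:
    tracker.add(y_true, y_pred)

mean_acc, current_acc = tracker.mean(), tracker.current()
print(mean_acc, current_acc)
```

Here one early mistake lowers the mean accuracy, while the current accuracy only reflects the last two predictions.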
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
mapping of string to any
- partial_fit(X, y, classes=None, sample_weight=None)
Partially fit all the models on the given data.
- Parameters
X (Numpy.ndarray of shape (n_samples, n_features)) – The data upon which the algorithm will create its model.
y (Array-like) – An array-like containing the classification labels / target values for all samples in X.
classes (list) – Stores all the classes that may be encountered during the classification task. Not used for regressors.
sample_weight (Array-like) – Samples weight. If not provided, uniform weights are assumed.
- Returns
self
- Return type
EvaluatePrequential
- predict(X)
Predicts with the estimator(s) being evaluated.
- Parameters
X (Numpy.ndarray of shape (n_samples, n_features)) – All the samples we want to predict the label for.
- Returns
Model(s) predictions
- Return type
list of numpy.ndarray
- reset()
Resets the estimator to its initial state.
- Return type
self
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Return type
self
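The `<component>__<parameter>` convention can be illustrated with a minimal, self-contained sketch. The `Component` and `Wrapper` classes below are hypothetical stand-ins written for this example; they only mimic how a double-underscore key is routed to a nested object and are not the library's implementation.

```python
class Component:
    """Hypothetical nested object with one tunable parameter."""

    def __init__(self, depth=3):
        self.depth = depth

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class Wrapper:
    """Hypothetical outer object holding a nested component."""

    def __init__(self, tree, n_wait=200):
        self.tree = tree
        self.n_wait = n_wait

    def set_params(self, **params):
        for key, value in params.items():
            # Split on the first double underscore: "<component>__<parameter>".
            name, sep, sub_key = key.partition("__")
            if sep:
                # Forward the remainder of the key to the nested component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


# One call updates both a top-level parameter and a nested one.
wrapper = Wrapper(Component()).set_params(n_wait=500, tree__depth=8)
print(wrapper.n_wait, wrapper.tree.depth)
```

Returning self from set_params, as the method documented above does, is what allows the call to be chained onto the constructor.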