skika.hyper_parameter_tuning.drift_detectors.build_pareto_knowledge_drifts#

Classes

BuildDriftKnowledge(results_directory, ...)

Description :

class skika.hyper_parameter_tuning.drift_detectors.build_pareto_knowledge_drifts.BuildDriftKnowledge(results_directory, names_detectors, names_streams, output, verbose=False)#

Description :

Class to build the Pareto knowledge from hyper-parameter configurations evaluated on different datasets for drift detector tuning. The knowledge consists of the best configuration of hyper-parameters for each dataset.

The datasets are characterised by meta-features, and a knowledge base can then be built to link these features to the best configurations.

Parameters :

results_directory: str: Path to the directory containing the knowledge files (results of the evaluation of the configurations on example streams)
names_detectors: list of list of str: Names of the detector configurations, grouped in one inner list per detector (see the example below)
names_streams: list of str: List of the names of the streams
n_meta_features: int, default = 15 ((severity, magnitude, interval) * (med, kurto, skew, per10, per90)): Number of meta-features extracted from the stream. NOT USED FOR THE MOMENT, as we use theoretical meta-features and not measured ones
knowledge_type: str: String indicating what knowledge is being calculated (for ARF tree tuning or drift detectors). NOT USED FOR THE MOMENT, needs further implementation to bring the two applications together
output: str: Directory path where to save output file
verbose: bool, default = False: Print pareto figures if True
Output: CSV file containing the configurations selected for each example stream (each row = 1 stream)

Example

>>> import os
>>> from skika.hyper_parameter_tuning.drift_detectors.build_pareto_knowledge_drifts import BuildDriftKnowledge
>>>
>>> # Names of the example streams
>>> names_stm = ['BernouW1ME0010','BernouW1ME005095','BernouW1ME00509','BernouW1ME0109','BernouW1ME0108','BernouW1ME0208','BernouW1ME0207','BernouW1ME0307','BernouW1ME0306','BernouW1ME0406','BernouW1ME0506','BernouW1ME05506',
...              'BernouW100ME0010','BernouW100ME005095','BernouW100ME00509','BernouW100ME0109','BernouW100ME0108','BernouW100ME0208','BernouW100ME0207','BernouW100ME0307','BernouW100ME0306','BernouW100ME0406','BernouW100ME0506','BernouW100ME05506',
...              'BernouW500ME0010','BernouW500ME005095','BernouW500ME00509','BernouW500ME0109','BernouW500ME0108','BernouW500ME0208','BernouW500ME0207','BernouW500ME0307','BernouW500ME0306','BernouW500ME0406','BernouW500ME0506','BernouW500ME05506']
>>>
>>> # Names of the detector configurations, one inner list per detector
>>> names_detect = [['PH1','PH2','PH3','PH4','PH5','PH6','PH7','PH8','PH9','PH10','PH11','PH12','PH13','PH14','PH15','PH16'],
...                   ['ADWIN1','ADWIN2','ADWIN3','ADWIN4','ADWIN5','ADWIN6','ADWIN7','ADWIN8','ADWIN9'],
...                   ['DDM1','DDM2','DDM3','DDM4','DDM5','DDM6','DDM7','DDM8','DDM9','DDM10'],
...                   ['SeqDrift21','SeqDrift22','SeqDrift23','SeqDrift24','SeqDrift25','SeqDrift26','SeqDrift27','SeqDrift28','SeqDrift29','SeqDrift210',
...                    'SeqDrift211','SeqDrift212','SeqDrift213','SeqDrift214','SeqDrift215','SeqDrift216','SeqDrift217','SeqDrift218']]
>>>
>>> output_dir = os.getcwd()  # The output CSV file is saved in the current working directory
>>> directory_path_files = 'examples/pareto_knowledge/ExampleDriftKnowledge' # Available in hyper-param-tuning-examples repository
>>>
>>> pareto_build = BuildDriftKnowledge(results_directory=directory_path_files, names_detectors=names_detect, names_streams=names_stm, output=output_dir, verbose=True)
>>> pareto_build.load_drift_data()   # Load the performance data from the csv files
>>> pareto_build.calculate_pareto()  # Calculate the Pareto front and detect the knee point
>>> pareto_build.best_config

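property best_config#: Retrieve the best configurations selected for the example streams, as shown at the end of the example above after calculate_pareto() has been called. Returns: the selected configurations.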

calculate_crowding(scores)#: From https://github.com/MichaelAllen1966. Crowding is based on a vector of scores for each individual. Each dimension is normalised between low and high. For any one dimension, all solutions are sorted in order from low to high. Crowding for chromosome x for that score is the difference between the next highest and next lowest score. The total crowding value sums the crowding over all scores.
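
The crowding calculation described above can be illustrated with a minimal NumPy sketch. This is not the skika implementation: the function name crowding_distances is hypothetical, and giving the two end points of each dimension the maximum crowding value of 1 is an assumption.

```python
import numpy as np

def crowding_distances(scores):
    """Return one crowding value per row of a (n_solutions, n_scores) array."""
    scores = np.asarray(scores, dtype=float)
    n, m = scores.shape
    # Normalise every dimension to [0, 1]; constant columns are left at 0
    low = scores.min(axis=0)
    span = scores.max(axis=0) - low
    span[span == 0] = 1.0
    normed = (scores - low) / span
    crowding = np.zeros(n)
    for j in range(m):
        order = np.argsort(normed[:, j])   # sort solutions from low to high
        col = normed[order, j]
        dist = np.ones(n)                  # assumption: end points get 1
        # Interior points: next highest score minus next lowest score
        dist[1:-1] = col[2:] - col[:-2]
        crowding[order] += dist            # sum crowding over all scores
    return crowding
```

Solutions in sparse regions of the score space get larger values, so selecting on high crowding favours a well-spread set of configurations.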

calculate_pareto()#: Function to calculate the Pareto front and detect the knee point
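
The source does not say how the knee point is detected. As an illustration only, one common heuristic for two objectives takes the point of the front that lies farthest from the straight line joining the two extreme points. The function name knee_point_index and the choice of this heuristic are assumptions, not the skika method.

```python
import numpy as np

def knee_point_index(front):
    """Row index of the knee of a (n_points, 2) Pareto front (higher is better)."""
    front = np.asarray(front, dtype=float)
    # Extreme points: best on the first objective and best on the second
    a = front[np.argmax(front[:, 0])]
    b = front[np.argmax(front[:, 1])]
    line = b - a
    norm = np.linalg.norm(line)
    if norm == 0:
        return 0  # degenerate front: both extremes coincide
    # Perpendicular distance of each point from the line through a and b
    rel = front - a
    dist = np.abs(line[0] * rel[:, 1] - line[1] * rel[:, 0]) / norm
    return int(np.argmax(dist))
```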

identify_pareto(scores)#: From https://github.com/MichaelAllen1966
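
For readers unfamiliar with Pareto fronts, the following is a minimal sketch of how non-dominated solutions can be identified. It assumes that higher is better on every score; the function name pareto_front_indices is hypothetical and the code is not taken from skika.

```python
import numpy as np

def pareto_front_indices(scores):
    """Indices of the non-dominated rows of a (n_solutions, n_scores) array."""
    scores = np.asarray(scores, dtype=float)
    keep = np.ones(scores.shape[0], dtype=bool)
    for i in range(scores.shape[0]):
        # A row dominates row i if it is at least as good on every score
        # and strictly better on at least one
        at_least_as_good = np.all(scores >= scores[i], axis=1)
        strictly_better = np.any(scores > scores[i], axis=1)
        if np.any(at_least_as_good & strictly_better):
            keep[i] = False
    return np.flatnonzero(keep)
```

Identical rows do not dominate each other, so duplicates of a non-dominated solution are all kept.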

load_drift_data()#: Function to load the performance data from the csv files

reduce_by_crowding(scores, number_to_select)#: From https://github.com/MichaelAllen1966. This function selects a number of solutions based on a tournament of crowding distances. Two members of the population are picked at random; the one with the higher crowding distance is always picked.
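
The tournament described above can be sketched as follows. This is an illustration under stated assumptions, not the skika code: the function name tournament_select is hypothetical, it takes precomputed crowding distances rather than raw scores, and it assumes that a winner is removed from the pool so it cannot be picked twice.

```python
import numpy as np

def tournament_select(crowding, number_to_select, seed=None):
    """Select indices by repeated two-way tournaments on crowding distance."""
    crowding = np.asarray(crowding, dtype=float)
    if number_to_select > len(crowding):
        raise ValueError("cannot select more solutions than are available")
    rng = np.random.default_rng(seed)
    pool = list(range(len(crowding)))
    picked = []
    while len(picked) < number_to_select:
        if len(pool) == 1:
            winner = pool[0]
        else:
            # Pick two distinct members at random; higher crowding wins
            a, b = rng.choice(pool, size=2, replace=False)
            winner = int(a) if crowding[a] >= crowding[b] else int(b)
        picked.append(winner)
        pool.remove(winner)  # assumption: winners leave the pool
    return picked
```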