skika.hyper_parameter_tuning.trees_arf.build_pareto_knowledge_trees#
Classes
- class skika.hyper_parameter_tuning.trees_arf.build_pareto_knowledge_trees.BuildTreesKnowledge(results_file, list_perc_redund, list_models, output, verbose=False)#
- Description :
Class to build the Pareto knowledge from hyper-parameter configurations evaluated on different datasets, for tuning the number of trees in ARF. The knowledge consists of the best hyper-parameter configuration for each dataset.
The datasets are characterised by meta-features, and a knowledge base can then be built to link these features to the best configurations.
- Parameters :
- results_file: str
Path to the file containing the knowledge (results of evaluating the configurations on example streams). See the example in the hyper-param-tuning-examples repository (pareto_knowledge/ExamplesTreesKnowledge/Results10-200.csv) for the expected format
- list_perc_redund: list of float
List of percentages of redundancy used in the example streams
- list_models: list of str
List of the names of the ARF configurations tested on the streams
- output: str
Directory path where the output file is saved
- verbose: bool, default = False
Print Pareto figures if True
- Output:
CSV file containing the configurations selected for each example stream (each row = 1 stream)
Example
>>> import os
>>> names = ['ARF10','ARF30','ARF60','ARF70','ARF90','ARF100','ARF120','ARF150','ARF200']
>>> perc_redund = [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]
>>> output_dir = os.getcwd()
>>> name_file = './examples/pareto_knowledge/ExamplesTreesKnowledge/Results10-200.csv' # Available in hyper-param-tuning-examples repository
>>> pareto_build = BuildTreesKnowledge(results_file=name_file, list_perc_redund=perc_redund, list_models=names, output=output_dir, verbose=True)
>>> pareto_build.load_drift_data()
>>> pareto_build.calculate_pareto()
>>> pareto_build.best_config
- property best_config#
Retrieve the best configuration selected for each example stream. :returns: The best configuration for each example stream.
- calculate_crowding(scores)#
From https://github.com/MichaelAllen1966. Crowding is based on a vector for each individual. Each dimension is normalised between low and high. For any one dimension, all solutions are sorted in order from low to high. Crowding for chromosome x for that score is the difference between the next highest and next lowest score. The total crowding value sums the crowding over all scores
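As a concrete illustration, here is a minimal NumPy sketch of that crowding computation. The function name and the unit crowding assigned to the two end points are choices made for this example, not the library's exact code:

import numpy as np

def crowding_sketch(scores):
    # scores: (n_individuals, n_scores) array; returns one crowding value per individual.
    n_individuals, n_scores = scores.shape
    crowding = np.zeros(n_individuals)
    for dim in range(n_scores):
        col = scores[:, dim].astype(float)
        span = col.max() - col.min()
        # Normalise this dimension between 0 and 1 (constant columns contribute nothing).
        norm = (col - col.min()) / span if span > 0 else np.zeros_like(col)
        order = np.argsort(norm)
        sorted_vals = norm[order]
        dim_crowding = np.zeros(n_individuals)
        # Boundary solutions get the maximum crowding so they are always favoured.
        dim_crowding[order[0]] = 1.0
        dim_crowding[order[-1]] = 1.0
        # Interior solutions: difference between the next highest and next lowest value.
        dim_crowding[order[1:-1]] = sorted_vals[2:] - sorted_vals[:-2]
        crowding += dim_crowding
    return crowding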
- calculate_pareto()#
Function to calculate the Pareto front and detect the knee point
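The knee-detection rule is not documented here; one common heuristic, shown below purely as an assumed illustration, picks the front point with the largest perpendicular distance to the straight line joining the front's two extremes:

import numpy as np

def knee_point_sketch(front):
    # front: (n_points, 2) array of Pareto-front scores, sorted by the first objective.
    p0, p1 = front[0], front[-1]
    line = p1 - p0
    d = front - p0
    # Perpendicular distance of each front point to the extreme-to-extreme line
    # (2-D cross-product magnitude divided by the line length).
    dists = np.abs(line[0] * d[:, 1] - line[1] * d[:, 0]) / np.linalg.norm(line)
    return int(np.argmax(dists))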
- identify_pareto(scores)#
Function to identify the Pareto-optimal solutions among a set of scores
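A minimal sketch of such Pareto identification by pairwise dominance, assuming higher scores are better (the library's own orientation of the objectives may differ):

import numpy as np

def identify_pareto_sketch(scores):
    # Return indices of the non-dominated rows of scores (higher is better).
    n = scores.shape[0]
    on_front = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            # j dominates i when it is at least as good everywhere and strictly better somewhere.
            if i != j and np.all(scores[j] >= scores[i]) and np.any(scores[j] > scores[i]):
                on_front[i] = False
                break
    return np.flatnonzero(on_front)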
- load_drift_data()#
Function to load the performance data from the CSV file
- reduce_by_crowding(scores, number_to_select)#
From https://github.com/MichaelAllen1966. This function selects a number of solutions based on tournaments of crowding distances. Two members of the population are picked at random; the one with the higher crowding distance is always picked
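A sketch of this crowding-distance tournament, reusing crowding_sketch from the calculate_crowding example above. Recomputing crowding after each pick and drawing winners without replacement are assumptions of this sketch, not necessarily the library's exact behaviour:

import numpy as np

def reduce_by_crowding_sketch(scores, number_to_select, rng=None):
    # Pick number_to_select solutions via two-way tournaments on crowding distance.
    rng = rng or np.random.default_rng()
    pool = list(range(scores.shape[0]))
    picked = []
    for _ in range(number_to_select):
        # Recompute crowding distances for the solutions still in the pool.
        crowding = crowding_sketch(scores[pool])
        i, j = rng.choice(len(pool), size=2, replace=False)
        # The candidate with the higher crowding distance always wins the tournament.
        winner = i if crowding[i] >= crowding[j] else j
        picked.append(pool.pop(winner))
    return np.array(picked)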