skika.hyper_parameter_tuning.trees_arf.build_pareto_knowledge_trees#

Classes

BuildTreesKnowledge(results_file, ...[, verbose])

Build the Pareto knowledge for tuning the number of trees in ARF.

class skika.hyper_parameter_tuning.trees_arf.build_pareto_knowledge_trees.BuildTreesKnowledge(results_file, list_perc_redund, list_models, output, verbose=False)#
Description :

Class to build the Pareto knowledge from hyper-parameter configurations evaluated on different datasets, for tuning the number of trees in ARF. The knowledge consists of the best hyper-parameter configuration for each dataset.

The datasets are characterised by meta-features, and a knowledge base can then be built to link these features to the best configurations.

Parameters :
results_file: str

Path to the file containing the knowledge files (results of the evaluation of the configurations on example streams). See the example in the hyper-param-tuning-examples repository (pareto_knowledge/ExamplesTreesKnowledge/Results10-200.csv) for the expected file format.

list_perc_redund: list of float

List of percentages of redundancy used in the example streams

list_models: list of str

List of the names of the ARF configurations tested on the streams

output: str

Directory path where the output file is saved

verbose: bool, default = False

Plot the Pareto figures if True

Output:

CSV file containing the configurations selected for each example stream (each row = one stream)
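Once written, the file can be inspected like any CSV, e.g. with pandas. The file name below is a hypothetical placeholder; the actual name is chosen by the class inside the output directory.

>>> import pandas as pd
>>> knowledge = pd.read_csv('BestConfigurations.csv')  # hypothetical file name
>>> knowledge.head()  # one row per example stream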

Example

>>> import os
>>> names = ['ARF10', 'ARF30', 'ARF60', 'ARF70', 'ARF90', 'ARF100', 'ARF120', 'ARF150', 'ARF200']
>>> perc_redund = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
>>> output_dir = os.getcwd()
>>> name_file = '/examples/pareto_knowledge/ExamplesTreesKnowledge/Results10-200.csv'  # Available in the hyper-param-tuning-examples repository
>>> pareto_build = BuildTreesKnowledge(results_file=name_file, list_perc_redund=perc_redund, list_models=names, output=output_dir, verbose=True)
>>> pareto_build.load_drift_data()
>>> pareto_build.calculate_pareto()
>>> pareto_build.best_config
property best_config#

Retrieve the best configurations selected for the example streams

calculate_crowding(scores)#

From https://github.com/MichaelAllen1966. Crowding is based on a vector of crowding values, one per individual. Each dimension is normalised between its low and high values. For any one dimension, all solutions are sorted in order from low to high; the crowding of chromosome x for that score is the difference between the next-highest and next-lowest scores. The total crowding value sums the crowding over all scores.
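A minimal sketch of that computation, assuming scores is a 2-D NumPy array with one row per solution and one column per objective (following the MichaelAllen1966 reference; not necessarily the exact code used here):

import numpy as np

def calculate_crowding(scores):
    # One crowding value per solution and per objective
    population_size, number_of_scores = scores.shape
    crowding_matrix = np.zeros((population_size, number_of_scores))
    # Normalise each objective to [0, 1] (assumes each objective varies)
    normed = (scores - scores.min(0)) / np.ptp(scores, axis=0)
    for col in range(number_of_scores):
        crowding = np.zeros(population_size)
        crowding[0] = 1   # end points always get the maximum crowding
        crowding[-1] = 1
        sort_idx = np.argsort(normed[:, col])
        sorted_scores = normed[sort_idx, col]
        # Crowding of an interior point: gap between its two neighbours
        crowding[1:-1] = sorted_scores[2:] - sorted_scores[:-2]
        # Map the values back to the original (unsorted) order
        crowding_matrix[sort_idx, col] = crowding
    # Total crowding sums the per-objective values
    return np.sum(crowding_matrix, axis=1)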

calculate_pareto()#

Function to calculate the Pareto front and detect the knee point
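One common knee-point heuristic, shown purely as an illustration (the rule actually used by calculate_pareto may differ), picks the front point farthest from the chord joining the two extremes of the front:

import numpy as np

def knee_point(front):
    # front: 2-D array of Pareto-front scores, one row per point, two objectives
    order = np.argsort(front[:, 0])
    pts = front[order]
    p1, p2 = pts[0], pts[-1]  # extremes of the front
    chord = p2 - p1
    # Perpendicular distance of every point to the chord p1-p2
    dists = np.abs(chord[0] * (pts[:, 1] - p1[1])
                   - chord[1] * (pts[:, 0] - p1[0])) / np.linalg.norm(chord)
    return order[np.argmax(dists)]  # index of the knee in the original array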

identify_pareto(scores)#

From https://github.com/MichaelAllen1966. Identify the indices of the Pareto-optimal (non-dominated) solutions among the given scores.
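The referenced repository implements this as a pairwise dominance check; a sketch, assuming every objective is to be maximised:

import numpy as np

def identify_pareto(scores):
    # Return the indices of the non-dominated rows of `scores`
    population_size = scores.shape[0]
    on_front = np.ones(population_size, dtype=bool)
    for i in range(population_size):
        for j in range(population_size):
            # j dominates i: at least as good on every objective,
            # strictly better on at least one
            if i != j and np.all(scores[j] >= scores[i]) and np.any(scores[j] > scores[i]):
                on_front[i] = False
                break
    return np.where(on_front)[0]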

load_drift_data()#

Function to load the performance data from the CSV file

reduce_by_crowding(scores, number_to_select)#

From https://github.com/MichaelAllen1966. This function selects a number of solutions based on tournaments of crowding distances. Two members of the population are picked at random; the one with the higher crowding distance is always picked.
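A sketch of that tournament, reusing the calculate_crowding sketch above (again following the MichaelAllen1966 reference rather than the exact code here):

import random
import numpy as np

def reduce_by_crowding(scores, number_to_select):
    population_ids = np.arange(scores.shape[0])
    crowding = calculate_crowding(scores)  # as sketched above
    picked = []
    for _ in range(number_to_select):
        # Tournament: draw two candidates at random ...
        a = random.randrange(len(population_ids))
        b = random.randrange(len(population_ids))
        # ... and keep the one with the larger crowding distance
        winner = a if crowding[a] >= crowding[b] else b
        picked.append(population_ids[winner])
        # Remove the winner so it cannot be selected twice
        population_ids = np.delete(population_ids, winner)
        crowding = np.delete(crowding, winner)
    return np.asarray(picked)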