skika.hyper_parameter_tuning.trees_arf.build_pareto_knowledge_trees#
Classes
- class skika.hyper_parameter_tuning.trees_arf.build_pareto_knowledge_trees.BuildTreesKnowledge(results_file, list_perc_redund, list_models, output, verbose=False)#
- Description :
Class to build the Pareto knowledge from hyper-parameter configurations evaluated on different datasets, for tuning the number of trees in ARF. The knowledge consists of the best hyper-parameter configuration for each dataset.
The datasets are characterised by meta-features, and a knowledge base can then be built to link these features to the best configurations.
- Parameters :
- results_file: str
Path to the file containing the knowledge (results of evaluating the configurations on example streams). See the example in the hyper-param-tuning-examples repository (pareto_knowledge/ExamplesTreesKnowledge/Results10-200.csv) for the expected format
- list_perc_redund: list of float
List of percentages of redundancy used in the example streams
- list_models: list of str
List of the names of the ARF configurations tested on the streams
- output: str
Directory path where the output file is saved
- verbose: bool, default = False
Print Pareto figures if True
- Output:
CSV file containing the configurations selected for each example stream (each row = 1 stream)
Example
>>> import os
>>> names = ['ARF10','ARF30','ARF60','ARF70','ARF90','ARF100','ARF120','ARF150','ARF200']
>>> perc_redund = [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]
>>> output_dir = os.getcwd()
>>> name_file = './examples/pareto_knowledge/ExamplesTreesKnowledge/Results10-200.csv' # Available in hyper-param-tuning-examples repository
>>> pareto_build = BuildTreesKnowledge(results_file=name_file, list_perc_redund=perc_redund, list_models=names, output=output_dir, verbose=True)
>>> pareto_build.load_drift_data()
>>> pareto_build.calculate_pareto()
>>> pareto_build.best_config
- property best_config#
Retrieve the best configuration selected for each example stream. :returns: The best configuration for each example stream.
- calculate_crowding(scores)#
From https://github.com/MichaelAllen1966. Crowding is based on a vector for each individual. Each dimension is normalised between low and high. For any one dimension, all solutions are sorted in order from low to high. Crowding for chromosome x for that score is the difference between the next highest and next lowest score. The total crowding value sums the crowding over all scores
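As a concrete illustration, here is a minimal NumPy sketch of that crowding computation. The function name and the unit crowding assigned to the two end points are choices made for this example, not the library's exact code:

import numpy as np

def crowding_sketch(scores):
    # scores: (n_individuals, n_scores) array; returns one crowding value per individual.
    n_individuals, n_scores = scores.shape
    crowding = np.zeros(n_individuals)
    for dim in range(n_scores):
        col = scores[:, dim].astype(float)
        span = col.max() - col.min()
        # Normalise this dimension between 0 and 1 (constant columns contribute nothing).
        norm = (col - col.min()) / span if span > 0 else np.zeros_like(col)
        order = np.argsort(norm)
        sorted_vals = norm[order]
        dim_crowding = np.zeros(n_individuals)
        # Boundary solutions get the maximum crowding so they are always favoured.
        dim_crowding[order[0]] = 1.0
        dim_crowding[order[-1]] = 1.0
        # Interior solutions: difference between the next highest and next lowest value.
        dim_crowding[order[1:-1]] = sorted_vals[2:] - sorted_vals[:-2]
        crowding += dim_crowding
    return crowding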
- calculate_pareto()#
Function to calculate the Pareto front and detect the knee point
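The knee-detection rule is not documented here; one common heuristic, shown below purely as an assumed illustration, picks the front point with the largest perpendicular distance to the straight line joining the front's two extremes:

import numpy as np

def knee_point_sketch(front):
    # front: (n_points, 2) array of Pareto-front scores, sorted by the first objective.
    p0, p1 = front[0], front[-1]
    line = p1 - p0
    d = front - p0
    # Perpendicular distance of each front point to the extreme-to-extreme line
    # (2-D cross-product magnitude divided by the line length).
    dists = np.abs(line[0] * d[:, 1] - line[1] * d[:, 0]) / np.linalg.norm(line)
    return int(np.argmax(dists))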
- identify_pareto(scores)#
Function to identify the Pareto-optimal solutions among a set of scores
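A minimal sketch of such Pareto identification by pairwise dominance, assuming higher scores are better (the library's own orientation of the objectives may differ):

import numpy as np

def identify_pareto_sketch(scores):
    # Return indices of the non-dominated rows of scores (higher is better).
    n = scores.shape[0]
    on_front = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            # j dominates i when it is at least as good everywhere and strictly better somewhere.
            if i != j and np.all(scores[j] >= scores[i]) and np.any(scores[j] > scores[i]):
                on_front[i] = False
                break
    return np.flatnonzero(on_front)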
- load_drift_data()#
Function to load the performance data from the CSV file
- reduce_by_crowding(scores, number_to_select)#
From https://github.com/MichaelAllen1966. This function selects a number of solutions based on tournaments of crowding distances. Two members of the population are picked at random; the one with the higher crowding distance is always picked
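A sketch of this crowding-distance tournament, reusing crowding_sketch from the calculate_crowding example above. Recomputing crowding after each pick and drawing winners without replacement are assumptions of this sketch, not necessarily the library's exact behaviour:

import numpy as np

def reduce_by_crowding_sketch(scores, number_to_select, rng=None):
    # Pick number_to_select solutions via two-way tournaments on crowding distance.
    rng = rng or np.random.default_rng()
    pool = list(range(scores.shape[0]))
    picked = []
    for _ in range(number_to_select):
        # Recompute crowding distances for the solutions still in the pool.
        crowding = crowding_sketch(scores[pool])
        i, j = rng.choice(len(pool), size=2, replace=False)
        # The candidate with the higher crowding distance always wins the tournament.
        winner = i if crowding[i] >= crowding[j] else j
        picked.append(pool.pop(winner))
    return np.array(picked)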