skika.data.generate_dataset#
Functions
- generate_concept_chain: Given a list of available concepts, generate a dict with (start, id) pairs giving the start of each concept.
- generate_experiment_concept_chain: Generates a list of concepts for a datastream.
- generate_pattern_concept_chain: Given a list of available concepts, generate a dict with (start, id) pairs giving the start of each concept.
- get_concepts: Given [(gt_concept, start_i, end_i)…], return the ground truth occurring at a given index.
- saveStreamToArff: Save examples to an ARFF file.
- save_stream: Create, generate and save a data stream to CSV or ARFF.
Classes
- DatastreamOptions: Options for generating a concept.
- NPEncoder
- class skika.data.generate_dataset.DatastreamOptions(noise, num_concepts, hard_diff, easy_diff, hard_appear, easy_appear, hard_prop, examples_per_appearence, stream_type, seed, gradual)#
Options for generating a concept.
- class skika.data.generate_dataset.NPEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)#
- default(obj)#
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:

    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
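A minimal runnable version of that pattern, using only the standard-library json module (the IterEncoder name is illustrative, not part of skika):

```python
import json

class IterEncoder(json.JSONEncoder):
    """Follows the default() pattern above: fall back to list() for iterables."""
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class raise the TypeError for non-iterables
        return super().default(o)

print(IterEncoder().encode({"ids": range(3)}))  # {"ids": [0, 1, 2]}
```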
- encode(o)#
Return a JSON string representation of a Python data structure.
    >>> from json.encoder import JSONEncoder
    >>> JSONEncoder().encode({"foo": ["bar", "baz"]})
    '{"foo": ["bar", "baz"]}'
- iterencode(o, _one_shot=False)#
Encode the given object and yield each string representation as available.
For example:
    for chunk in JSONEncoder().iterencode(bigobject):
        mysocket.write(chunk)
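A small self-contained sketch of streaming with iterencode, writing to an in-memory buffer in place of a socket:

```python
import io
import json

buf = io.StringIO()  # stands in for a socket or file
# iterencode yields the JSON string piece by piece instead of all at once.
for chunk in json.JSONEncoder().iterencode({"foo": ["bar", "baz"]}):
    buf.write(chunk)

print(buf.getvalue())  # {"foo": ["bar", "baz"]}
```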
- skika.data.generate_dataset.generate_concept_chain(concept_desc, sequential)#
Given a list of available concepts, generate a dict with (start, id) pairs giving the start of each concept.
- Parameters
sequential (bool) – If true, concept transitions are determined by ID without randomness.
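To illustrate the shape of the result in the sequential case only, here is a stand-in sketch; the helper name and the fixed segment length are assumptions for illustration, not skika's implementation:

```python
def sequential_concept_chain(concept_ids, segment_length):
    """Sketch: map each segment's start index to a concept id, in ID order."""
    return {i * segment_length: cid
            for i, cid in enumerate(sorted(concept_ids))}

print(sequential_concept_chain([2, 0, 1], 500))
# {0: 0, 500: 1, 1000: 2}
```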
- skika.data.generate_dataset.generate_experiment_concept_chain(ds_options, sequential, pattern)#
Generates a list of concepts for a datastream.
- Parameters
ds_options – options for the data stream
sequential (bool) – If true, concepts are ordered sequentially rather than randomly
pattern (bool) – If true, transitions follow an underlying pattern
- Returns
concept_chain (dict<int><int>)
num_samples (int)
concept_descriptions (list<ConceptOccurence>)
- skika.data.generate_dataset.generate_pattern_concept_chain(concept_desc, sequential)#
Given a list of available concepts, generate a dict with (start, id) pairs giving the start of each concept. This is generated using a random Markov model, so specific transition patterns have unique properties.
- Parameters
sequential (bool) – If true, concept transitions are determined by ID without randomness.
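A hedged sketch of the random-Markov-model idea: each next concept is drawn from a weight row tied to the current concept. The function name, fixed segment length, and uniform random weights are assumptions for illustration, not skika's actual logic:

```python
import random

def markov_concept_chain(concept_ids, n_transitions, segment_length, seed=0):
    """Sketch: build a {start_index: concept_id} chain from a random Markov model."""
    rng = random.Random(seed)
    # One random weight row per concept: this is the transition matrix.
    weights = {c: [rng.random() for _ in concept_ids] for c in concept_ids}
    chain, current, start = {}, concept_ids[0], 0
    for _ in range(n_transitions):
        chain[start] = current
        # Draw the next concept according to the current concept's row.
        current = rng.choices(concept_ids, weights=weights[current])[0]
        start += segment_length
    return chain
```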
- skika.data.generate_dataset.get_concepts(gt_concepts, ex_index, num_samples)#
Given [(gt_concept, start_i, end_i)…], return the ground truth occurring at a given index.
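The lookup it describes can be sketched as follows (ground_truth_at is an illustrative stand-in, not the skika function):

```python
def ground_truth_at(gt_concepts, ex_index):
    """Return the concept whose [start_i, end_i) interval covers ex_index."""
    for gt_concept, start_i, end_i in gt_concepts:
        if start_i <= ex_index < end_i:
            return gt_concept
    return None  # index falls outside every segment

segments = [("A", 0, 100), ("B", 100, 250)]
print(ground_truth_at(segments, 150))  # B
```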
- skika.data.generate_dataset.saveStreamToArff(filename, stream_examples, stream_supplementary, arff)#
Save examples to an ARFF file.
- Parameters
filename (str) – filename with extension
stream_examples (list) – list of examples [[X, y]]
stream_supplementary (list) – list of supplementary info for each observation
arff (bool) – If true, save as ARFF; otherwise CSV
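A minimal sketch of writing [[X, y]] rows in ARFF form; the attribute names and the all-numeric assumption are illustrative, and the real function also handles the supplementary info and the CSV branch:

```python
import os
import tempfile

def write_arff(filename, examples, relation="stream"):
    """Sketch: write rows of [x0, ..., xn, y] as a minimal numeric ARFF file."""
    n_features = len(examples[0]) - 1
    with open(filename, "w") as f:
        f.write(f"@RELATION {relation}\n")
        for i in range(n_features):
            f.write(f"@ATTRIBUTE x{i} NUMERIC\n")
        f.write("@ATTRIBUTE class NUMERIC\n@DATA\n")
        for row in examples:
            f.write(",".join(str(v) for v in row) + "\n")

path = os.path.join(tempfile.mkdtemp(), "demo.arff")
write_arff(path, [[0.1, 0.2, 1], [0.3, 0.4, 0]])
```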
- skika.data.generate_dataset.save_stream(options, ds_options, pattern=False, arff=False)#
Create, generate and save a data stream to CSV or ARFF.
- Parameters
options (ExperimentOptions) – options for the experiment
ds_options (DatastreamOptions) – options for the stream
pattern (bool) – If true, use a pattern for concept ordering
arff (bool) – Save to ARFF