skika.data.reccurring_concept_stream
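Classes
- AGRAWALConcept – An AGRAWAL concept.
- Concept – Base concept class.
- ConceptOccurence – Represents a concept in a stream.
- RBFConcept – An RBF concept.
- RCStreamType – An enumeration of the supported stream types.
- RecurringConceptGradualStream – A stream featuring gradual drift between given concepts.
- RecurringConceptStream – A stream featuring abrupt drift between given concepts.
- SEAConcept – A SEA concept.
- SINEConcept – A SINE concept.
- STAGGERConcept – A STAGGER concept.
- TREEConcept – A TREE concept.
- WindSimConcept – A WINDSIM concept.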
- class skika.data.reccurring_concept_stream.AGRAWALConcept(concept_id=0, seed=None, noise=0)
An AGRAWAL concept.
- Parameters
concept_id (int) – The ID of the AGRAWAL generating function to use. Should be within 0-9.
seed (int) – The seed used by the random number generator.
noise (float) – The probability that noise will happen in the generation. At each new sample generated, the sample will be perturbed by the amount of perturbation. Values range from 0.0 to 1.0.
- class skika.data.reccurring_concept_stream.Concept(stream)
Base concept class. A ‘Concept’ can be thought of as a relationship between features and label. Here we model different concepts using streams produced by different generating functions, i.e. each concept is a given distribution of data, and each observation is drawn from one concept.
- Parameters
stream (datastream) – The stream the concept will draw observations from.
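- class skika.data.reccurring_concept_stream.ConceptOccurence(id, difficulty, noise, appearences, examples_per_appearence)
Represents a concept in a stream: its ID, difficulty and noise level, how many times it appears (appearences), and how many observations each appearance lasts (examples_per_appearence).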
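The meaning of appearences and examples_per_appearence is inferred from their names and from the examples on this page: concept 0 appears twice and concept 1 once, 5000 observations each, which adds up to the num_samples = 15000 used there. A small arithmetic sketch under that assumption; Occurrence and total_observations are hypothetical names, not skika API:

```python
from collections import namedtuple

# Hypothetical stand-in with the same field names as ConceptOccurence;
# it is not skika's class, just a container for this arithmetic.
Occurrence = namedtuple(
    "Occurrence",
    ["id", "difficulty", "noise", "appearences", "examples_per_appearence"],
)


def total_observations(desc):
    """Observations implied by desc: sum of appearences * examples_per_appearence."""
    return sum(o.appearences * o.examples_per_appearence for o in desc.values())


# Same values as the STAGGER examples on this page.
desc = {
    0: Occurrence(id=0, difficulty=2, noise=0, appearences=2, examples_per_appearence=5000),
    1: Occurrence(id=1, difficulty=3, noise=0, appearences=1, examples_per_appearence=5000),
}
print(total_observations(desc))  # 15000, matching num_samples in those examples
```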
- class skika.data.reccurring_concept_stream.RBFConcept(concept_id=0, seed=None, noise=0, desc=None)
An RBF concept.
- Parameters
concept_id (int) – The ID of the concept.
seed (int) – The seed used by the random number generator.
noise (float) – The probability that noise will happen in the generation. At each new sample generated, the sample will be perturbed by the amount of perturbation. Values range from 0.0 to 1.0.
desc (ConceptOccurence) – A ConceptOccurence object describing this specific concept.
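- class skika.data.reccurring_concept_stream.RCStreamType(value)
An enumeration of the supported stream types (for example, RCStreamType.STAGGER), passed as the rctype argument of the stream classes.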
- class skika.data.reccurring_concept_stream.RecurringConceptGradualStream(rctype, num_samples, noise, concept_chain, window_size=1000, seed=None, desc=None, boost_first_occurance=True)
A stream featuring gradual drift between given concepts. Uses the scikit-multiflow concept drift stream to blend concepts over a window.
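As we understand scikit-multiflow's ConceptDriftStream (check its source before relying on this), each observation during a transition is drawn from the new concept with a probability that follows a sigmoid centred on the drift position, and the width controls how gradual the change is. The sketch below illustrates that blending with hypothetical helpers; width plays the role of window_size here and is not taken from skika's code:

```python
import math
import random


def drift_probability(t, position, width):
    """Sigmoid probability that observation t comes from the new concept.

    Equals 0.5 at t == position; width sets how spread out the transition is.
    """
    z = 4.0 * (t - position) / width
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def blended_source(t, position, width, rng):
    """Pick which concept ('old' or 'new') observation t is drawn from."""
    return "new" if rng.random() < drift_probability(t, position, width) else "old"


# A drift centred at observation 5000, spread over roughly 1000 observations.
rng = random.Random(42)
for t in (4000, 4500, 5000, 5500, 6000):
    print(t, round(drift_probability(t, 5000, 1000), 3), blended_source(t, 5000, 1000, rng))
```

Far before the position almost every observation comes from the old concept, far after it almost every observation comes from the new one, and in between the two are mixed.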
- Parameters
rctype (RCStreamType) – An enum describing the type of stream.
num_samples (int) – The number of samples in the stream.
noise (float) – The probability that noise will happen in the generation. At each new sample generated, the sample will be perturbed by the amount of perturbation. Values range from 0.0 to 1.0.
concept_chain (list<int> or dict) – Either a dict mapping an observation number to the ID of the concept beginning at that observation, or a list of concept IDs. If a list is given, a dict is generated in which each concept lasts for the length given in desc, or for a uniform length.
window_size (int) – The number of observations each gradual drift is spread over.
seed (int) – Random seed.
desc (dict<int><ConceptOccurence>) – A map from concept ID to the ConceptOccurence describing that concept.
boost_first_occurance (bool) – If True, double the number of observations drawn from the first occurrence of each concept. This allows a better model to be built and stored.
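When concept_chain is a list, the description above says a dict is generated from it. A minimal sketch of that expansion, assuming segments are laid out back to back from observation 0; chain_from_list and its lengths argument (a stand-in for the per-concept length taken from desc) are hypothetical:

```python
def chain_from_list(concept_ids, num_samples, lengths=None):
    """Expand a list of concept IDs into an {observation: concept_id} dict.

    lengths maps concept ID -> observations per occurrence (hypothetical
    stand-in for the length in desc); without it, every segment gets an
    equal share of num_samples.
    """
    if not concept_ids:
        raise ValueError("concept_ids must not be empty")
    uniform = num_samples // len(concept_ids)
    chain, start = {}, 0
    for cid in concept_ids:
        chain[start] = cid  # concept cid begins at observation `start`
        start += lengths[cid] if lengths else uniform
    return chain


print(chain_from_list([0, 1, 0], 15000))                               # {0: 0, 5000: 1, 10000: 0}
print(chain_from_list([0, 1, 0], 15000, lengths={0: 3000, 1: 6000}))  # {0: 0, 3000: 1, 9000: 0}
```

The first call reproduces the concept_chain dict used in the examples below.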
Examples
>>> # An example stream using the STAGGER Generator.
>>> # Starts using generating function 0, then at
>>> # observation 5000 transitions to generating function
>>> # 1 then at 10000 transitions back to 0.
>>> from skika.data.reccurring_concept_stream import RCStreamType, RecurringConceptGradualStream, ConceptOccurence
>>> concept_chain = {0: 0, 5000: 1, 10000: 0}
>>> num_samples = 15000
>>> # init concepts
>>> concept_0 = ConceptOccurence(id=0, difficulty=2, noise=0, appearences=2, examples_per_appearence=5000)
>>> concept_1 = ConceptOccurence(id=1, difficulty=3, noise=0, appearences=1, examples_per_appearence=5000)
>>> desc = {0: concept_0, 1: concept_1}
>>> datastream = RecurringConceptGradualStream(
...     rctype=RCStreamType.STAGGER,
...     num_samples=num_samples,
...     noise=0,
...     concept_chain=concept_chain,
...     window_size=1000,
...     seed=42,
...     desc=desc,
...     boost_first_occurance=False)
>>> datastream.has_more_samples()
True
>>> datastream.get_drift_info()
{0: 0, 5000: 1, 10000: 0}
>>> datastream.n_remaining_samples()
15000
>>> datastream.get_stream_info()
{0: 0, 5000: 1, 10000: 0}
0 - 5000: STAGGERGenerator(balance_classes=False, classification_function=0, random_state=42)
5000 - 10000: STAGGERGenerator(balance_classes=False, classification_function=1, random_state=43)
10000 - 15000: STAGGERGenerator(balance_classes=False, classification_function=0, random_state=42)
>>> datastream.get_moa_stream_info()
{0: 0, 5000: 1, 10000: 0}
'(ConceptDriftStream -s (generators.STAGGERGenerator -f 1 -i 42) -d (ConceptDriftStream -s (generators.STAGGERGenerator -f 2 -i 43) -d (generators.STAGGERGenerator -f 1 -i 42) -p 5000 -w 1) -p 5000 -w 1)'
>>> datastream.get_supplementary_info()
>>> datastream.next_sample()
(array([[2., 0., 2.]]), array([0]))
>>> datastream.n_remaining_samples()
14999
>>> datastream.next_sample()
(array([[2., 0., 0.]]), array([0]))
>>> datastream.n_remaining_samples()
14998
- get_moa_stream_info()
Returns a string to run the corresponding stream in MOA.
- get_moa_stream_string(concepts=None)
Returns a string to run the corresponding stream in MOA.
- get_stream_info()
Prints information about the concepts included in the stream.
- get_supplementary_info()
Returns supplementary info about each concept.
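The MOA string in the examples nests one MOA ConceptDriftStream per concept change: -s is the current concept, -d is the rest of the stream, -p is the change position and -w the drift width. Judging from the example output, the inner -p is counted from the start of its nested stream (5000, not 10000). The hypothetical helper below reproduces that nesting pattern; it is a sketch, not skika's get_moa_stream_string:

```python
def moa_drift_string(generators, lengths, width=1):
    """Nest MOA ConceptDriftStream tasks, one level per concept change.

    generators: MOA generator specs, one per segment, in stream order.
    lengths: observations per segment; only the first len(generators) - 1
    entries are used, because the last segment runs to the end.
    """
    if not generators:
        raise ValueError("need at least one generator")
    if len(generators) == 1:
        return f"({generators[0]})"
    rest = moa_drift_string(generators[1:], lengths[1:], width)
    # -s: current concept, -d: remainder of the stream,
    # -p: change position relative to this nested stream, -w: drift width.
    return f"(ConceptDriftStream -s ({generators[0]}) -d {rest} -p {lengths[0]} -w {width})"


gens = [
    "generators.STAGGERGenerator -f 1 -i 42",
    "generators.STAGGERGenerator -f 2 -i 43",
    "generators.STAGGERGenerator -f 1 -i 42",
]
print(moa_drift_string(gens, [5000, 5000, 5000]))
```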
- class skika.data.reccurring_concept_stream.RecurringConceptStream(rctype, num_samples, noise, concept_chain, seed=None, desc=None, boost_first_occurance=True)
A stream featuring abrupt drift between given concepts.
- Parameters
rctype (RCStreamType) – An enum describing the type of stream.
num_samples (int) – The number of samples in the stream.
noise (float) – The probability that noise will happen in the generation. At each new sample generated, the sample will be perturbed by the amount of perturbation. Values range from 0.0 to 1.0.
concept_chain (list<int> or dict) – Either a dict mapping an observation number to the ID of the concept beginning at that observation, or a list of concept IDs. If a list is given, a dict is generated in which each concept lasts for the length given in desc, or for a uniform length.
seed (int) – Random seed.
desc (dict<int><ConceptOccurence>) – A map from concept ID to the ConceptOccurence describing that concept.
boost_first_occurance (bool) – If True, double the number of observations drawn from the first occurrence of each concept. This allows a better model to be built and stored.
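Under abrupt drift, the concept in force at any observation is the one with the latest start in concept_chain that is not after that observation. A sketch of that lookup with the standard bisect module, assuming 0-based observation indices; active_concept is a hypothetical helper, not part of skika:

```python
import bisect


def active_concept(concept_chain, t):
    """Return the ID of the concept generating observation t under abrupt drift."""
    starts = sorted(concept_chain)
    # Index of the last start <= t.
    i = bisect.bisect_right(starts, t) - 1
    if i < 0:
        raise ValueError(f"no concept starts at or before observation {t}")
    return concept_chain[starts[i]]


chain = {0: 0, 5000: 1, 10000: 0}
print([active_concept(chain, t) for t in (0, 4999, 5000, 9999, 10000, 14999)])  # [0, 0, 1, 1, 0, 0]
```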
Examples
>>> # An example stream using the STAGGER Generator.
>>> # Starts using generating function 0, then at
>>> # observation 5000 transitions to generating function
>>> # 1 then at 10000 transitions back to 0.
>>> from skika.data.reccurring_concept_stream import RCStreamType, RecurringConceptStream, ConceptOccurence
>>> concept_chain = {0: 0, 5000: 1, 10000: 0}
>>> num_samples = 15000
>>> # init concepts
>>> concept_0 = ConceptOccurence(id=0, difficulty=2, noise=0, appearences=2, examples_per_appearence=5000)
>>> concept_1 = ConceptOccurence(id=1, difficulty=3, noise=0, appearences=1, examples_per_appearence=5000)
>>> desc = {0: concept_0, 1: concept_1}
>>> datastream = RecurringConceptStream(
...     rctype=RCStreamType.STAGGER,
...     num_samples=num_samples,
...     noise=0,
...     concept_chain=concept_chain,
...     seed=42,
...     desc=desc,
...     boost_first_occurance=False)
>>> datastream.has_more_samples()
True
>>> datastream.get_drift_info()
{0: 0, 5000: 1, 10000: 0}
>>> datastream.n_remaining_samples()
15000
>>> datastream.get_stream_info()
{0: 0, 5000: 1, 10000: 0}
0 - 5000: STAGGERGenerator(balance_classes=False, classification_function=0, random_state=42)
5000 - 10000: STAGGERGenerator(balance_classes=False, classification_function=1, random_state=43)
10000 - 15000: STAGGERGenerator(balance_classes=False, classification_function=0, random_state=42)
>>> datastream.get_moa_stream_info()
{0: 0, 5000: 1, 10000: 0}
'(ConceptDriftStream -s (generators.STAGGERGenerator -f 1 -i 42) -d (ConceptDriftStream -s (generators.STAGGERGenerator -f 2 -i 43) -d (generators.STAGGERGenerator -f 1 -i 42) -p 5000 -w 1) -p 5000 -w 1)'
>>> datastream.get_supplementary_info()
>>> datastream.next_sample()
(array([[2., 0., 2.]]), array([0]))
>>> datastream.n_remaining_samples()
14999
>>> datastream.next_sample()
(array([[2., 0., 0.]]), array([0]))
>>> datastream.n_remaining_samples()
14998
- get_moa_stream_info()
Returns a string to run the corresponding stream in MOA.
- get_moa_stream_string(concepts=None)
Returns a string to run the corresponding stream in MOA.
- get_stream_info()
Prints information about the concepts included in the stream.
- get_supplementary_info()
Returns supplementary info about each concept.
- class skika.data.reccurring_concept_stream.SEAConcept(concept_id=0, seed=None, noise=0)
A SEA concept.
- Parameters
concept_id (int) – The ID of the SEA generating function to use. Should be within 0-9.
seed (int) – The seed used by the random number generator.
noise (float) – The probability that noise will happen in the generation. At each new sample generated, the sample will be perturbed by the amount of perturbation. Values range from 0.0 to 1.0.
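For orientation, the classic SEA concepts (as implemented in MOA and scikit-multiflow, to our knowledge) draw three features uniformly from [0, 10] and label an observation 0 if the sum of the first two features is at most a threshold, and 1 otherwise; the third feature does not affect the label. The classic definition has four thresholds (8, 9, 7 and 9.5), so this sketch covers IDs 0-3 only; how skika maps the 0-9 range stated above onto generators is not shown here. SEA_THRESHOLDS, sea_label and sea_sample are hypothetical names:

```python
import random

# Classic SEA thresholds, indexed by classification function (assumed order).
SEA_THRESHOLDS = (8.0, 9.0, 7.0, 9.5)


def sea_label(x, function_id):
    """Return 0 if x[0] + x[1] <= threshold, else 1; x[2] does not affect the label."""
    return 0 if x[0] + x[1] <= SEA_THRESHOLDS[function_id] else 1


def sea_sample(function_id, rng):
    """Draw one (features, label) pair for the given SEA function."""
    x = [rng.uniform(0.0, 10.0) for _ in range(3)]
    return x, sea_label(x, function_id)


rng = random.Random(42)
print(sea_sample(0, rng))
```

Switching function_id while keeping the feature distribution fixed moves only the decision boundary, which is what makes SEA a common benchmark for concept drift.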
- class skika.data.reccurring_concept_stream.SINEConcept(concept_id=0, seed=None, noise=0)
A SINE concept.
- Parameters
concept_id (int) – The ID of the SINE generating function to use. Should be within 0-9.
seed (int) – The seed used by the random number generator.
noise (float) – The probability that noise will happen in the generation. At each new sample generated, the sample will be perturbed by the amount of perturbation. Values range from 0.0 to 1.0.
- class skika.data.reccurring_concept_stream.STAGGERConcept(concept_id=0, seed=None, noise=0)
A STAGGER concept.
- Parameters
concept_id (int) – The ID of the STAGGER generating function to use. Should be within 0-3.
seed (int) – The seed used by the random number generator.
noise (float) – The probability that noise will happen in the generation. At each new sample generated, the sample will be perturbed by the amount of perturbation. Values range from 0.0 to 1.0.
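The classic STAGGER concepts describe objects by size (small, medium, large), color (red, green, blue) and shape (square, circle, triangle), with three labelling rules: size is small and color is red; color is green or shape is a circle; size is medium or large. The sketch below encodes those rules with readable string values. The ordering of rules by ID is an assumption, and skika (via scikit-multiflow) encodes attribute values as integers, as the array([[2., 0., 2.]]) samples in the examples show. Note that the classic definition has three rules, whereas this page states IDs 0-3. STAGGER_RULES and stagger_label are hypothetical names:

```python
# Classic STAGGER labelling rules; index = function ID (assumed order).
STAGGER_RULES = (
    lambda size, color, shape: size == "small" and color == "red",
    lambda size, color, shape: color == "green" or shape == "circle",
    lambda size, color, shape: size in ("medium", "large"),
)


def stagger_label(function_id, size, color, shape):
    """Return 1 if the object satisfies the chosen STAGGER rule, else 0."""
    return int(STAGGER_RULES[function_id](size, color, shape))


obj = ("small", "red", "square")
print([stagger_label(f, *obj) for f in range(len(STAGGER_RULES))])  # [1, 0, 0]
```

The same object changes label as the stream moves from one rule to another, which is the abrupt drift the STAGGER examples above exercise.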
- class skika.data.reccurring_concept_stream.TREEConcept(concept_id=0, seed=None, noise=0, desc=None)
A TREE concept.
- Parameters
concept_id (int) – The ID of the concept.
seed (int) – The seed used by the random number generator.
noise (float) – The probability that noise will happen in the generation. At each new sample generated, the sample will be perturbed by the amount of perturbation. Values range from 0.0 to 1.0.
desc (ConceptOccurence) – A ConceptOccurence object describing this specific concept.
- class skika.data.reccurring_concept_stream.WindSimConcept(concept_id=0, seed=None, noise=0, desc=None)
A WINDSIM concept.
- Parameters
concept_id (int) – The ID of the concept.
seed (int) – The seed used by the random number generator.
noise (float) – The probability that noise will happen in the generation. At each new sample generated, the sample will be perturbed by the amount of perturbation. Values range from 0.0 to 1.0.
desc (ConceptOccurence) – A ConceptOccurence object describing this specific concept.