skika.data.reccurring_concept_stream

Classes

AGRAWALConcept([concept_id, seed, noise])

An AGRAWAL concept.

Concept(stream)

Base concept class.

ConceptOccurence(id, difficulty, noise, ...)

Represents a concept in a stream.

RBFConcept([concept_id, seed, noise, desc])

An RBF concept.

RCStreamType(value)

An enumeration.

RecurringConceptGradualStream(rctype, ...[, ...])

A stream featuring gradual drift between given concepts.

RecurringConceptStream(rctype, num_samples, ...)

A stream featuring abrupt drift between given concepts.

SEAConcept([concept_id, seed, noise])

A SEA concept.

SINEConcept([concept_id, seed, noise])

A SINE concept.

STAGGERConcept([concept_id, seed, noise])

A STAGGER concept.

TREEConcept([concept_id, seed, noise, desc])

A TREE concept.

WindSimConcept([concept_id, seed, noise, desc])

A WINDSIM concept.

class skika.data.reccurring_concept_stream.AGRAWALConcept(concept_id=0, seed=None, noise=0)

An AGRAWAL concept.

Parameters
  • concept_id (int) – The ID of the AGRAWAL generating function to use. Should be within 0-9.

  • seed (int) – The seed used by the random number generator.

  • noise (float) – The probability that noise is applied during generation. Each newly generated sample is perturbed with this probability. Values range from 0.0 to 1.0.

class skika.data.reccurring_concept_stream.Concept(stream)

Base concept class. A ‘Concept’ can be thought of as a relationship between features and label. Here we model different concepts using streams produced by different generating functions: each concept is a given distribution of data, and each observation is drawn from one concept.

Parameters

stream (datastream) – The stream the concept will draw observations from.
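As an illustration of this idea only (not skika's implementation), the sketch below models two concepts as different labelling rules over the same feature space. ToyConcept and both rules are hypothetical names introduced for this example.

```python
import random
from typing import Callable, Tuple

Features = Tuple[int, int]


class ToyConcept:
    """Hypothetical stand-in for a concept: a fixed feature-to-label rule."""

    def __init__(self, rule: Callable[[Features], int], seed: int):
        self.rule = rule
        self.rng = random.Random(seed)

    def next_sample(self) -> Tuple[Features, int]:
        # Draw two categorical features, then label them with this concept's rule.
        x = (self.rng.randint(0, 2), self.rng.randint(0, 2))
        return x, self.rule(x)


# Two concepts that share a feature distribution but disagree on every label.
concept_a = ToyConcept(lambda x: int(x[0] == x[1]), seed=1)
concept_b = ToyConcept(lambda x: int(x[0] != x[1]), seed=1)

x_a, y_a = concept_a.next_sample()
x_b, y_b = concept_b.next_sample()
```

Because both toy concepts use the same seed, they see the same features and differ only in the label, which is the kind of change a drift detector has to pick up.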

class skika.data.reccurring_concept_stream.ConceptOccurence(id, difficulty, noise, appearences, examples_per_appearence)

Represents a concept in a stream: its ID, difficulty, noise level, number of appearances, and number of examples per appearance.
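The constructor fields can be read as a per-concept schedule. The sketch below uses a hypothetical stand-in dataclass, OccurrenceSketch, with the same field names as the signature above. It assumes that appearences multiplied by examples_per_appearence gives the number of observations a concept contributes; that reading is consistent with the 15000-sample examples on this page but is not confirmed by the source.

```python
from dataclasses import dataclass


@dataclass
class OccurrenceSketch:
    """Hypothetical mirror of the ConceptOccurence constructor fields."""

    id: int
    difficulty: int
    noise: float
    appearences: int
    examples_per_appearence: int

    def total_examples(self) -> int:
        # Assumption: every appearance contributes the same number of examples.
        return self.appearences * self.examples_per_appearence


desc = {
    0: OccurrenceSketch(id=0, difficulty=2, noise=0,
                        appearences=2, examples_per_appearence=5000),
    1: OccurrenceSketch(id=1, difficulty=3, noise=0,
                        appearences=1, examples_per_appearence=5000),
}

# Sum of observations over all described concepts.
total = sum(c.total_examples() for c in desc.values())
```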

class skika.data.reccurring_concept_stream.RBFConcept(concept_id=0, seed=None, noise=0, desc=None)

An RBF concept.

Parameters
  • concept_id (int) – The ID of the concept.

  • seed (int) – The seed used by the random number generator.

  • noise (float) – The probability that noise is applied during generation. Each newly generated sample is perturbed with this probability. Values range from 0.0 to 1.0.

  • desc (ConceptOccurence) – A class which describes the specific concept.

class skika.data.reccurring_concept_stream.RCStreamType(value)

An enumeration.

class skika.data.reccurring_concept_stream.RecurringConceptGradualStream(rctype, num_samples, noise, concept_chain, window_size=1000, seed=None, desc=None, boost_first_occurance=True)

A stream featuring gradual drift between given concepts. Uses the scikit-multiflow concept drift stream to blend concepts over a window.
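To give a feel for what blending over a window means, here is a minimal sketch that assumes the blend follows the sigmoid used by scikit-multiflow's ConceptDriftStream, where the probability of drawing from the new concept rises smoothly around the drift position. The function name drift_probability is introduced for this example and is not part of skika.

```python
import math


def drift_probability(t: int, position: int, width: int) -> float:
    """Probability that observation t is drawn from the new concept.

    Assumes a sigmoid centred on `position` and spread over `width`.
    """
    x = -4.0 * (t - position) / width
    if x > 700.0:
        # math.exp would overflow here; the probability is effectively zero.
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


# Drift centred at observation 5000 with a window of 1000 observations.
before = drift_probability(4000, position=5000, width=1000)
centre = drift_probability(5000, position=5000, width=1000)
after = drift_probability(6000, position=5000, width=1000)
```

Under this assumption the old concept still dominates one window before the drift position, both concepts are equally likely at the position itself, and the new concept dominates one window after it.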

Parameters
  • rctype (RCStreamType) – An enum describing the type of stream.

  • num_samples (int) – The number of samples in the stream.

  • noise (float) – The probability that noise is applied during generation. Each newly generated sample is perturbed with this probability. Values range from 0.0 to 1.0.

  • concept_chain (list<int> or dict) – Either a dict mapping an observation number to the concept beginning at that observation, or a list of concept IDs. From a list, a dict is generated in which each concept lasts for the length given in desc, or for a uniform length.

  • window_size (int) – The number of observations each gradual drift is spread over.

  • seed (int) – Random seed.

  • desc (dict<int><ConceptOccurence>) – A map of concept ID to options.

  • boost_first_occurance (bool) – If true, double the observations drawn from the first occurrence of a concept. Allows a better model to be built and stored.
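The two accepted forms of concept_chain can be related with a short sketch. The helper expand_chain below is hypothetical and only illustrates one plausible way a list of concept IDs could be turned into the dict form; in particular, splitting num_samples evenly when no per-concept length is known is an assumption, not documented behaviour.

```python
from typing import Dict, List, Optional


def expand_chain(concept_ids: List[int], num_samples: int,
                 lengths: Optional[Dict[int, int]] = None) -> Dict[int, int]:
    """Map each segment's starting observation to its concept ID."""
    # Assumed fallback: every segment gets an equal share of the stream.
    uniform = num_samples // len(concept_ids)
    chain: Dict[int, int] = {}
    start = 0
    for cid in concept_ids:
        chain[start] = cid
        # Use the per-concept length when one is supplied, else the uniform one.
        start += lengths.get(cid, uniform) if lengths else uniform
    return chain


uniform_chain = expand_chain([0, 1, 0], num_samples=15000)
custom_chain = expand_chain([0, 1, 0], num_samples=15000,
                            lengths={0: 2000, 1: 3000})
```

With three segments and 15000 samples, the uniform case yields the same dict as the example that follows: concept 0 from observation 0, concept 1 from 5000, and concept 0 again from 10000.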

Examples

>>> # An example stream using the STAGGER Generator.
>>> # Starts using generating function 0, then at
>>> # observation 5000 transitions to generating function
>>> # 1 then at 10000 transitions back to 0.
>>> from skika.data.reccurring_concept_stream import RCStreamType, RecurringConceptGradualStream, ConceptOccurence
>>> concept_chain = {0: 0, 5000: 1, 10000: 0}
>>> num_samples = 15000
>>> # init concept
>>> concept_0 = ConceptOccurence(id=0, difficulty=2, noise=0,
...     appearences=2, examples_per_appearence=5000)
>>> concept_1 = ConceptOccurence(id=1, difficulty=3, noise=0,
...     appearences=1, examples_per_appearence=5000)
>>> desc = {0: concept_0, 1: concept_1}
>>> datastream = RecurringConceptGradualStream(
...     rctype=RCStreamType.STAGGER,
...     num_samples=num_samples,
...     noise=0,
...     concept_chain=concept_chain,
...     window_size=1000,
...     seed=42,
...     desc=desc,
...     boost_first_occurance=False)
>>> datastream.has_more_samples()
True
>>> datastream.get_drift_info()
{0: 0, 5000: 1, 10000: 0}
>>> datastream.n_remaining_samples()
15000
>>> datastream.get_stream_info()
{0: 0, 5000: 1, 10000: 0}
0 - 5000: STAGGERGenerator(balance_classes=False, classification_function=0,
                random_state=42)
5000 - 10000: STAGGERGenerator(balance_classes=False, classification_function=1,
                random_state=43)
10000 - 15000: STAGGERGenerator(balance_classes=False, classification_function=0,
                random_state=42)
>>> datastream.get_moa_stream_info()
{0: 0, 5000: 1, 10000: 0}
'(ConceptDriftStream -s (generators.STAGGERGenerator -f 1 -i 42) -d (ConceptDriftStream -s (generators.STAGGERGenerator -f 2 -i 43) -d (generators.STAGGERGenerator -f 1 -i 42) -p 5000 -w 1) -p 5000 -w 1)'
>>> datastream.get_supplementary_info()
>>> datastream.next_sample()
(array([[2., 0., 2.]]), array([0]))
>>> datastream.n_remaining_samples()
14999
>>> datastream.next_sample()
(array([[2., 0., 0.]]), array([0]))
>>> datastream.n_remaining_samples()
14998

get_moa_stream_info()

Returns a string to run the corresponding stream in MOA.

get_moa_stream_string(concepts=None)

Returns a string to run the corresponding stream in MOA.

get_stream_info()

Prints information about the concepts included in the stream.

get_supplementary_info()

Returns supplementary info about each concept.

class skika.data.reccurring_concept_stream.RecurringConceptStream(rctype, num_samples, noise, concept_chain, seed=None, desc=None, boost_first_occurance=True)

A stream featuring abrupt drift between given concepts.

Parameters
  • rctype (RCStreamType) – An enum describing the type of stream.

  • num_samples (int) – The number of samples in the stream.

  • noise (float) – The probability that noise is applied during generation. Each newly generated sample is perturbed with this probability. Values range from 0.0 to 1.0.

  • concept_chain (list<int> or dict) – Either a dict mapping an observation number to the concept beginning at that observation, or a list of concept IDs. From a list, a dict is generated in which each concept lasts for the length given in desc, or for a uniform length.

  • seed (int) – Random seed.

  • desc (dict<int><ConceptOccurence>) – A map of concept ID to options.

  • boost_first_occurance (bool) – If true, double the observations drawn from the first occurrence of a concept. Allows a better model to be built and stored.
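With abrupt drift, the dict form of concept_chain fully determines which concept is active at any observation. The sketch below shows one way to query it; active_concept and segments are hypothetical helpers written for this example and are not skika functions.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def active_concept(chain: Dict[int, int], t: int) -> int:
    """Return the concept ID in effect at observation t."""
    starts = sorted(chain)
    if t < starts[0]:
        raise ValueError("observation precedes the first concept")
    # The active segment is the last one starting at or before t.
    return chain[starts[bisect_right(starts, t) - 1]]


def segments(chain: Dict[int, int], num_samples: int) -> List[Tuple[int, int, int]]:
    """List (start, end, concept_id) for every segment of the stream."""
    starts = sorted(chain)
    ends = starts[1:] + [num_samples]
    return [(s, e, chain[s]) for s, e in zip(starts, ends)]


chain = {0: 0, 5000: 1, 10000: 0}
last_of_first = active_concept(chain, 4999)
first_of_second = active_concept(chain, 5000)
all_segments = segments(chain, num_samples=15000)
```

The segment boundaries produced here line up with the ranges reported by get_stream_info() in the example that follows.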

Examples

>>> # An example stream using the STAGGER Generator.
>>> # Starts using generating function 0, then at
>>> # observation 5000 transitions to generating function
>>> # 1 then at 10000 transitions back to 0.
>>> from skika.data.reccurring_concept_stream import RCStreamType, RecurringConceptStream, ConceptOccurence
>>> concept_chain = {0: 0, 5000: 1, 10000: 0}
>>> num_samples = 15000
>>> # init concept
>>> concept_0 = ConceptOccurence(id=0, difficulty=2, noise=0,
...     appearences=2, examples_per_appearence=5000)
>>> concept_1 = ConceptOccurence(id=1, difficulty=3, noise=0,
...     appearences=1, examples_per_appearence=5000)
>>> desc = {0: concept_0, 1: concept_1}
>>> datastream = RecurringConceptStream(
...     rctype=RCStreamType.STAGGER,
...     num_samples=num_samples,
...     noise=0,
...     concept_chain=concept_chain,
...     seed=42,
...     desc=desc,
...     boost_first_occurance=False)
>>> datastream.has_more_samples()
True
>>> datastream.get_drift_info()
{0: 0, 5000: 1, 10000: 0}
>>> datastream.n_remaining_samples()
15000
>>> datastream.get_stream_info()
{0: 0, 5000: 1, 10000: 0}
0 - 5000: STAGGERGenerator(balance_classes=False, classification_function=0,
                random_state=42)
5000 - 10000: STAGGERGenerator(balance_classes=False, classification_function=1,
                random_state=43)
10000 - 15000: STAGGERGenerator(balance_classes=False, classification_function=0,
                random_state=42)
>>> datastream.get_moa_stream_info()
{0: 0, 5000: 1, 10000: 0}
'(ConceptDriftStream -s (generators.STAGGERGenerator -f 1 -i 42) -d (ConceptDriftStream -s (generators.STAGGERGenerator -f 2 -i 43) -d (generators.STAGGERGenerator -f 1 -i 42) -p 5000 -w 1) -p 5000 -w 1)'
>>> datastream.get_supplementary_info()
>>> datastream.next_sample()
(array([[2., 0., 2.]]), array([0]))
>>> datastream.n_remaining_samples()
14999
>>> datastream.next_sample()
(array([[2., 0., 0.]]), array([0]))
>>> datastream.n_remaining_samples()
14998

get_moa_stream_info()

Returns a string to run the corresponding stream in MOA.

get_moa_stream_string(concepts=None)

Returns a string to run the corresponding stream in MOA.
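The shape of the MOA string in the example output above can be reproduced with a small recursive builder. This is a sketch inferred from that output alone, not the library's code: the helper names are hypothetical, and the 1-based -f index, the seed-plus-concept-ID value for -i, and the use of segment length for -p are all assumptions read off the example.

```python
from typing import Dict


def generator(concept_id: int, seed: int) -> str:
    # Assumptions from the example: -f is concept_id + 1, -i is seed + concept_id.
    return f"(generators.STAGGERGenerator -f {concept_id + 1} -i {seed + concept_id})"


def moa_string(chain: Dict[int, int], seed: int, width: int = 1) -> str:
    """Nest ConceptDriftStream terms, one per drift in the chain."""
    starts = sorted(chain)

    def nest(i: int) -> str:
        current = generator(chain[starts[i]], seed)
        if i == len(starts) - 1:
            return current  # the last segment is a plain generator
        # Drift position is relative to the start of the current segment.
        position = starts[i + 1] - starts[i]
        return (f"(ConceptDriftStream -s {current} -d {nest(i + 1)} "
                f"-p {position} -w {width})")

    return nest(0)


moa = moa_string({0: 0, 5000: 1, 10000: 0}, seed=42)
```

Each drift wraps the remainder of the stream in another ConceptDriftStream term, so a chain with three segments yields two nested terms.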

get_stream_info()

Prints information about the concepts included in the stream.

get_supplementary_info()

Returns supplementary info about each concept.

class skika.data.reccurring_concept_stream.SEAConcept(concept_id=0, seed=None, noise=0)

A SEA concept.

Parameters
  • concept_id (int) – The ID of the SEA generating function to use. Should be within 0-9.

  • seed (int) – The seed used by the random number generator.

  • noise (float) – The probability that noise is applied during generation. Each newly generated sample is perturbed with this probability. Values range from 0.0 to 1.0.

class skika.data.reccurring_concept_stream.SINEConcept(concept_id=0, seed=None, noise=0)

A SINE concept.

Parameters
  • concept_id (int) – The ID of the SINE generating function to use. Should be within 0-9.

  • seed (int) – The seed used by the random number generator.

  • noise (float) – The probability that noise is applied during generation. Each newly generated sample is perturbed with this probability. Values range from 0.0 to 1.0.

class skika.data.reccurring_concept_stream.STAGGERConcept(concept_id=0, seed=None, noise=0)

A STAGGER concept.

Parameters
  • concept_id (int) – The ID of the STAGGER generating function to use. Should be within 0-3.

  • seed (int) – The seed used by the random number generator.

  • noise (float) – The probability that noise is applied during generation. Each newly generated sample is perturbed with this probability. Values range from 0.0 to 1.0.

class skika.data.reccurring_concept_stream.TREEConcept(concept_id=0, seed=None, noise=0, desc=None)

A TREE concept.

Parameters
  • concept_id (int) – The ID of the concept.

  • seed (int) – The seed used by the random number generator.

  • noise (float) – The probability that noise is applied during generation. Each newly generated sample is perturbed with this probability. Values range from 0.0 to 1.0.

  • desc (ConceptOccurence) – A class which describes the specific concept.

class skika.data.reccurring_concept_stream.WindSimConcept(concept_id=0, seed=None, noise=0, desc=None)

A WINDSIM concept.

Parameters
  • concept_id (int) – The ID of the concept.

  • seed (int) – The seed used by the random number generator.

  • noise (float) – The probability that noise is applied during generation. Each newly generated sample is perturbed with this probability. Values range from 0.0 to 1.0.

  • desc (ConceptOccurence) – A class which describes the specific concept.