skika.data.bernoulli_stream

Classes

BernoulliStream(drift_period, n_drifts, ...)

A class to generate a Bernoulli Stream

class skika.data.bernoulli_stream.BernoulliStream(drift_period, n_drifts, widths_drifts, mean_errors, n_stable_drifts=1, random_state=0)

A class to generate a Bernoulli Stream

The stream simulates the error rate of a learner. Drifts are generated by changing the mean error rate at change points.

Change points are generated regularly along the stream, every drift_period instances.

The width of the drifts can be specified. If only one value is given, it is applied to every drift. If several values are given, they are uniformly distributed and randomly applied to the drifts.
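As a rough illustration of the width rule above, the sketch below picks one width per drift. It is not the library's code: the helper name assign_widths, its seed parameter, and the reading of "uniformly distributed" as "each width used about equally often" are all assumptions.

```python
import random


def assign_widths(widths_drifts, n_drifts, seed=0):
    """Hypothetical helper: pick one width per drift."""
    if len(widths_drifts) == 1:
        # A single width is applied to every drift.
        return list(widths_drifts) * n_drifts
    # Cycle through the widths so each is used about equally often,
    # then shuffle so the order along the stream is random.
    assigned = [widths_drifts[i % len(widths_drifts)] for i in range(n_drifts)]
    random.Random(seed).shuffle(assigned)
    return assigned


print(assign_widths([500], 3))             # [500, 500, 500]
print(sorted(assign_widths([1, 500], 4)))  # [1, 1, 500, 500]
```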

The new mean error rate after each drift is picked from the mean_errors list.

The parameter n_stable_drifts sets how many consecutive drifts share the same characteristics.
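One way to picture n_stable_drifts is as a block size: drifts in the same block reuse the same pattern. The sketch below only illustrates that idea. The helper pattern_index_per_drift is hypothetical, and the actual stream may order or select its patterns differently.

```python
def pattern_index_per_drift(n_drifts, n_stable_drifts=1):
    """Hypothetical helper: index of the pattern used by each drift."""
    # Drift i reuses the pattern of the block it falls in, so the
    # pattern only changes every n_stable_drifts drifts.
    return [i // n_stable_drifts for i in range(n_drifts)]


print(pattern_index_per_drift(10, n_stable_drifts=5))
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```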

Drift characteristics can be retrieved from the stream: magnitude and severity of drifts.

Arguments:
drift_period : int

Number of instances between two drifts

n_drifts : int

Number of drifts to generate

widths_drifts : list of int

Width(s) to be applied to the drifts.

mean_errors : list of float, or list of lists of float

List of mean errors used to simulate the concepts. Can be either:
  • A list of mean error values. The stream then randomly pairs the values to create the patterns of stable drift periods. At least 2 values are needed to simulate drifts from one concept to another. Ex: mean_errors = [0.1, 0.2, 0.5, 0.6]

  • A list of lists. Each sub-list must be a pair of two different error rates representing one stable drift period. Ex: mean_errors = [[0.1, 0.2], [0.5, 0.6], [0.8, 0.9]]

n_stable_drifts : int, optional (default = 1)

Number of consecutive drifts with the same pattern (width and mean_errors). Ex: if n_stable_drifts = 5, the drift characteristics change every 5 drifts. This makes it possible to generate streams with more or less drift diversity.

random_state : int, optional (default = 0)

Random state for the pseudo-random generators.

Examples

>>> # Imports
>>> from skika.data.bernoulli_stream import BernoulliStream
>>> # Setting up the stream
>>> stream = BernoulliStream(drift_period=1000, n_drifts=10, widths_drifts=[1, 500],
...                          mean_errors=[[0.0, 1.0], [0.2, 0.8]], n_stable_drifts=5)
>>> stream.prepare_for_use()
>>> # Retrieving one sample
>>> stream.next_sample()
array([1.])
>>> # Drift positions and stream length
>>> stream.list_positions
[1000, 2001, 3002, 4502, 6002, 7003, 8004, 9504, 11004, 12005]
>>> stream.n_samples
13006
choiceWithoutRepet(n_iter, list_choices)

Generate a list of n_iter items from list_choices without consecutive repetitions. If len(list_choices) > 2, the list is generated so that no item repeats within 3 consecutive items.
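The behaviour described above can be sketched in plain Python as follows. This is a hypothetical re-implementation for illustration, not the library's code: the function name choice_without_repetition and the seed parameter are assumptions, and the items of list_choices are assumed to be distinct.

```python
import random


def choice_without_repetition(n_iter, list_choices, seed=0):
    """Hypothetical re-implementation, not the library's code."""
    if len(list_choices) < 2:
        raise ValueError("need at least two distinct choices")
    rng = random.Random(seed)
    # With more than two choices, avoid the last two picks;
    # with exactly two, only the last pick can be avoided.
    window = 2 if len(list_choices) > 2 else 1
    picks = []
    for _ in range(n_iter):
        recent = picks[-window:]
        candidates = [c for c in list_choices if c not in recent]
        picks.append(rng.choice(candidates))
    return picks


picks = choice_without_repetition(8, [0.1, 0.2, 0.5, 0.6])
# No value appears twice within any 3 consecutive picks.
print(all(len(set(picks[i:i + 3])) == 3 for i in range(len(picks) - 2)))  # True
```

Because membership is tested with `in`, the same sketch also works when the choices are pairs such as [[0.1, 0.2], [0.5, 0.6], [0.8, 0.9]].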

property current_drift_magnitude

Retrieve the current_drift_magnitude.

Returns

The current_drift_magnitude.

Return type

int

property current_drift_severity

Retrieve the current_drift_severity.

Returns

The current_drift_severity.

Return type

int

property current_mean_error

Retrieve the current_mean_error.

Returns

The current_mean_error.

Return type

float

property current_width

Retrieve the current_width.

Returns

The current_width.

Return type

int

get_data_info()

Retrieves information from the stream

The default format is: ‘Stream name - n_samples, n_drifts, widths_drifts, error_rates’.

Returns

string

Stream data information

property list_positions

Retrieve the list of drift positions.

Returns

The list of drift positions.

Return type

list

property n_samples

Retrieve the length of the stream.

Returns

The length of the stream.

Return type

int

property next_mean_error

Retrieve the next_mean_error.

Returns

The next_mean_error.

Return type

float

next_sample(batch_size=1)
The sample generation works as follows:

A prediction 0 or 1 is generated by the random Bernoulli process, based on the current and next mean errors. The probability of drift is calculated and updated at every call based on the current sample index and the next drift position and width.

Drift characteristics are updated every time a drift happens.
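To make the description above more concrete, here is a minimal sketch of one sampling step. It assumes a sigmoid drift probability centred on the drift position, which is a common choice for gradual drifts but is not confirmed by this page. The names drift_probability and sketch_next_sample are hypothetical and are not part of the skika API.

```python
import math
import random


def drift_probability(sample_idx, position, width):
    """Assumed sigmoid transition centred on the drift position."""
    x = -4.0 * (sample_idx - position) / width
    # Clamp the exponent to avoid overflow far from the drift.
    x = max(min(x, 700.0), -700.0)
    return 1.0 / (1.0 + math.exp(x))


def sketch_next_sample(rng, sample_idx, position, width, current_mean, next_mean):
    """Draw one 0/1 value from the current or the next concept."""
    p_drift = drift_probability(sample_idx, position, width)
    # With probability p_drift the sample already follows the next concept.
    mean_error = next_mean if rng.random() < p_drift else current_mean
    # Bernoulli trial: 1 with probability mean_error, else 0.
    return 1 if rng.random() < mean_error else 0


rng = random.Random(0)
print(drift_probability(1000, 1000, 500))  # 0.5
print([sketch_next_sample(rng, i, 1000, 1, 0.0, 1.0) for i in (0, 2000)])  # [0, 1]
```

Under this assumption, a small width gives an abrupt switch between the two mean errors, while a large width spreads the transition over many samples around the drift position.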

Parameters

batch_size (int) – The number of samples to return (only batch_size == 1 is supported for the moment)

Returns

Return a tuple with the predictions matrix for the batch_size samples that were requested.

Return type

tuple or tuple list

perform_bernoulli_trials(n, p)

Perform n Bernoulli trials with success probability p and return number of successes.
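For reference, the operation described above can be sketched with the standard library alone. The function bernoulli_trials_sketch and its seed parameter are hypothetical, and the actual method may rely on a different random generator.

```python
import random


def bernoulli_trials_sketch(n, p, seed=0):
    """Count successes in n independent trials with success probability p."""
    rng = random.Random(seed)
    # Each trial succeeds when a uniform draw in [0, 1) falls below p.
    return sum(1 for _ in range(n) if rng.random() < p)


print(bernoulli_trials_sketch(10, 0.0))  # 0
print(bernoulli_trials_sketch(10, 1.0))  # 10
```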

prepare_for_use()

Prepares the stream for use.

Notes

This function should always be called after the stream is initialized.