skika.data.bernoulli_stream#
Classes
BernoulliStream | A class to generate a Bernoulli Stream
- class skika.data.bernoulli_stream.BernoulliStream(drift_period, n_drifts, widths_drifts, mean_errors, n_stable_drifts=1, random_state=0)#
A class to generate a Bernoulli Stream
The stream simulates the error rate of a learner. Drifts are generated by changing the mean error rate at change points.
Change points are generated regularly along the stream, every drift_period instances.
The width of the drifts can be specified. If only one value is given, it is applied to every drift. If several values are given, they are uniformly distributed and randomly assigned to the drifts.
The new mean error rate after each drift is picked in the mean_errors list.
The number of consecutive drifts sharing the same characteristics can be adjusted with the parameter n_stable_drifts.
Drift characteristics can be retrieved from the stream: the magnitude and severity of drifts.
- Arguments :
- drift_period : int
Number of instances between two drifts
- n_drifts : int
Number of drifts to generate
- widths_drifts : list of int
Width(s) to be applied to the drifts.
- mean_errors : list of float, or list of lists of floats
- List of mean errors used to simulate the concepts. Can be either:
A list of mean error values. The stream will then randomly pair the values to create stable drift period patterns. Must have at least 2 values to simulate drifts from one concept to another. Ex: mean_errors = [0.1, 0.2, 0.5, 0.6]
A list of lists. Each sub-list must be a pair of two different error rates representing one stable drift period. Ex: mean_errors = [[0.1, 0.2], [0.5, 0.6], [0.8, 0.9]]
- n_stable_drifts : int, optional (default = 1)
Number of consecutive drifts with the same pattern (width and mean_errors). Ex: if n_stable_drifts = 5, the characteristics of the drifts change every 5 drifts. This makes it possible to generate streams with more or less drift diversity.
- random_state : int, optional (default = 0)
Random state for the pseudo-random generators.
Examples
>>> # Imports
>>> from skika.data.bernoulli_stream import BernoulliStream
>>> # Setting up the stream
>>> stream = BernoulliStream(drift_period=1000, n_drifts=10, widths_drifts=[1, 500], mean_errors=[[0.0, 1.0], [0.2, 0.8]], n_stable_drifts=5)
>>> stream.prepare_for_use()
>>> # Retrieving one sample
>>> stream.next_sample()
array([1.])
>>> stream.list_positions
[1000, 2001, 3002, 4502, 6002, 7003, 8004, 9504, 11004, 12005]
>>> stream.n_samples
13006
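The drift positions and stream length shown in the example are consistent with a simple rule: each drift starts drift_period instances after the previous drift has finished. The sketch below reproduces that arithmetic. It is not the library's code: the helper name drift_positions is hypothetical, and the per-drift widths are inferred from the example output rather than read from the stream.

```python
def drift_positions(drift_period, widths):
    """Return (positions, n_samples) given one width per drift.

    Assumption (not confirmed by the skika source): a drift starts
    drift_period instances after the previous drift has ended, and the
    stream ends drift_period instances after the last drift has ended.
    """
    positions = []
    position = drift_period
    for width in widths:
        positions.append(position)
        # The next drift starts after this drift's width plus one full period.
        position += width + drift_period
    n_samples = positions[-1] + widths[-1] + drift_period
    return positions, n_samples


# Widths inferred from the example output above (hypothetical assignment).
widths = [1, 1, 500, 500, 1, 1, 500, 500, 1, 1]
positions, n_samples = drift_positions(1000, widths)
print(positions)
print(n_samples)
```

Under this assumption, a wider drift pushes every later change point further along the stream, which is why the spacing between positions is not constant.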
- choiceWithoutRepet(n_iter, list_choices)#
Generate a list of n_iter items from list_choices without consecutive repetition. If len(list_choices) > 2, the list is also generated without repetition within every 3 items.
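As an illustration of the first rule only (no two consecutive items are equal), here is a minimal sketch. The function name choice_without_repeat and its rng parameter are hypothetical, and the stricter 3-item rule is deliberately not modelled; the library's own implementation may differ.

```python
import random


def choice_without_repeat(n_iter, list_choices, rng=None):
    """Draw n_iter items so that no item directly follows itself.

    Illustrative only: this is not skika's choiceWithoutRepet.
    """
    if rng is None:
        rng = random.Random(0)
    # At least two distinct choices are needed to avoid a repetition.
    if n_iter > 1 and all(c == list_choices[0] for c in list_choices):
        raise ValueError("need at least two distinct choices")
    result = []
    for _ in range(n_iter):
        # Exclude the previously drawn item from the candidates.
        candidates = [c for c in list_choices if not result or c != result[-1]]
        result.append(rng.choice(candidates))
    return result


picked = choice_without_repeat(10, [[0.0, 1.0], [0.2, 0.8], [0.4, 0.6]])
print(picked)
```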
- property current_drift_magnitude#
Retrieve the current_drift_magnitude.
- Returns
The current_drift_magnitude.
- Return type
int
- property current_drift_severity#
Retrieve the current_drift_severity.
- Returns
The current_drift_severity.
- Return type
int
- property current_mean_error#
Retrieve the current_mean_error.
- Returns
The current_mean_error.
- Return type
float
- property current_width#
Retrieve the current_width.
- Returns
The current_width.
- Return type
int
- get_data_info()#
Retrieves information from the stream
The default format is: ‘Stream name - n_samples, n_drifts, widths_drifts, error_rates’.
Returns
- string
Stream data information
- property list_positions#
Retrieve the list of drifts positions.
- Returns
The list of drifts positions.
- Return type
list
- property n_samples#
Retrieve the length of the stream.
- Returns
The length of the stream.
- Return type
int
- property next_mean_error#
Retrieve the next_mean_error.
- Returns
The next_mean_error.
- Return type
float
- next_sample(batch_size=1)#
- The sample generation works as follows:
A prediction 0 or 1 is generated by the random Bernoulli process, based on the current and next mean errors. The probability of drift is calculated and updated at every call based on the current sample index and the next drift position and width.
Drift characteristics are updated every time a drift happens.
- Parameters
batch_size (int) – The number of samples to return (currently only batch_size == 1 is supported)
- Returns
Return a tuple with the predictions matrix for the batch_size samples that were requested.
- Return type
tuple or tuple list
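The sample generation described above can be sketched as follows. The text does not say how the probability of drift is computed, so the sigmoid centred on the drift position is an assumption (a common choice in stream generators), and the names drift_probability and sample_error are hypothetical rather than part of skika.

```python
import math
import random


def drift_probability(index, position, width):
    """Assumed sigmoid transition centred on the drift position."""
    x = -4.0 * (index - position) / max(width, 1)
    x = max(min(x, 700.0), -700.0)  # keep math.exp within float range
    return 1.0 / (1.0 + math.exp(x))


def sample_error(index, position, width, current_mean, next_mean, rng):
    """Return 1 (error) or 0 (no error) for the sample at this index."""
    p_drift = drift_probability(index, position, width)
    # Choose which concept generates this sample, then run one Bernoulli trial.
    mean = next_mean if rng.random() < p_drift else current_mean
    return 1 if rng.random() < mean else 0


rng = random.Random(0)
# Well before the drift at index 1000 the old mean error (0.0) applies;
# well after it the new mean error (1.0) applies.
before = [sample_error(i, 1000, 50, 0.0, 1.0, rng) for i in range(0, 500)]
after = [sample_error(i, 1000, 50, 0.0, 1.0, rng) for i in range(1500, 2000)]
print(sum(before), sum(after))
```

Around the drift position the two concepts are mixed, so the observed error rate moves gradually from the current mean error to the next one over roughly the drift width.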
- perform_bernoulli_trials(n, p)#
Perform n Bernoulli trials with success probability p and return number of successes.
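A minimal sketch of what such a helper does, using only the standard library. The optional rng argument is an addition for reproducibility and is not part of the documented signature; the library itself may rely on a different random generator.

```python
import random


def perform_bernoulli_trials(n, p, rng=None):
    """Run n Bernoulli trials with success probability p; count successes."""
    if rng is None:
        rng = random.Random(0)
    # Each trial succeeds when a uniform draw in [0, 1) falls below p.
    return sum(1 for _ in range(n) if rng.random() < p)


successes = perform_bernoulli_trials(100, 0.3)
print(successes)
```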
- prepare_for_use()#
Prepares the stream for use.
Notes
This function should always be called after the stream initialization.