skika.data.hyper_plane_generator_redund#

Classes

HyperplaneGeneratorRedund([random_state, ...])

Hyperplane stream generator.

class skika.data.hyper_plane_generator_redund.HyperplaneGeneratorRedund(random_state=None, n_features=10, n_drift_features=2, perc_redund_feature=0, mag_change=0.0, noise_percentage=0.05, sigma_percentage=0.1)#

Hyperplane stream generator.

Modified version of scikit-multiflow code to include generation of redundant attributes.

Generates a classification problem in which the task is to predict the class of a rotating hyperplane. It was used as a testbed for CVFDT and VFDT in [1].

A hyperplane in \(d\)-dimensional space is the set of points \(x\) that satisfy \(\sum^{d}_{i=1} w_i x_i = w_0 = \sum^{d}_{i=1} w_i\), where \(x_i\) is the \(i\)th coordinate of \(x\). Examples for which \(\sum^{d}_{i=1} w_i x_i > w_0\) are labeled positive, and examples for which \(\sum^{d}_{i=1} w_i x_i \leq w_0\) are labeled negative.

Hyperplanes are useful for simulating time-changing concepts, because we can change the orientation and position of the hyperplane in a smooth manner by changing the relative size of the weights. We introduce change to this dataset by adding drift to each weight feature \(w_i = w_i + d \sigma\), where \(\sigma\) is the probability that the direction of change is reversed and \(d\) is the change applied to every example.
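The labeling rule above can be sketched in a few lines of NumPy. This is only an illustration of the formula as stated, not the library's implementation; the helper name `label_points` and the example weights and points are hypothetical.

```python
import numpy as np

def label_points(X, w):
    """Label each row of X as 1 (positive) or 0 (negative)."""
    w0 = w.sum()      # threshold w_0 = sum_i w_i, as in the text
    scores = X @ w    # sum_i w_i * x_i for every row
    return (scores > w0).astype(int)

w = np.array([1.0, 2.0])          # hypothetical weights, so w_0 = 3
X = np.array([[2.0, 1.0],         # 1*2 + 2*1 = 4 >  3 -> positive
              [1.0, 1.0],         # 1*1 + 2*1 = 3 <= 3 -> negative
              [0.0, 0.0]])        # 0         <= 3     -> negative
labels = label_points(X, w)
print(labels)  # [1 0 0]
```

Changing the relative size of the entries of `w` rotates the decision boundary, which is what makes the hyperplane convenient for simulating drift.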

Parameters
  • random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

  • n_features (int (Default: 10)) – The number of attributes to generate. Must be higher than 2.

  • n_drift_features (int (Default: 2)) – The number of attributes with drift. Higher than 2.

  • perc_redund_feature (float (Default: 0.0)) – The percentage of features to be redundant. From 0.0 to 1.0.

  • mag_change (float (Default: 0.0)) – Magnitude of the change for every example. From 0.0 to 1.0.

  • noise_percentage (float (Default: 0.05)) – Percentage of noise to add to the data. From 0.0 to 1.0.

  • sigma_percentage (float (Default: 0.1)) – Probability that the direction of change is reversed. From 0.0 to 1.0.

References

[1] G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In KDD’01, pages 97–106, San Francisco, CA, 2001. ACM Press.

property feature_names#

Retrieve the names of the features.

Returns

names of the features

Return type

list

get_data_info()#

Retrieves minimum information from the stream.

Used by evaluator methods to identify the stream.

The default format is: ‘Stream name - n_targets, n_classes, n_features’.

Returns

Stream data information

Return type

string

get_info()#

Collects and returns the information about the configuration of the estimator.

Returns

Configuration of the estimator.

Return type

string

get_params(deep=True)#

Get parameters for this estimator.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

has_more_samples()#

Checks if stream has more samples.

Returns

True if stream has more samples.

Return type

Boolean

is_restartable()#

Determine if the stream is restartable.

Returns

True if stream is restartable.

Return type

Boolean

last_sample()#

Retrieves the last batch_size samples in the stream.

Returns

A numpy.ndarray of shape (batch_size, n_features) and an array-like of shape (batch_size, n_targets), representing the last batch_size samples.

Return type

tuple or tuple list

property mag_change#

Retrieve the value of the magnitude of change.

Returns

magnitude of change

Return type

float

property n_cat_features#

Retrieve the number of integer features.

Returns

The number of integer features in the stream.

Return type

int

property n_drift_features#

Retrieve the number of drift features.

Returns

The total number of drift features.

Return type

int

property n_features#

Retrieve the number of features.

Returns

The total number of features.

Return type

int

property n_num_features#

Retrieve the number of numerical features.

Returns

The number of numerical features in the stream.

Return type

int

n_remaining_samples()#

Returns the estimated number of remaining samples.

Returns

Remaining number of samples. -1 if infinite (e.g. generator)

Return type

int
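Because a generator reports -1 remaining samples and never runs out, the caller has to bound consumption itself. The sketch below shows one way to do that using only the documented `has_more_samples()` and `next_sample(batch_size)` methods. `ToyStream` is a hypothetical stand-in so that the example runs without the library installed, and `take` is a hypothetical helper, not part of the API.

```python
import numpy as np

class ToyStream:
    """Hypothetical stand-in exposing the documented stream methods."""

    def __init__(self, n_features=3, seed=0):
        self._rng = np.random.default_rng(seed)
        self._n_features = n_features

    def has_more_samples(self):
        return True   # a generator never runs out

    def n_remaining_samples(self):
        return -1     # -1 signals an infinite stream

    def next_sample(self, batch_size=1):
        X = self._rng.random((batch_size, self._n_features))
        y = (X.sum(axis=1) > self._n_features / 2).astype(int)
        return X, y

def take(stream, max_samples, batch_size=10):
    """Collect up to max_samples (> 0) samples, stopping early if the stream ends."""
    X_parts, y_parts, seen = [], [], 0
    while seen < max_samples and stream.has_more_samples():
        n = min(batch_size, max_samples - seen)
        X, y = stream.next_sample(n)
        X_parts.append(X)
        y_parts.append(y)
        seen += n
    return np.vstack(X_parts), np.concatenate(y_parts)

X, y = take(ToyStream(), max_samples=25)
print(X.shape, y.shape)  # (25, 3) (25,)
```

Any object with the same two methods, including the generator documented here, could be passed to `take` in place of the stand-in.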

property n_targets#

Retrieve the number of targets.

Returns

the number of targets in the stream.

Return type

int

next_sample(batch_size=1)#

The sample generation works as follows: the features are generated with the random generator, initialized with the seed passed by the user. The classification function then decides, as a function of the weighted feature sum and the sum of the weights, whether the instance belongs to class 0 or class 1. The next step is to add noise, if requested by the user, and then generate drift.

Parameters

batch_size (int) – The number of samples to return.

Returns

A tuple with the feature matrix and the label matrix for the batch_size samples that were requested.

Return type

tuple or tuple list
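The four generation steps described for next_sample (draw features, classify, add noise, drift) can be sketched as follows. This is a hedged re-creation, not the library's code: the function name `sketch_next_sample` is hypothetical, the threshold is assumed to be half the weight sum so that both classes occur for features in [0, 1), drift is assumed to move each drifting weight by `mag_change` in a direction that flips with probability `sigma_percentage`, labels are returned as a 1-D array for brevity, and redundant features are not modeled.

```python
import numpy as np

def sketch_next_sample(rng, weights, directions, n_drift_features,
                       mag_change, noise_percentage, sigma_percentage,
                       batch_size=1):
    """Hypothetical sketch; mutates weights and directions in place."""
    n_features = weights.shape[0]
    X = np.empty((batch_size, n_features))
    y = np.empty(batch_size, dtype=int)
    for j in range(batch_size):
        # Step 1: draw the features from the seeded random generator.
        X[j] = rng.random(n_features)
        # Step 2: classify from the weighted sum and the sum of weights.
        # Assumption: the threshold is half the weight sum.
        y[j] = int(X[j] @ weights >= 0.5 * weights.sum())
        # Step 3: flip the label with probability noise_percentage.
        if rng.random() < noise_percentage:
            y[j] = 1 - y[j]
        # Step 4: drift the first n_drift_features weights, then reverse
        # each direction with probability sigma_percentage.
        for i in range(n_drift_features):
            weights[i] += directions[i] * mag_change
            if rng.random() < sigma_percentage:
                directions[i] *= -1
    return X, y

rng = np.random.default_rng(42)
weights = rng.random(10)
directions = np.ones(10)
X, y = sketch_next_sample(rng, weights, directions, n_drift_features=2,
                          mag_change=0.01, noise_percentage=0.05,
                          sigma_percentage=0.1, batch_size=5)
print(X.shape, y.shape)  # (5, 10) (5,)
```

With `mag_change=0.0` the weights never move, so the concept stays fixed; a larger value makes the hyperplane rotate faster.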

property noise_percentage#

Retrieve the value of the noise percentage.

Returns

percentage of the noise

Return type

float

property perc_redund_features#

Retrieve the number of redundant features.

Returns

The total number of redundant features.

Return type

int

prepare_for_use()#

Prepares the stream for use.

Notes

This function should always be called after the stream initialization.

reset()#

Resets the estimator to its initial state.

Return type

self

restart()#

Restart the stream.

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Return type

self
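To illustrate the `<component>__<parameter>` naming convention, the sketch below separates top-level parameters from nested ones by splitting on the first double underscore. `split_nested_params` is a hypothetical helper and `clf__max_depth` is a made-up key; this shows the convention only, not how set_params is implemented.

```python
def split_nested_params(params):
    """Separate plain keys from 'component__parameter' keys."""
    own, nested = {}, {}
    for key, value in params.items():
        # partition splits on the first "__" only, so deeper nesting
        # (a__b__c) stays intact in sub_key for the component to handle.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested

own, nested = split_nested_params({"mag_change": 0.01, "clf__max_depth": 3})
print(own)     # {'mag_change': 0.01}
print(nested)  # {'clf': {'max_depth': 3}}
```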

property sigma_percentage#

Retrieve the value of the sigma percentage.

Returns

percentage of the sigma

Return type

float

property target_names#

Retrieve the names of the targets.

Returns

the names of the targets in the stream.

Return type

list

property target_values#

Retrieve all target_values in the stream for each target.

Returns

list of lists of all target_values for each target

Return type

list