skika.data.random_rbf_generator_redund

Classes

Centroid()

Class that stores a centroid's attributes.

RandomRBFGeneratorRedund([...])

Random Radial Basis Function stream generator.

class skika.data.random_rbf_generator_redund.Centroid

Class that stores a centroid’s attributes.

class skika.data.random_rbf_generator_redund.RandomRBFGeneratorRedund(model_random_state=None, sample_random_state=None, n_classes=2, n_features=10, perc_redund_feature=0.0, n_centroids=50, noise_percentage=0.0)

Random Radial Basis Function stream generator.

Modified version of scikit-multiflow code to include generation of redundant attributes.

Produces a radial basis function stream. A number of centroids are generated, each with a random central position, a standard deviation, a class label and a weight. A new sample is created by choosing one of the centroids at random, taking their weights into account, and offsetting the attributes in a random direction from the centroid’s center. The offset length is drawn from a Gaussian distribution.

This process creates a normally distributed hypersphere of samples around each centroid.

This version adds a parameter that sets the percentage of redundant features among the total number of features.
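The sampling procedure described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name `sample_rbf`, the dictionary keys, and the fixed `redundant_sources` index list are all assumptions made for the example, and the standard-library `random` module stands in for NumPy's random state.

```python
import math
import random

def sample_rbf(centroids, redundant_sources, rng):
    """Draw one (features, label) pair. All names here are illustrative."""
    # Choose a centroid with probability proportional to its weight.
    weights = [c["weight"] for c in centroids]
    centroid = rng.choices(centroids, weights=weights, k=1)[0]

    # Random direction: uniform components in [-1, 1], scaled to unit length.
    direction = [rng.uniform(-1.0, 1.0) for _ in centroid["centre"]]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0  # guard against a zero vector

    # Offset length: a Gaussian draw scaled by the centroid's standard deviation.
    length = rng.gauss(0.0, 1.0) * centroid["std_dev"]
    base = [c + (d / norm) * length for c, d in zip(centroid["centre"], direction)]

    # Redundant features (assumed behaviour): copies of base features at fixed indices.
    return base + [base[i] for i in redundant_sources], centroid["label"]

rng = random.Random(50)
centroids = [
    {"centre": [0.2, 0.8, 0.5], "std_dev": 0.1, "label": 0, "weight": 0.7},
    {"centre": [0.9, 0.1, 0.4], "std_dev": 0.2, "label": 1, "weight": 0.3},
]
features, label = sample_rbf(centroids, redundant_sources=[0, 2], rng=rng)
```

Here three base features are extended with two redundant ones, so `features` has five entries and the last two repeat the values at indices 0 and 2.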

Parameters
  • model_random_state (int, RandomState instance or None, optional (default=None)) – If int, model_random_state is the seed used by the random number generator; if a RandomState instance, model_random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

  • sample_random_state (int, RandomState instance or None, optional (default=None)) – If int, sample_random_state is the seed used by the random number generator; if a RandomState instance, sample_random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

  • n_classes (int (Default: 2)) – The number of class labels to generate.

  • n_features (int (Default: 10)) – The number of numerical features to generate.

  • perc_redund_feature (float (Default: 0.0)) – The proportion of features that are redundant, given as a fraction from 0.0 to 1.0.

  • n_centroids (int (Default: 50)) – The number of centroids to generate.

  • noise_percentage (float (Default: 0.0)) – The proportion of noise to add to the data, given as a fraction from 0.0 to 1.0.
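Having two separate random states suggests that the centroid layout and the sample sequence can be seeded independently. The sketch below assumes, as in the scikit-multiflow generator this class is derived from, that model_random_state drives centroid generation and sample_random_state drives the draws. The helper names `build_model` and `draw_centroid_indices` are hypothetical, and `random.Random` stands in for a NumPy RandomState.

```python
import random

def build_model(model_seed, n_centroids=3, n_features=2):
    # The model generator fixes the centroid centres.
    rng = random.Random(model_seed)
    return [[rng.random() for _ in range(n_features)] for _ in range(n_centroids)]

def draw_centroid_indices(sample_seed, n_centroids, n_draws=5):
    # The sample generator decides which centroid each sample comes from.
    rng = random.Random(sample_seed)
    return [rng.randrange(n_centroids) for _ in range(n_draws)]

# Same model seed: identical centroid layout on every run.
model_a = build_model(model_seed=99)
model_b = build_model(model_seed=99)

# Changing only the sample seed leaves the layout alone and changes
# only the stream of draws taken from it.
draws_1 = draw_centroid_indices(sample_seed=50, n_centroids=3)
draws_2 = draw_centroid_indices(sample_seed=51, n_centroids=3)
```

Under this assumption, fixing model_random_state while varying sample_random_state yields different samples from the same underlying concept.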

Examples

>>> # Imports
>>> from skika.data.random_rbf_generator_redund import RandomRBFGeneratorRedund
>>> # Setting up the stream
>>> stream = RandomRBFGeneratorRedund(model_random_state=99, sample_random_state=50, n_classes=4, n_features=10, perc_redund_feature=0.4, n_centroids=50)
>>> stream.prepare_for_use()
>>> # Retrieving one sample
>>> stream.next_sample()
(array([[0.44952282, 1.09201096, 0.34778443, 0.92181679, 0.19503463,
     0.28834419, 0.44952282, 0.19503463, 0.92181679, 0.19503463]]),
array([3]))
>>> # Retrieving 10 samples
>>> stream.next_sample(10)
(array([[ 0.70374896,  0.65752835,  0.20343463,  0.56136917,  0.76659286,
      0.61081231,  0.70374896,  0.76659286,  0.56136917,  0.76659286],
    [ 0.27797196,  0.05640135,  0.80946171,  0.60572837,  0.95080656,
      0.25512099,  0.27797196,  0.95080656,  0.60572837,  0.95080656],
    [ 0.33696167,  0.10923638,  0.85987231,  0.61868598,  0.85755211,
      0.19469184,  0.33696167,  0.85755211,  0.61868598,  0.85755211],
    [ 0.71886223,  0.23078927,  0.45013806,  0.03019141,  0.42679505,
      0.03841721,  0.71886223,  0.42679505,  0.03019141,  0.42679505],
    [-0.01849262,  0.92570731,  0.87564868,  0.49372553,  0.39717634,
      0.46697609, -0.01849262,  0.39717634,  0.49372553,  0.39717634],
    [ 0.81850217,  0.87228851,  0.18873385, -0.04254749,  0.06942877,
      0.55567756,  0.81850217,  0.06942877, -0.04254749,  0.06942877],
    [ 0.69888163,  0.61994977,  0.43074298,  0.27526838,  0.69566798,
      0.91059369,  0.69888163,  0.69566798,  0.27526838,  0.69566798],
    [ 1.01929588,  0.80181051,  0.50547533,  0.14715636,  0.42889167,
      0.61513174,  1.01929588,  0.42889167,  0.14715636,  0.42889167],
    [ 0.37738633,  0.60922205,  0.64216064,  0.90009707,  0.91787083,
      0.36189554,  0.37738633,  0.91787083,  0.90009707,  0.91787083],
    [ 0.62185359,  0.75178244,  1.00436662,  0.24412816,  0.41070861,
      0.52547739,  0.62185359,  0.41070861,  0.24412816,  0.41070861]]),
array([3, 3, 3, 2, 3, 2, 0, 2, 0, 2]))
>>> # Generators have an infinite number of remaining samples, so n_remaining_samples() returns -1
>>> stream.n_remaining_samples()
-1
>>> stream.has_more_samples()
True
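The redundancy is visible in the output above: with n_features=10 and perc_redund_feature=0.4, the last four columns repeat values from the first six. The short check below uses the first sample copied verbatim from that output. The helper `redundant_sources` and the split point `n_base = 6` are assumptions made for this illustration.

```python
# First sample from the documented output (values copied verbatim).
sample = [0.44952282, 1.09201096, 0.34778443, 0.92181679, 0.19503463,
          0.28834419, 0.44952282, 0.19503463, 0.92181679, 0.19503463]

def redundant_sources(row, n_base):
    """Map each column index >= n_base to the base column holding the same value."""
    base = row[:n_base]
    return {j: base.index(row[j]) for j in range(n_base, len(row)) if row[j] in base}

n_base = 6  # assumed: 10 features, 40% redundant, leaves 6 base features
mapping = redundant_sources(sample, n_base)
print(mapping)  # {6: 0, 7: 4, 8: 3, 9: 4}
```

The same column pairs match in every row of the 10-sample batch, which suggests the source index of each redundant feature is fixed for the lifetime of the stream rather than redrawn per sample.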
property feature_names

Retrieve the names of the features.

Returns

names of the features

Return type

list

generate_centroids()

Sequentially creates all the centroids, choosing at random a center, a label, a standard deviation and a weight.
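A minimal sketch of what sequential centroid creation might look like follows. The `CentroidSketch` dataclass, its field names, and the sampling ranges are assumptions chosen for illustration. The documented Centroid class only states that it stores a centroid's attributes.

```python
import random
from dataclasses import dataclass
from typing import List

@dataclass
class CentroidSketch:
    """Hypothetical container for the four attributes named in the description."""
    centre: List[float]
    class_label: int
    std_dev: float
    weight: float

def generate_centroids_sketch(n_centroids, n_features, n_classes, seed=None):
    rng = random.Random(seed)
    centroids = []
    for _ in range(n_centroids):
        centroids.append(CentroidSketch(
            centre=[rng.random() for _ in range(n_features)],  # random position in [0, 1)
            class_label=rng.randrange(n_classes),              # random class label
            std_dev=rng.random(),                              # random spread
            weight=rng.random(),                               # random selection weight
        ))
    return centroids

centroids = generate_centroids_sketch(n_centroids=5, n_features=3, n_classes=4, seed=99)
```

Because the centroids are created one after another from a single seeded generator, the same seed reproduces the same set of centroids.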

get_data_info()

Retrieves minimum information from the stream.

Used by evaluator methods to identify the stream.

The default format is: ‘Stream name - n_targets, n_classes, n_features’.

Returns

Stream data information

Return type

string

get_info()

Collects and returns the information about the configuration of the estimator.

Returns

Configuration of the estimator.

Return type

string

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

has_more_samples()

Checks if stream has more samples.

Returns

True if stream has more samples.

Return type

Boolean

is_restartable()

Determine if the stream is restartable.

Returns

True if stream is restartable.

Return type

Boolean

last_sample()

Retrieves the last batch_size samples in the stream.

Returns

A numpy.ndarray of shape (batch_size, n_features) and an array-like of shape (batch_size, n_targets), representing the last batch_size samples.

Return type

tuple or tuple list

property n_cat_features

Retrieve the number of categorical (integer-valued) features.

Returns

The number of categorical (integer-valued) features in the stream.

Return type

int

property n_features

Retrieve the number of features.

Returns

The total number of features.

Return type

int

property n_num_features

Retrieve the number of numerical features.

Returns

The number of numerical features in the stream.

Return type

int

n_remaining_samples()

Returns the estimated number of remaining samples.

Returns

Remaining number of samples. -1 if infinite (e.g. generator)

Return type

int

property n_targets

Retrieve the number of targets.

Returns

The number of targets in the stream.

Return type

int

next_sample(batch_size=1)

Return batch_size samples generated by choosing a centroid at random and randomly offsetting its attributes so that it is placed inside the hypersphere of that centroid.

Parameters

batch_size (int) – The number of samples to return.

Returns

A tuple with the features matrix and the labels matrix for the batch_size samples that were requested.

Return type

tuple or tuple list
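Because a generator never runs out of samples, a consuming loop needs its own stopping condition. The sketch below shows one way to cap the number of samples read in batches. `TinyStream` is a hypothetical stand-in that only mimics the method names documented here, so the block runs without the library installed. It is not the real generator.

```python
import random

class TinyStream:
    """Hypothetical stand-in exposing the documented method names."""

    def __init__(self, n_features=3, n_classes=2, seed=0):
        self._rng = random.Random(seed)
        self.n_features = n_features
        self.n_classes = n_classes

    def has_more_samples(self):
        return True  # a generator never runs dry

    def next_sample(self, batch_size=1):
        # Features matrix with batch_size rows, plus one label per row.
        X = [[self._rng.random() for _ in range(self.n_features)]
             for _ in range(batch_size)]
        y = [self._rng.randrange(self.n_classes) for _ in range(batch_size)]
        return X, y

stream = TinyStream()
max_samples = 25
seen = 0
while seen < max_samples and stream.has_more_samples():
    # Shrink the final batch so the cap is respected exactly.
    batch = min(10, max_samples - seen)
    X, y = stream.next_sample(batch_size=batch)
    seen += len(y)
```

With a cap of 25 and a batch size of 10, the loop reads batches of 10, 10 and 5 samples and then stops.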

property perc_redund_features

Retrieve the number of redundant features.

Returns

The total number of redundant features.

Return type

int

prepare_for_use()

Prepares the stream for use.

Notes

This function should always be called after the stream initialization.

reset()

Resets the estimator to its initial state.

Return type

self

restart()

Restart the stream.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Return type

self
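The <component>__<parameter> naming used by set_params can be illustrated with a small routing function. This is a sketch of the convention only, not the library's code: `split_nested_params` and the component name `clf` are hypothetical.

```python
def split_nested_params(params):
    """Separate an object's own parameters from those aimed at a named component."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore: "clf__alpha" -> ("clf", "__", "alpha").
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested

own, nested = split_nested_params({"n_classes": 4, "clf__alpha": 0.1, "clf__tol": 0.001})
print(own)     # {'n_classes': 4}
print(nested)  # {'clf': {'alpha': 0.1, 'tol': 0.001}}
```

Keys without a double underscore apply to the object itself, while prefixed keys are grouped by component so each group can be forwarded to that component's own set_params.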

property target_names

Retrieve the names of the targets.

Returns

The names of the targets in the stream.

Return type

list

property target_values

Retrieve all target_values in the stream for each target.

Returns

List of lists of all target_values for each target.

Return type

list