skika.data.hyper_plane_generator_redund
Classes
HyperplaneGeneratorRedund: Hyperplane stream generator.
- class skika.data.hyper_plane_generator_redund.HyperplaneGeneratorRedund(random_state=None, n_features=10, n_drift_features=2, perc_redund_feature=0, mag_change=0.0, noise_percentage=0.05, sigma_percentage=0.1)
Hyperplane stream generator.
Modified version of scikit-multiflow code to include generation of redundant attributes.
Generates the problem of predicting the class of a rotating hyperplane. It was used as a testbed for CVFDT and VFDT in [1].
A hyperplane in d-dimensional space is the set of points \(x\) that satisfy \(\sum^{d}_{i=1} w_i x_i = w_0 = \sum^{d}_{i=1} w_i\), where \(x_i\) is the ith coordinate of \(x\). Examples for which \(\sum^{d}_{i=1} w_i x_i > w_0\), are labeled positive, and examples for which \(\sum^{d}_{i=1} w_i x_i \leq w_0\), are labeled negative.
Hyperplanes are useful for simulating time-changing concepts, because we can change the orientation and position of the hyperplane in a smooth manner by changing the relative size of the weights. We introduce change to this dataset by adding drift to each weight feature \(w_i = w_i + d \sigma\), where \(\sigma\) is the probability that the direction of change is reversed and \(d\) is the change applied to every example.
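The labeling rule and the weight drift described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the helper names `label` and `drift_step` are hypothetical, the threshold `w0` is passed in explicitly, and the drift is modeled as each drifting weight moving by `d` along a direction of +1 or -1 that is reversed with probability `sigma`.

```python
import random


def label(x, w, w0):
    """Return 1 if sum_i w_i * x_i > w0, else 0 (the rule stated above)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > w0 else 0


def drift_step(w, direction, n_drift, d, sigma, rng):
    """Move the first n_drift weights by d along their current direction.

    Assumption: each drifting weight keeps a direction of +1 or -1, and
    that direction is reversed with probability sigma after the update.
    """
    for i in range(n_drift):
        w[i] += direction[i] * d
        if rng.random() < sigma:
            direction[i] = -direction[i]
    return w, direction


rng = random.Random(0)
w = [0.3, 0.7, 0.5]        # illustrative weights
direction = [1, 1, 1]      # all weights start by increasing

# With sigma=0.0 the direction is never reversed, so only the first
# two weights move, each by exactly d.
w, direction = drift_step(w, direction, n_drift=2, d=0.1, sigma=0.0, rng=rng)
```

With `sigma=1.0` the direction of every drifting weight would be reversed after each step, so the weights would oscillate instead of moving steadily.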
- Parameters
random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
n_features (int (Default: 10)) – The number of attributes to generate. Higher than 2.
n_drift_features (int (Default: 2)) – The number of attributes with drift. Higher than 2.
perc_redund_feature (float (Default: 0.0)) – The percentage of features to be redundant. From 0.0 to 1.0.
mag_change (float (Default: 0.0)) – Magnitude of the change for every example. From 0.0 to 1.0.
noise_percentage (float (Default: 0.05)) – Percentage of noise to add to the data. From 0.0 to 1.0.
sigma_percentage (float (Default: 0.1)) – Probability that the direction of change is reversed. From 0.0 to 1.0.
References
[1] G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In KDD’01, pages 97–106, San Francisco, CA, 2001. ACM Press.
- property feature_names
Retrieve the names of the features.
- Returns
names of the features
- Return type
list
- get_data_info()
Retrieves minimum information from the stream.
Used by evaluator methods to identify the stream.
The default format is: ‘Stream name - n_targets, n_classes, n_features’.
- Returns
Stream data information
- Return type
string
- get_info()
Collects and returns the information about the configuration of the estimator.
- Returns
Configuration of the estimator.
- Return type
string
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
mapping of string to any
- has_more_samples()
Checks if stream has more samples.
- Returns
True if stream has more samples.
- Return type
Boolean
- is_restartable()
Determine if the stream is restartable.
- Returns
True if stream is restartable.
- Return type
Boolean
- last_sample()
Retrieves the last batch_size samples in the stream.
- Returns
A numpy.ndarray of shape (batch_size, n_features) and an array-like of shape (batch_size, n_targets), representing the last batch_size samples.
- Return type
tuple or tuple list
- property mag_change
Retrieve the value of the magnitude of change.
- Returns
magnitude of change
- Return type
float
- property n_cat_features
Retrieve the number of integer features.
- Returns
The number of integer features in the stream.
- Return type
int
- property n_drift_features
Retrieve the number of drift features.
- Returns
The total number of drift features.
- Return type
int
- property n_features
Retrieve the number of features.
- Returns
The total number of features.
- Return type
int
- property n_num_features
Retrieve the number of numerical features.
- Returns
The number of numerical features in the stream.
- Return type
int
- n_remaining_samples()
Returns the estimated number of remaining samples.
- Returns
Remaining number of samples. -1 if infinite (e.g. generator)
- Return type
int
- property n_targets
Retrieve the number of targets.
- Returns
the number of targets in the stream.
- Return type
int
- next_sample(batch_size=1)
The sample generation works as follows: the features are generated with the random generator, initialized with the seed passed by the user. The classification function then decides, as a function of the weighted feature sum and the sum of the weights, whether the instance belongs to class 0 or class 1. The next step is to add noise, if requested by the user, and then generate drift.
- Parameters
batch_size (int) – The number of samples to return.
- Returns
Return a tuple with the features matrix and the labels matrix for the batch_size samples that were requested.
- Return type
tuple or tuple list
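The first three steps of the generation procedure (random features, classification, optional noise) can be sketched with the standard library alone. This is a hedged illustration, not the generator's actual code: `next_batch` is a hypothetical helper, the threshold `w0` is supplied by the caller, noise is assumed to mean flipping the label with probability `noise`, and the drift step is omitted for brevity.

```python
import random


def next_batch(rng, w, w0, batch_size=1, noise=0.05):
    """Return (X, y) for batch_size samples of a hyperplane concept."""
    X, y = [], []
    for _ in range(batch_size):
        # Step 1: draw each feature uniformly from [0, 1).
        x = [rng.random() for _ in w]
        # Step 2: classify by comparing the weighted sum to the threshold.
        s = sum(wi * xi for wi, xi in zip(w, x))
        cls = 1 if s > w0 else 0
        # Step 3 (assumed noise model): flip the label with probability noise.
        if rng.random() < noise:
            cls = 1 - cls
        X.append(x)
        y.append(cls)
    return X, y


# Weights and threshold chosen for illustration only.
X, y = next_batch(random.Random(1), w=[0.5, 0.5], w0=0.5, batch_size=4, noise=0.0)
```

The result mirrors the documented return value: a features matrix with `batch_size` rows and one label per row.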
- property noise_percentage
Retrieve the value of the noise percentage.
- Returns
percentage of the noise
- Return type
float
- property perc_redund_features
Retrieve the number of redundant features.
- Returns
The total number of redundant features.
- Return type
int
- prepare_for_use()
Prepares the stream for use.
Notes
This function should always be called after the stream initialization.
- reset()
Resets the estimator to its initial state.
- Return type
self
- restart()
Restart the stream.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it’s possible to update each component of a nested object.
- Return type
self
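The `<component>__<parameter>` naming can be illustrated with a small stand-alone sketch. The classes `Component` and `Wrapper` below are hypothetical and exist only for this example; they show one way a double-underscore key can be split and routed to a nested object, and are not the estimator's own implementation.

```python
class Component:
    """Hypothetical object with flat parameters, for illustration only."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class Wrapper:
    """Hypothetical nested object holding a Component as `step`."""

    def __init__(self, step):
        self.step = step

    def set_params(self, **params):
        for key, value in params.items():
            # Split "step__alpha" into ("step", "__", "alpha").
            name, delim, sub_key = key.partition("__")
            if delim:
                # Forward the remainder of the key to the nested component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


# set_params returns self, so the call can be chained.
wrapper = Wrapper(Component()).set_params(step__alpha=0.5)
```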
- property sigma_percentage
Retrieve the value of the sigma percentage.
- Returns
percentage of the sigma
- Return type
float
- property target_names
Retrieve the names of the targets.
- Returns
the names of the targets in the stream.
- Return type
list
- property target_values
Retrieve all target_values in the stream for each target.
- Returns
list of lists of all target_values for each target
- Return type
list