`module` `feature_selection.fsds`

Fast Feature Selection on Data Streams Method.

This module contains the Fast Feature Selection on Data Streams (FSDS) model, which selects features via a matrix sketching algorithm and requires no supervision. The method was introduced in: HUANG, Hao; YOO, Shinjae; KASIVISWANATHAN, Shiva Prasad. Unsupervised feature selection on data streams. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. 2015. pp. 1031-1040.

`class` `FSDS`

FSDS feature selector.

This code is adapted from the authors' official Python implementation, with minor changes.

`method` `FSDS.__init__`

__init__(
    n_total_features: int,
    n_selected_features: int,
    l: int = 0,
    m: Optional[int] = None,
    B: Optional[Union[list, numpy.typing.ArrayLike]] = None,
    k: int = 2,
    reset_after_drift: bool = False,
    baseline: str = 'constant',
    ref_sample: Union[float, numpy.typing.ArrayLike] = 0
)

Initializes the feature selector.

Args:

n_total_features: The total number of features.
n_selected_features: The number of selected features.
l: Size of the matrix sketch with l << m.
m: Size of the feature space (i.e. dimensionality).
B: Matrix sketch.
k: Number of singular vectors.
reset_after_drift: A boolean indicating whether the change detector is reset after a drift has been detected.
baseline: A string identifier of the baseline method. The baseline is the value substituted for non-selected features. This is necessary because most online learning models cannot handle arbitrary patterns of missing data.
ref_sample: A sample used to compute the baseline. If the constant baseline is used, provide a single float value.
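
The role of `n_selected_features`, `baseline` and `ref_sample` can be illustrated with a minimal NumPy sketch of the constant baseline. It is not the library's code: the helper name `select_with_constant_baseline` is hypothetical, and choosing the features with the largest weights is an assumption about how selection works.

```python
import numpy as np


def select_with_constant_baseline(X, weights, n_selected_features, ref_sample=0.0):
    """Hypothetical helper: keep the top-weighted features, fill the rest with a constant."""
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Assumption: the features with the largest weights are the selected ones.
    selected = np.argsort(weights)[::-1][:n_selected_features]

    # Every non-selected feature is replaced by the constant baseline value,
    # so the downstream model still receives a full-width input.
    X_new = np.full_like(X, ref_sample)
    X_new[:, selected] = X[:, selected]
    return X_new, np.sort(selected)


X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
X_new, selected = select_with_constant_baseline(
    X, weights=[0.1, 0.9, 0.5], n_selected_features=2, ref_sample=0.0
)
```

Here the first feature has the smallest weight, so its column is overwritten with `ref_sample` while the other two columns pass through unchanged.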

`method` `FSDS.reset`

reset()

Resets the feature selector.

`method` `FSDS.weight_features`

weight_features(
    X: numpy.typing.ArrayLike,
    y: numpy.typing.ArrayLike
)

Updates feature weights.

FSDS is an unsupervised approach and does not use the target information.

Args:

X: Array/matrix of observations.
y: Array of corresponding labels.
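
To show how the parameters `l`, `k` and `B` could interact in an unsupervised weight update, here is a rough NumPy sketch in the spirit of a Frequent Directions matrix sketch followed by a ridge-regularized projection onto the top `k` singular vectors. It is an illustration under stated assumptions, not the library's implementation: the function name `sketch_feature_weights`, the default `l = int(sqrt(m))`, and the regularization strength `alpha` are all assumptions. Labels are never used.

```python
import numpy as np


def sketch_feature_weights(X, B=None, l=0, k=2):
    """Hypothetical helper: one unsupervised weighting step on a batch X (n_samples x m).

    Assumes k < l and that the batch holds at least k samples.
    """
    Yt = np.asarray(X, dtype=float).T  # rows represent features: m x n
    m = Yt.shape[0]
    if l < 1:
        l = int(np.sqrt(m))  # assumed default sketch size, with l << m

    # Combine the previous sketch (if any) with the new batch.
    C = Yt if B is None else np.hstack((B, Yt))

    # Keep at most l left singular vectors and singular values.
    U, s, _ = np.linalg.svd(C, full_matrices=False)
    U, s = U[:, :l], s[:l]

    # Frequent Directions shrink step: subtract the smallest retained energy.
    s = np.sqrt(np.maximum(s ** 2 - s[-1] ** 2, 0.0))
    B_new = U * s  # updated m x l sketch

    # Ridge-style scaling of the top-k directions (alpha is an assumed choice).
    alpha = 8.0 * s[k - 1]
    d = s[:k] / (s[:k] ** 2 + alpha)
    W = U[:, :k] * d

    # One non-negative weight per feature: largest absolute loading.
    weights = np.abs(W).max(axis=1)
    return weights, B_new


rng = np.random.default_rng(0)
weights, B = sketch_feature_weights(rng.normal(size=(10, 16)), k=2)
weights, B = sketch_feature_weights(rng.normal(size=(10, 16)), B=B, k=2)
```

Passing the returned sketch back in as `B` on the next batch is what keeps memory bounded: only an `m x l` matrix is carried between updates, regardless of how many observations have been seen.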

This file was automatically generated via lazydocs.
