module feature_selection.base_feature_selector
Base Online Feature Selector.
This module provides functionality for online feature weighting and selection. The abstract BaseFeatureSelector class should be used as the superclass for all online feature selection methods.
Copyright (C) 2022 Johannes Haug.
class BaseFeatureSelector
Abstract base class for online feature selection methods.
Attributes:
n_total_features (int): The total number of features.
n_selected_features (int): The number of selected features.
supports_multi_class (bool): True if the feature selection model supports multi-class classification, False otherwise.
reset_after_drift (bool): Whether the change detector is reset after a drift has been detected.
baseline (str): A string identifier of the baseline method. The baseline is the value substituted for non-selected features. This is necessary because most online learning models cannot handle arbitrary patterns of missing data.
ref_sample (ArrayLike | float): A sample used to compute the baseline. If the constant baseline is used, a single float value must be provided.
weights (ArrayLike): The current (raw) feature weights.
selected_features (ArrayLike): The indices of all currently selected features.
weights_history (List[list]): A list of all absolute feature weight vectors obtained over time.
selected_features_history (List[list]): A list of all selected feature vectors obtained over time.
method BaseFeatureSelector.__init__
__init__(
n_total_features: int,
n_selected_features: int,
supports_multi_class: bool,
reset_after_drift: bool,
baseline: str,
ref_sample: Union[float, ArrayLike]
)
Initializes the feature selector.
Args:
n_total_features: The total number of features.
n_selected_features: The number of selected features.
supports_multi_class: True if the feature selection model supports multi-class classification, False otherwise.
reset_after_drift: Whether the change detector is reset after a drift has been detected.
baseline: A string identifier of the baseline method. The baseline is the value substituted for non-selected features. This is necessary because most online learning models cannot handle arbitrary patterns of missing data.
ref_sample: A sample used to compute the baseline. If the constant baseline is used, a single float value must be provided.
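The relationship between baseline and ref_sample can be illustrated with a small standalone sketch. It is not the library's implementation: the helper name make_baseline and the identifiers "constant", "expectation" and "gaussian" are assumptions chosen for illustration, and the actual identifiers accepted by BaseFeatureSelector may differ.

```python
import numpy as np


def make_baseline(X, baseline, ref_sample, rng):
    """Return an array shaped like X that holds baseline values (illustrative only)."""
    X = np.asarray(X, dtype=float)
    if baseline == "constant":
        # A single float is repeated for every cell.
        return np.full(X.shape, float(ref_sample))
    ref = np.asarray(ref_sample, dtype=float)
    if baseline == "expectation":
        # Per-feature mean of the reference sample, repeated for each row of X.
        return np.tile(ref.mean(axis=0), (X.shape[0], 1))
    if baseline == "gaussian":
        # Random draws around the per-feature mean and standard deviation.
        return rng.normal(ref.mean(axis=0), ref.std(axis=0), size=X.shape)
    raise ValueError(f"Unknown baseline identifier: {baseline!r}")


rng = np.random.default_rng(0)
X = np.ones((2, 3))
reference = np.array([[1.0, 2.0, 3.0],
                      [3.0, 4.0, 5.0]])
constant_baseline = make_baseline(X, "constant", 0.0, rng)
expectation_baseline = make_baseline(X, "expectation", reference, rng)
```

With the constant variant a single float is enough, which matches the note on ref_sample above; the other variants need an array with one column per feature.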
method BaseFeatureSelector.reset
reset()
Resets the feature selector.
method BaseFeatureSelector.select_features
select_features(
X: ArrayLike,
rng: numpy.random.Generator
) → ArrayLike
Selects the features with the highest absolute weights.
Args:
X: Array/matrix of observations.
rng: A numpy random number generator object.
Returns:
ArrayLike: The observation array/matrix where all non-selected features have been replaced by the baseline value.
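The selection step described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the class's actual code: the function name select_top_k and its baseline_value parameter are hypothetical, and a constant baseline is assumed for simplicity.

```python
import numpy as np


def select_top_k(X, weights, k, baseline_value=0.0):
    """Keep the k features with the largest absolute weights; fill the rest with a constant."""
    X = np.asarray(X, dtype=float)
    abs_weights = np.abs(np.asarray(weights, dtype=float))
    # Indices of the k largest absolute weights, in descending order of weight.
    selected = np.argsort(-abs_weights, kind="stable")[:k]
    # Start from the baseline and copy over only the selected columns.
    X_new = np.full(X.shape, baseline_value)
    X_new[:, selected] = X[:, selected]
    return X_new, selected


X = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0]])
weights = np.array([0.1, -0.9, 0.3, 0.5])
X_sel, selected = select_top_k(X, weights, k=2)
# Columns 1 and 3 are kept; columns 0 and 2 are replaced by 0.0.
```

The output keeps the shape of X, so a downstream online learner sees the same number of columns regardless of which features are currently selected.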
method BaseFeatureSelector.weight_features
weight_features(
X: ArrayLike,
y: ArrayLike
)
Updates feature weights.
Args:
X: Array/matrix of observations.
y: Array of corresponding labels.
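How weights are updated depends on the concrete feature selection method. As a hedged illustration of what a weight_features implementation might look like, the standalone class below keeps an exponentially decayed estimate of each feature's covariance with the label. The class name CovarianceWeighter and the decay parameter are hypothetical; the class deliberately does not inherit from BaseFeatureSelector so that the sketch runs without the library.

```python
import numpy as np


class CovarianceWeighter:
    """Toy online weighting scheme: decayed per-feature covariance with the label."""

    def __init__(self, n_total_features, decay=0.9):
        self.weights = np.zeros(n_total_features)
        self.decay = decay

    def weight_features(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Covariance of every feature with the label within the current batch.
        batch_cov = ((X - X.mean(axis=0)) * (y - y.mean())[:, None]).mean(axis=0)
        # Blend the batch estimate into the running weights.
        self.weights = self.decay * self.weights + (1.0 - self.decay) * batch_cov


weighter = CovarianceWeighter(n_total_features=2)
X_batch = np.array([[0.0, 1.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0]])
y_batch = np.array([0, 1, 0, 1])
weighter.weight_features(X_batch, y_batch)
# The first feature tracks the label and receives a positive weight;
# the second feature is constant and keeps a weight of zero.
```

Because the weights are signed, selecting by absolute value treats strongly negative and strongly positive associations as equally informative.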