module feature_selection.base_feature_selector

Base Online Feature Selector.

This module provides functionality for online feature weighting and selection. The abstract BaseFeatureSelector class should be used as the superclass for all online feature selection methods.

Copyright (C) 2022 Johannes Haug.


class BaseFeatureSelector

Abstract base class for online feature selection methods.

Attributes:

  • n_total_features (int): The total number of features.
  • n_selected_features (int): The number of selected features.
  • supports_multi_class (bool): True if the feature selection model supports multi-class classification, False otherwise.
  • reset_after_drift (bool): Whether the feature selector is reset after a drift has been detected.
  • baseline (str): A string identifier of the baseline method. The baseline is the value substituted for non-selected features. This is necessary because most online learning models cannot handle arbitrary patterns of missing data.
  • ref_sample (ArrayLike | float): A sample used to compute the baseline. If the constant baseline is used, provide a single float value.
  • weights (ArrayLike): The current (raw) feature weights.
  • selected_features (ArrayLike): The indices of all currently selected features.
  • weights_history (List[list]): A list of all absolute feature weight vectors obtained over time.
  • selected_features_history (List[list]): A list of all selected feature vectors obtained over time.
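
The relationship between the raw weights, the selected indices, and the two history lists can be sketched as follows. This is a minimal illustration, not the library's implementation: the weight values are made up, and the assumption that one entry is appended per update is ours.

```python
import numpy as np

n_selected_features = 2

weights_history = []            # absolute weight vectors, one per update
selected_features_history = []  # selected index vectors, one per update

# Two hypothetical rounds of raw feature weights (values are made up).
rounds = (
    np.array([0.1, -0.9, 0.3, 0.0, 0.5]),
    np.array([0.7, -0.2, 0.3, -0.8, 0.1]),
)

for weights in rounds:
    # The history stores absolute weights, while `weights` itself stays raw.
    abs_weights = np.abs(weights)
    # Indices of the n_selected_features largest absolute weights.
    selected_features = np.argsort(abs_weights)[::-1][:n_selected_features]
    weights_history.append(abs_weights.tolist())
    selected_features_history.append(selected_features.tolist())

print(selected_features_history)
```

Storing plain lists rather than arrays matches the documented `List[list]` type of both history attributes.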

method BaseFeatureSelector.__init__

__init__(
    n_total_features: int,
    n_selected_features: int,
    supports_multi_class: bool,
    reset_after_drift: bool,
    baseline: str,
    ref_sample: Union[float, ArrayLike]
)

Initializes the feature selector.

Args:

  • n_total_features: The total number of features.
  • n_selected_features: The number of selected features.
  • supports_multi_class: True if the feature selection model supports multi-class classification, False otherwise.
  • reset_after_drift: Whether the feature selector is reset after a drift has been detected.
  • baseline: A string identifier of the baseline method. The baseline is the value substituted for non-selected features. This is necessary because most online learning models cannot handle arbitrary patterns of missing data.
  • ref_sample: A sample used to compute the baseline. If the constant baseline is used, provide a single float value.

method BaseFeatureSelector.reset

reset()

Resets the feature selector.
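
How `reset` might interact with `reset_after_drift` can be sketched with a toy class. Everything here is an assumption for illustration: `ToySelector` is a hypothetical stand-in (not BaseFeatureSelector), the drift flag would normally come from a change detector, and zeroing the weights is only one plausible reset behavior.

```python
class ToySelector:
    """Hypothetical stand-in; not the library's BaseFeatureSelector."""

    def __init__(self, n_total_features, reset_after_drift):
        self.n_total_features = n_total_features
        self.reset_after_drift = reset_after_drift
        self.weights = [0.0] * n_total_features

    def reset(self):
        # Assumed behavior: discard learned weights and start from zero.
        self.weights = [0.0] * self.n_total_features


selector = ToySelector(n_total_features=3, reset_after_drift=True)
selector.weights = [0.4, 0.1, 0.9]  # pretend these were learned

drift_detected = True  # assumed to be reported by some change detector
if drift_detected and selector.reset_after_drift:
    selector.reset()

print(selector.weights)
```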


method BaseFeatureSelector.select_features

select_features(
    X: ArrayLike,
    rng: numpy.random.Generator
) → ArrayLike

Selects the features with the highest absolute weights.

Args:

  • X: Array/matrix of observations.
  • rng: A NumPy random number generator object.

Returns:

  • ArrayLike: The observation array/matrix where all non-selected features have been replaced by the baseline value.
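
The selection step described above can be sketched for the constant baseline. This is an illustration under stated assumptions, not the library's code: the function name `select_with_constant_baseline` and its parameters are hypothetical, `X` is assumed to be two-dimensional, and the `rng` argument is left out because a constant baseline needs no randomness.

```python
import numpy as np


def select_with_constant_baseline(X, weights, n_selected_features, baseline_value=0.0):
    """Keep the top-weighted columns of X and overwrite the rest (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    abs_weights = np.abs(np.asarray(weights, dtype=float))
    # Indices of the features with the largest absolute weights.
    selected = np.argsort(abs_weights)[::-1][:n_selected_features]
    # Start from an array filled with the baseline, then copy the selected columns back.
    X_new = np.full_like(X, baseline_value)
    X_new[:, selected] = X[:, selected]
    return X_new, np.sort(selected)


X = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0]])
weights = np.array([0.2, -0.9, 0.1, 0.6])

X_sel, selected = select_with_constant_baseline(X, weights, n_selected_features=2)
print(selected)  # indices of the kept columns
print(X_sel)     # same shape as X; non-selected columns hold the baseline
```

Because the output keeps the shape of `X`, a downstream online model sees the same number of features at every step, which is the reason the baseline substitution exists.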

method BaseFeatureSelector.weight_features

weight_features(
    X: ArrayLike,
    y: ArrayLike
)

Updates feature weights.

Args:

  • X: Array/matrix of observations.
  • y: Array of corresponding labels.
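
A concrete selector would override `weight_features`. The sketch below shows the general shape with a deliberately simple weighting rule. `SelectorSketch` and `MeanGapSelector` are hypothetical names, the stand-in base class only mirrors the attributes used here, and the class-mean gap is a toy criterion for binary labels, not a method taken from the library.

```python
from abc import ABC, abstractmethod

import numpy as np


class SelectorSketch(ABC):
    """Hypothetical stand-in exposing only the pieces used in this example."""

    def __init__(self, n_total_features):
        self.n_total_features = n_total_features
        self.weights = np.zeros(n_total_features)
        self.weights_history = []

    @abstractmethod
    def weight_features(self, X, y):
        """Update self.weights from a batch of observations and labels."""


class MeanGapSelector(SelectorSketch):
    """Toy rule: weight = difference of per-class feature means (binary labels)."""

    def weight_features(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        # Assumes the batch contains at least one sample of each class.
        gap = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
        self.weights = gap                                 # raw, signed weights
        self.weights_history.append(np.abs(gap).tolist())  # absolute weights


selector = MeanGapSelector(n_total_features=2)
selector.weight_features(X=[[1, 4], [3, 4], [0, 5], [0, 5]], y=[1, 1, 0, 0])
print(selector.weights)
```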
