module prediction.dynamic_model_tree

Dynamic Model Tree Classifier.

This module contains an implementation of the Dynamic Model Tree classification framework proposed in:

Haug, Johannes; Broelemann, Klaus; Kasneci, Gjergji. Dynamic Model Tree for Interpretable Data Stream Learning. In: 38th IEEE International Conference on Data Engineering, DOI: 10.1109/ICDE53745.2022.00237, 2022.

Copyright (C) 2022 Johannes Haug.


class DynamicModelTreeClassifier

Dynamic Model Tree Classifier.

This implementation of the DMT uses linear (logit) simple models and the negative log-likelihood loss, as described in the corresponding paper.

Attributes:

  • classes (List): List of the target classes.
  • learning_rate (float): Learning rate of the linear models.
  • penalty_term (float): Regularization term for the linear model (0 = no regularization penalty).
  • penalty (str): String identifier of the type of regularization used by the linear model. Either 'l1', 'l2', or 'elasticnet' (see documentation of sklearn SGDClassifier).
  • epsilon (float): Threshold required before attempting to split or prune based on the Akaike Information Criterion. The smaller the epsilon-threshold, the stronger the evidence for splitting/pruning must be. Choose 0 < epsilon <= 1.
  • n_saved_candidates (int): Max. number of saved split candidates per node.
  • p_replaceable_candidates (float): Max. percent of saved split candidates that can be replaced by new/better candidates per training iteration.
  • cat_features (List[int]): List of indices (pos. in the feature vector) corresponding to categorical features.
  • root (Node): Root node of the Dynamic Model Tree.

method DynamicModelTreeClassifier.__init__

__init__(
    classes: List,
    learning_rate: float = 0.05,
    penalty_term: float = 0,
    penalty: str = 'l2',
    epsilon: float = 1e-07,
    n_saved_candidates: int = 100,
    p_replaceable_candidates: float = 0.5,
    cat_features: Optional[List[int]] = None,
    reset_after_drift: Optional[bool] = False
)

Inits the DMT.

Args:

  • classes: List of the target classes.
  • learning_rate: Learning rate of the linear models.
  • penalty_term: Regularization term for the linear model (0 = no regularization penalty).
  • penalty: String identifier of the type of regularization used by the linear model. Either 'l1', 'l2', or 'elasticnet' (see documentation of sklearn SGDClassifier).
  • epsilon: Threshold required before attempting to split or prune based on the Akaike Information Criterion. The smaller the epsilon-threshold, the stronger the evidence for splitting/pruning must be. Choose 0 < epsilon <= 1.
  • n_saved_candidates: Max. number of saved split candidates per node.
  • p_replaceable_candidates: Max. percent of saved split candidates that can be replaced by new/better candidates per training iteration.
  • cat_features: List of indices (pos. in the feature vector) corresponding to categorical features.
  • reset_after_drift: A boolean indicating whether the predictor will be reset after a drift was detected. Note that the DMT automatically adjusts to concept drift and thus generally does not need to be reset.

method DynamicModelTreeClassifier.n_nodes

n_nodes() → Tuple[int, int, int]

Returns the number of nodes, leaves and the depth of the DMT.

Returns:

  • int: Total number of nodes.
  • int: Total number of leaves.
  • int: Depth (where a single root node has depth = 1).

method DynamicModelTreeClassifier.partial_fit

partial_fit(X: ArrayLike, y: ArrayLike)

Updates the predictor.

Args:

  • X: Array/matrix of observations.
  • y: Array of corresponding labels.
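Each update trains the linear (logit) models along the observation's path with the negative log-likelihood loss. As a rough illustration of what a single incremental step involves, here is one SGD update for binary logistic regression in plain Python — a simplified sketch, not the library's implementation (the actual DMT additionally maintains split-candidate statistics per node):

```python
import math
from typing import List, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(w: List[float], b: float, x: List[float], y: int,
             learning_rate: float = 0.05,
             penalty_term: float = 0.0) -> Tuple[List[float], float]:
    """One gradient step on the negative log-likelihood of (x, y), y in {0, 1},
    with optional L2 penalty (penalty_term = 0 means no regularization)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    err = p - y  # gradient of the NLL w.r.t. the linear score
    w = [wi - learning_rate * (err * xi + penalty_term * wi)
         for wi, xi in zip(w, x)]
    b = b - learning_rate * err
    return w, b

# Repeated updates on a positive observation drive its probability toward 1.
w, b = [0.0, 0.0], 0.0
for _ in range(200):
    w, b = sgd_step(w, b, [1.0, 2.0], 1)
p = sigmoid(w[0] * 1.0 + w[1] * 2.0 + b)
print(round(p, 3))
```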

method DynamicModelTreeClassifier.predict

predict(X: ArrayLike) → ArrayLike

Predicts the target values.

Args:

  • X: Array/matrix of observations.

Returns:

  • ArrayLike: Predicted labels for all observations.

method DynamicModelTreeClassifier.predict_proba

predict_proba(X: ArrayLike) → ArrayLike

Predicts the probability of target values.

Args:

  • X: Array/matrix of observations.

Returns:

  • ArrayLike: Predicted probability per class label for all observations.
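At a leaf, the linear model turns its per-class scores into a probability distribution. A multinomial-logit sketch of that final step (the helper below is illustrative, not part of the library API):

```python
import math
from typing import List

def softmax(scores: List[float]) -> List[float]:
    """Convert per-class linear scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three-class example: the class with the highest score gets the highest probability.
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
```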

method DynamicModelTreeClassifier.reset

reset()

Resets the predictor.


class Node

Node of the Dynamic Model Tree.

Attributes:

  • classes (List): List of the target classes.
  • n_features (int): Number of input features.
  • learning_rate (float): Learning rate of the linear models.
  • penalty_term (float): Regularization term for the linear model (0 = no regularization penalty).
  • penalty (str): String identifier of the type of regularization used by the linear model. Either 'l1', 'l2', or 'elasticnet' (see documentation of sklearn SGDClassifier).
  • epsilon (float): Threshold required before attempting to split or prune based on the Akaike Information Criterion. The smaller the epsilon-threshold, the stronger the evidence for splitting/pruning must be. Choose 0 < epsilon <= 1.
  • n_saved_candidates (int): Max. number of saved split candidates per node.
  • p_replaceable_candidates (float): Max. percent of saved split candidates that can be replaced by new/better candidates per training iteration.
  • cat_features (List[int]): List of indices (pos. in the feature vector) corresponding to categorical features.
  • linear_model (Any): Linear (logit) model trained at the node.
  • log_likelihood (ArrayLike): Log-likelihood given observations that reached the node.
  • counts_left (dict): Number of observations per split candidate falling to the left child.
  • log_likelihoods_left (dict): Log-likelihoods of the left child per split candidate.
  • gradients_left (dict): Gradients of the left child per split candidate.
  • counts_right (dict): Number of observations per split candidate falling to the right child.
  • log_likelihoods_right (dict): Log-likelihoods of the right child per split candidate.
  • gradients_right (dict): Gradients of the right child per split candidate.
  • children (List[Node]): List of child nodes.
  • split (tuple): Feature/value combination used for splitting.
  • is_leaf (bool): Indicator of whether the node is a leaf.

method Node.__init__

__init__(
    classes: List,
    n_features: int,
    learning_rate: float,
    penalty_term: float,
    penalty: str,
    epsilon: float,
    n_saved_candidates: int,
    p_replaceable_candidates: float,
    cat_features: List[int]
)

Inits Node.

Args:

  • classes: List of the target classes.
  • n_features: Number of input features.
  • learning_rate: Learning rate of the linear models.
  • penalty_term: Regularization term for the linear model (0 = no regularization penalty).
  • penalty: String identifier of the type of regularization used by the linear model. Either 'l1', 'l2', or 'elasticnet' (see documentation of sklearn SGDClassifier).
  • epsilon: Threshold required before attempting to split or prune based on the Akaike Information Criterion. The smaller the epsilon-threshold, the stronger the evidence for splitting/pruning must be. Choose 0 < epsilon <= 1.
  • n_saved_candidates: Max. number of saved split candidates per node.
  • p_replaceable_candidates: Max. percent of saved split candidates that can be replaced by new/better candidates per training iteration.
  • cat_features: List of indices (pos. in the feature vector) corresponding to categorical features.

method Node.predict_observation

predict_observation(x: ArrayLike, get_prob: bool = False) → ArrayLike

Predicts one observation (recursive function).

Passes an observation down the tree until a leaf is reached and makes the prediction at that leaf.

Args:

  • x: Observation.
  • get_prob: Indicator whether to return class probabilities.

Returns:

  • ArrayLike: Predicted class label/probability of the given observation.
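The routing itself is a short recursion: at an inner node, compare the observation's value at the split feature against the split value and descend into the matching child; at a leaf, delegate to the node's model. A schematic version with a numeric threshold split (names and structure are illustrative, not the library's internals):

```python
from typing import Callable, List, Optional, Tuple

class _TreeNode:
    def __init__(self, predict_fn: Optional[Callable] = None):
        self.children: List["_TreeNode"] = []           # [left, right] for inner nodes
        self.split: Optional[Tuple[int, float]] = None  # (feature index, split value)
        self.is_leaf = True
        self.predict_fn = predict_fn                    # leaf model's prediction

def predict_observation(node: _TreeNode, x: List[float]):
    if node.is_leaf:
        return node.predict_fn(x)
    feature, value = node.split
    # Numeric split: go left if x[feature] <= value, else right.
    child = node.children[0] if x[feature] <= value else node.children[1]
    return predict_observation(child, x)

# Tiny tree: split on feature 0 at 0.5; each leaf returns a constant label.
root = _TreeNode()
root.is_leaf = False
root.split = (0, 0.5)
root.children = [_TreeNode(lambda x: 0), _TreeNode(lambda x: 1)]
print(predict_observation(root, [0.2]))  # routes left, prints 0
print(predict_observation(root, [0.9]))  # routes right, prints 1
```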

method Node.update

update(X: ArrayLike, y: ArrayLike)

Updates the node and all descendants.

Updates the parameters of the simple (linear) model at the given node. If the node is an inner node, we attempt to split on a different feature or to replace the inner node by a leaf, thereby pruning all previous children/sub-branches. If the node is a leaf, we attempt to split.

Args:

  • X: Array/matrix of observations.
  • y: Array of corresponding labels.
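The split/prune decision compares models via the Akaike Information Criterion, AIC = 2k − 2·log-likelihood, where k is the number of free parameters. Roughly, a leaf is split only when the evidence in favor of the candidate split is strong enough relative to the epsilon threshold. The sketch below shows the flavor of such a comparison under those assumptions; it is simplified and not the paper's exact statistical test, which also uses the gradient statistics stored per candidate:

```python
import math

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

def should_split(ll_leaf: float, ll_left: float, ll_right: float,
                 n_params_per_model: int, epsilon: float = 1e-7) -> bool:
    """Split if the current leaf model is sufficiently unlikely to be the better
    description of the data than the two child models combined.
    exp((AIC_split - AIC_leaf) / 2) estimates the leaf's relative likelihood."""
    aic_leaf = aic(ll_leaf, n_params_per_model)
    aic_split = aic(ll_left + ll_right, 2 * n_params_per_model)
    if aic_split >= aic_leaf:
        return False  # the split does not even look better than the leaf
    # The smaller epsilon, the stronger the evidence required before splitting.
    return math.exp((aic_split - aic_leaf) / 2) < epsilon

# The split clearly improves the log-likelihood: split.
print(should_split(-120.0, -40.0, -45.0, n_params_per_model=5))  # True
# The improvement does not offset the extra parameters: keep the leaf.
print(should_split(-100.0, -49.0, -48.0, n_params_per_model=5))  # False
```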

This file was automatically generated via lazydocs.