module data.data_loader

Data Loader.

This module encapsulates functionality to load and preprocess input data. The data loader class uses the scikit-multiflow Stream class to simulate streaming data.

Copyright (C) 2022 Johannes Haug.


class DataLoader

Data Loader Class.

The data loader class samples and pre-processes (i.e., normalizes) input data, thereby simulating a data stream. It uses a skmultiflow Stream object to generate or load streaming data.

Attributes:

  • path (str | None): The path to a .csv file containing the training data set.
  • stream (Stream | None): A scikit-multiflow data stream object.
  • target_col (int): The index of the target column in the training data.
  • scaler (BaseScaler | None): A scaler object used to normalize/standardize sampled instances.

method DataLoader.__init__

__init__(
    path: Optional[str] = None,
    stream: Optional[skmultiflow.data.base_stream.Stream] = None,
    target_col: int = -1,
    scaler: Optional[float.data.preprocessing.base_scaler.BaseScaler] = None
)

Initializes the data loader.

The init function must receive one of the following inputs: (1) the path to a .csv file (plus a target index), which is then mapped to a skmultiflow FileStream object, or (2) a valid scikit-multiflow Stream object.

Args:

  • path: The path to a .csv file containing the training data set.
  • stream: A scikit-multiflow data stream object.
  • target_col: The index of the target column in the training data.
  • scaler: A scaler object used to normalize/standardize sampled instances.
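
As a rough illustration of the either/or contract described above, the following sketch uses a hypothetical stand-in class (`SketchLoader`, not part of the library) that only checks that a path or a stream was supplied. Raising `ValueError` is an assumption; the actual DataLoader may validate its inputs differently, and `"data.csv"` is a placeholder file name.

```python
from typing import Any, Optional


class SketchLoader:
    """Hypothetical stand-in that mirrors only the documented constructor arguments."""

    def __init__(
        self,
        path: Optional[str] = None,
        stream: Optional[Any] = None,
        target_col: int = -1,
        scaler: Optional[Any] = None,
    ) -> None:
        # Assumption: at least one data source (path or stream) is required.
        if path is None and stream is None:
            raise ValueError("Provide either a .csv path or a Stream object.")
        self.path = path
        self.stream = stream
        self.target_col = target_col
        self.scaler = scaler


# A path plus a target index satisfies the contract.
loader = SketchLoader(path="data.csv", target_col=-1)

# Supplying neither a path nor a stream is rejected in this sketch.
try:
    SketchLoader()
    rejected = False
except ValueError:
    rejected = True
```

With the real class, the same call pattern would pass either `path=...` together with `target_col=...`, or `stream=...` with an existing skmultiflow Stream object.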

method DataLoader.get_data

get_data(
    n_batch: int
) → Tuple[ArrayLike, ArrayLike]

Loads a batch from the stream object.

Args:

  • n_batch: Number of samples to load from the data stream object.

Returns:

  • Tuple[ArrayLike, ArrayLike]: The sampled observations and corresponding targets.
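
The batch-loading behaviour can be sketched without scikit-multiflow installed. The toy stream below is hypothetical and imitates only the `next_sample(batch_size)` call of a skmultiflow Stream; `sketch_get_data` and the `scale` callable are illustrative helpers, not the library's implementation. Whether the real method applies its scaler in this way is an assumption.

```python
from typing import Callable, List, Optional, Tuple

Row = List[float]


class ToyStream:
    """Hypothetical in-memory stream exposing a next_sample(batch_size) method."""

    def __init__(self, X: List[Row], y: List[int]) -> None:
        self._X = X
        self._y = y
        self._pos = 0  # index of the next unread sample

    def next_sample(self, batch_size: int = 1) -> Tuple[List[Row], List[int]]:
        start, end = self._pos, self._pos + batch_size
        self._pos = min(end, len(self._X))
        return self._X[start:end], self._y[start:end]


def sketch_get_data(
    stream: ToyStream,
    n_batch: int,
    scale: Optional[Callable[[Row], Row]] = None,
) -> Tuple[List[Row], List[int]]:
    """Draw n_batch samples and optionally scale each observation (assumed step)."""
    X, y = stream.next_sample(batch_size=n_batch)
    if scale is not None:
        X = [scale(row) for row in X]
    return X, y


def halve(row: Row) -> Row:
    """Toy scaling function standing in for a real scaler object."""
    return [value / 2 for value in row]


stream = ToyStream(X=[[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]], y=[0, 1, 0])

# First batch: two observations and their targets.
X_batch, y_batch = sketch_get_data(stream, n_batch=2, scale=halve)

# Second batch: only one sample is left in the toy stream.
X_rest, y_rest = sketch_get_data(stream, n_batch=2)
```

Each call advances the stream, so consecutive calls return consecutive batches rather than the same samples.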

This file was automatically generated via lazydocs.