vowpalwabbit.dftovw

This is an optional module which implements a dataframe converter to VW format.

Deprecated alias

Deprecated since version 9.0.0: The module name vowpalwabbit.DFtoVW has been renamed to vowpalwabbit.dftovw. Please use the new module name instead.

Module contents

class vowpalwabbit.dftovw.AttributeDescriptor(attribute_name, expected_type, min_value=None, max_value=None)

Bases: object

This descriptor class adds type- and value-checking information to the _Col instance for later use in the DFtoVW class. The type and value checks can only be performed once the dataframe is known (i.e., in the DFtoVW class). This descriptor class is used in the following managed classes: SimpleLabel, MulticlassLabel, Feature, etc.

__init__(attribute_name, expected_type, min_value=None, max_value=None)

Initialize an AttributeDescriptor instance.

Parameters
  • attribute_name (str) – The name of the attribute.

  • expected_type (Tuple[Type, ...]) – The expected type of the attribute.

  • min_value (Union[str, int, float, None]) – The minimum value of the attribute.

  • max_value (Union[str, int, float, None]) – The maximum value of the attribute.

Raises

TypeError – If one of the arguments passed is not of valid type.
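
The deferred-checking idea described for AttributeDescriptor can be illustrated with a minimal, pure-Python sketch. This is not the library's implementation: the names RangeCheckedAttribute, ProbabilityLabel and check_deferred are hypothetical, and a plain dict stands in for the private _Col object. The descriptor only records the constraints when the attribute is set; the actual check runs later, once the values are available.

```python
class RangeCheckedAttribute:
    """Hypothetical descriptor that records constraints for a later check."""

    def __init__(self, attribute_name, expected_type, min_value=None, max_value=None):
        if not isinstance(attribute_name, str):
            raise TypeError("attribute_name must be a str")
        if not isinstance(expected_type, tuple):
            raise TypeError("expected_type must be a tuple of types")
        self.attribute_name = attribute_name
        self.expected_type = expected_type
        self.min_value = min_value
        self.max_value = max_value

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.attribute_name]

    def __set__(self, instance, column_name):
        # Store the column name with its constraints; nothing is checked yet.
        instance.__dict__[self.attribute_name] = {
            "colname": column_name,
            "expected_type": self.expected_type,
            "min_value": self.min_value,
            "max_value": self.max_value,
        }


class ProbabilityLabel:
    """Hypothetical managed class with one constrained column."""

    probability = RangeCheckedAttribute("probability", (int, float), min_value=0, max_value=1)

    def __init__(self, probability):
        self.probability = probability


def check_deferred(spec, values):
    """Run the recorded checks once the column values are known."""
    for value in values:
        if not isinstance(value, spec["expected_type"]):
            raise TypeError(f"column {spec['colname']!r}: invalid type {type(value).__name__}")
        if spec["min_value"] is not None and value < spec["min_value"]:
            raise ValueError(f"column {spec['colname']!r}: {value} is below the minimum")
        if spec["max_value"] is not None and value > spec["max_value"]:
            raise ValueError(f"column {spec['colname']!r}: {value} is above the maximum")
```

In this sketch, ProbabilityLabel("p") succeeds even though no data has been seen, and check_deferred(label.probability, values) raises only when the values violate the recorded constraints.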

class vowpalwabbit.dftovw.ContextualbanditLabel(action, cost, probability)

Bases: object

The contextual bandit label type for the constructor of DFtoVW.

__init__(action, cost, probability)

Initialize a ContextualbanditLabel instance.

Parameters
  • action (Hashable) – The action taken where we observed the cost.

  • cost (Hashable) – The cost observed for this action (lower is better).

  • probability (Hashable) – The probability of the exploration policy to choose this action when collecting the data.

action: Any

Contextual bandit label action column name

cost: Any

Contextual bandit label cost column name

probability: Any

Contextual bandit label probability column name

process(df)

Returns the ContextualbanditLabel string representation.

Parameters

df (DataFrame) – The dataframe from which to select the column(s).

Return type

Series

Returns

The ContextualbanditLabel string representation.
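
As a rough illustration of how the three columns relate, the following sketch formats one contextual bandit label, assuming the usual VW text form action:cost:probability. The helper format_cb_label is hypothetical and operates on plain values rather than dataframe columns; the exact number formatting produced by the library may differ.

```python
def format_cb_label(action, cost, probability):
    """Hypothetical helper: join one row's values as action:cost:probability."""
    if not 0 <= probability <= 1:
        raise ValueError("probability must lie in [0, 1]")
    return f"{action}:{cost}:{probability}"


# One logged interaction: action 1 was chosen with probability 0.5 and cost 2.0.
cb_label = format_cb_label(1, 2.0, 0.5)
```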

class vowpalwabbit.dftovw.DFtoVW(df, features=None, namespaces=None, label=None, tag=None)

Bases: object

Convert a pandas DataFrame to a suitable VW format. Instances of this class are built with classes such as SimpleLabel, MulticlassLabel, Feature or Namespace.

The class also provides a convenience constructor, from_colnames, that initializes an instance from the target and feature column names only.

__init__(df, features=None, namespaces=None, label=None, tag=None)

Initialize a DFtoVW instance.

Parameters

Examples

>>> from vowpalwabbit.dftovw import DFtoVW, SimpleLabel, Feature, Namespace
>>> import pandas as pd
>>> df = pd.DataFrame({"y": [1], "a": [2], "b": [3], "c": [4]})
>>> conv1 = DFtoVW(df=df,
...                label=SimpleLabel("y"),
...                features=Feature("a"))
>>> conv1.convert_df()
['1 | a:2']
>>> conv2 = DFtoVW(df=df,
...                label=SimpleLabel("y"),
...                features=[Feature(col) for col in ["a", "b"]])
>>> conv2.convert_df()
['1 | a:2 b:3']
>>> conv3 = DFtoVW(df=df,
...                label=SimpleLabel("y"),
...                namespaces=Namespace(
...                        name="DoubleIt", value=2,
...                        features=Feature(value="a", rename_feature="feat_a")))
>>> conv3.convert_df()
['1 |DoubleIt:2 feat_a:2']
>>> conv4 = DFtoVW(df=df,
...                label=SimpleLabel("y"),
...                namespaces=[Namespace(name="NS1", features=[Feature(col) for col in ["a", "c"]]),
...                            Namespace(name="NS2", features=Feature("b"))])
>>> conv4.convert_df()
['1 |NS1 a:2 c:4 |NS2 b:3']

check_columns_type_and_values()

Check columns type and values range.

check_features_type(features)

Check if the features argument is of type Feature.

Parameters

features ((list of) Feature) – The features argument to check.

Raises

TypeError – If features is not a Feature or a list of Feature.

check_instance_columns(instance)

Check the column types and values of a given instance. The method iterates through the instance's attributes and looks for attributes of type _Col. When one is found, the method uses the _Col methods to check the type and the value range of the column. The type of the instance in which an error occurs is prepended to the error message, to make explicit where in the formula the error occurs.

Raises
  • TypeError – If a column is not of valid type.

  • ValueError – If a column's values are not in the valid range.

check_label_type()

Check label type.

Raises

TypeError – If label is not of type SimpleLabel, MulticlassLabel, MultiLabel or ContextualbanditLabel.

check_missing_columns_df()

Check if the columns are in the dataframe.

check_namespaces_type()

Check if namespaces arguments are of type Namespace.

Raises

TypeError – If namespaces are not of type Namespace or list of Namespace.

convert_df()

Main method that converts the dataframe to the VW format.

Return type

List[str]

Returns

The list of parsed lines in VW format.
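
The example outputs shown for the constructor (for instance '1 |NS1 a:2 c:4 |NS2 b:3') follow a simple layout: the label, then one |-prefixed block per namespace. The sketch below rebuilds such a line from plain Python values. It is only an illustration of the output layout: build_vw_line and its dict-based namespace description are hypothetical and do not reflect how convert_df is implemented.

```python
def build_vw_line(label, namespaces):
    """Hypothetical helper: assemble one VW-format line.

    Each namespace is a dict with a "features" list of (name, value) pairs
    and optional "name" and "value" entries.
    """
    parts = [str(label)]
    for ns in namespaces:
        header = "|"
        if ns.get("name") is not None:
            header += str(ns["name"])
            if ns.get("value") is not None:
                header += f":{ns['value']}"
        features = " ".join(f"{name}:{value}" for name, value in ns["features"])
        parts.append(f"{header} {features}")
    return " ".join(parts)


# Anonymous namespace, then two named namespaces.
line_default = build_vw_line(1, [{"features": [("a", 2)]}])
line_named = build_vw_line(1, [
    {"name": "NS1", "features": [("a", 2), ("c", 4)]},
    {"name": "NS2", "features": [("b", 3)]},
])
```

An unnamed namespace leaves a space after the bar ('1 | a:2'), whereas a named one attaches the name, and optionally its value, directly to the bar.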

empty_col()

Create an empty string column.

Return type

Series

Returns

A column of empty strings with as many rows as the input dataframe.

classmethod from_colnames(y, x, df, label_type='simple_label')

Build DFtoVW instance using column names only.

Parameters
  • y (Union[Hashable, List[Hashable]]) – The column name(s) for the label. A column name can be any hashable type (str/int/float/tuple/etc.); a list of column names is also accepted.

  • x (Union[Hashable, List[Hashable]]) – The column name(s) for the feature(s). A column name can be any hashable type (str/int/float/tuple/etc.); a list of column names is also accepted.

  • df (DataFrame) – The dataframe used.

  • label_type (str) – The type of the label. Available labels: ‘simple_label’, ‘multiclass_label’, ‘multi_label’. (default: ‘simple_label’)

Raises
  • TypeError – If argument label is not of valid type.

  • ValueError – If argument label_type is not valid.

Examples

>>> from vowpalwabbit.dftovw import DFtoVW
>>> import pandas as pd
>>> df = pd.DataFrame({"y": [1], "x": [2]})
>>> conv = DFtoVW.from_colnames(y="y", x="x", df=df)
>>> conv.convert_df()
['1 | x:2']
>>> df2 = pd.DataFrame({"y": [1], "x1": [2], "x2": [3], "x3": [4]})
>>> conv2 = DFtoVW.from_colnames(y="y", x=sorted(set(df2.columns) - {"y"}), df=df2)
>>> conv2.convert_df()
['1 | x1:2 x2:3 x3:4']

Return type

DFtoVW

Returns

An initialized DFtoVW instance.

process_features(features)

Process the features (of a namespace) into a unique column.

Parameters

features (List[Feature]) – The list of Feature objects.

Return type

Series

Returns

The column of the processed features.

process_label_and_tag()

Process the label and tag into a unique column.

Return type

Series

Returns

A column where each row is the processed label and tag.

raise_missing_col_error(missing_cols_dict)

Raise an error if some columns are missing.

Raises

ValueError – If one or more columns are not in the dataframe.
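
A minimal sketch of this kind of missing-column check is given below, using plain Python collections rather than a dataframe. The function check_missing_columns and the shape of its arguments are assumptions made for illustration; the structure of the missing_cols_dict argument used by the library is not documented here.

```python
def check_missing_columns(required_by_owner, available_columns):
    """Hypothetical helper: raise if any required column is absent.

    required_by_owner maps a description (e.g. "Feature") to the column
    names it needs; available_columns lists the dataframe's columns.
    """
    available = set(available_columns)
    missing = {}
    for owner, columns in required_by_owner.items():
        absent = [col for col in columns if col not in available]
        if absent:
            missing[owner] = absent
    if missing:
        details = "; ".join(f"{owner}: {cols!r}" for owner, cols in missing.items())
        raise ValueError(f"Column(s) not found in dataframe: {details}")


# Passes silently: every required column is available.
check_missing_columns({"SimpleLabel": ["y"], "Feature": ["a"]}, ["y", "a", "b"])
```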

set_namespaces(namespaces, features)

Set namespaces attributes. Only one of namespaces or features should be passed when being called.

Parameters
Raises

ValueError – If the ‘features’ or ‘namespaces’ argument is not valid.

class vowpalwabbit.dftovw.Feature(value, rename_feature=None, as_type=None)

Bases: object

The feature type for the constructor of DFtoVW.

__init__(value, rename_feature=None, as_type=None)

Initialize a Feature instance.

Parameters
  • value (Hashable) – The column name with the value of the feature.

  • rename_feature (Optional[str]) – The name to use instead of the default (which is the column name defined in the value argument).

  • as_type (Optional[str]) – Enforce a specific type (‘numerical’ or ‘categorical’).

process(df)

Returns the Feature string representation.

Parameters

df (DataFrame) – The dataframe from which to select the column(s).

Return type

Series

Returns

The Feature string representation.

value: Any

Feature value column name

class vowpalwabbit.dftovw.MultiLabel(label)

Bases: object

The multilabel type for the constructor of DFtoVW.

__init__(label)

Initialize a MultiLabel instance.

Parameters

label (Union[Hashable, List[Hashable]]) – The (list of) column name(s) of the multi label(s).

label: Any

Multilabel label value column name

process(df)

Returns the MultiLabel string representation.

Parameters

df (DataFrame) – The dataframe from which to select the column(s).

Return type

Series

Returns

The MultiLabel string representation.

class vowpalwabbit.dftovw.MulticlassLabel(label, weight=None)

Bases: object

The multiclass label type for the constructor of DFtoVW.

__init__(label, weight=None)

Initialize a MulticlassLabel instance.

Parameters
  • label (Hashable) – The column name with the multiclass label.

  • weight (Optional[Hashable]) – The column name with the (importance) weight of the multiclass label.

label: Any

Multiclass label value column name

process(df)

Returns the MulticlassLabel string representation.

Parameters

df (DataFrame) – The dataframe from which to select the column(s).

Return type

Series

Returns

The MulticlassLabel string representation.

weight: Any

Multiclass label weight column name

class vowpalwabbit.dftovw.Namespace(features, name=None, value=None)

Bases: object

The namespace type for the constructor of DFtoVW. The Namespace is a container for Feature object(s), and thus must be composed of a Feature object or a list of Feature objects.

__init__(features, name=None, value=None)

Initialize a Namespace instance.

Parameters

Examples

>>> from vowpalwabbit.dftovw import Namespace, Feature
>>> ns_one_feature = Namespace(Feature("a"))
>>> ns_multi_features = Namespace([Feature("a"), Feature("b")])
>>> ns_one_feature_with_name = Namespace(Feature("a"), name="FirstNamespace")
>>> ns_one_feature_with_name_and_value = Namespace(Feature("a"), name="FirstNamespace", value=2)

check_attributes_type()

Check if attributes are of valid type.

Raises

TypeError – If one of the attributes is not of valid type.

expected_type = {'features': (<class 'vowpalwabbit.dftovw.Feature'>,), 'name': (<class 'str'>, <class 'int'>, <class 'float'>), 'value': (<class 'int'>, <class 'float'>)}

process()

Returns the Namespace string representation.

Return type

str

Returns

The Namespace string representation.

class vowpalwabbit.dftovw.SimpleLabel(label, weight=None)

Bases: object

The simple label type for the constructor of DFtoVW.

__init__(label, weight=None)

Initialize a SimpleLabel instance.

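Parameters
  • label (Hashable) – The column name with the label.

  • weight (Optional[Hashable]) – The column name with the (importance) weight of the label.

label: Any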

Simple label value column name

process(df)

Returns the SimpleLabel string representation.

Parameters

df (DataFrame) – The dataframe from which to select the column.

Return type

Series

Returns

The SimpleLabel string representation.

weight: Any

Simple label weight column name