vowpalwabbit.DFtoVW

class vowpalwabbit.DFtoVW.AttributeDescriptor(attribute_name, expected_type, min_value=None, max_value=None)

Bases: object

This descriptor class adds type- and value-checking information to the _Col instance for later use in the DFtoVW class. The type and value checks can only run once the dataframe is known (i.e., in the DFtoVW class). The descriptor is used in managed classes such as SimpleLabel, MulticlassLabel, and Feature.

__init__(attribute_name, expected_type, min_value=None, max_value=None)

Initialize an AttributeDescriptor instance.

Parameters
attribute_name: str

The name of the attribute.

expected_type: tuple

The expected type of the attribute.

min_value: str/int/float

The minimum value of the attribute.

max_value: str/int/float

The maximum value of the attribute.

Raises
TypeError

If one of the arguments passed is not of valid type.
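
As a rough illustration of the descriptor pattern described above, the following standalone sketch records the expected type and bounds at assignment time so that they can be checked later. It does not reproduce the library's implementation: `ColumnSpec`, `CheckedAttribute`, and `Label` are hypothetical names introduced only for this example.

```python
class ColumnSpec:
    """Hypothetical stand-in for the library's internal _Col object."""

    def __init__(self, colname, expected_type, min_value=None, max_value=None):
        self.colname = colname
        self.expected_type = expected_type
        self.min_value = min_value
        self.max_value = max_value


class CheckedAttribute:
    """Descriptor that wraps an assigned column name in a ColumnSpec."""

    def __init__(self, attribute_name, expected_type, min_value=None, max_value=None):
        if not isinstance(attribute_name, str):
            raise TypeError("attribute_name must be a str")
        if not isinstance(expected_type, tuple):
            raise TypeError("expected_type must be a tuple of types")
        self.attribute_name = attribute_name
        self.expected_type = expected_type
        self.min_value = min_value
        self.max_value = max_value

    def __set__(self, instance, colname):
        # Store the checks next to the column name; they are meant to run
        # later, once a dataframe is available.
        instance.__dict__[self.attribute_name] = ColumnSpec(
            colname, self.expected_type, self.min_value, self.max_value
        )

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.attribute_name]


class Label:
    """Hypothetical managed class using the descriptor."""

    label = CheckedAttribute("label", expected_type=(int, float), min_value=0)

    def __init__(self, label):
        self.label = label  # goes through CheckedAttribute.__set__


spec = Label("y").label
print(spec.colname, spec.expected_type, spec.min_value)
```

The point of the design is that the managed class only names a column; the constraints travel with that name until the dataframe is available to validate against.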

class vowpalwabbit.DFtoVW.ContextualbanditLabel(action, cost, probability)

Bases: object

The contextual bandit label type for the constructor of DFtoVW.

__init__(action, cost, probability)

Initialize a ContextualbanditLabel instance.

Parameters
action: str

The action taken where we observed the cost.

cost: str

The cost observed for this action (lower is better).

probability: str

The probability of the exploration policy to choose this action when collecting the data.

Returns
self: ContextualbanditLabel

action

Attribute managed by AttributeDescriptor; its type and value checks run in DFtoVW once the dataframe is known.

cost

Attribute managed by AttributeDescriptor; its type and value checks run in DFtoVW once the dataframe is known.

probability

Attribute managed by AttributeDescriptor; its type and value checks run in DFtoVW once the dataframe is known.

process(df)

Returns the ContextualbanditLabel string representation.

Parameters
df: pandas.DataFrame

The dataframe from which to select the column(s).

Returns
pandas.Series

The ContextualbanditLabel string representation.
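
For orientation, VW's contextual bandit label is written as `action:cost:probability`. The sketch below builds that string per row in plain Python. It is not the library's code: `cb_label_strings` is a hypothetical helper, rows are modelled as a list of dicts instead of a dataframe, and the range check on the probability is an assumption about what a sensible validation would look like.

```python
def cb_label_strings(rows, action, cost, probability):
    """Build one 'action:cost:probability' string per row.

    `rows` is a list of dicts standing in for dataframe rows (an assumption
    made to keep the sketch free of third-party imports).
    """
    labels = []
    for row in rows:
        prob = row[probability]
        # Assumed check: a probability should lie in [0, 1].
        if not 0 <= prob <= 1:
            raise ValueError(f"probability out of [0, 1]: {prob!r}")
        labels.append(f"{row[action]}:{row[cost]}:{prob}")
    return labels


rows = [{"a": 1, "c": 0.5, "p": 0.2}, {"a": 3, "c": 0.0, "p": 0.8}]
print(cb_label_strings(rows, action="a", cost="c", probability="p"))
# ['1:0.5:0.2', '3:0.0:0.8']
```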

class vowpalwabbit.DFtoVW.DFtoVW(df, features=None, namespaces=None, label=None, tag=None)

Bases: object

Convert a pandas DataFrame to a suitable VW format. Instances of this class are built with classes such as SimpleLabel, MulticlassLabel, Feature or Namespace.

The class also provides a convenience constructor that initializes an instance from the target and feature column names only.

__init__(df, features=None, namespaces=None, label=None, tag=None)

Initialize a DFtoVW instance.

Parameters
df: pandas.DataFrame

The dataframe to convert to VW input format.

features: Feature/list of Feature

One or more Feature object(s).

namespaces: Namespace/list of Namespace

One or more Namespace object(s), each composed of one or more Feature object(s).

label: SimpleLabel/MulticlassLabel/MultiLabel

The label.

tag: str/int/float

The tag (used as an identifier for examples).

Returns
self: DFtoVW

Examples

>>> from vowpalwabbit.DFtoVW import DFtoVW, SimpleLabel, Feature, Namespace
>>> import pandas as pd
>>> df = pd.DataFrame({"y": [1], "a": [2], "b": [3], "c": [4]})
>>> conv1 = DFtoVW(df=df,
...                label=SimpleLabel("y"),
...                features=Feature("a"))
>>> conv1.convert_df()
['1 | a:2']
>>> conv2 = DFtoVW(df=df,
...                label=SimpleLabel("y"),
...                features=[Feature(col) for col in ["a", "b"]])
>>> conv2.convert_df()
['1 | a:2 b:3']
>>> conv3 = DFtoVW(df=df,
...                label=SimpleLabel("y"),
...                namespaces=Namespace(
...                        name="DoubleIt", value=2,
...                        features=Feature(value="a", rename_feature="feat_a")))
>>> conv3.convert_df()
['1 |DoubleIt:2 feat_a:2']
>>> conv4 = DFtoVW(df=df,
...                label=SimpleLabel("y"),
...                namespaces=[Namespace(name="NS1", features=[Feature(col) for col in ["a", "c"]]),
...                            Namespace(name="NS2", features=Feature("b"))])
>>> conv4.convert_df()
['1 |NS1 a:2 c:4 |NS2 b:3']

check_columns_type_and_values()

Check columns type and values range.

check_features_type(features)

Check if the features argument is of type Feature.

Parameters
features: (list of) Feature

The features argument to check.

Raises
TypeError

If features is not a Feature or a list of Feature.

check_instance_columns(instance)

Check the column types and values of a given instance. The method iterates through the instance's attributes and looks for attributes of type _Col. For each one found, it uses the _Col methods to check the type and the value range of the column. The type of the instance in which an error occurs is prepended to the error message, to make explicit where in the formula the error occurred.

Raises
TypeError

If a column is not of valid type.

ValueError

If a column's values are not in the valid range.
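
The attribute scan described for check_instance_columns can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `ColumnCheck` is a hypothetical stand-in for _Col, `ToyLabel` is a made-up managed class, and the table is a plain dict of lists instead of a dataframe.

```python
class ColumnCheck:
    """Hypothetical stand-in for a _Col attribute (not the library's class)."""

    def __init__(self, colname, expected_type, min_value=None, max_value=None):
        self.colname = colname
        self.expected_type = expected_type
        self.min_value = min_value
        self.max_value = max_value

    def check(self, table):
        values = table[self.colname]
        if not all(isinstance(v, self.expected_type) for v in values):
            raise TypeError(f"column {self.colname!r} has an unexpected type")
        if self.min_value is not None and any(v < self.min_value for v in values):
            raise ValueError(f"column {self.colname!r} has values below {self.min_value}")
        if self.max_value is not None and any(v > self.max_value for v in values):
            raise ValueError(f"column {self.colname!r} has values above {self.max_value}")


def check_instance_columns(instance, table):
    """Run every ColumnCheck found among the instance's attributes."""
    for attribute in vars(instance).values():
        if isinstance(attribute, ColumnCheck):
            try:
                attribute.check(table)
            except (TypeError, ValueError) as err:
                # Prepend the owning class name so the message says where it failed.
                raise type(err)(f"In {type(instance).__name__}: {err}") from err


class ToyLabel:
    """Made-up managed class holding one checked column."""

    def __init__(self, label):
        self.label = ColumnCheck(label, (int, float), min_value=0)


check_instance_columns(ToyLabel("y"), {"y": [1, 2.5]})  # passes silently
```

Re-raising with the same exception type keeps the TypeError/ValueError distinction listed under Raises while adding the context to the message.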

check_label_type()

Check label type.

Raises
TypeError

If label is not of type SimpleLabel or MulticlassLabel.

check_missing_columns_df()

Check if the columns are in the dataframe.

check_namespaces_type()

Check if namespaces arguments are of type Namespace.

Raises
TypeError

If namespaces are not of type Namespace or list of Namespace.

convert_df()

Main method that converts the dataframe to the VW format.

Returns
list

The list of parsed lines in VW format.
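
To make the shape of the returned lines concrete, the sketch below reassembles the strings shown in the class examples from plain Python values. It is only an illustration of the line layout (label, then one `|name:scale feature:value ...` group per namespace), not the conversion code itself: `namespace_string` and `vw_line` are hypothetical helpers, only numerical features are handled, and the tag is left out.

```python
def namespace_string(name=None, scale=None, features=()):
    """Format one namespace as '|name:scale feat:value ...'.

    `features` is a sequence of (feature_name, value) pairs.
    """
    header = "|" + ("" if name is None else str(name))
    if scale is not None:
        header += f":{scale}"
    body = " ".join(f"{feat}:{value}" for feat, value in features)
    return f"{header} {body}"


def vw_line(label, namespaces):
    """Join the label and the namespace groups into one VW-format line.

    `namespaces` is a list of (name, scale, features) tuples.
    """
    parts = [str(label)] + [namespace_string(*ns) for ns in namespaces]
    return " ".join(parts)


print(vw_line(1, [(None, None, [("a", 2), ("b", 3)])]))
# 1 | a:2 b:3
print(vw_line(1, [("NS1", None, [("a", 2), ("c", 4)]), ("NS2", None, [("b", 3)])]))
# 1 |NS1 a:2 c:4 |NS2 b:3
```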

empty_col()

Create an empty string column.

Returns
pandas.Series

A column of empty strings with as many rows as the input dataframe.

classmethod from_colnames(y, x, df, label_type='simple_label')

Build DFtoVW instance using column names only.

Parameters
y: (list of) any hashable type (str/int/float/tuple/etc.) representing a column name

The column for the label.

x: (list of) any hashable type (str/int/float/tuple/etc.) representing a column name

The column(s) for the feature(s).

df: pandas.DataFrame

The dataframe used.

label_type: str (default: ‘simple_label’)

The type of the label. Available labels: ‘simple_label’, ‘multiclass_label’, ‘multi_label’.

Returns
DFtoVW

An initialized DFtoVW instance.

Raises
TypeError

If argument label is not of valid type.

ValueError

If argument label_type is not valid.

Examples

>>> from vowpalwabbit.DFtoVW import DFtoVW
>>> import pandas as pd
>>> df = pd.DataFrame({"y": [1], "x": [2]})
>>> conv = DFtoVW.from_colnames(y="y", x="x", df=df)
>>> conv.convert_df()
['1 | x:2']
>>> df2 = pd.DataFrame({"y": [1], "x1": [2], "x2": [3], "x3": [4]})
>>> conv2 = DFtoVW.from_colnames(y="y", x=sorted(list(set(df2.columns) - set("y"))), df=df2)
>>> conv2.convert_df()
['1 | x1:2 x2:3 x3:4']

process_features(features)

Process the features (of a namespace) into a single column.

Parameters
features: list of Feature

The list of Feature objects.

Returns
out: pandas.Series

The column of the processed features.

process_label_and_tag()

Process the label and tag into a single column.

Returns
out: pandas.Series

A column where each row is the processed label and tag.

raise_missing_col_error(missing_cols_dict)

Raises an error if some columns are missing.

Raises
ValueError

If one or more columns are not in the dataframe.
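
A missing-column check of this kind can be sketched in a few lines. The structure of `missing_cols_dict` is not documented here, so the sketch assumes a mapping from an owner name (for example a class name) to the list of absent columns. `find_missing_columns` is a hypothetical helper, and the error message wording is made up.

```python
def find_missing_columns(available, required_by_owner):
    """Return {owner: [absent columns]} for owners with at least one absent column.

    `available` is the collection of dataframe column names;
    `required_by_owner` maps an owner name to the columns it needs (assumed shape).
    """
    missing = {}
    for owner, columns in required_by_owner.items():
        absent = [col for col in columns if col not in available]
        if absent:
            missing[owner] = absent
    return missing


def raise_missing_col_error(missing_cols_dict):
    """Raise ValueError listing the absent columns, if there are any."""
    if missing_cols_dict:
        details = "; ".join(
            f"{owner}: {columns!r}" for owner, columns in missing_cols_dict.items()
        )
        raise ValueError(f"Columns not found in dataframe: {details}")


missing = find_missing_columns(
    available={"y", "a"},
    required_by_owner={"SimpleLabel": ["y"], "Feature": ["a", "b"]},
)
print(missing)
# {'Feature': ['b']}
```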

set_namespaces(namespaces, features)

Set namespaces attributes.

Parameters
namespaces: Namespace / list of Namespace objects

The namespaces argument.

features: Feature / list of Feature objects

The features argument.

class vowpalwabbit.DFtoVW.Feature(value, rename_feature=None, as_type=None)

Bases: object

The feature type for the constructor of DFtoVW.

__init__(value, rename_feature=None, as_type=None)

Initialize a Feature instance.

Parameters
value: str

The column name with the value of the feature.

rename_feature: str, optional

The name to use instead of the default (which is the column name defined in the value argument).

as_type: str, optional

Enforce a specific type (‘numerical’ or ‘categorical’).

Returns
self: Feature

process(df)

Returns the Feature string representation.

Parameters
df: pandas.DataFrame

The dataframe from which to select the column(s).

Returns
pandas.Series

The Feature string representation.
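
The examples in this reference show numerical features rendered as `name:value`. The sketch below illustrates how a per-feature formatter might pick a separator based on `as_type`. The `=` separator for categorical features is an assumption of this sketch, not something stated in this reference, and `feature_strings` is a hypothetical helper operating on a plain list instead of a dataframe column.

```python
from numbers import Number


def feature_strings(values, name, as_type=None):
    """Format each value as 'name:value' (numerical) or 'name=value' (categorical).

    When `as_type` is None the kind is inferred from the values.
    The '=' separator for categorical values is an assumption of this sketch.
    """
    if as_type not in (None, "numerical", "categorical"):
        raise ValueError("as_type must be 'numerical' or 'categorical'")
    if as_type is None:
        numerical = all(isinstance(v, Number) for v in values)
    else:
        numerical = as_type == "numerical"
    separator = ":" if numerical else "="
    return [f"{name}{separator}{v}" for v in values]


print(feature_strings([2, 3], "a"))
# ['a:2', 'a:3']
print(feature_strings([1, 2], "id", as_type="categorical"))
# ['id=1', 'id=2']
```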

value

Attribute managed by AttributeDescriptor; its type and value checks run in DFtoVW once the dataframe is known.

class vowpalwabbit.DFtoVW.MultiLabel(label)

Bases: object

The multilabel type for the constructor of DFtoVW.

__init__(label)

Initialize a MultiLabel instance.

Parameters
label: str or list of str

The (list of) column name(s) of the multi label(s).

Returns
self: MultiLabel

label

Attribute managed by AttributeDescriptor; its type and value checks run in DFtoVW once the dataframe is known.

process(df)

Returns the MultiLabel string representation.

Parameters
df: pandas.DataFrame

The dataframe from which to select the column(s).

Returns
pandas.Series

The MultiLabel string representation.
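
VW's multilabel format lists the labels of an example separated by commas. Assuming each named column contributes one label per row, the string could be assembled as in this sketch. `multilabel_strings` is a hypothetical helper, and rows are modelled as dicts instead of dataframe rows.

```python
def multilabel_strings(rows, label_columns):
    """Join the values of the label column(s) with commas, one string per row."""
    if isinstance(label_columns, str):
        # Accept a single column name as well as a list of names.
        label_columns = [label_columns]
    return [",".join(str(row[col]) for col in label_columns) for row in rows]


rows = [{"y1": 1, "y2": 2}, {"y1": 3, "y2": 4}]
print(multilabel_strings(rows, ["y1", "y2"]))
# ['1,2', '3,4']
```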

class vowpalwabbit.DFtoVW.MulticlassLabel(label, weight=None)

Bases: object

The multiclass label type for the constructor of DFtoVW.

__init__(label, weight=None)

Initialize a MulticlassLabel instance.

Parameters
label: str

The column name with the multiclass label.

weight: str, optional

The column name with the (importance) weight of the multiclass label.

Returns
self: MulticlassLabel

label

Attribute managed by AttributeDescriptor; its type and value checks run in DFtoVW once the dataframe is known.

process(df)

Returns the MulticlassLabel string representation.

Parameters
df: pandas.DataFrame

The dataframe from which to select the column(s).

Returns
pandas.Series

The MulticlassLabel string representation.
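
In VW's text format a multiclass label is a positive integer class index, optionally followed by a weight. A label-plus-weight string could be built as in the sketch below. `multiclass_label_strings` is a hypothetical helper working on dict rows, and the positivity check is an assumption about the kind of value check the descriptor enables, not a description of the library's validation.

```python
def multiclass_label_strings(rows, label, weight=None):
    """Build 'label' or 'label weight' strings, one per row."""
    out = []
    for row in rows:
        value = row[label]
        # Assumed check: class indices are integers starting at 1.
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"multiclass label must be a positive integer: {value!r}")
        text = str(value)
        if weight is not None:
            text += f" {row[weight]}"
        out.append(text)
    return out


rows = [{"y": 1, "w": 0.5}, {"y": 3, "w": 2}]
print(multiclass_label_strings(rows, "y"))
# ['1', '3']
print(multiclass_label_strings(rows, "y", weight="w"))
# ['1 0.5', '3 2']
```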

weight

Attribute managed by AttributeDescriptor; its type and value checks run in DFtoVW once the dataframe is known.

class vowpalwabbit.DFtoVW.Namespace(features, name=None, value=None)

Bases: object

The namespace type for the constructor of DFtoVW. The Namespace is a container for Feature object(s), and thus must be composed of a Feature object or a list of Feature objects.

__init__(features, name=None, value=None)

Initialize a Namespace instance.

Parameters
features: Feature or list of Feature

A (list of) Feature object(s) that form the namespace.

name: str/int/float, optional

The name of the namespace.

value: int/float, optional

A constant that specifies the scaling factor for the features of this namespace.

Returns
self: Namespace

Examples

>>> from vowpalwabbit.DFtoVW import Namespace, Feature
>>> ns_one_feature = Namespace(Feature("a"))
>>> ns_multi_features = Namespace([Feature("a"), Feature("b")])
>>> ns_one_feature_with_name = Namespace(Feature("a"), name="FirstNamespace")
>>> ns_one_feature_with_name_and_value = Namespace(Feature("a"), name="FirstNamespace", value=2)

check_attributes_type()

Check if attributes are of valid type.

Raises
TypeError

If one of the attributes is not valid.

expected_type = {'features': (<class 'vowpalwabbit.DFtoVW.Feature'>,), 'name': (<class 'str'>, <class 'int'>, <class 'float'>), 'value': (<class 'int'>, <class 'float'>)}

process()

Returns the Namespace string representation.

Returns
str

The Namespace string representation.

class vowpalwabbit.DFtoVW.SimpleLabel(label, weight=None)

Bases: object

The simple label type for the constructor of DFtoVW.

__init__(label, weight=None)

Initialize a SimpleLabel instance.

Parameters
label: str

The column name with the label.

weight: str, optional

The column name with the weight.

Returns
self: SimpleLabel

label

Attribute managed by AttributeDescriptor; its type and value checks run in DFtoVW once the dataframe is known.

process(df)

Returns the SimpleLabel string representation.

Parameters
df: pandas.DataFrame

The dataframe from which to select the column.

Returns
pandas.Series

The SimpleLabel string representation.

weight

Attribute managed by AttributeDescriptor; its type and value checks run in DFtoVW once the dataframe is known.