vowpalwabbit.DFtoVW

class vowpalwabbit.DFtoVW.AttributeDescriptor(attribute_name, expected_type, min_value=None)

Bases: object

This descriptor class adds type- and value-checking information to the _Col instance for later use in the DFtoVW class. The type and value checks can only be performed once the dataframe is known (i.e., in the DFtoVW class). The descriptor is used in the following managed classes: SimpleLabel, MulticlassLabel, Feature, etc.

Methods

initialize_col_type(self, instance, colname) Initialize the attribute as a _Col type
__init__(self, attribute_name, expected_type, min_value=None)

Initialize an AttributeDescriptor instance

Parameters:
attribute_name: str

The name of the attribute.

expected_type: tuple

The expected type of the attribute.

min_value: str/int/float

The minimum value of the attribute.

Raises:
TypeError

If one of the arguments passed is not of valid type.

initialize_col_type(self, instance, colname)

Initialize the attribute as a _Col type

Parameters:
instance: object

The managed instance.

colname: str/int/float

The colname used for the _Col instance.
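
The deferred-checking idea described for this descriptor can be sketched with a plain Python descriptor. This is a minimal illustration under stated assumptions, not the library's implementation: `ColSpec`, `DeferredCheck`, and `Label` are hypothetical names, and a dict of lists stands in for the dataframe.

```python
class ColSpec:
    """Hypothetical stand-in for _Col: a column name plus the checks to run later."""

    def __init__(self, colname, expected_type, min_value=None):
        self.colname = colname
        self.expected_type = expected_type
        self.min_value = min_value

    def check(self, table):
        # `table` is a dict mapping column names to lists (stand-in for a dataframe).
        values = table[self.colname]
        if not all(isinstance(v, self.expected_type) for v in values):
            raise TypeError(f"column {self.colname!r} has values of an unexpected type")
        if self.min_value is not None and min(values) < self.min_value:
            raise ValueError(f"column {self.colname!r} has values below {self.min_value}")


class DeferredCheck:
    """Descriptor that wraps an assigned column name in a ColSpec."""

    def __init__(self, attribute_name, expected_type, min_value=None):
        self.attribute_name = attribute_name
        self.expected_type = expected_type
        self.min_value = min_value

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.attribute_name]

    def __set__(self, instance, colname):
        # No data is available yet, so only record what must be checked later.
        instance.__dict__[self.attribute_name] = ColSpec(
            colname, self.expected_type, self.min_value
        )


class Label:
    """Hypothetical managed class using the descriptor."""

    label = DeferredCheck("label", (int,), min_value=1)

    def __init__(self, label):
        self.label = label


spec = Label("y").label
spec.check({"y": [1, 2, 3]})  # the checks run only now, once the data is known
```

The managed class stores only a column name at construction time; the checks run later, when a table is supplied.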

class vowpalwabbit.DFtoVW.DFtoVW(df, features=None, namespaces=None, label=None, tag=None)

Convert a pandas DataFrame to a suitable VW format. Instances of this class are built with classes such as SimpleLabel, MulticlassLabel, Feature or Namespace.

The class also provides a convenience constructor to initialize an instance from the target and feature column names only.

Methods

check_columns_existence_in_df(self) Check if the columns are in the dataframe.
check_columns_type_and_values(self) Check columns type and values range
check_features_type(self, features) Check if the features argument is of type Feature.
check_instance_columns(self, instance) Check the columns type and values of a given instance.
check_label_type(self) Check label type.
check_namespaces_type(self) Check if namespaces arguments are of type Namespace.
convert_df(self) Main method that converts the dataframe to the VW format.
empty_col(self) Create an empty string column.
from_colnames(cls, y, x, df[, label_type]) Build DFtoVW instance using column names only.
generate_missing_col_error(self, …) Generate error if some columns are missing
process_features(self, features) Process the features (of a namespace) into a unique column.
process_label_and_tag(self) Process the label and tag into a unique column.
set_namespaces_or_features(self, namespaces, …) Set namespaces attributes
__init__(self, df, features=None, namespaces=None, label=None, tag=None)

Initialize a DFtoVW instance.

Parameters:
df : pandas.DataFrame

The dataframe to convert to VW input format.

features: Feature/list of Feature

One or more Feature object(s).

namespaces : Namespace/list of Namespace

One or more Namespace object(s), each composed of one or more Feature object(s).

label : SimpleLabel/MulticlassLabel/MultiLabel

The label.

tag : str/int/float

The tag (used as identifiers for examples).

Returns:
self : DFtoVW

Examples

>>> from vowpalwabbit.DFtoVW import DFtoVW, SimpleLabel, Feature, Namespace
>>> import pandas as pd
>>> df = pd.DataFrame({"y": [1], "a": [2], "b": [3], "c": [4]})
>>> conv1 = DFtoVW(df=df,
                   label=SimpleLabel("y"),
                   features=Feature("a"))
>>> conv1.convert_df()
>>> conv2 = DFtoVW(df=df,
                   label=SimpleLabel("y"),
                   features=[Feature(col) for col in ["a", "b"]])
>>> conv2.convert_df()
>>> conv3 = DFtoVW(df=df,
                   label=SimpleLabel("y"),
                   namespaces=Namespace(
                           name="DoubleIt", value=2,
                           features=Feature(value="a", rename_feature="feat_a")))
>>> conv3.convert_df()
>>> conv4 = DFtoVW(df=df,
                   label=SimpleLabel("y"),
                   namespaces=[Namespace(name="NS1", features=[Feature(col) for col in ["a", "c"]]),
                               Namespace(name="NS2", features=Feature("b"))])
>>> conv4.convert_df()
check_columns_existence_in_df(self)

Check if the columns are in the dataframe.

check_columns_type_and_values(self)

Check columns type and values range

check_features_type(self, features)

Check if the features argument is of type Feature.

Parameters:
features: (list of) Feature,

The features argument to check.

Raises:
TypeError

If features is not a Feature or a list of Feature.

check_instance_columns(self, instance)

Check the column types and values of a given instance. The method iterates through the instance's attributes and looks for attributes of type _Col. When one is found, it uses the _Col methods to check the type and the value range of the column. The type of the instance in which an error occurs is prepended to the error message, to make explicit where in the formula the error occurs.

Raises:
TypeError

If a column is not of valid type.

ValueError

If a column's values are not in the valid range.
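
The attribute scan and the prefixed error message described above can be sketched as follows. This is only an illustration: `Col`, `ClassLabel`, the message wording, and the dict-of-lists table are assumptions, not the library's code.

```python
class Col:
    """Hypothetical stand-in for the library's _Col helper."""

    def __init__(self, colname, min_value=None):
        self.colname = colname
        self.min_value = min_value

    def check_range(self, table):
        # `table` is a dict of lists standing in for a dataframe.
        if self.min_value is not None and min(table[self.colname]) < self.min_value:
            raise ValueError(f"column '{self.colname}' must be >= {self.min_value}")


class ClassLabel:
    """Hypothetical managed class holding one Col attribute."""

    def __init__(self, label):
        self.label = Col(label, min_value=1)


def check_instance_columns(instance, table):
    class_name = type(instance).__name__
    for attribute in vars(instance).values():
        if not isinstance(attribute, Col):
            continue
        try:
            attribute.check_range(table)
        except ValueError as exc:
            # Prepend the instance type so the message says where the problem is.
            raise ValueError(f"In argument '{class_name}': {exc}") from exc


try:
    check_instance_columns(ClassLabel("y"), {"y": [0, 2]})
except ValueError as exc:
    message = str(exc)
```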

check_label_type(self)

Check label type.

Raises:
TypeError

If label is not of type SimpleLabel, MulticlassLabel or MultiLabel.

check_namespaces_type(self)

Check if namespaces arguments are of type Namespace.

Raises:
TypeError

If namespaces are not of type Namespace or list of Namespace.

convert_df(self)

Main method that converts the dataframe to the VW format.

Returns:
list

The list of parsed lines in VW format.
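
To make the target format concrete, the sketch below assembles VW text lines of the form `label |Namespace[:scale] feature:value ...` in plain Python. The helper `vw_line` and its argument layout are hypothetical, and the exact strings returned by convert_df may differ (for example in spacing or number formatting).

```python
def vw_line(label, namespaces):
    """Build one line of VW text input.

    namespaces: list of (name, scale, features) tuples, where features is a
    list of (feature_name, value) pairs; name and scale may be None.
    """
    parts = [str(label)]
    for name, scale, features in namespaces:
        header = "|" if name is None else f"|{name}"
        if name is not None and scale is not None:
            header += f":{scale}"  # namespace-level scaling factor
        body = " ".join(f"{feat}:{value}" for feat, value in features)
        parts.append(f"{header} {body}")
    return " ".join(parts)


# One unnamed namespace, then a named and scaled one.
line_plain = vw_line(1, [(None, None, [("a", 2)])])
line_scaled = vw_line(1, [("DoubleIt", 2, [("feat_a", 2)])])
```

Here `line_plain` is `1 | a:2` and `line_scaled` is `1 |DoubleIt:2 feat_a:2`.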

empty_col(self)

Create an empty string column.

Returns:
pandas.Series

A column of empty strings with as many rows as the input dataframe.

classmethod from_colnames(cls, y, x, df, label_type='simple_label')

Build DFtoVW instance using column names only.

Parameters:
y : str/list of str

The column for the label.

x : str/list of str

The column(s) for the feature(s).

df : pandas.DataFrame

The dataframe used.

label_type: str (default: ‘simple_label’)

The type of the label. Available labels: ‘simple_label’, ‘multiclass’, ‘multilabel’.

Returns:
DFtoVW

An initialized DFtoVW instance.

Raises:
TypeError

If argument label is not of valid type.

ValueError

If argument label_type is not valid.
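
The label_type lookup and the ValueError documented above might be implemented along these lines. This is a hedged sketch: `LABEL_TYPES` and `resolve_label_type` are hypothetical, and the class names are kept as plain strings so the snippet runs without the library.

```python
# Documented label_type values mapped to the label class each one selects.
LABEL_TYPES = {
    "simple_label": "SimpleLabel",
    "multiclass": "MulticlassLabel",
    "multilabel": "MultiLabel",
}


def resolve_label_type(label_type="simple_label"):
    """Return the label class name for label_type, or raise ValueError."""
    if label_type not in LABEL_TYPES:
        valid = ", ".join(sorted(LABEL_TYPES))
        raise ValueError(f"label_type must be one of: {valid}; got {label_type!r}")
    return LABEL_TYPES[label_type]


default_label = resolve_label_type()
```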

Examples

>>> from vowpalwabbit.DFtoVW import DFtoVW
>>> import pandas as pd
>>> df = pd.DataFrame({"y": [1], "x": [2]})
>>> conv = DFtoVW.from_colnames(y="y", x="x", df=df)
>>> conv.convert_df()
>>> df2 = pd.DataFrame({"y": [1], "x1": [2], "x2": [3], "x3": [4]})
>>> conv2 = DFtoVW.from_colnames(y="y", x=[col for col in df2.columns if col != "y"], df=df2)
>>> conv2.convert_df()
generate_missing_col_error(self, absent_cols_dict)

Generate error if some columns are missing

Raises:
ValueError

If one or more columns are not in the dataframe.

process_features(self, features)

Process the features (of a namespace) into a unique column.

Parameters:
features : list of Feature

The list of Feature objects.

Returns:
out : pandas.Series

The column of the processed features.

process_label_and_tag(self)

Process the label and tag into a unique column.

Returns:
out : pandas.Series

A column where each row is the processed label and tag.

set_namespaces_or_features(self, namespaces, features)

Set namespaces attributes

Parameters:
namespaces: Namespace / list of Namespace objects

The namespaces argument.

features: Feature / list of Feature objects

The features argument.

class vowpalwabbit.DFtoVW.Feature(value, rename_feature=None, as_type=None)

Bases: object

The feature type for the constructor of DFtoVW

Methods

process(self, df) Returns the Feature string representation.
__init__(self, value, rename_feature=None, as_type=None)

Initialize a Feature instance.

Parameters:
value : str

The column name with the value of the feature.

rename_feature : str, optional

The name to use instead of the default (which is the column name defined in the value argument).

as_type: str

Enforce a specific type (‘numerical’ or ‘categorical’)

Returns:
self : Feature
process(self, df)

Returns the Feature string representation.

Parameters:
df : pandas.DataFrame

The dataframe from which to select the column(s).

Returns:
out : str or pandas.Series

The Feature string representation.
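
The effect of as_type can be illustrated with a small sketch. It assumes that numerical features are written as `name:value` and categorical ones as `name=value`; that separator choice is an assumption about the output here, and `feature_string` is a hypothetical helper rather than part of the library.

```python
from numbers import Number


def feature_string(name, value, as_type=None):
    """Render one feature for one row, optionally forcing its type."""
    if as_type not in (None, "numerical", "categorical"):
        raise ValueError("as_type must be 'numerical' or 'categorical'")
    if as_type is None:
        # Infer the type from the value when it is not enforced.
        as_type = "numerical" if isinstance(value, Number) else "categorical"
    separator = ":" if as_type == "numerical" else "="
    return f"{name}{separator}{value}"


inferred = feature_string("a", 2)
forced = feature_string("zip", 75001, as_type="categorical")
```

Forcing `categorical` is useful for integer-coded identifiers such as the `zip` value above, which should not be treated as a magnitude.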

value

Attribute managed by AttributeDescriptor. Type and value checks on the referenced column are deferred until the dataframe is known (i.e., in the DFtoVW class).

class vowpalwabbit.DFtoVW.MultiLabel(label)

Bases: object

The multilabel type for the constructor of DFtoVW.

Methods

process(self, df) Returns the MultiLabel string representation.
__init__(self, label)

Initialize a MultiLabel instance.

Parameters:
label : str or list of str

The (list of) column name(s) of the multi label(s).

Returns:
self : MultiLabel
label

Attribute managed by AttributeDescriptor. Type and value checks on the referenced column(s) are deferred until the dataframe is known (i.e., in the DFtoVW class).

process(self, df)

Returns the MultiLabel string representation.

Parameters:
df : pandas.DataFrame

The dataframe from which to select the column(s).

Returns:
str or pandas.Series

The MultiLabel string representation.
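
VW's multilabel input writes the labels of an example as a comma-separated list. A minimal sketch of building that string from one row, assuming each label column holds one label id (`multilabel_string` and the dict row are hypothetical):

```python
def multilabel_string(row, label_columns):
    """Join the label values of one row with commas."""
    if isinstance(label_columns, str):
        label_columns = [label_columns]  # accept a single column name
    return ",".join(str(row[col]) for col in label_columns)


row = {"y1": 1, "y2": 3, "x": 0.5}
labels = multilabel_string(row, ["y1", "y2"])
```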

class vowpalwabbit.DFtoVW.MulticlassLabel(label, weight=None)

Bases: object

The multiclass label type for the constructor of DFtoVW.

Methods

process(self, df) Returns the MulticlassLabel string representation.
__init__(self, label, weight=None)

Initialize a MulticlassLabel instance.

Parameters:
label : str

The column name with the multiclass label.

weight: str, optional

The column name with the (importance) weight of the multiclass label.

Returns:
self : MulticlassLabel
label

Attribute managed by AttributeDescriptor. Type and value checks on the referenced column are deferred until the dataframe is known (i.e., in the DFtoVW class).

process(self, df)

Returns the MulticlassLabel string representation.

Parameters:
df : pandas.DataFrame

The dataframe from which to select the column(s).

Returns:
str or pandas.Series

The MulticlassLabel string representation.
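
In VW's text format a multiclass label is the class index, optionally followed by an importance weight. The sketch below assumes class labels are positive integers, as VW's one-against-all reductions expect; `multiclass_string` is a hypothetical helper.

```python
def multiclass_string(label, weight=None):
    """Render a multiclass label, with an optional importance weight."""
    if not isinstance(label, int) or label < 1:
        raise ValueError("multiclass labels are assumed to be integers >= 1")
    return str(label) if weight is None else f"{label} {weight}"


unweighted = multiclass_string(2)
weighted = multiclass_string(2, weight=0.5)
```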

weight

Attribute managed by AttributeDescriptor. Type and value checks on the referenced column are deferred until the dataframe is known (i.e., in the DFtoVW class).

class vowpalwabbit.DFtoVW.Namespace(features, name=None, value=None)

Bases: object

The namespace type for the constructor of DFtoVW. The Namespace is a container for Feature object(s), and thus must be composed of a Feature object or a list of Feature objects.

Methods

check_attributes_type(self) Check if attributes are of valid type.
process(self) Returns the Namespace string representation
__init__(self, features, name=None, value=None)

Initialize a Namespace instance.

Parameters:
features : Feature or list of Feature

A (list of) Feature object(s) that form the namespace.

name : str/int/float, optional

The name of the namespace.

value : int/float, optional

A constant that specifies the scaling factor for the features of this namespace.

Returns:
self : Namespace

Examples

>>> from vowpalwabbit.DFtoVW import Namespace, Feature
>>> ns_one_feature = Namespace(Feature("a"))
>>> ns_multi_features = Namespace([Feature("a"), Feature("b")])
>>> ns_one_feature_with_name = Namespace(Feature("a"), name="FirstNamespace")
>>> ns_one_feature_with_name_and_value = Namespace(Feature("a"), name="FirstNamespace", value=2)
check_attributes_type(self)

Check if attributes are of valid type.

Raises:
TypeError

If one of the attributes is not valid.

expected_type = {'features': (<class 'vowpalwabbit.DFtoVW.Feature'>,), 'name': (<type 'str'>, <type 'int'>, <type 'float'>), 'value': (<type 'int'>, <type 'float'>)}
process(self)

Returns the Namespace string representation

class vowpalwabbit.DFtoVW.SimpleLabel(label)

Bases: object

The simple label type for the constructor of DFtoVW.

Methods

process(self, df) Returns the SimpleLabel string representation.
__init__(self, label)

Initialize a SimpleLabel instance.

Parameters:
label : str

The column name with the label.

Returns:
self : SimpleLabel
label

Attribute managed by AttributeDescriptor. Type and value checks on the referenced column are deferred until the dataframe is known (i.e., in the DFtoVW class).

process(self, df)

Returns the SimpleLabel string representation.

Parameters:
df : pandas.DataFrame

The dataframe from which to select the column.

Returns:
str or pandas.Series

The SimpleLabel string representation.