vowpalwabbit.dftovw
This is an optional module that implements a converter from pandas DataFrames to the VW input format.
Deprecated alias
Deprecated since version 9.0.0: The module name vowpalwabbit.DFtoVW has been renamed to vowpalwabbit.dftovw. Please use the new module name instead.
Module contents
- class vowpalwabbit.dftovw.ContextualbanditLabel(action, cost, probability)
Bases: object
The contextual bandit label type for the constructor of DFtoVW.
- __init__(action, cost, probability)
Initialize a ContextualbanditLabel instance.
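For reference, VW's text input format writes a contextual bandit label as `action:cost:probability`. The helper below, `format_cb_label`, is a hypothetical stdlib-only sketch (not part of `vowpalwabbit.dftovw`) of how one row's three values could be combined into that string. The range checks are assumptions of the sketch, not documented behavior of this class.

```python
def format_cb_label(action: int, cost: float, probability: float) -> str:
    """Build a VW contextual bandit label string (hypothetical helper)."""
    # Assumed checks: VW numbers actions from 1, and a probability of 0
    # would make importance weighting impossible.
    if action < 1:
        raise ValueError(f"action must be >= 1, got {action}")
    if not 0 < probability <= 1:
        raise ValueError(f"probability must be in (0, 1], got {probability}")
    return f"{action}:{cost}:{probability}"


print(format_cb_label(1, 2, 0.4))  # 1:2:0.4
```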
- class vowpalwabbit.dftovw.DFtoVW(df, features=None, namespaces=None, label=None, tag=None)
Bases: object
Convert a pandas DataFrame to a suitable VW format. Instances of this class are built with classes such as SimpleLabel, MulticlassLabel, Feature or Namespace.
The class also provides convenience constructors to initialize an instance from the target and feature column names only.
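A rough, stdlib-only sketch of the conversion idea: each row becomes one VW line made of the label, a `|`, and `name:value` pairs. `rows_to_vw` is a hypothetical name, rows are plain dictionaries standing in for DataFrame rows, and the real class additionally handles namespaces, tags, several label types and column validation.

```python
from typing import Dict, List, Sequence

Row = Dict[str, float]


def rows_to_vw(rows: Sequence[Row], label: str, features: Sequence[str]) -> List[str]:
    """Turn each row into a '<label> | name:value ...' line (simplified sketch)."""
    lines = []
    for row in rows:
        # Features of the default (unnamed) namespace follow a bare '|'.
        feats = " ".join(f"{name}:{row[name]}" for name in features)
        lines.append(f"{row[label]} | {feats}")
    return lines


rows = [{"y": 1, "a": 2, "b": 3}]
print(rows_to_vw(rows, label="y", features=["a", "b"]))  # ['1 | a:2 b:3']
```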
- __init__(df, features=None, namespaces=None, label=None, tag=None)
Initialize a DFtoVW instance.
- Parameters:
df (DataFrame) – The dataframe to convert to VW input format.
features (Union[Feature, List[Feature], None]) – One or more Feature object(s).
namespaces (Union[Namespace, List[Namespace], None]) – One or more Namespace object(s), each composed of one or more Feature object(s).
label (Union[SimpleLabel, MulticlassLabel, MultiLabel, ContextualbanditLabel, List[MultiLabel], List[ContextualbanditLabel], None]) – One or more label objects used to build the label string.
tag (Optional[Hashable]) – The tag column name (used as identifiers for examples).
Examples
>>> from vowpalwabbit.dftovw import DFtoVW, SimpleLabel, Feature, Namespace
>>> import pandas as pd
>>> df = pd.DataFrame({"y": [1], "a": [2], "b": [3], "c": [4]})
>>> conv1 = DFtoVW(df=df,
...                label=SimpleLabel("y"),
...                features=Feature("a"))
>>> conv1.convert_df()
['1 | a:2']
>>> conv2 = DFtoVW(df=df,
...                label=SimpleLabel("y"),
...                features=[Feature(col) for col in ["a", "b"]])
>>> conv2.convert_df()
['1 | a:2 b:3']
>>> conv3 = DFtoVW(df=df,
...                label=SimpleLabel("y"),
...                namespaces=Namespace(
...                    name="DoubleIt", value=2,
...                    features=Feature(value="a", rename_feature="feat_a")))
>>> conv3.convert_df()
['1 |DoubleIt:2 feat_a:2']
>>> conv4 = DFtoVW(df=df,
...                label=SimpleLabel("y"),
...                namespaces=[Namespace(name="NS1", features=[Feature(col) for col in ["a", "c"]]),
...                            Namespace(name="NS2", features=Feature("b"))])
>>> conv4.convert_df()
['1 |NS1 a:2 c:4 |NS2 b:3']
- check_columns_type_and_values()
Check the type and value range of the columns.
- check_features_type(features)
Check if the features argument is of type Feature.
- check_instance_columns(instance)
Check the column types and values of a given instance. The method iterates through the instance's attributes and looks for attributes of type _Col. For each one found, it uses the _Col methods to check the type and value range of the column. The type of the instance in which an error occurs is prepended to the error message, to make explicit where in the formula the error occurs.
- Raises:
TypeError – If a column is not of a valid type.
ValueError – If a column's values are not in the valid range.
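The checking loop described above could look roughly like the following. `_Col` here is a hypothetical minimal stand-in (the module's real private class is not documented on this page), `ToyLabel` is an invented example class, and the column data is a plain dictionary instead of a DataFrame.

```python
from typing import Any, Dict, Optional, Sequence, Tuple, Type, Union

TypeSpec = Union[Type, Tuple[Type, ...]]


class _Col:
    """Hypothetical stand-in for the module's private column wrapper."""

    def __init__(self, colname: str, expected_type: TypeSpec, min_value: Optional[float] = None):
        self.colname = colname
        self.expected_type = expected_type
        self.min_value = min_value

    def check(self, values: Sequence[Any]) -> None:
        for v in values:
            if not isinstance(v, self.expected_type):
                raise TypeError(f"column '{self.colname}' should be of type {self.expected_type}")
            if self.min_value is not None and v < self.min_value:
                raise ValueError(f"column '{self.colname}' has values below {self.min_value}")


def check_instance_columns(instance: object, data: Dict[str, Sequence[Any]]) -> None:
    """Check every _Col attribute of `instance` against its column in `data`."""
    for attr in vars(instance).values():
        if isinstance(attr, _Col):
            try:
                attr.check(data[attr.colname])
            except (TypeError, ValueError) as err:
                # Prepend the instance type so the user knows where the error is.
                raise type(err)(f"In '{type(instance).__name__}': {err}") from None


class ToyLabel:
    """Hypothetical label whose column must hold integers >= 1."""

    def __init__(self, colname: str):
        self.label = _Col(colname, int, min_value=1)


try:
    check_instance_columns(ToyLabel("y"), {"y": [1, 0]})
except ValueError as err:
    print(err)  # In 'ToyLabel': column 'y' has values below 1
```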
- check_label_type()
Check the label type.
- Raises:
TypeError – If the label is not of type SimpleLabel, MulticlassLabel, MultiLabel or ContextualbanditLabel.
- check_missing_columns_df()
Check if the columns are in the dataframe.
- check_namespaces_type()
Check if the namespaces argument is of type Namespace.
- Raises:
TypeError – If namespaces is not of type Namespace or a list of Namespace.
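A sketch of the single-or-list type check, assuming a minimal stand-in `Namespace` class (not the real one). For convenience it returns the normalized list, which the documented method may not do.

```python
from typing import Any, List


class Namespace:
    """Hypothetical minimal stand-in for vowpalwabbit.dftovw.Namespace."""

    def __init__(self, features: Any, name: Any = None):
        self.features = features
        self.name = name


def check_namespaces_type(namespaces: Any) -> List[Namespace]:
    """Accept one Namespace or a list of them; return them as a list."""
    as_list = namespaces if isinstance(namespaces, list) else [namespaces]
    bad = [type(ns).__name__ for ns in as_list if not isinstance(ns, Namespace)]
    if bad:
        raise TypeError(
            "namespaces should be a Namespace or a list of Namespace, "
            f"got element(s) of type: {', '.join(bad)}"
        )
    return as_list


print(len(check_namespaces_type([Namespace("a"), Namespace("b")])))  # 2
```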
- convert_df()
Main method that converts the dataframe to the VW format.
- empty_col()
Create an empty string column.
- Return type:
- Returns:
A column of empty strings with as many rows as the input dataframe.
- classmethod from_colnames(y, x, df, label_type='simple_label')
Build a DFtoVW instance using column names only.
Deprecated since version 9.2.0: Use DFtoVW.from_column_names() instead.
- Parameters:
y (Union[Hashable, List[Hashable]]) – (List of) any hashable type (str/int/float/tuple/etc.) representing a column name. The column(s) for the label.
x (Union[Hashable, List[Hashable]]) – (List of) any hashable type (str/int/float/tuple/etc.) representing a column name. The column(s) for the feature(s).
df (DataFrame) – The dataframe used.
label_type (str) – The type of the label. Available labels: ‘simple_label’, ‘multiclass_label’, ‘multi_label’. (default: ‘simple_label’)
- Raises:
TypeError – If argument label is not of valid type.
ValueError – If argument label_type is not valid.
Examples
>>> from vowpalwabbit.dftovw import DFtoVW
>>> import pandas as pd
>>> df = pd.DataFrame({"y": [1], "x": [2]})
>>> conv = DFtoVW.from_colnames(y="y", x="x", df=df)
>>> conv.convert_df()
['1 | x:2']
>>> df2 = pd.DataFrame({"y": [1], "x1": [2], "x2": [3], "x3": [4]})
>>> conv2 = DFtoVW.from_colnames(y="y", x=sorted(set(df2.columns) - {"y"}), df=df2)
>>> conv2.convert_df()
['1 | x1:2 x2:3 x3:4']
- Return type:
DFtoVW
- Returns:
An initialized DFtoVW instance.
- classmethod from_column_names(*, y=None, x, df, label_type='simple_label')
Build a DFtoVW instance using column names only. Compared to DFtoVW.from_colnames(), y and label_type are optional in this method, and all arguments are keyword-only (they cannot be passed positionally).
- Parameters:
y (Union[Hashable, List[Hashable], None]) – (List of) any hashable type (str/int/float/tuple/etc.) representing a column name. The column(s) for the label. Optional.
x (Union[Hashable, List[Hashable]]) – (List of) any hashable type (str/int/float/tuple/etc.) representing a column name. The column(s) for the feature(s).
df (DataFrame) – The dataframe used.
label_type (Optional[str]) – The type of the label. Available labels: ‘simple_label’, ‘multiclass_label’, ‘multi_label’. (default: ‘simple_label’). Optional.
- Raises:
TypeError – If argument label is not of valid type.
ValueError – If argument label_type is not valid.
Examples
>>> from vowpalwabbit.dftovw import DFtoVW
>>> import pandas as pd
>>> df = pd.DataFrame({"y": [1], "x": [2]})
>>> conv = DFtoVW.from_column_names(y="y", x="x", df=df)
>>> conv.convert_df()
['1 | x:2']
>>> df2 = pd.DataFrame({"y": [1], "x1": [2], "x2": [3], "x3": [4]})
>>> conv2 = DFtoVW.from_column_names(y="y", x=sorted(set(df2.columns) - {"y"}), df=df2)
>>> conv2.convert_df()
['1 | x1:2 x2:3 x3:4']
- Return type:
DFtoVW
- Returns:
An initialized DFtoVW instance.
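The keyword-only signature can be illustrated with a hypothetical `parse_column_names` helper that mirrors the documented parameters (minus `df`) and normalizes single names into lists. Only lists are unwrapped, so a tuple column name (valid because tuples are hashable) is treated as one name. How each label type is then turned into a label object is not shown, since that mapping is not documented here.

```python
from typing import Hashable, List, Optional, Union

ColNames = Union[Hashable, List[Hashable]]

# Label type names listed in the docs above.
LABEL_TYPES = ("simple_label", "multiclass_label", "multi_label")


def _as_list(cols: ColNames) -> List[Hashable]:
    """Wrap a single column name in a list; leave lists unchanged."""
    return cols if isinstance(cols, list) else [cols]


def parse_column_names(*, y: Optional[ColNames] = None, x: ColNames,
                       label_type: str = "simple_label") -> dict:
    """Validate and normalize keyword-only arguments (hypothetical helper)."""
    if label_type not in LABEL_TYPES:
        raise ValueError(f"label_type must be one of {LABEL_TYPES}, got {label_type!r}")
    return {
        "label_cols": None if y is None else _as_list(y),
        "feature_cols": _as_list(x),
        "label_type": label_type,
    }


print(parse_column_names(x=["x1", "x2"]))
# {'label_cols': None, 'feature_cols': ['x1', 'x2'], 'label_type': 'simple_label'}
```

Because of the bare `*`, a call such as `parse_column_names("y", "x")` raises `TypeError`, which is the same restriction the documented method places on its arguments.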
- process_features(features)
Process the features (of a namespace) into a single column.
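Assuming each feature has already been rendered as one string per row (for example `a:2`), merging them into a single column amounts to a row-wise join. `process_features` below is a hypothetical stdlib sketch, with lists standing in for DataFrame columns.

```python
from typing import List, Sequence


def process_features(feature_columns: Sequence[Sequence[str]]) -> List[str]:
    """Join already-formatted feature strings row by row (hypothetical sketch).

    feature_columns[i][r] is the string for feature i in row r, e.g. "a:2".
    """
    # zip(*...) regroups the per-feature columns into per-row tuples.
    return [" ".join(row) for row in zip(*feature_columns)]


cols = [["a:2", "a:5"], ["c:4", "c:7"]]
print(process_features(cols))  # ['a:2 c:4', 'a:5 c:7']
```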
- process_label_and_tag()
Process the label and tag into a single column.
- Return type:
- Returns:
A column where each row is the processed label and tag.
- raise_missing_col_error(missing_cols_dict)
Raise an error if some columns are missing.
- Raises:
ValueError – If one or more columns are not in the dataframe.
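The structure of `missing_cols_dict` is not documented on this page. The sketch below assumes it maps the name of the object that refers to a column (e.g. `Feature`) to the list of column names absent from the dataframe. Both functions are hypothetical illustrations, with a plain list of column names standing in for the DataFrame.

```python
from typing import Dict, Iterable, List


def find_missing_columns(required: Dict[str, Iterable[str]],
                         df_columns: Iterable[str]) -> Dict[str, List[str]]:
    """Map each object type to the column names it uses that are absent."""
    available = set(df_columns)
    missing = {}
    for owner, cols in required.items():
        absent = [c for c in cols if c not in available]
        if absent:
            missing[owner] = absent
    return missing


def raise_missing_col_error(missing_cols_dict: Dict[str, List[str]]) -> None:
    """Raise a ValueError listing the missing columns, if there are any."""
    if missing_cols_dict:
        details = "; ".join(
            f"{owner}: {', '.join(map(repr, cols))}"
            for owner, cols in missing_cols_dict.items()
        )
        raise ValueError(f"Column(s) not found in the dataframe: {details}")


missing = find_missing_columns({"SimpleLabel": ["y"], "Feature": ["a", "z"]}, ["y", "a", "b"])
print(missing)  # {'Feature': ['z']}
```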
- set_namespaces(namespaces, features)
Set the namespaces attribute. Only one of namespaces or features should be passed when this method is called.
- class vowpalwabbit.dftovw.Feature(value, rename_feature=None, as_type=None)
Bases: object
The feature type for the constructor of DFtoVW.
- __init__(value, rename_feature=None, as_type=None)
Initialize a Feature instance.
- process(df, ensure_valid_values=True)
Return the Feature string representation.
- Parameters:
df (DataFrame) – The dataframe from which to select the column(s).
- Return type:
- Returns:
The Feature string representation.
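In VW's text format a numerical feature is written as `name:value`, which matches the `a:2` and `feat_a:2` outputs in the DFtoVW examples. The sketch below, `format_feature`, is hypothetical and ignores `as_type` and `ensure_valid_values`, whose exact behavior is not described here. The reserved-character check is an assumption based on spaces, `:` and `|` being delimiters in the VW format.

```python
from typing import List, Optional, Sequence


def format_feature(colname: str, values: Sequence[float],
                   rename_feature: Optional[str] = None) -> List[str]:
    """Format one column as VW 'name:value' strings, one per row (sketch)."""
    name = rename_feature if rename_feature is not None else colname
    # ' ', ':' and '|' are delimiters in VW text format (assumed check).
    if any(ch in name for ch in " :|"):
        raise ValueError(f"feature name {name!r} contains a reserved character")
    return [f"{name}:{v}" for v in values]


print(format_feature("a", [2, 5]))  # ['a:2', 'a:5']
print(format_feature("a", [2], rename_feature="feat_a"))  # ['feat_a:2']
```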
- class vowpalwabbit.dftovw.MultiLabel(label)
Bases: object
The multi-label type for the constructor of DFtoVW.
- __init__(label)
Initialize a MultiLabel instance.
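VW's multilabel format lists the labels of one example separated by commas, as in `1,3 | ...`. Assuming each label column holds one integer label per row, a hypothetical `format_multilabel` helper could build that string as follows (lists stand in for DataFrame columns).

```python
from typing import List, Sequence


def format_multilabel(label_columns: Sequence[Sequence[int]]) -> List[str]:
    """Join several label columns row by row with commas, e.g. '1,3'."""
    # zip(*...) turns the list of columns into a sequence of rows.
    return [",".join(str(v) for v in row) for row in zip(*label_columns)]


print(format_multilabel([[1, 2], [3, 4]]))  # ['1,3', '2,4']
```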
- class vowpalwabbit.dftovw.MulticlassLabel(label, weight=None)
Bases: object
The multiclass label type for the constructor of DFtoVW.
- __init__(label, weight=None)
Initialize a MulticlassLabel instance.
- process(df)
Return the MulticlassLabel string representation.
- Parameters:
df – The dataframe from which to select the column(s).
- Return type:
- Returns:
The MulticlassLabel string representation.
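A VW multiclass (or simple) label can be followed by an optional importance weight, as in `2 0.5 | ...`. The hypothetical `format_multiclass` sketch below shows how an optional `weight` column could be appended. The `>= 1` check reflects VW's convention of numbering classes from 1 and is an assumption here, not documented behavior of this class.

```python
from typing import List, Optional, Sequence


def format_multiclass(labels: Sequence[int],
                      weights: Optional[Sequence[float]] = None) -> List[str]:
    """Build '<label>' or '<label> <weight>' strings, one per row (sketch)."""
    if any(lab < 1 for lab in labels):
        raise ValueError("multiclass labels are expected to be integers >= 1")
    if weights is None:
        return [str(lab) for lab in labels]
    if len(weights) != len(labels):
        raise ValueError("labels and weights must have the same length")
    return [f"{lab} {w}" for lab, w in zip(labels, weights)]


print(format_multiclass([1, 3]))  # ['1', '3']
print(format_multiclass([1, 3], weights=[0.5, 2]))  # ['1 0.5', '3 2']
```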
- class vowpalwabbit.dftovw.Namespace(features, name=None, value=None)
Bases: object
The namespace type for the constructor of DFtoVW. A Namespace is a container for Feature object(s), and thus must be composed of a Feature object or a list of Feature objects.
- __init__(features, name=None, value=None)
Initialize a Namespace instance.
- Parameters:
Examples
>>> from vowpalwabbit.dftovw import Namespace, Feature
>>> ns_one_feature = Namespace(Feature("a"))
>>> ns_multi_features = Namespace([Feature("a"), Feature("b")])
>>> ns_one_feature_with_name = Namespace(Feature("a"), name="FirstNamespace")
>>> ns_one_feature_with_name_and_value = Namespace(Feature("a"), name="FirstNamespace", value=2)
- check_attributes_type()
Check if the attributes are of a valid type.
- Raises:
TypeError – If one of the attributes is not valid.
- expected_type = {'features': (<class 'vowpalwabbit.dftovw.Feature'>,), 'name': (<class 'str'>, <class 'int'>, <class 'float'>), 'value': (<class 'int'>, <class 'float'>)}
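The namespace prefix in the DFtoVW examples is `|` for the default namespace, `|NS1` with a name, and `|DoubleIt:2` with a name and a value. The hypothetical sketch below builds that prefix and reuses the `name` and `value` types from `expected_type` for the attribute check. Rejecting a value without a name is an assumption of the sketch.

```python
from typing import Dict, List, Optional, Tuple, Union

# Same types as the expected_type mapping above, minus "features".
EXPECTED_TYPE: Dict[str, Tuple[type, ...]] = {"name": (str, int, float), "value": (int, float)}


def namespace_header(name: Optional[Union[str, int, float]] = None,
                     value: Optional[Union[int, float]] = None) -> str:
    """Return the '|', '|name' or '|name:value' prefix of a namespace (sketch)."""
    for attr, obj in (("name", name), ("value", value)):
        if obj is not None and not isinstance(obj, EXPECTED_TYPE[attr]):
            raise TypeError(f"In 'Namespace': argument '{attr}' should be of type {EXPECTED_TYPE[attr]}")
    if name is None:
        if value is not None:
            # Assumption of this sketch: a namespace value needs a namespace name.
            raise ValueError("a namespace value requires a namespace name")
        return "|"
    return f"|{name}" if value is None else f"|{name}:{value}"


def format_namespace(features: List[str], name=None, value=None) -> str:
    """Prefix already-formatted feature strings with the namespace header."""
    return f"{namespace_header(name, value)} {' '.join(features)}"


print(format_namespace(["feat_a:2"], name="DoubleIt", value=2))  # |DoubleIt:2 feat_a:2
```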
- class vowpalwabbit.dftovw.SimpleLabel(label, weight=None)
Bases: object
The simple label type for the constructor of DFtoVW.
- __init__(label, weight=None)
Initialize a SimpleLabel instance.
- process(df)
Return the SimpleLabel string representation.
- Parameters:
df (DataFrame) – The dataframe from which to select the column.
- Return type:
- Returns:
The SimpleLabel string representation.