vowpalwabbit.DFtoVW

class vowpalwabbit.DFtoVW.AttributeDescriptor(attribute_name, expected_type, min_value=None, max_value=None)
Bases: object
This descriptor class adds type and value checking information to the _Col instance for later use in the DFtoVW class. The type and value checks can only be performed once the dataframe is known (i.e. in the DFtoVW class). This descriptor class is used in the following managed classes: SimpleLabel, MulticlassLabel, Feature, etc.

__init__(self, attribute_name, expected_type, min_value=None, max_value=None)
Initialize an AttributeDescriptor instance.
Parameters: - attribute_name : str
The name of the attribute.
- expected_type : tuple
The expected type(s) of the attribute.
- min_value : str/int/float
The minimum value of the attribute.
- max_value : str/int/float
The maximum value of the attribute.
Raises: - TypeError
If one of the arguments passed is not of a valid type.

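The point of the descriptor is that a label or feature object only knows a column *name* when it is constructed, and its type and range checks run later, once DFtoVW has the dataframe. The following is a minimal sketch of that deferred-validation pattern using Python's descriptor protocol. It is not the vowpalwabbit implementation: `ColumnDescriptor`, `LabelSketch` and `check` are hypothetical names, the "dataframe" is a plain dict of lists, and the integer-label constraint is purely illustrative.

```python
# Hypothetical sketch (not the vowpalwabbit implementation): a descriptor that
# stores a column *name* at construction time and checks the column's values
# later, once a dataframe is available. The "dataframe" here is a plain dict
# mapping column names to lists of values.
class ColumnDescriptor:
    def __init__(self, expected_type, min_value=None, max_value=None):
        self.expected_type = expected_type
        self.min_value = min_value
        self.max_value = max_value

    def __set_name__(self, owner, attr_name):
        # Called once when the owning class is created; remember where to
        # store the column name on each instance.
        self.attr_name = attr_name
        self.private_name = "_" + attr_name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # class-level access returns the descriptor itself
        return getattr(instance, self.private_name)

    def __set__(self, instance, colname):
        # Only the column name is known here; no data is checked yet.
        setattr(instance, self.private_name, colname)

    def check(self, instance, df):
        """Deferred check, run once the dataframe is known."""
        colname = getattr(instance, self.private_name)
        for v in df[colname]:
            if not isinstance(v, self.expected_type):
                raise TypeError(f"{self.attr_name}: column {colname!r} has invalid value {v!r}")
            if self.min_value is not None and v < self.min_value:
                raise ValueError(f"{self.attr_name}: {v!r} is below {self.min_value}")
            if self.max_value is not None and v > self.max_value:
                raise ValueError(f"{self.attr_name}: {v!r} is above {self.max_value}")


class LabelSketch:
    # Illustrative constraint only: integer labels >= 1.
    label = ColumnDescriptor(expected_type=(int,), min_value=1)

    def __init__(self, label):
        self.label = label  # a column name, not the values


df = {"y": [1, 2, 3], "y0": [0, 1, 2]}
LabelSketch.label.check(LabelSketch("y"), df)  # passes silently
```

Checking `LabelSketch("y0")` against the same dict raises `ValueError`, because the descriptor only inspects the values when `check` is called with the data.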
class vowpalwabbit.DFtoVW.ContextualbanditLabel(action, cost, probability)
Bases: object
The contextual bandit label type for the constructor of DFtoVW.
Methods
process(self, df): Returns the ContextualbanditLabel string representation.

__init__(self, action, cost, probability)
Initialize a ContextualbanditLabel instance.
Parameters: - action : str
The column name with the action taken, for which the cost was observed.
- cost : str
The column name with the cost observed for this action (lower is better).
- probability : str
The column name with the probability that the exploration policy chose this action when collecting the data.
Returns: - self : ContextualbanditLabel

action
Descriptor-managed column attribute (see AttributeDescriptor).

cost
Descriptor-managed column attribute (see AttributeDescriptor).

probability
Descriptor-managed column attribute (see AttributeDescriptor).

process(self, df)
Returns the ContextualbanditLabel string representation.
Parameters: - df : pandas.DataFrame
The dataframe from which to select the column(s).
Returns: - pandas.Series
The ContextualbanditLabel string representation.

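In VW's text input format, a contextual bandit label (as used with `--cb`) is written as `action:cost:probability` before the first `|`. The sketch below shows that row-by-row concatenation of the three columns in plain Python, assuming ContextualbanditLabel.process produces this layout; `cb_label_column` is a hypothetical helper and the dict of lists stands in for a pandas DataFrame.

```python
# Hypothetical sketch (not the vowpalwabbit implementation): build VW
# contextual bandit labels of the form "action:cost:probability", one per row.
# The "dataframe" is a plain dict mapping column names to equal-length lists.
def cb_label_column(df, action, cost, probability):
    return [
        f"{a}:{c}:{p}"
        for a, c, p in zip(df[action], df[cost], df[probability])
    ]


df = {"act": [1, 2], "cost": [0.5, 1.0], "prob": [0.25, 0.75]}
print(cb_label_column(df, action="act", cost="cost", probability="prob"))
# ['1:0.5:0.25', '2:1.0:0.75']
```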
class vowpalwabbit.DFtoVW.DFtoVW(df, features=None, namespaces=None, label=None, tag=None)
Convert a pandas DataFrame to a suitable VW format. Instances of this class are built with classes such as SimpleLabel, MulticlassLabel, Feature or Namespace.
The class also provides a convenience constructor (from_colnames) to initialize the class based on the target/features column names only.
Methods
check_columns_type_and_values(self): Check columns type and values range.
check_features_type(self, features): Check if the features argument is of type Feature.
check_instance_columns(self, instance): Check the columns type and values of a given instance.
check_label_type(self): Check label type.
check_missing_columns_df(self): Check if the columns are in the dataframe.
check_namespaces_type(self): Check if namespaces arguments are of type Namespace.
convert_df(self): Main method that converts the dataframe to the VW format.
empty_col(self): Create an empty string column.
from_colnames(cls, y, x, df[, label_type]): Build DFtoVW instance using column names only.
process_features(self, features): Process the features (of a namespace) into a unique column.
process_label_and_tag(self): Process the label and tag into a unique column.
raise_missing_col_error(self, missing_cols_dict): Raises error if some columns are missing.
set_namespaces(self, namespaces, features): Set namespaces attributes.

__init__(self, df, features=None, namespaces=None, label=None, tag=None)
Initialize a DFtoVW instance.
Parameters: - df : pandas.DataFrame
The dataframe to convert to VW input format.
- features : Feature/list of Feature
One or more Feature object(s).
- namespaces : Namespace/list of Namespace
One or more Namespace object(s), each composed of one or more Feature object(s).
- label : SimpleLabel/MulticlassLabel/MultiLabel
The label.
- tag : str/int/float
The tag (used as an identifier for examples).
Returns: - self : DFtoVW
Examples
>>> from vowpalwabbit.DFtoVW import DFtoVW, SimpleLabel, Feature, Namespace
>>> import pandas as pd
>>> df = pd.DataFrame({"y": [1], "a": [2], "b": [3], "c": [4]})
>>> conv1 = DFtoVW(df=df,
...                label=SimpleLabel("y"),
...                features=Feature("a"))
>>> conv1.convert_df()
['1 | a:2']
>>> conv2 = DFtoVW(df=df,
...                label=SimpleLabel("y"),
...                features=[Feature(col) for col in ["a", "b"]])
>>> conv2.convert_df()
['1 | a:2 b:3']
>>> conv3 = DFtoVW(df=df,
...                label=SimpleLabel("y"),
...                namespaces=Namespace(
...                    name="DoubleIt", value=2,
...                    features=Feature(value="a", rename_feature="feat_a")))
>>> conv3.convert_df()
['1 |DoubleIt:2 feat_a:2']
>>> conv4 = DFtoVW(df=df,
...                label=SimpleLabel("y"),
...                namespaces=[Namespace(name="NS1", features=[Feature(col) for col in ["a", "c"]]),
...                            Namespace(name="NS2", features=Feature("b"))])
>>> conv4.convert_df()
['1 |NS1 a:2 c:4 |NS2 b:3']

check_columns_type_and_values(self)
Check columns type and values range.

check_features_type(self, features)
Check if the features argument is of type Feature.
Parameters: - features : (list of) Feature
The features argument to check.
Raises: - TypeError
If features is not a Feature or a list of Feature.

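Several DFtoVW arguments (features, namespaces) accept either a single object or a list of objects. A minimal sketch of the normalize-then-check pattern such a method can follow is shown below; `FeatureStub` stands in for Feature and `as_checked_list` is a hypothetical helper, not part of vowpalwabbit.

```python
# Hypothetical sketch: accept either a single object or a list of objects,
# normalize to a list, and reject any element of the wrong type.
class FeatureStub:
    """Stand-in for vowpalwabbit.DFtoVW.Feature (illustration only)."""

    def __init__(self, value):
        self.value = value


def as_checked_list(arg, expected_cls, arg_name):
    items = arg if isinstance(arg, list) else [arg]
    for item in items:
        if not isinstance(item, expected_cls):
            raise TypeError(
                f"Argument {arg_name!r} should be a {expected_cls.__name__} or a "
                f"list of {expected_cls.__name__}, got {type(item).__name__}."
            )
    return items


features = as_checked_list(FeatureStub("a"), FeatureStub, "features")
print([f.value for f in features])  # ['a']
```

Normalizing to a list first means the rest of the conversion code only has to handle one shape.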
check_instance_columns(self, instance)
Check the columns type and values of a given instance. The method iterates through the attributes and looks for _Col type attributes. For each one found, it uses the _Col methods to check the type and the value range of the column. The type of the instance in which an error occurs is prepended to the error message, to make explicit where in the formula the error occurs.
Raises: - TypeError
If a column is not of a valid type.
- ValueError
If a column's values are not in the valid range.

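The scan-and-prefix behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the library code: `Col` is a minimal hypothetical stand-in for `_Col`, `SimpleLabelStub` stands in for SimpleLabel, attributes are found with `vars()`, and only the type check is shown.

```python
# Hypothetical sketch: scan an instance's attributes for column objects, check
# each against the dataframe, and prefix errors with the instance's class name
# so the user can see which part of the formula failed.
class Col:
    """Minimal stand-in for a column reference with an expected type."""

    def __init__(self, colname, expected_type):
        self.colname = colname
        self.expected_type = expected_type

    def check_col_type(self, df):
        bad = [v for v in df[self.colname] if not isinstance(v, self.expected_type)]
        if bad:
            raise TypeError(f"column {self.colname!r} has values of invalid type: {bad}")


class SimpleLabelStub:
    def __init__(self, label):
        self.label = Col(label, (int, float))


def check_instance_columns(instance, df):
    for attr_value in vars(instance).values():
        if isinstance(attr_value, Col):
            try:
                attr_value.check_col_type(df)
            except TypeError as err:
                raise TypeError(f"In '{type(instance).__name__}': {err}") from None


df = {"y": [1, 2.5], "s": ["a", 1]}
check_instance_columns(SimpleLabelStub("y"), df)  # no error
```

Calling it with `SimpleLabelStub("s")` raises a TypeError whose message starts with `In 'SimpleLabelStub':`.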
check_label_type(self)
Check label type.
Raises: - TypeError
If label is not of type SimpleLabel or MulticlassLabel.

check_missing_columns_df(self)
Check if the columns are in the dataframe.

check_namespaces_type(self)
Check if the namespaces argument is of type Namespace.
Raises: - TypeError
If namespaces is not of type Namespace or list of Namespace.

convert_df(self)
Main method that converts the dataframe to the VW format.
Returns: - list
The list of parsed lines in VW format.

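The outputs in the DFtoVW constructor examples follow a simple pattern: the label, then for each namespace a `" |"`, the namespace header (empty, `name`, or `name:value`) and the space-separated features. The sketch below reproduces those documented outputs in plain Python. It is a hypothetical reconstruction, not the library's code: `namespace_header` and `to_vw_lines` are illustrative names, the dict of lists stands in for the DataFrame, and every feature is treated as numerical.

```python
# Hypothetical sketch (not the vowpalwabbit implementation) of how one VW line
# per row can be assembled: the label, then for each namespace
# " |" + header + " " + space-separated features.
# The "dataframe" is a dict mapping column names to equal-length lists.
def namespace_header(name=None, value=None):
    # "" for an unnamed namespace, "name" or "name:value" otherwise.
    if name is None:
        return ""
    return str(name) if value is None else f"{name}:{value}"


def to_vw_lines(df, label, namespaces):
    """namespaces: list of (name, value, features); features is a list of
    (column, rename) pairs. All features are treated as numerical here."""
    n_rows = len(df[label])
    lines = []
    for i in range(n_rows):
        line = str(df[label][i])
        for name, value, features in namespaces:
            feats = " ".join(f"{rename or col}:{df[col][i]}" for col, rename in features)
            line += " |" + namespace_header(name, value) + " " + feats
        lines.append(line)
    return lines


df = {"y": [1], "a": [2], "b": [3], "c": [4]}
print(to_vw_lines(df, "y", [(None, None, [("a", None), ("b", None)])]))
# ['1 | a:2 b:3']
print(to_vw_lines(df, "y", [("DoubleIt", 2, [("a", "feat_a")])]))
# ['1 |DoubleIt:2 feat_a:2']
print(to_vw_lines(df, "y", [("NS1", None, [("a", None), ("c", None)]),
                            ("NS2", None, [("b", None)])]))
# ['1 |NS1 a:2 c:4 |NS2 b:3']
```

An unnamed namespace yields an empty header, which is why the first output reads `1 | a:2 b:3` with a space after the bar.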
empty_col(self)
Create an empty string column.
Returns: - pandas.Series
A column of empty strings with as many rows as the input dataframe.

classmethod from_colnames(cls, y, x, df, label_type='simple_label')
Build a DFtoVW instance using column names only.
Parameters: - y : (list of) any hashable type (str/int/float/tuple/etc.) representing a column name
The column for the label.
- x : (list of) any hashable type (str/int/float/tuple/etc.) representing a column name
The column(s) for the feature(s).
- df : pandas.DataFrame
The dataframe used.
- label_type : str (default: 'simple_label')
The type of the label. Available labels: 'simple_label', 'multiclass_label', 'multi_label'.
Returns: - DFtoVW
An initialized DFtoVW instance.
Raises: - TypeError
If the label argument is not of a valid type.
- ValueError
If the label_type argument is not valid.
Examples
>>> from vowpalwabbit.DFtoVW import DFtoVW
>>> import pandas as pd
>>> df = pd.DataFrame({"y": [1], "x": [2]})
>>> conv = DFtoVW.from_colnames(y="y", x="x", df=df)
>>> conv.convert_df()
['1 | x:2']
>>> df2 = pd.DataFrame({"y": [1], "x1": [2], "x2": [3], "x3": [4]})
>>> conv2 = DFtoVW.from_colnames(y="y", x=sorted(set(df2.columns) - {"y"}), df=df2)
>>> conv2.convert_df()
['1 | x1:2 x2:3 x3:4']

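Conceptually, from_colnames maps the label_type string to one of the label classes and wraps each column name in x in a Feature. A minimal sketch of that dispatch is shown below, with the documented ValueError for an unknown label_type. The `*Stub` classes, `LABEL_CLASSES` and `from_colnames_sketch` are hypothetical placeholders, and the real method also receives the dataframe, which is omitted here.

```python
# Hypothetical sketch of the dispatch behind from_colnames: choose a label
# class from label_type and wrap each feature column name in a Feature-like
# object. The *Stub classes are placeholders, not the vowpalwabbit classes.
class SimpleLabelStub:
    def __init__(self, label):
        self.label = label  # column name(s) holding the label


class MulticlassLabelStub(SimpleLabelStub):
    pass


class MultiLabelStub(SimpleLabelStub):
    pass


class FeatureStub:
    def __init__(self, value):
        self.value = value  # column name holding the feature


LABEL_CLASSES = {
    "simple_label": SimpleLabelStub,
    "multiclass_label": MulticlassLabelStub,
    "multi_label": MultiLabelStub,
}


def from_colnames_sketch(y, x, label_type="simple_label"):
    if label_type not in LABEL_CLASSES:
        raise ValueError(
            f"label_type must be one of {sorted(LABEL_CLASSES)}, got {label_type!r}"
        )
    columns = x if isinstance(x, list) else [x]
    label = LABEL_CLASSES[label_type](y)
    features = [FeatureStub(col) for col in columns]
    return label, features


label, features = from_colnames_sketch(y="y", x=["x1", "x2"])
print(type(label).__name__, [f.value for f in features])
# SimpleLabelStub ['x1', 'x2']
```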
process_features(self, features)
Process the features (of a namespace) into a unique column.
Parameters: - features : list of Feature
The list of Feature objects.
Returns: - out : pandas.Series
The column of the processed features.

process_label_and_tag(self)
Process the label and tag into a unique column.
Returns: - out : pandas.Series
A column where each row is the processed label and tag.

raise_missing_col_error(self, missing_cols_dict)
Raises an error if some columns are missing.
Raises: - ValueError
If one or more columns are not in the dataframe.

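Reporting every missing column at once, grouped by the component that references it, is friendlier than failing on the first one. A minimal sketch of collecting missing columns into a dict and raising a single ValueError follows; `find_missing_columns` and `raise_if_missing` are hypothetical helpers, and the dict layout is an assumption about what missing_cols_dict might hold.

```python
# Hypothetical sketch: gather the columns each component references that are
# absent from the dataframe, then raise one ValueError that lists them all.
def find_missing_columns(required, df_columns):
    """required maps a component name (e.g. 'SimpleLabel') to its column names."""
    available = set(df_columns)
    missing = {}
    for component, cols in required.items():
        absent = [c for c in cols if c not in available]
        if absent:
            missing[component] = absent
    return missing


def raise_if_missing(missing):
    if missing:
        details = "; ".join(f"{comp}: {cols}" for comp, cols in missing.items())
        raise ValueError(f"Column(s) not found in the dataframe: {details}")


missing = find_missing_columns({"SimpleLabel": ["y"], "Feature": ["a", "z"]}, ["y", "a", "b"])
print(missing)  # {'Feature': ['z']}
```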
set_namespaces(self, namespaces, features)
Set namespaces attributes.
Parameters: - namespaces : Namespace / list of Namespace objects
The namespaces argument.
- features : Feature / list of Feature objects
The features argument.

class vowpalwabbit.DFtoVW.Feature(value, rename_feature=None, as_type=None)
Bases: object
The feature type for the constructor of DFtoVW.
Methods
process(self, df): Returns the Feature string representation.

__init__(self, value, rename_feature=None, as_type=None)
Initialize a Feature instance.
Parameters: - value : str
The column name with the value of the feature.
- rename_feature : str, optional
The name to use instead of the default (the column name given in the value argument).
- as_type : str, optional
Enforce a specific type ('numerical' or 'categorical').
Returns: - self : Feature

process(self, df)
Returns the Feature string representation.
Parameters: - df : pandas.DataFrame
The dataframe from which to select the column(s).
Returns: - pandas.Series
The Feature string representation.

value
Descriptor-managed column attribute (see AttributeDescriptor).

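In VW's text format, a feature written as `name:value` carries a numeric value, while a feature written without `:value` has an implicit value of 1. A categorical column can therefore be encoded by folding the value into the feature name, so each distinct value becomes its own indicator feature. The sketch below illustrates how as_type could select between the two encodings. It assumes numerical columns become `name:value` and categorical ones `name=value`; the `=` separator and the type-inference rule are assumptions to verify against the vowpalwabbit source, and `feature_column` is a hypothetical helper working on a plain list of values.

```python
# Hypothetical sketch: format one feature column for VW, choosing between a
# numerical encoding ("name:value") and a categorical one ("name=value").
def feature_column(values, name, as_type=None):
    if as_type is None:
        # Assumed inference rule: ints/floats (but not bools) are numerical.
        is_numerical = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
    elif as_type in ("numerical", "categorical"):
        is_numerical = as_type == "numerical"
    else:
        raise ValueError("as_type must be 'numerical' or 'categorical'")
    sep = ":" if is_numerical else "="
    return [f"{name}{sep}{v}" for v in values]


print(feature_column([2, 3], "a"))                        # ['a:2', 'a:3']
print(feature_column(["red", "blue"], "color"))           # ['color=red', 'color=blue']
print(feature_column([75001, 10115], "zip", "categorical"))  # ['zip=75001', 'zip=10115']
```

Forcing `as_type="categorical"` matters for integer-coded categories such as postal codes, where a numeric weight would be meaningless.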
class vowpalwabbit.DFtoVW.MultiLabel(label)
Bases: object
The multi-label type for the constructor of DFtoVW.
Methods
process(self, df): Returns the MultiLabel string representation.

__init__(self, label)
Initialize a MultiLabel instance.
Parameters: - label : str or list of str
The (list of) column name(s) of the multi label(s).
Returns: - self : MultiLabel

label
Descriptor-managed column attribute (see AttributeDescriptor).

process(self, df)
Returns the MultiLabel string representation.
Parameters: - df : pandas.DataFrame
The dataframe from which to select the column(s).
Returns: - pandas.Series
The MultiLabel string representation.

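VW's multilabel input (for example with `--multilabel_oaa`) lists the labels of an example as comma-separated values before the `|`, e.g. `0,2 | ...`. A minimal sketch of joining one or more label columns row by row into that field follows, assuming MultiLabel.process produces this layout; `multilabel_column` is a hypothetical helper and the dict of lists stands in for the DataFrame.

```python
# Hypothetical sketch: join one or more label columns into VW's comma-separated
# multilabel field (e.g. "0,2"), one string per row.
def multilabel_column(df, label_cols):
    cols = label_cols if isinstance(label_cols, list) else [label_cols]
    return [",".join(str(v) for v in row) for row in zip(*(df[c] for c in cols))]


df = {"l1": [0, 1], "l2": [2, 3]}
print(multilabel_column(df, ["l1", "l2"]))  # ['0,2', '1,3']
print(multilabel_column(df, "l1"))          # ['0', '1']
```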
class vowpalwabbit.DFtoVW.MulticlassLabel(label, weight=None)
Bases: object
The multiclass label type for the constructor of DFtoVW.
Methods
process(self, df): Returns the MulticlassLabel string representation.

__init__(self, label, weight=None)
Initialize a MulticlassLabel instance.
Parameters: - label : str
The column name with the multiclass label.
- weight : str, optional
The column name with the (importance) weight of the multiclass label.
Returns: - self : MulticlassLabel

label
Descriptor-managed column attribute (see AttributeDescriptor).

process(self, df)
Returns the MulticlassLabel string representation.
Parameters: - df : pandas.DataFrame
The dataframe from which to select the column(s).
Returns: - pandas.Series
The MulticlassLabel string representation.

weight
Descriptor-managed column attribute (see AttributeDescriptor).

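With VW's one-against-all reduction (`--oaa k`), class labels are the integers 1 to k, and an optional importance weight follows the label (`label weight | ...`). The sketch below checks the labels and appends the weight column when given. It assumes MulticlassLabel behaves along these lines; `multiclass_column` is a hypothetical helper, the dict of lists stands in for the DataFrame, and the ">= 1" check reflects the `--oaa` convention rather than confirmed library code.

```python
# Hypothetical sketch: format a multiclass label column, optionally followed by
# an importance weight ("label weight"), after checking the labels.
def multiclass_column(df, label, weight=None):
    labels = df[label]
    for v in labels:
        # Assumed constraint: VW's one-against-all (--oaa k) expects classes 1..k.
        if isinstance(v, bool) or not isinstance(v, int) or v < 1:
            raise ValueError(f"multiclass label must be an integer >= 1, got {v!r}")
    if weight is None:
        return [str(v) for v in labels]
    return [f"{v} {w}" for v, w in zip(labels, df[weight])]


df = {"y": [1, 3], "w": [0.5, 2]}
print(multiclass_column(df, "y", weight="w"))  # ['1 0.5', '3 2']
print(multiclass_column(df, "y"))              # ['1', '3']
```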
class vowpalwabbit.DFtoVW.Namespace(features, name=None, value=None)
Bases: object
The namespace type for the constructor of DFtoVW. The Namespace is a container for Feature object(s), and thus must be composed of a Feature object or a list of Feature objects.
Methods
check_attributes_type(self): Check if attributes are of valid type.
process(self): Returns the Namespace string representation.

__init__(self, features, name=None, value=None)
Initialize a Namespace instance.
Parameters: - features : Feature or list of Feature
A (list of) Feature object(s) that form the namespace.
- name : str/int/float, optional
The name of the namespace.
- value : int/float, optional
A constant that specifies the scaling factor for the features of this namespace.
Returns: - self : Namespace
Examples
>>> from vowpalwabbit.DFtoVW import Namespace, Feature
>>> ns_one_feature = Namespace(Feature("a"))
>>> ns_multi_features = Namespace([Feature("a"), Feature("b")])
>>> ns_one_feature_with_name = Namespace(Feature("a"), name="FirstNamespace")
>>> ns_one_feature_with_name_and_value = Namespace(Feature("a"), name="FirstNamespace", value=2)

check_attributes_type(self)
Check if attributes are of valid type.
Raises: - TypeError
If one of the attributes is not valid.

expected_type = {'features': (<class 'vowpalwabbit.DFtoVW.Feature'>,), 'name': (<type 'str'>, <type 'int'>, <type 'float'>), 'value': (<type 'int'>, <type 'float'>)}

process(self)
Returns the Namespace string representation.
Returns: - str
The Namespace string representation.

class vowpalwabbit.DFtoVW.SimpleLabel(label, weight=None)
Bases: object
The simple label type for the constructor of DFtoVW.
Methods
process(self, df): Returns the SimpleLabel string representation.

__init__(self, label, weight=None)
Initialize a SimpleLabel instance.
Parameters: - label : str
The column name with the label.
- weight : str, optional
The column name with the weight.
Returns: - self : SimpleLabel

label
Descriptor-managed column attribute (see AttributeDescriptor).

process(self, df)
Returns the SimpleLabel string representation.
Parameters: - df : pandas.DataFrame
The dataframe from which to select the column.
Returns: - pandas.Series
The SimpleLabel string representation.

weight
Descriptor-managed column attribute (see AttributeDescriptor).