vowpalwabbit.sklearn
Utilities to support integration of Vowpal Wabbit and scikit-learn
-
class vowpalwabbit.sklearn_vw.ThresholdingLinearClassifierMixin(**params)
Bases: sklearn.linear_model._base.LinearClassifierMixin
Mixin for linear classifiers. A threshold specifies the cutoff for the positive class. Handles prediction for sparse and dense X.
-
classes_ = array([-1., 1.])
-
predict(X)
Predict class labels for samples in X.
Parameters:
- X : {array-like, sparse matrix}, shape = [n_samples, n_features]
  Samples.
Returns:
- C : array, shape = [n_samples]
  Predicted class label per sample.
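The thresholding idea can be illustrated with a minimal, library-free sketch. The function name `threshold_predict`, the `pos_threshold` argument, and the choice that a score equal to the cutoff counts as positive are assumptions made for illustration; they are not taken from the mixin's source.

```python
def threshold_predict(scores, pos_threshold=0.0, classes=(-1.0, 1.0)):
    """Map raw scores to class labels using a positive-class cutoff."""
    negative, positive = classes
    # Assumption: a score at or above the cutoff is labeled positive.
    return [positive if s >= pos_threshold else negative for s in scores]


# Four hypothetical decision scores, thresholded at 0.1.
labels = threshold_predict([-0.7, 0.0, 0.2, 1.3], pos_threshold=0.1)
print(labels)
```

Raising the cutoff trades recall on the positive class for precision; the label set mirrors the `classes_` attribute documented above.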
-
-
class vowpalwabbit.sklearn_vw.VW(rank=None, lrq=None, lrqdropout=None, probabilities=None, random_seed=None, ring_size=None, convert_to_vw=None, bfgs=None, mem=None, ftrl=None, ftrl_alpha=None, ftrl_beta=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, input_feature_regularizer=None, audit=None, a=None, progress=None, P=None, quiet=None, data=None, d=None, cache=None, c=None, k=None, passes=None, no_stdin=None, hash=None, ignore=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, link=None, quantile_tau=None, l1=None, l2=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, nn=None, dropout=None, inpass=None, meanfield=None, multitask=None)
Bases: sklearn.base.BaseEstimator
Vowpal Wabbit scikit-learn base estimator wrapper.
Attributes:
- params : {dict}
  Dictionary of model parameter keys and values.
- fit_ : {bool}
  Created only after the model has been fitted.
-
fit(X, y=None, sample_weight=None)
Fit the model according to the given training data.
TODO: on the first pass, create and store example objects; for the remaining N-1 passes, use the example objects directly (simulating a cache file, but in memory for faster processing).
Parameters:
- X : {array-like, sparse matrix}, shape (n_samples, n_features or 1 if not convert_to_vw)
  Training vector, where n_samples is the number of samples and n_features is the number of features. If convert_to_vw is not used, X is expected to be a list of VW-formatted feature vector strings with labels.
- y : array-like, shape (n_samples,), optional if not convert_to_vw
  Target vector relative to X.
- sample_weight : array-like, shape (n_samples,)
  Sample weight vector relative to X.
Returns:
- self, so that a pipeline can call transform() after fit.
-
get_coefs()
Return the coefficient weights as an ordered sparse matrix.
Returns:
- {sparse matrix} coefficient weights for the model.
-
get_intercept()
Return the intercept weight for the model.
Returns:
- {int} intercept value, 0 if noconstant.
-
get_params(deep=True)
Return the set of VW and estimator parameters currently in use.
-
get_vw()
Factory that creates a vw instance on demand.
Returns:
- pyvw.vw instance.
-
load(filename)
Load model from file.
-
params = {}
-
predict(X)
Predict with the Vowpal Wabbit model.
Parameters:
- X : {array-like, sparse matrix}, shape (n_samples, n_features or 1)
  Input vector, where n_samples is the number of samples and n_features is the number of features. If convert_to_vw is not used, X is expected to be a list of VW-formatted feature vector strings with labels.
Returns:
- y : array-like, shape (n_samples, 1 or n_classes)
  Output vector relative to X.
-
save(filename)
Save model to file.
-
set_coefs(coefs)
Set the coefficient weights from an ordered sparse matrix.
Parameters:
- coefs : {sparse matrix}
  Coefficient weights for the model.
-
set_params(**params)
Destroy and recreate the Vowpal Wabbit model with updated parameters. Any parameter not provided keeps the value it was given at construction.
Parameters:
- params : {dict}
  Dictionary of model parameter keys and values to update.
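The update semantics described for set_params can be pictured with a toy class. `ParamStore` and its `rebuild_count` counter are hypothetical names used only for this sketch; the real estimator rebuilds an actual Vowpal Wabbit model, which is not modeled here.

```python
class ParamStore:
    """Toy stand-in for an estimator that rebuilds its model on set_params."""

    def __init__(self, **params):
        self.params = dict(params)
        self.rebuild_count = 0

    def get_params(self, deep=True):
        # Return a copy so callers cannot mutate internal state by accident.
        return dict(self.params)

    def set_params(self, **params):
        # Only the supplied keys change; everything else keeps its value.
        self.params.update(params)
        # Stand-in for destroying and recreating the underlying model.
        self.rebuild_count += 1
        return self


store = ParamStore(loss_function="logistic", l2=0.01)
store.set_params(l2=0.1)
current = store.get_params()
print(current)
```

Here `loss_function` survives the update untouched while `l2` changes, which is the behavior the description above attributes to parameters that are not passed in.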
-
transform(X, y=None)
Transform does nothing by default besides closing the model. A transform method is required for any estimator in a sklearn pipeline that isn't the final estimator.
Parameters:
- X : {array-like, sparse matrix}, shape (n_samples, n_features or 1 if not convert_to_vw)
  Training vector, where n_samples is the number of samples and n_features is the number of features. If convert_to_vw is not used, X is expected to be a list of VW-formatted feature vector strings with labels.
- y : array-like, shape (n_samples,), optional if not convert_to_vw
  Target vector relative to X.
Returns:
- X, to be passed into the next estimator in the pipeline.
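The fit/transform contract described here (fit returns the estimator, transform hands X through unchanged) can be sketched without any dependencies. `PassThroughStep` is a hypothetical class written for illustration and does not reproduce the wrapper's model handling.

```python
class PassThroughStep:
    """Minimal pipeline step: fit returns self, transform returns X as-is."""

    def __init__(self):
        self.fitted = False

    def fit(self, X, y=None):
        self.fitted = True
        # Returning self lets a caller chain .transform() right after .fit().
        return self

    def transform(self, X, y=None):
        # No change to the data; it is simply passed to the next step.
        return X


# Hypothetical VW-formatted input strings (label, then features).
X = ["1 | a:1 b:2", "-1 | a:3"]
step = PassThroughStep()
out = step.fit(X).transform(X)
print(out)
```

Because the data comes back unchanged, a step like this can sit in the middle of a pipeline without affecting what later steps receive.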
-
class vowpalwabbit.sklearn_vw.VWClassifier(**params)
Bases: sklearn.linear_model._base.SparseCoefMixin, vowpalwabbit.sklearn_vw.ThresholdingLinearClassifierMixin, vowpalwabbit.sklearn_vw.VW
Vowpal Wabbit Classifier model. Currently supports binary classification only; use VW directly for multiclass classification.
Note: don't try to apply link='logistic' on top of the existing functionality.
-
decision_function(X)
Predict confidence scores for samples. The confidence score for a sample is the signed distance of that sample to the hyperplane.
Parameters:
- X : {array-like, sparse matrix}, shape = (n_samples, n_features)
  Samples.
Returns:
- array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes)
  Confidence scores per (sample, class) combination. In the binary case, the confidence score is for self.classes_[1], where >0 means this class would be predicted.
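For a linear model, a score of this kind is commonly computed as the dot product of the weights with the sample plus an intercept, and its sign indicates the side of the hyperplane. The sketch below assumes dense rows and made-up weights; `decision_scores` is a hypothetical helper, not the classifier's implementation.

```python
def decision_scores(rows, weights, intercept=0.0):
    """Return the raw linear score w . x + b for each dense row."""
    return [
        sum(w * x for w, x in zip(weights, row)) + intercept
        for row in rows
    ]


# Two hypothetical samples with two features each.
scores = decision_scores(
    [[1.0, 2.0], [0.5, -1.0]], weights=[2.0, 1.0], intercept=-1.0
)
# In the binary case, a score above zero selects the positive class.
predicted = [1.0 if s > 0 else -1.0 for s in scores]
print(scores, predicted)
```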
-
predict(X)
Predict class labels for samples in X.
Parameters:
- X : {array-like, sparse matrix}, shape = [n_samples, n_features]
  Samples.
Returns:
- C : array, shape = [n_samples]
  Predicted class label per sample.
-
-
class vowpalwabbit.sklearn_vw.VWRegressor(rank=None, lrq=None, lrqdropout=None, probabilities=None, random_seed=None, ring_size=None, convert_to_vw=None, bfgs=None, mem=None, ftrl=None, ftrl_alpha=None, ftrl_beta=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, input_feature_regularizer=None, audit=None, a=None, progress=None, P=None, quiet=None, data=None, d=None, cache=None, c=None, k=None, passes=None, no_stdin=None, hash=None, ignore=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, link=None, quantile_tau=None, l1=None, l2=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, nn=None, dropout=None, inpass=None, meanfield=None, multitask=None)
Bases: vowpalwabbit.sklearn_vw.VW, sklearn.base.RegressorMixin
Vowpal Wabbit Regressor model.
-
vowpalwabbit.sklearn_vw.tovw(x, y=None, sample_weight=None)
Convert an array or sparse matrix to Vowpal Wabbit format.
Parameters:
- x : {array-like, sparse matrix}, shape (n_samples, n_features)
  Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : {array-like}, shape (n_samples,), optional
  Target vector relative to x.
- sample_weight : {array-like}, shape (n_samples,), optional
  Sample weight vector relative to x.
Returns:
- out : {array-like}, shape (n_samples, 1)
  Training vectors in VW string format.
Examples:
>>> import pandas as pd
>>> from sklearn.feature_extraction.text import HashingVectorizer
>>> from vowpalwabbit.sklearn_vw import tovw
>>> X = pd.Series(['cat', 'dog', 'cat', 'cat'], name='catdog')
>>> y = pd.Series([-1, 1, -1, -1], name='label')
>>> hv = HashingVectorizer()
>>> hashed = hv.fit_transform(X)
>>> tovw(x=hashed, y=y)
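To make the target format concrete, the following dependency-free sketch writes dense rows as VW text lines of the form `label [importance] | index:value ...`. The helper `rows_to_vw` is hypothetical and only approximates what a conversion of this kind produces; it is not the implementation of tovw, and the feature naming by column index is an assumption.

```python
def rows_to_vw(rows, labels=None, sample_weight=None):
    """Format dense rows as VW text lines, keeping only nonzero features."""
    lines = []
    for i, row in enumerate(rows):
        # Features are written as column_index:value pairs.
        feats = " ".join(f"{j}:{v:g}" for j, v in enumerate(row) if v != 0)
        prefix = ""
        if labels is not None:
            prefix = f"{labels[i]:g}"
            if sample_weight is not None:
                # In VW text format the importance weight follows the label.
                prefix += f" {sample_weight[i]:g}"
            prefix += " "
        lines.append(f"{prefix}| {feats}")
    return lines


lines = rows_to_vw(
    [[0, 1.5, 0], [2, 0, 0.25]], labels=[-1, 1], sample_weight=[1, 2]
)
print(lines)
```

Strings in this shape are the kind of input the estimators above expect for X when convert_to_vw is not used.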