vowpalwabbit.sklearn

Utilities to support integration of Vowpal Wabbit and scikit-learn

class vowpalwabbit.sklearn_vw.ThresholdingLinearClassifierMixin(**params)

Bases: sklearn.linear_model._base.LinearClassifierMixin

Mixin for linear classifiers. A threshold specifies the score cutoff for the positive class.

Handles prediction for sparse and dense X.

classes_ = array([-1., 1.])
predict(X)

Predict class labels for samples in X.

Parameters:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]
Samples.
Returns:
C : array, shape = [n_samples]
Predicted class label per sample.
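A minimal sketch of threshold-based binary prediction, in plain Python. It makes two assumptions that are not documented above. First, the decision scores are plain floats. Second, a score at or above the threshold maps to `classes_[1]`. The helper name `predict_from_scores` is hypothetical and is not part of vowpalwabbit.

```python
# Hypothetical sketch of a thresholding predict(); not the library's code.
# Assumption: score >= threshold -> positive class, otherwise negative class.
from typing import List, Sequence

CLASSES = (-1.0, 1.0)  # mirrors classes_ = array([-1., 1.])


def predict_from_scores(scores: Sequence[float], threshold: float = 0.0) -> List[float]:
    """Map each decision score to CLASSES[1] if it reaches the threshold, else CLASSES[0]."""
    return [CLASSES[1] if s >= threshold else CLASSES[0] for s in scores]


print(predict_from_scores([-2.3, 0.0, 0.7]))       # [-1.0, 1.0, 1.0]
print(predict_from_scores([-2.3, 0.0, 0.7], 0.5))  # [-1.0, -1.0, 1.0]
```

Raising the threshold trades recall on the positive class for precision. The scores themselves are unchanged.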
class vowpalwabbit.sklearn_vw.VW(rank=None, lrq=None, lrqdropout=None, probabilities=None, random_seed=None, ring_size=None, convert_to_vw=None, bfgs=None, mem=None, ftrl=None, ftrl_alpha=None, ftrl_beta=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, input_feature_regularizer=None, audit=None, a=None, progress=None, P=None, quiet=None, data=None, d=None, cache=None, c=None, k=None, passes=None, no_stdin=None, hash=None, ignore=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, link=None, quantile_tau=None, l1=None, l2=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, nn=None, dropout=None, inpass=None, meanfield=None, multitask=None)

Bases: sklearn.base.BaseEstimator

Vowpal Wabbit scikit-learn base estimator wrapper.

Attributes:
params : {dict}
Dictionary of model parameter keys and values.
fit_ : {bool}
This attribute exists only after the model has been fitted.
fit(X, y=None, sample_weight=None)

Fit the model according to the given training data.

TODO: on the first pass, create and store example objects; for the remaining N-1 passes, use those example objects directly (simulating a cache file, but in memory for faster processing).
Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features), or (n_samples, 1) if not convert_to_vw
Training vectors, where n_samples is the number of samples and n_features is the number of features. If not using convert_to_vw, X is expected to be a list of VW-formatted feature vector strings with labels.
y : array-like, shape (n_samples,), optional if not convert_to_vw
Target vector relative to X.
sample_weight : array-like, shape (n_samples,), optional
Sample weight vector relative to X.

Returns: self, so that a pipeline can call transform() after fit.
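When convert_to_vw is off, each element of X is a complete VW text example. The label is embedded in the string, which is why y may be omitted. The sketch below uses made-up feature names and values. It splits one such line into its label section (label, then an optional importance weight) and its feature section, following VW's `label [importance] | features` text layout. The helper `split_example` is hypothetical and is not part of vowpalwabbit.

```python
# Hypothetical helper illustrating VW text examples; not a vowpalwabbit API.
from typing import List, Tuple

# Made-up training strings: label, optional importance weight, then features.
X = [
    "1 | price:0.23 sqft:0.25 age:0.05",
    "-1 2.0 | price:0.18 sqft:0.15 age:0.35",  # importance weight 2.0
]


def split_example(line: str) -> Tuple[List[str], str]:
    """Return the tokens before the first '|' and the feature text after it."""
    head, _, features = line.partition("|")
    return head.split(), features.strip()


for line in X:
    print(split_example(line))
# (['1'], 'price:0.23 sqft:0.25 age:0.05')
# (['-1', '2.0'], 'price:0.18 sqft:0.15 age:0.35')
```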

get_coefs()

Return the coefficient weights as an ordered sparse matrix.

Returns: {sparse matrix} coefficient weights for the model.

get_intercept()

Return the intercept weight for the model.

Returns: {float} intercept value; 0 if noconstant is set.

get_params(deep=True)

Return the set of VW and estimator parameters currently in use.

get_vw()

Factory that creates a VW instance on demand.

Returns: pyvw.vw instance

load(filename)

Load model from file

params = {}
predict(X)

Predict with the Vowpal Wabbit model.

Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features), or (n_samples, 1) if not convert_to_vw
Input vectors, where n_samples is the number of samples and n_features is the number of features. If not using convert_to_vw, X is expected to be a list of VW-formatted feature vector strings with labels.
Returns:
y : array-like, shape (n_samples, 1 or n_classes)
Output vector relative to X.
save(filename)

Save model to file

set_coefs(coefs)

Set the coefficient weights from an ordered sparse matrix.

Parameters:
coefs : {sparse matrix} coefficient weights for the model
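The "ordered sparse" representation used by get_coefs and set_coefs can be pictured with a stdlib-only sketch. It keeps only the non-zero weights, sorted by feature index, and writing them back restores the same dense weights. This is a hypothetical illustration, not the library's implementation:
- a plain list stands in for the VW weight table;
- (index, weight) pairs stand in for the sparse matrix;
- the zeroing of unlisted weights in the toy `set_coefs` is an assumption.

```python
# Hypothetical sketch of an ordered sparse coefficient round trip.
from typing import List, Tuple


def get_coefs(weights: List[float]) -> List[Tuple[int, float]]:
    """Keep non-zero weights as (index, weight) pairs in ascending index order."""
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]


def set_coefs(weights: List[float], coefs: List[Tuple[int, float]]) -> None:
    """Reset the table, then write each listed weight back at its index."""
    for i in range(len(weights)):
        weights[i] = 0.0
    for i, w in coefs:
        weights[i] = w


table = [0.0, 0.3, 0.0, -1.2]
pairs = get_coefs(table)  # [(1, 0.3), (3, -1.2)]
restored = [0.0] * len(table)
set_coefs(restored, pairs)
assert restored == table
```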

set_params(**params)

Destroy and recreate the Vowpal Wabbit model with updated parameters. Any parameters not provided keep the values they were initialized to at construction.

Parameters:
params : {dict}
Dictionary of model parameter keys and values to update.
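A toy model of the set_params contract described above:
- provided keys overwrite the stored values;
- omitted keys keep their constructor values;
- the underlying model is rebuilt.

`ToyVW` and `_build_model` are hypothetical names, and the "model" is only a frozen snapshot of the parameters. Creating an actual VW instance is out of scope here.

```python
# Hypothetical stand-in for the estimator; not part of vowpalwabbit.
class ToyVW:
    def __init__(self, **params):
        self.params = dict(params)
        self.model = self._build_model()
        self.rebuilds = 0

    def _build_model(self):
        # Stand-in for creating a VW instance: a snapshot of the parameters.
        return tuple(sorted(self.params.items()))

    def set_params(self, **params):
        self.params.update(params)        # only the provided keys change
        self.model = self._build_model()  # destroy and recreate the model
        self.rebuilds += 1
        return self


est = ToyVW(learning_rate=0.5, passes=3)
est.set_params(passes=10)
print(est.params)  # {'learning_rate': 0.5, 'passes': 10}
```

Returning `self` follows the usual scikit-learn convention, so calls such as `est.set_params(passes=10).fit(...)` can be chained.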
transform(X, y=None)

Transform does nothing by default besides closing the model. It is required for any estimator in a scikit-learn pipeline that isn't the final estimator.

Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features), or (n_samples, 1) if not convert_to_vw
Training vectors, where n_samples is the number of samples and n_features is the number of features. If not using convert_to_vw, X is expected to be a list of VW-formatted feature vector strings with labels.
y : array-like, shape (n_samples,), optional if not convert_to_vw
Target vector relative to X.

Returns: X, unchanged, to be passed to the next estimator in the pipeline.
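A pipeline calls fit and then transform on every intermediate step, so a non-final step needs a transform that hands X onward. The sketch below shows that flow with hypothetical stand-in classes and a toy `run_pipeline` that mimics, but is not, `sklearn.pipeline.Pipeline`. The pass-through step does not model the "closing the model" side effect described above.

```python
# Hypothetical stand-ins illustrating pipeline flow; not library code.
class PassThroughStep:
    """Returns X unchanged from transform, like a pass-through estimator."""

    def fit(self, X, y=None):
        self.fitted_ = True
        return self  # returning self lets the caller chain .transform()

    def transform(self, X, y=None):
        return X


class CountingFinalStep:
    def fit(self, X, y=None):
        self.n_seen_ = len(X)
        return self


def run_pipeline(steps, X, y=None):
    # Fit and transform every intermediate step, then fit the final one.
    for step in steps[:-1]:
        X = step.fit(X, y).transform(X)
    return steps[-1].fit(X, y)


final = run_pipeline([PassThroughStep(), CountingFinalStep()], ["1 | a:1", "-1 | b:1"])
print(final.n_seen_)  # 2
```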

class vowpalwabbit.sklearn_vw.VWClassifier(**params)

Bases: sklearn.linear_model._base.SparseCoefMixin, vowpalwabbit.sklearn_vw.ThresholdingLinearClassifierMixin, vowpalwabbit.sklearn_vw.VW

Vowpal Wabbit classifier model. Currently supports only binary classification; use VW directly for multiclass classification. Note: don't try to apply link='logistic' on top of the existing functionality.

decision_function(X)

Predict confidence scores for samples. The confidence score for a sample is proportional to the signed distance of that sample to the hyperplane.

Parameters:
X : {array-like, sparse matrix}, shape = (n_samples, n_features)
Samples.
Returns:
array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes)
Confidence scores per (sample, class) combination. In the binary case, this is the confidence score for self.classes_[1], where a score > 0 means this class would be predicted.
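For a linear model, the binary confidence score is `w . x + b`. Dividing it by the Euclidean norm of `w` gives the geometric signed distance to the hyperplane `w . x + b = 0`. The sketch below uses dense lists and made-up weights for clarity; the function names are illustrative only.

```python
# Illustrative linear decision function; weights and inputs are made up.
import math
from typing import List, Sequence


def decision_function(X: Sequence[Sequence[float]], coef: Sequence[float], intercept: float) -> List[float]:
    """Score each row as w . x + b."""
    return [sum(w * x for w, x in zip(coef, row)) + intercept for row in X]


def signed_distance(score: float, coef: Sequence[float]) -> float:
    """Convert a score into a geometric distance by dividing by ||w||."""
    return score / math.sqrt(sum(w * w for w in coef))


scores = decision_function([[2.0, 1.0], [0.0, 3.0]], coef=[0.5, -0.25], intercept=0.25)
print(scores)  # [1.0, -0.5]: the first sample falls on the positive side
```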
predict(X)

Predict class labels for samples in X.

Parameters:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]
Samples.
Returns:
C : array, shape = [n_samples]
Predicted class label per sample.
class vowpalwabbit.sklearn_vw.VWRegressor(rank=None, lrq=None, lrqdropout=None, probabilities=None, random_seed=None, ring_size=None, convert_to_vw=None, bfgs=None, mem=None, ftrl=None, ftrl_alpha=None, ftrl_beta=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, input_feature_regularizer=None, audit=None, a=None, progress=None, P=None, quiet=None, data=None, d=None, cache=None, c=None, k=None, passes=None, no_stdin=None, hash=None, ignore=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, link=None, quantile_tau=None, l1=None, l2=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, nn=None, dropout=None, inpass=None, meanfield=None, multitask=None)

Bases: vowpalwabbit.sklearn_vw.VW, sklearn.base.RegressorMixin

Vowpal Wabbit Regressor model

vowpalwabbit.sklearn_vw.tovw(x, y=None, sample_weight=None)

Convert an array or sparse matrix to Vowpal Wabbit format.

Parameters:
x : {array-like, sparse matrix}, shape (n_samples, n_features)
Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : {array-like}, shape (n_samples,), optional
Target vector relative to x.
sample_weight : {array-like}, shape (n_samples,), optional
Sample weight vector relative to x.
Returns:
out : {array-like}, shape (n_samples, 1)
Training vectors in VW string format.
>>> import pandas as pd
>>> from sklearn.feature_extraction.text import HashingVectorizer
>>> from vowpalwabbit.sklearn_vw import tovw
>>> X = pd.Series(['cat', 'dog', 'cat', 'cat'], name='catdog')
>>> y = pd.Series([-1, 1, -1, -1], name='label')
>>> hv = HashingVectorizer()
>>> hashed = hv.fit_transform(X)
>>> tovw(x=hashed, y=y)
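The kind of conversion tovw performs can be sketched for dense input in plain Python. Each row becomes one VW text line holding the label, an optional importance weight, and the non-zero features named by column index. This is an illustration under those assumptions. The real tovw also accepts sparse matrices, and its exact output strings may differ. `rows_to_vw` is a hypothetical name.

```python
# Hypothetical dense-only sketch of a tovw-style conversion; not the library's code.
from typing import List, Optional, Sequence


def rows_to_vw(
    x: Sequence[Sequence[float]],
    y: Optional[Sequence[float]] = None,
    sample_weight: Optional[Sequence[float]] = None,
) -> List[str]:
    out = []
    for i, row in enumerate(x):
        head = []
        if y is not None:
            head.append(str(y[i]))
            # In VW text format an importance weight can only follow a label.
            if sample_weight is not None:
                head.append(str(sample_weight[i]))
        # Skip zeros: VW treats absent features as zero-valued.
        feats = " ".join(f"{j}:{v}" for j, v in enumerate(row) if v != 0)
        out.append(f"{' '.join(head)} | {feats}".lstrip())
    return out


print(rows_to_vw([[0.0, 1.5], [2.0, 0.0]], y=[1, -1], sample_weight=[1.0, 0.5]))
# ['1 1.0 | 1:1.5', '-1 0.5 | 0:2.0']
```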