
Utilities to support integration of Vowpal Wabbit and scikit-learn

class vowpalwabbit.sklearn_vw.LinearClassifierMixin

Bases: sklearn.linear_model.logistic.LogisticRegression


decision_function(self, X) Predict confidence scores for samples.
densify(self) Convert coefficient matrix to dense array format.
fit(self, X, y[, sample_weight]) Fit the model according to the given training data.
get_params(self[, deep]) Get parameters for this estimator.
predict(self, X) Predict class labels for samples in X.
predict_log_proba(self, X) Log of probability estimates.
predict_proba(self, X) Probability estimates.
score(self, X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
set_params(self, **params) Set the parameters of this estimator.
sparsify(self) Convert coefficient matrix to sparse format.


class vowpalwabbit.sklearn_vw.VW(convert_to_vw=True, convert_labels=True, ring_size=None, strict_parse=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, normal_weights=None, truncated_normal_weights=None, sparse_weights=None, input_feature_regularizer=None, quiet=True, random_seed=None, hash=None, hash_seed=None, ignore=None, ignore_linear=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, holdout_off=None, holdout_period=None, holdout_after=None, early_terminate=None, passes=1, initial_pass_length=None, examples=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, quantile_tau=None, l1=None, l2=None, no_bias_regularization=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, preserve_performance_counters=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, probabilities=None, nn=None, inpass=None, multitask=None, dropout=None, meanfield=None, conjugate_gradient=None, bfgs=None, hessian_on=None, mem=None, termination=None, lda=None, lda_alpha=None, lda_rho=None, lda_D=None, lda_epsilon=None, minibatch=None, svrg=None, stage_size=None, ftrl=None, coin=None, pistol=None, ftrl_alpha=None, ftrl_beta=None, ksvm=None, kernel=None, bandwidth=None, degree=None, sgd=None, adaptive=None, invariant=None, normalized=None, link=None, stage_poly=None, sched_exponent=None, batch_sz=None, batch_sz_no_doubling=None, lrq=None, lrqdropout=None, lrqfa=None, data=None, d=None, cache=None, c=None, 
cache_file=None, json=None, kill_cache=None, k=None)

Bases: sklearn.base.BaseEstimator

Vowpal Wabbit Scikit-learn Base Estimator wrapper

convert_to_vw : bool

flag to convert X input to vw format

convert_labels : bool

Convert labels of the form [0,1] to [-1,1]

vw_ : pyvw.vw

vw instance


fit(self[, X, y, sample_weight]) Fit the model according to the given training data
get_coefs(self) Returns coefficient weights as ordered sparse matrix
get_intercept(self) Returns intercept weight for model
get_params(self[, deep]) This returns the full set of vw and estimator parameters currently in use
get_vw(self) Get the vw instance
load(self, filename) Load model from file
predict(self, X) Predict with Vowpal Wabbit model
save(self, filename) Save model to file
set_coefs(self, coefs) Sets coefficients weights from ordered sparse matrix
set_params(self, **kwargs) Destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided remain as they are currently
__init__(self, convert_to_vw=True, convert_labels=True, ring_size=None, strict_parse=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, normal_weights=None, truncated_normal_weights=None, sparse_weights=None, input_feature_regularizer=None, quiet=True, random_seed=None, hash=None, hash_seed=None, ignore=None, ignore_linear=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, holdout_off=None, holdout_period=None, holdout_after=None, early_terminate=None, passes=1, initial_pass_length=None, examples=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, quantile_tau=None, l1=None, l2=None, no_bias_regularization=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, preserve_performance_counters=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, probabilities=None, nn=None, inpass=None, multitask=None, dropout=None, meanfield=None, conjugate_gradient=None, bfgs=None, hessian_on=None, mem=None, termination=None, lda=None, lda_alpha=None, lda_rho=None, lda_D=None, lda_epsilon=None, minibatch=None, svrg=None, stage_size=None, ftrl=None, coin=None, pistol=None, ftrl_alpha=None, ftrl_beta=None, ksvm=None, kernel=None, bandwidth=None, degree=None, sgd=None, adaptive=None, invariant=None, normalized=None, link=None, stage_poly=None, sched_exponent=None, batch_sz=None, batch_sz_no_doubling=None, lrq=None, lrqdropout=None, lrqfa=None, data=None, d=None, cache=None, c=None, cache_file=None, json=None, 
kill_cache=None, k=None)

VW model constructor, exposing all supported parameters to keep sklearn happy
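As a rough illustration of how keyword arguments like these could be turned into a VW command line, the sketch below drops parameters left as None, emits True values as bare switches, and uses a single dash for one-letter aliases. The helper name build_vw_args is hypothetical, and the mapping rules are assumptions for illustration rather than the wrapper's actual code.

```python
def build_vw_args(**params):
    """Turn keyword arguments into a VW-style option string (illustrative only)."""
    parts = []
    for name, value in params.items():
        if value is None or value is False:
            continue  # unset options and disabled switches are left out
        # Assumption: one-letter names are short aliases and take a single dash.
        flag = ("-" if len(name) == 1 else "--") + name
        if value is True:
            parts.append(flag)  # boolean switch, no argument
        else:
            parts.append(f"{flag} {value}")  # option followed by its argument
    return " ".join(parts)


print(build_vw_args(quiet=True, passes=1, loss_function=None, l2=0.01))
# --quiet --passes 1 --l2 0.01
```

Leaving most parameters at None keeps the generated option string short, which is one plausible reason the constructor defaults nearly everything to None.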

Estimator options
convert_to_vw : bool

flag to convert X input to vw format

convert_labels : bool

Convert labels of the form [0,1] to [-1,1]
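A minimal sketch of the label conversion that convert_labels describes, assuming it simply maps 0 to -1 and leaves 1 unchanged. The function name convert_binary_labels is hypothetical, and the strict input check is an added assumption.

```python
def convert_binary_labels(y):
    """Map labels in {0, 1} to {-1, 1} (illustrative only)."""
    converted = []
    for label in y:
        if label not in (0, 1):
            # Assumption: anything other than 0/1 is rejected in this sketch.
            raise ValueError(f"expected a 0/1 label, got {label!r}")
        converted.append(-1 if label == 0 else 1)
    return converted


print(convert_binary_labels([0, 1, 1, 0]))
# [-1, 1, 1, -1]
```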

VW options

ring_size : int

size of example ring

strict_parse : bool

throw on malformed examples

Update options

learning_rate,l : float

Set learning rate

power_t : float

t power value

decay_learning_rate : float

Set Decay factor for learning_rate between passes

initial_t : float

initial t value

feature_mask : str

Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.

Weight options

initial_regressor,i : str

Initial regressor(s)

initial_weight : float

Set all weights to an initial value of arg.

random_weights : bool

make initial weights random

normal_weights : bool

make initial weights normal

truncated_normal_weights : bool

make initial weights truncated normal

sparse_weights : bool

Use a sparse datastructure for weights

input_feature_regularizer : str

Per feature regularization input file

Diagnostic options

quiet : bool

Don’t output diagnostics and progress updates

Randomization options

random_seed : integer

seed random number generator

Feature options

hash : str

how to hash the features. Available options: strings, all

hash_seed : int

seed for hash function

ignore : str

ignore namespaces beginning with character <arg>

ignore_linear : str

ignore namespaces beginning with character <arg> for linear terms only

keep : str

keep namespaces beginning with character <arg>

redefine : str

Redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in form ‘N:=S’ where := is operator. Empty N or S are treated as default namespace. Use ‘:’ as a wildcard in S.

bit_precision,b : integer

number of bits in the feature table

noconstant : bool

Don’t add a constant feature

constant,C : float

Set initial value of constant

ngram : str

Generate N grams. To generate N grams for a single namespace ‘foo’, arg should be fN.

skips : str

Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-gram. To generate n-skips for a single namespace ‘foo’, arg should be fN.

feature_limit : str

limit to N features. To apply to a single namespace ‘foo’, arg should be fN

affix : str

generate prefixes/suffixes of features; argument ‘+2a,-3b,+1’ means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1 char prefixes for default namespace

spelling : str

compute spelling features for a given namespace (use ‘_’ for default namespace)

dictionary : str

read a dictionary for additional features (arg either ‘x:file’ or just ‘file’)

dictionary_path : str

look in this directory for dictionaries; defaults to current directory or env{PATH}

interactions : str

Create feature interactions of any level between namespaces.

permutations : bool

Use permutations instead of combinations for feature interactions of same namespace.

leave_duplicate_interactions : bool

Don’t remove interactions with duplicate combinations of namespaces. For ex. this is a duplicate: ‘-q ab -q ba’ and a lot more in ‘-q ::’.

quadratic,q : str

Create and use quadratic features, q:: corresponds to a wildcard for all printable characters

cubic : str

Create and use cubic features
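The affix argument format described under Feature options (‘+2a,-3b,+1’) can be read as a comma-separated list of sign, length and optional namespace. The sketch below parses such a string under the assumptions that lengths are single digits and namespaces are identified by what follows the digit; parse_affix is a hypothetical helper and not part of VW.

```python
def parse_affix(spec):
    """Parse an affix spec such as '+2a,-3b,+1' into (kind, length, namespace) tuples."""
    rules = []
    for item in spec.split(","):
        if len(item) < 2 or item[0] not in "+-" or not item[1].isdigit():
            raise ValueError(f"malformed affix item: {item!r}")
        kind = "prefix" if item[0] == "+" else "suffix"
        # An empty namespace part stands for the default namespace (None here).
        rules.append((kind, int(item[1]), item[2:] or None))
    return rules


print(parse_affix("+2a,-3b,+1"))
# [('prefix', 2, 'a'), ('suffix', 3, 'b'), ('prefix', 1, None)]
```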

Example options

testonly,t : bool

Ignore label information and just test

holdout_off : bool

no holdout data in multiple passes

holdout_period : int

holdout period for test only

holdout_after : int

holdout after n training examples

early_terminate : int

Specify the number of passes tolerated when holdout loss doesn’t decrease before early termination

passes : int

Number of Training Passes

initial_pass_length : int

initial number of examples per pass

examples : int

number of examples to parse

min_prediction : float

Smallest prediction to output

max_prediction : float

Largest prediction to output

sort_features : bool

turn this on to disregard order in which features have been defined. This will lead to smaller cache sizes

loss_function : str

Specify the loss function to be used. VW uses squared by default. Currently available ones are squared, classic, hinge, logistic and quantile.

quantile_tau : float

Parameter tau associated with Quantile loss. Defaults to 0.5

l1 : float

l_1 lambda (L1 regularization)

l2 : float

l_2 lambda (L2 regularization)

no_bias_regularization : bool

no bias in regularization

named_labels : str

use names for labels (multiclass, etc.) rather than integers; the argument specifies all possible labels, comma-separated, e.g. “--named_labels Noun,Verb,Adj,Punc”

Output model

final_regressor,f : str

Final regressor

readable_model : str

Output human-readable final regressor with numeric features

invert_hash : str

Output human-readable final regressor with feature names. Computationally expensive.

save_resume : bool

save extra state so learning can be resumed later with new data

preserve_performance_counters : bool

reset performance counters when warmstarting

output_feature_regularizer_binary : str

Per feature regularization output file

output_feature_regularizer_text : str

Per feature regularization output file, in text

Multiclass options
oaa : integer

Use one-against-all multiclass learning with labels

oaa_subsample : int

subsample this number of negative examples when learning

ect : integer

Use error correcting tournament multiclass learning

csoaa : integer

Use cost sensitive one-against-all multiclass learning

wap : integer

Use weighted all pairs multiclass learning

probabilities : bool

predict probabilities of all classes

Neural Network options

nn : integer

Use a sigmoidal feed-forward neural network with N hidden units

inpass : bool

Train or test sigmoidal feed-forward network with input pass-through

multitask : bool

Share hidden layer across all reduced tasks

dropout : bool

Train or test sigmoidal feed-forward network using dropout

meanfield : bool

Train or test sigmoidal feed-forward network using mean field

LBFGS and Conjugate Gradient options

conjugate_gradient : bool

use conjugate gradient based optimization

bfgs : bool

use bfgs updates

hessian_on : bool

use second derivative in line search

mem : int

memory in bfgs

termination : float

termination threshold

Latent Dirichlet Allocation options

lda : int

Run lda with <int> topics

lda_alpha : float

Prior on sparsity of per-document topic weights

lda_rho : float

Prior on sparsity of topic distributions

lda_D : int

Number of documents

lda_epsilon : float

Loop convergence threshold

minibatch : int

Minibatch size for LDA

Stochastic Variance Reduced Gradient options

svrg : bool

Streaming Stochastic Variance Reduced Gradient

stage_size : int

Number of passes per SVRG stage

Follow the Regularized Leader options

ftrl : bool

Run Follow the Proximal Regularized Leader

coin : bool

Coin betting optimizer

pistol : bool

PiSTOL: Parameter free STOchastic Learning

ftrl_alpha : float

Alpha parameter for FTRL optimization

ftrl_beta : float

Beta parameters for FTRL optimization

Kernel SVM options

ksvm : bool

kernel svm

kernel : str

type of kernel (rbf or linear (default))

bandwidth : int

bandwidth of rbf kernel

degree : int

degree of poly kernel

Gradient Descent options

sgd : bool

use regular stochastic gradient descent update

adaptive : bool

use adaptive, individual learning rates

adax : bool

use adaptive learning rates with x^2 instead of g^2x^2

invariant : bool

use safe/importance aware updates

normalized : bool

use per feature normalized updates

Scorer options

link : str

Specify the link function: identity, logistic, glf1 or poisson

Stagewise polynomial options

stage_poly : bool

use stagewise polynomial feature learning

sched_exponent : int

exponent controlling quantity of included features

batch_sz : int

multiplier on batch size before including more features

batch_sz_no_doubling : bool

batch_sz does not double

Low Rank Quadratics options

lrq : bool

use low rank quadratic features

lrqdropout : bool

use dropout training for low rank quadratic features

lrqfa : bool

use low rank quadratic features with field aware weights

Input options

data,d : str

path to data file for fitting external to sklearn

cache,c : str

use a cache. default is <data>.cache

cache_file : str

path to cache file to use

json : bool

enable JSON parsing

kill_cache, k : bool

do not reuse existing cache file, create a new one always

self : BaseEstimator
convert_labels = True
convert_to_vw = True
fit(self, X=None, y=None, sample_weight=None)

Fit the model according to the given training data

TODO: for first pass create and store example objects.
for N-1 passes use example objects directly (simulate cache file…but in memory for faster processing)
X : {array-like, sparse matrix}, shape (n_samples, n_features or 1 if not convert_to_vw)

Training vector, where n_samples is the number of samples and n_features is the number of features. If not using convert_to_vw, X is expected to be a list of vw formatted feature vector strings with labels.

y : array-like, shape (n_samples,), optional if not convert_to_vw

Target vector relative to X.

sample_weight : array-like, shape (n_samples,)

sample weight vector relative to X.

self : BaseEstimator

So pipeline can call transform() after fit
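To make the two input modes of fit more concrete, the sketch below turns one dense feature row into a VW text-format line of the form "label [weight] | index:value ...". It assumes features are named by their column index and zero values are skipped; to_vw_line is a hypothetical helper, not the converter the wrapper uses internally.

```python
def to_vw_line(features, label=None, weight=None):
    """Format one dense row as a VW text-format example (illustrative only)."""
    header = []
    if label is not None:
        header.append(str(label))
        if weight is not None:
            header.append(str(weight))  # importance weight follows the label
    # Assumption: feature names are column indices; zeros are omitted.
    body = " ".join(f"{i}:{v}" for i, v in enumerate(features) if v != 0)
    return (" ".join(header) + " | " + body).strip()


print(to_vw_line([0.5, 0, 2.0], label=1))
# 1 | 0:0.5 2:2.0
print(to_vw_line([0.5, 0, 2.0], label=-1, weight=0.3))
# -1 0.3 | 0:0.5 2:2.0
```

With convert_to_vw disabled, a list of strings in this shape would be passed as X directly and y can be omitted, as the parameter descriptions above state.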


get_coefs(self)

Returns coefficient weights as ordered sparse matrix

sparse matrix : coefficient weights for model

get_intercept(self)

Returns intercept weight for model

intercept value : integer, 0 if no constant
get_params(self, deep=True)

This returns the full set of vw and estimator parameters currently in use


get_vw(self)

Get the vw instance

vw : pyvw.vw instance
load(self, filename)

Load model from file

predict(self, X)

Predict with Vowpal Wabbit model

X : {array-like, sparse matrix}, shape (n_samples, n_features or 1)

Training vector, where n_samples is the number of samples and n_features is the number of features. If not using convert_to_vw, X is expected to be a list of vw formatted feature vector strings with labels.

y : array-like, shape (n_samples, 1 or n_classes)

Output vector relative to X.

save(self, filename)

Save model to file

set_coefs(self, coefs)

Sets coefficients weights from ordered sparse matrix

coefs : sparse matrix

coefficient weights for model

set_params(self, **kwargs)

This destroys and recreates the Vowpal Wabbit model with updated parameters. Any parameters not provided remain as they are currently.
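The merge-then-rebuild behaviour described for set_params can be pictured with a toy class. Everything here (ToyEstimator, the dict standing in for the VW instance) is hypothetical and only mirrors the documented semantics: supplied values override, omitted ones persist, and the underlying model is recreated from scratch.

```python
class ToyEstimator:
    """Stand-in estimator showing merge-and-rebuild parameter updates."""

    def __init__(self, **params):
        self.params = dict(params)
        self.model = self._build_model()

    def _build_model(self):
        # Stand-in for creating a fresh model instance from current parameters.
        return {"config": dict(self.params), "weights": {}}

    def set_params(self, **kwargs):
        self.params.update(kwargs)        # provided values override, others persist
        self.model = self._build_model()  # the old model and its weights are discarded
        return self


est = ToyEstimator(passes=1, l2=0.0)
est.model["weights"]["x"] = 0.7   # pretend something was learned
est.set_params(passes=3)
print(est.params)            # {'passes': 3, 'l2': 0.0}
print(est.model["weights"])  # {}
```

The practical consequence is that calling set_params on a fitted estimator should be followed by a new fit.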

vw_ = None
class vowpalwabbit.sklearn_vw.VWClassifier(loss_function='logistic', **kwargs)

Bases: vowpalwabbit.sklearn_vw.VW, vowpalwabbit.sklearn_vw.LinearClassifierMixin

Vowpal Wabbit Classifier model for binary classification. Use VWMultiClassifier for multiclass classification. Note: VW.predict is assumed to return logits; applying link=logistic will break this assumption.

coef_ : scipy.sparse_matrix

Empty sparse matrix used to check if model has been fit

classes_ : np.array

Binary class labels


decision_function(self, X) Predict confidence scores for samples.
densify(self) Convert coefficient matrix to dense array format.
fit(self[, X, y, sample_weight]) Fit the model according to the given training data.
get_coefs(self) Returns coefficient weights as ordered sparse matrix
get_intercept(self) Returns intercept weight for model
get_params(self[, deep]) This returns the full set of vw and estimator parameters currently in use
get_vw(self) Get the vw instance
load(self, filename) Load model from file
predict(self, X) Predict class labels for samples in X.
predict_log_proba(self, X) Log of probability estimates.
predict_proba(self, X) Predict probabilities for samples
save(self, filename) Save model to file
score(self, X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
set_coefs(self, coefs) Sets coefficients weights from ordered sparse matrix
set_params(self, **kwargs) Destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided remain as they are currently
sparsify(self) Convert coefficient matrix to sparse format.
__init__(self, loss_function='logistic', **kwargs)

VW model constructor, exposing all supported parameters to keep sklearn happy

Estimator options
convert_to_vw : bool

flag to convert X input to vw format

convert_labels : bool

Convert labels of the form [0,1] to [-1,1]

VW options

ring_size : int

size of example ring

strict_parse : bool

throw on malformed examples

Update options

learning_rate,l : float

Set learning rate

power_t : float

t power value

decay_learning_rate : float

Set Decay factor for learning_rate between passes

initial_t : float

initial t value

feature_mask : str

Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.

Weight options

initial_regressor,i : str

Initial regressor(s)

initial_weight : float

Set all weights to an initial value of arg.

random_weights : bool

make initial weights random

normal_weights : bool

make initial weights normal

truncated_normal_weights : bool

make initial weights truncated normal

sparse_weights : bool

Use a sparse datastructure for weights

input_feature_regularizer : str

Per feature regularization input file

Diagnostic options

quiet : bool

Don’t output diagnostics and progress updates

Randomization options

random_seed : integer

seed random number generator

Feature options

hash : str

how to hash the features. Available options: strings, all

hash_seed : int

seed for hash function

ignore : str

ignore namespaces beginning with character <arg>

ignore_linear : str

ignore namespaces beginning with character <arg> for linear terms only

keep : str

keep namespaces beginning with character <arg>

redefine : str

Redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in form ‘N:=S’ where := is operator. Empty N or S are treated as default namespace. Use ‘:’ as a wildcard in S.

bit_precision,b : integer

number of bits in the feature table

noconstant : bool

Don’t add a constant feature

constant,C : float

Set initial value of constant

ngram : str

Generate N grams. To generate N grams for a single namespace ‘foo’, arg should be fN.

skips : str

Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-gram. To generate n-skips for a single namespace ‘foo’, arg should be fN.

feature_limit : str

limit to N features. To apply to a single namespace ‘foo’, arg should be fN

affix : str

generate prefixes/suffixes of features; argument ‘+2a,-3b,+1’ means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1 char prefixes for default namespace

spelling : str

compute spelling features for a given namespace (use ‘_’ for default namespace)

dictionary : str

read a dictionary for additional features (arg either ‘x:file’ or just ‘file’)

dictionary_path : str

look in this directory for dictionaries; defaults to current directory or env{PATH}

interactions : str

Create feature interactions of any level between namespaces.

permutations : bool

Use permutations instead of combinations for feature interactions of same namespace.

leave_duplicate_interactions : bool

Don’t remove interactions with duplicate combinations of namespaces. For ex. this is a duplicate: ‘-q ab -q ba’ and a lot more in ‘-q ::’.

quadratic,q : str

Create and use quadratic features, q:: corresponds to a wildcard for all printable characters

cubic : str

Create and use cubic features

Example options

testonly,t : bool

Ignore label information and just test

holdout_off : bool

no holdout data in multiple passes

holdout_period : int

holdout period for test only

holdout_after : int

holdout after n training examples

early_terminate : int

Specify the number of passes tolerated when holdout loss doesn’t decrease before early termination

passes : int

Number of Training Passes

initial_pass_length : int

initial number of examples per pass

examples : int

number of examples to parse

min_prediction : float

Smallest prediction to output

max_prediction : float

Largest prediction to output

sort_features : bool

turn this on to disregard order in which features have been defined. This will lead to smaller cache sizes

loss_function : str

Specify the loss function to be used. VW uses squared by default. Currently available ones are squared, classic, hinge, logistic and quantile.

quantile_tau : float

Parameter tau associated with Quantile loss. Defaults to 0.5

l1 : float

l_1 lambda (L1 regularization)

l2 : float

l_2 lambda (L2 regularization)

no_bias_regularization : bool

no bias in regularization

named_labels : str

use names for labels (multiclass, etc.) rather than integers; the argument specifies all possible labels, comma-separated, e.g. “--named_labels Noun,Verb,Adj,Punc”

Output model

final_regressor,f : str

Final regressor

readable_model : str

Output human-readable final regressor with numeric features

invert_hash : str

Output human-readable final regressor with feature names. Computationally expensive.

save_resume : bool

save extra state so learning can be resumed later with new data

preserve_performance_counters : bool

reset performance counters when warmstarting

output_feature_regularizer_binary : str

Per feature regularization output file

output_feature_regularizer_text : str

Per feature regularization output file, in text

Multiclass options
oaa : integer

Use one-against-all multiclass learning with labels

oaa_subsample : int

subsample this number of negative examples when learning

ect : integer

Use error correcting tournament multiclass learning

csoaa : integer

Use cost sensitive one-against-all multiclass learning

wap : integer

Use weighted all pairs multiclass learning

probabilities : bool

predict probabilities of all classes

Neural Network options

nn : integer

Use a sigmoidal feed-forward neural network with N hidden units

inpass : bool

Train or test sigmoidal feed-forward network with input pass-through

multitask : bool

Share hidden layer across all reduced tasks

dropout : bool

Train or test sigmoidal feed-forward network using dropout

meanfield : bool

Train or test sigmoidal feed-forward network using mean field

LBFGS and Conjugate Gradient options

conjugate_gradient : bool

use conjugate gradient based optimization

bfgs : bool

use bfgs updates

hessian_on : bool

use second derivative in line search

mem : int

memory in bfgs

termination : float

termination threshold

Latent Dirichlet Allocation options

lda : int

Run lda with <int> topics

lda_alpha : float

Prior on sparsity of per-document topic weights

lda_rho : float

Prior on sparsity of topic distributions

lda_D : int

Number of documents

lda_epsilon : float

Loop convergence threshold

minibatch : int

Minibatch size for LDA

Stochastic Variance Reduced Gradient options

svrg : bool

Streaming Stochastic Variance Reduced Gradient

stage_size : int

Number of passes per SVRG stage

Follow the Regularized Leader options

ftrl : bool

Run Follow the Proximal Regularized Leader

coin : bool

Coin betting optimizer

pistol : bool

PiSTOL: Parameter free STOchastic Learning

ftrl_alpha : float

Alpha parameter for FTRL optimization

ftrl_beta : float

Beta parameters for FTRL optimization

Kernel SVM options

ksvm : bool

kernel svm

kernel : str

type of kernel (rbf or linear (default))

bandwidth : int

bandwidth of rbf kernel

degree : int

degree of poly kernel

Gradient Descent options

sgd : bool

use regular stochastic gradient descent update

adaptive : bool

use adaptive, individual learning rates

adax : bool

use adaptive learning rates with x^2 instead of g^2x^2

invariant : bool

use safe/importance aware updates

normalized : bool

use per feature normalized updates

Scorer options

link : str

Specify the link function: identity, logistic, glf1 or poisson

Stagewise polynomial options

stage_poly : bool

use stagewise polynomial feature learning

sched_exponent : int

exponent controlling quantity of included features

batch_sz : int

multiplier on batch size before including more features

batch_sz_no_doubling : bool

batch_sz does not double

Low Rank Quadratics options

lrq : bool

use low rank quadratic features

lrqdropout : bool

use dropout training for low rank quadratic features

lrqfa : bool

use low rank quadratic features with field aware weights

Input options

data,d : str

path to data file for fitting external to sklearn

cache,c : str

use a cache. default is <data>.cache

cache_file : str

path to cache file to use

json : bool

enable JSON parsing

kill_cache, k : bool

do not reuse existing cache file, create a new one always

self : BaseEstimator
classes_ = array([-1., 1.])
coef_ = None
decision_function(self, X)

Predict confidence scores for samples. The confidence score for a sample is the signed distance of that sample to the hyperplane.

X : array_like or sparse matrix, shape (n_samples, n_features)


array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes)

Confidence scores per (sample, class) combination. In the binary case, confidence score for self.classes_[1] where >0 means this class would be predicted.
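The binary rule stated above (a score greater than 0 selects self.classes_[1]) can be sketched in a few lines. The helper labels_from_scores is hypothetical; the default classes mirror the classes_ = array([-1., 1.]) attribute listed for this class.

```python
def labels_from_scores(scores, classes=(-1.0, 1.0)):
    """Pick classes[1] when the score is strictly positive, else classes[0]."""
    return [classes[1] if score > 0 else classes[0] for score in scores]


print(labels_from_scores([-2.3, 0.0, 0.4]))
# [-1.0, -1.0, 1.0]
```

A score of exactly 0 falls on the hyperplane and, under the strict inequality, is assigned to classes[0].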

fit(self, X=None, y=None, sample_weight=None)

Fit the model according to the given training data.

X : {array-like, sparse matrix} of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Target vector relative to X.

sample_weight : array-like of shape (n_samples,) default=None

Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.


Fitted estimator.

predict(self, X)

Predict class labels for samples in X.

X : array_like or sparse matrix, shape (n_samples, n_features)


C : array, shape [n_samples]

Predicted class label per sample.

predict_proba(self, X)

Predict probabilities for samples

X : {array-like, sparse matrix}, shape = (n_samples, n_features)


T : array-like of shape (n_samples, n_classes)

Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
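Since the class description assumes VW.predict returns logits, one plausible way to obtain per-class probabilities ordered as in classes_ is to apply the logistic function to each logit. The sketch below is an illustration under that assumption, with hypothetical helper names; it is not taken from the wrapper's source.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def proba_from_logits(logits):
    """Return rows [P(classes_[0]), P(classes_[1])] for each logit."""
    rows = []
    for z in logits:
        p = sigmoid(z)
        rows.append([1.0 - p, p])
    return rows


print(proba_from_logits([0.0]))
# [[0.5, 0.5]]
```

This also shows why link=logistic would break the assumption: the model output would already be a probability, and applying the logistic function a second time would squash every value into the range 0.5 to about 0.73.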

class vowpalwabbit.sklearn_vw.VWMultiClassifier(probabilities=True, **kwargs)

Bases: vowpalwabbit.sklearn_vw.VWClassifier

Vowpal Wabbit MultiClassifier model. Note: VW.predict is assumed to return probabilities; setting probabilities=False will break this assumption.

classes_ : np.array

class labels

estimator_ : dict

type of estimator to use [csoaa, ect, oaa, wap] and number of classes
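Given the note that VW.predict is assumed to return per-class probabilities here, class prediction can be pictured as taking the most probable column and looking it up in classes_. The following is a minimal sketch with a hypothetical helper name, not the library's implementation.

```python
def predict_from_probabilities(probabilities, classes):
    """Return the class label with the highest probability for each row."""
    predictions = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)  # index of the largest value
        predictions.append(classes[best])
    return predictions


print(predict_from_probabilities([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]], [1, 2, 3]))
# [2, 1]
```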


decision_function(self, X) Predict confidence scores for samples.
densify(self) Convert coefficient matrix to dense array format.
fit(self[, X, y, sample_weight]) Fit the model according to the given training data.
get_coefs(self) Returns coefficient weights as ordered sparse matrix
get_intercept(self) Returns intercept weight for model
get_params(self[, deep]) This returns the full set of vw and estimator parameters currently in use
get_vw(self) Get the vw instance
load(self, filename) Load model from file
predict(self, X) Predict class labels for samples in X.
predict_log_proba(self, X) Log of probability estimates.
predict_proba(self, X) Predict probabilities for each class.
save(self, filename) Save model to file
score(self, X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
set_coefs(self, coefs) Sets coefficients weights from ordered sparse matrix
set_params(self, **kwargs) Destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided remain as they are currently
sparsify(self) Convert coefficient matrix to sparse format.
__init__(self, probabilities=True, **kwargs)

VW model constructor, exposing all supported parameters to keep sklearn happy

Estimator options
convert_to_vw : bool

flag to convert X input to vw format

convert_labels : bool

Convert labels of the form [0,1] to [-1,1]

VW options

ring_size : int

size of example ring

strict_parse : bool

throw on malformed examples

Update options

learning_rate,l : float

Set learning rate

power_t : float

t power value

decay_learning_rate : float

Set Decay factor for learning_rate between passes

initial_t : float

initial t value

feature_mask : str

Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.

Weight options

initial_regressor,i : str

Initial regressor(s)

initial_weight : float

Set all weights to an initial value of arg.

random_weights : bool

make initial weights random

normal_weights : bool

make initial weights normal

truncated_normal_weights : bool

make initial weights truncated normal

sparse_weights : bool

Use a sparse datastructure for weights

input_feature_regularizer : str

Per feature regularization input file

Diagnostic options

quiet : bool

Don’t output diagnostics and progress updates

Randomization options

random_seed : integer

seed random number generator

Feature options

hash : str

how to hash the features. Available options: strings, all

hash_seed : int

seed for hash function

ignore : str

ignore namespaces beginning with character <arg>

ignore_linear : str

ignore namespaces beginning with character <arg> for linear terms only

keep : str

keep namespaces beginning with character <arg>

redefine : str

Redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in form ‘N:=S’ where := is operator. Empty N or S are treated as default namespace. Use ‘:’ as a wildcard in S.

bit_precision,b : integer

number of bits in the feature table

noconstant : bool

Don’t add a constant feature

constant,C : float

Set initial value of constant

ngram : str

Generate N grams. To generate N grams for a single namespace ‘foo’, arg should be fN.

skips : str

Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-gram. To generate n-skips for a single namespace ‘foo’, arg should be fN.

feature_limit : str

limit to N features. To apply to a single namespace ‘foo’, arg should be fN

affix : str

generate prefixes/suffixes of features; argument ‘+2a,-3b,+1’ means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1 char prefixes for default namespace

spelling : str

compute spelling features for a given namespace (use ‘_’ for default namespace)

dictionary : str

read a dictionary for additional features (arg either ‘x:file’ or just ‘file’)

dictionary_path : str

look in this directory for dictionaries; defaults to current directory or env{PATH}

interactions : str

Create feature interactions of any level between namespaces.

permutations : bool

Use permutations instead of combinations for feature interactions of same namespace.

leave_duplicate_interactions : bool

Don’t remove interactions with duplicate combinations of namespaces. For example, ‘-q ab -q ba’ is a duplicate, and ‘-q ::’ produces many more.

quadratic,q : str

Create and use quadratic features, q:: corresponds to a wildcard for all printable characters

cubic : str

Create and use cubic features
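
As an illustration of what quadratic features mean, the sketch below crosses every feature of one namespace with every feature of another and multiplies their values. The function name, the dict representation and the '^' naming are assumptions made for this example; Vowpal Wabbit builds these interactions internally on hashed features.

```python
from itertools import product


def quadratic_features(namespace_a, namespace_b):
    """Cross two namespaces given as dicts of feature name -> value."""
    return {
        f"{name_a}^{name_b}": value_a * value_b
        for (name_a, value_a), (name_b, value_b)
        in product(namespace_a.items(), namespace_b.items())
    }


user = {"age": 2.0, "premium": 1.0}
item = {"price": 3.0}
print(quadratic_features(user, item))
# {'age^price': 6.0, 'premium^price': 3.0}
```

Cubic features extend the same idea to triples of namespaces.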

Example options

testonly,t : bool

Ignore label information and just test

holdout_off : bool

no holdout data in multiple passes

holdout_period : int

holdout period for test only

holdout_after : int

holdout after n training examples

early_terminate : int

Specify the number of passes tolerated when holdout loss doesn’t decrease before early termination

passes : int

Number of Training Passes

initial_pass_length : int

initial number of examples per pass

examples : int

number of examples to parse

min_prediction : float

Smallest prediction to output

max_prediction : float

Largest prediction to output

sort_features : bool

turn this on to disregard order in which features have been defined. This will lead to smaller cache sizes

loss_function : str

Specify the loss function to be used; defaults to squared. Currently available ones are squared, classic, hinge, logistic and quantile.

quantile_tau : float

Parameter tau associated with Quantile loss. Defaults to 0.5

l1 : float

l_1 lambda (L1 regularization)

l2 : float

l_2 lambda (L2 regularization)

no_bias_regularization : bool

no bias in regularization

named_labels : str

use names for labels (multiclass, etc.) rather than integers; the argument specifies all possible labels, comma-separated, e.g. “--named_labels Noun,Verb,Adj,Punc”
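
For reference, the loss functions named under loss_function and quantile_tau can be written out using their textbook definitions. This is a sketch: the function names are illustrative, labels for hinge and logistic loss are assumed to be in {-1, 1}, and the scaling used inside Vowpal Wabbit may differ.

```python
import math


def squared_loss(prediction, label):
    return (prediction - label) ** 2


def hinge_loss(prediction, label):
    # Zero once the example is on the correct side with margin >= 1.
    return max(0.0, 1.0 - label * prediction)


def logistic_loss(prediction, label):
    return math.log1p(math.exp(-label * prediction))


def quantile_loss(prediction, label, tau=0.5):
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    error = label - prediction
    return tau * error if error >= 0 else (tau - 1.0) * error


print(squared_loss(2.0, 1.0))  # 1.0
print(hinge_loss(0.5, 1))      # 0.5
```

With tau=0.5 the quantile loss is half the absolute error, so minimising it targets the median.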

Output model

final_regressor,f : str

Final regressor

readable_model : str

Output human-readable final regressor with numeric features

invert_hash : str

Output human-readable final regressor with feature names. Computationally expensive.

save_resume : bool

save extra state so learning can be resumed later with new data

preserve_performance_counters : bool

reset performance counters when warmstarting

output_feature_regularizer_binary : str

Per feature regularization output file

output_feature_regularizer_text : str

Per feature regularization output file, in text

Multiclass options
oaa : integer

Use one-against-all multiclass learning with labels

oaa_subsample : int

subsample this number of negative examples when learning

ect : integer

Use error correcting tournament multiclass learning

csoaa : integer

Use cost sensitive one-against-all multiclass learning

wap : integer

Use weighted all pairs multiclass learning

probabilities : bool

predict probabilities of all classes
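
The one-against-all idea behind oaa can be sketched as follows: one binary scorer per class, with the prediction going to the class whose scorer is most confident. The function name and the dict of scores are assumptions for illustration and do not reflect the library's internals.

```python
def one_against_all_predict(scores_by_class):
    """Return the class label with the highest per-class score."""
    return max(scores_by_class, key=scores_by_class.get)


# Hypothetical scores from three binary scorers, keyed by class label.
scores = {1: -0.4, 2: 1.3, 3: 0.2}
print(one_against_all_predict(scores))  # 2
```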

Neural Network options

nn : integer

Use a sigmoidal feed-forward neural network with N hidden units

inpass : bool

Train or test sigmoidal feed-forward network with input pass-through

multitask : bool

Share hidden layer across all reduced tasks

dropout : bool

Train or test sigmoidal feed-forward network using dropout

meanfield : bool

Train or test sigmoidal feed-forward network using mean field

LBFGS and Conjugate Gradient options

conjugate_gradient : bool

use conjugate gradient based optimization

bfgs : bool

use bfgs updates

hessian_on : bool

use second derivative in line search

mem : int

memory in bfgs

termination : float

termination threshold

Latent Dirichlet Allocation options

lda : int

Run lda with <int> topics

lda_alpha : float

Prior on sparsity of per-document topic weights

lda_rho : float

Prior on sparsity of topic distributions

lda_D : int

Number of documents

lda_epsilon : float

Loop convergence threshold

minibatch : int

Minibatch size for LDA

Stochastic Variance Reduced Gradient options

svrg : bool

Streaming Stochastic Variance Reduced Gradient

stage_size : int

Number of passes per SVRG stage

Follow the Regularized Leader options

ftrl : bool

Run Follow the Proximal Regularized Leader

coin : bool

Coin betting optimizer

pistol : bool

PiSTOL: Parameter free STOchastic Learning

ftrl_alpha : float

Alpha parameter for FTRL optimization

ftrl_beta : float

Beta parameter for FTRL optimization

Kernel SVM options

ksvm : bool

kernel svm

kernel : str

type of kernel (rbf or linear (default))

bandwidth : int

bandwidth of rbf kernel

degree : int

degree of poly kernel
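
The roles of bandwidth and degree can be seen from the usual kernel definitions. The sketch below uses standard textbook forms; the exact parameterization inside Vowpal Wabbit (for example, how bandwidth scales the squared distance) is an assumption here, and the function names are illustrative.

```python
import math


def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, bandwidth=1.0):
    # Assumed form: exp(-bandwidth * ||x - y||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-bandwidth * squared_distance)


def poly_kernel(x, y, degree=2):
    # Assumed form: (1 + <x, y>) ** degree
    return (1.0 + linear_kernel(x, y)) ** degree


print(linear_kernel([1.0, 2.0], [3.0, 4.0]))         # 11.0
print(poly_kernel([1.0, 2.0], [3.0, 4.0], degree=2))  # 144.0
```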

Gradient Descent options

sgd : bool

use regular stochastic gradient descent update

adaptive : bool

use adaptive, individual learning rates

adax : bool

use adaptive learning rates with x^2 instead of g^2x^2

invariant : bool

use safe/importance aware updates

normalized : bool

use per feature normalized updates
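
To illustrate what "adaptive, individual learning rates" means, here is a textbook AdaGrad-style step in which each weight is scaled by the history of its own squared gradients. This is a sketch of the general technique, not Vowpal Wabbit's exact update; the function name and the eps term are assumptions.

```python
import math


def adaptive_sgd_step(weights, grad_sq_sums, gradient, learning_rate=0.5, eps=1e-8):
    """Apply one AdaGrad-style update in place and return the weights."""
    for i, g in enumerate(gradient):
        # Accumulate squared gradients per feature.
        grad_sq_sums[i] += g * g
        # Features with a large gradient history take smaller steps.
        weights[i] -= learning_rate * g / (math.sqrt(grad_sq_sums[i]) + eps)
    return weights


weights = [0.0, 0.0]
history = [0.0, 0.0]
adaptive_sgd_step(weights, history, gradient=[2.0, 0.0])
print(weights)
```

Plain sgd would instead use the same step size for every feature.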

Scorer options

link : str

Specify the link function: identity, logistic, glf1 or poisson
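
A link function maps the raw linear score to the reported prediction. The sketch below writes out the four names listed for link; the identity, logistic and poisson forms are the standard ones, while glf1 is assumed here to be a logistic curve rescaled to the range (-1, 1). The names LINKS and apply_link are hypothetical.

```python
import math

LINKS = {
    "identity": lambda s: s,
    "logistic": lambda s: 1.0 / (1.0 + math.exp(-s)),
    # Assumption: logistic curve rescaled from (0, 1) to (-1, 1).
    "glf1": lambda s: 2.0 / (1.0 + math.exp(-s)) - 1.0,
    "poisson": lambda s: math.exp(s),
}


def apply_link(score, link="identity"):
    """Transform a raw score with the named link function."""
    return LINKS[link](score)


print(apply_link(0.0, "logistic"))  # 0.5
```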

Stagewise polynomial options:

stage_poly : bool

use stagewise polynomial feature learning

sched_exponent : int

exponent controlling quantity of included features

batch_sz : int

multiplier on batch size before including more features

batch_sz_no_doubling : bool

batch_sz does not double

Low Rank Quadratics options:

lrq : bool

use low rank quadratic features

lrqdropout : bool

use dropout training for low rank quadratic features

lrqfa : bool

use low rank quadratic features with field aware weights

Input options

data,d : str

path to data file for fitting external to sklearn

cache,c : str

use a cache. default is <data>.cache

cache_file : str

path to cache file to use

json : bool

enable JSON parsing

kill_cache, k : bool

do not reuse existing cache file, create a new one always

self : BaseEstimator
classes_ = None
decision_function(self, X)

Predict confidence scores for samples. The confidence score for a sample is the signed distance of that sample to the hyperplane.

X : array_like or sparse matrix, shape (n_samples, n_features)


array, shape=(n_samples, n_classes)

Confidence scores per (sample, class) combination.
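
For a linear model, the confidence score can be sketched as the dot product of the weights with the sample plus an intercept; the sign gives the side of the hyperplane, and the value is proportional to the signed distance (dividing by the weight norm would give the geometric distance). The helper names below are illustrative and are not part of this class.

```python
def decision_scores(samples, weights, intercept=0.0):
    """Return w . x + b for each row of samples."""
    return [sum(w * x for w, x in zip(weights, row)) + intercept
            for row in samples]


def predict_labels(scores):
    # Positive scores map to class 1, the rest to class -1.
    return [1 if score > 0 else -1 for score in scores]


scores = decision_scores([[3.0, 1.0], [-1.0, 0.0]], weights=[0.5, -1.0], intercept=0.25)
print(scores)                  # [0.75, -0.25]
print(predict_labels(scores))  # [1, -1]
```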

estimator_ = None
fit(self, X=None, y=None, sample_weight=None)

Fit the model according to the given training data.

X : {array-like, sparse matrix} of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Target vector relative to X.

sample_weight : array-like of shape (n_samples,) default=None

Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.


Fitted estimator.

predict_proba(self, X)

Predict probabilities for each class.

X : {array-like, sparse matrix}, shape = (n_samples, n_features)


array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes)

Confidence scores per (sample, class) combination. In the binary case, confidence score for self.classes_[1] where >0 means this class would be predicted.


>>> import numpy as np
>>> X = np.array([ [10, 10], [8, 10], [-5, 5.5], [-5.4, 5.5], [-20, -20],  [-15, -20] ])
>>> y = np.array([1, 1, 2, 2, 3, 3])
>>> from vowpalwabbit.sklearn_vw import VWMultiClassifier
>>> model = VWMultiClassifier(oaa=3, loss_function='logistic')
>>> model.fit(X, y)
>>> model.predict_proba(X)
class vowpalwabbit.sklearn_vw.VWRegressor(convert_to_vw=True, convert_labels=True, ring_size=None, strict_parse=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, normal_weights=None, truncated_normal_weights=None, sparse_weights=None, input_feature_regularizer=None, quiet=True, random_seed=None, hash=None, hash_seed=None, ignore=None, ignore_linear=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, holdout_off=None, holdout_period=None, holdout_after=None, early_terminate=None, passes=1, initial_pass_length=None, examples=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, quantile_tau=None, l1=None, l2=None, no_bias_regularization=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, preserve_performance_counters=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, probabilities=None, nn=None, inpass=None, multitask=None, dropout=None, meanfield=None, conjugate_gradient=None, bfgs=None, hessian_on=None, mem=None, termination=None, lda=None, lda_alpha=None, lda_rho=None, lda_D=None, lda_epsilon=None, minibatch=None, svrg=None, stage_size=None, ftrl=None, coin=None, pistol=None, ftrl_alpha=None, ftrl_beta=None, ksvm=None, kernel=None, bandwidth=None, degree=None, sgd=None, adaptive=None, invariant=None, normalized=None, link=None, stage_poly=None, sched_exponent=None, batch_sz=None, batch_sz_no_doubling=None, lrq=None, lrqdropout=None, lrqfa=None, data=None, d=None, cache=None, c=None, 
cache_file=None, json=None, kill_cache=None, k=None)

Bases: vowpalwabbit.sklearn_vw.VW, sklearn.base.RegressorMixin

Vowpal Wabbit Regressor model



fit(self[, X, y, sample_weight]) Fit the model according to the given training data
get_coefs(self) Returns coefficient weights as ordered sparse matrix
get_intercept(self) Returns intercept weight for model
get_params(self[, deep]) This returns the full set of vw and estimator parameters currently in use
get_vw(self) Get the vw instance
load(self, filename) Load model from file
predict(self, X) Predict with Vowpal Wabbit model
save(self, filename) Save model to file
score(self, X, y[, sample_weight]) Returns the coefficient of determination R^2 of the prediction.
set_coefs(self, coefs) Sets coefficients weights from ordered sparse matrix
set_params(self, \*\*kwargs) This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided will remain as they are currently.
vowpalwabbit.sklearn_vw.tovw(x, y=None, sample_weight=None, convert_labels=False)

Convert array or sparse matrix to Vowpal Wabbit format

x : {array-like, sparse matrix}, shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : {array-like}, shape (n_samples,), optional

Target vector relative to X.

sample_weight : {array-like}, shape (n_samples,), optional

sample weight vector relative to X.

convert_labels : {bool}

convert labels of the form [0,1] to [-1,1]

out : {array-like}, shape (n_samples, 1)

Training vectors in VW string format


>>> import pandas as pd
>>> from sklearn.feature_extraction.text import HashingVectorizer
>>> from vowpalwabbit.sklearn_vw import tovw
>>> X = pd.Series(['cat', 'dog', 'cat', 'cat'], name='catdog')
>>> y = pd.Series([-1, 1, -1, -1], name='label')
>>> hv = HashingVectorizer()
>>> hashed = hv.fit_transform(X)
>>> tovw(x=hashed, y=y)
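
To make the target format concrete, the following sketch formats a single dense row as a Vowpal Wabbit text line of the form "label weight | index:value ...". The helper to_vw_line is hypothetical and independent of tovw; details such as the index base and how zero values are handled are assumptions, so the strings returned by tovw may differ.

```python
def to_vw_line(features, label=None, weight=None):
    """Format one example as '[label [weight]] | index:value ...'."""
    header = []
    if label is not None:
        header.append(str(label))
        if weight is not None:
            header.append(str(weight))
    # Keep only non-zero entries, as a sparse representation would.
    pairs = [f"{index}:{value:g}" for index, value in enumerate(features) if value != 0]
    return f"{' '.join(header)} | {' '.join(pairs)}".strip()


print(to_vw_line([0.5, 0, 2], label=-1, weight=1))  # -1 1 | 0:0.5 2:2
print(to_vw_line([1.0]))                            # | 0:1
```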