vowpalwabbit.sklearn_vw

Utilities to support integration of Vowpal Wabbit and scikit-learn

class vowpalwabbit.sklearn_vw.LinearClassifierMixin

Bases: sklearn.linear_model._logistic.LogisticRegression

__init__()

Initialize self. See help(type(self)) for accurate signature.

class vowpalwabbit.sklearn_vw.VW(convert_to_vw=True, convert_labels=True, ring_size=None, strict_parse=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, normal_weights=None, truncated_normal_weights=None, sparse_weights=None, input_feature_regularizer=None, quiet=True, random_seed=None, hash=None, hash_seed=None, ignore=None, ignore_linear=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, holdout_off=None, holdout_period=None, holdout_after=None, early_terminate=None, passes=1, initial_pass_length=None, examples=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, quantile_tau=None, l1=None, l2=None, no_bias_regularization=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, preserve_performance_counters=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, probabilities=None, nn=None, inpass=None, multitask=None, dropout=None, meanfield=None, conjugate_gradient=None, bfgs=None, hessian_on=None, mem=None, termination=None, lda=None, lda_alpha=None, lda_rho=None, lda_D=None, lda_epsilon=None, minibatch=None, svrg=None, stage_size=None, ftrl=None, coin=None, pistol=None, ftrl_alpha=None, ftrl_beta=None, ksvm=None, kernel=None, bandwidth=None, degree=None, sgd=None, adaptive=None, invariant=None, normalized=None, link=None, stage_poly=None, sched_exponent=None, batch_sz=None, batch_sz_no_doubling=None, lrq=None, lrqdropout=None, lrqfa=None, data=None, d=None, cache=None, c=None, cache_file=None, json=None, kill_cache=None, k=None)

Bases: sklearn.base.BaseEstimator

Vowpal Wabbit Scikit-learn Base Estimator wrapper

Attributes
convert_to_vw : bool

flag to convert X input to vw format

convert_labels : bool

Convert labels of the form [0,1] to [-1,1]

vw_ : pyvw.vw

vw instance

__init__(convert_to_vw=True, convert_labels=True, ring_size=None, strict_parse=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, normal_weights=None, truncated_normal_weights=None, sparse_weights=None, input_feature_regularizer=None, quiet=True, random_seed=None, hash=None, hash_seed=None, ignore=None, ignore_linear=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, holdout_off=None, holdout_period=None, holdout_after=None, early_terminate=None, passes=1, initial_pass_length=None, examples=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, quantile_tau=None, l1=None, l2=None, no_bias_regularization=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, preserve_performance_counters=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, probabilities=None, nn=None, inpass=None, multitask=None, dropout=None, meanfield=None, conjugate_gradient=None, bfgs=None, hessian_on=None, mem=None, termination=None, lda=None, lda_alpha=None, lda_rho=None, lda_D=None, lda_epsilon=None, minibatch=None, svrg=None, stage_size=None, ftrl=None, coin=None, pistol=None, ftrl_alpha=None, ftrl_beta=None, ksvm=None, kernel=None, bandwidth=None, degree=None, sgd=None, adaptive=None, invariant=None, normalized=None, link=None, stage_poly=None, sched_exponent=None, batch_sz=None, batch_sz_no_doubling=None, lrq=None, lrqdropout=None, lrqfa=None, data=None, d=None, cache=None, c=None, cache_file=None, json=None, kill_cache=None, k=None)

VW model constructor, exposing all supported parameters so that scikit-learn can inspect and clone the estimator

Parameters
Estimator options

convert_to_vw : bool

flag to convert X input to vw format

convert_labels : bool

Convert labels of the form [0,1] to [-1,1]

VW options

ring_size : int

size of example ring

strict_parse : bool

throw on malformed examples

Update options

learning_rate, l : float

Set learning rate

power_t : float

t power value

decay_learning_rate : float

Set decay factor for learning_rate between passes

initial_t : float

initial t value

feature_mask : str

Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.

Weight options

initial_regressor, i : str

Initial regressor(s)

initial_weight : float

Set all weights to an initial value of arg.

random_weights : bool

make initial weights random

normal_weights : bool

make initial weights normal

truncated_normal_weights : bool

make initial weights truncated normal

sparse_weights : bool

Use a sparse datastructure for weights

input_feature_regularizer : str

Per feature regularization input file

Diagnostic options

quiet : bool

Don't output diagnostics and progress updates

Randomization options

random_seed : integer

seed random number generator

Feature options

hash : str

how to hash the features. Available options: strings, all

hash_seed : int

seed for hash function

ignore : str

ignore namespaces beginning with character <arg>

ignore_linear : str

ignore namespaces beginning with character <arg> for linear terms only

keep : str

keep namespaces beginning with character <arg>

redefine : str

Redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in form 'N:=S' where := is the operator. Empty N or S are treated as default namespace. Use ':' as a wildcard in S.

bit_precision, b : integer

number of bits in the feature table

noconstant : bool

Don't add a constant feature

constant, C : float

Set initial value of constant

ngram : str

Generate N grams. To generate N grams for a single namespace 'foo', arg should be fN.

skips : str

Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-grams. To generate n-skips for a single namespace 'foo', arg should be fN.

feature_limit : str

limit to N features. To apply to a single namespace 'foo', arg should be fN

affix : str

generate prefixes/suffixes of features; argument '+2a,-3b,+1' means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1-char prefixes for default namespace

spelling : str

compute spelling features for a given namespace (use '_' for default namespace)

dictionary : str

read a dictionary for additional features (arg either 'x:file' or just 'file')

dictionary_path : str

look in this directory for dictionaries; defaults to current directory or env{PATH}

interactions : str

Create feature interactions of any level between namespaces.

permutations : bool

Use permutations instead of combinations for feature interactions of the same namespace.

leave_duplicate_interactions : bool

Don't remove interactions with duplicate combinations of namespaces. For example, '-q ab -q ba' contains a duplicate, and '-q ::' many more.

quadratic, q : str

Create and use quadratic features; q:: corresponds to a wildcard for all printable characters

cubic : str

Create and use cubic features

Example options

testonly, t : bool

Ignore label information and just test

holdout_off : bool

no holdout data in multiple passes

holdout_period : int

holdout period for test only

holdout_after : int

holdout after n training examples

early_terminate : int

Specify the number of passes tolerated when holdout loss doesn't decrease before early termination

passes : int

Number of Training Passes

initial_pass_length : int

initial number of examples per pass

examples : int

number of examples to parse

min_prediction : float

Smallest prediction to output

max_prediction : float

Largest prediction to output

sort_features : bool

turn this on to disregard the order in which features have been defined. This will lead to smaller cache sizes

loss_function : str

Specify the loss function to be used; squared is used by default. Currently available ones are squared, classic, hinge, logistic and quantile.

quantile_tau : float

Parameter tau associated with Quantile loss. Defaults to 0.5

l1 : float

l_1 lambda (L1 regularization)

l2 : float

l_2 lambda (L2 regularization)

no_bias_regularization : bool

no bias in regularization

named_labels : str

use names for labels (multiclass, etc.) rather than integers; argument specifies all possible labels, comma-separated, e.g. "--named_labels Noun,Verb,Adj,Punc"

Output model

final_regressor, f : str

Final regressor

readable_model : str

Output human-readable final regressor with numeric features

invert_hash : str

Output human-readable final regressor with feature names. Computationally expensive.

save_resume : bool

save extra state so learning can be resumed later with new data

preserve_performance_counters : bool

preserve performance counters (do not reset them) when warm-starting

output_feature_regularizer_binary : str

Per feature regularization output file

output_feature_regularizer_text : str

Per feature regularization output file, in text

Multiclass options

oaa : integer

Use one-against-all multiclass learning with labels

oaa_subsample : int

subsample this number of negative examples when learning

ect : integer

Use error correcting tournament multiclass learning

csoaa : integer

Use cost sensitive one-against-all multiclass learning

wap : integer

Use weighted all pairs multiclass learning

probabilities : bool

predict probabilities of all classes

Neural Network options

nn : integer

Use a sigmoidal feed-forward neural network with N hidden units

inpass : bool

Train or test sigmoidal feed-forward network with input pass-through

multitask : bool

Share hidden layer across all reduced tasks

dropout : bool

Train or test sigmoidal feed-forward network using dropout

meanfield : bool

Train or test sigmoidal feed-forward network using mean field

LBFGS and Conjugate Gradient options

conjugate_gradient : bool

use conjugate gradient based optimization

bfgs : bool

use bfgs updates

hessian_on : bool

use second derivative in line search

mem : int

memory in bfgs

termination : float

termination threshold

Latent Dirichlet Allocation options

lda : int

Run lda with <int> topics

lda_alpha : float

Prior on sparsity of per-document topic weights

lda_rho : float

Prior on sparsity of topic distributions

lda_D : int

Number of documents

lda_epsilon : float

Loop convergence threshold

minibatch : int

Minibatch size for LDA

Stochastic Variance Reduced Gradient options

svrg : bool

Streaming Stochastic Variance Reduced Gradient

stage_size : int

Number of passes per SVRG stage

Follow the Regularized Leader options

ftrl : bool

Run Follow the Proximal Regularized Leader

coin : bool

Coin betting optimizer

pistol : bool

PiSTOL: Parameter free STOchastic Learning

ftrl_alpha : float

Alpha parameter for FTRL optimization

ftrl_beta : float

Beta parameter for FTRL optimization

Kernel SVM options

ksvm : bool

kernel svm

kernel : str

type of kernel (rbf or linear (default))

bandwidth : int

bandwidth of rbf kernel

degree : int

degree of poly kernel

Gradient Descent options

sgd : bool

use regular stochastic gradient descent update

adaptive : bool

use adaptive, individual learning rates

adax : bool

use adaptive learning rates with x^2 instead of g^2x^2

invariant : bool

use safe/importance aware updates

normalized : bool

use per feature normalized updates

Scorer options

link : str

Specify the link function: identity, logistic, glf1 or poisson

Stagewise polynomial options

stage_poly : bool

use stagewise polynomial feature learning

sched_exponent : int

exponent controlling quantity of included features

batch_sz : int

multiplier on batch size before including more features

batch_sz_no_doubling : bool

batch_sz does not double

Low Rank Quadratics options

lrq : bool

use low rank quadratic features

lrqdropout : bool

use dropout training for low rank quadratic features

lrqfa : bool

use low rank quadratic features with field aware weights

Input options

data, d : str

path to data file for fitting external to sklearn

cache, c : str

use a cache; default is <data>.cache

cache_file : str

path to cache file to use

json : bool

enable JSON parsing

kill_cache, k : bool

do not reuse an existing cache file; always create a new one

Returns
self : BaseEstimator
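
Examples

A minimal usage sketch on toy numeric data (a hypothetical example; predicted values are omitted because they depend on the trained weights):

>>> import numpy as np
>>> from vowpalwabbit.sklearn_vw import VW
>>> X = np.array([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])
>>> y = np.array([1., 2., 3., 4.])
>>> model = VW(learning_rate=0.5)
>>> _ = model.fit(X, y)
>>> preds = model.predict(X)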
convert_labels = True
convert_to_vw = True
fit(X=None, y=None, sample_weight=None)

Fit the model according to the given training data

TODO: for the first pass, create and store example objects; for the remaining N-1 passes, use the example objects directly (simulating a cache file, but in memory for faster processing).

Parameters
X : {array-like, sparse matrix}, shape (n_samples, n_features or 1 if not convert_to_vw)

Training vector, where n_samples is the number of samples and n_features is the number of features. If not using convert_to_vw, X is expected to be a list of vw formatted feature vector strings with labels.

y : array-like, shape (n_samples,), optional if not convert_to_vw

Target vector relative to X.

sample_weight : array-like, shape (n_samples,)

sample weight vector relative to X.

Returns
self : BaseEstimator

so that a pipeline can call transform() after fit
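
When convert_to_vw=False, fit consumes VW-formatted strings with embedded labels directly; a short sketch (toy data):

>>> from vowpalwabbit.sklearn_vw import VW
>>> data = ['1 | price:.23 sqft:.25 age:.05 2006',
...         '0 | price:.18 sqft:.15 age:.35 1976']
>>> model = VW(convert_to_vw=False)
>>> _ = model.fit(data)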

get_coefs()

Returns coefficient weights as an ordered sparse matrix

Returns
sparse matrix : coefficient weights for model
get_intercept()

Returns intercept weight for model

Returns
intercept value : integer, 0 if no constant
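
A brief sketch of reading the learned weights back after fitting (toy data):

>>> import numpy as np
>>> from vowpalwabbit.sklearn_vw import VW
>>> model = VW()
>>> _ = model.fit(np.array([[1., 2.], [3., 4.]]), np.array([1., 2.]))
>>> coefs = model.get_coefs()          # ordered sparse matrix of weights
>>> intercept = model.get_intercept()  # 0 if the model has no constant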
get_params(deep=True)

This returns the full set of vw and estimator parameters currently in use

get_vw()

Get the vw instance

Returns
vw : pyvw.vw instance
load(filename)

Load model from file

predict(X)

Predict with Vowpal Wabbit model

Parameters
X : {array-like, sparse matrix}, shape (n_samples, n_features or 1)

Prediction vector, where n_samples is the number of samples and n_features is the number of features. If not using convert_to_vw, X is expected to be a list of vw formatted feature vector strings with labels.

Returns
y : array-like, shape (n_samples, 1 or n_classes)

Output vector relative to X.

save(filename)

Save model to file
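
A save/load round-trip sketch, assuming load may be called on a freshly constructed estimator (the file name 'vw.model' is illustrative):

>>> import numpy as np
>>> from vowpalwabbit.sklearn_vw import VW
>>> model = VW()
>>> _ = model.fit(np.array([[1., 2.], [3., 4.]]), np.array([1., 2.]))
>>> model.save('vw.model')
>>> restored = VW()
>>> restored.load('vw.model')
>>> preds = restored.predict(np.array([[1., 2.]]))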

set_coefs(coefs)

Set coefficient weights from an ordered sparse matrix

Parameters
coefs : sparse matrix

coefficient weights for model

set_params(**kwargs)

This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided will remain as they are currently.
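
Since set_params rebuilds the underlying vw instance, it can be used to reconfigure an estimator between runs; a sketch:

>>> from vowpalwabbit.sklearn_vw import VW
>>> model = VW(learning_rate=0.5)
>>> _ = model.set_params(learning_rate=0.1)  # model is recreated with the new rate
>>> params = model.get_params()              # reflects the updated learning_rate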

vw_ = None
class vowpalwabbit.sklearn_vw.VWClassifier(loss_function='logistic', **kwargs)

Bases: vowpalwabbit.sklearn_vw.VW, vowpalwabbit.sklearn_vw.LinearClassifierMixin

Vowpal Wabbit Classifier model for binary classification. Use VWMultiClassifier for multiclass classification. Note: we assume VW.predict returns logits; applying link='logistic' will break this assumption.

Attributes
coef_ : scipy.sparse_matrix

Empty sparse matrix used to check whether the model has been fit

classes_ : np.array

Binary class labels
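
Examples

A binary-classification sketch on linearly separable toy data (outputs omitted since they depend on training):

>>> import numpy as np
>>> from vowpalwabbit.sklearn_vw import VWClassifier
>>> X = np.array([[1., 1.], [2., 1.], [-1., -1.], [-2., -1.]])
>>> y = np.array([1, 1, -1, -1])
>>> model = VWClassifier()
>>> _ = model.fit(X, y)
>>> preds = model.predict(X)             # class labels from classes_
>>> scores = model.decision_function(X)  # signed distances (logits)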

__init__(loss_function='logistic', **kwargs)

VW model constructor, exposing all supported parameters so that scikit-learn can inspect and clone the estimator

Parameters
Accepts the same parameters as VW.__init__; see that method for the full list.

Returns
self : BaseEstimator
classes_ = array([-1.,  1.])
coef_ = None
decision_function(X)

Predict confidence scores for samples. The confidence score for a sample is the signed distance of that sample to the hyperplane.

Parameters
X : array_like or sparse matrix, shape (n_samples, n_features)

Samples.

Returns
array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes)

Confidence scores per (sample, class) combination. In the binary case, confidence score for self.classes_[1] where >0 means this class would be predicted.

fit(X=None, y=None, sample_weight=None)

Fit the model according to the given training data.

Parameters
X : {array-like, sparse matrix} of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Target vector relative to X.

sample_weight : array-like of shape (n_samples,), default=None

Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

Returns
self

Fitted estimator.

predict(X)

Predict class labels for samples in X.

Parameters
X : array_like or sparse matrix, shape (n_samples, n_features)

Samples.

Returns
C : array, shape [n_samples]

Predicted class label per sample.

predict_proba(X)

Predict probabilities for samples

Parameters
X : {array-like, sparse matrix}, shape = (n_samples, n_features)

Samples.

Returns
T : array-like of shape (n_samples, n_classes)

Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
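
A short sketch showing the probability output for the binary case (toy data; values omitted since they depend on training):

>>> import numpy as np
>>> from vowpalwabbit.sklearn_vw import VWClassifier
>>> X = np.array([[1., 1.], [-1., -1.]])
>>> y = np.array([1, -1])
>>> model = VWClassifier()
>>> _ = model.fit(X, y)
>>> probs = model.predict_proba(X)  # shape (n_samples, 2), one column per class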

class vowpalwabbit.sklearn_vw.VWMultiClassifier(probabilities=True, **kwargs)

Bases: vowpalwabbit.sklearn_vw.VWClassifier

Vowpal Wabbit MultiClassifier model. Note: we assume VW.predict returns probabilities; setting probabilities=False will break this assumption.

Attributes
classes_ : np.array

class labels

estimator_ : dict

type of estimator to use [csoaa, ect, oaa, wap] and number of classes
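
Examples

A label-prediction sketch mirroring the predict_proba example further below:

>>> import numpy as np
>>> from vowpalwabbit.sklearn_vw import VWMultiClassifier
>>> X = np.array([[10, 10], [8, 10], [-5, 5.5], [-5.4, 5.5], [-20, -20], [-15, -20]])
>>> y = np.array([1, 1, 2, 2, 3, 3])
>>> model = VWMultiClassifier(oaa=3, loss_function='logistic')
>>> _ = model.fit(X, y)
>>> preds = model.predict(X)  # labels drawn from {1, 2, 3}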

__init__(probabilities=True, **kwargs)

VW model constructor, exposing all supported parameters so that scikit-learn can inspect and clone the estimator

Parameters
Accepts the same parameters as VW.__init__; see that method for the full list.

Returns
self : BaseEstimator
classes_ = None
decision_function(X)

Predict confidence scores for samples. The confidence score for a sample is the signed distance of that sample to the hyperplane.

Parameters
X : array_like or sparse matrix, shape (n_samples, n_features)

Samples.

Returns
array, shape=(n_samples, n_classes)

Confidence scores per (sample, class) combination.

estimator_ = None
fit(X=None, y=None, sample_weight=None)

Fit the model according to the given training data.

Parameters
X : {array-like, sparse matrix} of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Target vector relative to X.

sample_weight : array-like of shape (n_samples,), default=None

Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

Returns
self

Fitted estimator.

predict_proba(X)

Predict probabilities for each class.

Parameters
X : {array-like, sparse matrix}, shape = (n_samples, n_features)

Samples.

Returns
T : array-like of shape (n_samples, n_classes)

Probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.

Examples

>>> import numpy as np
>>> X = np.array([ [10, 10], [8, 10], [-5, 5.5], [-5.4, 5.5], [-20, -20],  [-15, -20] ])
>>> y = np.array([1, 1, 2, 2, 3, 3])
>>> from vowpalwabbit.sklearn_vw import VWMultiClassifier
>>> model = VWMultiClassifier(oaa=3, loss_function='logistic')
>>> _ = model.fit(X, y)
>>> model.predict_proba(X)
array([[0.38928846, 0.30534211, 0.30536944],
       [0.40664235, 0.29666999, 0.29668769],
       [0.52324486, 0.23841164, 0.23834346],
       [0.5268591 , 0.23660533, 0.23653553],
       [0.65397811, 0.17312808, 0.17289382],
       [0.61190444, 0.19416356, 0.19393198]])
class vowpalwabbit.sklearn_vw.VWRegressor(convert_labels=False, **kwargs)

Bases: vowpalwabbit.sklearn_vw.VW, sklearn.base.RegressorMixin

Vowpal Wabbit Regressor model

__init__(convert_labels=False, **kwargs)

VW model constructor, exposing all supported parameters so that scikit-learn can inspect and clone the estimator

Parameters
Accepts the same parameters as VW.__init__; see that method for the full list.

Returns
self : BaseEstimator
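
Examples

A regression sketch on toy data (predicted values omitted since they depend on training):

>>> import numpy as np
>>> from vowpalwabbit.sklearn_vw import VWRegressor
>>> X = np.array([[1., 2.], [2., 3.], [3., 4.], [4., 5.]])
>>> y = np.array([3., 5., 7., 9.])
>>> model = VWRegressor()
>>> _ = model.fit(X, y)
>>> preds = model.predict(X)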
vowpalwabbit.sklearn_vw.tovw(x, y=None, sample_weight=None, convert_labels=False)

Convert array or sparse matrix to Vowpal Wabbit format

Parameters
x : {array-like, sparse matrix}, shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape (n_samples,), optional

Target vector relative to X.

sample_weight : array-like, shape (n_samples,), optional

sample weight vector relative to X.

convert_labels : bool

convert labels of the form [0,1] to [-1,1]

Returns
out : array-like, shape (n_samples, 1)

Training vectors in VW string format

Examples

>>> import pandas as pd
>>> from sklearn.feature_extraction.text import HashingVectorizer
>>> from vowpalwabbit.sklearn_vw import tovw
>>> X = pd.Series(['cat', 'dog', 'cat', 'cat'], name='catdog')
>>> y = pd.Series([-1, 1, -1, -1], name='label')
>>> hv = HashingVectorizer()
>>> hashed = hv.fit_transform(X)
>>> tovw(x=hashed, y=y)
['-1 1 | 300839:1', '1 1 | 980517:-1', '-1 1 | 300839:1', '-1 1 | 300839:1']