vowpalwabbit.sklearn_vw

Utilities to support integration of Vowpal Wabbit and scikit-learn
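
A minimal quickstart sketch; the estimators follow the standard scikit-learn fit/predict/score protocol (data values here are arbitrary):

>>> import numpy as np
>>> from vowpalwabbit.sklearn_vw import VWClassifier
>>> X = np.array([[10., 10.], [8., 10.], [-5., 5.5], [-5.4, 5.5]])
>>> y = np.array([1, 1, -1, -1])
>>> model = VWClassifier()
>>> model.fit(X, y)
>>> model.predict(X)   # class labels in {-1, 1}
>>> model.score(X, y)  # mean accuracy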

class vowpalwabbit.sklearn_vw.LinearClassifierMixin

Bases: sklearn.linear_model.logistic.LogisticRegression

Methods

decision_function(self, X) Predict confidence scores for samples.
densify(self) Convert coefficient matrix to dense array format.
fit(self, X, y[, sample_weight]) Fit the model according to the given training data.
get_params(self[, deep]) Get parameters for this estimator.
predict(self, X) Predict class labels for samples in X.
predict_log_proba(self, X) Log of probability estimates.
predict_proba(self, X) Probability estimates.
score(self, X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
set_params(self, **params) Set the parameters of this estimator.
sparsify(self) Convert coefficient matrix to sparse format.
__init__(self)

x.__init__(…) initializes x; see help(type(x)) for signature

class vowpalwabbit.sklearn_vw.VW(convert_to_vw=True, convert_labels=True, ring_size=None, strict_parse=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, normal_weights=None, truncated_normal_weights=None, sparse_weights=None, input_feature_regularizer=None, quiet=True, random_seed=None, hash=None, hash_seed=None, ignore=None, ignore_linear=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, holdout_off=None, holdout_period=None, holdout_after=None, early_terminate=None, passes=1, initial_pass_length=None, examples=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, quantile_tau=None, l1=None, l2=None, no_bias_regularization=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, preserve_performance_counters=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, probabilities=None, nn=None, inpass=None, multitask=None, dropout=None, meanfield=None, conjugate_gradient=None, bfgs=None, hessian_on=None, mem=None, termination=None, lda=None, lda_alpha=None, lda_rho=None, lda_D=None, lda_epsilon=None, minibatch=None, svrg=None, stage_size=None, ftrl=None, coin=None, pistol=None, ftrl_alpha=None, ftrl_beta=None, ksvm=None, kernel=None, bandwidth=None, degree=None, sgd=None, adaptive=None, invariant=None, normalized=None, link=None, stage_poly=None, sched_exponent=None, batch_sz=None, batch_sz_no_doubling=None, lrq=None, lrqdropout=None, lrqfa=None, data=None, d=None, cache=None, c=None, cache_file=None, json=None, kill_cache=None, k=None)

Bases: sklearn.base.BaseEstimator

Vowpal Wabbit Scikit-learn Base Estimator wrapper

Attributes:
convert_to_vw : bool

flag to convert X input to vw format

convert_labels : bool

Convert labels of the form [0,1] to [-1,1]

vw_ : pyvw.vw

vw instance

Methods

fit(self[, X, y, sample_weight]) Fit the model according to the given training data
get_coefs(self) Returns coefficient weights as ordered sparse matrix
get_intercept(self) Returns intercept weight for model
get_params(self[, deep]) This returns the full set of vw and estimator parameters currently in use
get_vw(self) Get the vw instance
load(self, filename) Load model from file
predict(self, X) Predict with Vowpal Wabbit model
save(self, filename) Save model to file
set_coefs(self, coefs) Sets coefficients weights from ordered sparse matrix
set_params(self, **kwargs) This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided will remain as they are currently
__init__(self, convert_to_vw=True, convert_labels=True, ring_size=None, strict_parse=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, normal_weights=None, truncated_normal_weights=None, sparse_weights=None, input_feature_regularizer=None, quiet=True, random_seed=None, hash=None, hash_seed=None, ignore=None, ignore_linear=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, holdout_off=None, holdout_period=None, holdout_after=None, early_terminate=None, passes=1, initial_pass_length=None, examples=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, quantile_tau=None, l1=None, l2=None, no_bias_regularization=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, preserve_performance_counters=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, probabilities=None, nn=None, inpass=None, multitask=None, dropout=None, meanfield=None, conjugate_gradient=None, bfgs=None, hessian_on=None, mem=None, termination=None, lda=None, lda_alpha=None, lda_rho=None, lda_D=None, lda_epsilon=None, minibatch=None, svrg=None, stage_size=None, ftrl=None, coin=None, pistol=None, ftrl_alpha=None, ftrl_beta=None, ksvm=None, kernel=None, bandwidth=None, degree=None, sgd=None, adaptive=None, invariant=None, normalized=None, link=None, stage_poly=None, sched_exponent=None, batch_sz=None, batch_sz_no_doubling=None, lrq=None, lrqdropout=None, lrqfa=None, data=None, d=None, cache=None, c=None, cache_file=None, json=None, kill_cache=None, k=None)

VW model constructor, exposing all supported parameters to keep sklearn happy

Parameters:
Estimator options
convert_to_vw : bool

flag to convert X input to vw format

convert_labels : bool

Convert labels of the form [0,1] to [-1,1]

VW options

ring_size : int

size of example ring

strict_parse : bool

throw on malformed examples

Update options

learning_rate,l : float

Set learning rate

power_t : float

t power value

decay_learning_rate : float

Set decay factor for learning_rate between passes

initial_t : float

initial t value

feature_mask : str

Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.

Weight options

initial_regressor,i : str

Initial regressor(s)

initial_weight : float

Set all weights to an initial value of arg.

random_weights : bool

make initial weights random

normal_weights : bool

make initial weights normal

truncated_normal_weights : bool

make initial weights truncated normal

sparse_weights : bool

Use a sparse data structure for weights

input_feature_regularizer : str

Per feature regularization input file

Diagnostic options

quiet : bool

Don’t output diagnostics and progress updates

Randomization options

random_seed : integer

seed random number generator

Feature options

hash : str

how to hash the features. Available options: strings, all

hash_seed : int

seed for hash function

ignore : str

ignore namespaces beginning with character <arg>

ignore_linear : str

ignore namespaces beginning with character <arg> for linear terms only

keep : str

keep namespaces beginning with character <arg>

redefine : str

Redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in form ‘N:=S’ where := is operator. Empty N or S are treated as default namespace. Use ‘:’ as a wildcard in S.

bit_precision,b : integer

number of bits in the feature table

noconstant : bool

Don’t add a constant feature

constant,C : float

Set initial value of constant

ngram : str

Generate N grams. To generate N grams for a single namespace ‘foo’, arg should be fN.

skips : str

Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-gram. To generate n-skips for a single namespace ‘foo’, arg should be fN.

feature_limit : str

limit to N features. To apply to a single namespace ‘foo’, arg should be fN

affix : str

generate prefixes/suffixes of features; argument ‘+2a,-3b,+1’ means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1 char prefixes for default namespace

spelling : str

compute spelling features for a given namespace (use ‘_’ for default namespace)

dictionary : str

read a dictionary for additional features (arg either ‘x:file’ or just ‘file’)

dictionary_path : str

look in this directory for dictionaries; defaults to current directory or env{PATH}

interactions : str

Create feature interactions of any level between namespaces.

permutations : bool

Use permutations instead of combinations for feature interactions of same namespace.

leave_duplicate_interactions : bool

Don’t remove interactions with duplicate combinations of namespaces. For ex. this is a duplicate: ‘-q ab -q ba’ and a lot more in ‘-q ::’.

quadratic,q : str

Create and use quadratic features, q:: corresponds to a wildcard for all printable characters

cubic : str

Create and use cubic features

Example options

testonly,t : bool

Ignore label information and just test

holdout_off : bool

no holdout data in multiple passes

holdout_period : int

holdout period for test only

holdout_after : int

holdout after n training examples

early_terminate : int

Specify the number of passes tolerated when holdout loss doesn’t decrease before early termination

passes : int

Number of Training Passes

initial_pass_length : int

initial number of examples per pass

examples : int

number of examples to parse

min_prediction : float

Smallest prediction to output

max_prediction : float

Largest prediction to output

sort_features : bool

turn this on to disregard order in which features have been defined. This will lead to smaller cache sizes

loss_function : str

Specify the loss function to be used; squared is used by default. Currently available ones are squared, classic, hinge, logistic and quantile.

quantile_tau : float

Parameter tau associated with Quantile loss. Defaults to 0.5

l1 : float

l_1 lambda (L1 regularization)

l2 : float

l_2 lambda (L2 regularization)

no_bias_regularization : bool

no bias in regularization

named_labels : str

use names for labels (multiclass, etc.) rather than integers; argument specifies all possible labels, comma-separated, e.g. “--named_labels Noun,Verb,Adj,Punc”

Output model

final_regressor,f : str

Final regressor

readable_model : str

Output human-readable final regressor with numeric features

invert_hash : str

Output human-readable final regressor with feature names. Computationally expensive.

save_resume : bool

save extra state so learning can be resumed later with new data

preserve_performance_counters : bool

preserve (do not reset) performance counters when warmstarting

output_feature_regularizer_binary : str

Per feature regularization output file

output_feature_regularizer_text : str

Per feature regularization output file, in text

Multiclass options
oaa : integer

Use one-against-all multiclass learning with <k> labels

oaa_subsample : int

subsample this number of negative examples when learning

ect : integer

Use error correcting tournament multiclass learning

csoaa : integer

Use cost sensitive one-against-all multiclass learning

wap : integer

Use weighted all pairs multiclass learning

probabilities : bool

predict probabilities of all classes

Neural Network options

nn : integer

Use a sigmoidal feed-forward neural network with N hidden units

inpass : bool

Train or test sigmoidal feed-forward network with input pass-through

multitask : bool

Share hidden layer across all reduced tasks

dropout : bool

Train or test sigmoidal feed-forward network using dropout

meanfield : bool

Train or test sigmoidal feed-forward network using mean field

LBFGS and Conjugate Gradient options

conjugate_gradient : bool

use conjugate gradient based optimization

bfgs : bool

use bfgs updates

hessian_on : bool

use second derivative in line search

mem : int

memory in bfgs

termination : float

termination threshold

Latent Dirichlet Allocation options

lda : int

Run lda with <int> topics

lda_alpha : float

Prior on sparsity of per-document topic weights

lda_rho : float

Prior on sparsity of topic distributions

lda_D : int

Number of documents

lda_epsilon : float

Loop convergence threshold

minibatch : int

Minibatch size for LDA

Stochastic Variance Reduced Gradient options

svrg : bool

Streaming Stochastic Variance Reduced Gradient

stage_size : int

Number of passes per SVRG stage

Follow the Regularized Leader options

ftrl : bool

Run Follow the Proximal Regularized Leader

coin : bool

Coin betting optimizer

pistol : bool

PiSTOL: Parameter free STOchastic Learning

ftrl_alpha : float

Alpha parameter for FTRL optimization

ftrl_beta : float

Beta parameter for FTRL optimization

Kernel SVM options

ksvm : bool

kernel svm

kernel : str

type of kernel (rbf or linear (default))

bandwidth : int

bandwidth of rbf kernel

degree : int

degree of poly kernel

Gradient Descent options

sgd : bool

use regular stochastic gradient descent update

adaptive : bool

use adaptive, individual learning rates

adax : bool

use adaptive learning rates with x^2 instead of g^2x^2

invariant : bool

use safe/importance aware updates

normalized : bool

use per feature normalized updates

Scorer options

link : str

Specify the link function: identity, logistic, glf1 or poisson

Stagewise polynomial options:

stage_poly : bool

use stagewise polynomial feature learning

sched_exponent : int

exponent controlling quantity of included features

batch_sz : int

multiplier on batch size before including more features

batch_sz_no_doubling : bool

batch_sz does not double

Low Rank Quadratics options:

lrq : bool

use low rank quadratic features

lrqdropout : bool

use dropout training for low rank quadratic features

lrqfa : bool

use low rank quadratic features with field aware weights

Input options

data,d : str

path to data file for fitting external to sklearn

cache,c : str

use a cache. default is <data>.cache

cache_file : str

path to cache file to use

json : bool

enable JSON parsing

kill_cache, k : bool

do not reuse existing cache file, create a new one always

Returns:
self : BaseEstimator
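
For illustration, a sketch combining a few of these keyword arguments (the values are arbitrary, not recommendations):

>>> from vowpalwabbit.sklearn_vw import VW
>>> # logistic loss with mild L2 regularization, trained for 5 passes
>>> model = VW(loss_function='logistic', l2=1e-6, learning_rate=0.5, passes=5)
>>> params = model.get_params()  # full set of vw and estimator parameters
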
convert_labels = True
convert_to_vw = True
fit(self, X=None, y=None, sample_weight=None)

Fit the model according to the given training data

TODO: for first pass create and store example objects.
for N-1 passes use example objects directly (simulate cache file…but in memory for faster processing)
Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features), or (n_samples, 1) if not convert_to_vw

Training vector, where n_samples is the number of samples and n_features is the number of features. If not using convert_to_vw, X is expected to be a list of vw formatted feature vector strings with labels

y : array-like, shape (n_samples,), optional if not convert_to_vw

Target vector relative to X.

sample_weight : array-like, shape (n_samples,)

sample weight vector relative to X.

Returns:
self : BaseEstimator

So pipeline can call transform() after fit
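
As noted above, when convert_to_vw=False, X should already be a list of VW-format strings with labels embedded; a minimal sketch (the feature names are made up):

>>> from vowpalwabbit.sklearn_vw import VW
>>> data = ['1 | price:0.23 sqft:0.25', '-1 | price:0.18 sqft:0.15']
>>> model = VW(convert_to_vw=False, loss_function='logistic')
>>> model.fit(data)  # labels are embedded in the strings, so y is omitted
>>> model.predict(['| price:0.46 sqft:0.4'])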

get_coefs(self)

Returns coefficient weights as ordered sparse matrix

Returns:
sparse matrix : coefficient weights for model
get_intercept(self)

Returns intercept weight for model

Returns:
intercept value : integer, 0 if no constant
get_params(self, deep=True)

This returns the full set of vw and estimator parameters currently in use

get_vw(self)

Get the vw instance

Returns:
vw : pyvw.vw instance
load(self, filename)

Load model from file

predict(self, X)

Predict with Vowpal Wabbit model

Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features or 1)

Sample vector, where n_samples is the number of samples and n_features is the number of features. If not using convert_to_vw, X is expected to be a list of vw formatted feature vector strings with labels

Returns:
y : array-like, shape (n_samples, 1 or n_classes)

Output vector relative to X.

save(self, filename)

Save model to file

set_coefs(self, coefs)

Sets coefficients weights from ordered sparse matrix

Parameters:
coefs : sparse matrix

coefficient weights for model

set_params(self, **kwargs)

This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided will remain as they are currently
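
A sketch of the persistence and re-parameterization workflow (the filename is arbitrary):

>>> model.save('vw.model')     # persist current weights to disk
>>> fresh = VW()
>>> fresh.load('vw.model')     # restore the model from the saved file
>>> fresh.set_params(l2=1e-6)  # recreates the underlying vw instance with the new setting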

vw_ = None
class vowpalwabbit.sklearn_vw.VWClassifier(loss_function='logistic', **kwargs)

Bases: vowpalwabbit.sklearn_vw.VW, vowpalwabbit.sklearn_vw.LinearClassifierMixin

Vowpal Wabbit Classifier model for binary classification. Use VWMultiClassifier for multiclass classification. Note: we assume VW.predict returns logits; applying link='logistic' will break this assumption

Attributes:
coef_ : scipy.sparse_matrix

Empty sparse matrix used to check whether the model has been fit

classes_ : np.array

Binary class labels

Methods

decision_function(self, X) Predict confidence scores for samples.
densify(self) Convert coefficient matrix to dense array format.
fit(self[, X, y, sample_weight]) Fit the model according to the given training data.
get_coefs(self) Returns coefficient weights as ordered sparse matrix
get_intercept(self) Returns intercept weight for model
get_params(self[, deep]) This returns the full set of vw and estimator parameters currently in use
get_vw(self) Get the vw instance
load(self, filename) Load model from file
predict(self, X) Predict class labels for samples in X.
predict_log_proba(self, X) Log of probability estimates.
predict_proba(self, X) Predict probabilities for samples
save(self, filename) Save model to file
score(self, X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
set_coefs(self, coefs) Sets coefficients weights from ordered sparse matrix
set_params(self, **kwargs) This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided will remain as they are currently
sparsify(self) Convert coefficient matrix to sparse format.
__init__(self, loss_function='logistic', **kwargs)

VW model constructor, exposing all supported parameters to keep sklearn happy

Parameters:
See VW.__init__ above; the estimator and Vowpal Wabbit parameters are identical.

Returns:
self : BaseEstimator
classes_ = array([-1., 1.])
coef_ = None
decision_function(self, X)

Predict confidence scores for samples. The confidence score for a sample is the signed distance of that sample to the hyperplane.

Parameters:
X : array_like or sparse matrix, shape (n_samples, n_features)

Samples.

Returns:
array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes)

Confidence scores per (sample, class) combination. In the binary case, confidence score for self.classes_[1] where >0 means this class would be predicted.

fit(self, X=None, y=None, sample_weight=None)

Fit the model according to the given training data.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Target vector relative to X.

sample_weight : array-like of shape (n_samples,) default=None

Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

Returns:
self

Fitted estimator.

predict(self, X)

Predict class labels for samples in X.

Parameters:
X : array_like or sparse matrix, shape (n_samples, n_features)

Samples.

Returns:
C : array, shape [n_samples]

Predicted class label per sample.

predict_proba(self, X)

Predict probabilities for samples

Parameters:
X : {array-like, sparse matrix}, shape = (n_samples, n_features)

Samples.

Returns:
T : array-like of shape (n_samples, n_classes)

Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
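
A sketch of probability prediction in the binary case:

>>> import numpy as np
>>> from vowpalwabbit.sklearn_vw import VWClassifier
>>> X = np.array([[10., 10.], [8., 10.], [-5., 5.5], [-5.4, 5.5]])
>>> y = np.array([1, 1, -1, -1])
>>> model = VWClassifier()
>>> model.fit(X, y)
>>> model.predict_proba(X)  # columns ordered as in model.classes_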

class vowpalwabbit.sklearn_vw.VWMultiClassifier(probabilities=True, **kwargs)

Bases: vowpalwabbit.sklearn_vw.VWClassifier

Vowpal Wabbit MultiClassifier model. Note: we assume VW.predict returns probabilities; setting probabilities=False will break this assumption

Attributes:
classes_ : np.array

class labels

estimator_ : dict

type of estimator to use [csoaa, ect, oaa, wap] and number of classes

Methods

decision_function(self, X) Predict confidence scores for samples.
densify(self) Convert coefficient matrix to dense array format.
fit(self[, X, y, sample_weight]) Fit the model according to the given training data.
get_coefs(self) Returns coefficient weights as ordered sparse matrix
get_intercept(self) Returns intercept weight for model
get_params(self[, deep]) This returns the full set of vw and estimator parameters currently in use
get_vw(self) Get the vw instance
load(self, filename) Load model from file
predict(self, X) Predict class labels for samples in X.
predict_log_proba(self, X) Log of probability estimates.
predict_proba(self, X) Predict probabilities for each class.
save(self, filename) Save model to file
score(self, X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
set_coefs(self, coefs) Sets coefficients weights from ordered sparse matrix
set_params(self, **kwargs) This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided will remain as they are currently
sparsify(self) Convert coefficient matrix to sparse format.
__init__(self, probabilities=True, **kwargs)

VW model constructor, exposing all supported parameters to keep sklearn happy

Parameters:
See VW.__init__ above; the estimator and Vowpal Wabbit parameters are identical.

Returns:
self : BaseEstimator
classes_ = None
decision_function(self, X)

Predict confidence scores for samples. The confidence score for a sample is the signed distance of that sample to the hyperplane.

Parameters:
X : array_like or sparse matrix, shape (n_samples, n_features)

Samples.

Returns:
array, shape=(n_samples, n_classes)

Confidence scores per (sample, class) combination.

estimator_ = None
fit(self, X=None, y=None, sample_weight=None)

Fit the model according to the given training data.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Target vector relative to X.

sample_weight : array-like of shape (n_samples,) default=None

Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

Returns:
self

Fitted estimator.

predict_proba(self, X)

Predict probabilities for each class.

Parameters:
X : {array-like, sparse matrix}, shape = (n_samples, n_features)

Samples.

Returns:
T : array-like of shape (n_samples, n_classes)

Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.

Examples

>>> import numpy as np
>>> X = np.array([ [10, 10], [8, 10], [-5, 5.5], [-5.4, 5.5], [-20, -20],  [-15, -20] ])
>>> y = np.array([1, 1, 2, 2, 3, 3])
>>> from vowpalwabbit.sklearn_vw import VWMultiClassifier
>>> model = VWMultiClassifier(oaa=3, loss_function='logistic')
>>> model.fit(X, y)
>>> model.predict_proba(X)
class vowpalwabbit.sklearn_vw.VWRegressor(convert_to_vw=True, convert_labels=True, ring_size=None, strict_parse=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, normal_weights=None, truncated_normal_weights=None, sparse_weights=None, input_feature_regularizer=None, quiet=True, random_seed=None, hash=None, hash_seed=None, ignore=None, ignore_linear=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, holdout_off=None, holdout_period=None, holdout_after=None, early_terminate=None, passes=1, initial_pass_length=None, examples=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, quantile_tau=None, l1=None, l2=None, no_bias_regularization=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, preserve_performance_counters=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, probabilities=None, nn=None, inpass=None, multitask=None, dropout=None, meanfield=None, conjugate_gradient=None, bfgs=None, hessian_on=None, mem=None, termination=None, lda=None, lda_alpha=None, lda_rho=None, lda_D=None, lda_epsilon=None, minibatch=None, svrg=None, stage_size=None, ftrl=None, coin=None, pistol=None, ftrl_alpha=None, ftrl_beta=None, ksvm=None, kernel=None, bandwidth=None, degree=None, sgd=None, adaptive=None, invariant=None, normalized=None, link=None, stage_poly=None, sched_exponent=None, batch_sz=None, batch_sz_no_doubling=None, lrq=None, lrqdropout=None, lrqfa=None, data=None, d=None, cache=None, c=None, cache_file=None, json=None, kill_cache=None, k=None)

Bases: vowpalwabbit.sklearn_vw.VW, sklearn.base.RegressorMixin

Vowpal Wabbit Regressor model

Attributes:
vw_ : pyvw.vw

vw instance

Methods

fit(self[, X, y, sample_weight]) Fit the model according to the given training data
get_coefs(self) Returns coefficient weights as ordered sparse matrix
get_intercept(self) Returns intercept weight for model
get_params(self[, deep]) This returns the full set of vw and estimator parameters currently in use
get_vw(self) Get the vw instance
load(self, filename) Load model from file
predict(self, X) Predict with Vowpal Wabbit model
save(self, filename) Save model to file
score(self, X, y[, sample_weight]) Returns the coefficient of determination R^2 of the prediction.
set_coefs(self, coefs) Sets coefficients weights from ordered sparse matrix
set_params(self, **kwargs) This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided will remain as they are currently
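
Examples

A minimal regression sketch (data values are arbitrary):

>>> import numpy as np
>>> from vowpalwabbit.sklearn_vw import VWRegressor
>>> X = np.array([[1.], [2.], [3.], [4.]])
>>> y = np.array([2.1, 4.2, 6.1, 8.3])
>>> model = VWRegressor(learning_rate=0.5, passes=10)
>>> model.fit(X, y)
>>> model.predict(X)
>>> model.score(X, y)  # coefficient of determination R^2
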
vowpalwabbit.sklearn_vw.tovw(x, y=None, sample_weight=None, convert_labels=False)

Convert array or sparse matrix to Vowpal Wabbit format

Parameters:
x : {array-like, sparse matrix}, shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : {array-like}, shape (n_samples,), optional

Target vector relative to X.

sample_weight : {array-like}, shape (n_samples,), optional

sample weight vector relative to X.

convert_labels : bool

convert labels of the form [0,1] to [-1,1]

Returns:
out : {array-like}, shape (n_samples, 1)

Training vectors in VW string format

Examples

>>> import pandas as pd
>>> from sklearn.feature_extraction.text import HashingVectorizer
>>> from vowpalwabbit.sklearn_vw import tovw
>>> X = pd.Series(['cat', 'dog', 'cat', 'cat'], name='catdog')
>>> y = pd.Series([-1, 1, -1, -1], name='label')
>>> hv = HashingVectorizer()
>>> hashed = hv.fit_transform(X)
>>> tovw(x=hashed, y=y)