vowpalwabbit.sklearn_vw

Utilities to support integration of Vowpal Wabbit and scikit-learn
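
A minimal quickstart sketch; the estimators follow the standard scikit-learn fit/predict/score protocol (data values here are arbitrary):

>>> import numpy as np
>>> from vowpalwabbit.sklearn_vw import VWClassifier
>>> X = np.array([[10., 10.], [8., 10.], [-5., 5.5], [-5.4, 5.5]])
>>> y = np.array([1, 1, -1, -1])
>>> model = VWClassifier()
>>> model.fit(X, y)
>>> model.predict(X)   # class labels in {-1, 1}
>>> model.score(X, y)  # mean accuracy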

class vowpalwabbit.sklearn_vw.LinearClassifierMixin

Bases: sklearn.linear_model.logistic.LogisticRegression

Methods

decision_function(self, X) Predict confidence scores for samples.
densify(self) Convert coefficient matrix to dense array format.
fit(self, X, y[, sample_weight]) Fit the model according to the given training data.
get_params(self[, deep]) Get parameters for this estimator.
predict(self, X) Predict class labels for samples in X.
predict_log_proba(self, X) Log of probability estimates.
predict_proba(self, X) Probability estimates.
score(self, X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
set_params(self, **params) Set the parameters of this estimator.
sparsify(self) Convert coefficient matrix to sparse format.
__init__(self)

x.__init__(…) initializes x; see help(type(x)) for signature

class vowpalwabbit.sklearn_vw.VW(convert_to_vw=True, convert_labels=True, ring_size=None, strict_parse=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, normal_weights=None, truncated_normal_weights=None, sparse_weights=None, input_feature_regularizer=None, quiet=True, random_seed=None, hash=None, hash_seed=None, ignore=None, ignore_linear=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, holdout_off=None, holdout_period=None, holdout_after=None, early_terminate=None, passes=1, initial_pass_length=None, examples=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, quantile_tau=None, l1=None, l2=None, no_bias_regularization=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, preserve_performance_counters=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, probabilities=None, nn=None, inpass=None, multitask=None, dropout=None, meanfield=None, conjugate_gradient=None, bfgs=None, hessian_on=None, mem=None, termination=None, lda=None, lda_alpha=None, lda_rho=None, lda_D=None, lda_epsilon=None, minibatch=None, svrg=None, stage_size=None, ftrl=None, coin=None, pistol=None, ftrl_alpha=None, ftrl_beta=None, ksvm=None, kernel=None, bandwidth=None, degree=None, sgd=None, adaptive=None, invariant=None, normalized=None, link=None, stage_poly=None, sched_exponent=None, batch_sz=None, batch_sz_no_doubling=None, lrq=None, lrqdropout=None, lrqfa=None, data=None, d=None, cache=None, c=None, cache_file=None, json=None, kill_cache=None, k=None)

Bases: sklearn.base.BaseEstimator

Vowpal Wabbit Scikit-learn Base Estimator wrapper

Attributes:
convert_to_vw : bool

flag to convert X input to vw format

convert_labels : bool

Convert labels of the form [0,1] to [-1,1]

vw_ : pyvw.vw

vw instance

Methods

fit(self[, X, y, sample_weight]) Fit the model according to the given training data
get_coefs(self) Returns coefficient weights as ordered sparse matrix
get_intercept(self) Returns intercept weight for model
get_params(self[, deep]) This returns the full set of vw and estimator parameters currently in use
get_vw(self) Get the vw instance
load(self, filename) Load model from file
predict(self, X) Predict with Vowpal Wabbit model
save(self, filename) Save model to file
set_coefs(self, coefs) Sets coefficients weights from ordered sparse matrix
set_params(self, **kwargs) This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided will remain as they are currently
__init__(self, convert_to_vw=True, convert_labels=True, ring_size=None, strict_parse=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, normal_weights=None, truncated_normal_weights=None, sparse_weights=None, input_feature_regularizer=None, quiet=True, random_seed=None, hash=None, hash_seed=None, ignore=None, ignore_linear=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, holdout_off=None, holdout_period=None, holdout_after=None, early_terminate=None, passes=1, initial_pass_length=None, examples=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, quantile_tau=None, l1=None, l2=None, no_bias_regularization=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, preserve_performance_counters=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, probabilities=None, nn=None, inpass=None, multitask=None, dropout=None, meanfield=None, conjugate_gradient=None, bfgs=None, hessian_on=None, mem=None, termination=None, lda=None, lda_alpha=None, lda_rho=None, lda_D=None, lda_epsilon=None, minibatch=None, svrg=None, stage_size=None, ftrl=None, coin=None, pistol=None, ftrl_alpha=None, ftrl_beta=None, ksvm=None, kernel=None, bandwidth=None, degree=None, sgd=None, adaptive=None, invariant=None, normalized=None, link=None, stage_poly=None, sched_exponent=None, batch_sz=None, batch_sz_no_doubling=None, lrq=None, lrqdropout=None, lrqfa=None, data=None, d=None, cache=None, c=None, cache_file=None, json=None, kill_cache=None, k=None)

VW model constructor, exposing all supported parameters to keep sklearn happy

Parameters:
Estimator options
convert_to_vw : bool

flag to convert X input to vw format

convert_labels : bool

Convert labels of the form [0,1] to [-1,1]

VW options

ring_size : int

size of example ring

strict_parse : bool

throw on malformed examples

Update options

learning_rate,l : float

Set learning rate

power_t : float

t power value

decay_learning_rate : float

Set decay factor for learning_rate between passes

initial_t : float

initial t value

feature_mask : str

Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.

Weight options

initial_regressor,i : str

Initial regressor(s)

initial_weight : float

Set all weights to an initial value of arg.

random_weights : bool

make initial weights random

normal_weights : bool

make initial weights normal

truncated_normal_weights : bool

make initial weights truncated normal

sparse_weights : bool

Use a sparse data structure for weights

input_feature_regularizer : str

Per feature regularization input file

Diagnostic options

quiet : bool

Don’t output diagnostics and progress updates

Randomization options

random_seed : integer

seed random number generator

Feature options

hash : str

how to hash the features. Available options: strings, all

hash_seed : int

seed for hash function

ignore : str

ignore namespaces beginning with character <arg>

ignore_linear : str

ignore namespaces beginning with character <arg> for linear terms only

keep : str

keep namespaces beginning with character <arg>

redefine : str

Redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in form ‘N:=S’ where := is operator. Empty N or S are treated as default namespace. Use ‘:’ as a wildcard in S.

bit_precision,b : integer

number of bits in the feature table

noconstant : bool

Don’t add a constant feature

constant,C : float

Set initial value of constant

ngram : str

Generate N grams. To generate N grams for a single namespace ‘foo’, arg should be fN.

skips : str

Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-gram. To generate n-skips for a single namespace ‘foo’, arg should be fN.

feature_limit : str

limit to N features. To apply to a single namespace ‘foo’, arg should be fN

affix : str

generate prefixes/suffixes of features; argument ‘+2a,-3b,+1’ means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1 char prefixes for default namespace

spelling : str

compute spelling features for a given namespace (use ‘_’ for default namespace)

dictionary : str

read a dictionary for additional features (arg either ‘x:file’ or just ‘file’)

dictionary_path : str

look in this directory for dictionaries; defaults to current directory or env{PATH}

interactions : str

Create feature interactions of any level between namespaces.

permutations : bool

Use permutations instead of combinations for feature interactions of same namespace.

leave_duplicate_interactions : bool

Don’t remove interactions with duplicate combinations of namespaces. For ex. this is a duplicate: ‘-q ab -q ba’ and a lot more in ‘-q ::’.

quadratic,q : str

Create and use quadratic features, q:: corresponds to a wildcard for all printable characters

cubic : str

Create and use cubic features

Example options

testonly,t : bool

Ignore label information and just test

holdout_off : bool

no holdout data in multiple passes

holdout_period : int

holdout period for test only

holdout_after : int

holdout after n training examples

early_terminate : int

Specify the number of passes tolerated when holdout loss doesn’t decrease before early termination

passes : int

Number of Training Passes

initial_pass_length : int

initial number of examples per pass

examples : int

number of examples to parse

min_prediction : float

Smallest prediction to output

max_prediction : float

Largest prediction to output

sort_features : bool

turn this on to disregard order in which features have been defined. This will lead to smaller cache sizes

loss_function : str

Specify the loss function to be used; squared is used by default. Currently available ones are squared, classic, hinge, logistic and quantile.

quantile_tau : float

Parameter tau associated with Quantile loss. Defaults to 0.5

l1 : float

l_1 lambda (L1 regularization)

l2 : float

l_2 lambda (L2 regularization)

no_bias_regularization : bool

no bias in regularization

named_labels : str

use names for labels (multiclass, etc.) rather than integers; argument specifies all possible labels, comma-separated, e.g. “--named_labels Noun,Verb,Adj,Punc”

Output model

final_regressor,f : str

Final regressor

readable_model : str

Output human-readable final regressor with numeric features

invert_hash : str

Output human-readable final regressor with feature names. Computationally expensive.

save_resume : bool

save extra state so learning can be resumed later with new data

preserve_performance_counters : bool

preserve (do not reset) performance counters when warmstarting

output_feature_regularizer_binary : str

Per feature regularization output file

output_feature_regularizer_text : str

Per feature regularization output file, in text

Multiclass options
oaa : integer

Use one-against-all multiclass learning with <k> labels

oaa_subsample : int

subsample this number of negative examples when learning

ect : integer

Use error correcting tournament multiclass learning

csoaa : integer

Use cost sensitive one-against-all multiclass learning

wap : integer

Use weighted all pairs multiclass learning

probabilities : bool

predict probabilities of all classes

Neural Network options

nn : integer

Use a sigmoidal feed-forward neural network with N hidden units

inpass : bool

Train or test sigmoidal feed-forward network with input pass-through

multitask : bool

Share hidden layer across all reduced tasks

dropout : bool

Train or test sigmoidal feed-forward network using dropout

meanfield : bool

Train or test sigmoidal feed-forward network using mean field

LBFGS and Conjugate Gradient options

conjugate_gradient : bool

use conjugate gradient based optimization

bfgs : bool

use bfgs updates

hessian_on : bool

use second derivative in line search

mem : int

memory in bfgs

termination : float

termination threshold

Latent Dirichlet Allocation options

lda : int

Run lda with <int> topics

lda_alpha : float

Prior on sparsity of per-document topic weights

lda_rho : float

Prior on sparsity of topic distributions

lda_D : int

Number of documents

lda_epsilon : float

Loop convergence threshold

minibatch : int

Minibatch size for LDA

Stochastic Variance Reduced Gradient options

svrg : bool

Streaming Stochastic Variance Reduced Gradient

stage_size : int

Number of passes per SVRG stage

Follow the Regularized Leader options

ftrl : bool

Run Follow the Proximal Regularized Leader

coin : bool

Coin betting optimizer

pistol : bool

PiSTOL: Parameter free STOchastic Learning

ftrl_alpha : float

Alpha parameter for FTRL optimization

ftrl_beta : float

Beta parameter for FTRL optimization

Kernel SVM options

ksvm : bool

kernel svm

kernel : str

type of kernel (rbf or linear (default))

bandwidth : int

bandwidth of rbf kernel

degree : int

degree of poly kernel

Gradient Descent options

sgd : bool

use regular stochastic gradient descent update

adaptive : bool

use adaptive, individual learning rates

adax : bool

use adaptive learning rates with x^2 instead of g^2x^2

invariant : bool

use safe/importance aware updates

normalized : bool

use per feature normalized updates

Scorer options

link : str

Specify the link function: identity, logistic, glf1 or poisson

Stagewise polynomial options:

stage_poly : bool

use stagewise polynomial feature learning

sched_exponent : int

exponent controlling quantity of included features

batch_sz : int

multiplier on batch size before including more features

batch_sz_no_doubling : bool

batch_sz does not double

Low Rank Quadratics options:

lrq : bool

use low rank quadratic features

lrqdropout : bool

use dropout training for low rank quadratic features

lrqfa : bool

use low rank quadratic features with field aware weights

Input options

data,d : str

path to data file for fitting external to sklearn

cache,c : str

use a cache. default is <data>.cache

cache_file : str

path to cache file to use

json : bool

enable JSON parsing

kill_cache, k : bool

do not reuse existing cache file, create a new one always

Returns:
self : BaseEstimator
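
For illustration, a sketch combining a few of these keyword arguments (the values are arbitrary, not recommendations):

>>> from vowpalwabbit.sklearn_vw import VW
>>> # logistic loss with mild L2 regularization, trained for 5 passes
>>> model = VW(loss_function='logistic', l2=1e-6, learning_rate=0.5, passes=5)
>>> params = model.get_params()  # full set of vw and estimator parameters
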
convert_labels = True
convert_to_vw = True
fit(self, X=None, y=None, sample_weight=None)

Fit the model according to the given training data

TODO: for first pass create and store example objects.
for N-1 passes use example objects directly (simulate cache file…but in memory for faster processing)
Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features), or (n_samples, 1) if not convert_to_vw

Training vector, where n_samples is the number of samples and n_features is the number of features. If not using convert_to_vw, X is expected to be a list of vw formatted feature vector strings with labels

y : array-like, shape (n_samples,), optional if not convert_to_vw

Target vector relative to X.

sample_weight : array-like, shape (n_samples,)

sample weight vector relative to X.

Returns:
self : BaseEstimator

So pipeline can call transform() after fit
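
As noted above, when convert_to_vw=False, X should already be a list of VW-format strings with labels embedded; a minimal sketch (the feature names are made up):

>>> from vowpalwabbit.sklearn_vw import VW
>>> data = ['1 | price:0.23 sqft:0.25', '-1 | price:0.18 sqft:0.15']
>>> model = VW(convert_to_vw=False, loss_function='logistic')
>>> model.fit(data)  # labels are embedded in the strings, so y is omitted
>>> model.predict(['| price:0.46 sqft:0.4'])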

get_coefs(self)

Returns coefficient weights as ordered sparse matrix

Returns:
sparse matrix : coefficient weights for model
get_intercept(self)

Returns intercept weight for model

Returns:
intercept value : integer, 0 if no constant
get_params(self, deep=True)

This returns the full set of vw and estimator parameters currently in use

get_vw(self)

Get the vw instance

Returns:
vw : pyvw.vw instance
load(self, filename)

Load model from file

predict(self, X)

Predict with Vowpal Wabbit model

Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features or 1)

Sample vector, where n_samples is the number of samples and n_features is the number of features. If not using convert_to_vw, X is expected to be a list of vw formatted feature vector strings with labels

Returns:
y : array-like, shape (n_samples, 1 or n_classes)

Output vector relative to X.

save(self, filename)

Save model to file

set_coefs(self, coefs)

Sets coefficients weights from ordered sparse matrix

Parameters:
coefs : sparse matrix

coefficient weights for model

set_params(self, **kwargs)

This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided will remain as they are currently
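
A sketch of the persistence and re-parameterization workflow (the filename is arbitrary):

>>> model.save('vw.model')     # persist current weights to disk
>>> fresh = VW()
>>> fresh.load('vw.model')     # restore the model from the saved file
>>> fresh.set_params(l2=1e-6)  # recreates the underlying vw instance with the new setting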

vw_ = None
class vowpalwabbit.sklearn_vw.VWClassifier(loss_function='logistic', **kwargs)

Bases: vowpalwabbit.sklearn_vw.VW, vowpalwabbit.sklearn_vw.LinearClassifierMixin

Vowpal Wabbit Classifier model for binary classification. Use VWMultiClassifier for multiclass classification. Note: we assume VW.predict returns logits; applying link='logistic' will break this assumption

Attributes:
coef_ : scipy.sparse_matrix

Empty sparse matrix used to check whether the model has been fit

classes_ : np.array

Binary class labels

Methods

decision_function(self, X) Predict confidence scores for samples.
densify(self) Convert coefficient matrix to dense array format.
fit(self[, X, y, sample_weight]) Fit the model according to the given training data.
get_coefs(self) Returns coefficient weights as ordered sparse matrix
get_intercept(self) Returns intercept weight for model
get_params(self[, deep]) This returns the full set of vw and estimator parameters currently in use
get_vw(self) Get the vw instance
load(self, filename) Load model from file
predict(self, X) Predict class labels for samples in X.
predict_log_proba(self, X) Log of probability estimates.
predict_proba(self, X) Predict probabilities for samples
save(self, filename) Save model to file
score(self, X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
set_coefs(self, coefs) Sets coefficients weights from ordered sparse matrix
set_params(self, **kwargs) This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided will remain as they are currently
sparsify(self) Convert coefficient matrix to sparse format.
__init__(self, loss_function='logistic', **kwargs)

VW model constructor, exposing all supported parameters to keep sklearn happy

Parameters:
See VW.__init__ above; the estimator and Vowpal Wabbit parameters are identical.

Returns:
self : BaseEstimator
classes_ = array([-1., 1.])
coef_ = None
decision_function(self, X)

Predict confidence scores for samples. The confidence score for a sample is the signed distance of that sample to the hyperplane.

Parameters:
X : array_like or sparse matrix, shape (n_samples, n_features)

Samples.

Returns:
array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes)

Confidence scores per (sample, class) combination. In the binary case, confidence score for self.classes_[1] where >0 means this class would be predicted.

fit(self, X=None, y=None, sample_weight=None)

Fit the model according to the given training data.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Target vector relative to X.

sample_weight : array-like of shape (n_samples,) default=None

Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

Returns:
self

Fitted estimator.

predict(self, X)

Predict class labels for samples in X.

Parameters:
X : array_like or sparse matrix, shape (n_samples, n_features)

Samples.

Returns:
C : array, shape [n_samples]

Predicted class label per sample.

predict_proba(self, X)

Predict probabilities for samples

Parameters:
X : {array-like, sparse matrix}, shape = (n_samples, n_features)

Samples.

Returns:
T : array-like of shape (n_samples, n_classes)

Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
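
A sketch of probability prediction in the binary case:

>>> import numpy as np
>>> from vowpalwabbit.sklearn_vw import VWClassifier
>>> X = np.array([[10., 10.], [8., 10.], [-5., 5.5], [-5.4, 5.5]])
>>> y = np.array([1, 1, -1, -1])
>>> model = VWClassifier()
>>> model.fit(X, y)
>>> model.predict_proba(X)  # columns ordered as in model.classes_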

class vowpalwabbit.sklearn_vw.VWMultiClassifier(probabilities=True, **kwargs)

Bases: vowpalwabbit.sklearn_vw.VWClassifier

Vowpal Wabbit MultiClassifier model. Note: we assume VW.predict returns probabilities; setting probabilities=False will break this assumption

Attributes:
classes_ : np.array

class labels

estimator_ : dict

type of estimator to use [csoaa, ect, oaa, wap] and number of classes

Methods

decision_function(self, X) Predict confidence scores for samples.
densify(self) Convert coefficient matrix to dense array format.
fit(self[, X, y, sample_weight]) Fit the model according to the given training data.
get_coefs(self) Returns coefficient weights as ordered sparse matrix
get_intercept(self) Returns intercept weight for model
get_params(self[, deep]) This returns the full set of vw and estimator parameters currently in use
get_vw(self) Get the vw instance
load(self, filename) Load model from file
predict(self, X) Predict class labels for samples in X.
predict_log_proba(self, X) Log of probability estimates.
predict_proba(self, X) Predict probabilities for each class.
save(self, filename) Save model to file
score(self, X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
set_coefs(self, coefs) Sets coefficients weights from ordered sparse matrix
set_params(self, **kwargs) This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided will remain as they are currently
sparsify(self) Convert coefficient matrix to sparse format.
__init__(self, probabilities=True, **kwargs)

VW model constructor, exposing all supported parameters to keep sklearn happy

Parameters:
See VW.__init__ above; the estimator and Vowpal Wabbit parameters are identical.

Returns:
self : BaseEstimator
classes_ = None
decision_function(self, X)

Predict confidence scores for samples. The confidence score for a sample is the signed distance of that sample to the hyperplane.

Parameters:
X : array_like or sparse matrix, shape (n_samples, n_features)

Samples.

Returns:
array, shape=(n_samples, n_classes)

Confidence scores per (sample, class) combination.

estimator_ = None
fit(self, X=None, y=None, sample_weight=None)

Fit the model according to the given training data.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Target vector relative to X.

sample_weight : array-like of shape (n_samples,) default=None

Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

Returns:
self

Fitted estimator.

predict_proba(self, X)

Predict probabilities for each class.

Parameters:
X : {array-like, sparse matrix}, shape = (n_samples, n_features)

Samples.

Returns:
T : array-like of shape (n_samples, n_classes)

Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.

Examples

>>> import numpy as np
>>> X = np.array([ [10, 10], [8, 10], [-5, 5.5], [-5.4, 5.5], [-20, -20],  [-15, -20] ])
>>> y = np.array([1, 1, 2, 2, 3, 3])
>>> from vowpalwabbit.sklearn_vw import VWMultiClassifier
>>> model = VWMultiClassifier(oaa=3, loss_function='logistic')
>>> model.fit(X, y)
>>> model.predict_proba(X)
class vowpalwabbit.sklearn_vw.VWRegressor(convert_to_vw=True, convert_labels=True, ring_size=None, strict_parse=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, normal_weights=None, truncated_normal_weights=None, sparse_weights=None, input_feature_regularizer=None, quiet=True, random_seed=None, hash=None, hash_seed=None, ignore=None, ignore_linear=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, holdout_off=None, holdout_period=None, holdout_after=None, early_terminate=None, passes=1, initial_pass_length=None, examples=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, quantile_tau=None, l1=None, l2=None, no_bias_regularization=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, preserve_performance_counters=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, probabilities=None, nn=None, inpass=None, multitask=None, dropout=None, meanfield=None, conjugate_gradient=None, bfgs=None, hessian_on=None, mem=None, termination=None, lda=None, lda_alpha=None, lda_rho=None, lda_D=None, lda_epsilon=None, minibatch=None, svrg=None, stage_size=None, ftrl=None, coin=None, pistol=None, ftrl_alpha=None, ftrl_beta=None, ksvm=None, kernel=None, bandwidth=None, degree=None, sgd=None, adaptive=None, invariant=None, normalized=None, link=None, stage_poly=None, sched_exponent=None, batch_sz=None, batch_sz_no_doubling=None, lrq=None, lrqdropout=None, lrqfa=None, data=None, d=None, cache=None, c=None, cache_file=None, json=None, kill_cache=None, k=None)

Bases: vowpalwabbit.sklearn_vw.VW, sklearn.base.RegressorMixin

Vowpal Wabbit Regressor model

Attributes:
vw_ : pyvw.vw

vw instance

Methods

fit(self[, X, y, sample_weight]) Fit the model according to the given training data
get_coefs(self) Returns coefficient weights as ordered sparse matrix
get_intercept(self) Returns intercept weight for model
get_params(self[, deep]) This returns the full set of vw and estimator parameters currently in use
get_vw(self) Get the vw instance
load(self, filename) Load model from file
predict(self, X) Predict with Vowpal Wabbit model
save(self, filename) Save model to file
score(self, X, y[, sample_weight]) Returns the coefficient of determination R^2 of the prediction.
set_coefs(self, coefs) Sets coefficients weights from ordered sparse matrix
set_params(self, **kwargs) This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided will remain as they are currently
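
Examples

A minimal regression sketch (data values are arbitrary):

>>> import numpy as np
>>> from vowpalwabbit.sklearn_vw import VWRegressor
>>> X = np.array([[1.], [2.], [3.], [4.]])
>>> y = np.array([2.1, 4.2, 6.1, 8.3])
>>> model = VWRegressor(learning_rate=0.5, passes=10)
>>> model.fit(X, y)
>>> model.predict(X)
>>> model.score(X, y)  # coefficient of determination R^2
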
vowpalwabbit.sklearn_vw.tovw(x, y=None, sample_weight=None, convert_labels=False)

Convert array or sparse matrix to Vowpal Wabbit format

Parameters:
x : {array-like, sparse matrix}, shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : {array-like}, shape (n_samples,), optional

Target vector relative to X.

sample_weight : {array-like}, shape (n_samples,), optional

sample weight vector relative to X.

convert_labels : bool

convert labels of the form [0,1] to [-1,1]

Returns:
out : {array-like}, shape (n_samples, 1)

Training vectors in VW string format

Examples

>>> import pandas as pd
>>> from sklearn.feature_extraction.text import HashingVectorizer
>>> from vowpalwabbit.sklearn_vw import tovw
>>> X = pd.Series(['cat', 'dog', 'cat', 'cat'], name='catdog')
>>> y = pd.Series([-1, 1, -1, -1], name='label')
>>> hv = HashingVectorizer()
>>> hashed = hv.fit_transform(X)
>>> tovw(x=hashed, y=y)