vowpalwabbit.sklearn
Utilities to support integration of Vowpal Wabbit and scikit-learn
class vowpalwabbit.sklearn_vw.LinearClassifierMixin
Bases: sklearn.linear_model.logistic.LogisticRegression
Methods
- decision_function(self, X): Predict confidence scores for samples.
- densify(self): Convert coefficient matrix to dense array format.
- fit(self, X, y[, sample_weight]): Fit the model according to the given training data.
- get_params(self[, deep]): Get parameters for this estimator.
- predict(self, X): Predict class labels for samples in X.
- predict_log_proba(self, X): Log of probability estimates.
- predict_proba(self, X): Probability estimates.
- score(self, X, y[, sample_weight]): Returns the mean accuracy on the given test data and labels.
- set_params(self, **params): Set the parameters of this estimator.
- sparsify(self): Convert coefficient matrix to sparse format.
__init__(self)
x.__init__(…) initializes x; see help(type(x)) for signature
class vowpalwabbit.sklearn_vw.VW
(convert_to_vw=True, convert_labels=True, ring_size=None, strict_parse=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, normal_weights=None, truncated_normal_weights=None, sparse_weights=None, input_feature_regularizer=None, quiet=True, random_seed=None, hash=None, hash_seed=None, ignore=None, ignore_linear=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, holdout_off=None, holdout_period=None, holdout_after=None, early_terminate=None, passes=1, initial_pass_length=None, examples=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, quantile_tau=None, l1=None, l2=None, no_bias_regularization=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, preserve_performance_counters=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, probabilities=None, nn=None, inpass=None, multitask=None, dropout=None, meanfield=None, conjugate_gradient=None, bfgs=None, hessian_on=None, mem=None, termination=None, lda=None, lda_alpha=None, lda_rho=None, lda_D=None, lda_epsilon=None, minibatch=None, svrg=None, stage_size=None, ftrl=None, coin=None, pistol=None, ftrl_alpha=None, ftrl_beta=None, ksvm=None, kernel=None, bandwidth=None, degree=None, sgd=None, adaptive=None, invariant=None, normalized=None, link=None, stage_poly=None, sched_exponent=None, batch_sz=None, batch_sz_no_doubling=None, lrq=None, lrqdropout=None, lrqfa=None, data=None, d=None, cache=None, c=None, cache_file=None, json=None, 
kill_cache=None, k=None)
Bases: sklearn.base.BaseEstimator
Vowpal Wabbit scikit-learn base estimator wrapper
Attributes:
- convert_to_vw : bool
Flag to convert X input to VW format
- convert_labels : bool
Convert labels of the form [0,1] to [-1,1]
- vw_ : pyvw.vw
vw instance
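As a rough illustration of what convert_labels describes, the mapping from [0,1] labels to [-1,1] can be sketched in plain Python. The helper name to_vw_binary_labels is hypothetical and not part of the wrapper; the wrapper's own conversion may differ.

```python
def to_vw_binary_labels(labels):
    """Map binary labels given as 0/1 to -1/1 (hypothetical helper)."""
    converted = []
    for label in labels:
        if label not in (0, 1):
            raise ValueError("expected labels in {0, 1}, got %r" % (label,))
        # 0 -> -1 and 1 -> 1, which is the same as 2 * label - 1
        converted.append(2 * label - 1)
    return converted


print(to_vw_binary_labels([0, 1, 1, 0]))  # [-1, 1, 1, -1]
```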
Methods
- fit(self[, X, y, sample_weight]): Fit the model according to the given training data
- get_coefs(self): Returns coefficient weights as ordered sparse matrix
- get_intercept(self): Returns intercept weight for model
- get_params(self[, deep]): This returns the full set of vw and estimator parameters currently in use
- get_vw(self): Get the vw instance
- load(self, filename): Load model from file
- predict(self, X): Predict with Vowpal Wabbit model
- save(self, filename): Save model to file
- set_coefs(self, coefs): Sets coefficient weights from ordered sparse matrix
- set_params(self, **kwargs): This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided will remain as they are currently
__init__
(self, convert_to_vw=True, convert_labels=True, ring_size=None, strict_parse=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, normal_weights=None, truncated_normal_weights=None, sparse_weights=None, input_feature_regularizer=None, quiet=True, random_seed=None, hash=None, hash_seed=None, ignore=None, ignore_linear=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, holdout_off=None, holdout_period=None, holdout_after=None, early_terminate=None, passes=1, initial_pass_length=None, examples=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, quantile_tau=None, l1=None, l2=None, no_bias_regularization=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, preserve_performance_counters=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, probabilities=None, nn=None, inpass=None, multitask=None, dropout=None, meanfield=None, conjugate_gradient=None, bfgs=None, hessian_on=None, mem=None, termination=None, lda=None, lda_alpha=None, lda_rho=None, lda_D=None, lda_epsilon=None, minibatch=None, svrg=None, stage_size=None, ftrl=None, coin=None, pistol=None, ftrl_alpha=None, ftrl_beta=None, ksvm=None, kernel=None, bandwidth=None, degree=None, sgd=None, adaptive=None, invariant=None, normalized=None, link=None, stage_poly=None, sched_exponent=None, batch_sz=None, batch_sz_no_doubling=None, lrq=None, lrqdropout=None, lrqfa=None, data=None, d=None, cache=None, c=None, cache_file=None, json=None, 
kill_cache=None, k=None)
VW model constructor, exposing all supported parameters to keep sklearn happy
Parameters:
Estimator options
- convert_to_vw : bool
flag to convert X input to vw format
- convert_labels : bool
Convert labels of the form [0,1] to [-1,1]
VW options
- ring_size : int
size of example ring
- strict_parse : bool
throw on malformed examples
Update options
- learning_rate,l : float
Set learning rate
- power_t : float
t power value
- decay_learning_rate : float
Set Decay factor for learning_rate between passes
- initial_t : float
initial t value
- feature_mask : str
Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.
Weight options
- initial_regressor,i : str
Initial regressor(s)
- initial_weight : float
Set all weights to an initial value of arg.
- random_weights : bool
make initial weights random
- normal_weights : bool
make initial weights normal
- truncated_normal_weights : bool
make initial weights truncated normal
- sparse_weights : float
Use a sparse datastructure for weights
- input_feature_regularizer : str
Per feature regularization input file
Diagnostic options
- quiet : bool
Don’t output diagnostics and progress updates
Randomization options
- random_seed : integer
seed random number generator
Feature options
- hash : str
how to hash the features. Available options: strings, all
- hash_seed : int
seed for hash function
- ignore : str
ignore namespaces beginning with character <arg>
- ignore_linear : str
ignore namespaces beginning with character <arg> for linear terms only
- keep : str
keep namespaces beginning with character <arg>
- redefine : str
Redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in form ‘N:=S’ where := is operator. Empty N or S are treated as default namespace. Use ‘:’ as a wildcard in S.
- bit_precision,b : integer
number of bits in the feature table
- noconstant : bool
Don’t add a constant feature
- constant,C : float
Set initial value of constant
- ngram : str
Generate N grams. To generate N grams for a single namespace ‘foo’, arg should be fN.
- skips : str
Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-gram. To generate n-skips for a single namespace ‘foo’, arg should be fN.
- feature_limit : str
limit to N features. To apply to a single namespace ‘foo’, arg should be fN
- affix : str
generate prefixes/suffixes of features; argument ‘+2a,-3b,+1’ means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1 char prefixes for default namespace
- spelling : str
compute spelling features for a given namespace (use ‘_’ for default namespace)
- dictionary : str
read a dictionary for additional features (arg either ‘x:file’ or just ‘file’)
- dictionary_path : str
look in this directory for dictionaries; defaults to current directory or env{PATH}
- interactions : str
Create feature interactions of any level between namespaces.
- permutations : bool
Use permutations instead of combinations for feature interactions of same namespace.
- leave_duplicate_interactions : bool
Don’t remove interactions with duplicate combinations of namespaces. For ex. this is a duplicate: ‘-q ab -q ba’ and a lot more in ‘-q ::’.
- quadratic,q : str
Create and use quadratic features, q:: corresponds to a wildcard for all printable characters
- cubic : str
Create and use cubic features
Example options
- testonly,t : bool
Ignore label information and just test
- holdout_off : bool
no holdout data in multiple passes
- holdout_period : int
holdout period for test only
- holdout_after : int
holdout after n training examples
- early_terminate : int
Specify the number of passes tolerated when holdout loss doesn’t decrease before early termination
- passes : int
Number of Training Passes
- initial_pass_length : int
initial number of examples per pass
- examples : int
number of examples to parse
- min_prediction : float
Smallest prediction to output
- max_prediction : float
Largest prediction to output
- sort_features : bool
turn this on to disregard order in which features have been defined. This will lead to smaller cache sizes
- loss_function : str
Specify the loss function to be used; uses squared by default. Currently available ones are squared, classic, hinge, logistic and quantile.
- quantile_tau : float
Parameter tau associated with Quantile loss. Defaults to 0.5
- l1 : float
l_1 lambda (L1 regularization)
- l2 : float
l_2 lambda (L2 regularization)
- no_bias_regularization : bool
no bias in regularization
- named_labels : str
use names for labels (multiclass, etc.) rather than integers; the argument specifies all possible labels, comma-separated, e.g. “--named_labels Noun,Verb,Adj,Punc”
Output model
- final_regressor,f : str
Final regressor
- readable_model : str
Output human-readable final regressor with numeric features
- invert_hash : str
Output human-readable final regressor with feature names. Computationally expensive.
- save_resume : bool
save extra state so learning can be resumed later with new data
- preserve_performance_counters : bool
reset performance counters when warmstarting
- output_feature_regularizer_binary : str
Per feature regularization output file
- output_feature_regularizer_text : str
Per feature regularization output file, in text
Multiclass options
- oaa : integer
Use one-against-all multiclass learning with labels
- oaa_subsample : int
subsample this number of negative examples when learning
- ect : integer
Use error correcting tournament multiclass learning
- csoaa : integer
Use cost sensitive one-against-all multiclass learning
- wap : integer
Use weighted all pairs multiclass learning
- probabilities : float
predict probabilities of all classes
Neural Network options
- nn : integer
Use a sigmoidal feed-forward neural network with N hidden units
- inpass : bool
Train or test sigmoidal feed-forward network with input pass-through
- multitask : bool
Share hidden layer across all reduced tasks
- dropout : bool
Train or test sigmoidal feed-forward network using dropout
- meanfield : bool
Train or test sigmoidal feed-forward network using mean field
LBFGS and Conjugate Gradient options
- conjugate_gradient : bool
use conjugate gradient based optimization
- bfgs : bool
use bfgs updates
- hessian_on : bool
use second derivative in line search
- mem : int
memory in bfgs
- termination : float
termination threshold
Latent Dirichlet Allocation options
- lda : int
Run lda with <int> topics
- lda_alpha : float
Prior on sparsity of per-document topic weights
- lda_rho : float
Prior on sparsity of topic distributions
- lda_D : int
Number of documents
- lda_epsilon : float
Loop convergence threshold
- minibatch : int
Minibatch size for LDA
Stochastic Variance Reduced Gradient options
- svrg : bool
Streaming Stochastic Variance Reduced Gradient
- stage_size : int
Number of passes per SVRG stage
Follow the Regularized Leader options
- ftrl : bool
Run Follow the Proximal Regularized Leader
- coin : bool
Coin betting optimizer
- pistol : bool
PiSTOL: Parameter free STOchastic Learning
- ftrl_alpha : float
Alpha parameter for FTRL optimization
- ftrl_beta : float
Beta parameters for FTRL optimization
Kernel SVM options
- ksvm : bool
kernel svm
- kernel : str
type of kernel (rbf or linear (default))
- bandwidth : int
bandwidth of rbf kernel
- degree : int
degree of poly kernel
Gradient Descent options
- sgd : bool
use regular stochastic gradient descent update
- adaptive : bool
use adaptive, individual learning rates
- adax : bool
use adaptive learning rates with x^2 instead of g^2x^2
- invariant : bool
use safe/importance-aware updates
- normalized : bool
use per feature normalized updates
Scorer options
- link : str
Specify the link function: identity, logistic, glf1 or poisson
Stagewise polynomial options:
- stage_poly : bool
use stagewise polynomial feature learning
- sched_exponent : int
exponent controlling quantity of included features
- batch_sz : int
multiplier on batch size before including more features
- batch_sz_no_doubling : bool
batch_sz does not double
Low Rank Quadratics options:
- lrq : bool
use low rank quadratic features
- lrqdropout : bool
use dropout training for low rank quadratic features
- lrqfa : bool
use low rank quadratic features with field aware weights
Input options
- data,d : str
path to data file for fitting external to sklearn
- cache,c : str
use a cache. default is <data>.cache
- cache_file : str
path to cache file to use
- json : bool
enable JSON parsing
- kill_cache, k : bool
do not reuse existing cache file, create a new one always
Returns: - self : BaseEstimator
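The constructor arguments mirror Vowpal Wabbit command-line options. A minimal sketch of how such keyword arguments could be rendered as a command-line string follows. It assumes that None and False values are skipped, that True becomes a bare flag, and that single-letter names take a single dash. build_vw_args is a hypothetical helper, not the wrapper's actual code.

```python
def build_vw_args(**params):
    """Render keyword arguments as a VW-style option string (hypothetical helper)."""
    parts = []
    for name, value in sorted(params.items()):
        # Unset options and disabled flags are left out entirely (assumption).
        if value is None or value is False:
            continue
        flag = ("-" if len(name) == 1 else "--") + name
        if value is True:
            parts.append(flag)  # boolean options become bare flags
        else:
            parts.append("%s %s" % (flag, value))
    return " ".join(parts)


print(build_vw_args(quiet=True, passes=1, learning_rate=0.5, b=18, l2=None))
# -b 18 --learning_rate 0.5 --passes 1 --quiet
```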
convert_labels = True
convert_to_vw = True
fit(self, X=None, y=None, sample_weight=None)
Fit the model according to the given training data
- TODO: for first pass create and store example objects.
- for N-1 passes use example objects directly (simulate cache file…but in memory for faster processing)
Parameters:
- X : {array-like, sparse matrix}, shape (n_samples, n_features or 1 if not convert_to_vw)
Training vector, where n_samples is the number of samples and n_features is the number of features. If convert_to_vw is not used, X is expected to be a list of VW-formatted feature vector strings with labels.
- y : array-like, shape (n_samples,), optional if not convert_to_vw
Target vector relative to X.
- sample_weight : array-like, shape (n_samples,)
Sample weight vector relative to X.
Returns:
- self : BaseEstimator
So pipeline can call transform() after fit
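When convert_to_vw is disabled, fit expects a list of VW-formatted strings that already carry their labels. The sketch below builds such strings from dense rows in Vowpal Wabbit's text input format (label, a pipe, then feature:value pairs), using column indices as feature names. rows_to_vw_strings is a hypothetical helper for illustration; the wrapper's own conversion may produce different feature names.

```python
def rows_to_vw_strings(X, y):
    """Render dense rows and labels as VW text-format examples (hypothetical helper)."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    lines = []
    for row, label in zip(X, y):
        # Skip zero-valued features so the example stays sparse.
        features = " ".join(
            "%d:%s" % (index, value) for index, value in enumerate(row) if value != 0
        )
        lines.append("%s | %s" % (label, features))
    return lines


print(rows_to_vw_strings([[1.0, 0.0, 2.5], [0.0, 3.0, 0.0]], [1, -1]))
# ['1 | 0:1.0 2:2.5', '-1 | 1:3.0']
```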
get_coefs(self)
Returns coefficient weights as ordered sparse matrix
Returns:
- sparse matrix : coefficient weights for model
get_intercept(self)
Returns intercept weight for model
Returns:
- intercept value : integer, 0 if no constant
get_params(self, deep=True)
This returns the full set of vw and estimator parameters currently in use
get_vw(self)
Get the vw instance
Returns:
- vw : pyvw.vw instance
load(self, filename)
Load model from file
predict(self, X)
Predict with Vowpal Wabbit model
Parameters:
- X : {array-like, sparse matrix}, shape (n_samples, n_features or 1)
Sample vector, where n_samples is the number of samples and n_features is the number of features. If convert_to_vw is not used, X is expected to be a list of VW-formatted feature vector strings with labels.
Returns:
- y : array-like, shape (n_samples, 1 or n_classes)
Output vector relative to X.
save(self, filename)
Save model to file
set_coefs(self, coefs)
Sets coefficient weights from ordered sparse matrix
Parameters:
- coefs : sparse matrix
coefficient weights for model
set_params(self, **kwargs)
This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided will remain as they are currently
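The "parameters not provided remain as they are" behaviour can be pictured as a dictionary merge performed before the model is rebuilt. This is only a sketch of the documented semantics; merge_params is a hypothetical helper and does not recreate a VW model.

```python
def merge_params(current, **updates):
    """Return the current parameters overridden by the supplied updates (hypothetical helper)."""
    merged = dict(current)  # copy so the caller's dict is not mutated
    merged.update(updates)  # only the supplied keys change
    return merged


current = {"passes": 1, "learning_rate": None, "quiet": True}
print(merge_params(current, learning_rate=0.1))
# {'passes': 1, 'learning_rate': 0.1, 'quiet': True}
```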
vw_ = None
class vowpalwabbit.sklearn_vw.VWClassifier(loss_function='logistic', **kwargs)
Bases: vowpalwabbit.sklearn_vw.VW, vowpalwabbit.sklearn_vw.LinearClassifierMixin
Vowpal Wabbit Classifier model for binary classification. Use VWMultiClassifier for multiclass classification.
Note: VW.predict is assumed to return logits; applying link=logistic will break this assumption.
Attributes:
- coef_ : scipy.sparse_matrix
Empty sparse matrix used to check whether the model has been fit
- classes_ : np.array
Binary class labels
Methods
- decision_function(self, X): Predict confidence scores for samples.
- densify(self): Convert coefficient matrix to dense array format.
- fit(self[, X, y, sample_weight]): Fit the model according to the given training data.
- get_coefs(self): Returns coefficient weights as ordered sparse matrix
- get_intercept(self): Returns intercept weight for model
- get_params(self[, deep]): This returns the full set of vw and estimator parameters currently in use
- get_vw(self): Get the vw instance
- load(self, filename): Load model from file
- predict(self, X): Predict class labels for samples in X.
- predict_log_proba(self, X): Log of probability estimates.
- predict_proba(self, X): Predict probabilities for samples
- save(self, filename): Save model to file
- score(self, X, y[, sample_weight]): Returns the mean accuracy on the given test data and labels.
- set_coefs(self, coefs): Sets coefficient weights from ordered sparse matrix
- set_params(self, **kwargs): This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided will remain as they are currently
- sparsify(self): Convert coefficient matrix to sparse format.
__init__(self, loss_function='logistic', **kwargs)
VW model constructor, exposing all supported parameters to keep sklearn happy
Parameters:
Estimator options
- convert_to_vw : bool
flag to convert X input to vw format
- convert_labels : bool
Convert labels of the form [0,1] to [-1,1]
VW options
- ring_size : int
size of example ring
- strict_parse : bool
throw on malformed examples
Update options
- learning_rate,l : float
Set learning rate
- power_t : float
t power value
- decay_learning_rate : float
Set Decay factor for learning_rate between passes
- initial_t : float
initial t value
- feature_mask : str
Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.
Weight options
- initial_regressor,i : str
Initial regressor(s)
- initial_weight : float
Set all weights to an initial value of arg.
- random_weights : bool
make initial weights random
- normal_weights : bool
make initial weights normal
- truncated_normal_weights : bool
make initial weights truncated normal
- sparse_weights : float
Use a sparse datastructure for weights
- input_feature_regularizer : str
Per feature regularization input file
Diagnostic options
- quiet : bool
Don’t output diagnostics and progress updates
Randomization options
- random_seed : integer
seed random number generator
Feature options
- hash : str
how to hash the features. Available options: strings, all
- hash_seed : int
seed for hash function
- ignore : str
ignore namespaces beginning with character <arg>
- ignore_linear : str
ignore namespaces beginning with character <arg> for linear terms only
- keep : str
keep namespaces beginning with character <arg>
- redefine : str
Redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in form ‘N:=S’ where := is operator. Empty N or S are treated as default namespace. Use ‘:’ as a wildcard in S.
- bit_precision,b : integer
number of bits in the feature table
- noconstant : bool
Don’t add a constant feature
- constant,C : float
Set initial value of constant
- ngram : str
Generate N grams. To generate N grams for a single namespace ‘foo’, arg should be fN.
- skips : str
Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-gram. To generate n-skips for a single namespace ‘foo’, arg should be fN.
- feature_limit : str
limit to N features. To apply to a single namespace ‘foo’, arg should be fN
- affix : str
generate prefixes/suffixes of features; argument ‘+2a,-3b,+1’ means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1 char prefixes for default namespace
- spelling : str
compute spelling features for a given namespace (use ‘_’ for default namespace)
- dictionary : str
read a dictionary for additional features (arg either ‘x:file’ or just ‘file’)
- dictionary_path : str
look in this directory for dictionaries; defaults to current directory or env{PATH}
- interactions : str
Create feature interactions of any level between namespaces.
- permutations : bool
Use permutations instead of combinations for feature interactions of same namespace.
- leave_duplicate_interactions : bool
Don’t remove interactions with duplicate combinations of namespaces. For ex. this is a duplicate: ‘-q ab -q ba’ and a lot more in ‘-q ::’.
- quadratic,q : str
Create and use quadratic features, q:: corresponds to a wildcard for all printable characters
- cubic : str
Create and use cubic features
Example options
- testonly,t : bool
Ignore label information and just test
- holdout_off : bool
no holdout data in multiple passes
- holdout_period : int
holdout period for test only
- holdout_after : int
holdout after n training examples
- early_terminate : int
Specify the number of passes tolerated when holdout loss doesn’t decrease before early termination
- passes : int
Number of Training Passes
- initial_pass_length : int
initial number of examples per pass
- examples : int
number of examples to parse
- min_prediction : float
Smallest prediction to output
- max_prediction : float
Largest prediction to output
- sort_features : bool
turn this on to disregard order in which features have been defined. This will lead to smaller cache sizes
- loss_function : str
Specify the loss function to be used; uses squared by default. Currently available ones are squared, classic, hinge, logistic and quantile.
- quantile_tau : float
Parameter tau associated with Quantile loss. Defaults to 0.5
- l1 : float
l_1 lambda (L1 regularization)
- l2 : float
l_2 lambda (L2 regularization)
- no_bias_regularization : bool
no bias in regularization
- named_labels : str
use names for labels (multiclass, etc.) rather than integers; the argument specifies all possible labels, comma-separated, e.g. “--named_labels Noun,Verb,Adj,Punc”
Output model
- final_regressor,f : str
Final regressor
- readable_model : str
Output human-readable final regressor with numeric features
- invert_hash : str
Output human-readable final regressor with feature names. Computationally expensive.
- save_resume : bool
save extra state so learning can be resumed later with new data
- preserve_performance_counters : bool
reset performance counters when warmstarting
- output_feature_regularizer_binary : str
Per feature regularization output file
- output_feature_regularizer_text : str
Per feature regularization output file, in text
Multiclass options
- oaa : integer
Use one-against-all multiclass learning with labels
- oaa_subsample : int
subsample this number of negative examples when learning
- ect : integer
Use error correcting tournament multiclass learning
- csoaa : integer
Use cost sensitive one-against-all multiclass learning
- wap : integer
Use weighted all pairs multiclass learning
- probabilities : float
predict probabilities of all classes
Neural Network options
- nn : integer
Use a sigmoidal feed-forward neural network with N hidden units
- inpass : bool
Train or test sigmoidal feed-forward network with input pass-through
- multitask : bool
Share hidden layer across all reduced tasks
- dropout : bool
Train or test sigmoidal feed-forward network using dropout
- meanfield : bool
Train or test sigmoidal feed-forward network using mean field
LBFGS and Conjugate Gradient options
- conjugate_gradient : bool
use conjugate gradient based optimization
- bfgs : bool
use bfgs updates
- hessian_on : bool
use second derivative in line search
- mem : int
memory in bfgs
- termination : float
termination threshold
Latent Dirichlet Allocation options
- lda : int
Run lda with <int> topics
- lda_alpha : float
Prior on sparsity of per-document topic weights
- lda_rho : float
Prior on sparsity of topic distributions
- lda_D : int
Number of documents
- lda_epsilon : float
Loop convergence threshold
- minibatch : int
Minibatch size for LDA
Stochastic Variance Reduced Gradient options
- svrg : bool
Streaming Stochastic Variance Reduced Gradient
- stage_size : int
Number of passes per SVRG stage
Follow the Regularized Leader options
- ftrl : bool
Run Follow the Proximal Regularized Leader
- coin : bool
Coin betting optimizer
- pistol : bool
PiSTOL: Parameter free STOchastic Learning
- ftrl_alpha : float
Alpha parameter for FTRL optimization
- ftrl_beta : float
Beta parameters for FTRL optimization
Kernel SVM options
- ksvm : bool
kernel svm
- kernel : str
type of kernel (rbf or linear (default))
- bandwidth : int
bandwidth of rbf kernel
- degree : int
degree of poly kernel
Gradient Descent options
- sgd : bool
use regular stochastic gradient descent update
- adaptive : bool
use adaptive, individual learning rates
- adax : bool
use adaptive learning rates with x^2 instead of g^2x^2
- invariant : bool
use safe/importance-aware updates
- normalized : bool
use per feature normalized updates
Scorer options
- link : str
Specify the link function: identity, logistic, glf1 or poisson
Stagewise polynomial options:
- stage_poly : bool
use stagewise polynomial feature learning
- sched_exponent : int
exponent controlling quantity of included features
- batch_sz : int
multiplier on batch size before including more features
- batch_sz_no_doubling : bool
batch_sz does not double
Low Rank Quadratics options:
- lrq : bool
use low rank quadratic features
- lrqdropout : bool
use dropout training for low rank quadratic features
- lrqfa : bool
use low rank quadratic features with field aware weights
Input options
- data,d : str
path to data file for fitting external to sklearn
- cache,c : str
use a cache. default is <data>.cache
- cache_file : str
path to cache file to use
- json : bool
enable JSON parsing
- kill_cache, k : bool
do not reuse existing cache file, create a new one always
Returns: - self : BaseEstimator
classes_ = array([-1., 1.])
coef_ = None
decision_function(self, X)
Predict confidence scores for samples. The confidence score for a sample is the signed distance of that sample to the hyperplane.
Parameters:
- X : array_like or sparse matrix, shape (n_samples, n_features)
Samples.
Returns:
- array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes)
Confidence scores per (sample, class) combination. In the binary case, confidence score for self.classes_[1] where >0 means this class would be predicted.
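The binary rule stated here (a score above 0 selects self.classes_[1]) can be written out directly. The sketch below uses plain lists and the classes_ values shown for VWClassifier; scores_to_labels is a hypothetical helper, not a method of the class.

```python
def scores_to_labels(scores, classes=(-1.0, 1.0)):
    """Pick classes[1] for positive scores and classes[0] otherwise (hypothetical helper)."""
    return [classes[1] if score > 0 else classes[0] for score in scores]


print(scores_to_labels([-2.3, 0.0, 0.7]))  # [-1.0, -1.0, 1.0]
```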
fit(self, X=None, y=None, sample_weight=None)
Fit the model according to the given training data.
Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array-like of shape (n_samples,)
Target vector relative to X.
- sample_weight : array-like of shape (n_samples,), default=None
Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
Returns:
- self
Fitted estimator.
predict(self, X)
Predict class labels for samples in X.
Parameters:
- X : array_like or sparse matrix, shape (n_samples, n_features)
Samples.
Returns:
- C : array, shape [n_samples]
Predicted class label per sample.
predict_proba(self, X)
Predict probabilities for samples
Parameters:
- X : {array-like, sparse matrix}, shape = (n_samples, n_features)
Samples.
Returns:
- T : array-like of shape (n_samples, n_classes)
Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
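Given the class note that VW.predict is assumed to return logits, one plausible way to obtain two-column probabilities ordered like classes_ is to apply the logistic (sigmoid) function to each logit. This is a sketch under that assumption, not the wrapper's confirmed implementation; logits_to_proba is a hypothetical helper.

```python
import math


def logits_to_proba(logits):
    """Turn raw logits into [P(classes_[0]), P(classes_[1])] rows (hypothetical helper)."""
    rows = []
    for z in logits:
        # Numerically stable sigmoid: never exponentiate a large positive number.
        if z >= 0:
            p = 1.0 / (1.0 + math.exp(-z))
        else:
            e = math.exp(z)
            p = e / (1.0 + e)
        rows.append([1.0 - p, p])
    return rows


print(logits_to_proba([0.0]))  # [[0.5, 0.5]]
```

Each row sums to 1, and a positive logit gives the second column (the positive class) a probability above 0.5, which agrees with the decision rule used by decision_function.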
class vowpalwabbit.sklearn_vw.VWMultiClassifier(probabilities=True, **kwargs)
Bases: vowpalwabbit.sklearn_vw.VWClassifier
Vowpal Wabbit MultiClassifier model.
Note: VW.predict is assumed to return probabilities; setting probabilities=False will break this assumption.
Attributes:
- classes_ : np.array
class labels
- estimator_ : dict
type of estimator to use [csoaa, ect, oaa, wap] and number of classes
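Because this class assumes VW.predict returns per-class probabilities, a predicted label can be read off as the class with the highest probability in each row. The sketch below shows that argmax step on plain lists. proba_to_labels is a hypothetical helper, and the actual predict method may be implemented differently.

```python
def proba_to_labels(proba, classes):
    """Return, for each row, the class whose probability is largest (hypothetical helper)."""
    labels = []
    for row in proba:
        best = max(range(len(row)), key=lambda i: row[i])  # index of the largest entry
        labels.append(classes[best])
    return labels


print(proba_to_labels([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]], [1, 2, 3]))  # [2, 1]
```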
Methods
- decision_function(self, X): Predict confidence scores for samples.
- densify(self): Convert coefficient matrix to dense array format.
- fit(self[, X, y, sample_weight]): Fit the model according to the given training data.
- get_coefs(self): Returns coefficient weights as ordered sparse matrix
- get_intercept(self): Returns intercept weight for model
- get_params(self[, deep]): This returns the full set of vw and estimator parameters currently in use
- get_vw(self): Get the vw instance
- load(self, filename): Load model from file
- predict(self, X): Predict class labels for samples in X.
- predict_log_proba(self, X): Log of probability estimates.
- predict_proba(self, X): Predict probabilities for each class.
- save(self, filename): Save model to file
- score(self, X, y[, sample_weight]): Returns the mean accuracy on the given test data and labels.
- set_coefs(self, coefs): Sets coefficient weights from ordered sparse matrix
- set_params(self, **kwargs): This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided will remain as they are currently
- sparsify(self): Convert coefficient matrix to sparse format.
__init__(self, probabilities=True, **kwargs)
VW model constructor, exposing all supported parameters to keep sklearn happy
Parameters:
Estimator options
- convert_to_vw : bool
flag to convert X input to vw format
- convert_labels : bool
Convert labels of the form [0,1] to [-1,1]
VW options
- ring_size : int
size of example ring
- strict_parse : bool
throw on malformed examples
Update options
- learning_rate,l : float
Set learning rate
- power_t : float
t power value
- decay_learning_rate : float
Set Decay factor for learning_rate between passes
- initial_t : float
initial t value
- feature_mask : str
Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.
Weight options
- initial_regressor,i : str
Initial regressor(s)
- initial_weight : float
Set all weights to an initial value of arg.
- random_weights : bool
make initial weights random
- normal_weights : bool
make initial weights normal
- truncated_normal_weights : bool
make initial weights truncated normal
- sparse_weights : float
Use a sparse datastructure for weights
- input_feature_regularizer : str
Per feature regularization input file
Diagnostic options
- quiet : bool
Don’t output diagnostics and progress updates
Randomization options
- random_seed : integer
seed random number generator
Feature options
- hash : str
how to hash the features. Available options: strings, all
- hash_seed : int
seed for hash function
- ignore : str
ignore namespaces beginning with character <arg>
- ignore_linear : str
ignore namespaces beginning with character <arg> for linear terms only
- keep : str
keep namespaces beginning with character <arg>
- redefine : str
Redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in form ‘N:=S’ where := is operator. Empty N or S are treated as default namespace. Use ‘:’ as a wildcard in S.
- bit_precision,b : integer
number of bits in the feature table
- noconstant : bool
Don’t add a constant feature
- constant,C : float
Set initial value of constant
- ngram : str
Generate N grams. To generate N grams for a single namespace ‘foo’, arg should be fN.
- skips : str
Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-gram. To generate n-skips for a single namespace ‘foo’, arg should be fN.
- feature_limit : str
limit to N features. To apply to a single namespace ‘foo’, arg should be fN
- affix : str
generate prefixes/suffixes of features; argument ‘+2a,-3b,+1’ means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1 char prefixes for default namespace
- spelling : str
compute spelling features for a given namespace (use ‘_’ for default namespace)
- dictionary : str
read a dictionary for additional features (arg either ‘x:file’ or just ‘file’)
- dictionary_path : str
look in this directory for dictionaries; defaults to current directory or env{PATH}
- interactions : str
Create feature interactions of any level between namespaces.
- permutations : bool
Use permutations instead of combinations for feature interactions of same namespace.
- leave_duplicate_interactions : bool
Don’t remove interactions with duplicate combinations of namespaces. For ex. this is a duplicate: ‘-q ab -q ba’ and a lot more in ‘-q ::’.
- quadratic,q : str
Create and use quadratic features, q:: corresponds to a wildcard for all printable characters
- cubic : str
Create and use cubic features
Example options
- testonly,t : bool
Ignore label information and just test
- holdout_off : bool
no holdout data in multiple passes
- holdout_period : int
holdout period for test only
- holdout_after : int
holdout after n training examples
- early_terminate : int
Specify the number of passes tolerated when holdout loss doesn’t decrease before early termination
- passes : int
Number of Training Passes
- initial_pass_length : int
initial number of examples per pass
- examples : int
number of examples to parse
- min_prediction : float
Smallest prediction to output
- max_prediction : float
Largest prediction to output
- sort_features : bool
turn this on to disregard order in which features have been defined. This will lead to smaller cache sizes
- loss_function : str
Specify the loss function to be used; squared is the default. Currently available ones are squared, classic, hinge, logistic and quantile.
- quantile_tau : float
Parameter tau associated with Quantile loss. Defaults to 0.5
- l1 : float
l_1 lambda (L1 regularization)
- l2 : float
l_2 lambda (L2 regularization)
- no_bias_regularization : bool
no bias in regularization
- named_labels : str
use names for labels (multiclass, etc.) rather than integers; the argument specifies all possible labels, comma-separated, e.g. “--named_labels Noun,Verb,Adj,Punc”
Output model
- final_regressor,f : str
Final regressor
- readable_model : str
Output human-readable final regressor with numeric features
- invert_hash : str
Output human-readable final regressor with feature names. Computationally expensive.
- save_resume : bool
save extra state so learning can be resumed later with new data
- preserve_performance_counters : bool
do not reset performance counters when warm-starting
- output_feature_regularizer_binary : str
Per feature regularization output file
- output_feature_regularizer_text : str
Per feature regularization output file, in text
Multiclass options
- oaa : integer
Use one-against-all multiclass learning with labels
- oaa_subsample : int
subsample this number of negative examples when learning
- ect : integer
Use error correcting tournament multiclass learning
- csoaa : integer
Use cost sensitive one-against-all multiclass learning
- wap : integer
Use weighted all pairs multiclass learning
- probabilities : bool
predict probabilities of all classes
Neural Network options
- nn : integer
Use a sigmoidal feed-forward neural network with N hidden units
- inpass : bool
Train or test sigmoidal feed-forward network with input pass-through
- multitask : bool
Share hidden layer across all reduced tasks
- dropout : bool
Train or test sigmoidal feed-forward network using dropout
- meanfield : bool
Train or test sigmoidal feed-forward network using mean field
LBFGS and Conjugate Gradient options
- conjugate_gradient : bool
use conjugate gradient based optimization
- bfgs : bool
use bfgs updates
- hessian_on : bool
use second derivative in line search
- mem : int
memory in bfgs
- termination : float
termination threshold
Latent Dirichlet Allocation options
- lda : int
Run lda with <int> topics
- lda_alpha : float
Prior on sparsity of per-document topic weights
- lda_rho : float
Prior on sparsity of topic distributions
- lda_D : int
Number of documents
- lda_epsilon : float
Loop convergence threshold
- minibatch : int
Minibatch size for LDA
Stochastic Variance Reduced Gradient options
- svrg : bool
Streaming Stochastic Variance Reduced Gradient
- stage_size : int
Number of passes per SVRG stage
Follow the Regularized Leader options
- ftrl : bool
Run Follow the Proximal Regularized Leader
- coin : bool
Coin betting optimizer
- pistol : bool
PiSTOL: Parameter free STOchastic Learning
- ftrl_alpha : float
Alpha parameter for FTRL optimization
- ftrl_beta : float
Beta parameters for FTRL optimization
Kernel SVM options
- ksvm : bool
kernel svm
- kernel : str
type of kernel (rbf or linear (default))
- bandwidth : int
bandwidth of rbf kernel
- degree : int
degree of poly kernel
Gradient Descent options
- sgd : bool
use regular stochastic gradient descent update
- adaptive : bool
use adaptive, individual learning rates
- adax : bool
use adaptive learning rates with x^2 instead of g^2x^2
- invariant : bool
use safe/importance aware updates
- normalized : bool
use per feature normalized updates
Scorer options
- link : str
Specify the link function: identity, logistic, glf1 or poisson
Stagewise polynomial options
- stage_poly : bool
use stagewise polynomial feature learning
- sched_exponent : int
exponent controlling quantity of included features
- batch_sz : int
multiplier on batch size before including more features
- batch_sz_no_doubling : bool
batch_sz does not double
Low Rank Quadratics options
- lrq : bool
use low rank quadratic features
- lrqdropout : bool
use dropout training for low rank quadratic features
- lrqfa : bool
use low rank quadratic features with field aware weights
Input options
- data,d : str
path to data file for fitting external to sklearn
- cache,c : str
use a cache. default is <data>.cache
- cache_file : str
path to cache file to use
- json : bool
enable JSON parsing
- kill_cache, k : bool
do not reuse existing cache file, create a new one always
Returns: - self : BaseEstimator
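The parameters above mirror Vowpal Wabbit command-line options, so it can help to picture how a set of keyword arguments might be rendered as an argument string. The sketch below is illustrative only: `params_to_cli` is a hypothetical helper, not part of `vowpalwabbit.sklearn_vw`, and the wrapper's actual translation may differ (for instance, estimator-only options such as convert_to_vw are not VW flags).

```python
def params_to_cli(**params):
    """Hypothetical helper: render keyword parameters as a VW-style argument string."""
    parts = []
    for name, value in params.items():
        if value is None or value is False:
            continue  # unset options and disabled flags are omitted
        # Single-letter aliases (l, b, q, ...) take one dash, long names take two.
        flag = ("-" if len(name) == 1 else "--") + name
        if value is True:
            parts.append(flag)  # boolean options become bare flags
        else:
            parts.append(f"{flag} {value}")
    return " ".join(parts)


args = params_to_cli(loss_function="logistic", passes=3, quiet=True, l2=None, b=20)
print(args)  # --loss_function logistic --passes 3 --quiet -b 20
```

Parameters left at None are simply dropped in this sketch, which matches the documented behaviour that unspecified options keep their current or default values.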
-
classes_
= None¶
-
decision_function
(self, X)¶ Predict confidence scores for samples. The confidence score for a sample is the signed distance of that sample to the hyperplane.
Parameters: - X : array_like or sparse matrix, shape (n_samples, n_features)
Samples.
Returns: - array, shape=(n_samples, n_classes)
Confidence scores per (sample, class) combination.
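As a rough illustration of what a confidence score means for a binary linear model, the sketch below computes the raw score w·x + b, which is proportional to the signed distance to the hyperplane. The weights, bias, and samples are made-up values, and `decision_score` is a hypothetical helper rather than the library's implementation.

```python
def decision_score(weights, bias, x):
    """Raw linear score w.x + b; its sign indicates the side of the hyperplane."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Assumed (made-up) model parameters and samples, for illustration only.
weights = [2.0, -1.0]
bias = 0.5
samples = [[1.0, 1.0], [-1.0, 2.0]]

scores = [decision_score(weights, bias, x) for x in samples]
labels = [1 if s > 0 else -1 for s in scores]

print(scores)  # [1.5, -3.5]
print(labels)  # [1, -1]
```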
-
estimator_
= None¶
-
fit
(self, X=None, y=None, sample_weight=None)¶ Fit the model according to the given training data.
Parameters: - X : {array-like, sparse matrix} of shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array-like of shape (n_samples,)
Target vector relative to X.
- sample_weight : array-like of shape (n_samples,), default=None
Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
Returns: - self
Fitted estimator.
-
predict_proba
(self, X)¶ Predict probabilities for each class.
Parameters: - X : {array-like, sparse matrix}, shape = (n_samples, n_features)
Samples.
Returns: - array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes)
Confidence scores per (sample, class) combination. In the binary case, confidence score for self.classes_[1] where >0 means this class would be predicted.
Examples
>>> import numpy as np
>>> X = np.array([[10, 10], [8, 10], [-5, 5.5], [-5.4, 5.5], [-20, -20], [-15, -20]])
>>> y = np.array([1, 1, 2, 2, 3, 3])
>>> from vowpalwabbit.sklearn_vw import VWMultiClassifier
>>> model = VWMultiClassifier(oaa=3, loss_function='logistic')
>>> model.fit(X, y)
>>> model.predict_proba(X)
-
class
vowpalwabbit.sklearn_vw.
VWRegressor
(convert_to_vw=True, convert_labels=True, ring_size=None, strict_parse=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, normal_weights=None, truncated_normal_weights=None, sparse_weights=None, input_feature_regularizer=None, quiet=True, random_seed=None, hash=None, hash_seed=None, ignore=None, ignore_linear=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, holdout_off=None, holdout_period=None, holdout_after=None, early_terminate=None, passes=1, initial_pass_length=None, examples=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, quantile_tau=None, l1=None, l2=None, no_bias_regularization=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, preserve_performance_counters=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, probabilities=None, nn=None, inpass=None, multitask=None, dropout=None, meanfield=None, conjugate_gradient=None, bfgs=None, hessian_on=None, mem=None, termination=None, lda=None, lda_alpha=None, lda_rho=None, lda_D=None, lda_epsilon=None, minibatch=None, svrg=None, stage_size=None, ftrl=None, coin=None, pistol=None, ftrl_alpha=None, ftrl_beta=None, ksvm=None, kernel=None, bandwidth=None, degree=None, sgd=None, adaptive=None, invariant=None, normalized=None, link=None, stage_poly=None, sched_exponent=None, batch_sz=None, batch_sz_no_doubling=None, lrq=None, lrqdropout=None, lrqfa=None, data=None, d=None, cache=None, c=None, cache_file=None, json=None, 
kill_cache=None, k=None)¶ Bases:
vowpalwabbit.sklearn_vw.VW
,sklearn.base.RegressorMixin
Vowpal Wabbit Regressor model
Attributes: - vw_
Methods
fit
(self[, X, y, sample_weight])Fit the model according to the given training data get_coefs
(self)Returns coefficient weights as ordered sparse matrix get_intercept
(self)Returns intercept weight for model get_params
(self[, deep])This returns the full set of vw and estimator parameters currently in use get_vw
(self)Get the vw instance load
(self, filename)Load model from file predict
(self, X)Predict with Vowpal Wabbit model save
(self, filename)Save model to file score
(self, X, y[, sample_weight])Returns the coefficient of determination R^2 of the prediction. set_coefs
(self, coefs)Sets coefficient weights from ordered sparse matrix. set_params
(self, \*\*kwargs)This destroys and recreates the Vowpal Wabbit model with updated parameters; any parameters not provided remain as they are currently.
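The score method above reports the coefficient of determination R^2. As a minimal sketch of that quantity, assuming the usual definition R^2 = 1 - SS_res / SS_tot, the following computes it by hand; `r_squared` is a hypothetical helper and the data are made up.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up targets and predictions, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 3.0]

r2 = r_squared(y_true, y_pred)
print(r2)  # 0.8
```

A perfect prediction gives 1.0, and a model that always predicts the mean of y_true gives 0.0.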
-
vowpalwabbit.sklearn_vw.
tovw
(x, y=None, sample_weight=None, convert_labels=False)¶ Convert array or sparse matrix to Vowpal Wabbit format
Parameters: - x : {array-like, sparse matrix}, shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : {array-like}, shape (n_samples,), optional
Target vector relative to X.
- sample_weight : {array-like}, shape (n_samples,), optional
sample weight vector relative to X.
- convert_labels : {bool}, optional
Convert labels of the form [0,1] to [-1,1].
Returns: - out : {array-like}, shape (n_samples, 1)
Training vectors in VW string format
Examples
>>> import pandas as pd
>>> from sklearn.feature_extraction.text import HashingVectorizer
>>> from vowpalwabbit.sklearn_vw import tovw
>>> X = pd.Series(['cat', 'dog', 'cat', 'cat'], name='catdog')
>>> y = pd.Series([-1, 1, -1, -1], name='label')
>>> hv = HashingVectorizer()
>>> hashed = hv.fit_transform(X)
>>> tovw(x=hashed, y=y)
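To give a feel for the kind of strings such a conversion produces, here is a minimal sketch of rendering dense rows in Vowpal Wabbit's text input format (`label [importance] | index:value ...`). `rows_to_vw_text` is a hypothetical stand-in written for illustration; the exact strings returned by `tovw` may be formatted differently.

```python
def rows_to_vw_text(rows, labels=None, weights=None, convert_labels=False):
    """Hypothetical helper: render dense rows as VW text-format example strings."""
    lines = []
    for i, row in enumerate(rows):
        prefix = ""
        if labels is not None:
            label = labels[i]
            if convert_labels and label == 0:
                label = -1  # map labels of the form [0,1] to [-1,1]
            prefix = str(label)
            if weights is not None:
                prefix += f" {weights[i]}"  # optional importance weight
            prefix += " "
        # Only non-zero features are written, keyed by their column index.
        feats = " ".join(f"{j}:{v}" for j, v in enumerate(row) if v != 0)
        lines.append(f"{prefix}| {feats}")
    return lines


lines = rows_to_vw_text([[0.5, 0, 2], [0, 1, 0]], labels=[1, 0], convert_labels=True)
for line in lines:
    print(line)
# 1 | 0:0.5 2:2
# -1 | 1:1
```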