vowpalwabbit.sklearn

This is an optional module which implements sklearn compatibility.

Deprecated alias

Deprecated since version 9.0.0: The module name vowpalwabbit.sklearn_vw has been renamed to vowpalwabbit.sklearn. Please use the new module name instead.

Example usage

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from vowpalwabbit.sklearn import VWClassifier

# generate some data
X, y = datasets.make_hastie_10_2(n_samples=10000, random_state=1)
X = X.astype(np.float32)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=256)

# build and fit the model
model = VWClassifier()
model.fit(X_train, y_train)

# predict on the test set
y_pred = model.predict(X_test)

# evaluate the model (mean accuracy) on both splits
model.score(X_train, y_train)
model.score(X_test, y_test)

Module contents

Utilities to support integration of Vowpal Wabbit and scikit-learn

class vowpalwabbit.sklearn.LinearClassifierMixin

Bases: LogisticRegression

__init__()
set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') LinearClassifierMixin

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') LinearClassifierMixin

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class vowpalwabbit.sklearn.VW(convert_to_vw=True, convert_labels=True, ring_size=None, strict_parse=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, normal_weights=None, truncated_normal_weights=None, sparse_weights=None, input_feature_regularizer=None, quiet=True, random_seed=None, hash=None, hash_seed=None, ignore=None, ignore_linear=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, holdout_off=None, holdout_period=None, holdout_after=None, early_terminate=None, passes=1, initial_pass_length=None, examples=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, quantile_tau=None, l1=None, l2=None, no_bias_regularization=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, preserve_performance_counters=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, probabilities=None, nn=None, inpass=None, multitask=None, dropout=None, meanfield=None, conjugate_gradient=None, bfgs=None, hessian_on=None, mem=None, termination=None, lda=None, lda_alpha=None, lda_rho=None, lda_D=None, lda_epsilon=None, minibatch=None, svrg=None, stage_size=None, ftrl=None, coin=None, pistol=None, ftrl_alpha=None, ftrl_beta=None, ksvm=None, kernel=None, bandwidth=None, degree=None, sgd=None, adaptive=None, invariant=None, normalized=None, link=None, stage_poly=None, sched_exponent=None, batch_sz=None, batch_sz_no_doubling=None, lrq=None, lrqdropout=None, lrqfa=None, data=None, d=None, cache=None, c=None, cache_file=None, json=None, kill_cache=None, k=None)

Bases: BaseEstimator

Vowpal Wabbit Scikit-learn Base Estimator wrapper

__init__(convert_to_vw=True, convert_labels=True, ring_size=None, strict_parse=None, learning_rate=None, l=None, power_t=None, decay_learning_rate=None, initial_t=None, feature_mask=None, initial_regressor=None, i=None, initial_weight=None, random_weights=None, normal_weights=None, truncated_normal_weights=None, sparse_weights=None, input_feature_regularizer=None, quiet=True, random_seed=None, hash=None, hash_seed=None, ignore=None, ignore_linear=None, keep=None, redefine=None, bit_precision=None, b=None, noconstant=None, constant=None, C=None, ngram=None, skips=None, feature_limit=None, affix=None, spelling=None, dictionary=None, dictionary_path=None, interactions=None, permutations=None, leave_duplicate_interactions=None, quadratic=None, q=None, cubic=None, testonly=None, t=None, holdout_off=None, holdout_period=None, holdout_after=None, early_terminate=None, passes=1, initial_pass_length=None, examples=None, min_prediction=None, max_prediction=None, sort_features=None, loss_function=None, quantile_tau=None, l1=None, l2=None, no_bias_regularization=None, named_labels=None, final_regressor=None, f=None, readable_model=None, invert_hash=None, save_resume=None, preserve_performance_counters=None, output_feature_regularizer_binary=None, output_feature_regularizer_text=None, oaa=None, ect=None, csoaa=None, wap=None, probabilities=None, nn=None, inpass=None, multitask=None, dropout=None, meanfield=None, conjugate_gradient=None, bfgs=None, hessian_on=None, mem=None, termination=None, lda=None, lda_alpha=None, lda_rho=None, lda_D=None, lda_epsilon=None, minibatch=None, svrg=None, stage_size=None, ftrl=None, coin=None, pistol=None, ftrl_alpha=None, ftrl_beta=None, ksvm=None, kernel=None, bandwidth=None, degree=None, sgd=None, adaptive=None, invariant=None, normalized=None, link=None, stage_poly=None, sched_exponent=None, batch_sz=None, batch_sz_no_doubling=None, lrq=None, lrqdropout=None, lrqfa=None, data=None, d=None, cache=None, c=None, cache_file=None, json=None, kill_cache=None, k=None)

VW model constructor, exposing all supported parameters to keep sklearn happy

Parameters:
  • convert_to_vw (bool) – flag to convert X input to vw format

  • convert_labels (bool) – Convert labels of the form [0,1] to [-1,1]

  • ring_size (int) – size of example ring

  • strict_parse (bool) – throw on malformed examples

  • learning_rate (float) – Set learning rate

  • l (float) – Set learning rate

  • power_t (float) – t power value

  • decay_learning_rate (float) – Set Decay factor for learning_rate between passes

  • initial_t (float) – initial t value

  • feature_mask (str) – Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.

  • initial_regressor (str) – Initial regressor(s)

  • i (str) – Initial regressor(s)

  • initial_weight (float) – Set all weights to an initial value of arg.

  • random_weights (bool) – make initial weights random

  • normal_weights (bool) – make initial weights normal

  • truncated_normal_weights (bool) – make initial weights truncated normal

  • sparse_weights (float) – Use a sparse datastructure for weights

  • input_feature_regularizer (str) – Per feature regularization input file

  • quiet (bool) – Don’t output diagnostics and progress updates

  • random_seed (integer) – seed random number generator

  • hash (str) – how to hash the features. Available options: strings, all

  • hash_seed (int) – seed for hash function

  • ignore (str) – ignore namespaces beginning with character <arg>

  • ignore_linear (str) – ignore namespaces beginning with character <arg> for linear terms only

  • keep (str) – keep namespaces beginning with character <arg>

  • redefine (str) – Redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in form ‘N:=S’ where := is operator. Empty N or S are treated as default namespace. Use ‘:’ as a wildcard in S.

  • bit_precision (integer) – number of bits in the feature table

  • b (integer) – number of bits in the feature table

  • noconstant (bool) – Don’t add a constant feature

  • constant (float) – Set initial value of constant

  • C (float) – Set initial value of constant

  • ngram (str) – Generate N grams. To generate N grams for a single namespace ‘foo’, arg should be fN.

  • skips (str) – Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-gram. To generate n-skips for a single namespace ‘foo’, arg should be fN.

  • feature_limit (str) – limit to N features. To apply to a single namespace ‘foo’, arg should be fN

  • affix (str) – generate prefixes/suffixes of features; argument ‘+2a,-3b,+1’ means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1 char prefixes for default namespace

  • spelling (str) – compute spelling features for a given namespace (use ‘_’ for default namespace)

  • dictionary (str) – read a dictionary for additional features (arg either ‘x:file’ or just ‘file’)

  • dictionary_path (str) – look in this directory for dictionaries; defaults to current directory or env{PATH}

  • interactions (str) – Create feature interactions of any level between namespaces.

  • permutations (bool) – Use permutations instead of combinations for feature interactions of same namespace.

  • leave_duplicate_interactions (bool) – Don’t remove interactions with duplicate combinations of namespaces. For ex. this is a duplicate: ‘-q ab -q ba’ and a lot more in ‘-q ::’.

  • quadratic (str) – Create and use quadratic features, q:: corresponds to a wildcard for all printable characters

  • q (str) – Create and use quadratic features, q:: corresponds to a wildcard for all printable characters

  • cubic (str) – Create and use cubic features

  • testonly (bool) – Ignore label information and just test

  • t (bool) – Ignore label information and just test

  • holdout_off (bool) – no holdout data in multiple passes

  • holdout_period (int) – holdout period for test only

  • holdout_after (int) – holdout after n training examples

  • early_terminate (int) – Specify the number of passes tolerated when holdout loss doesn’t decrease before early termination

  • passes (int) – Number of Training Passes

  • initial_pass_length (int) – initial number of examples per pass

  • examples (int) – number of examples to parse

  • min_prediction (float) – Smallest prediction to output

  • max_prediction (float) – Largest prediction to output

  • sort_features (bool) – turn this on to disregard order in which features have been defined. This will lead to smaller cache sizes

  • loss_function (str) – Specify the loss function to be used; uses squared by default. Currently available ones are squared, classic, hinge, logistic and quantile.

  • quantile_tau (float) – Parameter tau associated with Quantile loss. Defaults to 0.5

  • l1 (float) – l_1 lambda (L1 regularization)

  • l2 (float) – l_2 lambda (L2 regularization)

  • no_bias_regularization (bool) – no bias in regularization

  • named_labels (str) – use names for labels (multiclass, etc.) rather than integers; the argument specifies all possible labels, comma-separated, e.g. “--named_labels Noun,Verb,Adj,Punc”

  • final_regressor (str) – Final regressor

  • f (str) – Final regressor

  • readable_model (str) – Output human-readable final regressor with numeric features

  • invert_hash (str) – Output human-readable final regressor with feature names. Computationally expensive.

  • save_resume (bool) – save extra state so learning can be resumed later with new data

  • preserve_performance_counters (bool) – reset performance counters when warmstarting

  • output_feature_regularizer_binary (str) – Per feature regularization output file

  • output_feature_regularizer_text (str) – Per feature regularization output file, in text

  • oaa (integer) – Use one-against-all multiclass learning with labels

  • oaa_subsample (int) – subsample this number of negative examples when learning

  • ect (integer) – Use error correcting tournament multiclass learning

  • csoaa (integer) – Use cost sensitive one-against-all multiclass learning

  • wap (integer) – Use weighted all pairs multiclass learning

  • probabilities (float) – predict probabilities of all classes

  • nn (integer) – Use a sigmoidal feed-forward neural network with N hidden units

  • inpass (bool) – Train or test sigmoidal feed-forward network with input pass-through

  • multitask (bool) – Share hidden layer across all reduced tasks

  • dropout (bool) – Train or test sigmoidal feed-forward network using dropout

  • meanfield (bool) – Train or test sigmoidal feed-forward network using mean field

  • conjugate_gradient (bool) – use conjugate gradient based optimization

  • bfgs (bool) – use bfgs updates

  • hessian_on (bool) – use second derivative in line search

  • mem (int) – memory in bfgs

  • termination (float) – termination threshold

  • lda (int) – Run lda with <int> topics

  • lda_alpha (float) – Prior on sparsity of per-document topic weights

  • lda_rho (float) – Prior on sparsity of topic distributions

  • lda_D (int) – Number of documents

  • lda_epsilon (float) – Loop convergence threshold

  • minibatch (int) – Minibatch size for LDA

  • svrg (bool) – Streaming Stochastic Variance Reduced Gradient

  • stage_size (int) – Number of passes per SVRG stage

  • ftrl (bool) – Run Follow the Proximal Regularized Leader

  • coin (bool) – Coin betting optimizer

  • pistol (bool) – PiSTOL - Parameter free STOchastic Learning

  • ftrl_alpha (float) – Alpha parameter for FTRL optimization

  • ftrl_beta (float) – Beta parameter for FTRL optimization

  • ksvm (bool) – kernel svm

  • kernel (str) – type of kernel (rbf or linear (default))

  • bandwidth (int) – bandwidth of rbf kernel

  • degree (int) – degree of poly kernel

  • sgd (bool) – use regular stochastic gradient descent update

  • adaptive (bool) – use adaptive, individual learning rates

  • adax (bool) – use adaptive learning rates with x^2 instead of g^2x^2

  • invariant (bool) – use safe/importance aware updates

  • normalized (bool) – use per feature normalized updates

  • link (str) – Specify the link function - identity, logistic, glf1 or poisson

  • stage_poly (bool) – use stagewise polynomial feature learning

  • sched_exponent (int) – exponent controlling quantity of included features

  • batch_sz (int) – multiplier on batch size before including more features

  • batch_sz_no_doubling (bool) – batch_sz does not double

  • lrq (bool) – use low rank quadratic features

  • lrqdropout (bool) – use dropout training for low rank quadratic features

  • lrqfa (bool) – use low rank quadratic features with field aware weights

  • data (str) – path to data file for fitting external to sklearn

  • d (str) – path to data file for fitting external to sklearn

  • cache (str) – use a cache. default is <data>.cache

  • c (str) – use a cache. default is <data>.cache

  • cache_file (str) – path to cache file to use

  • json (bool) – enable JSON parsing

  • kill_cache (bool) – do not reuse existing cache file, create a new one always

  • k (bool) – do not reuse existing cache file, create a new one always
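The keywords above mirror Vowpal Wabbit command-line options, and the single-letter names (l, i, b, C, q, t, f, d, c, k) carry the same description as the long name listed just before them, so they appear to be short aliases. As a rough illustration of that correspondence, the sketch below renders keyword arguments as a flag string. params_to_cli is a hypothetical helper written for this example and is not part of vowpalwabbit.sklearn.

```python
def params_to_cli(**params):
    """Hypothetical helper: render keyword parameters as VW-style flags."""
    parts = []
    for name, value in params.items():
        if value is None or value is False:
            continue  # unset parameters contribute nothing
        # One-letter names use a single dash, long names a double dash.
        flag = f"-{name}" if len(name) == 1 else f"--{name}"
        # Boolean switches take no argument; everything else is "flag value".
        parts.append(flag if value is True else f"{flag} {value}")
    return " ".join(parts)


cli = params_to_cli(learning_rate=0.5, b=18, quiet=True, l2=None)
print(cli)  # --learning_rate 0.5 -b 18 --quiet
```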

convert_labels: bool = True

Convert labels of the form [0,1] to [-1,1]
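A minimal sketch of the mapping this flag describes, assuming labels are exactly 0 or 1. to_signed_labels is a hypothetical name used only for illustration; the library's own conversion may be implemented differently.

```python
def to_signed_labels(labels):
    """Hypothetical helper: map label 0 to -1 and keep label 1 as 1."""
    return [-1 if label == 0 else 1 for label in labels]


signed = to_signed_labels([0, 1, 1, 0])
print(signed)  # [-1, 1, 1, -1]
```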

convert_to_vw: bool = True

flag to convert X input to vw format
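For orientation, the Vowpal Wabbit text format puts an optional label and importance weight before a "|" and name:value features after it. The sketch below formats one dense row that way. to_vw_line is a hypothetical helper, and using the column index as the feature name and dropping zero values are assumptions of this example, not a description of what the library emits.

```python
def to_vw_line(features, label=None, weight=None):
    """Hypothetical helper: format one dense row as a VW text example."""
    head = []
    if label is not None:
        head.append(str(label))
        if weight is not None:
            head.append(str(weight))  # importance weight follows the label
    # Use the column index as the feature name; skip zeros to keep the line sparse.
    feats = " ".join(f"{i}:{v}" for i, v in enumerate(features) if v != 0)
    return f"{' '.join(head)} | {feats}".strip()


line = to_vw_line([0.5, 0.0, 2.0], label=1)
print(line)  # 1 | 0:0.5 2:2.0
```

Strings of this shape are what X is expected to contain when convert_to_vw is False.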

fit(X=None, y=None, sample_weight=None)

Fit the model according to the given training data

Todo

For first pass create and store example objects. For N-1 passes use example objects directly (simulate cache file…but in memory for faster processing)

Parameters:
  • X – {array-like, sparse matrix}, shape (n_samples, n_features), or (n_samples, 1) if not using convert_to_vw. Training vector, where n_samples is the number of samples and n_features is the number of features. If not using convert_to_vw, X is expected to be a list of vw formatted feature vector strings with labels.

  • y – array-like, shape (n_samples,), optional if not using convert_to_vw. Target vector relative to X.

  • sample_weight – array-like, shape (n_samples,). Sample weight vector relative to X.

Returns:

self

get_coefs()

Returns coefficient weights as ordered sparse matrix

Returns:

coefficient weights for model

Return type:

sparse matrix

get_intercept()

Returns intercept weight for model

Returns:

intercept value. 0 if no constant

Return type:

int
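To show how coefficients and an intercept combine, here is a library-free sketch of a linear score. It assumes a plain linear model with no interaction features, represents the sparse weights as a dict keyed by feature index, and uses made-up numbers; it does not call get_coefs or get_intercept.

```python
def linear_score(coefs, intercept, x):
    """Dot product of sparse weights and a sparse example, plus the intercept."""
    return sum(w * x.get(i, 0.0) for i, w in coefs.items()) + intercept


coefs = {0: 0.5, 3: -2.0}      # made-up weights keyed by feature index
x = {0: 2.0, 3: 0.5, 7: 1.0}   # feature 7 has no weight, so it adds nothing
score = linear_score(coefs, intercept=0.25, x=x)
print(score)  # 0.25
```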

get_params(deep=True)

This returns the full set of vw and estimator parameters currently in use

get_vw()

Get the vw instance

Returns:

instance

Return type:

vowpalwabbit.Workspace

load(filename)

Load model from file

predict(X)

Predict with Vowpal Wabbit model

Parameters:

X ({array-like, sparse matrix}, shape (n_samples, n_features or 1)) – Training vector, where n_samples is the number of samples and n_features is the number of features. If not using convert_to_vw, X is expected to be a list of vw formatted feature vector strings with labels.

Returns:

Output vector relative to X.

Return type:

array-like, shape (n_samples, 1 or n_classes)

save(filename)

Save model to file

set_coefs(coefs)

Sets coefficient weights from ordered sparse matrix

Parameters:

coefs (sparse matrix) – coefficient weights for model

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') VW

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_params(**kwargs)

This destroys and recreates the Vowpal Wabbit model with the updated parameters. Any parameters not provided remain as they are currently.
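The merge-and-rebuild behavior described here can be pictured with a toy class. ToyEstimator is a hypothetical stand-in, not the library's implementation; in particular, whether the real model is rebuilt immediately or on the next fit is not stated in this reference.

```python
class ToyEstimator:
    """Hypothetical stand-in that mimics the rebuild-on-update behavior."""

    def __init__(self, **params):
        self.params = dict(params)
        self.model = None  # stands in for the underlying VW model

    def set_params(self, **kwargs):
        self.params.update(kwargs)  # keys not passed keep their current values
        self.model = None           # the previous model is discarded
        return self


est = ToyEstimator(learning_rate=0.5, passes=1)
est.model = "fitted"
est.set_params(passes=3)
print(est.params)  # {'learning_rate': 0.5, 'passes': 3}
```

Under this reading, state learned before the call should not be expected to survive it.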

vw_: Workspace = None

class vowpalwabbit.sklearn.VWClassifier(loss_function='logistic', **kwargs)

Bases: VW, LinearClassifierMixin

Vowpal Wabbit Classifier model for binary classification. Use VWMultiClassifier for multiclass classification. Note: VW.predict is assumed to return logits; applying link=logistic will break this assumption.
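The logit assumption matters because a probability is typically obtained by applying the logistic function once to a raw score. The library-free sketch below assumes that is what happens downstream and shows the effect of applying it a second time, which is roughly what link=logistic would cause.

```python
import math


def sigmoid(z):
    """Logistic function mapping a raw score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


p_once = sigmoid(2.0)             # logit -> probability, applied once
p_twice = sigmoid(p_once)         # input was already a probability
p_neg = sigmoid(sigmoid(-5.0))    # strongly negative logit, transformed twice

print(round(p_once, 4), round(p_twice, 4), p_neg > 0.5)
```

After the second pass every value lands above 0.5, so even a clearly negative example would look positive.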

__init__(loss_function='logistic', **kwargs)

VW model constructor, exposing all supported parameters to keep sklearn happy

Parameters:
  • convert_to_vw (bool) – flag to convert X input to vw format

  • convert_labels (bool) – Convert labels of the form [0,1] to [-1,1]

  • ring_size (int) – size of example ring

  • strict_parse (bool) – throw on malformed examples

  • learning_rate (float) – Set learning rate

  • l (float) – Set learning rate

  • power_t (float) – t power value

  • decay_learning_rate (float) – Set Decay factor for learning_rate between passes

  • initial_t (float) – initial t value

  • feature_mask (str) – Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.

  • initial_regressor (str) – Initial regressor(s)

  • i (str) – Initial regressor(s)

  • initial_weight (float) – Set all weights to an initial value of arg.

  • random_weights (bool) – make initial weights random

  • normal_weights (bool) – make initial weights normal

  • truncated_normal_weights (bool) – make initial weights truncated normal

  • sparse_weights (float) – Use a sparse datastructure for weights

  • input_feature_regularizer (str) – Per feature regularization input file

  • quiet (bool) – Don’t output diagnostics and progress updates

  • random_seed (integer) – seed random number generator

  • hash (str) – how to hash the features. Available options: strings, all

  • hash_seed (int) – seed for hash function

  • ignore (str) – ignore namespaces beginning with character <arg>

  • ignore_linear (str) – ignore namespaces beginning with character <arg> for linear terms only

  • keep (str) – keep namespaces beginning with character <arg>

  • redefine (str) – Redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in form ‘N:=S’ where := is operator. Empty N or S are treated as default namespace. Use ‘:’ as a wildcard in S.

  • bit_precision (integer) – number of bits in the feature table

  • b (integer) – number of bits in the feature table

  • noconstant (bool) – Don’t add a constant feature

  • constant (float) – Set initial value of constant

  • C (float) – Set initial value of constant

  • ngram (str) – Generate N grams. To generate N grams for a single namespace ‘foo’, arg should be fN.

  • skips (str) – Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-gram. To generate n-skips for a single namespace ‘foo’, arg should be fN.

  • feature_limit (str) – limit to N features. To apply to a single namespace ‘foo’, arg should be fN

  • affix (str) – generate prefixes/suffixes of features; argument ‘+2a,-3b,+1’ means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1 char prefixes for default namespace

  • spelling (str) – compute spelling features for a given namespace (use ‘_’ for default namespace)

  • dictionary (str) – read a dictionary for additional features (arg either ‘x:file’ or just ‘file’)

  • dictionary_path (str) – look in this directory for dictionaries; defaults to current directory or env{PATH}

  • interactions (str) – Create feature interactions of any level between namespaces.

  • permutations (bool) – Use permutations instead of combinations for feature interactions of same namespace.

  • leave_duplicate_interactions (bool) – Don’t remove interactions with duplicate combinations of namespaces. For ex. this is a duplicate: ‘-q ab -q ba’ and a lot more in ‘-q ::’.

  • quadratic (str) – Create and use quadratic features, q:: corresponds to a wildcard for all printable characters

  • q (str) – Create and use quadratic features, q:: corresponds to a wildcard for all printable characters

  • cubic (str) – Create and use cubic features

  • testonly (bool) – Ignore label information and just test

  • t (bool) – Ignore label information and just test

  • holdout_off (bool) – no holdout data in multiple passes

  • holdout_period (int) – holdout period for test only

  • holdout_after (int) – holdout after n training examples

  • early_terminate (int) – Specify the number of passes tolerated when holdout loss doesn’t decrease before early termination

  • passes (int) – Number of Training Passes

  • initial_pass_length (int) – initial number of examples per pass

  • examples (int) – number of examples to parse

  • min_prediction (float) – Smallest prediction to output

  • max_prediction (float) – Largest prediction to output

  • sort_features (bool) – turn this on to disregard order in which features have been defined. This will lead to smaller cache sizes

  • loss_function (str) – Specify the loss function to be used. VW uses squared by default; VWClassifier defaults to logistic. Currently available ones are squared, classic, hinge, logistic and quantile.

  • quantile_tau (float) – Parameter tau associated with Quantile loss. Defaults to 0.5

  • l1 (float) – l_1 lambda (L1 regularization)

  • l2 (float) – l_2 lambda (L2 regularization)

  • no_bias_regularization (bool) – no bias in regularization

  • named_labels (str) – use names for labels (multiclass, etc.) rather than integers; the argument specifies all possible labels, comma-separated, e.g. “--named_labels Noun,Verb,Adj,Punc”

  • final_regressor (str) – Final regressor

  • f (str) – Final regressor

  • readable_model (str) – Output human-readable final regressor with numeric features

  • invert_hash (str) – Output human-readable final regressor with feature names. Computationally expensive.

  • save_resume (bool) – save extra state so learning can be resumed later with new data

  • preserve_performance_counters (bool) – reset performance counters when warmstarting

  • output_feature_regularizer_binary (str) – Per feature regularization output file

  • output_feature_regularizer_text (str) – Per feature regularization output file, in text

  • oaa (integer) – Use one-against-all multiclass learning with labels

  • oaa_subsample (int) – subsample this number of negative examples when learning

  • ect (integer) – Use error correcting tournament multiclass learning

  • csoaa (integer) – Use cost sensitive one-against-all multiclass learning

  • wap (integer) – Use weighted all pairs multiclass learning

  • probabilities (float) – predict probabilities of all classes

  • nn (integer) – Use a sigmoidal feed-forward neural network with N hidden units

  • inpass (bool) – Train or test sigmoidal feed-forward network with input pass-through

  • multitask (bool) – Share hidden layer across all reduced tasks

  • dropout (bool) – Train or test sigmoidal feed-forward network using dropout

  • meanfield (bool) – Train or test sigmoidal feed-forward network using mean field

  • conjugate_gradient (bool) – use conjugate gradient based optimization

  • bfgs (bool) – use bfgs updates

  • hessian_on (bool) – use second derivative in line search

  • mem (int) – memory in bfgs

  • termination (float) – termination threshold

  • lda (int) – Run lda with <int> topics

  • lda_alpha (float) – Prior on sparsity of per-document topic weights

  • lda_rho (float) – Prior on sparsity of topic distributions

  • lda_D (int) – Number of documents

  • lda_epsilon (float) – Loop convergence threshold

  • minibatch (int) – Minibatch size for LDA

  • svrg (bool) – Streaming Stochastic Variance Reduced Gradient

  • stage_size (int) – Number of passes per SVRG stage

  • ftrl (bool) – Run Follow the Proximal Regularized Leader

  • coin (bool) – Coin betting optimizer

  • pistol (bool) – PiSTOL - Parameter free STOchastic Learning

  • ftrl_alpha (float) – Alpha parameter for FTRL optimization

  • ftrl_beta (float) – Beta parameter for FTRL optimization

  • ksvm (bool) – kernel svm

  • kernel (str) – type of kernel (rbf or linear (default))

  • bandwidth (int) – bandwidth of rbf kernel

  • degree (int) – degree of poly kernel

  • sgd (bool) – use regular stochastic gradient descent update

  • adaptive (bool) – use adaptive, individual learning rates

  • adax (bool) – use adaptive learning rates with x^2 instead of g^2x^2

  • invariant (bool) – use safe/importance aware updates

  • normalized (bool) – use per feature normalized updates

  • link (str) – Specify the link function - identity, logistic, glf1 or poisson

  • stage_poly (bool) – use stagewise polynomial feature learning

  • sched_exponent (int) – exponent controlling quantity of included features

  • batch_sz (int) – multiplier on batch size before including more features

  • batch_sz_no_doubling (bool) – batch_sz does not double

  • lrq (bool) – use low rank quadratic features

  • lrqdropout (bool) – use dropout training for low rank quadratic features

  • lrqfa (bool) – use low rank quadratic features with field aware weights

  • data (str) – path to data file for fitting external to sklearn

  • d (str) – path to data file for fitting external to sklearn

  • cache (str) – use a cache. default is <data>.cache

  • c (str) – use a cache. default is <data>.cache

  • cache_file (str) – path to cache file to use

  • json (bool) – enable JSON parsing

  • kill_cache (bool) – do not reuse existing cache file, create a new one always

  • k (bool) – do not reuse existing cache file, create a new one always

classes_ = array([-1.,  1.])

Binary class labels

coef_ = None

Empty sparse matrix used to check whether the model has been fit

decision_function(X)

Predict confidence scores for samples. The confidence score for a sample is the signed distance of that sample to the hyperplane.

Parameters:

X – array-like or sparse matrix, shape (n_samples, n_features). Samples.

Returns:

array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes)

Confidence scores per (sample, class) combination. In the binary case, confidence score for self.classes_[1] where >0 means this class would be predicted.
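In the binary case the sign of the score decides the label. The short sketch below spells out that rule with made-up scores; the classes list copies the documented classes_ values, and label_from_score is a hypothetical helper rather than a library function.

```python
classes = [-1.0, 1.0]  # mirrors the documented classes_ attribute


def label_from_score(score):
    """Pick classes[1] for a strictly positive score, classes[0] otherwise."""
    return classes[1] if score > 0 else classes[0]


labels = [label_from_score(s) for s in (-0.7, 0.0, 2.3)]
print(labels)  # [-1.0, -1.0, 1.0]
```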

fit(X=None, y=None, sample_weight=None)

Fit the model according to the given training data.

Parameters:
  • X – {array-like, sparse matrix} of shape (n_samples, n_features). Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y – array-like of shape (n_samples,). Target vector relative to X.

  • sample_weight – array-like of shape (n_samples,), default=None. Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

Returns:

self

predict(X)

Predict class labels for samples in X.

Parameters:

X – array-like or sparse matrix, shape (n_samples, n_features). Samples.

Returns:

Predicted class label per sample.

Return type:

array, shape [n_samples]

predict_proba(X)

Predict probabilities for samples

Parameters:

X – {array-like, sparse matrix}, shape (n_samples, n_features). Samples.

Returns:

Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.

Return type:

array-like of shape (n_samples, n_classes)
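The (n_samples, n_classes) layout can be illustrated without the library. The sketch below assumes binary logits as input and places the probability of classes_[0] in the first column and classes_[1] in the second; proba_rows is a hypothetical helper for illustration only.

```python
import math


def proba_rows(scores):
    """Hypothetical helper: turn binary logits into [P(class 0), P(class 1)] rows."""
    rows = []
    for z in scores:
        p_pos = 1.0 / (1.0 + math.exp(-z))  # probability of classes_[1]
        rows.append([1.0 - p_pos, p_pos])   # columns follow the classes_ order
    return rows


probs = proba_rows([0.0, 3.0])
print(probs[0])  # [0.5, 0.5]
```

Each row sums to 1, and a positive score puts more mass in the second column.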

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') VWClassifier

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') VWClassifier

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class vowpalwabbit.sklearn.VWMultiClassifier(probabilities=True, **kwargs)

Bases: VWClassifier

Vowpal Wabbit MultiClassifier model.

Note: this class assumes that VW.predict returns probabilities; setting probabilities=False breaks this assumption.

__init__(probabilities=True, **kwargs)

VW model constructor, exposing all supported parameters to keep sklearn happy

Parameters:
  • convert_to_vw (bool) – flag to convert X input to vw format

  • convert_labels (bool) – Convert labels of the form [0,1] to [-1,1]

  • ring_size (int) – size of example ring

  • strict_parse (bool) – throw on malformed examples

  • learning_rate (float) – Set learning rate

  • l (float) – Set learning rate

  • power_t (float) – t power value

  • decay_learning_rate (float) – Set decay factor for learning_rate between passes

  • initial_t (float) – initial t value

  • feature_mask (str) – Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.

  • initial_regressor (str) – Initial regressor(s)

  • i (str) – Initial regressor(s)

  • initial_weight (float) – Set all weights to an initial value of arg.

  • random_weights (bool) – make initial weights random

  • normal_weights (bool) – make initial weights normal

  • truncated_normal_weights (bool) – make initial weights truncated normal

  • sparse_weights (float) – Use a sparse data structure for weights

  • input_feature_regularizer (str) – Per feature regularization input file

  • quiet (bool) – Don’t output diagnostics and progress updates

  • random_seed (integer) – seed random number generator

  • hash (str) – how to hash the features; available options: strings, all

  • hash_seed (int) – seed for hash function

  • ignore (str) – ignore namespaces beginning with character <arg>

  • ignore_linear (str) – ignore namespaces beginning with character <arg> for linear terms only

  • keep (str) – keep namespaces beginning with character <arg>

  • redefine (str) – Redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in form ‘N:=S’ where := is operator. Empty N or S are treated as default namespace. Use ‘:’ as a wildcard in S.

  • bit_precision (integer) – number of bits in the feature table

  • b (integer) – number of bits in the feature table

  • noconstant (bool) – Don’t add a constant feature

  • constant (float) – Set initial value of constant

  • C (float) – Set initial value of constant

  • ngram (str) – Generate N grams. To generate N grams for a single namespace ‘foo’, arg should be fN.

  • skips (str) – Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-gram. To generate n-skips for a single namespace ‘foo’, arg should be fN.

  • feature_limit (str) – limit to N features. To apply to a single namespace ‘foo’, arg should be fN

  • affix (str) – generate prefixes/suffixes of features; argument ‘+2a,-3b,+1’ means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1 char prefixes for default namespace

  • spelling (str) – compute spelling features for a given namespace (use ‘_’ for default namespace)

  • dictionary (str) – read a dictionary for additional features (arg either ‘x:file’ or just ‘file’)

  • dictionary_path (str) – look in this directory for dictionaries; defaults to current directory or env{PATH}

  • interactions (str) – Create feature interactions of any level between namespaces.

  • permutations (bool) – Use permutations instead of combinations for feature interactions of same namespace.

  • leave_duplicate_interactions (bool) – Don’t remove interactions with duplicate combinations of namespaces. For ex. this is a duplicate: ‘-q ab -q ba’ and a lot more in ‘-q ::’.

  • quadratic (str) – Create and use quadratic features, q:: corresponds to a wildcard for all printable characters

  • q (str) – Create and use quadratic features, q:: corresponds to a wildcard for all printable characters

  • cubic (str) – Create and use cubic features

  • testonly (bool) – Ignore label information and just test

  • t (bool) – Ignore label information and just test

  • holdout_off (bool) – no holdout data in multiple passes

  • holdout_period (int) – holdout period for test only

  • holdout_after (int) – holdout after n training examples

  • early_terminate (int) – Specify the number of passes tolerated when holdout loss doesn’t decrease before early termination

  • passes (int) – Number of Training Passes

  • initial_pass_length (int) – initial number of examples per pass

  • examples (int) – number of examples to parse

  • min_prediction (float) – Smallest prediction to output

  • max_prediction (float) – Largest prediction to output

  • sort_features (bool) – turn this on to disregard order in which features have been defined. This will lead to smaller cache sizes

  • loss_function (str) – Specify the loss function to be used; defaults to squared. Currently available ones are squared, classic, hinge, logistic and quantile.

  • quantile_tau (float) – Parameter tau associated with Quantile loss. Defaults to 0.5

  • l1 (float) – l_1 lambda (L1 regularization)

  • l2 (float) – l_2 lambda (L2 regularization)

  • no_bias_regularization (bool) – no bias in regularization

  • named_labels (str) – use names for labels (multiclass, etc.) rather than integers; the argument specifies all possible labels, comma-separated, e.g. “--named_labels Noun,Verb,Adj,Punc”

  • final_regressor (str) – Final regressor

  • f (str) – Final regressor

  • readable_model (str) – Output human-readable final regressor with numeric features

  • invert_hash (str) – Output human-readable final regressor with feature names. Computationally expensive.

  • save_resume (bool) – save extra state so learning can be resumed later with new data

  • preserve_performance_counters (bool) – reset performance counters when warmstarting

  • output_feature_regularizer_binary (str) – Per feature regularization output file

  • output_feature_regularizer_text (str) – Per feature regularization output file, in text

  • oaa (integer) – Use one-against-all multiclass learning with labels

  • oaa_subsample (int) – subsample this number of negative examples when learning

  • ect (integer) – Use error correcting tournament multiclass learning

  • csoaa (integer) – Use cost sensitive one-against-all multiclass learning

  • wap (integer) – Use weighted all pairs multiclass learning

  • probabilities (float) – predict probabilities of all classes

  • nn (integer) – Use a sigmoidal feed-forward neural network with N hidden units

  • inpass (bool) – Train or test sigmoidal feed-forward network with input pass-through

  • multitask (bool) – Share hidden layer across all reduced tasks

  • dropout (bool) – Train or test sigmoidal feed-forward network using dropout

  • meanfield (bool) – Train or test sigmoidal feed-forward network using mean field

  • conjugate_gradient (bool) – use conjugate gradient based optimization

  • bgfs (bool) – use bfgs updates

  • hessian_on (bool) – use second derivative in line search

  • mem (int) – memory in bfgs

  • termination (float) – termination threshold

  • lda (int) – Run lda with <int> topics

  • lda_alpha (float) – Prior on sparsity of per-document topic weights

  • lda_rho (float) – Prior on sparsity of topic distributions

  • lda_D (int) – Number of documents

  • lda_epsilon (float) – Loop convergence threshold

  • minibatch (int) – Minibatch size for LDA

  • svrg (bool) – Streaming Stochastic Variance Reduced Gradient

  • stage_size (int) – Number of passes per SVRG stage

  • ftrl (bool) – Run Follow the Proximal Regularized Leader

  • coin (bool) – Coin betting optimizer

  • pistol (bool) – PiSTOL - Parameter free STOchastic Learning

  • ftrl_alpha (float) – Alpha parameter for FTRL optimization

  • ftrl_beta (float) – Beta parameters for FTRL optimization

  • ksvm (bool) – kernel svm

  • kernel (str) – type of kernel (rbf or linear (default))

  • bandwidth (int) – bandwidth of rbf kernel

  • degree (int) – degree of poly kernel

  • sgd (bool) – use regular stochastic gradient descent update

  • adaptive (bool) – use adaptive, individual learning rates

  • adax (bool) – use adaptive learning rates with x^2 instead of g^2x^2

  • invariant (bool) – use safe/importance aware updates

  • normalized (bool) – use per feature normalized updates

  • link (str) – Specify the link function - identity, logistic, glf1 or poisson

  • stage_poly (bool) – use stagewise polynomial feature learning

  • sched_exponent (int) – exponent controlling quantity of included features

  • batch_sz (int) – multiplier on batch size before including more features

  • batch_sz_no_doubling (bool) – batch_sz does not double

  • lrq (bool) – use low rank quadratic features

  • lrqdropout (bool) – use dropout training for low rank quadratic features

  • lrqfa (bool) – use low rank quadratic features with field aware weights

  • data (str) – path to data file for fitting external to sklearn

  • d (str) – path to data file for fitting external to sklearn

  • cache (str) – use a cache. default is <data>.cache

  • c (str) – use a cache. default is <data>.cache

  • cache_file (str) – path to cache file to use

  • json (bool) – enable JSON parsing

  • kill_cache (bool) – do not reuse existing cache file, create a new one always

  • k (bool) – do not reuse existing cache file, create a new one always
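
To make the bit_precision / b entry in the list above more concrete: a feature table with b bits has 2**b slots, and each hashed feature name is reduced to one of them. The sketch below uses `zlib.crc32` purely as a stand-in hash, so the slot numbers will not match the indices Vowpal Wabbit computes; `slot_for` is a hypothetical helper, not part of the library.

```python
import zlib


def slot_for(feature_name, bit_precision):
    """Return a table slot in [0, 2**bit_precision) for a feature name."""
    table_size = 1 << bit_precision                       # 2**b slots
    raw_hash = zlib.crc32(feature_name.encode("utf-8"))   # stand-in hash only
    return raw_hash & (table_size - 1)                    # keep the low b bits


for bits in (4, 18):
    slots = [slot_for(name, bits) for name in ("cat", "dog", "bird")]
    print(f"b={bits}: table size {1 << bits}, slots {slots}")
```

With fewer bits, distinct features are more likely to share a slot, which is the trade-off behind choosing a larger table.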

classes_ = None

Class labels

decision_function(X)

Predict confidence scores for samples. The confidence score for a sample is the signed distance of that sample to the hyperplane.

Parameters:

X – array_like or sparse matrix, shape (n_samples, n_features) Samples.

Returns:

Confidence scores per (sample, class) combination.

Return type:

array, shape=(n_samples, n_classes)
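
A score matrix of this shape is commonly turned into labels by taking, for each row, the class with the largest score. The sketch below shows that convention on made-up numbers; `predict_from_scores` is a hypothetical helper, and the text above does not state that VWMultiClassifier applies exactly this rule.

```python
def predict_from_scores(score_rows, classes):
    """Pick, for each row, the class whose score is largest."""
    predictions = []
    for row in score_rows:
        if len(row) != len(classes):
            raise ValueError("each row needs one score per class")
        best_column = max(range(len(row)), key=row.__getitem__)
        predictions.append(classes[best_column])
    return predictions


# Hypothetical scores for two samples and three classes.
scores = [[0.2, -1.0, 0.5], [1.5, 0.1, -0.3]]
print(predict_from_scores(scores, classes=[1, 2, 3]))
```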

estimator_ = None

Type of estimator to use [csoaa, ect, oaa, wap] and number of classes

fit(X=None, y=None, sample_weight=None)

Fit the model according to the given training data.

Parameters:
  • X – {array-like, sparse matrix} of shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y – array-like of shape (n_samples,) Target vector relative to X.

  • sample_weight – array-like of shape (n_samples,) default=None Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

Returns:

self

predict_proba(X)

Predict probabilities for each class.

Parameters:

X – {array-like, sparse matrix}, shape = (n_samples, n_features) Samples.

Returns:

array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes)

Confidence scores per (sample, class) combination. In the binary case, confidence score for self.classes_[1] where >0 means this class would be predicted.

Examples

>>> import numpy as np
>>> X = np.array([ [10, 10], [8, 10], [-5, 5.5], [-5.4, 5.5], [-20, -20],  [-15, -20] ])
>>> y = np.array([1, 1, 2, 2, 3, 3])
>>> from vowpalwabbit.sklearn import VWMultiClassifier
>>> model = VWMultiClassifier(oaa=3, loss_function='logistic')
>>> _ = model.fit(X, y)
>>> model.predict_proba(X)
array([[0.38926664, 0.30536669, 0.30536669],
       [0.40663728, 0.2966814 , 0.2966814 ],
       [0.52337217, 0.23831393, 0.23831393],
       [0.52698863, 0.23650573, 0.23650573],
       [0.6543135 , 0.17284323, 0.17284323],
       [0.61224902, 0.19387549, 0.19387549]])
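
As a quick sanity check on the output above, the sketch below confirms that each row behaves like a probability distribution and reports the most probable class per row. The numbers are copied from the example; the column order [1, 2, 3] is an assumption (sorted labels from y), since classes_ is not printed there.

```python
# Values copied from the predict_proba example output.
proba = [
    [0.38926664, 0.30536669, 0.30536669],
    [0.40663728, 0.2966814, 0.2966814],
    [0.52337217, 0.23831393, 0.23831393],
    [0.52698863, 0.23650573, 0.23650573],
    [0.6543135, 0.17284323, 0.17284323],
    [0.61224902, 0.19387549, 0.19387549],
]
classes = [1, 2, 3]  # assumed column order, not shown in the example

for row in proba:
    total = sum(row)
    most_probable = classes[row.index(max(row))]
    print(f"row sum = {total:.6f}, most probable class = {most_probable}")
```

In this six-sample toy run the first column is the largest in every row, so the printed probabilities do not separate the three classes.
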
set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') VWMultiClassifier

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') VWMultiClassifier

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class vowpalwabbit.sklearn.VWRegressor(convert_labels=False, **kwargs)

Bases: VW, RegressorMixin

Vowpal Wabbit Regressor model

__init__(convert_labels=False, **kwargs)

VW model constructor, exposing all supported parameters to keep sklearn happy

Parameters:
  • convert_to_vw (bool) – flag to convert X input to vw format

  • convert_labels (bool) – Convert labels of the form [0,1] to [-1,1]

  • ring_size (int) – size of example ring

  • strict_parse (bool) – throw on malformed examples

  • learning_rate (float) – Set learning rate

  • l (float) – Set learning rate

  • power_t (float) – t power value

  • decay_learning_rate (float) – Set decay factor for learning_rate between passes

  • initial_t (float) – initial t value

  • feature_mask (str) – Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.

  • initial_regressor (str) – Initial regressor(s)

  • i (str) – Initial regressor(s)

  • initial_weight (float) – Set all weights to an initial value of arg.

  • random_weights (bool) – make initial weights random

  • normal_weights (bool) – make initial weights normal

  • truncated_normal_weights (bool) – make initial weights truncated normal

  • sparse_weights (float) – Use a sparse data structure for weights

  • input_feature_regularizer (str) – Per feature regularization input file

  • quiet (bool) – Don’t output diagnostics and progress updates

  • random_seed (integer) – seed random number generator

  • hash (str) – how to hash the features; available options: strings, all

  • hash_seed (int) – seed for hash function

  • ignore (str) – ignore namespaces beginning with character <arg>

  • ignore_linear (str) – ignore namespaces beginning with character <arg> for linear terms only

  • keep (str) – keep namespaces beginning with character <arg>

  • redefine (str) – Redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in form ‘N:=S’ where := is operator. Empty N or S are treated as default namespace. Use ‘:’ as a wildcard in S.

  • bit_precision (integer) – number of bits in the feature table

  • b (integer) – number of bits in the feature table

  • noconstant (bool) – Don’t add a constant feature

  • constant (float) – Set initial value of constant

  • C (float) – Set initial value of constant

  • ngram (str) – Generate N grams. To generate N grams for a single namespace ‘foo’, arg should be fN.

  • skips (str) – Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-gram. To generate n-skips for a single namespace ‘foo’, arg should be fN.

  • feature_limit (str) – limit to N features. To apply to a single namespace ‘foo’, arg should be fN

  • affix (str) – generate prefixes/suffixes of features; argument ‘+2a,-3b,+1’ means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1 char prefixes for default namespace

  • spelling (str) – compute spelling features for a given namespace (use ‘_’ for default namespace)

  • dictionary (str) – read a dictionary for additional features (arg either ‘x:file’ or just ‘file’)

  • dictionary_path (str) – look in this directory for dictionaries; defaults to current directory or env{PATH}

  • interactions (str) – Create feature interactions of any level between namespaces.

  • permutations (bool) – Use permutations instead of combinations for feature interactions of same namespace.

  • leave_duplicate_interactions (bool) – Don’t remove interactions with duplicate combinations of namespaces. For ex. this is a duplicate: ‘-q ab -q ba’ and a lot more in ‘-q ::’.

  • quadratic (str) – Create and use quadratic features, q:: corresponds to a wildcard for all printable characters

  • q (str) – Create and use quadratic features, q:: corresponds to a wildcard for all printable characters

  • cubic (str) – Create and use cubic features

  • testonly (bool) – Ignore label information and just test

  • t (bool) – Ignore label information and just test

  • holdout_off (bool) – no holdout data in multiple passes

  • holdout_period (int) – holdout period for test only

  • holdout_after (int) – holdout after n training examples

  • early_terminate (int) – Specify the number of passes tolerated when holdout loss doesn’t decrease before early termination

  • passes (int) – Number of Training Passes

  • initial_pass_length (int) – initial number of examples per pass

  • examples (int) – number of examples to parse

  • min_prediction (float) – Smallest prediction to output

  • max_prediction (float) – Largest prediction to output

  • sort_features (bool) – turn this on to disregard order in which features have been defined. This will lead to smaller cache sizes

  • loss_function (str) – Specify the loss function to be used; defaults to squared. Currently available ones are squared, classic, hinge, logistic and quantile.

  • quantile_tau (float) – Parameter tau associated with Quantile loss. Defaults to 0.5

  • l1 (float) – l_1 lambda (L1 regularization)

  • l2 (float) – l_2 lambda (L2 regularization)

  • no_bias_regularization (bool) – no bias in regularization

  • named_labels (str) – use names for labels (multiclass, etc.) rather than integers; the argument specifies all possible labels, comma-separated, e.g. “--named_labels Noun,Verb,Adj,Punc”

  • final_regressor (str) – Final regressor

  • f (str) – Final regressor

  • readable_model (str) – Output human-readable final regressor with numeric features

  • invert_hash (str) – Output human-readable final regressor with feature names. Computationally expensive.

  • save_resume (bool) – save extra state so learning can be resumed later with new data

  • preserve_performance_counters (bool) – reset performance counters when warmstarting

  • output_feature_regularizer_binary (str) – Per feature regularization output file

  • output_feature_regularizer_text (str) – Per feature regularization output file, in text

  • oaa (integer) – Use one-against-all multiclass learning with labels

  • oaa_subsample (int) – subsample this number of negative examples when learning

  • ect (integer) – Use error correcting tournament multiclass learning

  • csoaa (integer) – Use cost sensitive one-against-all multiclass learning

  • wap (integer) – Use weighted all pairs multiclass learning

  • probabilities (float) – predict probabilities of all classes

  • nn (integer) – Use a sigmoidal feed-forward neural network with N hidden units

  • inpass (bool) – Train or test sigmoidal feed-forward network with input pass-through

  • multitask (bool) – Share hidden layer across all reduced tasks

  • dropout (bool) – Train or test sigmoidal feed-forward network using dropout

  • meanfield (bool) – Train or test sigmoidal feed-forward network using mean field

  • conjugate_gradient (bool) – use conjugate gradient based optimization

  • bgfs (bool) – use bfgs updates

  • hessian_on (bool) – use second derivative in line search

  • mem (int) – memory in bfgs

  • termination (float) – termination threshold

  • lda (int) – Run lda with <int> topics

  • lda_alpha (float) – Prior on sparsity of per-document topic weights

  • lda_rho (float) – Prior on sparsity of topic distributions

  • lda_D (int) – Number of documents

  • lda_epsilon (float) – Loop convergence threshold

  • minibatch (int) – Minibatch size for LDA

  • svrg (bool) – Streaming Stochastic Variance Reduced Gradient

  • stage_size (int) – Number of passes per SVRG stage

  • ftrl (bool) – Run Follow the Proximal Regularized Leader

  • coin (bool) – Coin betting optimizer

  • pistol (bool) – PiSTOL - Parameter free STOchastic Learning

  • ftrl_alpha (float) – Alpha parameter for FTRL optimization

  • ftrl_beta (float) – Beta parameters for FTRL optimization

  • ksvm (bool) – kernel svm

  • kernel (str) – type of kernel (rbf or linear (default))

  • bandwidth (int) – bandwidth of rbf kernel

  • degree (int) – degree of poly kernel

  • sgd (bool) – use regular stochastic gradient descent update

  • adaptive (bool) – use adaptive, individual learning rates

  • adax (bool) – use adaptive learning rates with x^2 instead of g^2x^2

  • invariant (bool) – use safe/importance aware updates

  • normalized (bool) – use per feature normalized updates

  • link (str) – Specify the link function - identity, logistic, glf1 or poisson

  • stage_poly (bool) – use stagewise polynomial feature learning

  • sched_exponent (int) – exponent controlling quantity of included features

  • batch_sz (int) – multiplier on batch size before including more features

  • batch_sz_no_doubling (bool) – batch_sz does not double

  • lrq (bool) – use low rank quadratic features

  • lrqdropout (bool) – use dropout training for low rank quadratic features

  • lrqfa (bool) – use low rank quadratic features with field aware weights

  • data (str) – path to data file for fitting external to sklearn

  • d (str) – path to data file for fitting external to sklearn

  • cache (str) – use a cache. default is <data>.cache

  • c (str) – use a cache. default is <data>.cache

  • cache_file (str) – path to cache file to use

  • json (bool) – enable JSON parsing

  • kill_cache (bool) – do not reuse existing cache file, create a new one always

  • k (bool) – do not reuse existing cache file, create a new one always
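
For readers comparing the loss_function and quantile_tau entries in the list above, the sketch below writes out textbook forms of four of the named losses for a single prediction. These are the usual definitions, not code taken from Vowpal Wabbit, whose internal versions may differ in scaling or detail; the classic option is left out, and all function names are hypothetical.

```python
import math


def squared_loss(prediction, label):
    """Squared error between prediction and label."""
    return (prediction - label) ** 2


def hinge_loss(prediction, label):
    """Hinge loss; label is expected to be -1 or +1."""
    return max(0.0, 1.0 - label * prediction)


def logistic_loss(prediction, label):
    """Logistic loss; label is expected to be -1 or +1."""
    return math.log1p(math.exp(-label * prediction))


def quantile_loss(prediction, label, tau=0.5):
    """Pinball loss: under-prediction costs tau, over-prediction 1 - tau."""
    error = label - prediction
    return tau * error if error > 0 else (tau - 1.0) * error


# With tau = 0.9, predicting too low is penalised more than too high.
print(quantile_loss(0.0, 1.0, tau=0.9), quantile_loss(1.0, 0.0, tau=0.9))
```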

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') VWRegressor

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') VWRegressor

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

vowpalwabbit.sklearn.log_logistic(X)

Compute log(1 / (1 + exp(-X))) in a numerically stable way.

This function was previously available in sklearn.utils.extmath but was removed in newer versions. We implement it locally for compatibility.

Parameters:

X – array-like

Returns:

log(1 / (1 + exp(-X)))

Return type:

array-like
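
The phrase "numerically stable" can be illustrated for a single float. The sketch below is not the library's implementation (which works on arrays); `log_logistic_scalar` is a hypothetical name. It picks, for each sign of x, the algebraically equivalent form whose exponential cannot overflow.

```python
import math


def log_logistic_scalar(x):
    """Stable log(1 / (1 + exp(-x))) for one float."""
    if x > 0:
        # exp(-x) is at most 1 here, so it cannot overflow.
        return -math.log1p(math.exp(-x))
    # For x <= 0 use the equivalent form x - log(1 + exp(x)); exp(x) <= 1.
    return x - math.log1p(math.exp(x))


for x in (-1000.0, 0.0, 1000.0):
    print(x, log_logistic_scalar(x))
```

Evaluating the formula literally would call `math.exp(1000.0)` for x = -1000.0, which raises OverflowError in Python.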

vowpalwabbit.sklearn.tovw(x, y=None, sample_weight=None, convert_labels=False)

Convert array or sparse matrix to Vowpal Wabbit format

Parameters:
  • x – {array-like, sparse matrix}, shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y – {array-like}, shape (n_samples,), optional Target vector relative to X.

  • sample_weight – {array-like}, shape (n_samples,), optional sample weight vector relative to X.

  • convert_labels – {bool} convert labels of the form [0,1] to [-1,1]

Returns:

{array-like}, shape (n_samples, 1)

Training vectors in VW string format

Examples

>>> import pandas as pd
>>> from sklearn.feature_extraction.text import HashingVectorizer
>>> from vowpalwabbit.sklearn import tovw
>>> X = pd.Series(['cat', 'dog', 'cat', 'cat'], name='catdog')
>>> y = pd.Series([-1, 1, -1, -1], name='label')
>>> hv = HashingVectorizer()
>>> hashed = hv.fit_transform(X)
>>> tovw(x=hashed, y=y)
['-1 1 | 300839:1', '1 1 | 980517:-1', '-1 1 | 300839:1', '-1 1 | 300839:1']
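
The strings in the output above follow the pattern "label weight | index:value". The sketch below reproduces that pattern for small dense rows so the format is easier to read. It is a simplified stand-in, not the library's tovw: the helper name `to_vw_lines`, the use of zero-based column indices, the skipping of zero values and the `%g`-style number formatting are all assumptions.

```python
def to_vw_lines(rows, y, sample_weight=None, convert_labels=False):
    """Format dense rows as 'label weight | index:value ...' strings."""
    lines = []
    for i, row in enumerate(rows):
        # Keep only non-zero entries, written as column_index:value.
        features = " ".join(
            f"{j}:{value:g}" for j, value in enumerate(row) if value != 0
        )
        label = y[i]
        if convert_labels:
            label = 2 * label - 1  # maps 0 -> -1 and 1 -> 1
        weight = 1 if sample_weight is None else sample_weight[i]
        lines.append(f"{label:g} {weight:g} | {features}")
    return lines


print(to_vw_lines([[0, 1.0, 0], [2.5, 0, -1]], y=[0, 1], convert_labels=True))
```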