This tutorial demonstrates how to approach a regression problem with Vowpal Wabbit. It features an overview of a linear regression problem using a Vowpal Wabbit workflow tutorial with examples, introduces unique Vowpal Wabbit features, and explains how to structure input and understand the results.
To install Vowpal Wabbit, and for more information on building Vowpal Wabbit from source or using a package manager, see Get Started.
Note: See Command Line Tutorial for Vowpal Wabbit command line basics and a quick introduction to training and testing your model. See Python Tutorial to explore the basics for using Python to pass some data to Vowpal Wabbit to learn a model and get a prediction.
Before we begin making predictions for regression problems, we need to create a dataset. For example, say we want to predict whether a house will require a new roof in the next 10 years.
Create a training-set file in Vowpal Wabbit
house_dataset and copy the following dataset:
0 | price:.23 sqft:.25 age:.05 2006 1 2 'second_house | price:.18 sqft:.15 age:.35 1976 0 1 0.5 'third_house | price:.53 sqft:.32 age:.87 1924
Vowpal Wabbit hashes feature names into in-memory indexes by default unless the feature names are positive integers.
For example, in the first line of the
house_dataset example, the first three features use an index derived from a hash function while the last feature uses index
2006 directly. Also, the first three features have explicit values (
.05 respectively) while the last,
2006 has an implicit default value of
0 | price:.23 sqft:.25 age:.05 2006
0label corresponds to no roof-replacement, while a
1label corresponds to a roof-replacement.
|separates label related data (what we want to predict) from features (what we always know).
2006. Each feature may have an optional
:<numeric_value>following it or, if the value is missing, an implied value of
The label information for the second line is more complex:
1 2 'second_house | price:.18 sqft:.15 age:.35 1976
1is the label indicating that a roof-replacement is required.
2is an optional importance weight which implies that this example counts twice. Importance weights come up in many settings.
'second_houseis the tag. See Vowpal Wabbit live diagnostics section for more on importance weight.
The third line is more straightforward, except for an additional number. In the label information following the importance weight, the
0.5 is an initial prediction.:
0 1 0.5 'third_house | price:.53 sqft:.32 age:.87 1924
Sometimes you have multiple interacting learning systems and want to be able to predict an offset rather than an absolute value.
Now, we learn:
Num weight bits = 18 learning rate = 0.5 initial_t = 0 power_t = 0.5 using no cache Reading datafile = house_dataset num sources = 1 average since example example current current current loss last counter weight label predict features 0.000000 0.000000 1 1.0 0.0000 0.0000 5 0.666667 1.000000 2 3.0 1.0000 0.0000 5
finished run number of examples = 3 weighted example sum = 4.000000 weighted label sum = 2.000000 average loss = 0.750000 best constant = 0.500000 best constant’s loss = 0.250000 total feature number = 15
This section provides information on the various types of diagnostic output Vowpal Wabbit presents.
--quiet command to turn off diagnostic information in Vowpal Wabbit.
The following output shows the number of bits from the hash function:
Num weight bits = 18
This diagnostic ouput shows that the number of bits from the hash function is 18 (more than enough for this example).
-b bits to adjust the number of bits to be used from the hash function.
The following output shows the learning rate:
learning rate = 0.5
The default learning rate is
0.5 with current default update (
--normalized --invariant --adaptive).
If the data is noisy, you need a larger data-set or multiple passes to predict well. For massive data-sets, the learning rate decays towards
0 by default.
-l rate to adjust the learning rate up or down.
Note: A higher learning rate makes the model converge faster, but if you adjust the learning rate too high, you risk over-fit and end-up worse on average.
The following output shows the initial time for learning rate decay:
initial_t = 0
Note: Learning rates often decay over time, and this diagnostic output specifies the initial time. You can adjust with
--initial_t time, although this is rarely necessary these days.
The following output specifies the power on the learning rate decay:
power_t = 0.5
The default is
0.5 and a minimax optimal choice that works well for most problems in Vowpal Wabbit. A different way of stating this is: stationary data-sets where the fundamental relation between the input features and target label are not changing over time, should benefit from a high (close to 1.0)
--power_t while learning against changing conditions, like learning against an adversary who continuously changes the rules-of-the-game, would benefit from low (close to 0)
--power_t so the learner can react quickly to these changing conditions.
Note: You can adjust this
--power_t pwhere p is in the range [0,1]. 0 means the learning rate does not decay, which can be helpful when state tracking, while 1 is very aggressive, but plausibly optimal for IID data-sets.
The following output shows that you are not using a cache file:
using no cache
A cache file contains our dataset in a faster to handle format and can greatly speed up training if we use multiple passes or run multiple experiments on the same dataset (even with different options). The default cache file name is the dataset file name with
--cache_file housing.cache to override the default cache file name.
The cache file is created the first time you use
-c. If the cache exists and is newer than the dataset, that file is used by default.
-c for multiple passes
--passes, so Vowpal Wabbit caches the data in a faster format (passes > 1 should be much faster). If you want to experiment with the same dataset over and over, it is highly recommended to pass
-c every time you train.
The following output shows the source of the data:
Reading datafile = house_dataset
Note: There are many different ways to input data to Vowpal Wabbit. Here we’re just using a simple text file and Vowpal Wabbit tells us the source of the data. Alternative sources include cache files (from previous runs), stdin, or a tcp socket.
The following output shows the number of data sources:
num sources = 1
There is only one input file in this example, but we can specify multiple files.
Vowpal Wabbit prints live diagnostic information in the header like the following:
average since example example current current current loss last counter weight label predict features 0.000000 0.000000 1 1.0 0.0000 0.0000 5 0.666667 1.000000 2 3.0 1.0000 0.0000 5
average lossoutput computes the progressive validation loss. The critical thing to understand here is that progressive validation loss deviates like a test set, and hence is a reliable indicator of success on the first pass over any data-set.
since lastoutput is the progressive validation loss since the last printout.
example counteroutput tells you which example is printed. In this case, it’s example
example weightoutput tells you the sum of the importance weights of examples seen so far. In this case it’s
3.0, because the second example has an importance weight of
current labeloutput tells you the label of the second example.
current predictoutput tells you the prediction (before training) on the current example.
current featuresoutput tells you the amount of features in the current example.
current features diagnostic is great for debugging. Note that we have five features when you expect four. This happens because Vowpal Wabbit always adds a default constant feature.
--noconstant command-line option to turn it off.
Vowpal Wabbit prints a new line with an exponential backoff. This is very handy, because we can often debug a problem before the learning algorithm finishes going through a data-set.
finished run number of examples = 3 weighted example sum = 4.000000 weighted label sum = 2.000000 average loss = 0.750000 best constant = 0.500000 best constant's loss = 0.250000 total feature number = 15
At the end, some more straightforward totals are printed. The
best constant and
best constant's loss only work if you are using squared loss. Squared loss is the Vowpal Wabbit default. They compute the best constant’s predictor and the loss of the best constant predictor.
average loss is not better than
best constant's loss, something is wrong. In this case, we have too few examples to generalize.
If you want to overfit, use the following:
vw house_dataset -l 10 -c --passes 25 --holdout_off
The progress section of the output is:
average since example example current current current loss last counter weight label predict features 0.000000 0.000000 1 1.0 0.0000 0.0000 5 0.666667 1.000000 2 3.0 1.0000 0.0000 5 0.589385 0.531424 5 7.0 1.0000 0.2508 5 0.378923 0.194769 11 15.0 1.0000 0.8308 5 0.184476 0.002182 23 31.0 1.0000 0.9975 5 0.090774 0.000000 47 63.0 1.0000 1.0000 5
You’ll notice that by example 47 (25 passes over 3 examples result in 75 examples), the
since last column has dropped to
0, implying that by looking at the same (three lines) of data 25 times we have reached a perfect predictor. This is unsurprising with three examples having five features each.
The reason we have to add
--holdout_off is that when running multiple-passes, Vowpal Wabbit automatically switches to ‘over-fit avoidance’ mode by holding-out 10% of the examples (the “1 in 10” period can be changed using
--holdout_period period) and evaluating performance on the held-out data instead of using the online-training progressive loss.
Vowpal Wabbit learns the weights of the features and keeps them in an in memory vector by default.
-f filename to save the final regressor weights to a file.
vw house_dataset -l 10 -c --passes 25 --holdout_off -f house.model
We can make predictions in Vowpal Wabbit by supplying the
For example, to output them to standard out
vw house_dataset -p /dev/stdout --quiet
0.000000 0.000000 second_house 1.000000 third_house
0.000000refers to the first example which has an empty tag.
0.000000 second_houserefers to the second example. Notice that the tag appears here. The primary use of the tag is mapping predictions to the corresponding examples.
1.000000 third_houserefers to the third example. The initial prediction was set to
0.5, and the prediction is now
1.000000. This means some learning occurred.
In the last example, Vowpal Wabbit predicted while it learned. The model was being built in memory incrementally, as it went over the examples.
It is more common to learn first, then save the model to a file. Then, you make predictions using that saved model.
-i house.model to load the initial model to memory. Add
-t to specify test-only (do no learning):
vw -i house.model -t house_dataset -p /dev/stdout --quiet
0.000000 1.000000 second_house 0.000000 third_house
Obviously the results are different this time, because in the first prediction example, we learned as we went, and made only one pass over the data. For the second example, we loaded an over-fitted (25 pass) model and used our dataset
-t (testing only mode).
Note: Always use a different dataset for testing vs training for real prediction settings.
Vowpal Wabbit has a built in
--audit option that is helpful for debugging a machine learning application.
--audit to output helpful information about predictions and features:
vw house_dataset --audit --quiet
0 price:229902:0.23:0@0 sqft:162853:0.25:0@0 age:165201:0.05:0@0 2006:2006:1:0@0 Constant:116060:1:0@0 0 second_house price:229902:0.18:0@0 sqft:162853:0.15:0@0 age:165201:0.35:0@0 1976:1976:1:0@0 Constant:116060:1:0@0 1 third_house price:229902:0.53:firstname.lastname@example.org age:165201:0.87:email@example.com sqft:162853:0.32:firstname.lastname@example.org Constant:116060:1:0.15882@8 1924:1924:1:0@0
Every example uses two lines:
The first feature listed is:
The original feature name is
price. Vowpal Wabbit has an advanced namespaces option that allows us to group features and operate them on-the-fly. If we use a namespace, it appears before
Namespace options include the following:
-q XYto cross a pair of namespaces.
--cubic XYZto cross 3 namespaces.
--lrq XYnlow-rank quadratic interactions.
--ignore Xskip all features belonging to a namespace.
Now, let’s return to the first feature listed again:
229902, computed by a hash function on the feature name.
@0.25(when you use per-feature adaptive learning rates).
Notice that the feature
2006 uses the index 2006. This means that you may use hashes or pre-computed indices for features, as is common in other machine learning systems.
The advantage of using unique integer-based feature names is that they are guaranteed not to collide after hashing. The advantage of free-text (non integer) feature names is readability and self-documentation.
|, and spaces are special to the Vowpal Wabbit parser, we can give features easy-to-read names. For example:
height>2 value_in_range[1..5] color=red
We can even start feature names with a digit. For example:
This tutorial only describes a fraction of Vowpal Wabbit’s capabilities. To explore more about other Vowpal Wabbit features and performance — loss functions, optimizers, and representations — including ridiculously fast active learning with clusters of thousands of machines, see the following resources: