With VW 9.0.0, use --predict_only_model when using --invert_hash or --readable_model to get the old (pre-9.0.0) behavior.
With the introduction of version 9.0.0, saved VW models allow online training to continue by default. This differs from previous versions, where --save_resume had to be specified explicitly. The change was motivated by a frequent failure mode in a common use case: when experimenting interactively with VW models (such as in a Jupyter notebook), train, test, save, and load are called repeatedly, and failing to specify --save_resume leads to unexpected behavior.
As a result, additional information is now serialized when --invert_hash or --readable_model is specified. To get the old behavior, use --predict_only_model together with --invert_hash or --readable_model.
invert_hash or readable_model
When saving a textual representation of a model in 9.0.0 with either --invert_hash or --readable_model, the default format is different: it additionally includes training resumption information. We have seen people depend on the format of the invert_hash model. If you have a script which depends on this format, you can add --predict_only_model to produce the old format and maintain script behavior.
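For scripts that shell out to VW, one way to keep the old text format is to add the flag where the command line is built. The following is a minimal sketch, not part of VW itself: the helper name vw_text_model_command is hypothetical, and it assumes a vw executable would be invoked separately with the resulting argument list. It only uses the flags named in this document.

```python
def vw_text_model_command(data_path, model_path,
                          flag="--invert_hash", predict_only=True):
    """Build the vw argument list for writing a text model.

    Hypothetical helper: it only assembles arguments and does not run vw.
    """
    if flag not in ("--invert_hash", "--readable_model"):
        raise ValueError(f"unsupported flag: {flag}")
    args = ["vw", "-d", data_path, flag, model_path]
    if predict_only:
        # Request the pre-9.0.0 text format (no training resumption state).
        args.append("--predict_only_model")
    return args


cmd = vw_text_model_command("data.txt", "model.txt")
print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine where vw is installed.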
When parsing invert_hash output, many users search for the :0 line to locate the start of the feature section. That line is not present in the default 9.0.0 invert_hash output, because :0 and :1 actually refer to save_resume:false and save_resume:true respectively, and the default output now contains :1. If you depend on the format, passing --predict_only_model is recommended.
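If a script cannot pass --predict_only_model, it can accept either marker instead of searching only for :0. The sketch below is illustrative: find_feature_section is a hypothetical name, and the sample lines are abbreviated headers in the shape shown in this document.

```python
import re

# ":0" means save_resume:false, ":1" means save_resume:true.
_MARKER = re.compile(r"^:([01])$")


def find_feature_section(lines):
    """Return (save_resume, index of the first line after the marker).

    Raises ValueError when no marker line is present.
    """
    for i, line in enumerate(lines):
        match = _MARKER.match(line.strip())
        if match:
            return match.group(1) == "1", i + 1
    raise ValueError("no ':0' or ':1' marker line found")


old_style = ["Version 9.0.0", "bits:18", "Checksum: 2704463348",
             ":0", "Constant:116060:0.07498"]
new_style = ["Version 9.0.0", "bits:18", "Checksum: 2704463348",
             ":1", "initial_t 0"]

print(find_feature_section(old_style))  # (False, 4)
print(find_feature_section(new_style))  # (True, 4)
```

Note that when the marker is :1, the returned index points at the training resumption fields, which precede the feature lines.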
With data.txt:
0 | price:.23 sqft:.25 age:.05 2006
1 2 'second_house | price:.18 sqft:.15 age:.35 1976
0 1 0.5 'third_house | price:.53 sqft:.32 age:.87 1924
invert_hash
As of version 9.0.0, vw -d data.txt --invert_hash model.txt produces:
model.txt:
Version 9.0.0
Id
Min label:0
Max label:1
bits:18
lda:0
0 ngram:
0 skip:
options:
Checksum: 2704463348
:1
initial_t 0
norm normalizer 15
t 4
sum_loss 3
sum_loss_since_last_dump 1
dump_interval 6
min_label 0
max_label 1
weighted_labeled_examples 4
weighted_labels 2
weighted_unlabeled_examples 0
example_number 3
total_features 15
total_weight 3
sd::oec.weighted_labeled_examples 3
current_pass 1
l1_state 0
l2_state 1
1924:1924:-0.145249 4 1
1976:1976:0.158838 8 1
Constant:116060:0.07498 12 1
sqft:162853:0.118057 0.5896 0.32
age:165201:0.0374855 4.0076 0.87
price:229902:0.052627 1.3828 0.53
invert_hash
Use --predict_only_model to get the pre-9.0.0 default.
As of version 9.0.0, vw -d data.txt --invert_hash model.txt --predict_only_model produces:
model.txt:
Version 9.0.0
Id
Min label:0
Max label:1
bits:18
lda:0
0 ngram:
0 skip:
options:
Checksum: 2704463348
:0
1924:1924:-0.145249
1976:1976:0.158838
Constant:116060:0.07498
sqft:162853:0.118057
age:165201:0.0374855
price:229902:0.052627
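The two outputs carry the same name:index:weight triples. The default format adds resumption fields after the marker and trailing columns on each feature line. A parser that tolerates both could look like the sketch below. The function name parse_weights is hypothetical, the samples are abbreviated from the outputs in this document, and the assumption that only the first whitespace-separated token matters is inferred from those outputs rather than from a format specification.

```python
def parse_weights(model_text):
    """Map feature name -> (hash index, weight) from an --invert_hash text model."""
    weights = {}
    in_body = False
    for raw in model_text.splitlines():
        line = raw.strip()
        if line in (":0", ":1"):
            in_body = True  # everything before the marker is header
            continue
        if not in_body or not line:
            continue
        # The first token is name:index:weight; trailing columns are ignored.
        parts = line.split()[0].rsplit(":", 2)
        if len(parts) != 3:
            continue  # resumption field such as "initial_t 0"
        name, index, weight = parts
        try:
            weights[name] = (int(index), float(weight))
        except ValueError:
            continue  # e.g. "sd::oec.weighted_labeled_examples 3"
    return weights


PREDICT_ONLY = """\
Checksum: 2704463348
:0
1924:1924:-0.145249
Constant:116060:0.07498
price:229902:0.052627
"""

DEFAULT_9 = """\
Checksum: 2704463348
:1
initial_t 0
sd::oec.weighted_labeled_examples 3
l2_state 1
1924:1924:-0.145249 4 1
Constant:116060:0.07498 12 1
price:229902:0.052627 1.3828 0.53
"""

print(parse_weights(PREDICT_ONLY) == parse_weights(DEFAULT_9))  # True
```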
This also means that binary models produced with --final_regressor/-f will contain the information required for resuming training. VW is able to consume this format automatically. If you have tooling which consumes binary models, keep this change in the default format in mind.