Models
Tutorials
The best way to learn about building models on our data is through our tutorials:
Benchmark Models
Numerai Benchmark Models are a set of standard models built by the Numerai team. Their predictions are released every round so that anyone can submit them, and stake on them, if they wish. These models are an easy way to compare your model against the current state of the art.
The list of models and their recent performance is here: numer.ai/~benchmark_models
Download
The validation and live predictions are available through the API.
How they are made
Walk Forward Cross Validation
All predictions are made using a Walk-Forward framework. This means all predictions are made using models which were trained only on data which was available prior to the date of the prediction being made.
Specifically, the data is split into chunks of 156 eras. For each chunk, the predictions come from a model trained on all eras before first_era_of_chunk - purge_eras. The number of purge_eras is always 8 for 20D targets and 16 for 60D targets.
For example, with 20D targets, a model is trained on eras 1 through 148, eras 149 through 156 are purged, and the model predicts eras 157 through 312. Next, a model is trained on eras 1 through 304, eras 305 through 312 are purged, and it predicts eras 313 through 468, and so on. Your walk-forward validation windows should look something like this:
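Window | Train start | Train end | Predict start | Predict end
1 | 1 | 148 | 157 | 312
2 | 1 | 304 | 313 | 468
3 | 1 | 460 | 469 | 624
4 | 1 | 616 | 625 | 780
... | ... | ... | ... | ...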
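The window arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not Numerai's own code: the function name `walk_forward_windows` and its parameters are hypothetical, eras are treated as 1-indexed integers, and truncating the final chunk at the last available era is an assumption.

```python
def walk_forward_windows(n_eras, chunk_size=156, purge_eras=8):
    """Return (train_start, train_end, predict_start, predict_end) tuples.

    All bounds are inclusive and 1-indexed. purge_eras=8 matches the 20D
    targets described above; use 16 for 60D targets.
    """
    windows = []
    predict_start = chunk_size + 1  # the first chunk is only ever used for training
    while predict_start <= n_eras:
        # Assumption: the last chunk is cut short if fewer than chunk_size eras remain.
        predict_end = min(predict_start + chunk_size - 1, n_eras)
        # Train on every era strictly before (predict_start - purge_eras).
        train_end = predict_start - purge_eras - 1
        windows.append((1, train_end, predict_start, predict_end))
        predict_start += chunk_size
    return windows


windows = walk_forward_windows(n_eras=780)
for window in windows:
    print(window)
```

With 780 eras and the 20D purge of 8, this reproduces the four windows listed in the table.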
Standard Large LGBM params
Most models use the following LGBM parameters:
Deep LGBM params
We've found that having more trees can be helpful, and that fewer trees with more depth can achieve similar results with lower compute requirements. You can read more about this hyper-parameter research in this forum post.
After the release of v5 data, we announced the higher performance "deep" parameters we used to train the v5 benchmark models:
Ensembles
All of the ensembles use the following steps:
1. gaussianize each of the predictions on a per-era basis
2. standardize to standard deviation 1
3. dot-product the predictions with a weights vector representing the desired weight on each model
4. gaussianize the resulting vector
5. (if applicable) neutralize the vector
Steps 1 through 4 look something like this:
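A minimal sketch of steps 1 through 4, assuming pandas, NumPy, and SciPy. The function names (`gaussianize`, `ensemble_era`, `ensemble`), the rank-based gaussianization with a 0.5 offset, and the choice to standardize within each era are illustrative assumptions rather than Numerai's exact implementation.

```python
import numpy as np
import pandas as pd
from scipy import stats


def gaussianize(values: pd.Series) -> pd.Series:
    """Map values to normal scores via their ranks (assumed method)."""
    # Shift ranks by 0.5 so the percentiles stay strictly inside (0, 1).
    pct = (values.rank(method="average") - 0.5) / values.count()
    return pd.Series(stats.norm.ppf(pct.to_numpy()), index=values.index)


def ensemble_era(preds: pd.DataFrame, weights) -> pd.Series:
    """Combine one era's predictions; preds has one column per model."""
    g = preds.apply(gaussianize)            # step 1: gaussianize each model
    g = g / g.std(ddof=0)                   # step 2: standard deviation 1
    combined = g.dot(np.asarray(weights))   # step 3: weighted combination
    return gaussianize(combined)            # step 4: gaussianize the result


def ensemble(preds: pd.DataFrame, eras: pd.Series, weights) -> pd.Series:
    """Apply ensemble_era to each era and restore the original row order."""
    parts = [ensemble_era(group, weights) for _, group in preds.groupby(eras)]
    return pd.concat(parts).reindex(preds.index)


# Toy data: two hypothetical models, two eras of four rows each.
rng = np.random.default_rng(0)
preds = pd.DataFrame(rng.random((8, 2)), columns=["model_a", "model_b"])
eras = pd.Series([1, 1, 1, 1, 2, 2, 2, 2], index=preds.index)
blend = ensemble(preds, eras, weights=[0.5, 0.5])
```

A 50/50 blend of two models corresponds to `weights=[0.5, 0.5]`. The sketch assumes a unique row index and no missing predictions.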
Neutralization
A couple of the models involve neutralization. This means running a regression to find your predictions' exposures to each feature, then subtracting those exposures from your predictions vector so that the result is orthogonal to all of those features.
Here's the code to neutralize some set of vectors (columns) by some list of features (neutralizers):
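One way this could look, as a hedged sketch rather than Numerai's exact code: a least-squares fit of each column on the neutralizers, followed by subtraction of the fitted part. The `proportion` parameter, the added constant column, and the final rescaling to unit standard deviation are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def neutralize(df: pd.DataFrame, neutralizers, proportion: float = 1.0) -> pd.DataFrame:
    """Remove the part of each column of df that is linearly explained by neutralizers.

    proportion (assumed parameter) scales how much exposure is removed;
    1.0 makes the result orthogonal to every neutralizer.
    """
    X = np.asarray(neutralizers, dtype=float)
    # Assumption: add a constant column so the mean is regressed out as well.
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    Y = df.to_numpy(dtype=float)
    # Least-squares regression of every column of df on the neutralizers.
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    neutral = Y - proportion * (X @ beta)
    # Assumption: rescale each neutralized column to standard deviation 1.
    neutral = neutral / neutral.std(axis=0)
    return pd.DataFrame(neutral, index=df.index, columns=df.columns)


# Toy data: predictions that are partly a linear function of five features.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 5))
raw = pd.DataFrame({"prediction": features @ rng.normal(size=5) + rng.normal(size=200)})
neutral = neutralize(raw, features)
```

In practice you would likely call this era by era, for example inside a loop over the eras of a predictions DataFrame, so that exposures are removed within each era rather than across the whole dataset.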
What are they?
The naming formula for many benchmarks is as follows:
{data_version}_LGBM_{target}
There are many models that have some combination of a data version (V2, V3, V4, V41, V42, V43, V5) and a target (e.g. cyrusd_20, teager2b_20, etc.). These are models trained in the standard walk-forward way, with standard LGBM parameters, using the specified data version and target. That's all!
There are also unique models we created that don't have that naming scheme:
V5_LGBM_CT_BLEND (coming soon)
This is a simple 50/50 blend of V5_LGBM_TEAGER2B20 and V5_LGBM_CYRUSD20.
The following models are on the Benchmark Models page, but their predictions aren't present in the predictions files because they are easily reproducible:
INTEGRATION_TEST - Submits our favorite model at the time. This has transitioned through the V2, V3, and V4 example predictions. It is now V5_LGBM_CT_BLEND.
NB_HELLO_NUMERAI - Submits the model created by the default Hello Numerai tutorial notebook.
NB_FEATURE_NEUTRAL - Submits the model created by the feature neutralization tutorial notebook.
NB_TARGET_ENSEMBLE - Submits the model created by the target ensemble tutorial notebook.
NB_EXAMPLE_MODEL - Submits the model created by the barebones example_model notebook.
Community Models
The Numerai community has also developed Numerbay, a website to buy and sell models built by and for the Numerai community. Keep in mind that Numerai does not guarantee the performance of any model listed on Numerbay.