Numerai is a data science tournament that powers the Numerai hedge fund. Watch the meta-model video to understand how it works at a high level.
The long term vision of Numerai is to manage all the money in the world with a decentralized network of autonomous AI agents. Read our master plan to learn more.
This document is a brief overview of the tournament structure and rules. If you are new, start here!
To make good predictions, you need good data. But production grade financial data is not easy to find. Hedge funds spend millions buying and managing this data, so they keep it secret.
Numerai provides this production grade financial data for free. Our data is obfuscated to keep the actual assets and features secret while preserving underlying structure.
This is what our training_data looks like. Each id represents an asset with some abstract features, and each era is a unit period of time in history. The target is an abstract measure of performance.
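As a rough sketch of that layout, the snippet below builds a tiny stand-in table. The ids, era labels, feature names, and values are all made up for illustration and are not real Numerai data; only the column roles (id, era, features, target) come from the description above.

```python
import pandas as pd

# Made-up rows for illustration only; real ids, eras and feature names differ.
training_data = pd.DataFrame(
    {
        "id": ["n001", "n002", "n003", "n004"],
        "era": ["era1", "era1", "era2", "era2"],
        "feature_a": [0.25, 0.75, 0.50, 1.00],
        "feature_b": [0.00, 0.50, 0.25, 0.75],
        "target": [0.25, 0.75, 0.50, 0.50],
    }
).set_index("id")

# Each era groups the assets observed in the same period of time.
rows_per_era = training_data.groupby("era").size()
print(rows_per_era)
```

Grouping by era like this is usually the starting point for era-aware validation, since rows within one era are not independent of each other.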
Your task is to train a model to make predictions on the out-of-sample tournament_data. This dataset includes test hold out sets, as well as live features of the current stock market.
Here is a basic example model.
    import pandas as pd
    from xgboost import XGBRegressor

    # training data contains features and targets
    training_data = pd.read_csv("numerai_training_data.csv").set_index("id")

    # tournament data contains features only
    tournament_data = pd.read_csv("numerai_tournament_data.csv").set_index("id")
    feature_names = [f for f in training_data.columns if "feature" in f]

    # train a model to make predictions on tournament data
    model = XGBRegressor(
        max_depth=5, learning_rate=0.01, n_estimators=2000, colsample_bytree=0.1
    )
    model.fit(training_data[feature_names], training_data["target"])

    # model.predict returns a NumPy array, so wrap it in a DataFrame keyed by id
    predictions = model.predict(tournament_data[feature_names])
    submission = pd.DataFrame({"prediction": predictions}, index=tournament_data.index)

    # write the file to submit to numer.ai
    submission.to_csv("predictions.csv")
To help you get started, we have also written two detailed walkthroughs of the problem in Python and R. These guides cover key concepts such as feature importance, cross validation, consistency, overfitting, and how to use eras. Whether you are a novice or master level data scientist, we highly recommend that you go through these guides!
Every weekend, new tournament_data is released and a new round begins. To participate in the round, run the new tournament_data through your model and submit your predictions back to Numerai.
Submission files look like this. The id column must match the one in tournament_data exactly. The prediction can be any number between 0 and 1 (exclusive).
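The two rules above can be checked locally before uploading. The following is a minimal sketch, not Numerai's own validation: the function name `validate_submission` is hypothetical, and it assumes the file has exactly the two columns `id` and `prediction` and that "match exactly" means the same set of unique ids.

```python
import pandas as pd


def validate_submission(submission: pd.DataFrame, tournament_ids: pd.Index) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # Assumed layout: one id column and one prediction column.
    if list(submission.columns) != ["id", "prediction"]:
        return ["columns must be exactly ['id', 'prediction']"]
    # Assumed meaning of "match exactly": same unique ids, none missing or extra.
    if not submission["id"].is_unique or set(submission["id"]) != set(tournament_ids):
        problems.append("ids do not match tournament_data")
    preds = submission["prediction"]
    # Predictions must lie strictly between 0 and 1.
    if preds.isna().any() or not ((preds > 0) & (preds < 1)).all():
        problems.append("predictions must be strictly between 0 and 1")
    return problems


ids = pd.Index(["a1", "b2", "c3"], name="id")
good = pd.DataFrame({"id": ["a1", "b2", "c3"], "prediction": [0.2, 0.5, 0.9]})
bad = pd.DataFrame({"id": ["a1", "b2", "zz"], "prediction": [0.2, 1.0, 0.9]})
print(validate_submission(good, ids))
print(validate_submission(bad, ids))
```

Returning a list of problems rather than raising on the first one makes it easier to fix a file in a single pass.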
You can upload your submission at any time before the next round opens. However, only submissions made before Monday 14:30 UTC are considered on-time. Late submissions will not count towards your score and will not be eligible for payouts or bonuses.
For advanced users, check out Numerai Compute - a framework to help you automate your submission workflow.
Numerai measures performance based on the rank_correlation between your predictions and the true targets.
    import numpy as np

    # predictions is a pandas Series; labels holds the true targets
    # method='first' breaks ties based on order in array
    ranked_predictions = predictions.rank(pct=True, method="first")
    correlation = np.corrcoef(labels, ranked_predictions)[0, 1]
Each day (for 4 weeks) the submission gets an updated correlation score showing how well it has done so far.
If you upload new submissions each week, you will get overlapping scores of multiple submissions as shown below. Notice that there are no scores on Sundays or Mondays. These gaps correspond to the weekends when markets are closed.
Here is how the example model performed over 10 weeks. Each colored line represents the correlation of a different submission. Notice how they are staggered.
We combine these overlapping scores into a single continuous score by taking the daily marginal change in the correlation score of each submission and averaging it across all overlapping submissions. We call this average_daily_correlation, and it is the primary score on which all payouts and bonuses are based.
Here is a graph of the daily marginal changes in correlation shown above in colored dots and the average_daily_correlation in solid black.
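The averaging step can be sketched with pandas. The scores below are made up, and the text does not say how the first scoring day of a submission is treated, so this sketch simply leaves a day out of the average for a submission that has no previous score to difference against.

```python
import pandas as pd

# Made-up cumulative correlation scores: rows are scoring days, columns are
# submissions from consecutive rounds. NaN means the submission had no score yet.
scores = pd.DataFrame(
    {
        "round_1": [0.010, 0.015, 0.012, 0.020],
        "round_2": [float("nan"), 0.004, 0.006, 0.002],
    },
    index=pd.to_datetime(["2020-01-07", "2020-01-08", "2020-01-09", "2020-01-10"]),
)

# Daily marginal change of each submission's correlation score.
daily_change = scores.diff()

# Average the changes across all submissions that have a change on that day.
average_daily_correlation = daily_change.mean(axis=1)
print(average_daily_correlation)
```

Because `mean(axis=1)` skips missing values, a day on which only one submission has a change uses that single change as the average.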
Stake on your model to start earning daily payouts.
Staking requires you to lock up NMR in an Erasure smart contract agreement. This gives Numerai the ability to grief (aka burn) your stake if your performance is poor. This is also known as having "skin in the game".
Your daily payout is a function of your average_daily_correlation. For example, if your stake_value is 100 NMR and your average_daily_correlation is 0.1, your payout will be +50% and so you will earn 50 NMR. If instead your average_daily_correlation is -0.1, then your payout will be -50% and so you will lose 50 NMR.
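The two examples above can be reproduced with a small calculation. The text does not give the actual payout curve, so the sketch below assumes a straight line through the two example points (payout percentage equal to 5 times the correlation). The function name `payout_nmr` and the `slope` parameter are hypothetical, and the real curve may differ away from those two points.

```python
def payout_nmr(stake_value: float, average_daily_correlation: float,
               slope: float = 5.0) -> float:
    """Payout in NMR under an ASSUMED linear payout curve.

    The slope of 5 is chosen only so that a correlation of 0.1 maps to +50%
    and -0.1 maps to -50%, as in the examples; it is not a documented value.
    """
    payout_fraction = slope * average_daily_correlation
    return stake_value * payout_fraction


gain = payout_nmr(100, 0.1)    # the +50% example
loss = payout_nmr(100, -0.1)   # the -50% example
print(round(gain, 6), round(loss, 6))
```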
Payouts occur every day scores are updated, and the payout curve is applied to each average_daily_correlation score independently. All payouts are rolled into your stake balance, but they don't affect the stake_value used for payout calculation until the following Thursday. For example, the payouts computed from the 11th to 17th use the initial 100, but from the 18th forward until the next command, payouts will use 150 as the stake_value.
You can increase your stake at any time and it will apply next Thursday. Decreasing works similarly except it always takes an additional 4 weeks.
At the beginning of each Thursday, up to 100K NMR in stakes will be selected and eligible for payouts. If the total amount staked exceeds this, then all stakes will be selected pro rata.
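Pro rata selection can be illustrated as scaling every stake by the same factor so that the total equals the cap. This is a sketch of that reading, with made-up model names and stake amounts; the function name `select_stakes` is hypothetical.

```python
def select_stakes(stakes: dict, cap: float = 100_000.0) -> dict:
    """Scale all stakes down proportionally when their total exceeds the cap."""
    total = sum(stakes.values())
    if total <= cap:
        # Under the cap, every stake is selected in full.
        return dict(stakes)
    scale = cap / total
    return {name: amount * scale for name, amount in stakes.items()}


# Made-up stakes totalling 200K NMR, twice the 100K cap.
stakes = {"model_a": 120_000.0, "model_b": 60_000.0, "model_c": 20_000.0}
selected = select_stakes(stakes)
print(selected)
```

With a total of 200K NMR, each stake is scaled by one half, so the relative sizes of the stakes are preserved.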
Maintaining a high average_daily_correlation over time earns you a place on the leaderboard and a large daily bonus.
Your rank on the leaderboard depends on your reputation, which is the sum of your average_daily_correlation over the past 100 days.
Any days with a missing average_daily_correlation score will be filled with -0.005. We call this adjusted score average_daily_correlation_penalized and will use it to compute your reputation instead. This means that new users start with a reputation of -0.5 (100 days at -0.005). This also means that if you have been submitting weekly, you would need to miss 4 submissions in a row to be penalized.
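The reputation arithmetic can be sketched as follows. The function name `reputation` is hypothetical, and the sketch assumes the input holds one entry per day in the window, with days before a user joined counted as missing.

```python
import pandas as pd

WINDOW_DAYS = 100
MISSING_DAY_FILL = -0.005


def reputation(daily_scores: pd.Series) -> float:
    """Sum of the last 100 daily scores, with missing days filled by -0.005."""
    window = daily_scores.iloc[-WINDOW_DAYS:]
    # Days not present at all (for example before the user joined) count as missing.
    n_absent = WINDOW_DAYS - len(window)
    penalized = window.fillna(MISSING_DAY_FILL)
    return float(penalized.sum() + n_absent * MISSING_DAY_FILL)


new_user = reputation(pd.Series([], dtype=float))
# Made-up history: 60 scored days at 0.01 each, the other 40 days missing.
partial = reputation(pd.Series([0.01] * 60))
print(round(new_user, 6), round(partial, 6))
```

A new user has 100 missing days, which gives 100 times -0.005, matching the starting reputation of -0.5 described above.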
Your bonus is a function of your rank amongst all staked models (otherwise known as staked_rank) and your stake_value at the beginning of the 100 day window. For example, if your stake_value was 100 NMR at the beginning of the window and your staked_rank is 1, then you will get a 5 NMR bonus.
Like payouts, bonuses are paid into your stake balance. The max bonus paid out per day is 250 NMR across all models. If the total bonus amount exceeds this, then all bonuses will be paid pro rata.
We reserve the right to refund your stake and void all earnings and burns if we believe that you are actively abusing or exploiting the payout rules.
Need help with anything?
Find us on RocketChat for questions, support, and feedback!