Overview

Everything you need to know to get started in under 5 minutes!

Introduction

Numerai is a data science competition where you build machine learning models to predict the stock market.

Data

Start with Numerai's free dataset made of clean and regularized financial data.
The dataset is obfuscated so that it can be given out for free and modeled without any financial domain knowledge.
Figure: Numerai's obfuscated dataset
Each row in the dataset corresponds to a stock at a specific point in time, represented by the era. The features are quantitative attributes (e.g., P/E ratio) known about the stock at that time, and the target is a measure of stock market returns 20 days into the future.
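To make the row layout concrete, here is a minimal sketch using a tiny invented table. The column names era, feature_a and feature_b, the id index, and all values are assumptions for illustration only; the real dataset is described in the Data section.

```python
import pandas as pd

# Toy stand-in for the dataset layout. All ids, eras and values are invented.
toy = pd.DataFrame(
    {
        "era": ["0001", "0001", "0002", "0002"],
        "feature_a": [0, 4, 2, 1],
        "feature_b": [3, 1, 2, 0],
        "target": [0.25, 0.75, 0.5, 0.5],
    },
    index=pd.Index(["id_1", "id_2", "id_3", "id_4"], name="id"),
)

# Select feature columns by name, the same way the modeling example does.
feature_cols = [c for c in toy.columns if "feature" in c]

# Each era is one point in time, so rows are naturally grouped by era.
rows_per_era = toy.groupby("era").size()

print(feature_cols)
print(rows_per_era.to_dict())
```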
See the Data section for more details.

Modeling

Your objective is to build machine learning models to predict the target.
Here is an example model in Python using LightGBM, but you can use any language or framework that you like.
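import lightgbm as lgb

# training_data is assumed to be a pandas DataFrame holding the training set
model = lgb.LGBMRegressor(
    n_estimators=2000,
    learning_rate=0.01,
    max_depth=5,
    num_leaves=2 ** 5,
    colsample_bytree=0.1
)
# Fit on every column whose name contains "feature", predicting the target
model.fit(
    training_data[[f for f in training_data.columns if "feature" in f]],
    training_data["target"]
)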
See the Models section for more examples.

Submissions

Every business day, new live features are released which represent the current state of the stock market. Your job is to generate live predictions and submit them to Numerai.
Here is an example of how you generate and upload live predictions in Python:
import numerapi
import pandas as pd

# Authenticate
napi = numerapi.NumerAPI("api-public-id", "api-secret-key")
# Get current round
current_round = napi.get_current_round()
# Download latest live features
napi.download_dataset(f"v4.2/live_int8_{current_round}.parquet")
live_data = pd.read_parquet(f"v4.2/live_int8_{current_round}.parquet")
live_features = live_data[[f for f in live_data.columns if "feature" in f]]
# Generate live predictions
live_predictions = model.predict(live_features)
# Format submission
submission = pd.Series(live_predictions, index=live_features.index).to_frame("prediction")
submission.to_csv(f"prediction_{current_round}.csv")
# Upload submission
napi.upload_predictions(f"prediction_{current_round}.csv", model_id="your-model-id")
A submission is a CSV file containing the index of the live data and a single column named prediction.
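As a rough illustration of that file format, the sketch below builds a submission from three invented predictions using the same pandas calls as the example above. The ids, the index name id, and the prediction values are hypothetical.

```python
import pandas as pd

# Hypothetical live index and predictions; real ids come from the live data.
live_index = pd.Index(["id_a", "id_b", "id_c"], name="id")
live_predictions = [0.48, 0.52, 0.61]

# Same formatting step as in the upload example: one "prediction" column.
submission = pd.Series(live_predictions, index=live_index).to_frame("prediction")

# With no path given, to_csv returns the CSV content as a string.
csv_lines = submission.to_csv().splitlines()
for line in csv_lines:
    print(line)
```

The first printed line is the header row, followed by one row per id.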
See the Submissions section for more details and examples.

Scoring

Submissions are scored against two main metrics:
  • Correlation (CORR): Your prediction's correlation to the target
  • True contribution (TC): Your prediction's contribution to the hedge fund's returns
Since the target is a measure of 20-day stock market returns, it takes 20 days for each submission to be scored.
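To give a feel for a correlation-based score, here is a minimal sketch that computes a plain rank correlation between predictions and targets within each era. This is an assumption for illustration, not necessarily the exact CORR definition (see the Scoring section for that), and the data is invented. TC is not sketched here because this page does not define how it is computed.

```python
import pandas as pd

# Invented predictions and targets for two eras of four stocks each.
toy = pd.DataFrame(
    {
        "era": ["0001"] * 4 + ["0002"] * 4,
        "prediction": [0.1, 0.4, 0.6, 0.9, 0.9, 0.6, 0.4, 0.1],
        "target": [0.0, 0.25, 0.75, 1.0, 0.0, 0.25, 0.75, 1.0],
    }
)


def era_rank_corr(group: pd.DataFrame) -> float:
    """Pearson correlation of the ranks, i.e. a Spearman rank correlation."""
    return group["prediction"].rank().corr(group["target"].rank())


# Score each era separately, then average across eras.
per_era = {era: era_rank_corr(g) for era, g in toy.groupby("era")}
mean_corr = sum(per_era.values()) / len(per_era)

print(per_era)
print(mean_corr)
```

In this toy data the first era is ranked perfectly and the second is ranked exactly backwards, so the two per-era scores cancel out in the mean.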
See the Scoring section for more details.

Staking

When you are ready and confident in your model's performance, you may stake it with NMR, Numerai's cryptocurrency.
After the 20 days of scoring for each submission, models with positive scores are rewarded with more NMR, while those with negative scores have a portion of their staked NMR burned.
Behind the scenes, Numerai combines the predictions of all staked models into the stake-weighted Meta Model, which in turn is fed into the Numerai Hedge Fund for trading.
Staking serves two important functions:
  1. "Skin in the game" allows Numerai to trust the quality of staked predictions.
  2. Payouts and burns continuously improve the weights of the Meta Model.
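The idea of a stake-weighted combination can be sketched as a weighted average in which each model's weight is its share of the total stake. This is only an illustration of the concept under that assumption; the model names, stakes, and predictions are invented, and the actual Meta Model construction may involve further steps not described on this page.

```python
import pandas as pd

# Hypothetical predictions from three staked models for the same three stocks.
predictions = pd.DataFrame(
    {
        "model_a": [0.2, 0.5, 0.9],
        "model_b": [0.8, 0.5, 0.1],
        "model_c": [0.3, 0.6, 0.7],
    },
    index=pd.Index(["id_1", "id_2", "id_3"], name="id"),
)

# Hypothetical stake (in NMR) behind each model.
stakes = pd.Series({"model_a": 10.0, "model_b": 30.0, "model_c": 60.0})

# Each model's weight is its share of the total stake.
weights = stakes / stakes.sum()

# Weighted average across models, one combined prediction per stock.
meta_model = predictions.mul(weights, axis="columns").sum(axis="columns")

print(weights.to_dict())
print(meta_model.round(2).to_dict())
```

Under this sketch, a payout raises a model's stake and therefore its weight, while a burn lowers it, which is one way to read the second function of staking listed above.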
See the Staking section for more details.

Support

Find us on Discord for questions, support, and feedback!