Definitions

Overview of all definitions and vocabulary used to speak about scoring

The definitions below cover the vast majority of the functions and statistical tools used to publish scores. The open-source code is available at numerai-tools/scoring. Install the package with:

> pip install numerai_tools

Statistics

tie-broken rank
- percentile rank a series
- break ties based on id / index
tie-kept rank
- percentile rank a series
- for each set of ties set their ranks to the average of that set's tie-broken ranks
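The two ranking schemes can be sketched in plain Python as follows. The function names and the (rank - 0.5) / n percentile convention are assumptions made for illustration; the numerai_tools implementation may differ in detail.

```python
def tie_broken_rank(values):
    """Percentile rank where ties are broken by position in the series."""
    n = len(values)
    # sorted() is stable, so tied values keep their original index order.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for position, i in enumerate(order, start=1):
        # Assumed convention: centre each rank inside its 1/n-wide slot.
        ranks[i] = (position - 0.5) / n
    return ranks


def tie_kept_rank(values):
    """Percentile rank where tied values share the mean of their tie-broken ranks."""
    broken = tie_broken_rank(values)
    groups = {}
    for value, rank in zip(values, broken):
        groups.setdefault(value, []).append(rank)
    return [sum(groups[v]) / len(groups[v]) for v in values]


scores = [0.3, 0.1, 0.3, 0.9]
print(tie_broken_rank(scores))  # [0.375, 0.125, 0.625, 0.875]
print(tie_kept_rank(scores))    # [0.5, 0.125, 0.5, 0.875]
```

The two values of 0.3 receive distinct ranks in the first output and the same averaged rank in the second.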
correlation
- correlation coefficient between two series
spearman correlation
- spearman correlation coefficient between live target and predictions
- differs from tie-broken-rank correlation because spearman ranking keeps ties by assigning each tied group its mean rank
pearson correlation
- pearson correlation coefficient between live target and predictions
- differs from the other correlations because pearson does not use ranking
tie-broken-rank correlation
- correlation between the live target and tie-broken ranked predictions (with sorted index, no NaNs)
- NOTE: This is a pearson correlation computed on ranked predictions, so it behaves more like a spearman correlation. It is impossible to achieve a correlation of 1.0 because the targets still have ties but the predictions do not.
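To illustrate how the three correlations differ, the sketch below scores a hypothetical set of predictions that is identical to a binned target. The helper names and the data are made up for this example and are not part of numerai_tools.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank(values, keep_ties):
    """1-based ranks; ties get their mean rank if keep_ties, else index order."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable sort
    ranks = [0.0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = float(position)
    if keep_ties:
        groups = {}
        for value, r in zip(values, ranks):
            groups.setdefault(value, []).append(r)
        ranks = [sum(groups[v]) / len(groups[v]) for v in values]
    return ranks


target = [0.0, 0.25, 0.25, 0.5, 0.5, 1.0]  # binned target, so it contains ties
predictions = list(target)                 # hypothetical "perfect" predictions

pearson_corr = pearson(target, predictions)
spearman_corr = pearson(rank(target, True), rank(predictions, True))
tie_broken_corr = pearson(target, rank(predictions, False))

print(pearson_corr, spearman_corr, tie_broken_corr)
```

On this data the pearson and spearman values are 1.0 up to floating-point error, while the tie-broken-rank correlation comes out at roughly 0.94, because breaking ties in the predictions removes the ties that the target still has.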
variance normalize
- given vector s, normalize its standard deviation to 1
power x / pow x
- given vector s, raise the absolute value of each element of s to some power x, then restore the element's original sign
gaussianize
- given vector s of percentile ranks, map each value through the inverse CDF of the standard normal distribution so that s is approximately normally distributed
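A minimal sketch of these three transforms, using only the Python standard library. It rests on three assumptions: power keeps the sign of each value, gaussianize maps percentile ranks in (0, 1) through the inverse standard normal CDF, and the population standard deviation is used for normalization. The function names are illustrative.

```python
from math import copysign
from statistics import NormalDist, pstdev


def variance_normalize(s):
    """Scale s so that its (population) standard deviation is 1."""
    sd = pstdev(s)
    return [v / sd for v in s]


def power(s, x):
    """Raise the magnitude of each value to the power x, keeping its sign."""
    return [copysign(abs(v) ** x, v) for v in s]


def gaussianize(ranks):
    """Map percentile ranks in (0, 1) to standard normal quantiles (assumed)."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return [standard_normal.inv_cdf(r) for r in ranks]


print(pstdev(variance_normalize([1.0, 2.0, 3.0, 4.0])))
print(power([-4.0, 0.0, 9.0], 1.5))
print(gaussianize([0.125, 0.5, 0.875]))
```

Ranks of exactly 0 or 1 would have no finite normal quantile, which is one reason a percentile convention that stays strictly inside (0, 1) is convenient here.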
neutralization
- given vector s, find the orthogonal component s' WRT a matrix of neutralizers N:
- s' = s - (N dot (N_inverse dot s)), where N_inverse is the pseudo-inverse of N
orthogonalize
- similar to neutralize, but is faster for 2 centered column vectors
- given vectors u and v, find the component of v that is orthogonal to u:
- v' = v - u * (dot(transpose(v), u) / dot(transpose(u), u))
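Both operations can be sketched without a linear algebra library. In the sketch below, orthogonalize follows the formula directly, while neutralize swaps the pseudo-inverse for Gram-Schmidt orthogonalization of the neutralizer columns, which removes the same column-space component from s. The function names and the list-of-columns layout for N are assumptions for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(v, u):
    """Component of v orthogonal to u: v - u * (v.u / u.u)."""
    scale = dot(v, u) / dot(u, u)
    return [vi - ui * scale for vi, ui in zip(v, u)]


def neutralize(s, neutralizers):
    """Remove from s everything that lies in the span of the neutralizer columns.

    neutralizers is a list of columns, each the same length as s.
    """
    basis = []
    for column in neutralizers:
        # Gram-Schmidt: make each column orthogonal to the ones already kept.
        for b in basis:
            column = orthogonalize(column, b)
        if dot(column, column) > 1e-12:  # skip linearly dependent columns
            basis.append(column)
    # With an orthogonal basis, removing one projection at a time is exact.
    for b in basis:
        s = orthogonalize(s, b)
    return s


N = [[1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0]]
s = [2.0, 1.0, 4.0, 3.0]
residual = neutralize(s, N)
print(residual)
print([dot(residual, column) for column in N])
```

The dot products of the residual with every column of N come out at zero up to floating-point error, which is the defining property of s'.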
numerai corr
- given prediction vector s and target vector t find the correlation between s and t:
  - s` = tie-kept rank, then gaussianize, then pow 1.5 vector s
  - t` = pow 1.5 vector t
  - calculate the pearson correlation of s` and t`
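The three steps above can be sketched as follows. Several details are assumptions rather than statements from this page: the (rank - 0.5) / n percentile convention, gaussianize as the inverse standard normal CDF, a sign-preserving power, and centering the target before applying the power. The data is made up.

```python
from math import copysign, sqrt
from statistics import NormalDist


def tie_kept_rank(values):
    """Percentile ranks with ties sharing their mean rank (assumed convention)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for position, i in enumerate(order, start=1):
        ranks[i] = (position - 0.5) / n
    groups = {}
    for value, r in zip(values, ranks):
        groups.setdefault(value, []).append(r)
    return [sum(groups[v]) / len(groups[v]) for v in values]


def gaussianize(ranks):
    """Map percentile ranks to standard normal quantiles (assumed)."""
    return [NormalDist().inv_cdf(r) for r in ranks]


def power(s, x):
    """Sign-preserving power (assumed)."""
    return [copysign(abs(v) ** x, v) for v in s]


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def numerai_corr(predictions, target):
    s = power(gaussianize(tie_kept_rank(predictions)), 1.5)
    # Assumption: centre the target so pow 1.5 treats both tails symmetrically.
    mean_t = sum(target) / len(target)
    t = power([v - mean_t for v in target], 1.5)
    return pearson(s, t)


target = [0.0, 0.25, 0.5, 0.75, 1.0, 0.5, 0.25, 0.75]
good = [0.1, 0.3, 0.5, 0.7, 0.9, 0.45, 0.2, 0.8]  # hypothetical predictions
print(numerai_corr(good, target))
print(numerai_corr([-p for p in good], target))
```

Negating the predictions flips the sign of the score, and the pow 1.5 step gives extra weight to the values in the tails of both vectors.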
feature neutral corr
- given prediction vector s, matrix of features to neutralize F, and target vector t, find the correlation of s with t after neutralizing to F:
  - s` = tie-kept rank, then gaussianize s
  - s`` = neutralize s` to F, then variance normalize
  - calculate numerai corr of s`` and t
correlation contribution
- given target vector t, meta model vector m, and prediction vector s, find how much s contributes to m’s correlation with t:
  - m` = tie-kept rank then gaussianize m
  - s` = tie-kept rank then gaussianize s
  - s`` = orthogonalize s` with respect to m`
  - get the covariance of s`` and t
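A hedged sketch of correlation contribution using only the standard library. The percentile convention, the inverse-normal gaussianize, orthogonalizing against the transformed meta model, and the use of population covariance are assumptions for illustration; the vectors are made up.

```python
from statistics import NormalDist


def tie_kept_rank(values):
    """Percentile ranks with ties sharing their mean rank (assumed convention)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for position, i in enumerate(order, start=1):
        ranks[i] = (position - 0.5) / n
    groups = {}
    for value, r in zip(values, ranks):
        groups.setdefault(value, []).append(r)
    return [sum(groups[v]) / len(groups[v]) for v in values]


def gaussianize(ranks):
    """Map percentile ranks to standard normal quantiles (assumed)."""
    return [NormalDist().inv_cdf(r) for r in ranks]


def orthogonalize(v, u):
    """Component of v orthogonal to u."""
    scale = sum(a * b for a, b in zip(v, u)) / sum(b * b for b in u)
    return [a - b * scale for a, b in zip(v, u)]


def correlation_contribution(s, m, t):
    m1 = gaussianize(tie_kept_rank(m))
    s1 = gaussianize(tie_kept_rank(s))
    s2 = orthogonalize(s1, m1)  # part of the submission not explained by m
    n = len(t)
    mean_s, mean_t = sum(s2) / n, sum(t) / n
    # Population covariance between the orthogonal component and the target.
    return sum((a - mean_s) * (b - mean_t) for a, b in zip(s2, t)) / n


t = [0.0, 0.25, 0.5, 0.75, 1.0]
m = [1.0, 2.0, 3.0, 5.0, 4.0]   # hypothetical meta model, slightly wrong order
s = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical submission, correct order

print(correlation_contribution(s, m, t))  # positive: s adds information
print(correlation_contribution(m, m, t))  # about zero: nothing new
```

A submission identical to the meta model has no orthogonal component left, so its contribution is zero up to floating-point error.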

Factors & Features

Factors
- unencrypted (possibly cleaned/formatted/etc.) data from our data providers
- signals not given to users, but are very well-known in finance
- we always neutralize targets, portfolios, and the Meta Model to these
Features
- encrypted stock market signals given to users for use as machine learning features
- a dataset is made of several variations of a smaller set of features
- we usually penalize exposure to these, but we are not always 100% neutral to them
V3 Features
- all features used in our v3 "supermassive" dataset
V4 Medium Safe Features
- there are 5 feature variations in the v4 dataset
- only 2 of those variations are included in this subset

Targets

weekdays
- Mon - Fri (20D = 20 Days = 4 weeks)
returns lag
- number of days skipped before starting returns calculations
- (2L = 2 Lag = skip 2 weekdays)
timeline XDYL
- scores over X weekdays with Y days of returns lag
neutralizers
- factors/features to which the target is neutral
bins=x
- values for the target are binned into x distinct bins
uniformity = x, y, z, …
- x% of values in outer 2 bins (e.g. 0 and 1)
- y% of values in next inner 2 bins (e.g. 0.25 and 0.75)
- z% of values in next inner bin(s) (e.g. 0.5)
- …
target_[name]_20
- timeline: 20D2L
- bins=5, uniformity=10%, 40%, 50%
- neutralizers: Common Factors and/or Features
target_[name]_60
- timeline: 60D2L
- bins=5, uniformity=10%, 40%, 50%
- neutralizers: Common Factors and/or Features
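As a worked example of the bins and uniformity notation, the sketch below turns bins=5, uniformity=10%, 40%, 50% into a share per bin value. It assumes that bin values are evenly spaced on [0, 1] and that each percentage is split equally between the two bins of a pair; the function name is hypothetical.

```python
def bin_proportions(bins, uniformity):
    """Map each bin value to its expected share of target values.

    uniformity lists shares from the outermost pair of bins inward.
    """
    values = [i / (bins - 1) for i in range(bins)]  # assumed even spacing
    proportions = {}
    low, high = 0, bins - 1
    for share in uniformity:
        if low == high:
            # A single middle bin takes the whole share.
            proportions[values[low]] = share
        else:
            # Assumed: the pair splits its share equally.
            proportions[values[low]] = share / 2
            proportions[values[high]] = share / 2
        low, high = low + 1, high - 1
    return dict(sorted(proportions.items()))


print(bin_proportions(5, [0.10, 0.40, 0.50]))
# {0.0: 0.05, 0.25: 0.2, 0.5: 0.5, 0.75: 0.2, 1.0: 0.05}
```

Under these assumptions, half of all target values sit in the middle bin and only 5% sit in each extreme bin.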

Meta Models

Meta Models aggregate submissions into a single signal that Numerai uses to trade:

Stake-Weighted Meta Model (SWMM)
- A stake-weighted average of Numerai submissions
- The Numerai Hedge Fund uses this for trading
Benchmark Meta Model (BMM)
- A stake-weighted average of Benchmark Models
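The weighted-average step can be sketched as below. This covers only the averaging itself; any preprocessing Numerai applies to submissions before aggregation is not described on this page and is left out. Model names, stock ids, and stakes are hypothetical.

```python
def stake_weighted_average(predictions, stakes):
    """Combine per-model predictions into one signal, weighted by stake.

    predictions: {model_name: {stock_id: value}}
    stakes: {model_name: stake}
    """
    total_stake = sum(stakes[model] for model in predictions)
    stock_ids = next(iter(predictions.values())).keys()
    return {
        stock_id: sum(
            stakes[model] * preds[stock_id] for model, preds in predictions.items()
        ) / total_stake
        for stock_id in stock_ids
    }


predictions = {
    "model_a": {"stock_1": 0.2, "stock_2": 0.8},
    "model_b": {"stock_1": 0.6, "stock_2": 0.4},
}
stakes = {"model_a": 30.0, "model_b": 10.0}
print(stake_weighted_average(predictions, stakes))
```

With three times the stake, model_a pulls the combined signal toward its own predictions: about 0.3 for stock_1 and 0.7 for stock_2.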

Scores

data lag
- number of days it takes our vendors to process returns data
- scores start returns lag + data lag days after a round closes (usually 2+2=4 days)
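The usual 2 + 2 = 4 day delay can be turned into a calendar date by counting weekdays. The sketch below assumes that both lags are counted in weekdays and ignores market holidays; the round close date is hypothetical.

```python
from datetime import date, timedelta


def add_weekdays(start, n):
    """Return the date n weekdays (Mon-Fri) after start."""
    current = start
    while n > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday-Friday
            n -= 1
    return current


RETURNS_LAG = 2  # the "2L" in 20D2L
DATA_LAG = 2     # usual vendor processing delay

round_close = date(2024, 1, 5)  # hypothetical round closing on a Friday
first_score = add_weekdays(round_close, RETURNS_LAG + DATA_LAG)
print(first_score)  # 2024-01-11
```

Because the weekend is skipped, four weekdays after a Friday close is the following Thursday.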
MMC - Meta Model Contribution
- correlation contribution of a submission, SWMM, and target_cyrus_20
- timeline: 20D2L (+ 2 days data lag)
CORR20v2 - Correlation 20D2L v2
- numerai corr of a submission against target_cyrus_20
- timeline: 20D2L (+ 2 days data lag)
CORJ60 - Correlation Jerome 60D2L
- numerai corr of a submission against target_jerome_60
- timeline: 60D2L (+ 2 days data lag)
BMC - Benchmark Model Contribution
- correlation contribution of a submission, BMM, and target_cyrus_20
- timeline: 20D2L (+ 2 days data lag)
FNCV3 - Feature Neutral Correlation V3
- feature neutral corr of a submission, V3 Features, and target_nomi_20
- timeline: 20D2L (+ 2 days data lag)
CWMM - Corr w/ Meta Model
- s` = tie-kept rank, then gaussianize, then pow 1.5 a submission s
- calculate pearson correlation between s` and SWMM
- timeline: 4 days data lag / not dependent on returns
MCWNM - Max Corr w/ Numerai Models
- Maximum pearson correlation of a submission with any other Tournament submission
- only compared to other submissions made in the same round
- timeline: 4 days data lag / not dependent on returns
APCWNM - Average Pairwise Corr w/ Numerai Models
- Average pearson correlation of a submission with each other Tournament submission
- only compared to other submissions made in the same round
- timeline: 4 days data lag / not dependent on returns
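MCWNM and APCWNM both reduce a set of pairwise pearson correlations to a single number. A minimal sketch, with hypothetical model names and made-up predictions for one round:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def corr_with_other_models(name, submissions):
    """Return (max, average) pearson correlation of one submission with all others."""
    mine = submissions[name]
    corrs = [
        pearson(mine, other)
        for other_name, other in submissions.items()
        if other_name != name
    ]
    return max(corrs), sum(corrs) / len(corrs)


# All submissions below are assumed to come from the same round.
submissions = {
    "model_a": [0.1, 0.4, 0.6, 0.9],
    "model_b": [0.2, 0.5, 0.7, 1.0],  # model_a shifted by 0.1
    "model_c": [0.9, 0.6, 0.4, 0.1],  # model_a reversed
}
mcwnm, apcwnm = corr_with_other_models("model_a", submissions)
print(mcwnm, apcwnm)
```

Here model_a is perfectly correlated with model_b and perfectly anti-correlated with model_c, so the maximum is about 1.0 while the average is about 0.0.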

