Definitions
Overview of all definitions and vocabulary used to speak about scoring
Described below are the vast majority of definitions for the functions and statistical tools used to publish scores. Read the open-sourced code at numerai-tools/scoring. Install the package with:
pip install numerai-tools
Statistics
tie-broken rank
percentile rank a series
break ties based on id / index
tie-kept rank
percentile rank a series
for each set of ties set their ranks to the average of that set's tie-broken ranks
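The two ranking schemes can be sketched with pandas as follows. This is a minimal illustration, not the library's own code: the function names and the (rank - 0.5) / n scaling, which keeps percentile ranks strictly inside (0, 1), are assumptions.

```python
import pandas as pd

def tie_broken_rank(s: pd.Series) -> pd.Series:
    # Sort by id / index first, so that among tied values the earlier
    # index receives the lower rank (method="first").
    s = s.sort_index()
    return (s.rank(method="first") - 0.5) / s.count()

def tie_kept_rank(s: pd.Series) -> pd.Series:
    # method="average" gives every tied value the mean of the ranks that
    # the tie-broken scheme would have assigned to that group.
    return (s.rank(method="average") - 0.5) / s.count()

preds = pd.Series([0.2, 0.5, 0.5, 0.9], index=["a", "b", "c", "d"])
broken = tie_broken_rank(preds)  # 0.125, 0.375, 0.625, 0.875
kept = tie_kept_rank(preds)      # 0.125, 0.5,   0.5,   0.875
```

The tied pair ("b", "c") gets two distinct ranks in the first scheme and their shared average in the second.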
correlation
correlation coefficient between two series
spearman correlation
spearman correlation coefficient between live target and predictions
different from tie-broken-rank correlation b/c spearman ranking keeps ties by assigning each tied value the mean rank
pearson correlation
pearson correlation coefficient between live target and predictions
different from the other correlations b/c pearson does not use ranking
tie-broken-rank correlation
correlation between live target and tie-broken ranked predictions (w/ sorted index, no nans)
NOTE: This is a pearson correlation, but the predictions are ranked first, so it behaves more like a spearman correlation. It is impossible to achieve a correlation of 1.0 because the targets still have ties but the ranked predictions do not.
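The difference between the three correlations can be seen on a toy example. This is a sketch with made-up data; the variable names are illustrative only.

```python
import pandas as pd

target = pd.Series([0.0, 0.25, 0.5, 0.5, 1.0])      # live target, contains a tie
preds = pd.Series([0.10, 0.20, 0.30, 0.45, 0.95])   # predictions, no ties

# pearson: no ranking at all
pearson = preds.corr(target)

# spearman: pearson on average (tie-kept) ranks of both series
spearman = preds.rank(method="average").corr(target.rank(method="average"))

# tie-broken-rank correlation: rank only the predictions (ties broken by
# index order), leave the target as-is, then take the pearson correlation
ranked_preds = preds.sort_index().rank(method="first")
tie_broken_corr = ranked_preds.corr(target.sort_index())
```

Even though these predictions order the rows perfectly, tie_broken_corr stays below 1.0 because the target has a tie that the ranked predictions cannot reproduce.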
variance normalize
given vector s, normalize its standard deviation to 1
power x / pow x
given vector s, raise the absolute value of each element of s to some power x, then restore the element's original sign
gaussianize
given vector s of percentile ranks, map each value through the inverse normal CDF so that the result is approximately standard normal (mean 0, standard deviation 1)
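The three element-wise transforms above might look like this in NumPy/SciPy. It is a sketch under assumptions: the function names are illustrative, power is taken to preserve sign, and gaussianize is taken to be the inverse normal CDF applied to percentile ranks in (0, 1).

```python
import numpy as np
from scipy import stats

def variance_normalize(s: np.ndarray) -> np.ndarray:
    # Rescale so the standard deviation becomes 1 (the mean is not changed
    # beyond that rescaling).
    return s / np.std(s)

def power(s: np.ndarray, x: float) -> np.ndarray:
    # Raise magnitudes to the power x, then put the original signs back,
    # so negative inputs stay negative.
    return np.sign(s) * np.abs(s) ** x

def gaussianize(ranks: np.ndarray) -> np.ndarray:
    # Assumption: input is a vector of percentile ranks strictly in (0, 1).
    return stats.norm.ppf(ranks)

s = np.array([-4.0, 0.0, 9.0])
normalized = variance_normalize(s)
rooted = power(s, 0.5)                               # [-2., 0., 3.]
z = gaussianize(np.array([0.025, 0.5, 0.975]))
```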
neutralization
given vector s, find the component s' of s that is orthogonal to a matrix of neutralizers N:
s' = s - (N dot (pinv(N) dot s)), where pinv(N) is the pseudo-inverse of N
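A minimal NumPy sketch of the neutralization formula, assuming the inverse in the formula is the Moore-Penrose pseudo-inverse (N is generally not square). The function name and the random data are illustrative.

```python
import numpy as np

def neutralize(s: np.ndarray, N: np.ndarray) -> np.ndarray:
    # N @ (pinv(N) @ s) is the least-squares projection of s onto the
    # column space of N; subtracting it leaves the orthogonal remainder.
    return s - N @ (np.linalg.pinv(N) @ s)

rng = np.random.default_rng(0)
N = rng.normal(size=(100, 3))   # 100 rows, 3 neutralizer columns
s = rng.normal(size=100)

s_neutral = neutralize(s, N)
exposures = N.T @ s_neutral     # all approximately zero after neutralization
```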
orthogonalize
similar to neutralize, but is faster for 2 centered column vectors
given vectors u and v, find the component of v that is orthogonal to u:
v' = v - u * ( dot(transpose(v), u) / dot(transpose(u), u) )
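For two centered vectors the projection reduces to a pair of dot products, which can be sketched as below. The function name and argument order are assumptions made for illustration.

```python
import numpy as np

def orthogonalize(v: np.ndarray, u: np.ndarray) -> np.ndarray:
    # Remove from v its projection onto u: v - u * (v.u / u.u)
    return v - u * (v @ u) / (u @ u)

u = np.array([1.0, -1.0, 0.0])   # centered (sums to zero)
v = np.array([2.0, 0.0, -2.0])   # centered (sums to zero)

v_orth = orthogonalize(v, u)     # [1., 1., -2.]
```

Here v.u = 2 and u.u = 2, so exactly one copy of u is subtracted and the result has zero dot product with u.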
numerai corr
given prediction vector s and target vector t find the correlation between s and t:
s` = tie-kept rank, then gaussianize, then pow 1.5 vector s
t` = pow 1.5 vector t
calculate the pearson correlation of s` and t`
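The three steps can be put together as in the sketch below. Two details are assumptions not stated in the steps above: ranks are scaled as (rank - 0.5) / n before gaussianizing, and the target is centered before the power is applied so that the sign-preserving power acts symmetrically around the target mean.

```python
import numpy as np
import pandas as pd
from scipy import stats

def numerai_corr(preds: pd.Series, target: pd.Series) -> float:
    # s` : tie-kept rank -> gaussianize -> sign-preserving pow 1.5
    ranked = (preds.rank(method="average") - 0.5) / preds.count()
    gauss = stats.norm.ppf(ranked.to_numpy())
    preds_p15 = np.sign(gauss) * np.abs(gauss) ** 1.5

    # t` : center (assumption), then sign-preserving pow 1.5
    centered = target.to_numpy() - target.mean()
    target_p15 = np.sign(centered) * np.abs(centered) ** 1.5

    # pearson correlation of s` and t`
    return float(np.corrcoef(preds_p15, target_p15)[0, 1])

target = pd.Series([0.0, 0.25, 0.5, 0.5, 0.75, 1.0])
preds = pd.Series([0.1, 0.3, 0.2, 0.6, 0.5, 0.9])

score = numerai_corr(preds, target)
flipped = numerai_corr(-preds, target)   # same magnitude, opposite sign
```

The pow 1.5 step gives more weight to the tails of both distributions than a plain rank correlation would.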
feature neutral corr
given prediction vector s, matrix of features to neutralize F, and target vector t, find the correlation of s with t after neutralizing to F:
s` = tie-kept rank, then gaussianize s
s`` = neutralize s` to F, then variance normalize
calculate numerai corr of s`` and t
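A sketch of the feature neutral corr pipeline, chaining the steps above. Everything here is illustrative: the helper names, the random data, the (rank - 0.5) / n scaling, the use of the pseudo-inverse for neutralization, and the centering of the target inside the numerai corr helper are all assumptions.

```python
import numpy as np
import pandas as pd
from scipy import stats

def _rank_gauss(s: pd.Series) -> np.ndarray:
    # tie-kept rank, then gaussianize
    ranked = (s.rank(method="average") - 0.5) / s.count()
    return stats.norm.ppf(ranked.to_numpy())

def _pow15(x: np.ndarray) -> np.ndarray:
    return np.sign(x) * np.abs(x) ** 1.5

def numerai_corr(preds: pd.Series, target: pd.Series) -> float:
    p = _pow15(_rank_gauss(preds))
    t = _pow15(target.to_numpy() - target.mean())  # centering is an assumption
    return float(np.corrcoef(p, t)[0, 1])

def feature_neutral_corr(preds: pd.Series, features: pd.DataFrame,
                         target: pd.Series) -> float:
    s1 = _rank_gauss(preds)                    # s`
    F = features.to_numpy()
    s2 = s1 - F @ (np.linalg.pinv(F) @ s1)     # neutralize s` to F
    s2 = s2 / np.std(s2)                       # variance normalize -> s``
    return numerai_corr(pd.Series(s2, index=preds.index), target)

rng = np.random.default_rng(1)
n = 500
features = pd.DataFrame(rng.normal(size=(n, 3)), columns=["f0", "f1", "f2"])
target = pd.Series(rng.normal(size=n))
preds = features["f0"] + 0.5 * target + rng.normal(scale=0.5, size=n)

fnc = feature_neutral_corr(preds, features, target)
```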
correlation contribution
given target vector t, meta model vector m, and prediction vector s, find how much s contributes to m’s correlation with t:
m` = tie-kept rank then gaussianize m
s` = tie-kept rank then gaussianize s
s`` = orthogonalize s` with respect to m`
get the covariance of s`` and t
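The correlation contribution steps can be sketched as below. The helper names, the (rank - 0.5) / n scaling, and the use of the population covariance (bias=True) are assumptions; the published implementation may scale the result differently.

```python
import numpy as np
import pandas as pd
from scipy import stats

def _rank_gauss(s: pd.Series) -> np.ndarray:
    # tie-kept rank, then gaussianize
    ranked = (s.rank(method="average") - 0.5) / s.count()
    return stats.norm.ppf(ranked.to_numpy())

def correlation_contribution(preds: pd.Series, meta_model: pd.Series,
                             target: pd.Series) -> float:
    m = _rank_gauss(meta_model)                 # m`
    s = _rank_gauss(preds)                      # s`
    s_orth = s - m * (s @ m) / (m @ m)          # s``: part of s` orthogonal to m`
    # covariance of s`` and t (np.cov centers both inputs)
    return float(np.cov(s_orth, target.to_numpy(), bias=True)[0, 1])

rng = np.random.default_rng(2)
n = 300
target = pd.Series(rng.normal(size=n))
meta_model = pd.Series(0.3 * target + rng.normal(size=n))
preds = pd.Series(0.3 * target + rng.normal(size=n))

cc = correlation_contribution(preds, meta_model, target)
cc_self = correlation_contribution(meta_model, meta_model, target)
```

A submission identical to the meta model has nothing left after orthogonalization, so its contribution is zero.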
Factors & Features
Factors
unencrypted (possibly cleaned/formatted/etc.) data from our data providers
signals that are not given to users but are very well known in finance
we always neutralize targets, portfolios, and the Meta Model to these
Features
encrypted stock market signals given to users for use as machine learning features
a dataset is made of several variations of a smaller set of features
we usually penalize exposure to these, but we are not always 100% neutral to them
V3 Features
all features used in our v3 "supermassive" dataset
V4 Medium Safe Features
there are 5 feature variations in the v4 dataset
only 2 of those variations are included in this subset
Targets
weekdays
Mon - Fri (20D = 20 Days = 4 weeks)
returns lag
number of days skipped before starting returns calculations
(2L = 2 Lag = skip 2 weekdays)
timeline XDYL
scores over X weekdays with Y days of returns lag
neutralizers
factors/features to which the target is neutral
bins=x
values for the target are binned into x distinct bins
uniformity = x, y, z, …
x% of values in outer 2 bins (e.g. 0 and 1)
y% of values in next inner 2 bins (e.g. 0.25 and 0.75)
z% of values in next inner bin(s) (e.g. 0.5)
…
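To make the bins and uniformity notation concrete, the sketch below bins a continuous series with bins=5 and uniformity=10%, 40%, 50%, which works out to 5% / 20% / 50% / 20% / 5% across the five bins. The quantile-based binning procedure, the function name, and the random input are assumptions for illustration only; how the actual targets are constructed is not described here.

```python
import numpy as np
import pandas as pd

def bin_target(raw: pd.Series) -> pd.Series:
    # Percentile position of each value, strictly inside (0, 1).
    pct = (raw.rank(method="first") - 0.5) / len(raw)
    # Cumulative cut points for 5% / 20% / 50% / 20% / 5%.
    edges = [0.0, 0.05, 0.25, 0.75, 0.95, 1.0]
    labels = [0.0, 0.25, 0.5, 0.75, 1.0]
    return pd.cut(pct, bins=edges, labels=labels).astype(float)

rng = np.random.default_rng(3)
raw = pd.Series(rng.normal(size=1000))

binned = bin_target(raw)
counts = binned.value_counts().sort_index()
# outer 2 bins (0 and 1):        50 + 50   = 10%
# next inner 2 bins (0.25/0.75): 200 + 200 = 40%
# middle bin (0.5):              500       = 50%
```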
target_[name]_20
timeline: 20D2L
bins=5, uniformity=10%, 40%, 50%
neutralizers: Common Factors and/or Features
target_[name]_60
timeline: 60D2L
bins=5, uniformity=10%, 40%, 50%
neutralizers: Common Factors and/or Features
Meta Models
Meta Models aggregate submissions into a single signal that Numerai uses to trade:
Stake-Weighted Meta Model (SWMM)
A stake-weighted average of Numerai submissions
The Numerai Hedge Fund uses this for trading
Benchmark Meta Model (BMM)
A stake-weighted average of Benchmark Models
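A stake-weighted average can be sketched as below. The function name, model names, and stake values are hypothetical, and any preprocessing of submissions before averaging (such as ranking) is not shown because it is not specified here.

```python
import pandas as pd

def stake_weighted_average(submissions: pd.DataFrame,
                           stakes: pd.Series) -> pd.Series:
    # One column per model, one row per stock; weights sum to 1.
    weights = stakes / stakes.sum()
    # Multiply each model's column by its weight, then sum across models.
    return submissions.mul(weights, axis=1).sum(axis=1)

submissions = pd.DataFrame(
    {"model_a": [0.2, 0.8], "model_b": [0.6, 0.4]},
    index=["stock_1", "stock_2"],
)
stakes = pd.Series({"model_a": 30.0, "model_b": 10.0})

meta_model = stake_weighted_average(submissions, stakes)
# stock_1: 0.75 * 0.2 + 0.25 * 0.6 = 0.3
# stock_2: 0.75 * 0.8 + 0.25 * 0.4 = 0.7
```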
Scores
data lag
number of days it takes our vendors to process returns data
scores start returns lag + data lag days after a round closes (usually 2+2=4 days)
MMC - Meta Model Contribution
correlation contribution of a submission, SWMM, and target_cyrus_20
timeline: 20D2L (+ 2 days data lag)
CORR20v2 - Correlation 20D2L v2
numerai corr of a submission against target_cyrus_20
timeline: 20D2L (+ 2 days data lag)
CORJ60 - Correlation Jerome 60D2L
numerai corr of a submission against target_jerome_60
timeline: 60D2L (+ 2 days data lag)
BMC - Benchmark Model Contribution
correlation contribution of a submission, BMM, and target_cyrus_20
timeline: 20D2L (+ 2 days data lag)
FNCV3 - Feature Neutral Correlation V3
feature neutral corr of a submission, V3 Features, and target_nomi_20
timeline: 20D2L (+ 2 days data lag)
CWMM - Corr w/ Meta Model
s` = tie-kept rank, then gaussianize, then pow 1.5 a submission s
calculate pearson correlation between s` and SWMM
timeline: 4 days data lag / not dependent on returns
MCWNM - Max Corr w/ Numerai Models
Maximum pearson correlation of a submission with any other Tournament submission
only compared to other submissions made in the same round
timeline: 4 days data lag / not dependent on returns
APCWNM - Average Pairwise Corr w/ Numerai Models
Average pearson correlation of a submission with each other Tournament submission
only compared to other submissions made in the same round
timeline: 4 days data lag / not dependent on returns
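MCWNM and APCWNM can both be read off one pairwise correlation matrix of a round's submissions, as in this sketch. The function and model names are hypothetical, and the sketch assumes the correlations are taken on the submitted values as given.

```python
import pandas as pd

def corr_with_other_models(submissions: pd.DataFrame, name: str):
    # Pairwise pearson correlations between all submissions in the round,
    # keeping only this model's row and dropping its correlation with itself.
    corrs = submissions.corr(method="pearson")[name].drop(name)
    return float(corrs.max()), float(corrs.mean())

round_submissions = pd.DataFrame({
    "mine":    [0.1, 0.4, 0.6, 0.9],
    "copycat": [0.2, 0.8, 1.2, 1.8],   # exactly 2 * mine  -> corr = 1
    "inverse": [0.9, 0.6, 0.4, 0.1],   # exactly 1 - mine  -> corr = -1
})

mcwnm, apcwnm = corr_with_other_models(round_submissions, "mine")
# mcwnm is 1.0 (the copycat), apcwnm is 0.0 (average of 1 and -1)
```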