Definitions

Overview of all definitions and vocabulary used to speak about scoring

Described below are most of the definitions for the functions and statistical tools used to publish scores. The open-source code is available at numerai-tools/scoring; install the package with:

> pip install numerai_tools

Statistics

  • tie-broken rank

    • rank each value of a series, breaking ties so that every value receives a distinct rank (e.g. in order of appearance)

  • tie-kept rank

    • for each set of tied values, set their ranks to the average of that set's tie-broken ranks

  • correlation

    • correlation coefficient between two series

  • spearman correlation

    • spearman correlation coefficient between live target and predictions

    • different from tie-broken-rank correlation because spearman ranking keeps ties by assigning the mean rank

  • pearson correlation

    • pearson correlation coefficient between two series, computed on the raw values without ranking

  • tie-broken-rank correlation

    • correlation between the live target and the tie-broken-ranked predictions (with a sorted index and no NaNs)

    • NOTE: this is a pearson correlation, but because the predictions are ranked it behaves more like a spearman correlation. A correlation of 1.0 is impossible because the targets still contain ties while the tie-broken predictions do not.

  • variance normalize

    • given vector s, normalize its standard deviation to 1

  • power x / pow x

    • given a vector s, raise the magnitude of each element to the power x, keeping each element's original sign

  • gaussianize

    • given a vector s (typically tie-kept ranks scaled into (0, 1)), map its values through the inverse normal CDF so the result follows a standard Gaussian distribution (see the sketch after this list)

  • neutralization

    • given a vector s, find the component s' of s that is orthogonal to a matrix of neutralizers N:

    • s' = s - (N dot (N_pinv dot s)), where N_pinv is the pseudo-inverse of N

  • orthogonalize

    • similar to neutralize, but faster for two centered column vectors

    • given vectors u and v, find the component of v that is orthogonal to u:

    • v' = v - u * (dot(transpose(v), u) / dot(transpose(u), u))

  • numerai corr

    • given a prediction vector s and a target vector t, find the correlation between s and t (see the sketch after this list):

      • s` = tie-kept rank, then gaussianize, then pow 1.5 vector s

      • t` = pow 1.5 vector t

      • calculate the pearson correlation of s` and t`

  • feature neutral corr

    • given a prediction vector s, a matrix of features F to neutralize against, and a target vector t, find the correlation of s with t after neutralizing to F:

      • s` = tie-kept rank, then gaussianize s

      • s`` = neutralize s` to F, then variance normalize

      • calculate numerai corr of s`` and t

  • correlation contribution

    • given target vector t, meta model vector m, and prediction vector s, find how much s contributes to m’s correlation with t:

      • m` = tie-kept rank then gaussianize m

      • s` = tie-kept rank then gaussianize s

      • s`` = orthogonalize s` with respect to m`

      • get the covariance of s`` and t
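
The statistics above compose into the scoring functions below. The following is a minimal runnable sketch, assuming pandas, numpy, and scipy: the function names mirror the definitions in this list but are illustrative rather than the exact numerai-tools API, and details such as the pseudo-inverse in neutralize and the mean-centering of the target inside numerai corr are assumptions consistent with how these scores are usually described.

import numpy as np
import pandas as pd
from scipy import stats

def tie_kept_rank(s: pd.Series) -> pd.Series:
    # average rank for ties, scaled into (0, 1) so it can feed the inverse normal CDF
    return (s.rank(method="average") - 0.5) / s.count()

def gaussianize(s: pd.Series) -> pd.Series:
    # map values in (0, 1) through the inverse normal CDF to standard Gaussian values
    return pd.Series(stats.norm.ppf(s), index=s.index)

def power(s: pd.Series, p: float) -> pd.Series:
    # raise the magnitude of each element to the power p, keeping its original sign
    return np.sign(s) * np.abs(s) ** p

def variance_normalize(s: pd.Series) -> pd.Series:
    # scale s so that its standard deviation is 1
    return s / s.std()

def neutralize(s: pd.Series, N: pd.DataFrame) -> pd.Series:
    # s' = s - N dot (N_pinv dot s): remove the component of s explained by the columns of N
    exposure = N.values @ (np.linalg.pinv(N.values) @ s.values)
    return pd.Series(s.values - exposure, index=s.index)

def orthogonalize(v: pd.Series, u: pd.Series) -> pd.Series:
    # component of v that is orthogonal to u (both assumed centered)
    return v - u * (v @ u) / (u @ u)

def numerai_corr(s: pd.Series, t: pd.Series) -> float:
    # s: tie-kept rank, gaussianize, pow 1.5; t: center, pow 1.5; then pearson correlation
    s_p15 = power(gaussianize(tie_kept_rank(s)), 1.5)
    t_p15 = power(t - t.mean(), 1.5)
    return float(np.corrcoef(s_p15, t_p15)[0, 1])

def feature_neutral_corr(s: pd.Series, F: pd.DataFrame, t: pd.Series) -> float:
    # neutralize the gaussianized ranks of s to the features F, then score with numerai corr
    s1 = gaussianize(tie_kept_rank(s))
    s2 = variance_normalize(neutralize(s1, F))
    return numerai_corr(s2, t)

def correlation_contribution(s: pd.Series, m: pd.Series, t: pd.Series) -> float:
    # orthogonalize the submission to the meta model, then take the covariance with the target
    s1 = gaussianize(tie_kept_rank(s))
    m1 = gaussianize(tie_kept_rank(m))
    s2 = orthogonalize(s1, m1)
    return float((t - t.mean()) @ s2 / len(t))

For reproducing official scores, prefer the functions shipped in the numerai_tools package over this sketch.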

Factors & Features

  • Factors

    • unencrypted (possibly cleaned/formatted/etc.) data from our data providers

    • signals that are not given to users but are well known in finance

    • we always neutralize targets, portfolios, and the Meta Model to these

  • Features

    • encrypted stock market signals given to users for use as machine learning features

    • a dataset is made of several variations of a smaller set of features

    • we usually penalize exposure to these, but are not always 100% neutral

  • V3 Features

    • all features used in our v3 "supermassive" dataset

  • V4 Medium Safe Features

    • there are 5 feature variations in the v4 dataset

    • only 2 of those variations are included in this subset

Targets

  • weekdays

    • Mon - Fri (20D = 20 Days = 4 weeks)

  • returns lag

    • number of days skipped before starting returns calculations

    • (2L = 2 Lag = skip 2 weekdays)

  • timeline XDYL

    • scores over X weekdays with Y days of returns lag

  • neutralizers

    • factors/features to which the target is neutral

  • bins=x

    • values for the target are binned into x distinct bins

  • uniformity = x, y, z, … (see the sketch after this list)

    • x% of values in outer 2 bins (e.g. 0 and 1)

    • y% of values in next inner 2 bins (e.g. 0.25 and 0.75)

    • z% of values in next inner bin(s) (e.g. 0.5)

  • target_[name]_20

    • timeline: 20D2L

    • bins=5, uniformity=10%, 40%, 50%

    • neutralizers: Common Factors and/or Features

  • target_[name]_60

    • timeline: 60D2L

    • bins=5, uniformity=10%, 40%, 50%

    • neutralizers: Common Factors and/or Features
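
To make bins and uniformity concrete, here is an illustrative sketch (assuming pandas; the function name and exact cut points are hypothetical) of how values could be placed into 5 bins with uniformity = 10%, 40%, 50%. The real targets are built by Numerai from neutralized returns, not by this exact procedure.

import pandas as pd

def bin_with_uniformity(raw: pd.Series) -> pd.Series:
    # bins=5 with uniformity = 10%, 40%, 50%:
    #    5% of values in each outer bin (0 and 1)
    #   20% of values in each of the next inner bins (0.25 and 0.75)
    #   50% of values in the middle bin (0.5)
    edges = [0.0, 0.05, 0.25, 0.75, 0.95, 1.0]  # cumulative proportions
    labels = [0.0, 0.25, 0.5, 0.75, 1.0]        # bin values
    pct = raw.rank(pct=True, method="first")    # percentile rank in (0, 1]
    return pd.cut(pct, bins=edges, labels=labels, include_lowest=True).astype(float)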

Meta Models

  • Stake-Weighted Meta Model (SWMM)

    • A stake-weighted average of Numerai submissions (see the sketch after this list)

    • The Numerai Hedge Fund uses this for trading

  • Benchmark Meta Model (BMM)

    • A stake-weighted average of Benchmark Models
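
As a minimal sketch of what "stake-weighted average" means (assuming pandas; the inputs are hypothetical, and in practice Numerai may apply additional processing such as ranking submissions before averaging):

import pandas as pd

def stake_weighted_average(predictions: pd.DataFrame, stakes: pd.Series) -> pd.Series:
    # predictions: one column of predictions per model (rows = stocks)
    # stakes: NMR staked on each model, indexed by the columns of `predictions`
    weights = stakes / stakes.sum()
    return (predictions * weights).sum(axis=1)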

Scores

  • data lag

    • number of days it takes our vendors to process returns data

    • scores start returns lag + data lag days after a round closes (usually 2+2=4 days)

  • MMC - Meta Model Contribution

    • correlation contribution of a submission, SWMM, and target_cyrus_20

    • timeline: 20D2L (+ 2 days data lag)

  • CORR20v2 - Correlation 20D2L v2

    • numerai corr of a submission against target_cyrus_20

    • timeline: 20D2L (+ 2 days data lag)

  • CORJ60 - Correlation Jerome 60D2L

    • numerai corr of a submission against target_jerome_60

    • timeline: 60D2L (+ 2 days data lag)

  • BMC - Benchmark Model Contribution

    • correlation contribution of a submission, BMM, and target_cyrus_20

    • timeline: 20D2L (+ 2 days data lag)

  • FNCV3 - Feature Neutral Correlation V3

    • feature neutral corr of a submission, V3 Features, and target_nomi_20

    • timeline: 20D2L (+ 2 days data lag)

  • CWMM - Corr w/ Meta Model

    • s` = tie-kept rank, then gaussianize, then pow 1.5 a submission s

    • calculate pearson correlation between s` and SWMM

    • timeline: 4 days data lag / not dependent on returns

  • MCWNM - Max Corr w/ Numerai Models

    • Maximum pearson correlation of a submission with any other Tournament submission (see the sketch after this list)

    • only compared to other submissions made in the same round

    • timeline: 4 days data lag / not dependent on returns

  • APCWNM - Average Pairwise Corr w/ Numerai Models

    • Average pearson correlation of a submission with each other Tournament submission

    • only compared to other submissions made in the same round

    • timeline: 4 days data lag / not dependent on returns
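
The pairwise scores (MCWNM, APCWNM) reduce to plain pearson correlations between a submission and the other submissions from the same round. A minimal sketch, assuming pandas; the function and variable names are hypothetical:

import pandas as pd

def pairwise_pearson(submission: pd.Series, peers: pd.DataFrame) -> pd.Series:
    # pearson correlation of one submission against each column of `peers`
    # (the other Tournament submissions made in the same round)
    return peers.corrwith(submission, method="pearson")

# MCWNM takes the maximum and APCWNM the mean of these pairwise correlations:
# mcwnm = pairwise_pearson(my_submission, other_submissions).max()
# apcwnm = pairwise_pearson(my_submission, other_submissions).mean()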
