# Definitions

Below are definitions for most of the functions and statistical tools used to compute published scores. The open-source implementations are in [numerai-tools/scoring](https://github.com/numerai/numerai-tools/blob/master/numerai_tools/scoring.py). Install the package with:

```bash
pip install numerai_tools
```

## Statistics

* tie-broken rank
  * [percentile rank](https://en.wikipedia.org/wiki/Percentile_rank) a series
  * break ties based on id / index
* tie-kept rank
  * [percentile rank](https://en.wikipedia.org/wiki/Percentile_rank) a series
  * for each set of ties, set their ranks to the average of that set's tie-broken ranks
* correlation
  * correlation coefficient between two series
* spearman correlation
  * [spearman correlation coefficient](https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient) between live target and predictions
  * differs from tie-broken-rank correlation because spearman ranking keeps ties by assigning the mean rank
* pearson correlation
  * [pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) between live target and predictions
  * differs from the other correlations because pearson does not use ranking
* tie-broken-rank correlation
  * correlation between live target and tie-broken ranked predictions (with a sorted index and no NaNs)
  * NOTE: This is a pearson correlation computed on ranked predictions, so it behaves more like a spearman correlation. A correlation of 1.0 is impossible because the targets still contain ties while the tie-broken predictions do not.
* variance normalize
  * given vector s, normalize its standard deviation to 1
* power x / pow x
  * given vector s, raise the absolute value of each element of s to some power x, then restore each element's original sign
* gaussianize
  * given vector s of percentile ranks, map each value through the inverse CDF of the standard normal distribution so that the result is approximately normally distributed
* neutralization
  * given vector s, find the component s' of s that is orthogonal to the columns of a matrix of neutralizers N:
  * s' = s - (N dot (N\_inverse dot s)), where N\_inverse is the pseudo-inverse of N
* orthogonalize
  * similar to neutralization, but faster for two centered column vectors
  * given vectors u and v, find the component of v that is orthogonal to u:
  * v - u \* ( dot(transpose(v), u) / dot(transpose(u), u) )
* numerai corr
  * given prediction vector s and target vector t, find the correlation between s and t:
    * s\` = tie-kept rank, then gaussianize, then pow 1.5 vector s
    * t\` = pow 1.5 vector t
    * calculate the pearson correlation of s\` and t\`
* feature neutral corr
  * given prediction vector s, matrix of features to neutralize F, and target vector t, find the correlation of s with t after neutralizing to F:
    * s\` = tie-kept rank, then gaussianize s
    * s\`\` = neutralize s\` to F, then variance normalize
    * calculate numerai corr of s\`\` and t
* correlation contribution
  * given target vector t, meta model vector m, and prediction vector s, find how much s contributes to m’s correlation with t:
    * m\` = tie-kept rank then gaussianize m
    * s\` = tie-kept rank then gaussianize s
    * s\`\` = orthogonalize s\` with respect to m\`
    * get the covariance of s\`\` and t
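
The ranking and transformation steps above can be sketched in a few lines of NumPy. This is a minimal illustration and not the `numerai_tools` implementation: the `(rank - 0.5) / n` percentile scaling, the inverse normal CDF used for gaussianize, and the sign-preserving power are assumptions, and every function name here is hypothetical. The sketch follows the steps exactly as listed; the open-source code may apply additional preprocessing.

```python
from statistics import NormalDist

import numpy as np


def tie_broken_rank(s):
    """Percentile rank in (0, 1); ties are broken by position (index order)."""
    s = np.asarray(s, dtype=float)
    order = np.argsort(s, kind="stable")  # stable sort: earlier index wins a tie
    ranks = np.empty(len(s))
    ranks[order] = np.arange(1, len(s) + 1)
    return (ranks - 0.5) / len(s)  # assumed percentile scaling


def tie_kept_rank(s):
    """Percentile rank in (0, 1); tied values share the mean of their tie-broken ranks."""
    s = np.asarray(s, dtype=float)
    broken = tie_broken_rank(s)
    _, group = np.unique(s, return_inverse=True)  # group id per distinct value
    group_mean = np.bincount(group, weights=broken) / np.bincount(group)
    return group_mean[group]


def gaussianize(ranks):
    """Map percentile ranks through the inverse normal CDF (assumed definition)."""
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(float(r)) for r in ranks])


def power(s, p):
    """Raise |s| to the power p and restore each value's sign."""
    s = np.asarray(s, dtype=float)
    return np.sign(s) * np.abs(s) ** p


def numerai_corr(s, t):
    """Pearson correlation of the transformed predictions and the transformed target."""
    s_prime = power(gaussianize(tie_kept_rank(s)), 1.5)
    t_prime = power(t, 1.5)
    return float(np.corrcoef(s_prime, t_prime)[0, 1])


predictions = [0.1, 0.4, 0.4, 0.8, 0.9]
target = [0.0, 0.25, 0.5, 0.75, 1.0]
print(tie_broken_rank(predictions))  # the tied values at index 1 and 2 get distinct ranks
print(tie_kept_rank(predictions))    # the tied values at index 1 and 2 share one rank
print(numerai_corr(predictions, target))
```

Because the predictions are ranked before anything else happens, only their ordering matters, and raising the gaussianized values to the power 1.5 gives relatively more weight to the tails.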

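Neutralization, orthogonalization, and correlation contribution from the list above can be sketched the same way. This is a hedged illustration under stated assumptions: `numpy.linalg.pinv` stands in for N\_inverse, the covariance is normalized by the number of rows (a population covariance), the inputs to `contribution` are assumed to be already tie-kept ranked and gaussianized, and all function names are hypothetical.

```python
import numpy as np


def neutralize(s, N):
    """Remove from s its projection onto the column space of N."""
    s = np.asarray(s, dtype=float)
    N = np.asarray(N, dtype=float)
    return s - N @ (np.linalg.pinv(N) @ s)


def orthogonalize(v, u):
    """Component of v that is orthogonal to u (both assumed centered)."""
    v = np.asarray(v, dtype=float)
    u = np.asarray(u, dtype=float)
    return v - u * ((v @ u) / (u @ u))


def contribution(s_gauss, m_gauss, t):
    """Covariance between the target and the part of s not explained by m."""
    t = np.asarray(t, dtype=float)
    s_orth = orthogonalize(s_gauss, m_gauss)
    # population covariance; the normalization is an assumption
    return float(np.mean((s_orth - s_orth.mean()) * (t - t.mean())))


rng = np.random.default_rng(0)
N = rng.normal(size=(50, 3))  # hypothetical neutralizer matrix
s = rng.normal(size=50)
u = rng.normal(size=50)
v = rng.normal(size=50)
u, v = u - u.mean(), v - v.mean()
m = rng.normal(size=50)       # stand-in for a gaussianized meta model
t = rng.normal(size=50)       # stand-in for a target

print(N.T @ neutralize(s, N))      # numerically zero: no exposure to N remains
print(u @ orthogonalize(v, u))     # numerically zero: result is orthogonal to u
print(contribution(m, m, t))       # numerically zero: m adds nothing to itself
```

With a single neutralizer column, `neutralize(v, u[:, None])` and `orthogonalize(v, u)` agree up to rounding, which is why orthogonalize can be seen as the cheaper two-vector special case.
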
### Factors & Features

* Factors
  * unencrypted (possibly cleaned/formatted/etc.) data from our data providers
  * signals that are not given to users but are very well known in finance
  * we always neutralize targets, portfolios, and the Meta Model to these
* Features
  * encrypted stock market signals given to users for use as machine learning features
  * a dataset is made of several variations of a smaller set of features
  * we usually penalize exposure to these, but targets and scores are not always 100% neutral to them
* V3 Features
  * all features used in our v3 "supermassive" dataset
* V4 Medium Safe Features
  * there are 5 feature variations in the v4 dataset
  * only 2 of those variations are included in this subset

### Targets

* weekdays
  * Mon - Fri (20D = 20 Days = 4 weeks)
* returns lag
  * number of days skipped before starting returns calculations
  * (2L = 2 Lag = skip 2 weekdays)
* timeline XDYL
  * scores over X weekdays with Y days of returns lag
* neutralizers
  * factors/features to which the target is neutral
* bins=x
  * values for the target are binned into x distinct bins
* uniformity = x, y, z, …
  * x% of values in outer 2 bins (e.g. 0 and 1)
  * y% of values in next inner 2 bins (e.g. 0.25 and 0.75)
  * z% of values in next inner bin(s) (e.g. 0.5)
  * …
* target\_\[name]\_20
  * timeline: 20D2L
  * bins=5, uniformity=10%, 40%, 50%
  * neutralizers: Common Factors and/or Features
* target\_\[name]\_60
  * timeline: 60D2L
  * bins=5, uniformity=10%, 40%, 50%
  * neutralizers: Common Factors and/or Features
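
As a worked reading of the notation above, the sketch below parses an XDYL timeline and turns a `bins` / `uniformity` pair into per-bin fractions. It rests on assumptions not stated in the definitions: bin values are evenly spaced between 0 and 1, and each uniformity percentage is split evenly between the two bins it covers. The function names are hypothetical.

```python
import re


def parse_timeline(timeline):
    """Split a timeline such as '20D2L' into (scoring weekdays, returns-lag weekdays)."""
    match = re.fullmatch(r"(\d+)D(\d+)L", timeline)
    if match is None:
        raise ValueError(f"not an XDYL timeline: {timeline!r}")
    return int(match.group(1)), int(match.group(2))


def bin_fractions(bins, uniformity):
    """Map each bin value to its fraction of rows, working from the outer bins inward."""
    values = [i / (bins - 1) for i in range(bins)]  # assumed evenly spaced in [0, 1]
    fractions = [0.0] * bins
    for depth, share in enumerate(uniformity):
        lo, hi = depth, bins - 1 - depth
        if lo > hi:
            raise ValueError("more uniformity entries than bin pairs")
        if lo == hi:
            fractions[lo] = share  # single middle bin takes the whole share
        else:
            fractions[lo] = fractions[hi] = share / 2  # assumed even split
    return dict(zip(values, fractions))


print(parse_timeline("20D2L"))
print(bin_fractions(5, [0.10, 0.40, 0.50]))
```

Under these assumptions, `bins=5, uniformity=10%, 40%, 50%` corresponds to 5% of values in each of bins 0 and 1, 20% in each of bins 0.25 and 0.75, and 50% in bin 0.5.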

### Meta Models

Meta Models aggregate submissions into a single signal that Numerai uses to trade:

* Stake-Weighted Meta Model (SWMM)
  * A stake-weighted average of Numerai submissions
  * The Numerai Hedge Fund uses this for trading
* Benchmark Meta Model (BMM)
  * A stake-weighted average of Benchmark Models
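
A stake-weighted average can be illustrated with `numpy.average`. This is only a sketch of the weighting idea: any preprocessing applied to submissions before averaging (such as ranking) is not described on this page and is left out, and the function name and example values are hypothetical.

```python
import numpy as np


def stake_weighted_average(submissions, stakes):
    """Average submissions row-wise, weighting each row by its stake.

    submissions: array of shape (n_models, n_stocks)
    stakes: array of shape (n_models,)
    """
    submissions = np.asarray(submissions, dtype=float)
    stakes = np.asarray(stakes, dtype=float)
    return np.average(submissions, axis=0, weights=stakes)


# two hypothetical models predicting two stocks, with stakes of 3 and 1
submissions = [[0.2, 0.8], [0.6, 0.4]]
stakes = [3.0, 1.0]
meta_model = stake_weighted_average(submissions, stakes)
print(meta_model)
```

The model with three times the stake pulls the combined signal three times as hard, so the result sits closer to the first row than to the second.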

### Scores

* data lag
  * number of days it takes our vendors to process returns data
  * scoring starts (returns lag + data lag) days after a round closes (usually 2 + 2 = 4 days)
* MMC - Meta Model Contribution
  * correlation contribution of a submission, SWMM, and target\_cyrus\_20
  * timeline: 20D2L (+ 2 days data lag)
* CORR20v2 - Correlation 20D2L v2
  * numerai corr of a submission against target\_cyrus\_20
  * timeline: 20D2L (+ 2 days data lag)
* CORJ60 - Correlation Jerome 60D2L
  * numerai corr of a submission against target\_jerome\_60
  * timeline: 60D2L (+ 2 days data lag)
* BMC - Benchmark Model Contribution
  * correlation contribution of a submission, BMM, and target\_cyrus\_20
  * timeline: 20D2L (+ 2 days data lag)
* FNCV3 - Feature Neutral Correlation V3
  * feature neutral corr of a submission, V3 Features, and target\_nomi\_20
  * timeline: 20D2L (+ 2 days data lag)
* CWMM - Corr w/ Meta Model
  * s\` = tie-kept rank, then gaussianize, then pow 1.5 a submission s
  * calculate pearson correlation between s\` and SWMM
  * timeline: 4 days data lag / not dependent on returns
* MCWNM - Max Corr w/ Numerai Models
  * Maximum pearson correlation of a submission with any other Tournament submission
  * only compared to other submissions made in the same round
  * timeline: 4 days data lag / not dependent on returns
* APCWNM - Average Pairwise Corr w/ Numerai Models
  * Average pearson correlation of a submission with each other Tournament submission
  * only compared to other submissions made in the same round
  * timeline: 4 days data lag / not dependent on returns
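
The two pairwise scores, MCWNM and APCWNM, reduce to a maximum and a mean over one row of a correlation matrix. The sketch below assumes all submissions in a round cover the same rows in the same order; the function name and the toy data are hypothetical and this is not the production scoring code.

```python
import numpy as np


def corr_with_other_models(submissions, index):
    """Return (max, mean) Pearson correlation of one submission with all others.

    submissions: array of shape (n_models, n_stocks), one row per submission
    index: row of the submission being scored
    """
    corr = np.corrcoef(np.asarray(submissions, dtype=float))
    others = np.delete(corr[index], index)  # drop the self-correlation of 1.0
    return float(others.max()), float(others.mean())


# three hypothetical submissions made in the same round
round_submissions = [
    [0.1, 0.2, 0.3, 0.4],
    [0.2, 0.4, 0.6, 0.8],  # perfectly correlated with the first
    [0.4, 0.3, 0.2, 0.1],  # perfectly anti-correlated with the first
]
mcwnm, apcwnm = corr_with_other_models(round_submissions, 0)
print(mcwnm, apcwnm)
```

Neither score needs target returns, which is consistent with the timelines above depending only on the data lag.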


