Correlation with the Meta Model (CWMM): Your prediction's correlation to the Meta Model (the stake-weighted average of all predictions).
Benchmark Model Contribution (BMC): Your prediction's correlation to the target after neutralizing against the stake-weighted Benchmark Models.
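As a rough illustration of the two definitions above, the following sketch computes a CWMM-like and a BMC-like number on synthetic data. It is a simplified sketch, not Numerai's scoring code: the helper names (`rank_normalize`, `correlation`, `neutralize`), the plain Pearson correlation on ranks, the least-squares neutralization, and the random data are all assumptions made for illustration, and the official implementation may normalize and scale differently.

```python
import numpy as np


def rank_normalize(values):
    """Map values to evenly spaced ranks in (0, 1), then center them at zero."""
    values = np.asarray(values, dtype=float)
    ranks = values.argsort().argsort()  # 0..n-1, ties broken by position
    uniform = (ranks + 0.5) / len(values)
    return uniform - uniform.mean()


def correlation(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def neutralize(prediction, benchmark):
    """Remove the part of `prediction` that is linear in `benchmark`."""
    exposures = np.column_stack([benchmark, np.ones_like(benchmark)])
    coefficients, *_ = np.linalg.lstsq(exposures, prediction, rcond=None)
    return prediction - exposures @ coefficients


# Synthetic stand-ins (assumed): real inputs would be one round of live data.
rng = np.random.default_rng(0)
n = 1000
target = rng.normal(size=n)
meta_model = 0.1 * target + rng.normal(size=n)
benchmark = 0.1 * target + rng.normal(size=n)
prediction = 0.5 * meta_model + 0.5 * benchmark + rng.normal(size=n)

pred_ranks = rank_normalize(prediction)
bench_ranks = rank_normalize(benchmark)

# CWMM-like: how similar the prediction is to the Meta Model.
cwmm_like = correlation(pred_ranks, rank_normalize(meta_model))

# BMC-like: correlation with the target after removing the benchmark component.
residual = neutralize(pred_ranks, bench_ranks)
bmc_like = correlation(residual, target)
```

After neutralization, `residual` is orthogonal to the benchmark ranks, so whatever correlation with the target remains cannot be attributed to the benchmark in this linear sense.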
Within a single round, each submission receives 20 score updates, the last of which is the final score of the round.
For example, here is what the scoring schedule looks like for a hypothetical weekend round opening on Saturday 6th and closing on Monday 8th. The first day of scoring is on Friday 12th, with daily updates every Tuesday-Saturday, until the final score 4 weeks later on Thursday 8th of the next month.
Scoring is done over these 20 days because the main target is built on 20 days of returns, ignoring the first 2 days after round close. This is commonly referred to as "20D2L", where "20D" means "20 days of returns" and "2L" means "ignoring the first two days". Each score update is computed using an expanding window of returns.
For example, the first day of scoring on Friday 12th uses a 1D2L target, which includes returns from Wednesday 10th only. The second day of scoring on Saturday 13th uses a 2D2L target, which includes returns from Wednesday 10th through Thursday 11th. The final day of scoring 4 weeks later on Thursday 8th of the next month uses a 20D2L target, which includes returns from Wednesday 10th through Tuesday 6th of the next month.
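The schedule and the expanding windows described above can be reproduced with a short date calculation. This is a minimal sketch under stated assumptions: January 2024 is used as the calendar because its 6th falls on a Saturday and the month has 31 days, every Monday to Friday is treated as a return day (market holidays are ignored), and the helper `take_days` is a hypothetical name, not part of any Numerai library.

```python
from datetime import date, timedelta

MON_TO_FRI = {0, 1, 2, 3, 4}  # date.weekday() values counted as return days
TUE_TO_SAT = {1, 2, 3, 4, 5}  # date.weekday() values with a score update


def take_days(start, allowed_weekdays, count):
    """Return the first `count` dates on or after `start` whose weekday is allowed."""
    days = []
    current = start
    while len(days) < count:
        if current.weekday() in allowed_weekdays:
            days.append(current)
        current += timedelta(days=1)
    return days


# Assumed calendar: January 2024, where the round closes on Monday the 8th.
first_return_day = date(2024, 1, 10)  # Wednesday, two days after Monday's close
first_score_day = date(2024, 1, 12)   # Friday

return_days = take_days(first_return_day, MON_TO_FRI, 20)
score_days = take_days(first_score_day, TUE_TO_SAT, 20)

for n, score_day in enumerate(score_days, start=1):
    window = return_days[:n]  # expanding window: the first n return days
    print(
        f"{score_day:%a %d %b}: {n}D2L, "
        f"returns {window[0]:%a %d %b} to {window[-1]:%a %d %b}"
    )
```

Under these assumptions, the first line pairs Friday 12 January with the single return day Wednesday 10 January, and the last line pairs Thursday 8 February with returns from Wednesday 10 January through Tuesday 6 February, matching the example in the text.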
Only the final scores for rounds count towards a model's live performance.
The 1-year average score is also called reputation. Your model's rank on the leaderboard is primarily based on its 1-year average TC score; this will change to MMC on Jan 2, 2024.
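To make the averaging concrete, here is a minimal sketch of a trailing 1-year average over final round scores. The function name `one_year_average`, the 365-day window, the use of the final-score date as the cutoff, and the sample scores are all assumptions for illustration; the text does not specify how the window is bounded or how missing rounds are treated.

```python
from datetime import date, timedelta


def one_year_average(final_scores, as_of):
    """Average the final scores dated within the 365 days up to `as_of`.

    `final_scores` is a list of (final_score_date, score) pairs.
    Returns None when no round falls inside the window.
    """
    window_start = as_of - timedelta(days=365)
    recent = [
        score
        for scored_on, score in final_scores
        if window_start < scored_on <= as_of
    ]
    if not recent:
        return None
    return sum(recent) / len(recent)


# Made-up final scores, for illustration only.
final_scores = [
    (date(2023, 1, 5), 0.030),   # more than a year old, excluded
    (date(2023, 6, 1), 0.010),
    (date(2023, 9, 7), -0.020),
    (date(2024, 1, 4), 0.040),
]

reputation = one_year_average(final_scores, as_of=date(2024, 1, 10))
```

Only final scores enter the average, consistent with the rule that intermediate score updates do not count towards live performance.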
Diagnostics is a tool to help you compute and visualize your scores over the validation dataset.
An example diagnostics report
If you uploaded your model via Model Upload, then Numerai will automatically run your model over the validation dataset to generate diagnostics.
If you wish, you may also manually run diagnostics by heading over to numer.ai/scores and clicking on the Run Diagnostics button.
Note that all of the scoring code we use to generate diagnostics is also available in our example scripts repository if you wish to replicate this locally.
A word of caution: past performance is no guarantee of future performance. This is especially true in the domain of financial machine learning. Take care not to rely too heavily on validation metrics during your research process to avoid overfitting to the validation dataset. If you train on the validation dataset, then don't expect your in-sample validation metrics to generalize out-of-sample.