Tournament Design 2019

Share feedback in the #feedback channel on RocketChat.

Design Goals

  1. Open - Open to all to play with minimal barrier to entry.

  2. Fair - Payouts are divided amongst users proportional to their performance.

  3. Safe - Attacks and abuses are either impossible or economically inviable.

  4. Profitable - Long-term expected value is positive for models with good performance.

  5. Dominion - Rules are clear and factors that determine payouts are under user control.

A summary of changes

In the current design, earning potential is a function of the stake amount (set by the user), a dynamic payout factor (based on an auction) and a fixed benchmark (set by Numerai). If you beat the benchmark, you stand to win your stake amount times the payout factor. If you miss the benchmark, your entire stake gets burned. The benchmark is the same week over week, but the payout factor is set by the market.

In the new design, earning potential is a function of the stake amount (set by the user), a dynamic benchmark (based on an auction) and a payout curve based on performance relative to the benchmark (set by Numerai). The more you beat the benchmark, the more you earn (up to 100%). The more you miss the benchmark, the more of your stake gets burned (up to your entire stake). If you match the benchmark, you neither earn nor burn. The payout curve is the same week over week, but the benchmark is set by the market.

That is the core idea. Let’s break it down in detail.

Scoring is assumed to be in AUC space based on this scoring metric analysis.

Current: fixed benchmark, market payout factor

In the current design, you are asked to bid an estimated probability p as part of your stake s. Your earning potential is given by $e = s \cdot \frac{1-P}{P}$, where P is the cutoff probability of the round. At the close of the staking window, Numerai computes the cutoff probability by selecting the group of stakes with the highest estimated probabilities such that the entire prize pool is depleted by the sum of the group’s earning potential. Stakes that are not selected are returned immediately and are not part of the game. The earning potential of your stake is locked in until resolution. If you beat the benchmark, you get your entire stake back and are paid your earning potential. If you miss the benchmark, your entire stake gets burned.
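The cutoff computation can be sketched as follows. This is a minimal illustration under our own assumptions, not Numerai's implementation: the `Stake` record and `p_cutoff_auction` function are hypothetical, the marginal stake is assumed to be partially selected (in the spirit of the "selected" amount described under Misc changes to rules), and when the pool runs out between two bids the cutoff is assumed to fall at the exact probability that depletes it.

```python
from dataclasses import dataclass


@dataclass
class Stake:
    user: str
    amount: float  # stake s, in NMR
    p: float       # bid probability p, in (0, 1)


def p_cutoff_auction(stakes, prize_pool):
    """Hypothetical sketch: return (P, selected), where selected maps user -> selected stake.

    Stakes are taken in order of decreasing p. With cutoff P, every selected
    stake can earn up to s * (1 - P) / P, and the pool is depleted when those
    earning potentials sum to prize_pool.
    """
    selected = {}
    total = 0.0
    for st in sorted(stakes, key=lambda st: st.p, reverse=True):
        # Cutoff at which the stakes selected so far would exactly deplete the pool.
        if total > 0 and total / (total + prize_pool) >= st.p:
            # The pool runs out before this bid is reached (assumed continuous cutoff).
            return total / (total + prize_pool), selected
        # Stake needed at cutoff P = st.p so that total * (1 - P) / P == prize_pool.
        needed = prize_pool * st.p / (1 - st.p) - total
        if st.amount >= needed:
            selected[st.user] = needed  # marginal stake is only partially selected
            return st.p, selected
        selected[st.user] = st.amount
        total += st.amount
    # Not enough stake to deplete the pool: everyone is selected at the lowest bid.
    return (min(st.p for st in stakes) if stakes else None), selected


if __name__ == "__main__":
    stakes = [Stake("alice", 100.0, 0.6), Stake("bob", 100.0, 0.5), Stake("carol", 100.0, 0.4)]
    P, selected = p_cutoff_auction(stakes, prize_pool=150.0)
    print(P, selected)  # 0.5 {'alice': 100.0, 'bob': 50.0}
```

In this example the payout factor at the cutoff is (1 - 0.5) / 0.5 = 1, so the 150 NMR selected can earn exactly the 150 NMR pool, and carol's stake is returned.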

Below is a simple illustration of how payouts can be affected by the payout factor auction. The benchmark is the same week over week (e.g. 0.51). The payout factor is set by the market and can change week over week (e.g. +150% in one round and +50% in another).

The higher you bid p, the more likely you are to be part of the game, but the games you can join have worse payout factors. The lower you bid p, the less likely you are to be part of the game, but the games you can join have better odds. The optimal strategy is to bid at the odds you are willing to accept. If you make it into a game, you are playing at odds you accept or better. If the odds are worse than what you are willing to accept, you simply sit out.

Given that you receive no additional reward for beating the benchmark by a wider margin, you are incentivized not to maximize your average AUC (which would be aligned with Numerai’s metamodel performance) but to maximize how often you beat the benchmark.

If the payout factor ever rises above 2x, you are also incentivized to construct a “1-p” attack, where you stake equal amounts on two anticorrelated models, profiting from the likely case that one burns but the other earns more than 100%. While this attack is easy to detect, it is hard to address. Since anticorrelation does not necessarily imply bad models, we do not want to simply ban this behavior. Instead, we want a solution that makes this type of attack economically inviable.
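A back-of-the-envelope check of the attack under the current rules, with simplifying assumptions of our own: the two models are exact opposites, so at most one of them can beat the benchmark in a round; q is a hypothetical probability that exactly one does; otherwise both stakes burn. The function names are illustrative only.

```python
def one_minus_p_attack_ev(stake, payout_factor, q):
    """Expected profit of staking `stake` on each of two opposite models (old rules).

    With probability q exactly one model beats the benchmark: it earns
    stake * payout_factor while the other burns its stake.
    Otherwise both burn (assumption: they can never both win).
    """
    win_one = stake * payout_factor - stake
    lose_both = -2 * stake
    return q * win_one + (1 - q) * lose_both


def break_even_q(payout_factor):
    # Solve q * (f - 1) - 2 * (1 - q) = 0 for q, giving q = 2 / (f + 1).
    return 2 / (payout_factor + 1)


f = (1 - 0.3) / 0.3  # payout factor at a p-cutoff of 0.3, about 2.33
print(round(break_even_q(f), 3))                       # 0.6
print(round(one_minus_p_attack_ev(100.0, f, 0.9), 2))  # 100.0
```

The higher the payout factor, the less reliably the pair needs to split to be profitable, which is why the attack only matters once payout factors get large.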

Proposal: Market benchmark, fixed payout curve

In the proposed design, you will instead be asked to bid a target benchmark b as part of your stake s. Your payout or burn is given by $s \cdot \phi(\mathrm{AUC} - B)$, where B is the benchmark of the round and $\phi$ is the payout curve. At the close of the staking window, Numerai computes the benchmark by selecting the group of stakes with the highest target benchmarks such that the entire prize pool is depleted by the sum of the group’s maximum earning potential. The range of $\phi$ is [-1, 1], which means you can at most double your stake and at worst lose it all. Like before, stakes that are not selected are returned immediately and are not part of the game. The earning potential of your stake is again locked in until resolution. If you beat the benchmark, you get your stake back and are paid $s \cdot \phi(\mathrm{AUC} - B)$. If you miss the benchmark, $s \cdot |\phi(\mathrm{AUC} - B)|$ of your stake is burned.
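A minimal sketch of the benchmark auction and settlement, assuming: each selected stake's maximum earning potential is s (because $\phi \le 1$), the marginal stake is partially selected, and B is the bid of that marginal stake. The curve used here is the clipped-linear option with a 0.02 band from the options listed under "Options for the payout curve". `Stake`, `benchmark_auction` and `settle` are hypothetical names, not Numerai's code.

```python
from dataclasses import dataclass


@dataclass
class Stake:
    user: str
    amount: float  # stake s, in NMR
    b: float       # target benchmark bid (AUC)


def benchmark_auction(stakes, prize_pool):
    """Hypothetical sketch: return (B, selected), where selected maps user -> selected stake.

    Each selected stake can earn at most s (phi <= 1), so stakes are taken in
    order of decreasing b until their sum reaches prize_pool. B is the bid of
    the last (marginal) stake taken, which may be partially selected.
    """
    selected = {}
    remaining = prize_pool
    benchmark = None
    for st in sorted(stakes, key=lambda st: st.b, reverse=True):
        if remaining <= 0:
            break
        take = min(st.amount, remaining)
        selected[st.user] = take
        remaining -= take
        benchmark = st.b
    return benchmark, selected


def clipped_linear(x, band=0.02):
    # One candidate payout curve phi, bounded to [-1, 1].
    return max(-1.0, min(1.0, x / band))


def settle(stake, auc, benchmark, phi=clipped_linear):
    """Payout (positive) or burn (negative) for one selected stake: s * phi(AUC - B)."""
    return stake * phi(auc - benchmark)


if __name__ == "__main__":
    stakes = [Stake("alice", 100.0, 0.52), Stake("bob", 100.0, 0.51), Stake("carol", 100.0, 0.50)]
    B, selected = benchmark_auction(stakes, prize_pool=150.0)
    print(B, selected)                               # 0.51 {'alice': 100.0, 'bob': 50.0}
    print(round(settle(100.0, 0.52, B), 6))          # 50.0: beat B by 0.01, +50%
    print(round(settle(selected["bob"], 0.50, B), 6))  # -25.0: missed B by 0.01, -50%
```

Note that alice bid 0.52 but plays against the lower market benchmark of 0.51, matching the "what you think you can beat or easier" property described below.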

Below is an illustration of how payouts can be affected by the new benchmark auction and payout curve. Here, the benchmark can change week over week (e.g. 0.50 in one week and 0.51 in another). The payout curve around the benchmark stays the same, though (e.g. beating the benchmark by 0.01 earns a +50% payout factor in both cases).

The higher you bid b, the more likely you are to be part of the game, but the games you can join are harder. The lower you bid b, the less likely you are to be part of the game, but the games you can join are easier. The optimal strategy is to bid the benchmark you think you will beat on average. If you make it into the game, the benchmark will be one you think you can beat, or easier. If the benchmark is harder than you think you can beat, you simply sit out.

Since profitability is now proportional to average AUC, you are no longer forced to optimize for consistency. We hope that this freedom encourages more creativity in modelling and boosts performance. Go take some risks if you want! Just remember that while it is now much less likely to lose your entire stake, it is still possible.

The payout curve $\phi$ is set to symmetrically reward better-than-expected performance and punish worse-than-expected performance. Since the maximum payout is capped at +100%, a higher total stake will be accepted to deplete the same prize pool. We see this as an acceptable tradeoff given the safety against “1-p” attacks. Note that punishing bad performance does not directly help Numerai. As before, the punishment is there to put skin in the game and prevent abuse.

Options for the payout curve

There are a few options for the payout curve. We propose using the clipped-linear curve for future rounds.

(1) Simple threshold

As it is now, the only thing that matters is whether you are above or below the benchmark: $\phi(x) = \operatorname{sign}(x)$, where $x = \mathrm{AUC} - B$.

Below the benchmark you burn, above the benchmark you earn.

Staking strategy: it is profitable to bid up to your median AUC.

(2) Clipped linear

Payoff is linear, but bounded to [-1, 1]: $\phi(x) = \max(-1, \min(1, x/0.02))$

0.02 below the benchmark you burn 100%, 0.02 above the benchmark you earn 100%, and the payoff is linear in between (e.g. at the benchmark you neither earn nor burn).

Staking strategy: assuming your AUC typically stays within the linear range (e.g. [0.49, 0.53] for a benchmark of 0.51), it is profitable to bid up to your average AUC.

Note that the band of ±0.02 is only an example.
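To see why the break-even bid moves from the median AUC (simple threshold) to the average AUC (clipped linear), the sketch below replays a hypothetical AUC history against a fixed benchmark. The history is made up for illustration and stays inside the ±0.02 linear band around its mean; `average_payout` is a hypothetical helper.

```python
import numpy as np


def threshold(x):
    # Option (1): phi(x) = sign(x)
    return np.sign(x)


def clipped_linear(x, band=0.02):
    # Option (2): phi(x) = max(-1, min(1, x / band))
    return np.clip(x / band, -1.0, 1.0)


def average_payout(phi, aucs, benchmark):
    """Average payout per unit stake if this AUC history were replayed against a fixed benchmark."""
    return float(np.mean(phi(np.asarray(aucs) - benchmark)))


# Hypothetical AUC history: usually a bit above 0.51, with one weak round.
aucs = [0.515, 0.52, 0.512, 0.518, 0.495]
median, mean = float(np.median(aucs)), float(np.mean(aucs))  # 0.515 and about 0.512

print(average_payout(threshold, aucs, median))       # 0.0: threshold breaks even at the median
print(average_payout(clipped_linear, aucs, median))  # negative: linear loses at the median
print(average_payout(threshold, aucs, mean))         # positive: threshold still profits at the mean
print(average_payout(clipped_linear, aucs, mean))    # about 0: linear breaks even at the mean
```

Because this history is skewed by one bad round, its median sits above its mean, so a threshold payout tolerates a higher bid than the clipped-linear curve does.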

(3) Sigmoid

A smoother version of the above: $\phi(x) = \tanh(x/0.02)$

You never fully burn or earn 100%, but your payout is strictly increasing in your AUC.
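For a side-by-side comparison, the three candidate curves can be evaluated at a few margins x = AUC - B, using the example band of 0.02. The function names are ours, chosen for illustration.

```python
import numpy as np

BAND = 0.02  # example band from the options above


def threshold(x):
    # (1) Simple threshold: sign(x)
    return np.sign(x)


def clipped_linear(x):
    # (2) Clipped linear: max(-1, min(1, x / BAND))
    return np.clip(x / BAND, -1.0, 1.0)


def sigmoid(x):
    # (3) Sigmoid: tanh(x / BAND), never reaches +/-1
    return np.tanh(x / BAND)


# Margin over the benchmark, x = AUC - B.
margins = np.array([-0.03, -0.01, 0.0, 0.01, 0.03])
for name, phi in [("threshold", threshold), ("clipped linear", clipped_linear), ("sigmoid", sigmoid)]:
    print(f"{name:>15}: {np.round(phi(margins), 3)}")
```

At a margin of 0.01 the clipped-linear curve pays 50% while the sigmoid pays about 46%; at 0.03 the clipped-linear curve is already capped at 100% while the sigmoid is still slightly below it.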

Why this change is needed

We recognize that capping the max payout is bothersome, but now that we are close to deploying the change, we can explain why it has to be done.

Having a better-than-even payout factor opens up perverse incentives, where a model with worse typical performance but higher volatility can be more profitable.

For example, here is a model that tries only to have volatile accuracy per era. In each era either it or its opposite does very well, so even with a threshold of 0.53 and a p-cutoff of 0.3 it is profitable to submit both (which does better than the example predictions).

When thresholds are high, models that aim only for volatility will always have an easier time than legitimate models, yet they hurt Numerai overall.
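The same pair trade can be checked against the proposed clipped-linear curve. Below, a hypothetical volatile model and its exact opposite (AUC 1 - auc) are staked equally. For any benchmark B of at least 0.5, the two margins satisfy x2 = (1 - 2B) - x1 <= -x1, and since the curve is odd and non-decreasing, the second model's payout is at most the negative of the first's, so the pair can never profit. The sweep simply confirms this numerically; `pair_payout` is a hypothetical helper.

```python
import numpy as np


def clipped_linear(x, band=0.02):
    # Proposed payout curve phi, bounded to [-1, 1].
    return np.clip(x / band, -1.0, 1.0)


def pair_payout(auc, benchmark, band=0.02):
    """Combined payout per unit stake for a model with AUC `auc` plus its exact
    opposite (AUC 1 - auc), equal stakes on both, under the clipped-linear curve."""
    return clipped_linear(auc - benchmark, band) + clipped_linear(1 - auc - benchmark, band)


# Sweep a range of per-round AUCs for the volatile model.
aucs = np.linspace(0.40, 0.60, 201)
for benchmark in (0.50, 0.501, 0.53):
    best = pair_payout(aucs, benchmark).max()
    # Never above zero (up to floating-point rounding).
    print(f"B = {benchmark}: best combined payout per unit stake = {best:.3f}")
```

At B = 0.53, the threshold from the example above, the best the pair can do is break even, when one model earns 100% and the other burns 100%.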

The previous reward structure is also susceptible to strategies that “usually” perform above the benchmark but occasionally perform incredibly badly, making the average AUC bad overall.

Impact on Rewards

You can look at either the absolute returns or the relative returns.

One subtlety with the absolute returns: the tournament is currently competitive and the prize pool is being exhausted, but if the same stakes were kept under the new rules, only around a third of the prize pool would be exhausted, since the maximum payout factor is now 1. This means that absolute returns will necessarily be smaller, so to compare them fairly we scale the stakes by each round’s payout factor to make sure the prize pool is exhausted. The absolute returns end up fairly similar, and should increase as competitors adjust their approach to the new payoffs.
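The scaling can be illustrated with a single stake. This mirrors the `value_scaled` column in the analysis code in the addendum; the p-cutoff of 0.25 (payout factor 3) is a hypothetical value chosen because it reproduces the "around a third" figure, not a measured one.

```python
def payout_factor(p_cutoff):
    # Old-rules payout factor (1 - P) / P.
    return (1 - p_cutoff) / p_cutoff


def scaled_stake(stake, p_cutoff):
    """Stake whose maximum payout under the new rules (max factor 1) equals
    the maximum payout `stake` had under the old rules at this p-cutoff."""
    return stake * payout_factor(p_cutoff)


p_cutoff = 0.25  # hypothetical round
stake = 100.0
old_max = stake * payout_factor(p_cutoff)             # 300.0
new_max_unscaled = stake * 1.0                        # 100.0: a third of the old maximum
new_max_scaled = scaled_stake(stake, p_cutoff) * 1.0  # 300.0 again
print(old_max, new_max_unscaled, new_max_scaled)
```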

The relative returns are also lower, but we hope this is offset by:

  • Volatility being proportionally lower

  • Fewer complete burns

  • Each 0.005 increase in AUC (0.5 percentage points) wins you a 20% monthly return, which is not bad at all!
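Within the linear part of the clipped-linear curve, the extra return per unit stake from an AUC improvement is the improvement divided by the band half-width W. Assuming W = 0.025, the value used in the analysis code in the addendum (the ±0.02 band above is only an example, and would give 25% instead), this reproduces the 20% figure:

```latex
\Delta r = \frac{\Delta \mathrm{AUC}}{W} = \frac{0.005}{0.025} = 0.20
```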

The full results are at https://docs.google.com/spreadsheets/d/1HojQy9NUwRw0mQLri6wEFBAf0GQYRZ9qJurAmqjnYWw/edit?usp=sharing

Looking at absolute returns, notice that profit_new_scaled is often higher than profit_old.

Looking at return_new_scaled/std_new and return_old/std_old, the returns are lower under the new system, but so is the volatility.

Aside from some curious outliers, most users are better off.

Misc changes to rules

  • Submissions

    • Valid predictions range expanded from (0.3, 0.7) to (0, 1)

    • Predictions no longer required to be within 15 standard deviations of mean

    • Predictions are still required to have >0.2 Pearson/Spearman correlation with the example predictions

  • Staking

    • Minimum stake confidence raised from 0.1 to 0.501 to match min benchmark value

    • Consistency >58% no longer a requirement to stake

    • Concordance is still a requirement to stake

  • Payments

    • In the short term, “partial burning” will be implemented as a “full burn” of your stake + a “return” on the remaining amount. This is because the NMR smart contract does not currently support partial burns.

    • In the future, “partial burning” will be implemented on the smart contract itself as an atomic transaction.

    • “Partial” stakes will still be locked up for the entire duration of the round, but only the “selected” amount will be considered in payout calculations; the rest will always be returned.
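The short-term settlement can be sketched as below. This is a hypothetical illustration, not the smart contract's logic: we assume the "full burn" applies to the selected portion of the stake, that the unselected portion is always returned, and that `phi` is the payout curve value for the round. Under those assumptions, the net effect of the full burn plus return equals the intended partial burn of selected * phi.

```python
def settle_short_term(stake, selected, phi):
    """Hypothetical sketch of the short-term settlement described above.

    stake:    total NMR locked for the round
    selected: portion of the stake selected by the auction (<= stake)
    phi:      payout curve value phi(AUC - B), in [-1, 1]
    Returns the on-chain actions as a dict of NMR amounts.
    """
    unselected = stake - selected  # always returned in full
    if phi >= 0:
        # Nothing is burned: the whole stake comes back plus the reward.
        return {"burned": 0.0, "returned": stake, "reward": selected * phi}
    # Partial burn emulated as a full burn of the selected stake
    # plus a return of the unselected stake and the surviving fraction.
    return {
        "burned": selected,
        "returned": unselected + selected * (1 + phi),
        "reward": 0.0,
    }


print(settle_short_term(150.0, 100.0, -0.4))  # {'burned': 100.0, 'returned': 110.0, 'reward': 0.0}
```

Once partial burns are supported on the smart contract itself, the same outcome could be reached with a single atomic burn of selected * |phi|.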

Addendum: Analysis code

Requires live_auroc_p_cutoff_stakes.csv

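import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# In Jupyter, uncomment the next line to render plots inline:
# %matplotlib inline

# Per-user, per-round history with columns: username, value (stake), p_cutoff,
# live_logloss, live_auroc.
df = pd.read_csv("~/Downloads/live_auroc_p_cutoff_stakes.csv")

B = 0.501  # benchmark AUC for the new payout curve
W = 0.025  # half-width of the clipped-linear band (full burn at B - W, full earn at B + W)
L = 0.693  # old logloss benchmark

# Scale stakes by the old payout factor (1 - p) / p so that the prize pool
# would still be exhausted under the new maximum payout factor of 1.
df["value_scaled"] = df.value * (1 - df.p_cutoff) / df.p_cutoff

# Old rules: earn the payout factor if logloss beats the benchmark, else burn the whole stake.
df["return_old"] = np.where(df.live_logloss < L, (1 - df.p_cutoff) / df.p_cutoff, -1)
# New rules: clipped-linear payout around the benchmark AUC.
df["return_new"] = ((df.live_auroc - B) / W).clip(lower=-1, upper=1)

df["profit_old"] = df.return_old * df.value
df["profit_new"] = df.return_new * df.value
df["profit_new_scaled"] = df.return_new * df.value_scaled

# Aggregate per user: total stakes and profits, plus the volatility of per-round returns.
sums = df.groupby("username")[
    ["value", "value_scaled", "profit_old", "profit_new", "profit_new_scaled"]
].sum()
stds = (
    df.groupby("username")[["return_old", "return_new"]]
    .std()
    .rename(columns={"return_old": "std_old", "return_new": "std_new"})
)
tdf = pd.concat([sums, stds], axis=1).sort_values(by="profit_old", ascending=False)
tdf["return_old"] = tdf.profit_old / tdf.value
tdf["return_new"] = tdf.profit_new / tdf.value
tdf["return_new_scaled"] = tdf.profit_new_scaled / tdf.value_scaled

# Old vs. new (scaled) profit per user; points above the y = x line do better under the new rules.
ax = tdf.plot(kind="scatter", x="profit_old", y="profit_new_scaled", figsize=(7, 7), alpha=0.5)
ax.plot([-2000, 2000], [-2000, 2000])
plt.show()
print(tdf)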