Numerai Tournament

OHwA S01E10

From May 7, 2020 / Interview with Jason @Numerai

Arbitrage opened with a brief note on how Office Hours has grown since the first session: many people come back week after week, with new faces joining each time. He noted that about half of the people who visit the Office Hours summaries actually read the post, and that each week he sees plenty of questions posted on Slido and sent to him directly on Rocket.Chat.

Arbitrage gave a shout-out to Patrick for his work on multi-model accounts, then in beta. This feature had been in the works for a while, first teased by Slyfox during Office Hours with Arbitrage S01E02.

Questions from Slido

“I want as much data as possible, but I don’t only want those very difficult, high correlation eras because that will bias to those kinds of events.” -Arbitrage

Following from that question, Arbitrage asked Joakim to expand on a previous conversation from Rocket.Chat about combining eras.

Joakim explained that he wants validation and test sets that are representative of his training data so he can train his model to generalize to the validation data. To do this, he likes to include eras from a wide variety of regimes in his training, validation, and test sets. He includes eras from early in the data set, as well as some from the Validation 2 set, so his model will generalize as much as possible.
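An era-level holdout in that spirit can be sketched as follows: hold out whole eras rather than individual rows, and sample the held-out eras from across the full history. The column names and data below are illustrative stand-ins, not the actual tournament schema.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy stand-in for the tournament data: an "era" column plus one feature
# and a target (the real data has many features; these names are made up).
df = pd.DataFrame({
    "era": np.repeat([f"era{i:03d}" for i in range(1, 21)], 50),
    "feature": rng.normal(size=1000),
    "target": rng.normal(size=1000),
})

# Hold out whole eras, never individual rows, so each regime stays intact.
# Taking every 4th era draws validation eras from early and late periods alike.
eras = df["era"].unique()
val_eras = set(eras[::4])
train = df[~df["era"].isin(val_eras)]
val = df[df["era"].isin(val_eras)]
```

Splitting on era boundaries keeps each regime's rows together, so validation scores reflect performance on regimes the model never saw during training.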

According to Arbitrage, the quickest way to spot-check whether a strategy like Joakim’s (or really any training methodology) is working or moving in the right direction is to check the first score after submitting a model: a very high correlation score (positive or negative) suggests something has gone awry.

“You can post a big score, but that might be a spurious correlation. I don’t want to see a big score, I want something kind of average. But I want my MMC to be high, so I want my correlation with the meta-model to be low.” -Arbitrage

In Rocket.Chat, Joakim mentioned that if the eras are time-ordered, training on early eras and applying the model to a later period is not consistent with time-series methodology. That would be true if the data Numerai gives its data scientists weren’t already adjusted for the time period.

Arbitrage frames it this way: each era is a bucket of time that’s been neutralized for whatever correlation exists throughout time. That means if you treat each bin separately and don’t train and test on the same bin, your model should be fine. Tournament participants know that some eras are tougher than others, and Arbitrage believes these represent different market regimes. You can’t adjust your model to compensate for the regimes, however, because we don’t know what the regimes are. The difficult regimes become valuable as additional training data and for testing out-of-sample model performance.

For Arbitrage, he’s looking for around 2.5% MMC as a target (though he’s unsure whether or not that’s feasible), and is aiming to be above 1%. “That’s not science-based,” he said, “that’s just me picking a number.”

**Rappenlager asks: Many of the tournament data scientists use multiple accounts: do you know or have an idea of how many people are actually competing?**

Prior to the ten account rule, Arbitrage would take the total number of submissions (currently about 1,100 unstaked and 500 staked submissions per week, according to Mike P) and divide it by three (the previous account limit). While not precise, this would give a rough estimate. Arbitrage’s suspicion is that the number to divide by is probably around five now, to account for users who haven’t used up their ten model slots yet.
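That back-of-envelope math, using the submission counts Mike P cited, works out like this (the divisor of five is Arbitrage’s guess, not a measured figure):

```python
# Rough estimate of distinct participants from weekly submission counts.
unstaked = 1100            # approximate unstaked submissions per week (per Mike P)
staked = 500               # approximate staked submissions per week
accounts_per_person = 5    # Arbitrage's guess under the ten-account rule

estimated_participants = (unstaked + staked) // accounts_per_person
print(estimated_participants)  # 320
```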

A quick poll of the audience revealed that most of the Office Hours regulars have ten (or close to ten) accounts.

Office Hours regulars

Bor mentioned that, with at least 100 people focused solely on data modeling, Numerai would rank alongside some of the top quantitative hedge funds in the world in terms of data science resources.

On Richard Craib’s post on the Numerai forum about performance stationarity, Michael Oliver referenced smart Sharpe, a modified Sharpe ratio calculation that accounts for autocorrelation.

The paper Michael Oliver referenced [*not endorsement*] posits that smart Sharpe leads to better model performance out of sample, which would be an advantage for anyone competing in the Numerai tournament. Michael noted that for model selection, smart Sharpe offered tighter parameters than the traditional Sharpe ratio, making it clearer which models performed best.
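One formulation of smart Sharpe circulating in the Numerai community scales the Sharpe denominator by an autocorrelation penalty. The sketch below follows that formulation; it is an assumption for illustration, not necessarily the exact calculation from the paper Michael referenced.

```python
import numpy as np

def autocorr_penalty(x):
    """Penalty that grows with positive lag-1 autocorrelation."""
    n = len(x)
    p = np.corrcoef(x[:-1], x[1:])[0, 1]  # lag-1 autocorrelation
    return np.sqrt(1 + 2 * sum(((n - i) / n) * p**i for i in range(1, n)))

def smart_sharpe(x):
    """Sharpe ratio deflated by the autocorrelation penalty, so streaky
    (positively autocorrelated) return series score lower."""
    x = np.asarray(x, dtype=float)
    return np.mean(x) / (np.std(x, ddof=1) * autocorr_penalty(x))

# A streaky series scores below its plain Sharpe ratio.
returns = [0.01] * 10 + [0.03] * 10
plain = np.mean(returns) / np.std(returns, ddof=1)
print(smart_sharpe(returns) < plain)  # True
```

The intuition: a model whose weekly scores come in long winning or losing streaks is riskier than its plain Sharpe suggests, and the penalty deflates the score accordingly.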

Arbitrage prefaced his answer with the caveat that Erasure Quant falls outside the purview of Office Hours but he would take a stab at it because of the likely overlapping interest. “Good data,” he said, “is by far the biggest challenge for Erasure Quant because good data is not free.”

Obtaining good daily or tick-level data on equities is extremely difficult. The best data source candidate, Arbitrage thinks, is the Quandl Wiki Daily data followed by Big Charts. Another recommended platform is Alpaca, and tournament data scientist Keno mentioned that doing his calculations within Quantopian has worked for him.

If you’re looking for signal using only pricing data, “you’re gonna have a bad time.”
Arbitrage said that in isolation, equities pricing data is probably the most mined data set in the world. Supplemental data, like weather data, is what gives you a predictive edge. Other sources could be sentiment data from something like Stocktwits or accounting data from financial statements like 10-Ks or 10-Qs.

This is a challenge Arbitrage is intimately familiar with. He explained that he’s happy with how three of his models are built and they’ve performed well over time, so he trains them on different subsets of the data and lets them coexist even though they’re highly correlated. Arbitrage allocates his stake based on the models in which he has the most confidence.

Breaking it down: Arbitrage has three main models, each of which generates three sets of predictions (one for each subset of training data) for a total of nine submissions. His final slot, the tenth account, holds the model he’s using to experiment with MMC. He uses a manual genetic process, killing off his lowest-performing model roughly every ten weeks and replacing it with something new.

The longer you wait to stake, the longer it takes to get paid, so if you’re considering staking but are skittish about NMR price volatility, Arbitrage suggested one strategy might be easing in and trying dollar cost averaging, adding small amounts at regular intervals.
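A quick illustration of why dollar cost averaging softens price volatility (the NMR prices below are made up for illustration):

```python
# Stake a fixed dollar amount each interval regardless of the NMR price.
weekly_budget = 50.0                   # dollars per interval
nmr_prices = [25.0, 20.0, 40.0, 25.0]  # hypothetical weekly NMR prices

nmr_bought = [weekly_budget / p for p in nmr_prices]
avg_cost = weekly_budget * len(nmr_prices) / sum(nmr_bought)

# Fixed-dollar buys purchase more NMR when the price dips, so the average
# cost per NMR (~25.81 here) lands below the simple average price (27.50).
print(round(avg_cost, 2), sum(nmr_prices) / len(nmr_prices))
```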

Arbitrage mentioned that he’s withdrawing a portion of the profits within his stake to mitigate some of his risk, but he pointed out that doing so also limits his potential future earnings. “That’s the tradeoff,” he said.

Highly unlikely: that data is very expensive.

Arbitrage’s first idea: MMC needs its own leaderboard with a rank to brag about and put on resumes. As a fund, Numerai wants stability, so Arbitrage’s suggestion is to reward data scientists for keeping their models consistent by having something like a quarterly payout based on MMC rep tied only to performance, not amount at stake. This would give unstaked models a chance to earn some NMR while keeping payouts non-linear.

No NMR for you

Validation 2 contains the Covid-19 market drawdown, so Arbitrage assumes the data starts around mid-February and ends in mid-March, which would include the February highs and the selloff into March.

You get lower Sharpe ratios with Validation 2 because that data represents a significant anomaly in market time-series data.

There are too many legal and licensing issues with creating a data-repo like Quandl on Erasure.

But, there is merit to having a blockchain-validated repository of data where you can trust the quality of the data because people are staking on it. “Kind of like proof of stake for data,” Arbitrage said.

Where Arbitrage unexpectedly interviews Jason

Jason submitting to the tournament

When daily scores are delayed

Slyfox graciously accepting Jason's vote

Don’t miss the next Office Hours with Arbitrage: follow *Numerai on Twitter* or join the discussion on *Rocket.Chat* for the next time and date.

