"It actually felt nice; my Viper gave his life for me at some point." The Finals version plays with 17 heroes; we removed Lich because his abilities were changed significantly in Dota version 7.20. We believe these hero-integration issues are fundamentally solvable, and solving them could be interesting in its own right.
We haven't yet had time to investigate why, but our hypotheses range from insufficient model capacity, to needing better matchmaking for the expanded hero pool, to requiring more training time for new heroes to catch up to old ones. Imagine how hard it is for a human to learn a new hero when everyone else has mastered theirs!
In each case of such a change, whether to the model architecture or to the game rules, we were able to transfer the model over and continue training, something that is an open challenge for RL in other domains. To the best of our knowledge, this is the first time an RL agent has been trained using such a long-lived training run. To make this work, we've continued to flesh out our surgery tooling so that we can start from trained parameters even across substantial architecture changes. We saw very little slowdown in training going from 5 to 18 heroes. We hypothesized the same would be true going to even more heroes, and after The International, we put a lot of effort into integrating new ones. We spent several weeks training with hero pools of up to 25 heroes, bringing those heroes to approximately 5k MMR (about the 95th percentile of Dota players). Although they were still improving, they weren't learning fast enough to reach pro level before Finals.
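The core trick behind this kind of model surgery can be illustrated with a toy example. This is not OpenAI's actual surgery tooling, just a minimal numpy sketch of the idea: when a layer grows, the trained weights are embedded into the larger weight matrix and the new connections start at zero, so the expanded network initially computes the same function and training resumes from there.

```python
import numpy as np

def grow_layer(W_old, b_old, new_in, new_out):
    """Embed a trained dense layer (in_old -> out_old) into a larger one.

    Old weights occupy the top-left block; new connections start at zero,
    so on inputs whose extra features are zero, the old outputs are
    reproduced exactly and training can continue from that point.
    """
    in_old, out_old = W_old.shape
    assert new_in >= in_old and new_out >= out_old
    W_new = np.zeros((new_in, new_out), dtype=W_old.dtype)
    b_new = np.zeros(new_out, dtype=b_old.dtype)
    W_new[:in_old, :out_old] = W_old
    b_new[:out_old] = b_old
    return W_new, b_new

# The old layer's behavior is preserved on zero-padded inputs:
rng = np.random.default_rng(0)
W, b = rng.normal(size=(8, 4)), rng.normal(size=4)
x = rng.normal(size=8)
W2, b2 = grow_layer(W, b, new_in=12, new_out=6)
x_padded = np.concatenate([x, np.zeros(4)])
old_out = x @ W + b
new_out = x_padded @ W2 + b2
assert np.allclose(new_out[:4], old_out)
```

Real surgery across architecture changes (e.g. enlarging an LSTM) is more involved, but the invariant is the same: the resized model should start out functionally equivalent to the trained one.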
But after The International, we'd already dedicated the vast majority of our project's compute to training a single OpenAI Five model. So we increased the scale of compute in the only way available to us: training for longer. OpenAI Five's TrueSkill rose steadily as we applied additional training compute, through major system changes along the way: moving to single courier, increasing LSTM size to 4096 units, upgrading to patch versions 7.20 and 7.21, and starting to learn buyback. Plotted against compute, the trend is roughly linear, meaning that OpenAI Five benefited continually from additional compute (note this is effectively a log-log relationship, since the x-axis is the logarithm of compute and TrueSkill corresponds roughly to exponential progress). We evaluate all bot versions on the final game rules (1 courier, patch 7.21, etc.), even those trained on older ones; a steep slope after any of these changes indicates OpenAI Five adapting to that change (depending on the change, the evaluation may be unfair to the versions before it). The current version of OpenAI Five has been training continuously since June 2018, despite changes to the model size and the game rules (including some fairly large game patch updates and newly implemented features). In total, it has consumed 800 petaflop/s-days and experienced about 45,000 years of Dota self-play over 10 realtime months (up from about 10,000 years over 1.5 realtime months as of The International), for an average of 250 years of simulated experience per day. The Finals version of OpenAI Five has a 99.9% winrate versus the TI version.
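The claim that TrueSkill grows roughly linearly in the log of compute is easy to check on checkpoint data. The sketch below fits such a line with numpy; the compute and TrueSkill values are made up purely for illustration and are not OpenAI Five's actual measurements.

```python
import numpy as np

# Hypothetical checkpoints: (relative compute, TrueSkill). Only the
# log-linear shape matters; the numbers themselves are invented.
compute = np.array([1, 2, 4, 8, 16, 32, 64], dtype=float)
trueskill = np.array([10.1, 14.8, 20.2, 25.0, 29.9, 35.1, 40.0])

# Fit TrueSkill = a * log2(compute) + b. A small residual around a
# straight line on the log axis is what "benefited continually from
# additional compute" looks like in this framing.
a, b = np.polyfit(np.log2(compute), trueskill, deg=1)
residual = trueskill - (a * np.log2(compute) + b)
print(f"TrueSkill gain per doubling of compute: {a:.2f}")
print(f"max deviation from the fitted line: {np.abs(residual).max():.2f}")
```

On real checkpoint data, a persistent upward slope (rather than a plateau) is the signal that more compute was still paying off.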
To build OpenAI Five, we created a system called Rapid, which let us run PPO at previously unprecedented scale. The results exceeded our wildest expectations, and we produced a world-class Dota bot without hitting any fundamental performance limits. The surprising power of today's RL algorithms comes at the cost of massive amounts of experience, which can be impractical outside of a game or simulated environment. This limitation may not be as bad as it sounds; for example, we used Rapid to control a robotic hand to dexterously reorient a block, trained entirely in simulation and executed on a physical robot. But we think decreasing the amount of experience required is a next challenge for RL. We are retiring OpenAI Five as a competitor today, but the progress made and technology developed will continue to drive our future work. This isn't the end of our Dota work: we think that Dota is a much more intrinsically interesting and difficult (and now well-understood!) environment for RL development than the standard ones used today. OpenAI Five's victories on Saturday, as compared to its losses at The International 2018, are due to a major change: 8x more training compute. In many previous phases of the project, we'd driven further progress by increasing our training scale.
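Rapid's internals aren't described here, but the PPO objective it scales up is the standard clipped surrogate. Below is a minimal numpy version; the hyperparameter `clip_eps=0.2` is a conventional default, not necessarily Rapid's setting.

```python
import numpy as np

def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO's clipped surrogate objective (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s). Clipping removes the incentive to
    move the policy more than clip_eps away from the data-collecting
    policy, which helps keep training stable when rollouts are gathered
    from many distributed workers at once.
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return np.minimum(unclipped, clipped).mean()

# A ratio far outside [0.8, 1.2] earns no extra objective:
adv = np.array([1.0])
small_step = ppo_clipped_objective(np.log([1.1]), np.log([1.0]), adv)
big_step = ppo_clipped_objective(np.log([3.0]), np.log([1.0]), adv)
assert np.isclose(small_step, 1.1) and np.isclose(big_step, 1.2)
```

In practice this objective is optimized with gradient ascent over minibatches of rollout data; the sketch only shows the scalar loss computation.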
OpenAI Five sees the world as a bunch of numbers that it must decipher. It uses the same general-purpose learning code whether those numbers represent the state of a Dota game (about 20,000 numbers) or of a robotic hand (about 200).
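To make that concrete, here is a toy sketch (not OpenAI Five's actual architecture) of how the same policy code can consume a 20,000-number Dota observation or a 200-number hand observation; only the input size changes.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyPolicy:
    """Generic policy: the learning code never inspects what the
    observation numbers mean, only how many there are."""
    def __init__(self, obs_size, n_actions, hidden=64):
        self.W1 = rng.normal(0, 0.1, (obs_size, hidden))
        self.W2 = rng.normal(0, 0.1, (hidden, n_actions))

    def act(self, obs):
        h = np.tanh(obs @ self.W1)   # shared feature layer
        logits = h @ self.W2         # per-action scores
        return int(np.argmax(logits))

# The same class handles a Dota-sized or a hand-sized observation:
dota_policy = TinyPolicy(obs_size=20_000, n_actions=10)
hand_policy = TinyPolicy(obs_size=200, n_actions=10)
a1 = dota_policy.act(rng.normal(size=20_000))
a2 = hand_policy.act(rng.normal(size=200))
assert 0 <= a1 < 10 and 0 <= a2 < 10
```

The sizes (hidden width, action count) are placeholders; the point is that the environment-specific meaning lives entirely in how the observation vector is constructed, not in the learning code.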