I’m a fan of Aligulac, a statistics and prediction website for professional StarCraft II. The site maintains a database of all professional tournament matches played since the game’s early days, searchable via a sleek and user-friendly UI. (Data is entered by volunteers.) You can easily drill down into head-to-head performance of particular players, historical tournament results, all kinds of good stuff. Plus, the owner is a pretty nice dude who was very cooperative with me when I used Aligulac as one of my data sources for an article on region locking.
Aligulac uses its tournament data to compute skill ratings for professional players. Helpfully, it breaks these ratings down by race, enabling viewers to see if players are experts in certain match-ups or have one match-up as an Achilles heel. This data is fascinating, revealing things like Serral’s relative weakness in ZvZ (a trait shared by several other Zergs, perhaps due to the volatility of the match-up) and Maru’s general consistency across all match-ups.
That the very best professional players can end up with large disparities in per-matchup ratings got me thinking about your average Joe ladder player. Would it make sense to break everyone’s rating down by match-up? And if so - would that data be useful beyond a fun statistic on one’s profile page?
My TvT is Terrible
Like many people, I have a match-up that I am particularly bad at - TvT. There are so many scenarios where the bottom just falls out from under me - doom drops, YOLO suiciding my army into tanks, getting locked down by a siege viking push, and so on. It’s funny, because while I find TvZ to be the hardest match-up to actually play out, I also find it to be the easiest to win at, because it’s strategically more straightforward. I can often clear out opponents two or three hundred MMR above me; TvT, on the other hand, is a crapshoot.
I don’t think this is that crazy a state of affairs. Nowadays, even when I’m grinding StarCraft, I really don’t play all that much, preferring to learn key builds and be efficient with my time. That approach doesn’t get a whole lot of mileage in the more volatile and build-order-dependent TvT. And I think the reason I “punch above my weight” in TvZ is that my MMR is deflated by my TvT woes.
I’ve always wondered what it would look like if I (and everyone else) had a separate rating by match-up - different MMRs for TvT, TvZ, and TvP, respectively, in the case of my Terran. An upside of doing this is that it would result in players converging toward a 50% win rate by match-up, instead of a 50% win rate overall. I think a problem with the current system is that if you have a particularly weak match-up, it will anchor your overall rating downward, which will unintentionally inflate your win rates in the other match-ups. The matchmaker doesn’t take this into account when matching players, meaning it creates unnecessarily lopsided games.
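To make the anchoring effect concrete, here’s a minimal sketch. It assumes an Elo-style logistic win-probability curve with a 400-point scale (StarCraft’s actual MMR formula isn’t public, so this is only a stand-in), made-up per-match-up skill numbers, an even mix of the three match-ups, and opponents whose skill equals whatever single rating I settle at. The function names are mine, not anything from the game.

```python
def expected_score(rating: float, opponent: float, scale: float = 400.0) -> float:
    """Elo-style win probability for `rating` against `opponent` (assumed model)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / scale))


def equilibrium_rating(skills: dict[str, float]) -> float:
    """Find the single rating at which the average win rate across match-ups is 50%.

    Assumes each match-up is played equally often and that opponents' true
    skill equals the rating the matchmaker pairs us at.
    """
    lo, hi = min(skills.values()), max(skills.values())
    for _ in range(60):  # bisection; the average win rate falls as the rating rises
        mid = (lo + hi) / 2
        avg = sum(expected_score(s, mid) for s in skills.values()) / len(skills)
        if avg > 0.5:
            lo = mid  # winning too often here, so the rating would keep climbing
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical "true" skill per match-up for a Terran with a weak TvT.
skills = {"TvT": 3000.0, "TvZ": 3400.0, "TvP": 3300.0}
single = equilibrium_rating(skills)
win_rates = {m: expected_score(s, single) for m, s in skills.items()}

for matchup, rate in win_rates.items():
    print(f"{matchup}: {rate:.0%} expected win rate at a single rating of {single:.0f}")
```

Under these assumptions the overall win rate lands on 50%, but only because a well-below-50% TvT is cancelling out a well-above-50% TvZ. With a separate rating per match-up, each one would settle at its own skill value instead.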
Blizzard actually set a small precedent for this idea when it separated MMR by race back in 2016. That was very reasonable - it doesn’t make sense for your off-races to have the same MMR as your main race. I’d argue the same logic extends to match-ups: it doesn’t make sense for your best and worst match-ups to share an MMR, either.
I think separate MMRs would also ameliorate a couple of tricky edge cases on the ladder. One is players who instantly leave a particular match-up that they don’t like. This artificially lowers their MMR and gives them a much higher win rate in the other two match-ups, making it essentially the same as inadvertent smurfing; smurfing is bad. Separate MMR by match-up would help with this by ring-fencing each match-up’s MMR.
But I also think one reason people do this is that a discrepancy between their weakest and strongest match-ups can cause them to have an unusually low win rate in their weakest match-up, making it a frustrating experience that they seek to avoid. And I think StarCraft is a sufficiently asymmetric game that this is not that uncommon, especially at lower levels where players are not as well-rounded. Splitting out MMRs and guiding players toward a 50% win rate in every match-up is likely to make each match-up feel fairer and more fun to the average ladder player, which might reduce this avoidance behavior.
Another nicety about separate MMRs by match-up is that when players are learning the game for the first time (or coming back from a long break), they can focus on one match-up at a time without much penalty. Instead of getting pummeled in match-ups they haven’t studied yet or trying to learn one build for all three match-ups (perhaps the most common /r/allthingsterran Reddit question of all time, anecdotally), they’ll get fair games across all match-ups even if one is significantly more polished than the other two.
Details, Details
I guess a natural follow-up question to separate-MMR-by-match-up would be to ask, why stop there? Why not separate MMR by map, time of day, or day of the week?
Well, I think the potential benefits of separate MMRs need to be judged against the north star goal of accuracy. MMR has built-in uncertainty when players play their placements and first few follow-up games; after enough playtime, this eventually hits zero (I think). But no rating system is perfect, and everyone fluctuates up or down from their “real” rating. The more ways you slice MMR, the more uncertainty you introduce, and the easier it is for someone’s rating(s) to get more skewed from reality.
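Here’s a back-of-the-envelope sketch of that trade-off. It assumes a rating’s error shrinks roughly like one over the square root of the number of games behind it, and that games are spread evenly across slices. Both are simplifications of mine, and the `per_game_noise` value is an arbitrary placeholder, not a real MMR figure.

```python
import math


def rating_standard_error(games: int, slices: int = 1, per_game_noise: float = 200.0) -> float:
    """Rough error of one slice's rating when `games` are split evenly over `slices`.

    Assumes error ~ per_game_noise / sqrt(games in that slice); the noise
    value is a placeholder, not a measured quantity.
    """
    games_per_slice = games / slices
    return per_game_noise / math.sqrt(games_per_slice)


season_games = 300  # hypothetical number of ladder games in a season
single = rating_standard_error(season_games)
by_matchup = rating_standard_error(season_games, slices=3)
by_matchup_and_map = rating_standard_error(season_games, slices=3 * 7)  # assuming a 7-map pool

print(f"by match-up: {by_matchup / single:.2f}x the error of a single rating")
print(f"by match-up and map: {by_matchup_and_map / single:.2f}x the error of a single rating")
```

In this toy model the error grows with the square root of the number of slices, so three match-up ratings cost relatively little while slicing further by map gets expensive quickly.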
I suggest per match-up as a break-down because it’s minimally disruptive (3 ratings instead of 1, for StarCraft) and has a lot of potential for real divergence - meaning, it’s likely that rating discrepancies after a good number of games reflect a real skill difference. If a game has, say, ten factions, then this would probably be unworkable, too, and a single overall rating would be better. Same goes for team games, where there are too many match-up combinations.
How would placements (initial rating for new accounts) work? I think to maximize speed of placement - to make it as new-player friendly as possible - it might be better to start out on a single rating across match-ups, and split them into per-matchup ratings once a player has sufficient data. The process of a new player starting on the ladder of a new game and losing until they reach their correct rating is painful; I wouldn’t want to make that worse.
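One way this could look, as a minimal sketch: keep a single shared rating until the player has enough games, then seed every per-match-up rating from it and let them drift apart. The threshold, starting rating, K-factor, and Elo-style update here are all assumptions of mine, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field

MATCHUPS = ("TvT", "TvZ", "TvP")  # a Terran player's match-ups
SPLIT_AFTER_GAMES = 25            # hypothetical "sufficient data" threshold


@dataclass
class PlayerRatings:
    shared: float = 3000.0  # single rating used during placement (arbitrary start)
    games_played: int = 0
    per_matchup: dict[str, float] = field(default_factory=dict)  # empty until the split

    def rating_for(self, matchup: str) -> float:
        """Rating the matchmaker would use for this match-up."""
        return self.per_matchup.get(matchup, self.shared)

    def record(self, matchup: str, opponent: float, won: bool, k: float = 32.0) -> None:
        """Apply an Elo-style update (assumed model) to whichever rating is active."""
        current = self.rating_for(matchup)
        expected = 1.0 / (1.0 + 10 ** ((opponent - current) / 400.0))
        updated = current + k * ((1.0 if won else 0.0) - expected)
        self.games_played += 1
        if self.per_matchup:
            # Already split: only this match-up's rating moves.
            self.per_matchup[matchup] = updated
        else:
            self.shared = updated
            if self.games_played >= SPLIT_AFTER_GAMES:
                # Enough data: seed every match-up from the shared rating.
                self.per_matchup = {m: self.shared for m in MATCHUPS}


player = PlayerRatings()
for _ in range(SPLIT_AFTER_GAMES):
    player.record("TvZ", opponent=3000.0, won=True)
player.record("TvT", opponent=3000.0, won=False)  # after the split, only TvT drops
```

The appeal of this shape is that new players still converge on one number quickly, and the split only starts to matter once there’s enough history for the differences to mean something.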
Queue times are a tricky question, too. What if one match-up is so much lower or higher than others that it queues noticeably faster, because there are so many more players to match against? I actually don’t think this is that big of a problem - a couple hundred MMR point difference may be a large gap in skill, but anecdotally I don’t think there’s much of a queue time difference. That said, I think it would be good to collect data on this to see how it plays out in practice, and add guardrails as needed - like a reduced queue priority for a match-up if a player has too many of the same match-up in a row. (Honestly, I think such a feature would be useful even without separate-MMR-by-matchup.)
There are also a number of game systems that depend on a single overall rating, like leagues and automated tournaments. I think a simple solution here would be to use the average of all the per-match-up ratings. Note that this brings up the somewhat related idea that per-match-up ratings are really just there to ensure higher match quality. They don’t even have to be publicly revealed; the game’s UI could just display an overall average, while under the covers the matchmaker uses per-match-up ratings to create more balanced games.
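As a minimal sketch of that split between what’s displayed and what’s used, with hypothetical function names and made-up ratings:

```python
from statistics import mean


def display_rating(per_matchup: dict[str, float]) -> int:
    """Overall rating shown in the UI and used for leagues: a plain average."""
    return round(mean(per_matchup.values()))


def matchmaking_rating(per_matchup: dict[str, float], matchup: str) -> float:
    """Hidden rating the matchmaker would use once the match-up is known."""
    return per_matchup[matchup]


ratings = {"TvT": 3000.0, "TvZ": 3400.0, "TvP": 3300.0}  # made-up values
print(display_rating(ratings))             # one number for the profile page
print(matchmaking_rating(ratings, "TvT"))  # what the matchmaker sees for a TvT
```

A plain average is the simplest choice; weighting by games played per match-up would be another option, but I haven’t thought through whether that’s better.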
A North Star
I’m sure there are edge cases I’m not thinking about, and I’d love to hear about them to help shore up the rigor of my thinking around this idea. But I also want to make my argument in a different way, one that’s less about plugging holes and more about the general problem statement of the matchmaker.
I’d argue that higher accuracy in matchmaking - a higher likelihood of placing players in evenly-matched games - generally leads to a better competitive experience. And I’d further argue that more accurate skill ratings are a reasonable proxy for accuracy in matchmaking. Thus, we should seek to make skill ratings as accurate as possible - and I’d argue that a separate skill rating by match-up is more accurate than a single overall rating.
I think there’s precedent for this in the industry, too. CS2 recently introduced ranking-by-map, which makes sense in the context of that game, where maps are longer-lived than they are in your average RTS, and where some players only play one or two maps. I don’t think the same division makes sense in RTS, where the map pool (ideally) changes multiple times a year; MMR-by-map would introduce more skew than accuracy, from my perspective. But the broader point is that other games are also moving away from a “single skill rating” notion to something that’s more accurate.
I get the notion that improving StarCraft’s matchmaker may be out of scope, given the franchise isn’t actively developed at this time. But I think any limited-faction game would benefit from this; if Stormgate or Immortal or ZeroSpace end up with only a handful of races, then I’d make the same suggestion for them, too. I’m a big believer in the idea that there’s a lot of potential to plumb from improving matchmaking systems in RTS. I definitely don’t think this would be trivial to implement, but I think it’s an idea worth investing in, because the long-term payoff is worth it.
Until next time!
brownbear
If you’d like, you can follow me on Twitter, Facebook, and Instagram, and check out my YouTube and Twitch channels.