Moving On Up – Improving Planetary Annihilation’s Ranking System

With thanks to: PAG_elodea, yaegz and lokiCML

Uber Entertainment introduced the Planetary Annihilation ladder at the end of October in build 74484. Since then new maps have been added to the pool and the player distribution across leagues has changed. However, the functionality of the ladder itself has remained fundamentally the same.

Recently a row blew up on the Uber forums regarding smurf accounts and it seemed like a good time to examine the ladder, how it works and how it can be improved. Planetary Annihilation is not the first game, or even the first RTS, to implement a ladder and lessons can be learned from other games.

Glossary of Terms

Rating – Your matchmaking score

MMR – The term Blizzard uses for your rating

Rank – Your place in the ladder standings

Ladder – A ranked list of every player who has used the matchmaking system

League – A sub-division of the ladder

Division – A sub-division of a league

Rating Systems

But first let’s look to understand rating systems themselves.

Almost everyone has heard of Elo, the rating system developed for chess. It was created before computers were commonplace, so it was designed to make rating changes easy to work out by hand. This simplicity is also one of Elo’s weaknesses: it doesn’t make use of the computing power available to us.

Enter Glicko, the system from which the Planetary Annihilation ladder is derived. It is the basis for almost all video game ranking systems, including Microsoft TrueSkill™.

Glicko was developed by Professor Mark Glickman in 1995 as an improvement to Elo. His primary contribution was to introduce a measurement of reliability. The idea was to measure how accurate a player’s rating is, based on the time since it was last tested. To this end he came up with the Rating Deviation (RD), a standard deviation in statistical terms, which measures the uncertainty in a given rating. A high RD shows that the player’s rating is unreliable, either because they’re not playing frequently or because they’re a beginner. A low RD shows that the player has recently been competing regularly.[1]

A player’s rating is changed by game outcomes, while RD reflects the time that elapses between games. The effect is that a player’s RD decreases as they play games and their rating becomes increasingly certain, and increases during periods when they’re not playing.
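As a rough illustration of how RD creeps back up during inactivity, the sketch below uses the rule from Glickman’s Glicko paper [1]: the new RD is the square root of the old RD squared plus c² for each rating period that passes, capped at 350 (the RD given to an unrated player). The function name and the value of c are assumptions picked purely for illustration; nothing here is known to match Uber’s implementation.

```python
import math

MAX_RD = 350.0  # RD of a brand new, unrated player in Glicko


def inflate_rd(rd: float, periods_idle: float, c: float = 35.0) -> float:
    """Return the RD after `periods_idle` rating periods without a game.

    `c` controls how quickly uncertainty returns; 35.0 is an arbitrary
    illustrative value, not a figure from Uber or from the paper.
    """
    return min(math.sqrt(rd * rd + c * c * periods_idle), MAX_RD)


if __name__ == "__main__":
    for idle in (0, 1, 10, 100):
        print(idle, round(inflate_rd(50.0, idle), 1))
```

A player who never returns simply drifts back to the maximum RD and stays there.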

Rating changes are not necessarily equal between players in the Glicko system; they are highly dependent on each player’s RD. This is completely different from Elo, where one player’s rating increases by an amount equal to the amount their opponent’s rating decreases by.[1]

Glickman states:

“The system does not conserve rating points – and with good reason! Suppose two players both have ratings of 1700, except one has not played in awhile and the other [is] playing constantly. In the former case, the player’s rating is not a reliable measure while in the latter case the rating is a fairly reliable measure. Let’s say the player with the uncertain rating defeats the player with the precisely measured rating. Then I would claim that the player with the imprecisely measured rating should have his rating increase a fair amount (because we have learned something informative from defeating a player with a precisely measured ability) and the player with the precise rating should have his rating decrease by a very small amount (because losing to a player with an imprecise rating contains little information). That’s the intuitive gist of my extension to the Elo system.

On average, the system will stay roughly constant (by the law of large numbers). In other words, the above scenario in the long run should occur just as often with the imprecisely rated player losing as winning.” [2]
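Glickman’s scenario can be tried out with a small sketch of the single-game Glicko update from his paper [1]. The two RD values (300 for the returning player, 50 for the regular) are assumptions chosen to make the contrast obvious, and the function names are mine rather than anything from Uber’s code.

```python
import math

Q = math.log(10) / 400  # scaling constant used throughout the Glicko paper


def g(rd: float) -> float:
    """Discount factor: the less certain the opponent, the less a result counts."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def glicko_update(rating, rd, opp_rating, opp_rd, score):
    """One-game Glicko update; `score` is 1 for a win and 0 for a loss."""
    g_opp = g(opp_rd)
    expected = 1 / (1 + 10 ** (-g_opp * (rating - opp_rating) / 400))
    d_squared = 1 / (Q**2 * g_opp**2 * expected * (1 - expected))
    denom = 1 / rd**2 + 1 / d_squared
    new_rating = rating + (Q / denom) * g_opp * (score - expected)
    new_rd = math.sqrt(1 / denom)
    return new_rating, new_rd


# Both players start on 1700; the returning player (RD 300) beats the regular (RD 50).
rusty, _ = glicko_update(1700, 300, 1700, 50, score=1)
regular, _ = glicko_update(1700, 50, 1700, 300, score=0)
print(f"returning player: {rusty - 1700:+.0f}, regular player: {regular - 1700:+.0f}")
```

With these inputs the returning player gains well over a hundred points while the regular player loses only a handful, which is exactly the lack of conservation Glickman describes.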

In the Glicko system the player’s strength is supposed to be shown as a confidence interval. The lowest value of the interval is the player’s rating minus twice the RD, and the highest value is the player’s rating plus twice the RD.[1]

An example from Glickman:

“for example, if a player’s rating is 1850 and the RD is 50, the interval would go from 1750 to 1950. We would then say that we’re 95% confident that the player’s actual strength is between 1750 and 1950. When a player has a low RD, the interval would be narrow, so that we would be 95% confident about a player’s strength being in a small interval of values.” [1]

“Each player can be characterized as having a true (but unknown) rating that may be thought of as the player’s average ability. We never get to know that value, partly because we only observe a finite number of games, but also because that true rating changes over time as a player’s ability changes. But we can *estimate* the unknown rating. Rather than restrict oneself to a single estimate of the true rating, we can describe our estimate as an *interval* of plausible values. The interval is wider if we are less sure about the player’s unknown true rating, and the interval is narrower if we are more sure about the unknown rating. The RD quantifies the uncertainty in terms of probability…” [2]
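The interval is simple enough to express in a couple of lines. This is only a sketch of the arithmetic Glickman describes; the function name is my own.

```python
def rating_interval(rating: float, rd: float) -> tuple[float, float]:
    """Approximate 95% interval: the rating plus or minus twice the RD."""
    return (rating - 2 * rd, rating + 2 * rd)


# Glickman's example: rating 1850 with an RD of 50.
print(rating_interval(1850, 50))  # (1750, 1950)
```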

Glicko uses a rating period, which is a collection of games over a period of time, with ratings being calculated at the end of this period. A rating period could be several minutes or several months; it is entirely at the discretion of administrators. The ratings from the prior period are used in the calculation for the next rating period, and Glicko works best when the number of games in a rating period averages 5-10 per player.[1]

Glicko-2 was a further development of Glicko and introduced rating volatility (σ).

“Every player in the Glicko-2 system has a rating, a rating deviation, RD, and a rating volatility. The volatility measure indicates the degree of expected fluctuation in a player’s rating. The volatility measure is high when a player has erratic performances (e.g., when the player has had exceptionally strong results after a period of stability), and the volatility measure is low when the player performs at a consistent level.” [3]

The volatility measure is calculated at the end of each rating period.

“The Glicko-2 system works best when the number of games in a rating period is moderate to large, say an average of at least 10-15 games per player in a rating period. The rating scale for Glicko-2 is different from that of the original Glicko system. However, it is easy to go back and forth between the two scales.” [3]
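Going “back and forth between the two scales” looks something like the sketch below. The 1500 offset and the 173.7178 scale factor are the ones given in Glickman’s Glicko-2 document [3]; the function names are assumptions of mine.

```python
GLICKO2_SCALE = 173.7178  # conversion factor from the Glicko-2 document
BASE_RATING = 1500.0      # rating that maps to 0 on the Glicko-2 scale


def to_glicko2(rating: float, rd: float) -> tuple[float, float]:
    """Convert a Glicko rating and RD to the Glicko-2 scale (mu, phi)."""
    return ((rating - BASE_RATING) / GLICKO2_SCALE, rd / GLICKO2_SCALE)


def to_glicko(mu: float, phi: float) -> tuple[float, float]:
    """Convert a Glicko-2 (mu, phi) pair back to a familiar rating and RD."""
    return (GLICKO2_SCALE * mu + BASE_RATING, GLICKO2_SCALE * phi)


mu, phi = to_glicko2(1500, 350)
print(mu, round(phi, 4))
```

The calculations happen on the smaller scale, and the result is converted back before being shown to anyone.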

Thus a full blown Glicko-2 implementation consists of:

  • Rating – how good is this player
  • Rating deviation (RD) – how certain are we of that rating
  • Rating volatility (σ) – how consistent are they
  • Rating period – the time over which a rating change is measured

Limitations Of Glicko

  • Designed for 1v1: it does not handle teams or free-for-alls well. This was one of the reasons Microsoft developed their TrueSkill™ rating system.
  • Does not score draws.
  • RD grows by a fixed formula between games played (strictly, it is the square of the RD that increases linearly with time), though this may not map well to the impact downtime has on a player’s skill level.
  • No built-in mechanism for rating decay: players have to play for a loss of skill to be represented in their rating, so it works best where there are incentives for players to play.
  • Ratings are only updated at the end of a rating period and Mark Glickman’s recommendation is that a rating period should contain 5-15 games (depending on the version of Glicko in use). This removes the immediate feedback a system such as Elo provides.
  • People may avoid playing so as to build up their RD and thus boost their ranking upon their return. Of course, if they lose this can lead to a steep drop as well.
  • Suffers from rating overrun and underrun, where the player’s rating is no longer aligned with their playing ability. Elo is subject to this as well, but a large RD can magnify the problem.

What’s The Point Of a Ladder?

At the end of the day, despite what you may think a ladder exists for, it’s really about getting you matches which take you as close to a 50% win/loss ratio as possible. Your rating is not something that you “earn”; it is something you discover about yourself.

Glicko belongs to a family of rating systems that are probabilistic (“best guess”) in nature. Elo, Glicko, and TrueSkill™ are all Bayesian-based rating systems. All these systems have an inherent degree of uncertainty to them, which means one should not obsess too much over ranks; the ladder exists to get you challenging games. If you want to prove you’re the best player the game has to offer then you should be entering tournaments, or events like King of the Planet.

King of the Planet

The Planetary Annihilation Ladder

The Planetary Annihilation ladder uses a derivative of Glicko. No further information has been provided by Uber on how it works or which version of Glicko is in use. Because of this, it’s possible that inaccuracies exist within this article, but I have done my best to validate the information present.

Planetary Annihilation Uber Matchmaking Ladder Ranking

How it works

Players are first required to complete five placement games. These are played against both other players still in placement and players who are already ranked, impacting the rating of both accordingly. Following completion of the placement games the player is entered into one of five leagues: bronze, silver, gold, platinum and uber.

Players are not distributed evenly between the leagues; rather, the largest population exists in silver while the smallest is in uber. There appear to be no barriers between leagues. You simply need to increase your hidden Glicko rating to a point where you fall within the distribution of another league to be placed immediately within it.
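To make the idea of a league “distribution” concrete, here is a minimal sketch of percentile-based placement. The cut-off shares are invented for illustration, chosen only so that silver is the largest league and uber the smallest; Uber has not published the real figures, and the function name is hypothetical.

```python
import bisect

LEAGUES = ["bronze", "silver", "gold", "platinum", "uber"]
# Hypothetical cumulative shares of the player base: 25% bronze, 35% silver,
# 25% gold, 12% platinum and 3% uber. These numbers are assumptions.
CUTOFFS = [0.25, 0.60, 0.85, 0.97]


def league_for(rating: float, all_ratings: list[float]) -> str:
    """Place a player by the fraction of the ladder rated below them."""
    below = sum(1 for r in all_ratings if r < rating)
    percentile = below / len(all_ratings)
    return LEAGUES[bisect.bisect_right(CUTOFFS, percentile)]


ladder = list(range(1000, 2000, 10))  # 100 made-up ratings
print(league_for(1000, ladder), league_for(1500, ladder), league_for(1990, ladder))
```

Under a scheme like this a player’s league can change without them playing a game, simply because the population around them shifts.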

Planetary Annihilation League Distribution

Matchmaking and ranking both appear to be driven by the same rating. An opponent as close to your level of skill as possible is found, with the search parameters widening the longer you’re looking for a game. The resulting point loss or gain is a direct outcome of the difference between your ratings and your relative RDs. Therefore the top player in uber has the highest rating while the bottom player in bronze has the lowest.
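A widening search of this kind might look something like the following sketch. Every number and name in it is an assumption made for illustration; Uber has not said how wide the search starts, how fast it grows, or whether RD is taken into account.

```python
def search_window(seconds_waiting: float, base: float = 50.0,
                  growth_per_second: float = 5.0, cap: float = 500.0) -> float:
    """Acceptable rating gap, growing the longer a player waits (all values assumed)."""
    return min(base + growth_per_second * seconds_waiting, cap)


def find_opponent(my_rating: float, queue: dict[str, float], seconds_waiting: float):
    """Return the closest-rated queued player inside the current window, or None."""
    window = search_window(seconds_waiting)
    candidates = [
        (abs(rating - my_rating), name)
        for name, rating in queue.items()
        if abs(rating - my_rating) <= window
    ]
    return min(candidates)[1] if candidates else None


queue = {"alice": 1500.0, "bob": 1700.0}  # hypothetical players
print(find_opponent(1900.0, queue, seconds_waiting=0))
print(find_opponent(1900.0, queue, seconds_waiting=60))
```

The trade-off is the usual one: a narrow window gives closer games but longer queues, while a wide window does the opposite.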

Problems With The Planetary Annihilation Ladder

1. Ratings are inaccurate

Given that ranks are updated following every game, it would appear that Uber are using a very small Glicko rating period. This is suboptimal for accuracy in a Glicko system, but is likely because immediate feedback is a necessary reward to keep a player interested; delays in ratings and league placement were raised as an issue during the PTE period. Tying ranks to ratings so directly is potentially impacting the quality of the matchmaking system and the accuracy of ratings.

2. Lack of visible competition

There is no means by which you can see the ratings of players other than the top ten of each league and yourself. This focuses your attention primarily on gaining ranks, as you cannot see the status of other players, either in or out of game. This can make for a somewhat lonely ladder experience as your rank goes up, but you don’t see any names of individuals you’re beating or competing with. You don’t even know what league your opponent is in.

3. The only way to win is not to play

There is no rating decay or removal of inactive players (UPDATE: since publication Uber Entertainment have introduced the hiding of players who have not played a ladder game within two weeks). This becomes glaringly obvious in this shot of the top 10 players, taken on the 31st December 2014. Despite not having played for two weeks, yaegz remains the top player. This removes the incentive for players to play on the ladder once they have achieved a satisfactory rank. It then creates frustration in players below them, as the primary way to drive up your rating (and thus your rank) is by beating players with a higher rating than yourself.

Yaegz is literally in a position where the only way to win is not to play.

Planetary Annihilation Ladder Top 10

4. Top 10s hold no meaning

There appears to be nothing dividing the leagues beyond a distribution ratio; only the uber top 10 holds any real meaning or consistency. A top player of any other league is liable to move at a moment’s notice: all it takes is for a small number of players to join the ladder below them, or for them to outscore the player directly above them in the next league. This makes it hard for the game to build any names or personalities that people care about.

5. No rewards beyond rank

There’s no information about you available beyond your rank. It means that rank is the sole carrot offered for playing on the ladder. Given that you could lose rank as easily as you gain it, this can lead to ladder anxiety being a big problem in Planetary Annihilation.

The Starcraft 2 Ladder

Starcraft 2 Logo

Starcraft 2 was built from the ground up for esports and has been a leader in how a game builds a system to encourage players to keep gaming. Indeed, League of Legends abandoned its initial Elo driven ladder and moved to a league and division system very similar to that used by Starcraft 2.

We’ll cover how things worked in 2014.

Players must first play five placement matches (with an option for up to 50 practice matches on simplified maps) and are then put into one of seven leagues: bronze, silver, gold, platinum, diamond, master and grandmaster. Each league is composed of multiple divisions, all equal to one another, each consisting of up to 100 players. For example, the first hundred players to qualify for silver would be put into silver 1, after which the division silver 2 would be formed. Each division actually has its own unique name, but you get the idea.

The grandmaster league has special rules for entry and is limited to 200 players. Inactivity leads to swiftly being dropped from it.

Starcraft 2 has three values of interest: points, bonus pool and matchmaking rating (MMR).

Starcraft 2 1v1 Diamond Division Void Ray Indigo

Points are what drive rankings within a division; you win games you gain points, you lose games you lose points. All of this is relative to the number of points your opponent has, and it’s heavily weighted towards giving you a lot more points for wins than get taken for losses. On top of this is the bonus pool, which is a set of bonus points that accumulate over time with a cap on the amount you can get each week. When you win a game you can double the points received with bonus pool points, and when you lose you can remove those points from your bonus pool rather than your points total. It’s a system which encourages people to play at least enough to tap out their bonus pool, and it means players returning from an absence have an opportunity to quickly make strides up their division.
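The interaction between points and the bonus pool can be sketched as follows. This is a simplified model of the behaviour described above, not Blizzard’s actual rules: the class name, the point values and the way a partly empty pool is handled are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class LadderPoints:
    points: int = 0
    bonus_pool: int = 0

    def record_win(self, base_points: int) -> None:
        """A win is matched from the bonus pool, up to doubling the award."""
        bonus = min(base_points, self.bonus_pool)
        self.bonus_pool -= bonus
        self.points += base_points + bonus

    def record_loss(self, base_points: int) -> None:
        """A loss drains the bonus pool first; only the remainder costs points."""
        absorbed = min(base_points, self.bonus_pool)
        self.bonus_pool -= absorbed
        self.points -= base_points - absorbed


player = LadderPoints(points=100, bonus_pool=15)
player.record_win(12)   # doubled by the pool
player.record_loss(10)  # partly absorbed by what is left of the pool
print(player)
```

As long as there is anything in the pool, a win is worth more than a loss costs, which is what makes playing feel like progress.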

From a player’s perspective they are competing with 100 other players for promotion to the next league, but the matchmaking system ignores the leagues and divisions entirely (in the same way PA does) and matches according to their hidden MMR. The MMR is almost certainly a Glicko derivative.

Behind the scenes, Starcraft 2 is actually working in a similar fashion to any other ladder. Your MMR is really all that matters, with the correlation between points and MMR being tenuous at best; certainly your rank within your division doesn’t necessarily tell you a lot about your standing in the wider league. MMR is the driver for everything, but Blizzard have put a lot of chrome on top of it that makes it feel like you’re always progressing and have a reason to play. There are a manageable number of visible individuals you feel you are competing with, you’re almost never in danger of losing more points than you gain, and the very act of playing the game rewards you almost regardless of how well you do.

Promotions will happen only after your MMR has stabilised for a period in a range suitable for a higher league than the one you are currently part of. It is possible to skip leagues entirely if your MMR increases rapidly enough. There is also a confidence buffer in place: it is not enough to be slightly better than the lowest ranked members of the league above you to take their place. Demotions cannot happen past the middle of the season so as to encourage people to keep competing rather than playing it safe.
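One way to picture the stabilisation period and confidence buffer is the sketch below. The buffer size, the number of games required and the function name are all hypothetical; Blizzard has not published the real rule.

```python
def should_promote(recent_mmr: list[float], next_league_floor: float,
                   buffer: float = 50.0, min_games: int = 10) -> bool:
    """Promote only if MMR has stayed clear of the next league's floor.

    `buffer` and `min_games` are illustrative assumptions: the player must
    sit at least `buffer` points above the floor for `min_games` in a row.
    """
    if len(recent_mmr) < min_games:
        return False
    window = recent_mmr[-min_games:]
    return min(window) >= next_league_floor + buffer


print(should_promote([1600.0] * 10, next_league_floor=1500.0))  # comfortably clear
print(should_promote([1520.0] * 10, next_league_floor=1500.0))  # inside the buffer
```

Being conservative like this means fewer promotions, but each one is far less likely to be reversed a few games later.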

Starcraft 2 XP rewards

Beyond the ladder itself additional rewards are offered through earning XP with games played, which unlock rewards such as new portraits. Likewise, achievements can be earned for things like winning a specific number of games. It provides additional goals for a player to keep aiming for from a company which has proven itself a master of reward cycles with World of Warcraft.

Fixing the Planetary Annihilation Ladder

So how do we resolve these issues and encourage people to play on the ladder on a more consistent basis?

1. Seasons

Currently the standings as they are will only be subject to a major change if the balance were to turn on its head, or there was a major influx of new players with a wide range of skills. Barring that the ladder will likely become increasingly static over time, and a rash of losses early on can make it feel like you’re simply fighting an uphill battle for the rest of your life.

League of Legends Morgana Victorious Skin

The answer to this is the introduction of seasons. Starcraft 2 has eight-week seasons, while League of Legends has an annual season that lasts several months. Both have the same thing in common: they offer everyone a fresh start and the feeling that everything is to play for. League of Legends uses the time between seasons to introduce major changes to the game.

People love achievements, halls of fame, and blank slates that allow them to ‘try again’ for the top ranks. League of Legends is an excellent example of how to make this work, rewarding players with unique badges, league specific borders visible to other players during loading, etc. It gets people playing so they don’t miss out on the rewards and can help reinvigorate the player base at a time they might be flagging.

Planetary Annihilation has the ability to offer rewards such as:

  • Unique commander skins
  • Custom menu backgrounds
  • Loading screen items visible to your opponent, such as borders
  • Season badges

Such rewards could vary depending on your performance in a season. For example, your loading screen could reflect the division you ended the last season in, while players who ranked gold or higher might get a special commander skin (note skin, not model) that they could use.

Rewards should be kept simple enough that they can be easily produced on a rolling schedule, while being visible enough that players will want them.

2. Unlink ranking from rating

What makes Starcraft 2 work so well is that the entire mechanism for rankings is hidden and the player base broken into divisions. As it’s harder to make direct comparisons between players, people feel less threatened by them and think less about their rank and more about their league. Going from 50 to 1 is a good way to measure that you’re doing better, but the true goal is to make it from silver to gold.

Starcraft 2 Bonus Points Pool

By adopting a points system similar to that used in Starcraft 2 you provide a means to allow players to always feel like they’re making progress. You can introduce bonus pools so people aren’t as threatened by losing and players who have a more casual schedule aren’t put off playing entirely by the insurmountable lead the more hardcore players have. This results in players almost always gaining points each time they play, which helps them feel rewarded for playing and encourages them to play more. Think of the bonus pool like the free chips you get at a casino to encourage you to play.

It also means you can assign a better rating period to your ratings as rank is no longer linked to rating, thus ratings become more accurate. And as people have divisions in which to feel like they’re achieving something, you can be more conservative with your promotions and demotions, allowing for a better certainty that someone should move up or down, making it all the more meaningful when it happens.

This also solves the problem of inactive players. If someone isn’t playing then they’re going to be overtaken by people who are active and earning points. That the inactive player has a higher rating is irrelevant because it’s not visible anywhere and is being used purely for matchmaking, not scoring, and RD gets a chance to kick in and sort out their rating should they return.

3. Divisions

Planetary Annihilation’s ladder has all its players spread between five leagues which can make things a little impersonal, and for those at the bottom the top can seem a long way off. It also makes the ranking system look more accurate than it is, where there’s a #1 in each league.

Starcraft 2 Leagues

By breaking bronze, silver, gold and platinum into divisions you remove this false sense of accuracy from the rankings. More importantly, you give players a goal that feels obtainable. Instead of needing to claw their way past nearly one thousand players, now it’s more like a hundred. You have your own little self-contained area with people you feel you’re competing with and can directly compare your performance to, all without impacting matchmaking, which completely ignores the division structure.

This change would obviously need to be tied into improved visibility. I want to see my entire division, not just the top 10. At this time there isn’t even an API we at eXodus can hook into to pull this information for you.

4. Player profiles

Planetary Annihilation player profile concept

Players need carrots to keep them going. They need something more than a rating; they need goals. Achievements for the 500th win, for earning a promotion, for winning with nothing but ships. Everything about them needs to be gathered into a place where they can view what they’ve achieved and give themselves further goals.

Right now there’s nothing in Planetary Annihilation that collects all this, and the information that is available is scattered between the armory, replays and leaderboard screens. A player can get more information about themselves from eXodus eSports than they can from the game. Just check my eXodus profile: you get information on win rates, winning streaks, time taken to win, links to games I’ve played, full stat breakdowns, and VODs where they exist. Anyone who wants to pull this information on me can, and it gives me a way to show off any achievements I might have. PAStats Ladder even provides me a nice signature I can use.

Uber are in a much better position to do this. They can put in more stats, more achievements, show off those badges locked away in the armory; they could make all this information accessible from in-game from the point you’re in a lobby with that player. Want to know how often they win on the map you’re about to play? Click their name. Stats breakdown of their last game on this map? Click their name. And they get to show off to you that Kickstarter backer badge at the same time.

It makes the game more than just ranking up. Countless games and companies have shown that players love the ability to decorate their own home or profile. Link it into playing the game and it becomes an addictive reward cycle that encourages further play.

5. Map Vetoes

PAStats map pool veto

There’s always going to be that map you can’t stand, the one you groan when you see it has been selected for your next game. Continually being forced to play on something you really, truly dislike, is a surefire way to put you off playing.

This one is possibly a little premature given that only six maps exist in the ladder pool. However, that pool will continue to expand, so this is a feature that should already be under consideration. It was part of the PA Stats matchmaking system, and I hope we’ll see it in the official system soon.

Conclusion

The Planetary Annihilation ladder is not bad; indeed, it works very well considering it’s the first implementation. Uber have shown a good understanding of competitive gaming with the maps they’ve created too. Yet there’s still a lot of room for improvement, and I hope that this article helps point them in the right direction and aids in creating a strong, active multiplayer community.

If you feel there are further areas for improvement that I didn’t list here please raise them in the comments section below.

Reference

1. Glickman, M. (n.d.). Glicko Ratings. [online] Mark Glickman’s World. Available at: http://www.glicko.net/glicko/glicko.pdf [Accessed 16 Dec. 2014].

2. Vek/Glickman. (2008). FICS Help: glicko. [online] Freechess.org. Available at: http://www.freechess.org/Help/HelpFiles/glicko.html [Accessed 16 Dec. 2014].

3. Glickman, M. (2013). Example of the Glicko-2 system. [online] Mark Glickman’s World. Available at: http://www.glicko.net/glicko/glicko2.pdf [Accessed 16 Dec. 2014].

Published: 5th January, 2015

About the author

Queller AI

Quitch is the creator of the Queller AI, a smarter and more humanlike AI for Planetary Annihilation. He's also very active in the community and likes to put together resources designed to help new players get more from the game. He can regularly be found on PA Chat.

Comments

    Uber recently introduced a change whereby you will be delisted from the ladder if you haven’t played in the last seven days. I don’t believe this is a good long-term solution. It’s both simultaneously too short and too long a period.

    To someone at the top of Uber it’s no effort at all to play one game a week to maintain position. To a casual player it’s incredibly harsh to lose your rank just because you didn’t play. It also means rankings become unstable and less meaningful because you’re constantly bobbing about as other players get caught in the filter. Will people get promoted and demoted off the back of this to maintain distribution? That would really ruin the significance of the leagues. Furthermore it makes the leagues look smaller and the multiplayer community less healthy, decreasing new player interest.

    At the end of the day it doesn’t resolve the issue. Indeed, by providing a timeframe you tell a player exactly how little effort they need to put in to hold #1 once they get there. Players really need a reason to play as often as possible, this leads to a healthier community and helps drive the long-term lifespan of the game.
