Using the TrueSkill Server


TrueSkill is one of the most lauded methods of rating and ranking game players developed to date.

This TrueSkill server provides access to TrueSkill rating updates via either a web form or a URL.

In either case, understanding the TrueSkill parameters is essential.

TrueSkill Parameters

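TrueSkill models a player's skill with two distinct parameters, each measured in arbitrary units we can call skill points:

  • Mean (µ): a measure of the player's presumed skill.
  • Standard deviation (σ): a measure of how uncertain we are about that skill.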

The beauty and importance of this is that TrueSkill starts everyone off with a given measure of skill (mean, µ) and high uncertainty (standard deviation, σ).

Every player is given the same initial mean (µ0) and initial standard deviation (σ0). Each time a game is recorded, the means are adjusted and the standard deviations drop (confidence rises). Specifically, TrueSkill takes points away from the losers and gives them to the winners, so that the total number of points among all players in a game stays roughly the same (exactly the same in a two-player game where both players have the same σ).
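To make the mean and standard deviation updates concrete, here is a minimal Python sketch of the two-player win/loss update from the original TrueSkill paper, including the step that widens σ by τ. It is not this server's code: it ignores draws, teams and partial play, and the function name `update_two_players` is hypothetical. With both players at the defaults, the winner's mean rises by as much as the loser's falls, and both standard deviations shrink.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal: mean 0, standard deviation 1


def update_two_players(winner, loser, beta=25 / 6, tau=25 / 300):
    """Return new (mu, sigma) pairs for a winner and a loser (no draw).

    winner, loser: (mu, sigma) tuples. beta and tau default to the
    TrueSkill defaults derived from an initial mean of 25.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    # Dynamics: widen each player's variance by tau^2 before the game counts.
    var_w = sigma_w ** 2 + tau ** 2
    var_l = sigma_l ** 2 + tau ** 2
    # Spread of the performance difference between the two players.
    c = math.sqrt(2 * beta ** 2 + var_w + var_l)
    t = (mu_w - mu_l) / c
    # v: how surprising the result was (large when the underdog wins).
    v = _N.pdf(t) / _N.cdf(t)
    # w: how strongly the variances shrink (between 0 and 1).
    w = v * (v + t)
    new_winner = (mu_w + var_w / c * v, math.sqrt(var_w * (1 - var_w / c ** 2 * w)))
    new_loser = (mu_l - var_l / c * v, math.sqrt(var_l * (1 - var_l / c ** 2 * w)))
    return new_winner, new_loser


if __name__ == "__main__":
    jill, jack = update_two_players((25.0, 25 / 3), (25.0, 25 / 3))
    print("Jill (winner):", jill)
    print("Jack (loser):", jack)
```

Because the mean shift is scaled by each player's own variance, a newcomer with a large σ moves much further than an established player with a small σ after the same game.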

A third number is associated with each player: a conservative measure of skill. The mean (µ) is a measure of the presumed skill, but the standard deviation (σ) describes how uncertain we are about it. The conservative measure of skill is calculated as a single number from the mean and standard deviation. It is the single number used for ranking players and comparing their skill levels, and it represents the level of skill we are confident they have, at least. Their skill could be higher, and the more games they play, the lower the standard deviation falls as uncertainty drops and confidence rises.

This TrueSkill server defines the conservative skill measure as:

The rating at outset is thus by definition, 0:
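TrueSkill implementations commonly use µ − 3σ as the conservative estimate, and with the default σ0 = µ0 / 3 that choice is exactly what makes the starting rating 0. The sketch below assumes that multiplier of 3 (the hypothetical parameter `k`); check it against the server's own definition before relying on it.

```python
def conservative_rating(mu: float, sigma: float, k: float = 3.0) -> float:
    """Conservative skill estimate eta = mu - k * sigma.

    k = 3 is an assumption (a common TrueSkill choice); the server's own
    multiplier may differ.
    """
    return mu - k * sigma


mu0 = 25.0
sigma0 = mu0 / 3  # default initial standard deviation
print(conservative_rating(mu0, sigma0))    # starting rating: essentially 0
print(conservative_rating(29.205, 7.195))  # illustrative values: higher mu, lower sigma
```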

With this understanding you can now interpret all of the TrueSkill parameters.

Side note: The TrueSkill documentation never gives this conservative measure of skill a name or symbol, but loosely implies that it is the "true skill". This server was developed as a first step towards supporting leaderboard rankings for a group of competitive gamers calling themselves CoGs (for Competitive Gamers). So here true skill is called a rating (in the manner of chess ratings) and is measured in teeth: cogs have teeth, and the word also alludes to the voracity of a given player (like the teeth of a lion). The Greek letter eta (η) was chosen as an intentional pun on teeth (eater), and it conveniently resembles the letter n (for number of teeth).

Player Parameters (for a given game result)

These parameters are specific to the players (and/or teams) playing the game.

Place A number indicating the placing. The first player would be 1, the second 2, the third 3, and so on. Ties are allowed, so if two players tied for second place, for example, they would both have a Place of 2. The actual numbers are not otherwise relevant to the TrueSkill calculation: players are simply sorted by Place to determine who beat whom, and the winners take points away from the losers.
Name The name of the player. Each player must have a unique name, but otherwise the name plays no part in the TrueSkill calculation, except that the results are listed by name again. This is your identifier for each player.
Mean (µ) The player's current mean (as described above), before the game you are recording was played. TrueSkill will calculate a new mean based on this player's placing in the game.
Standard Deviation (σ) The player's current standard deviation (as described above), before the game you are recording was played. TrueSkill will calculate a new standard deviation based on the game result and the game parameters you specified.
Partial Play Weighting (ω) A weighting for partial play. Say a player could not finish the game because the wife called and was having a baby. You can estimate what portion of the game they did play, and this will moderate how much TrueSkill adjusts their mean and standard deviation. The default of 1 means playing to completion. A weight of 0 means they didn't even start, so their mean and standard deviation won't be touched (why enter them at all?). A weight of 0.5 means they played half the game, 0.75 three quarters of the game, 0.25 a quarter of the game, and so on.
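Since Place only decides who beat whom, it can help to see the placings reduced to an ordered list of finishing groups, with tied players sharing a group. This is a sketch with a hypothetical helper, `rank_groups`; it also enforces the unique-name rule.

```python
from itertools import groupby


def rank_groups(results):
    """Turn (name, place) pairs into finishing groups, best first.

    Players who share a Place are tied. Only the order of the Place values
    matters, not the numbers themselves.
    """
    names = [name for name, _ in results]
    if len(names) != len(set(names)):
        raise ValueError("each player must have a unique name")
    # sorted() is stable, so tied players keep their input order.
    ordered = sorted(results, key=lambda result: result[1])
    return [[name for name, _ in group]
            for _, group in groupby(ordered, key=lambda result: result[1])]


print(rank_groups([("Jack", 3), ("Jill", 1), ("Bonnie", 2), ("Clyde", 2)]))
# [['Jill'], ['Bonnie', 'Clyde'], ['Jack']]
```

Places of 10, 20, 20, 30 would produce exactly the same groups, which is why the actual numbers do not matter.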

Team Play

TrueSkill supports team play (in fact, individual play is just modeled internally as teams of 1). In that case each team has a Place and a Name, and each player has a Name but no Place. In other words, all the players in a team share the team's placing.
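In the published TrueSkill model, and in common open-source implementations, a team's performance is the sum of its members' performances, with each member's partial play weight (ω) scaling their contribution. The sketch below computes the resulting team mean and spread; `team_performance` is a hypothetical name, β defaults to 25/6, and whether this server weights players exactly this way is an assumption.

```python
import math


def team_performance(members, beta=25 / 6):
    """Return (mean, standard deviation) of a team's performance.

    members: list of (mu, sigma, weight) tuples, one per player.
    Each player's performance is modelled as normal with mean mu and
    variance sigma^2 + beta^2; the team's performance is the weighted sum
    of its members' performances, so means add with their weights and
    variances add with squared weights.
    """
    mean = sum(weight * mu for mu, _, weight in members)
    variance = sum(weight ** 2 * (sigma ** 2 + beta ** 2) for _, sigma, weight in members)
    return mean, math.sqrt(variance)


new_player = (25.0, 25 / 3)
gals = [new_player + (1.0,), new_player + (1.0,)]
guys = [new_player + (1.0,), new_player + (0.5,)]  # one player left at half time
print(team_performance(gals))  # mean 50.0
print(team_performance(guys))  # mean 37.5
```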

Game Parameters

These parameters are specific to the game being played. Each game will have its own unique set of parameters, though TrueSkill provides some convenient defaults.

Initial Mean (µ0) The initial value of the mean as described above. The default value in TrueSkill is 25 points. This is completely arbitrary and you could use any value. What matters is that, for any given game, league or leaderboard you are managing, all players who enter do so with the same initial mean, to be consistent and fair.
Initial Standard Deviation (σ0) The initial value of the standard deviation as described above.

The default value is one third of the initial mean, or 25 ÷ 3, which is 8⅓ or 8.333... if you prefer.

This is completely arbitrary, but it is a value that worked empirically for the TrueSkill developers in their initial environment (the Xbox matchmaking system).

You can use any value you like, but the server requires one that is not greater than the initial mean (in other words, let us not begin with an uncertainty greater than the measure itself).

Skill Factor (β) Described in the literature as the difference in µ values that corresponds to approximately an 80% chance that the player with the higher µ wins. The TrueSkill authors argue that this reflects the balance of skill and luck in a game: if β is high, more luck is modeled; if it is low, more skill.

The default value is one sixth of the initial mean, or 25 ÷ 6, which is 4⅙ or 4.166... if you prefer.

This again is completely arbitrary, but it is a value that worked empirically for the TrueSkill developers in their initial environment (the Xbox matchmaking system).

The TrueSkill developers later tuned β by comparing actual game outcomes against TrueSkill predictions. They arrived at the following values as guidelines for Xbox games:

  • 3.33 for Golf (a game of almost pure skill)
  • 5.00 for Car racing
  • 20.8 for UNO (a game of chance)
Dynamics Factor (τ) The Dynamics Factor (τ) models the uncertainty in a player's skill that accumulates during absences between games. While not playing, a player might be out of practice and losing skill, or might be practicing and gaining skill; we don't know. All we know is that our confidence in our estimate of their skill diminishes.

Left alone, TrueSkill would bring the standard deviation closer and closer to 0 with every game played, becoming more and more confident.

The Dynamics Factor (τ) is a measure of residual uncertainty in this process. It is added to the standard deviation on each update (strictly, in quadrature: σ becomes √(σ² + τ²) before the game is processed). Thus the higher τ is, the less confidence there is in any given µ, and the lower τ is, the more confidence there is.

Of course, this is only a reasonable model if most or all players have similar spans between games, i.e. are playing consistently. If players have very different periods between plays, it fails to model any difference in their loss of skill, or in our uncertainty about their skill.

The default value in TrueSkill is one hundredth of the initial standard deviation (σ0), or 25 ÷ 3 ÷ 100, which is about 0.08333...

Draw Probability (p) The probability (0 to 1, meaning 0% to 100%) of a draw occurring in this game.

This affects how TrueSkill interprets draws. In essence, if a draw is recorded between two or more players, TrueSkill will tend to move their means (µ) toward each other. If a draw is very likely, however, the result says little about the individual players' skills and the adjustment to µ will be smaller; if a draw is very unlikely, then a draw suggests much more strongly that the players are well matched, and the adjustment to µ is accordingly larger.

The default value is 0.1 (or 10%) which is completely arbitrary.

With enough recorded game data you can, of course, measure the probability of a draw in retrospect. This is another reason to keep an independent record of all game results: you could then, if desired, recalculate all rankings by running each game through the server again with a different draw probability.

Delta (δ) Delta is the convergence parameter. TrueSkill iterates, repeatedly adjusting the players' µ and σ values until the changes between iterations fall below δ. The smaller δ is, the longer the calculation takes but the more accurate the result; the larger δ is, the faster the calculation but the less accurate the result.

The default value is 0.0001, an arbitrarily small value.

You are unlikely to need to change this, but you can experiment.
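The sketch below collects the game parameters in one place using the default values described above, and adds two helpers showing where β and p enter the two-player TrueSkill model: the chance that one player outperforms another, and the draw margin ε implied by a draw probability. The names (`GameParameters`, `win_probability`, `draw_margin`) are hypothetical, and deriving β and τ from whatever µ0 and σ0 you supply is an assumption about how you might mirror the defaults, not a description of this server's code.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist
from typing import Optional

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


@dataclass
class GameParameters:
    """TrueSkill game parameters with defaults derived from the initial mean."""
    mu0: float = 25.0
    sigma0: Optional[float] = None  # default: mu0 / 3
    beta: Optional[float] = None    # default: mu0 / 6
    tau: Optional[float] = None     # default: sigma0 / 100
    p_draw: float = 0.1
    delta: float = 0.0001

    def __post_init__(self) -> None:
        if self.sigma0 is None:
            self.sigma0 = self.mu0 / 3
        if self.beta is None:
            self.beta = self.mu0 / 6
        if self.tau is None:
            self.tau = self.sigma0 / 100
        # Mirrors the server's rule: no initial uncertainty above the initial mean.
        if self.sigma0 > self.mu0:
            raise ValueError("sigma0 must not be greater than mu0")
        # p = 1 would make the draw margin infinite.
        if not 0.0 <= self.p_draw < 1.0:
            raise ValueError("p_draw must be in [0, 1)")


def win_probability(mu_a, sigma_a, mu_b, sigma_b, beta):
    """Chance that player A outperforms player B, ignoring draws.

    The performance difference is modelled as normal with mean mu_a - mu_b
    and variance 2 * beta^2 + sigma_a^2 + sigma_b^2.
    """
    spread = math.sqrt(2 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    return STD_NORMAL.cdf((mu_a - mu_b) / spread)


def draw_margin(p_draw, beta, n_players=2):
    """Performance gap epsilon inside which a game counts as a draw.

    Chosen so that evenly matched players with sigma = 0 draw with
    probability p_draw.
    """
    return STD_NORMAL.inv_cdf((p_draw + 1) / 2) * math.sqrt(n_players) * beta


params = GameParameters()
# A lead of exactly beta with no uncertainty: about 0.76 under this model.
print(round(win_probability(params.mu0 + params.beta, 0.0, params.mu0, 0.0, params.beta), 2))
# Guideline betas: the same 5-point lead matters far more in golf than in UNO.
for game, beta in [("Golf", 3.33), ("Car racing", 5.00), ("UNO", 20.8)]:
    print(game, round(win_probability(30.0, 0.0, 25.0, 0.0, beta), 2))
print(round(draw_margin(params.p_draw, params.beta), 3))
```

Under this model a lead of exactly β works out to roughly a 76% win chance, close to the rounder figure quoted for β above. A larger p gives a wider draw margin, so a recorded draw is less surprising and moves the means less.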

Using the Form

The form should be fairly self-explanatory. It simply asks you to enter the game, player and/or team parameters, after which you can click Calculate New Ratings to produce results.

Things to note when using the form:

Using the URL

As a web service the TrueSkill server is designed to feed results to a client in a convenient format.

URL parameters can be set to define any or all of the TrueSkill parameters above, to request a calculation, and to choose the format of the results.

Parameters are as follows:

URL Parameter Description
Game Info Parameters
iMu Initial Mean (µ0)
iSigma Initial Standard Deviation (σ0)
Beta Skill Factor (β)
Tau Dynamics Factor (τ)
pDraw Draw Probability (p)
Delta Delta (δ)
Game Results Parameters

Each of these may be specified as a comma-separated list, as a repeated parameter, or a combination of both. If a parameter is repeated, all of its values are joined with commas before further processing, so these are all identical in effect:

?Players=Jack,Jill,Bonnie,Clyde

?Players=Jack&Players=Jill&Players=Bonnie&Players=Clyde

?Players=Jack,Jill&Players=Bonnie,Clyde

Players A comma-separated list of player names.
Ranking A comma-separated list of rankings (places), which maps 1 to 1 to the list of Players in the same order.
Mus A comma-separated list of means (µs), which maps 1 to 1 to the list of Players in the same order.
Sigmas A comma-separated list of standard deviations (σs), which maps 1 to 1 to the list of Players in the same order.
Weights A comma-separated list of partial play weights (ωs), which maps 1 to 1 to the list of Players in the same order.
Teams Optionally, if Team Play is in effect.

A comma-separated list of team names.

PlayerTeams Optionally, if Team Play is in effect.

A comma-separated list of team names, which maps 1 to 1 to the list of Players in the same order and assigns each player to one of the teams listed in the Teams parameter.

Control Parameters
Go If present and set to true, yes or 1, causes an immediate TrueSkill calculation to be performed and the results presented. This is the equivalent of clicking Calculate New Ratings on the form.
Format If present, specifies the format in which to present the results:
CSV
A comma-separated list of results is returned, for simple parsing by a client and/or viewing in any CSV viewer such as a spreadsheet. A header line is included.

XML
An XML version of the results is returned, again for simple parsing by an XML-aware client or viewing in an XML viewer.

HTML
A clean HTML table, without the entry form. Primarily for visual rendering in the browser.

If not specified then the results are displayed after the entry form as an HTML table.
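The joining rule for repeated parameters and the 1-to-1 mapping of per-player lists can be sketched with Python's standard `urllib.parse.parse_qs`. This shows one way a client or a re-implementation might apply those rules; the helper names (`split_params`, `player_records`) are hypothetical, and leaving absent per-player parameters out of the records (rather than filling in the server's defaults) is an assumption.

```python
from urllib.parse import parse_qs

# Per-player URL parameters and how each value is converted.
PER_PLAYER = {"Ranking": int, "Mus": float, "Sigmas": float,
              "Weights": float, "PlayerTeams": str}


def split_params(query):
    """Join repeated parameters with commas, then split them into lists."""
    raw = parse_qs(query.lstrip("?"))  # e.g. {'Players': ['Jack,Jill', 'Bonnie,Clyde']}
    return {key: ",".join(values).split(",") for key, values in raw.items()}


def player_records(params):
    """Zip the per-player lists into one record per player.

    Each per-player list must line up 1 to 1 with Players. Parameters that
    are absent are simply left out of the records.
    """
    players = params["Players"]
    teams = set(params.get("Teams", []))
    records = [{"Players": name} for name in players]
    for key, convert in PER_PLAYER.items():
        if key not in params:
            continue
        values = params[key]
        if len(values) != len(players):
            raise ValueError(f"{key} needs exactly one entry per player")
        for record, value in zip(records, values):
            record[key] = convert(value)
    for record in records:
        team = record.get("PlayerTeams")
        if team is not None and team not in teams:
            raise ValueError(f"{team!r} is not listed in Teams")
    return records


a = split_params("?Players=Jack,Jill,Bonnie,Clyde")
b = split_params("?Players=Jack&Players=Jill&Players=Bonnie&Players=Clyde")
c = split_params("?Players=Jack,Jill&Players=Bonnie,Clyde")
print(a == b == c)  # True
print(player_records(split_params("Players=Jill,Jack&Ranking=1,2")))
```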

URL Examples

Jill beat Jack, display the form.

Jill beat Jack, calculate results and display them in CSV format.

Two teams, guys versus gals, gals won, calculate results and display in form.

Same teams but specify players current ratings and some game info.
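For a client, request URLs like the first two examples can be assembled with Python's standard `urllib.parse.urlencode`, keeping commas unescaped so the lists stay readable. `BASE_URL` is a placeholder, not this server's address, and `trueskill_url` is a hypothetical helper; the team examples are left out because the page does not spell out how team placings are passed in the URL.

```python
from urllib.parse import urlencode

# Placeholder address: substitute the URL of the TrueSkill server you use.
BASE_URL = "https://example.com/trueskill/"


def trueskill_url(players, ranking, **extra):
    """Build a request URL, leaving commas unescaped in the query string."""
    params = {"Players": ",".join(players),
              "Ranking": ",".join(str(r) for r in ranking)}
    params.update(extra)  # e.g. Go, Format, iMu, Beta
    return BASE_URL + "?" + urlencode(params, safe=",")


# Jill beat Jack, display the form.
print(trueskill_url(["Jill", "Jack"], [1, 2]))
# Jill beat Jack, calculate results and return them as CSV.
print(trueskill_url(["Jill", "Jack"], [1, 2], Go=1, Format="CSV"))
```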