Difference between revisions of "User:Flixx/Game Theory for Recruiting"

From The Battle for Wesnoth Wiki
(Description)
 
(4 intermediate revisions by the same user not shown)
Line 20: Line 20:
 
  The Freelands flix rusher 1422   47.82 %   50.28 %
 
  The Freelands flix rusher 1422   47.82 %   50.28 %
 
  The Freelands rusher flix 1426   41.09 %   57.36 %
 
  The Freelands rusher flix 1426   41.09 %   57.36 %
 +
The Freelands rusher default 1156   62.02 %   36.59 %
 +
The Freelands default default 1654   47.40 %   48.97 %
  
 
==Description==
 
==Description==
Line 40: Line 42:
  
 
  3 * 40 - 1 * 30 = 90
 
  3 * 40 - 1 * 30 = 90
  3 *  0 + 1 * 90 = 90.
+
  3 *  0 + 1 * 90 = 90
 
(Then Unit X and unit Y are both -90 effective in the battlefield)
 
(Then Unit X and unit Y are both -90 effective in the battlefield)
  
Line 167: Line 169:
  
 
The Algorithm how to find this mixed Strategy Nash Equilibrium was described by J.D.Williams and can be found in [http://www.rand.org/content/dam/rand/pubs/commercial_books/2007/RAND_CB113-1.pdf The Compleat Strategyst] from 1986 on on Site 219. (It can be read like a cookbook - I implemented it exactly like described.)
 
The Algorithm how to find this mixed Strategy Nash Equilibrium was described by J.D.Williams and can be found in [http://www.rand.org/content/dam/rand/pubs/commercial_books/2007/RAND_CB113-1.pdf The Compleat Strategyst] from 1986 on on Site 219. (It can be read like a cookbook - I implemented it exactly like described.)
 +
 +
For the Code see [https://github.com/flixx/wesnoth-old/tree/game_theory/src/ai/flix my GitHub Fork]

Latest revision as of 13:55, 18 May 2013

(Part of a GSoC Proposal 2013)

This site is an explanation how the Recruitment Phase could make use of Game Theory. This can be especially useful in context of a Counter-Counter Strategies. This is the question "What units should Player 1 in the first turn recruit so that Player 2 cannot counter those units easily".

This problem can be solved by finding a "Mixed Strategy Nash Equilibrium". I already implemented a Proof of Concept. It gives acceptable test-results against the default AI and "Alarantalara's recruiter".

Test-results

For my Test's I modified the maps I used (I enlarged the castles to 10 free hexes and removed all villages) and set the start-gold to 200 to have a perfect "Counter-Counter situation". Side 1 always starts.

Map		Side 1		Side 2		Games	Side 1 won	Side 2 won
-----------------------------------------------------------------------------------
Weldyn Channel	rusher		default		1099	  62.51 %	  31.67 %
Weldyn Channel	flix		default		1634	  63.89 %	  32.31 %
Weldyn Channel	default		flix 		1583	  22.11 %	  72.71 %
Weldyn Channel	flix		rusher		821	  41.41 %	  54.57 %
Weldyn Channel	rusher		flix		810	  38.52 %	  56.42 %
The Freelands	flix		default		1004	  66.73 %	  30.68 %
The Freelands	default		flix		1000	  27.00 %	  71.60 %
The Freelands	flix		rusher		1422	  47.82 %	  50.28 %
The Freelands	rusher		flix		1426	  41.09 %	  57.36 %
The Freelands	rusher		default		1156	  62.02 %	  36.59 %
The Freelands	default		default		1654	  47.40 %	  48.97 %

Description

The Algorithm will find the theoretical best mix to recruit for maximizing own payoffs by only knowing the recruitment-list of the enemy. It is meant to be a part of the overall solution and can later be weighted against the other algorithms.

Each Unit Combination will get a Effectiveness Score (e.g. unit A is 200 effective against Unit B. This implies that unit B is -200 effective against Unit A)

After P1 recruited a mix one could sum those Effectiveness Scores up and then P2 could say about own units something like "My unit X is now 1400 effective in the battlefield, unit Y is -200 effective, ...").

We want to achieve that those summed scores are equal.

Example:

P1/P2	X	Y
A	40	0
B	-30	90

The best thing to do for P1 is here to recruit A and B in proportion 3:1. Because

3 * 40 - 1 * 30 = 90
3 *  0 + 1 * 90 = 90

(Then Unit X and unit Y are both -90 effective in the battlefield)

Game Theory gives us nice tools to find best mixes and *the optimal strategy* aka Nash Equilibrium.

Note that this whole thing is heavily influenced by the algorithm that gives us those Effectivness Scores. I use a slightly improved version of

int recruitment_phase::average_resistance_against(const unit_type& a, const unit_type& b).

I tested the different versions (old average_resistance_against against my average_resistance_against):

Map		Side 1		Side 2		Games	Side 1 won	Side 2 won
Weldyn_Channel	flix (improved)	flix (old)	449	43.43 %		50.11 %
Weldyn_Channel	flix (old)	flix (improved)	465	36.34 %		53.55 %

For now I altered the way the different attacks are weighted against each other and included a weight for the cost of a unit. But I'm sure that this function can be improved further.

(I will do it in context of improving Combat Analyses. And refactor it, the variable names are kind of confusing...)

Often it is not possible and even unnecessary to make a equal mix. See the following example:

P1/P2	X	Y
A	100	-5
B	-10	-10

So obviously P1 should only recruit Unit A, even if P2 can "easily counter" with unit Y. But recruiting a mix would be not the best thing to do here. So in my algorithm it will sometimes happen that only one unit will be recruited. This is maybe not something one would expect for a good Counter-Counter strategy. But if we assume that those effectiveness-scores are accurate then recruiting only one unit is sometimes the theoretical best thing to do.

This is how a full table will look like (example output of my algorithm).

P1/P2			|Elvish Archer|Elvish Fighter|Elvish Scout|Elvish Shaman| Mage  |Merman Hunter|Wose  
Dwarvish Fighter	|    88.75    |    31.32     |    140     |   216.2     |132.2  |   156       |  24.91       
Dwarvish Guardsman	|    38.38    |     0.5865   |    102.8   |   124.9     | 44.21 |    81.05    | -70.89       
Dwarvish Thunderer	|    24.65    |   -31.76     |    122.7   |   158.4     | 52.51 |    85.95    | -83.43       
Dwarvish Ulfserker	|   -28.75    |   -84        |     41.24  |   109.6     |  1.92 |    27.15    | -78.8       
Footpad			|   -48.32    |  -118.4      |     12.94  |    84.77    |-78.36 |     2.714   |-135       
Gryphon Rider		|     9.75    |   -46.45     |     73.88  |   133.3     | 17.18 |    69.67    | -83.28       
Poacher			|     6.681   |   -58.5      |    121.2   |   164.2     | 36.99 |    75.7     |-156.6       
Thief			|   -54.84    |  -143        |     21.39  |   112.2     |-90.87 |    10       |-112.2

And acctually according to this table P1 have to assume that P2 will always recruit a Wose because a Wose would be the best choice against all units. And then we should decide only to recruit Dwarvish Fighter (because they are best against Woses). One could now say that P1 could do better when P2 will recruit not only Woses. This is definitely true.

The important thing is that P1's worst case is minimized. The worst case is still that P2 will only recruit Woses. If P1 would recruit other units than Dwarvish Fighters the worst case would alway be ... well ... worse.

Note that this Game Theory algorithm I implemented for finding the Nash Equilibrium cannot be improved as such. The Game represented by a table like above just have its deterministic Nash Equilibrium. It's well defined. (Zero-Sum Games always have one or more Equilibrias with the same Value).

Note also that we cannot always recruit like we should according to the mixed strategy. For example if the strategy is 13.4% Unit A and 86.6% Unit B then most likely we'll have to round. Rounding is bad when dealing with Nash equilibria. See below for more thoughts on this.

I will now try to explain the concept of a Nash Equilibrium. (Game Theory is cool stuff, take a course if you can ;) )

Game Theory and Nash Equilibrium

I will provide 2 brief general examples to help understanding the concept of Nash Equilibrium and mixed strategies.

The whole thing is about 2 Players Playing a Game. Each Player have different options available and to each option combination there is a payoff for each player. The Players will play simultanously. So they don't know what the opponent have played. (Recruitment does't work like this because P2 can see the recruits of P1, but below is a explanation why this can be a good model for wesnoth)

Games can be shown in tables like this:

P1/P2	X	Y	Z
A	-3	-1	-2
B	10	-5	3
C	5	1	2

This is a example for a zero-sum game. (Usually there are two entries for each combination. One entry for the payoff of each player.)

For example when Player one Plays B and Player 2 plays Z, Player 1 will get a Payoff of 5 and Player 2 a payoff of -5.

The question is now what should player 1 play to maximize his payoff? He could play B and hope P2 will play X to get the 10. But most likely P2 will not play X because other option seem nicer for P2. The best thing P1 could do is to play C and go for the 1 Payoff.

The Stategy P1:C / P2:Y is called Nash Equilibrium.

One could say a Nash Equilibrium is achieved when after the Game both players can say that they did the best choice they could knowing the enemy's choice now. If P2 had known that P1 will play C then Y would still be the best choice. And if P1 had known that P2 will play Y then C would still be the best choice.

Now let's take a look at this example:

P1/P2	X	Y
A	1	-1
B	-1	1

Now what? It seems like there is no Nash Equilibrium, because one player can always say "if I had known that my opponent plays this, I had done something different". The right thing to do is actually to randomize over the options. So P1 will play 50% A and 50% B. This is called a mixed strategy. This is a Nash Equilibrium as well because P2 could say "now that I know that P1 will randomize with 50% the best thing to do is to randomizing with 50% as well".

So while recruiting we try to find a mixed strategy.

Advanced stuff and some drawbacks

One could argue, that finding a Nash Equilibrium is not the appropriate way to find the best strategy, because P2 will know what we have recruited. (When finding a Nash Equilibrium it is usually assumed that both players choose a option without knowing the opponents choice) So one could argue that this is actually something whats called a perfect information Game. And those Games are usually solved by a minmax approach.

A minmax algorithm would indeed give us the best result but it would be expensive because one would have to iterate over all possible recruiting mixes.

But a Nash Equilibrium is actually a nice way to approach the perfect mix. Let's model a Game of wesnoth in a different perspective to make this clear: Each player can recruit let's say 100 units at the beginning (P2 will see the choices of P1). Now each player can send only one random unit at a time into the battlefield. The two units will fight and the players get some payoffs.

This is actually not a to bad model for wesnoth:

Assume P1 has a unit A and B, P2 has a unit X which is good against A and bad against B and a unit Y which is good against B and bad against A. Now what? The odds of the games will decide which units will fight against each other. So it should be not to bad when we assume that the units on the battlefield will be chosen at random for a fight.

So by recruiting we are not playing the Game itself but only telling the opponent with what probabilities he will encounter which unit in the battlefield. So one could say our recruited mix as the mixed strategy we've chosen. The enemy can know our strategy without being in a perfect information game.

Now a drawback: If we only have little money available (and can only recruit 1 or 2 units) than our strategy gets inaccurate. Look at this example

P1/P2	X	Y
A	50	0
B	-3	2

The best strategy for P1 is to recruit A and B in proportion 1:10. But if we only have one unit to recruit (and not 11) one could think that B is a good choice. But then P2 could easily counter with X. This is a extrem example. But it shows us that if the Nash Equilibrium strategy can be bad when rounded.

So one could think about actually running minmax when we can only recruit 1 or 2 units.

For those who are interested

The Algorithm how to find this mixed Strategy Nash Equilibrium was described by J.D.Williams and can be found in The Compleat Strategyst from 1986 on on Site 219. (It can be read like a cookbook - I implemented it exactly like described.)

For the Code see my GitHub Fork

This page was last edited on 18 May 2013, at 13:55.