GSoC Proposal - Position Evaluation
For this year's summer of code season I would like to develop a position evaluator. Reliable position evaluation will improve the current AI in two different ways: In the short term it will be an important tool for developers in order to test their own move planning algorithms. As such it is part of the proposed AI changes for 1.7 (see Dave's forum post). In the long run, it might be possible to couple the position evaluation with move evaluation in order to allow for multi-move forward planning.
Contents
About myself
My name is Henning and I am a 27 year-old PhD student from Zurich, Switzerland. I am currently in my final year of post-graduate study at the Institute of Neuroinformatics where I model biological neural networks and study them theoretically.
The brain has some unique solutions in dealing with complexity in the environment and my aim for the near future is to try to apply some of those principles in real-world applications. Summer of code gives me a great opportunity to test out some of my ideas in a friendly environment.
Contact details
- Email: henning at ini dot phys dot ethz dot ch
- Forums: henning
- gna: henning
- IRC: henning__
- skype: jp0186
Project proposal
Parameter definitions
The factors determining the value of a given position in wesnoth are
- Material
- Development
- Synergy
Material is the basic element determining the value of a wesnoth position and also the most easy to evaluate. The player with more units will have an advantage over the player with less units (if the units are comparable in strength). Similarly, the player with an experienced unit will generally have an advantage over the player with a weaker unit (again, only if the number of units is comparable). Similarly the health of the units under a player's control has a direct influence on the value of his material. Another important factor determining a player's material is gold, both present and future (in the sense of current income).
Development refers to the position of material on the battlefield. It is easily imaginable that a player with a material advantage is in an inferior game position because he hasn't utilized his material to be useful for his purposes. This utilization can involve reaching a strategically valuable position, maneuvering close to a village that can potentially be captured, being able to attack an enemy unit, or blocking the way against an enemy attack. Evaluating this utilization is non-trivial as it depends on overall strategical considerations as well as complex interactions between units.
Synergy refers to the interactions between some friendly units that are not directly captured by the concept of Development. This can refer both to directly to special abilities or indirectly to some desirable combinations of unit specialities. For example, a healing unit in combination with a high damage, low health unit. As with Development these connections are difficult to evaluate within a classical approach.
While Material is certainly very important in determining the strength of a position in wesnoth it has one major flaw in its predictive power - it is not deterministic. Development and Synergy on the other hand can be predicted well with the exception of units being killed.
Egocentric position evaluation
The 10^10000 possible positions in a typical game of wesnoth (as discussed here) do not lend themselves well to a classical tree-search approach with material as the main evaluation factor. Furthermore, the outcome of each move is undeterministic due to the chance involved in fighting between units.
I propose to represent each position in a massively simplified state space which is based on egocentric representation of each piece of material and corresponding areas of influence. This can be done efficiently and intuitively by representing material as nodes in a network. Connections between nodes represent units having a direct beneficial or detrimental influence on each other. Units have a certain internal activity value (roughly corresponding to the unit's HP). Interactions between units can be expressed in terms of excitatory or inhibitory connections that either increase or decrease a units activity. This can evaluate the strength of a players position in terms of development and synergy. The overall energy of a players network consisting of material, development and synergy values is then a correlate of the strength of his position. Evaluating the development and synergy values has the added advantage of providing deterministic position evaluation before any fighting has been done, potentially warning the AI that a certain move weakens the structure of ones units significantly.
I have compiled some example formulae for this in the example below.
Example
Check out this example that will hopefully shed some light on my rather cryptic description above.
General considerations
This scheme has the following advantages:
- It massively simplifies the space of possible moves and provides a simplified context in which moves can be represented.
- It takes into account the overall structure of the position making use of synergetic effects between units.
- It will be able to account for half-moves to maximize network energy.
- It can take into account special abilities by shaping a units areas of influence.
- It could potentially lead to more advanced strategical considerations by identifying the value of each unit in the context of its position in the game.
- A modular position evaluator can either be implemented statically into the current AI or serve as a tool within formula AI, depending on how successful the approach proves to be.
Benchmarking and parameter search
In order to test an evaluation algorithm empirically as well as search for suitable parameter combinations of the network connections, I propose to first build a database of games between two human players (if these players so consent). Armed with this it will be possible to cycle through the games and evaluate each move with a prototype of my move evaluator. The predictive power of an evaluator can be tested by measuring convergence of evaluations. Alternatively data about single-player campaigns could be used. In any case a large database of played games will yield more accurate results in evaluating the position evaluator.
Estimated timeline
In general the aims for the 3 phases are as following:
Time period | Description |
---|---|
20 Apr - 23 May | Phase 0 - Create a dataset of games. Prototyping |
23 May - 6 Jul | Phase 1 - Development of algorithm in C++ with implementation optimized for performance and simplicity. |
6 Jul - 10 Aug | Phase 2 - Integration of algorithm into code base. Collection of feedback. |
There are two main milestones to be achieved during this project:
- The algorithm needs to be developed so that it converges towards the end of the game as one side begins to gain advantage. At the very least the algorithm should have more predictive power than an evaluation based on material in the game alone. I plan to measure the prediction of the algorithm at every stage of the game. If my algorithm is able to predict the outcome of the game either earlier or more reliable than the control then this part of the project will have succeeded.
- Secondly, if milestone one has been achieved the algorithm needs to be implemented in a fashion that is intuitive and useful to both users and developers:
- I will make both the algorithm and test database accessible to the public
- I will write a function to provide in-game feedback to the players (if they so wish).
- Most importantly, I will create an interface between the algorithm and formula AI providing a function that can calculate evaluations during move planning.
Phase 0
- 20-26 Apr: Compile wesnoth. Create game export function.
- 27 Apr-3 May: Create database scenarios and collect data. Start implementing prototype algorithm.
- 4-10 May: Add complexity - fighting. Come up with solution for probability distributions and make algorithm understand chance to kill.
- 11-17 May: Add complexity - special abilities.
- 18-23 May: Add complexity - terrain.
All through phase 0 I will familiarize myself further with the code base in order to be up to speed for phase 1.
Phase 1
- 25-31 May: Develop specification for interface between algorithm and codebase.
- 1-21 Jun: Implement algorithm in C++
- 22-28 Jun: Benchmark algorithm.
- 29 Jun-5 Jul: Write report.
Phase 2
- 6-19 Jul: Integrate algorithm into code base.
- 20-26 Jul: Collect feedback from community
- 27 Jul-2 Aug: Update algorithm according to feedback.
- 3-9 Aug: Write final report.
Phase 3
After the summer of code, if the project has been successful I would like to continue by looking at move generation in combination with the position evaluation algorithm.
Programming experience
Although I do not have a classical engineering background I consider myself a decent self-taught programmer. My languages are C++, perl and matlab. I know how to use build-tools and SVN. Since only half of this proposal is concerned with the actual implementation I am sure I will have enough time to familiarize myself with the code base before having to commit actual changes.
Gaming experience
I am an enthusiastic gamer both on my computer as well as my PS3. I generally, enjoy games that either push the boundaries of what is technologically possible, or surprise me in some way or other. Games I have enjoyed recently are Fallout3, World of Warcraft, and Grand Theft Auto 4.
I used to play a lot of Wesnoth back in what must have been 2005. It is impressive to see how much the game has matured and developed since then. Getting back on the saddle has proved to be quite easy.
Communication skills
I am a native German speaker and speak English fluently (I studied in the UK for 4 years).
Practical considerations
I am mostly done with my thesis and expect to be working on this full-time during the Summer of Code period. I am awake 7 - 23 UTC and should be available on IRC from 7 - 17.