GSoC AI Project Progress

Basic Idea

The basic idea is to develop libraries of FormulaAI moves based on eval functions. The C++ part will then be able to adjust weights on these moves to 'tweak' them in each scenario.

As an example, consider recruitment:

Eval functions in FormulaAI will be provided which associate a 'score' with each unit the AI can recruit. Based on these scores, the AI will recruit particular units. Along with each of these 'scores' is an associated weight, initially set to 1. Now suppose that, due to fog of war or some other event, the AI has no knowledge of an enemy tactic or unit, and a certain unit that the Formula AI ranked very high begins to be massacred because of this incomplete knowledge. The weight associated with that particular unit can then be adjusted by the AI to some value < 1. This decreases the chance of the AI choosing that unit, even though the FormulaAI scores it fairly high.

The point of this learning system is that deterministic rules are sufficient for many situations, but they cannot predict everything. There will always be some area of incomplete knowledge; i.e. we cannot code into the deterministic rules complete knowledge of the game and every situation the AI will ever face. In fact, experienced players may even be able to infer some of the deterministic rules and use this knowledge to their advantage. The learning system will give the AI the ability to tweak its strategy on the fly to cope with player strategies that the deterministic rules cannot deal with. Recruitment is less complex than a full-fledged AI, and the learning system may not be as necessary there as it will be in a full-fledged AI, but recruitment serves as a useful illustration.

Questions

Some of the main questions that come up from this idea:

  1.) How does the FormulaAI provide the moves and evals to the C++ learning system?
  2.) How will the C++ learning system evaluate the game state and know if it is in a similar situation to one it has been in before (i.e. how does it know which weights to associate with which moves)?
  3.) How do we evaluate the success/failure of a move?
  4.) How much should we adjust the weights based on this evaluation?

1.) For this, I think the FormulaAI can provide a list of <move, score> pairs. This gives the AI a simple way to choose a move: pick the move with the highest associated score. It also makes it easy to apply weights: before choosing a particular move, the AI applies the weight associated with each move to its associated score. This leads to the second question: how will the AI know if a certain move is the same as one it has seen previously?
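
A minimal C++ sketch of this selection step, assuming the FormulaAI hands over a plain list of <move, score> pairs (the type and function names below are illustrative only, not part of any existing interface):

 #include <cstddef>
 #include <string>
 #include <utility>
 #include <vector>
 
 // Hypothetical shape of what the FormulaAI might hand over:
 // a list of <move, score> pairs, with the move kept opaque here.
 typedef std::pair<std::string, double> scored_move;
 
 // Apply the learned weight to each score and return the index of the best move.
 // get_weight() stands in for whatever lookup the learning system uses.
 // Assumes a non-empty list of moves.
 std::size_t select_move(const std::vector<scored_move>& moves,
                         double (*get_weight)(const std::string& move))
 {
     std::size_t best = 0;
     double best_score = -1e300;
     for (std::size_t i = 0; i < moves.size(); ++i) {
         const double weighted = moves[i].second * get_weight(moves[i].first);
         if (weighted > best_score) {
             best_score = weighted;
             best = i;
         }
     }
     return best;
 }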

2.) The search space is too complex to keep any kind of state table. I think the best way to associate weights with moves is to 'categorize' the moves. For instance, in the recruitment example the categorization is easy: we simply categorize each recruitment move by the unit that move recruits. The C++ learning system can then keep a list of the categories and the associated weights.

For the more complex case of non-recruitment moves, I think we can extend this idea of categorization. A very simple high-level move categorization is: attack, move, defend. We might be able to further categorize attack moves by the tuple (attacking unit, defending unit, terrain). Movement moves might be categorized by the tuple (moving unit, terrain), etc.

I think this idea will allow us to associate weights with moves via these categories. There will always be a finite number of these tuples, and we need not track the entire state space for the purpose of associating weights with moves.
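
As a rough C++ sketch of how weights might be keyed on such categories (the category contents and type names here are assumptions, not a settled design):

 #include <map>
 #include <string>
 #include <tuple>
 
 // Hypothetical category keys.  Recruitment moves are keyed by the recruited
 // unit type; attack moves by (attacking unit, defending unit, terrain);
 // movement moves by (moving unit, terrain).
 typedef std::string recruit_category;
 typedef std::tuple<std::string, std::string, std::string> attack_category;
 typedef std::tuple<std::string, std::string> movement_category;
 
 // One weight table per kind of category; every weight starts at 1.0, and a
 // missing entry simply means the weight has never been adjusted.
 struct weight_tables {
     std::map<recruit_category, double> recruit;
     std::map<attack_category, double> attack;
     std::map<movement_category, double> movement;
 
     double attack_weight(const attack_category& key) const {
         std::map<attack_category, double>::const_iterator it = attack.find(key);
         return it != attack.end() ? it->second : 1.0;
     }
 };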

The deterministic eval functions which associate the initial scores with moves are what evaluate the state space in terms of the overall team goals, etc. Even if a weight < 1 is associated with a move, if the eval functions determine the move to be the best move by a large margin, it still has a high chance of being selected. If this move is chosen and executed and is found to be successful, the associated weight will be increased; if the move is deemed a failure, the weight will be decreased further. In this way, we won't throw out very strong moves based on one failure. For a very strong move to lose its high chance of being chosen, it must have been found to be a failure more than once, or perhaps an epic failure one time (i.e. loss of a key team member, something fairly catastrophic). This makes the learning system somewhat robust against flukes in the RNG, etc.

3., 4.) This brings up further questions: how to adjust the weights. We don't want to adjust the weights too fast, or a simple RNG fluke could make it so that a strong, viable move is never chosen. We don't want the adjustments to be too small either, or the learning system will not be effective over a single scenario. We can call this weight step size alpha; finding a good alpha value, as well as functions to evaluate the success and failure of moves, are additional questions to be answered. In my AI experience, finding a good alpha value is more or less a trial-and-error process. We can make a close initial guess, but play testing and fine tuning will most likely be necessary to find the optimal alpha.

The success/failure evaluation of a move will also be an important component. This might also change depending on the scenario. One possible idea is to write this in Formula AI as well, allowing it to be changed easily on a scenario-by-scenario basis. The basic idea is that after a move is taken, the new situation is evaluated and a score is given. We might even be able to use the already-defined eval functions to produce this score, but what we want is a delta value, i.e. the difference between the score before the move and the score after the move. This delta value will then be applied to the weight associated with the move as follows:

weight_new = weight_old + delta * alpha

Where alpha is our step size as discussed earlier. In this way, a negative delta will cause the weight to decrease, since the move resulted in a poorer game state, and vice versa for a good move. Again, it's important that the alpha and delta values be calculated carefully. A poor-looking move might be better in the long run (or a very good move might look bad due to a bad RNG round), and we don't want to punish moves too severely for causing a negative delta. On the other hand, we don't want to adjust the weights too slowly.
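
Putting the delta and the update rule together, a possible C++ sketch of the adjustment (alpha, the clamping bounds, and the function name are all placeholders to be tuned or replaced during play testing):

 #include <algorithm>
 
 // Step size: how strongly a single outcome moves a weight.  The value here is
 // an arbitrary starting point; finding a good one is expected to be trial and error.
 const double alpha = 0.05;
 
 // Optional clamping bounds (also placeholders) so that one catastrophic turn
 // cannot drive a weight negative and a lucky streak cannot inflate it without bound.
 const double min_weight = 0.1;
 const double max_weight = 2.0;
 
 // score_before and score_after would come from evaluating the game state
 // before and after the move was executed; their difference is the delta.
 double updated_weight(double weight_old, double score_before, double score_after)
 {
     const double delta = score_after - score_before;       // negative for a bad outcome
     const double weight_new = weight_old + delta * alpha;  // weight_new = weight_old + delta * alpha
     return std::max(min_weight, std::min(max_weight, weight_new));
 }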

The other nice thing is that the learning system can be turned off quite easily, simply by using a flag to determine whether the weights should be updated or not. This could be useful for an initial experimentation stage, for a difficulty setting, or when the scenario designer wishes the AI to strictly follow a supplied FormulaAI script. Since the weights are initialized to 1, it's trivial to turn this system on and off.
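
Since all weights default to 1, the switch can be as simple as skipping the weight update, roughly as follows (a sketch only, with hypothetical names):

 // Sketch of the on/off flag: when learning is disabled the weights are simply
 // never updated, so every move keeps its neutral weight of 1.0 and the AI
 // follows the FormulaAI scores exactly.
 struct learning_system {
     bool enabled;  // could be set from a scenario flag or a difficulty setting
 
     void record_outcome(double& weight, double score_before, double score_after) const {
         if (!enabled) {
             return;  // weights stay at their initial value of 1.0
         }
         const double alpha = 0.05;                       // placeholder step size
         weight += (score_after - score_before) * alpha;  // same update rule as above
     }
 };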

AI API

AI API: Formula AI

Here we define how Formula AI should be written (Formula AI scripts, Formula AI libraries, etc.)

AI API: Formula AI & WML

Here we define how the formula AI will be included in WML and how the WML will be processed. This layer should be transparent to the AI itself, but will be the one mainly used by scenario designers and AI designers.

WML Tags

Currently a formula can be included with formula=. The AI will need to know what the purpose of a given formula is; i.e. we might have special formulas for recruitment, candidate move selection, etc. We also wish to return an integer which would be an evaluation of the proposed move.

The following syntax is proposed to accomplish this:

[register_candidate_move]
 name= <name of move>
 type= movement/recruitment/etc
 action= <action returning formula>
 evaluation= <int returning formula>
[/register_candidate_move]
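
On the C++ side, a registered candidate move could be mirrored by a small structure holding those keys, for example (a sketch only; how the formulas are actually parsed and stored is not defined here):

 #include <string>
 #include <vector>
 
 // Hypothetical mirror of the proposed [register_candidate_move] tag on the
 // C++ side.  The formulas are kept as unparsed strings in this sketch; in
 // practice they would be compiled into FormulaAI formula objects.
 struct candidate_move {
     std::string name;        // name= <name of move>
     std::string type;        // type= movement/recruitment/etc.
     std::string action;      // action= formula returning the move to perform
     std::string evaluation;  // evaluation= formula returning an int score
 };
 
 // The registered candidate moves could simply be kept in a list, with each
 // evaluation formula run when the AI decides what to do next.
 typedef std::vector<candidate_move> candidate_move_list;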

AI API: Formula AI -> C++

Here we define how the Formula AI interfaces with the C++ part (i.e. how moves should be supplied, how moves will be selected, and how weights will be applied).
