Difference between revisions of "SoC2014 kevin AI"

Latest revision as of 03:43, 17 April 2014

This page is related to Summer of Code 2014

See the list of Summer of Code 2014 Ideas

This is a Summer of Code 2014 student page

Description

Kevin Xi - AI: Improve AI by implementing global attack/retreat decision

A rule-based expert system style solution utilizing dynamic programming to make decisions

Idea

General Idea

All decisions of AI have three global strategic goals to achieve: hit enemy as heavy as possible, take villages as much as possible, protect self as cautious as possible. These three goals ideally should be achieve in every single action but they may conflict with each other, so these goals need to be associated with priority.(p1: hit enemy, p2: grab villages, p3: protect self). The results of these three goals is a bunch of actions, for example, to achieve "hit enemy" goal, the "actions to deal with enemy"(this is a CA) will be executed, that may contains "find a good place to attack", "just stand there but not attack to prevent the enemy escape next turn", "recruit unit suit for battle", etc. And the actions to deal with village may contains "recruit scout-like unit", "let the resilient unit to guard the village", "let the unit with heal ability stand behind", "block enemy from near the village", etc. The actions to deal with protecting may contains "give up village if I have more than enemy", "stand behind the resilient unit", "retreat to the zero power-projection contour line", etc.

The priority is not constant, it is a function of the battlefield situation. For now I focus on time-space-economy(that is, time of day-terrain-gold and villages) because I think these are the three factors that influence all units in the battlefield and tightly associated with three goals. See the "Details of Priorities" part for details.

If in this turn, p1+p2-p3>0, the AI think it is worth to attack, otherwise it retreats.

Actually, I hope to implement CAs that form strategies by its own without hard-coding, only given the battlefield situation. The closest approach as far as I can see is to find patterns inside the rules, which is related to the three strategic goals.

The Attack/Retreat Positioning

When I consider the question on"where to retreat", I am thinking about a kind of "Damage contour map", that is, each location of battlefield is associated with the damage the unit may bear at this location, that can be calculate by power_projection(I hope it can combine with defense of certain unit type at this terrain), Then circle the roughly same value to get a contour map and let unit to stand by a contour line that they can bear the damage on that line. Hopefully if enemies form a line, the same value positions on contour map may also be a line, so AI can form a line in this way. But when some of them punch into the middle of the AI's line, let units retreat to one single side which is near to their leader(so they will not run in every direction, that will leave the path free to its leader), and see if it is good to abandon unit that be surrounded, according to the HP, EXP and gold. So it hopefully form a skirmish line without hard-coding the rules about forming line.

Details of Priorities

Attention: Following part contains some formulas, if they cannot be displayed correctly, please follow https://docs.google.com/document/d/1tHMdXp5-yh4eNymOVk3gQxvCJ26DZtaC9_1xC6cGwmo/edit?usp=sharing

My two goals of constructing priorities function are: a. Covering as much human players' experience rules as possible. b. Giving meaning to the values of priority, making them related to real game value and comparable under one criterion.

The "one criterion" I choose is gold. Every decisions made in the game have something to do with gold. If you use your unit to attack enemy, you lose some HP and your enemy lose some HP, but what you actually do is to use some of your golds to consume enemy's golds, because you use golds to recruit/recall and maintain unit, whose HP can be converted into gold, the same as the enemy. When you retreat, you try to prevent enemy to take your golds more than his consume. If you let some of your units to take a village, you are actually grab the opportunity of getting golds in future turns from your enemy. And usually the goal of a scenario is to minimize your enemy's gold and maximize yours.

So the priorities generally are "how much profit can a CA bring back". The higher priority, the more golds can a CA bring from enemy to you

Priority of village CA

The priority of village CA (p_v)should satisfy the following conditions:

At the beginning of a scenario, the priority should be higher than others.
The priority decreases as the number of own villages increases.
The priority should become lower than other CA when own half of villages, since if the map is balance, the enemy would take another half at same time.
The "half" mentioned above should be changed according to the time of day, this allow ghosts to give up one or two villages when it is sunny.
The priority gets lower as it gets closer to the end of the scenario, since only a few turn remained to let villages generate gold, the sum of golds villages generate is smaller.

Here is the function of priority:

Explanation:

we get villages, enemy cannot get, so we get two times profit. v_c is the number of villages we can get in current turn, g_v is the gold one village can generate per turn, t_s is total number of turns, t_c is current turn, so t_s-t_cis how many turns remain(the total number of turn can not be sure, but it can estimate by the size of map---from my experience, its value is a half of the larger one between the wide and height of map, say, a 40*30 map takes 40/2=20 turns)

This front part of function show us golds our villages generate totally, it can satisfy bullet 1 and 5. To satisfy bullet 2 to 4, I apply sigmoid function to it. v_o is the number of current own villages, T is time of day modifier, v_s is the total number of villages, T+v_s/2 move the inflection point of sigmoid function to satisfy bullet 3 and 4, divide by 5(haven’t check this parameter) because I want to make the curve smoother.

If we fix the front part to 20, as the number of own villages increases, we get this(total village=20 and time of day modifier is -2):

So it seems to satisfy bullet 2 to 4.

Priority of Attack/Defence/Retreat CA

The calculating part base on current combat CA, we simulate an attack inside “sandbox” and convert our lose and enemy’s lose into golds.

The priority, or golds we get is:

g_e is enemy’s lose and g_o is our lose. If in some situation g_o is much greater than g_e, that means, if we attack in this turn, we lose more; if we just stand there, we lose more when next turn enemy attack us, so we retreat to “get” the golds we lose if we stand there.

Summary

Use gold as criterion make priority of CAs comparable and give priority real sense related to the game. We actually make three kinds of decision in every turn, use (gain_of_enemy, gain_of_us) to represent:

And the goal of a scenario is to minimize your enemy's gold and maximize yours, this remind me dynamic programming, maybe I can build something on this.

Elements of Dynamic Programming

Stage and Stage Variable

A stage contains two turns, it start from the current turn and end when the following enemy’s turn finish.

Use class stage_state to record information of stages. Use stage_no_ as stage variable.

State and State Variable

The state is the delta of “gold” between own and enemy.

The state variable is the difference of own_state_ and enemy_state_.

The state variable calculated in the constructor of class stage_state, given the current state.

Decision and Decision Variable

Use class decision to represent decision.

Method decision::calc_decision() take state as argument and calculate the new state of specified decision.

Policy and Optimal Policy

Optimal policy is the sequence of decisions that maximize the state of final stage.

Calculate by calc_optimal_policy().

Equation of State Transition

Add initial state and gain of each stage's decision.

The stage_state hold the result of this value.

Objective Function and Optimal Objective Function

Maximize the state variable.

Details of Prototype

Flow of Prototype

Use a queue to contain the stage_state. First initialize stage_1 as current stage, push it into the queue, then pass the queue to calc_decisions(). After that, we get all the final stage_state in the queue(for now, we look ahead 3 stages, or 6 turns, we have 2 decisions of each stage, so we have 1 stage_state for stage 1, 2 stage_state for stage 2, 4 stage_state for stage 3). Finally, calc_optimal_policy() will find the highest state_value among all 4 stage_state for stage 3, and output informations the stage_state hold in the member decision_.

Example Output

http://pastebin.com/wwzcHE3n

Issues

The state rise high when units die
- Partly because totally 6 turns of state of villages weight too much. I have dropped 0.1 weight for each turn because the further we look ahead, the rougher the state value.
- The very last hitpoint of a unit should be more valuable than first few lost. This make sense because we want injured unit retreat. I am trying multiple functions to show the rise of value for each hitpoint.
- “you can convert gold into units, but this takes some time - 1 turn to recruit, and 1-2 more turns to move to the frontline. So, there is 'current' gold (units on the frontline) and 'future' gold (income and units not yet on the frontline)” as crab_ mentioned.
- Experience of units should take into consider
Currently the intermediate value of hitpoints lost in battle simulation is store in the int unit::hitpoint_, need new double variable to store the expectation of the hitpoints lost, or just run a real simulation. Find out which way is good.
The balance of each part of state(units, villages)
Current version do not have much difference with brute force search. How to show the advantage of dynamic programming, that is, the usage of intermediate results?
The class stage_state may cost too much
Multiple leader: most part of prototype is designed for multiple leaders but some part with “TODO: multiple leader” don’t work for multiple leaders.

Benchmarks

Win 70% games against default RCA AI

Its behaviour achieving the goal of 'analyze and make decisions globally', e.g. sometimes it give up local optimal to achieve global optimal.

Project

Timeline

Time	Tasks
21 Apr - 27 Apr	Prepare step Familiar with the overall structure of Wesnoth project Familiar with the AI part
28 Apr - 11 May	General framework setup step Design a general framework for dynamic programming be applied to battlefield Write design document for it implement dynamic programming framework
12 May - 25 May	Dynamic programming elements setup step Learn more on dynamic programming, understand the advantages and of it Refine the value of state(villages, units), make them comparable under one standard Implement the stage_state utilizing the formula figured out last week Maybe Machine Learning can help to decide the value of state
26 May - 1 Jun	Test environment setup step Find a method to test the result of new AI(maybe AI arena?) Implement it if do not have
2 Jun - 15 Jun	CAs setup step Design and write document for the design of CAs Implement and test corresponding CAs, test them individually Refine the value of state(villages, units) again according to the performance of CAs
16 Jun - 29 Jun	CAs "design-implement-test" iterations step Make the CAs interact with state Plug CAs into the general framework Test and deal with feedback Machine learning may help
30 Jun - 6 Jul	test AI, analyze result, balance argument is there a automatic way to balancing?
7 Jul - 13 Jul	maybe public to players and collect data and respond
14 Jul - 27 Jul	judge if it 'global' enough
28 Jul - 3 Aug	write wiki page, documents debug
4 Aug - 10 Aug	adjust AI according to the responds of community
11 Aug - 17 Aug	test, debug
next	stay here, see if there is someone taking care of the whiteboard part of Wesnoth

Plan

The "general framework" mentioned above: To implement my idea, I will likely to create new stage which is very similar to the current candidate_action_evaluation_loop stage, but utilize dynamic programming to figure out the decision by the states of current turn and a few turns ahead.

The CAs: I plan to work in a "design-implement-test" iterations. The performance of CAs will be improved at the end of each iteration.

The structure of new CA is also very similar to the current CA, hopefully they will be compatible. The new CA will be notified which decision the framework part made, and only the corresponded CAs(CAs designed for specific decision) will be add to the evaluation queue.

IRC

Kevin_Xi

Questionnaire

1) Basics

1.1) Write a small introduction to yourself.

Hi, I am Kevin, a junior Software Engineering student in Beihang University. I love open source, I use and learn from it all the time. I played The Battle of Wesnoth for about a year. In my free time(besides coding), I like taking some MOOC course about computer, game design, music and thinking. I also like watching Star Trek and enjoying Jazz, the great works of Horace Sliver make me feel relaxed.

1.2) State your preferred email address.

kevin.xgr [at] gmail.com

1.3) If you have chosen a nick for IRC and Wesnoth forums, what is it?

Kevin_Xi on IRC, FAKevin on forum.

1.4) Why do you want to participate in summer of code?

I think become part of something bigger than you(open source) is a kind of intrinsic motivation of human being and my heart is always vibrated to the great moment that a software come to life. I believe "Learning by Doing" and want to improve my skills, both on coding and communicating, by participate in a real-life project. The GSoC project is a good trigger for me to get my hands dirty.

1.5) What are you studying, subject, level and school?

I am a third year student on Software Engineering, Beihang University. The course I have finished are: Object-oriented programming with C++, Principles of Computer Organization and Architecture, Software Engineering, Database System Concepts, Network, Operating System, Compiler, System Analysis and Design and so on.

1.6) What country are you from, at what time are you most likely to be able to join IRC?

China. 8:30~10:00am, 1:30~12:00pm UTC+8.

1.7) Do you have other commitments for the summer period ? Do you plan to take any vacations ? If yes, when.

I would like to take this as a full time job, so I don't arrange other things.

2) Experience

2.1) What programs/software have you worked on before?

Part of my individual project is available here: https://github.com/Kevin-Xi?tab=repositories. Many of them are tools for my daily life. For teamwork, the latest project I am working on is a Raspberry Pi based smart home system. I develop web server use Django on Raspberry Pi.

2.2) Have you developed software in a team environment before? (As opposed to hacking on something on your own)

Yes, I teamed up with my classmates on C++, C#, Java, Django project several times. We often follow agile development methods and use git to manage our project.

2.3) Have you participated to the Google Summer of Code before? As a mentor or a student? In what project? Were you successful? If not, why?

No. I just know it this year from my RSS reader

2.4) Are you already involved with any open source development projects? If yes, please describe the project and the scope of your involvement.

No. But I want to get involved. Sometimes I modify Linux kernel and re-compile it to learn it.

2.5) Gaming experience - Are you a gamer?

Yes. I play a lot, enjoy it and feel there are many things to learn from it, so I taked gamification course at Coursera. I do well in class and get a 10 out of 10 in a recently design work.

2.5.1) What type of gamer are you?

According to the Bartle Player Type Model, I am an archiver who like complete all levels and explorer who like finding out all possibilities in game.

2.5.2) What type of games?

Role-playing(Roguelike, nethack is my favourite!), Alternate reality, Educational, Puzzles, Strategy.

2.5.3) What type of opponents do you prefer?

Clever and polite. The most important is I can learn from my opponents.

2.5.4) Are you more interested in story or gameplay?

It depends on the type of game. I like fantasy background in RPG, gameplay in puzzles.

2.5.5) Have you played Wesnoth? If so, tell us roughly for how long and whether you lean towards single player or multiplayer.

I played it for about 1 year. I played all official campaigns and like to be a observer of online battles.

2.6) If you have contributed any patches to Wesnoth, please list them below. You can also list patches that have been submitted but not committed yet and patches that have not been specifically written for GSoC. If you have gained commit access to our repository (during the evaluation period or earlier) please state so.

https://github.com/wesnoth/wesnoth/pull/143

https://github.com/wesnoth/wesnoth/pull/144

https://github.com/wesnoth/wesnoth/pull/152

3) Communication skills

3.1) Though most of our developers are not native English speakers, English is the project's working language. Describe your fluency level in written English.

There are four kinds of abilities one should develop when learning languages: Listening, Speaking, Reading and Writing. For listening, I am able to understand MOOC class videos without subtitles. For Speaking, although I do not have so much chances to talk directly to a native speaker, I sometimes practice my speaking skills by read after some English podcast online. For Reading, I choose origin English version textbook and read technical papers for information I want. For Writing, I write programming log as diary in English.

3.2) What spoken languages are you fluent in?

Chinese, English

3.3) Are you good at interacting with other players? Our developer community is friendly, but the player community can be a bit rough.

Yes, I am a rational and friendly person.

3.4) Do you give constructive advice?

Yes. Sometimes I also give simple ideas to begin brainstorm among team members.

3.5) Do you receive advice well?

I have a systematic approach to take advice and learn from it .

3.6) Are you good at sorting useful criticisms from useless ones?

I think this depend how deep I understand the system. For wesnoth, since I don't familiar now, I wouldn't say I can. But I will try to ask developers and get familiar on it.

3.7) How autonomous are you when developing ? Would you rather discuss intensively changes and not start coding until you know what you want to do or would you rather code a proof of concept to "see how it turn out", taking the risk of having it thrown away if it doesn't match what the project want

As a Software Engineering student, I consider both. But the way I like most is develop a prototype(that allow you to test your idea and fail quickly if it don't work) and test it.

4) Project

4.1) Did you select a project from our list? If that is the case, what project did you select? What do you want to especially concentrate on?

Yes, I choose "Improve AI by implementing global attack/retreat decision". I want to concentrate on figuring out in what situation the AI should change its strategy, in a quantifiable way with the help of fuzzy logic(to be efficient, economic) and other mathematical principles.

4.2) If you have invented your own project, please describe the project and the scope.

No.

4.3) Why did you choose this project?

I like AI since I was a freshman, I have some experiences on AI and got a 100 out of 100 in my Machine Learning course on MOOC. Since AI of wesnoth should include strategy analysis on terrain, time, arms, special abilities of units and many other aspects, I read the famous ancient Chinese book The Art of War by Sun Tzu since I was young, so I like and good at analysing these elements.

4.4) Include an estimated timeline for your work on the project. Don't forget to mention special things like "I booked holidays between A and B" and "I got an exam at ABC and won't be doing much then".

See "Plan". I may need one week off for final exam, it usually happend at the end of June, and another week to return home from university and spend time with my parents, at the beginning of summer vacation.

4.5) Include as much technical detail about your implementation as you can

See the description of prototype part and the code(https://github.com/Kevin-Xi/wesnoth/tree/prototype) of prototype for details.

4.6) What do you expect to gain from this project?

Familiar with the workflow of an open source community and keep working as a part of community. Improve my skill on programming and communication. The joy of accomplishment of the project. And I want to get the paycheck for tuition fee of the next semester.

4.7) What would make you stay in the Wesnoth community after the conclusion of SOC?

Creative, smart and friendly people in the community. Working as a part of something bigger than me.

5) Practical considerations

5.1) Are you familiar with any of the following tools or languages?

Git (used for all commits): I manage almost all of my code, either individual or team work, on git

C++ (language used for all the normal source code): I feel comfortable with the basic and the advanced features of C++, and code a lot.

STL, Boost, Sdl (C++ libraries used by Wesnoth): I use STL, but don't familiar with Boost and Sdl

Python (optional, mainly used for tools): Python is my daily life tool. I catch podcast and process text file with it. And I use Django to set up my web server.

build environments (eg cmake/scons): I use g++ and make

WML (the wesnoth specific scenario language): It "make coding accessible"

Lua (used in combination with WML to create scenarios): No, I don't have experience on Lua. But I can be a fast learner with the experience of C++, python, Common Lisp, Java and so on.

5.2) Which tools do you normally use for development? Why do you use them?

Vim, g++, gdb, make, git. Because these tools are build-in and powerful, and I can learn and modify them since open source. I will use IDE for some big project.

5.3) What programming languages are you fluent in?

C++, Java, Python, basic part of Common Lisp

5.4) Would you mind talking with your mentor on telephone / internet phone?

Not at all, I will send my phone number to my mentor if I am accepted.

@@ Line 129: / Line 129: @@
 **The very last hitpoint of a unit should be more valuable than first few lost. This make sense because we want injured unit retreat. I am trying multiple functions to show the rise of value for each hitpoint.
 **“you can convert gold into units, but this takes some time - 1 turn to recruit, and 1-2 more turns to move to the frontline. So, there is 'current' gold (units on the frontline) and 'future' gold (income and units not yet on the frontline)” as crab_ mentioned.
-*Experience of units should take into consider
+**Experience of units should take into consider
+*Currently the intermediate value of hitpoints lost in battle simulation is store in the int unit::hitpoint_, need new double variable to store the expectation of the hitpoints lost, or just run a real simulation. Find out which way is good.
 *The balance of each part of state(units, villages)
 *Current version do not have much difference with brute force search. How to show the advantage of dynamic programming, that is, the usage of intermediate results?
@@ Line 299: / Line 300: @@
 https://github.com/wesnoth/wesnoth/pull/144
+https://github.com/wesnoth/wesnoth/pull/152
 '''3) Communication skills'''