GSoC sachith500 Proposal

From The Battle for Wesnoth Wiki


This page is related to Summer of Code 2014
See the list of Summer of Code 2014 Ideas



This is a Summer of Code 2014 student page


Description

Sachith Seneviratne - Multiplayer Data Analysis Proposal

Although multiplayer games are currently being archived, no data is saved regarding the winner(s). Recording and presenting this information would make balancing easier and give content creators better feedback. Specifically, this project will add a means by which players can indicate who won, and will parse the saved data and store as much useful information about each game as possible in a database.

Deliverables

Each deliverable is assigned one of the following priorities:
Optional - may not get implemented, but could be built in the future on top of this project.
High Priority - will most probably be implemented.
Critical - will definitely be implemented.

Provide winner of game whenever possible

  • [Critical] A button for players to resign with. On click, it will print "Good Game!" in chat and automatically quit the game. At the end of the game, the winners will be displayed, and this will also be included in the replay.

Provide an easily extensible database that can support new types of data in the future.

  • [Critical] A MySQL / SQLite database with a connector which can store generated stats.

Stats and a way for players/content creators to gain access to them.

  • [Critical] Generate and save a checksum (hash) that allows eras to be uniquely identified.

Current checksums:
  • Multiplayer OOS checksums in mp-debug mode
  • https://github.com/wesnoth/wesnoth/blob/master/src/unit.cpp#L3217 - unit checksums for recruitment validation

  • [High Priority] Set up the data webpage.

Two of the main design priorities for this project are extensibility and reliability.
Extensibility is very important because game content is constantly changing. By focusing on extensibility in the design, future enhancements and changes can be supported easily.
Reliability is important because, although guidelines have been provided for WML, they may not always be followed. The system should handle such content gracefully and never break while trying to account for it. In fact, one possible outcome of this project is that content creators may be more motivated to follow the guidelines, as doing so will let them receive more accurate feedback.
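
To make the two goals concrete, here is one way they could fit together in C++, assuming game data has already been read into a flat key/value map (real replays are nested WML). Each statistic is produced by its own extractor registered under a column name, so supporting a new statistic means registering one more extractor; an extractor that finds nothing usable, or throws, only marks its column as missing instead of breaking the run. FieldRegistry, Extractor and RawGame are hypothetical names, not existing Wesnoth code.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Assumption: the raw game data is a flat key/value map; real replay data is nested WML.
using RawGame = std::map<std::string, std::string>;

// An extractor returns a value, or std::nullopt when the input has no usable
// data (for example, WML that does not follow the guidelines).
using Extractor = std::function<std::optional<std::string>(const RawGame&)>;

class FieldRegistry {
public:
    // New statistics are supported by registering another extractor;
    // the loop in extract() never has to change.
    void add(const std::string& column, Extractor fn) {
        extractors_.push_back({column, std::move(fn)});
    }

    // Runs every extractor. A column that cannot be filled is reported in
    // `missing` instead of aborting the whole game.
    std::map<std::string, std::string> extract(const RawGame& game,
                                               std::vector<std::string>& missing) const {
        std::map<std::string, std::string> row;
        for (const auto& [column, fn] : extractors_) {
            std::optional<std::string> value;
            try {
                value = fn(game);
            } catch (...) {
                value.reset();  // a faulty extractor must not break the pipeline
            }
            if (value) row[column] = *value;
            else missing.push_back(column);
        }
        return row;
    }

private:
    std::vector<std::pair<std::string, Extractor>> extractors_;
};
```

For example, an extractor for num_turns could look up a turn key; a game that lacks it is still stored, just without that column.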

Implementation Details

This section describes the major components of this project, along with the design decisions and implementation choices behind them. The components will be developed with maximum extensibility in mind (so that changes to WML, such as new tags, can be easily supported in the future). The implementation will feature a top-down approach to

Winner Detection

Implement a means of identifying winners in multiplayer games. This also involves ensuring that the replay contains the events leading up to a player's resignation, as well as those immediately after it.

  • Step 1 : Identify winners by having players declare a winner at the end of the game.
  • Step 2 : Implement advanced winner detection, especially for custom games (this will most probably be implemented in the parser).
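
A minimal sketch of Step 1, assuming each player's team and whether they pressed the resign button can be read back from the replay (PlayerResult and both functions are hypothetical). The winners are the members of the only team that still has a player who did not resign; if no team or several teams remain, no winner is recorded. The second function builds the outcome message in the "Good Game! Winners: ..." form suggested in the feedback.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical record: one entry per player, read back from the replay.
struct PlayerResult {
    std::string name;
    int team;
    bool resigned;
};

// Returns every member of the only team that still has an active
// (non-resigned) player, including teammates who resigned earlier.
// If zero or several teams remain, the winner is unknown: std::nullopt.
std::optional<std::vector<std::string>> winners_from_resignations(
        const std::vector<PlayerResult>& players) {
    std::set<int> active_teams;
    for (const auto& p : players)
        if (!p.resigned) active_teams.insert(p.team);
    if (active_teams.size() != 1) return std::nullopt;

    const int winning_team = *active_teams.begin();
    std::vector<std::string> winners;
    for (const auto& p : players)
        if (p.team == winning_team) winners.push_back(p.name);
    return winners;
}

// Builds the chat/replay message, e.g. "Good Game! Winners: alice, carol".
std::string outcome_message(const std::vector<std::string>& winners) {
    std::string msg = "Good Game! Winners: ";
    for (std::size_t i = 0; i < winners.size(); ++i) {
        if (i > 0) msg += ", ";
        msg += winners[i];
    }
    return msg;
}
```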

Data Analysis

Core Components

  • WML Parser
  • Database to store results of analysis

This mainly involves implementing a parser (the language currently chosen is C++) as well as a database (MySQL) that stores the parsed information.
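
One possible shape for the boundary between the parser and the database, sketched under the assumption that rows can be passed around as column-to-value maps. DbConnector, InMemoryConnector and StatsWriter are hypothetical; a MySQL or SQLite backend would implement the same interface using the respective client library, which is not shown here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Row = std::map<std::string, std::string>;  // column -> value

// Hypothetical storage interface; the parser only ever talks to this class.
class DbConnector {
public:
    virtual ~DbConnector() = default;
    virtual void insert(const std::string& table, const Row& row) = 0;
};

// In-memory backend, e.g. for unit tests of the parser.
class InMemoryConnector : public DbConnector {
public:
    void insert(const std::string& table, const Row& row) override {
        tables_[table].push_back(row);
    }
    const std::vector<Row>& rows(const std::string& table) const {
        static const std::vector<Row> empty;
        auto it = tables_.find(table);
        return it == tables_.end() ? empty : it->second;
    }

private:
    std::map<std::string, std::vector<Row>> tables_;
};

// Hypothetical writer between parser and connector. It knows which columns
// each table has and drops unknown fields, so the parser can emit new fields
// before the schema is extended.
class StatsWriter {
public:
    explicit StatsWriter(DbConnector& db) : db_(db) {}

    void declare_column(const std::string& table, const std::string& column) {
        schema_[table].push_back(column);
    }

    void write(const std::string& table, const Row& parsed) {
        Row row;
        for (const auto& column : schema_[table]) {
            auto it = parsed.find(column);
            if (it != parsed.end()) row[column] = it->second;
        }
        db_.insert(table, row);
    }

private:
    DbConnector& db_;
    std::map<std::string, std::vector<std::string>> schema_;
};
```

Keeping storage behind an interface lets the parser be tested without a database server, and switching between MySQL and SQLite only touches the backend class.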

Parser

The parser will target WML and attempt to derive useful information from it. This will mostly involve standard string processing within the tags. Certain tags, such as [endlevel], will be of particular interest for winner detection, especially in custom scenarios. The parser will be designed to support future tags as well as changes to how tags are used. Certain data may not need to be parsed because it is already available in-game (such as unit recruitment, advancement and death information). Modules that currently do such processing could send that information to the parser.
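
To illustrate the kind of string processing involved, here is a deliberately small, line-based WML scanner that records every key=value attribute together with the path of enclosing tags, which is enough to pick out result inside [endlevel]. It is a sketch only: it ignores preprocessor directives, macros, multi-line and translatable strings, and simply skips unmatched closing tags rather than failing. scan_wml and Attribute are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One attribute found in the input, with the path of its enclosing tags.
struct Attribute {
    std::string path;   // e.g. "scenario/endlevel"
    std::string key;
    std::string value;
};

static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// Minimal line-based scanner: handles [tag], [/tag] and key=value lines only.
std::vector<Attribute> scan_wml(const std::string& text) {
    std::vector<Attribute> out;
    std::vector<std::string> stack;  // currently open tags
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#') continue;  // blank, comment or directive
        if (line.front() == '[' && line.back() == ']') {
            const std::string name = line.substr(1, line.size() - 2);
            if (!name.empty() && name[0] == '/') {
                // Tolerate broken input: only close a tag that is actually open on top.
                if (!stack.empty() && stack.back() == name.substr(1)) stack.pop_back();
            } else {
                stack.push_back(name);
            }
            continue;
        }
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // not an attribute: skip it
        std::string value = trim(line.substr(eq + 1));
        if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
            value = value.substr(1, value.size() - 2);  // strip surrounding quotes
        std::string path;
        for (const auto& tag : stack) {
            if (!path.empty()) path += '/';
            path += tag;
        }
        out.push_back({path, trim(line.substr(0, eq)), value});
    }
    return out;
}
```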

Database

Since the database will be used mainly for lookups, the current plan is to keep it sufficiently denormalised to optimize performance, while still allowing enough normalisation to avoid needless redundancy.
The idea is to save as much basic data about each game as possible, so that a wide variety of statistical techniques can be applied to it.
One possible issue is uniquely identifying eras (and thereby players, etc.). The solution is to implement a checksum (hash) that uniquely identifies each era + version combination. This could then be used as a primary key in the database.
Important: columns marked with a "!" at the end may require further decisions, as they relate to matters such as collecting player stats.
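
A sketch of the era + version checksum, using 64-bit FNV-1a purely as an example of a fast, non-cryptographic hash; the project may well choose a different algorithm or reuse existing checksum code. It assumes the era's WML definition is available as text after preprocessing. fnv1a64 and era_checksum are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// 64-bit FNV-1a over a byte string.
std::uint64_t fnv1a64(const std::string& data) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ULL;                   // FNV prime
    }
    return h;
}

// Hypothetical era key: hashes the game version together with the era's WML,
// so the same era id with different content or a different version gets a
// different key. The '\0' separator keeps ("1.1", "2x") distinct from ("1.12", "x").
std::string era_checksum(const std::string& game_version, const std::string& era_wml) {
    std::ostringstream hex;
    hex << std::hex << std::setw(16) << std::setfill('0')
        << fnv1a64(game_version + '\0' + era_wml);
    return hex.str();  // always 16 hex digits, usable as a primary key
}
```

With this approach any textual difference, including whitespace, yields a different key; normalising the WML before hashing would avoid treating cosmetic edits as a new era.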

Schema

The schema is a work in progress and is subject to change over the course of the project. Below is a list of the main tables and a short description listing the expected structure and logic behind it.


  • Game Table
    • Stores overall information about games (game_id, num_turns, num_players, winning_team, duration, ...)
  • Player Entry table
    • Stores information about the performance of each individual side in a game. Note that a single player may control two sides with the same race and leader combination, so (game, player, race, leader) does not necessarily form a candidate key.
    • Will include data such as game_num, player_num(!), faction, leader, gold_gathered, gold_spent, damage_dealt, damage_taken, healing_done, ...
  • Unit Table
    • This table will contain details such as unit_name, num_recruited, num_died, num_advanced, xp, damage_dealt, healing_done, ...
    • It might also be interesting to store details on a per-unit basis (which could allow for analysis of unit traits), but this would probably be too much data.
    • Saving data on a per-unit basis may prove too expensive in terms of space. In that case, the next level would be to provide aggregate stats for each unit type per game (which requires less detail). At the other extreme, it is also possible to collect only aggregate stats for each unit type across all games.
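
A sketch of the per-game, per-unit-type fallback, assuming the parser emits one record per unit with the hypothetical fields below. Records are rolled up by (game_id, unit_type); here unit_type is taken to be the type the unit started as, so a Spearman that advances to Swordsman is counted against Spearman. That choice is an assumption, not a settled design.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-unit record as the parser might emit it.
struct UnitRecord {
    int game_id;
    std::string unit_type;  // type the unit started as
    bool recruited;         // false for leaders and scenario-placed units
    bool died;
    int advancements;       // number of level-ups during the game
    int damage_dealt;
};

// Aggregate for one unit type within one game.
struct UnitTypeStats {
    int num_recruited = 0;
    int num_died = 0;
    int num_advanced = 0;   // total advancement events
    int damage_dealt = 0;
};

using Key = std::pair<int, std::string>;  // (game_id, unit_type)

std::map<Key, UnitTypeStats> aggregate(const std::vector<UnitRecord>& units) {
    std::map<Key, UnitTypeStats> out;
    for (const auto& u : units) {
        UnitTypeStats& s = out[Key{u.game_id, u.unit_type}];
        if (u.recruited) ++s.num_recruited;
        if (u.died) ++s.num_died;
        s.num_advanced += u.advancements;
        s.damage_dealt += u.damage_dealt;
    }
    return out;
}
```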

Note: if the Leader Table is implemented as below, leader details would not be included in the Unit Table.

  • Leader Table
    • Stores data about the performance of leaders in various games. This will give valuable stats about how different leaders perform on different maps, against different factions and even against other leaders.
    • This table will provide interesting insights into how leaders affect the game as a whole.
    • It may be more viable to store the leader details in the Unit Table itself if individual unit details are saved. Otherwise, leader details will probably need their own table.

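
A sketch of two points above, with hypothetical struct and function names: sides are keyed by (game_id, side) because (game, player, race, leader) can repeat, and leaders stay in the unit table behind an is_leader flag, so leader statistics become a filter plus a join back to the side. The unit and faction names are sample data only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical Player Entry row. One player may control two sides with the
// same faction and leader, so (game_id, side) is used as the key instead.
struct SideEntry {
    int game_id;
    int side;
    std::string player;
    std::string faction;
    std::string leader;
};

// Hypothetical Unit table row; leaders are ordinary rows with a flag.
struct UnitRow {
    int game_id;
    int side;
    std::string unit_type;
    bool is_leader;
    int damage_dealt;
};

// Leader statistics as a filter over the unit table.
std::vector<UnitRow> leaders_only(const std::vector<UnitRow>& units) {
    std::vector<UnitRow> out;
    for (const auto& u : units)
        if (u.is_leader) out.push_back(u);
    return out;
}

// Joins a unit back to its side to find the faction it played for.
std::string faction_of(const UnitRow& u, const std::vector<SideEntry>& sides) {
    for (const auto& s : sides)
        if (s.game_id == u.game_id && s.side == u.side) return s.faction;
    return "";  // unknown side: report nothing rather than fail
}
```
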
Data Visualisation

The idea is to expose the stats generated by the two other components to players and content creators. Because data is collected at the most basic level possible, it should be possible to support a great deal of customization in the queries users can run.
Note that this component has lower priority overall than the components above. However, the project will make its best effort to provide at least a basic presentation of the collected stats to everyone.

Some interesting stats that may be provided:

  • Race vs Race analysis
  • Era stats
  • Race combinations effectiveness analysis
  • Unit effectiveness
  • Leader efficacy
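
As an example of how a Race vs Race table could be computed from stored games, here is a sketch for 1v1 games only, assuming games without a known winner have already been excluded. Duel, Record, Matchup and matchup_table are hypothetical. Each game updates the record of both factions, so the table can be read from either side.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical summary of a finished 1v1 game with a known winner.
struct Duel {
    std::string faction_a;
    std::string faction_b;
    bool a_won;
};

struct Record {
    int wins = 0;
    int games = 0;
};

using Matchup = std::pair<std::string, std::string>;  // (faction, opponent)

// Every game is counted from both sides, so (A, B) and (B, A) are both filled.
// In a mirror match both references point to the same entry, which correctly
// settles at a 50% win rate.
std::map<Matchup, Record> matchup_table(const std::vector<Duel>& games) {
    std::map<Matchup, Record> table;
    for (const auto& g : games) {
        Record& ab = table[Matchup{g.faction_a, g.faction_b}];
        Record& ba = table[Matchup{g.faction_b, g.faction_a}];
        ++ab.games;
        ++ba.games;
        if (g.a_won) ++ab.wins;
        else ++ba.wins;
    }
    return table;
}

double win_rate(const Record& r) {
    return r.games == 0 ? 0.0 : static_cast<double>(r.wins) / r.games;
}
```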

Timeline

The timeline has been devised to get working components as soon as possible and then improve them iteratively to the desired level. This ensures that the deliverables are met, and the iterations make the work required on each individual component easier to manage.

Each week below lists its dates, its major tasks and the corresponding subtasks.

Week 1
(May 19 – 26)

Implement Resign Button

  • Implement a button for players to press
  • Include the player resign data in the stream to the server

Week 2
(May 26 – June 2)

Basic Replay Parser

  • Implement support for checking for resigned players
  • Collect some of the data that will be added to persistent storage later.
  • Test basic functionality

Week 3
(June 2 – 9)

Set-up basic database controller, and add data generated so far into database.

  • Implement Database (SQL)
  • Implement extensible design to allow for future data

Week 4
(June 9 – 16)

Set up connectivity between all new components and test the implemented functionality.
Implement the checksum for era + version.

  • Test extensibility of the database/dbcontroller design by adding new datatypes to the parser
  • Write a few unit tests.
  • Implement the checksum to uniquely identify (era + version) combinations.

Week 5
(June 16 – 23)

Implement Additional parser functionality

  • Add several more fields to the database and implement corresponding query mining on the server stream.

Week 6
(June 23 – 30)

Implement advanced victory detection
Prepare for mid-term evaluations

  • Implement better victory detection in addition to the resign button.
  • Scrub code and complete documentation in preparation for mid-term evaluations.

Week 7
(June 30 – July 7)

Optimize communication between server and parser.

  • Provide the parser with only relevant data from the server data stream
  • Implement basic aggregation views, etc., in preparation for providing presentable data.

Week 8
(July 7 – 14)

Scrub Code and improve parser functionality.

  • Implement some additional data gathering functionality in parser.
  • Get up-to-date on documentation and tests.

Week 9
(July 14 – 21)

Start work on presenting generated results/analysis

  • Provide a complete set of aggregation methods / statistical analyses
  • Provide access to a few stats on a webpage (basic victory stats)

Week 10
(July 21 – 28)

Provide more details on webpage

  • Add era-wise, race-wise analysis of results
  • Add capabilities for custom queries

Week 11
(July 28 – August 4)

Add additional features to parser/database to check extensibility

  • Add a few more fields to check how extensible system is
  • Improve extensibility further

Week 12
(August 4 – 11)

Scrub all code and complete testing/documentation

About Me

Hello everyone! I'm a 22-year-old Computer Science and Engineering student at the University of Moratuwa, Sri Lanka. This is my first year taking part in GSoC, and I definitely intend to take part next year as well. If you wish to contact me, I hope the following details will be sufficient. Please feel free to leave me any feedback regarding this proposal, or anything else! :D

IRC

sachith500

Email

sachith500@gmail.com

Github

https://github.com/sachith500

Experience

I've worked on a number of projects in various languages. Below is a small selection of those relevant to this project.

Programming Contests

Language used: C++
I have about 5 years of experience with programming contests. I've listed links to my profiles below.


Topcoder

Language : C++

Codeforces

Language : C++

Project Euler

Language : C++

Robotgame AI game

Language : Python

Google AI Challenge 2010

Language : Java

Decision Support System for SoilTech

  • PHP
  • MySQL

Customer Managing DBMS

Database Management System that uses MySQL for storing as well as controlling access to customer information.

  • PHP
  • MySQL
  • Symfony2
  • Doctrine

DiscoverLanka - Machine Learning, Query Mining Android App

A location-based, context-aware app that employs query mining using an ontology database in order to learn about users as well as the places they are interested in visiting.

Technologies used:

  • Android

Contribution to Wesnoth

Patches

I started working on Wesnoth even before it was officially selected for GSoC :D Descriptions are colour-coded by acceptance status.
GREEN indicates the patch was accepted.
RED indicates the patch is pending approval.

Bug # / PR - Description (submission date)

  • #21358 - Help link for the plague special ability was not linking back to the units (2014.02.15)
  • #21486 - Team labels were clearing non-team labels on the map (2014.02.15)
  • PR#136 - Added a cumulative distribution function to damage calculations (2014.03.31)

Brainstorm

THIS SECTION WILL BE REMOVED FOR THE FINAL PROPOSAL. This section is just used for brainstorming various half-formed ideas, so they can be iterated upon.

VICTORY DETECTION

  • Possible Cases to consider
    1. Everyone disconnects : no clear winner.
    2. All leaders on all sides except one are dead : Side with leaders remaining win.
    3. All players except 1 disconnect : TODO- Decide if game is still multiplayer
  • At the end of the game, attempt to identify the winner using available information.
  • Current work plan :
    1. Implement very basic winner detection.
    2. Implement some basic database features.
    3. Repeat until satisfactory.
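
The cases above, together with the feedback on them further down, could be encoded roughly as follows. The sketch assumes each side's team, connection state and leader status at the end of the game are known (SideState, Outcome and classify are hypothetical). It checks the leader rule first, on the assumption that a player leaving after losing their leader should not hide a clear result; survival-style scenarios without enemy leaders would need their own rules.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical end-of-game snapshot of one side.
struct SideState {
    int team;
    bool connected;     // a human player is still controlling the side
    bool leader_alive;
};

enum class Outcome {
    TeamWon,     // case 2: only one team still has a living leader
    Unfinished,  // case 1: everyone disconnected, possibly saved to continue later
    Ambiguous,   // case 3: only one player still connected
    Undecided,   // anything else
};

struct Verdict {
    Outcome outcome;
    int winning_team;  // meaningful only when outcome == Outcome::TeamWon
};

Verdict classify(const std::vector<SideState>& sides) {
    std::set<int> teams_with_leader;
    int connected = 0;
    for (const auto& s : sides) {
        if (s.leader_alive) teams_with_leader.insert(s.team);
        if (s.connected) ++connected;
    }
    // Leader rule first (assumption, see above).
    if (teams_with_leader.size() == 1)
        return {Outcome::TeamWon, *teams_with_leader.begin()};
    if (connected == 0) return {Outcome::Unfinished, 0};
    if (connected == 1) return {Outcome::Ambiguous, 0};
    return {Outcome::Undecided, 0};
}
```

Only TeamWon results would be stored as finished games; the other outcomes would be kept out of the statistics.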


A suitable database connector would be incorporated into the WML parser so that the collected data can be saved into a database (MySQL perhaps?). A separate class could then be used in conjunction with the connector to provide an extensible design for the database.
[High Priority] Provide stats regarding games (units used, advancements made, etc.).
[Optional] Provide stats on the pages of units.wesnoth.org. For example, http://units.wesnoth.org/1.10/mainline/en_US/Vampire%20Bat.html could show stats on Vampire Bats: the percentage of games in which they appear, a breakdown of how many appear in those games (1, 2, 3 bats, etc.), how many advancements they made, etc.
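
For the optional units.wesnoth.org idea, the numbers could come from per-game unit counts such as those in the Unit Table. A sketch, assuming each game is reduced to a map from unit type to the number fielded; GameCounts, UnitPageStats and unit_page_stats are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical input: for one game, unit type -> number of such units fielded.
using GameCounts = std::map<std::string, int>;

struct UnitPageStats {
    int games_total = 0;
    int games_with_unit = 0;
    std::map<int, int> count_histogram;  // units per game -> number of games

    // Fraction of all games in which the unit type appeared.
    double appearance_rate() const {
        return games_total == 0 ? 0.0
                                : static_cast<double>(games_with_unit) / games_total;
    }
};

UnitPageStats unit_page_stats(const std::vector<GameCounts>& games,
                              const std::string& unit_type) {
    UnitPageStats s;
    for (const auto& g : games) {
        ++s.games_total;
        auto it = g.find(unit_type);
        if (it == g.end() || it->second == 0) continue;  // unit not used in this game
        ++s.games_with_unit;
        ++s.count_histogram[it->second];                 // e.g. "2 bats" in one more game
    }
    return s;
}
```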

Feedback

THIS SECTION WILL BE REMOVED FOR THE FINAL PROPOSAL.

  • [Critical] A button for players to resign with. On click, it will print "GG" in chat, and auto-quit the game. Provide a database with easy extensibility to support different future types that may need to be entered.

And save a replay of the game including the outcome message. If possible, the outcome message would be better as "Good Game! Winners: <playername>, <playername>, ..."

  • This deliverable shall deliver a means of identifying

Redundant, perhaps "Delivers a means of..." or "Implement a means of..."

  • Unit Table

I agree, it might be interesting to store stuff on a per-unit basis. That said, it may end up as too much data that rarely gets used. Is it possible to remove it easily later if that turns out to be the case? If we can scrap it later, then I would be in favor of trying it to start, assuming it does not add to the complexity of the implementation. It's a low priority, but possibly interesting idea.

One other thing regarding units, some UMC does a LOT with units and WML. For example: http://forums.wesnoth.org/viewtopic.php?f=15&t=18919 or http://forums.wesnoth.org/viewtopic.php?f=15&t=40073 or http://forums.wesnoth.org/viewtopic.php?f=15&t=21234

So we can't really expect to design something that is going to handle every case perfectly. The priority should be on taking what useful info we can and not breaking under "crazy" conditions. Documenting how things work will also be useful and some of these authors might be REALLY interested in seeing unit stats and willing to work toward complying with guidelines for how to handle their units and WML so that they can be parsed usefully or helping improve how this works in the future. Food for thought.

  • Leader Table

I would like to see leader stats, that could be interesting. One thought: if the unit table above is implemented, perhaps just tagging leaders as units and keeping them in there (to be filtered out later by queries) is an option? As you have it outlined is fine too. This is another lower priority but fun idea.

  • Brainstorm
  • Everyone disconnects : no clear winner.

I think we have to treat this case as an unfinished game and throw it out. Many times MP games are saved to continue later.

  • All leaders on all sides except one are dead : Side with leaders remaining win.

In a regular MP game, yes. In a survival style perhaps there are no leaders on the other side but some other WML yet to happen or a turn limit? I forget if this is possible or not these days.

  • All players except 1 disconnect : TODO- Decide if game is still multiplayer

This is a tricky case. Usually this would be a game over situation but sometimes not. Every once in a while one side surrenders but the winning or losing player wants to play on a bit to see if it's REALLY lost. I would guess we have to throw these out because it's really hard to tell what should happen if one player continues on in a game.


OTHER NOTES:

Not everything is going to work as nicely as the default content. As I continue to think about this, it is clear that some UMC will be broken or may not parse well for some reason. There are guidelines to prevent such problems, but it is fair to assume that mistakes will be made and not everyone is going to follow the guidelines perfectly. We should be careful to handle stuff in a way that broken or crazy content doesn't creep in and pollute regular data or cause unforeseen problems.

What happens when two different eras made by different authors have created a WML ability that has different outcomes? For example, two authors implement an ability called improved_magic, but one makes attacks hit 80% of the time and the other 90% of the time. Storage and retrieval should be designed so that such differences are not glossed over as an apples-to-apples comparison. Since we would rarely, if ever, want to take data from ALL eras for a given map, but might want to take data from the default era and perhaps a few others to look at them together, this might be best addressed on the storage side or the query side (e.g. "Duplicate ability/unit, are you sure you wish to continue?", or even just a warning telling the user to be careful not to combine eras that could produce such results). I'm not really sure.
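
One way to surface this on the query side, sketched with hypothetical names: store a hash of each ability's WML definition per era, and before combining eras, report the ability ids whose definitions differ between them. The caller can then show a warning instead of silently merging unlike data.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical record: an ability id as defined by one era, plus a hash of its WML.
struct AbilityDef {
    std::string era_key;     // e.g. the era checksum
    std::string ability_id;  // e.g. "improved_magic"
    std::string def_hash;    // hash of the ability's WML definition
};

// Returns ability ids defined differently by at least two of the selected eras.
std::set<std::string> conflicting_ids(const std::vector<AbilityDef>& defs,
                                      const std::set<std::string>& selected_eras) {
    std::map<std::string, std::set<std::string>> variants;  // id -> distinct definitions
    for (const auto& d : defs)
        if (selected_eras.count(d.era_key)) variants[d.ability_id].insert(d.def_hash);

    std::set<std::string> conflicts;
    for (const auto& [id, hashes] : variants)
        if (hashes.size() > 1) conflicts.insert(id);
    return conflicts;
}
```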

This page was last modified on 2 April 2014, at 16:48.