Difference between revisions of "SummerOfCodeProposal cjhopman"

From The Battle for Wesnoth Wiki
  
 
==Technical details==
 
The in-memory storage of WML currently uses a significant amount of memory. Also, the various data structures used often make many small memory allocations, which can lead to heap fragmentation and even worse memory-efficiency. There are several ways that we can significantly improve the memory-efficiency of loaded WML.<br>
There are basically three areas that I intend to optimize for this GSoC project. First is the [[#String Representation|representation of strings]] in loaded WML. Second, the [[#Config Representation|representation of the config class]]. And third, a [[#Lazy Work|framework]] for both lazy loading and lazy construction of config objects. These three things should all improve the memory-efficiency of loaded WML, but only if certain assumptions hold true. For this reason I will also do some in-depth [[#WML Profiling|WML profiling]].<br><br>
  
 
===String Representation===
 
A large part of the memory usage comes from strings (including t_strings). It's highly likely that many of these strings are the same (for example, "Id", "Name", etc.). If enough of them are shared, then a lot of memory could be saved by sharing the string representations. The basic idea here is to have all strings in some global store accessible by an index or pointer. For example, a naive approach would be an unsorted vector of strings, with the actual string representation in the config class being an index into the vector. The obvious problem with that is that lookup of new strings is slow. Basically, our needs are low memory overhead, fast lookup of new (unknown) strings, fast lookup by "index", and fast insertion.<br>
I have a couple of ideas of how to do this.<br><br>
My primary idea is to use a hash table to store the strings. The problem is that a basic linear-probing hash table performs very poorly at high load factors, and so it is not very memory-efficient if we want decent performance. There are a couple of other implementations that would be more memory-efficient; each of them remains efficient at high load factors.<br><br>
 
'''Blocked cuckoo hashing'''<br>
 
One variant that would be good for this project is blocked cuckoo hashing, a variant of [http://en.wikipedia.org/wiki/Cuckoo_hashing cuckoo hashing] where each position in the table can hold some fixed number of keys. There are several benefits to this implementation. Even at a very high (> 99.9%) load factor it is still efficient and has very little memory overhead, and the storage it needs can be allocated as one large contiguous block. One downside is that insertions can move elements to different buckets, so our "indices" need to be updateable; this likely means an extra level of indirection so that they can point to something that doesn't move. Also, with blocked cuckoo hashing the table has a fixed size; if it gets too full we would have to resize it, which can be quite slow. A better option than resizing the table would be a backup stash. This could be a simple vector, but it would likely be better as a smaller hash table that handles dynamic size requirements more gracefully.<br>
''More info is available [http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V1G-4N56BWF-5&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=067aef2a68139cd8694dcdab5d847db3 here] (I was unable to find a free version of this paper).''<br><br>
 
'''Move-to-front chaining'''<br>
 
In this variant the hash table is an array of linked lists. When we do a lookup, the node that we find is moved to the front of its list. This move-to-front heuristic makes the hash table very efficient when there is a skew in the distribution of looked-up strings (as is likely the case with WML). The downside to this method is that it has more memory overhead per distinct string (due to the linked-list pointer). We would likely hold a large block of list nodes in a vector or deque so that they don't each require a small memory allocation.<br>
''More info on the efficiency of this implementation is available [http://goanna.cs.rmit.edu.au/~jz/fulltext/ipl01.pdf here].''<br><br>
 
'''Chaining with dynamic array'''<br>
 
Here we replace the linked list with a dynamically sized array. This has a bit less memory overhead, but the allocated memory will not be one contiguous allocation. Also, in this method it is more difficult to use the move-to-front optimization, as it would require updating the indices. Without that optimization, this is likely the least CPU-efficient of the three implementations.<br><br>
 
I believe that the best choice for this project will be one of the first two (though once one is working it will be quite simple to drop in another, as the interfaces are the same).<br><br>
For this project, if profiling shows that there are a lot of non-distinct strings, as is expected, I will implement at least one of the first two variants.<br><br>

Note: Another option for this would be a B-tree with a large branching factor. This will likely have more overhead than the hashing options. I do not currently plan to implement this during GSoC, though again it would have the same (or very similar) interface as the hash tables and should be easy to drop in and profile.<br><br>
===Config Representation===
     vector<pair<map<..>::iterator, size_t> > ordered_children;
  };
This is a significant amount of overhead per child config and attribute.<br>
We can probably just replace this with two sorted vectors (that is, basically vector<pair<string_index, t_string> > and vector<pair<string_index, config*> >). This should cut the overhead per config object in half, and there should be almost no overhead per attribute. Without further profiling, I believe that the sorted vectors are the best approach. This change would have less memory overhead and would allocate the memory it does use in larger blocks than the current method.<br><br>
  
 
===Lazy Work===
 
Finally, we load a lot of WML (and build the corresponding config objects) that we never actually need. I intend to implement a framework that will allow us to lazily load WML from disk, and possibly to lazily construct config objects from the loaded WML.<br><br>
  
'''Lazy Loading'''<br>
The most aggressive lazy optimization would be to leave WML on disk until it is needed. This could save a lot of memory, but could require us to minimally parse the WML multiple times, as there is no way to tell how long a WML tag or attribute is. A similar possibility is to load and parse the WML, then write it to a temporary cache. This would be simpler to do, and it's possible that cached WML is already in a format that would allow easy lazy loading (I am not sure of the format of cached WML).<br><br>
  
'''Lazy Construction'''<br>
A slightly less aggressive, but still useful, optimization would be to not fully construct some config objects. With the previously discussed string representation, a config object becomes simply a stream of ints and associated types. In fact, the associated type identifiers could be packed into the high bits of a 32-bit index into the string hash table.<br>
 
While lazy loading would cut the memory usage of a config object by essentially 100%, I expect lazy construction to cut it by as much as 50%. The benefit of lazy construction is that constructing the object once it is needed does not have to go to disk and so is faster.<br><br>
  
We could also have these lazy schemes store some extra information so that construction would not require full parsing. In particular, each tag could be stored with its length; then when we need to build a config we can do it much more quickly. This would also allow us to aggressively unload WML that we think won't be needed.<br>
 
This optimization actually has the potential to greatly reduce memory usage. Implementing it will likely touch several parts of the code (config, config_cache, parser, ...).<br><br>
'''EDIT''' 9/4/09 00:40 GMT<br>
"Lazy optimizations" is not really a good name for these, as I actually intend for config objects to move back and forth between these states. That is, when constructing a config object we should be able to tell it which of the three states to be in, and we should be able to tell it at any time to move into any of the three states.<br><br>
  
 
===WML Profiling===
 
EDIT: I have started doing some of this [[cjhopman_wml|here]].<br><br>
Before doing any optimizations, we need to know whether they have a real possibility of making an improvement. Basically, the assumptions we have made are that there are a lot of strings, that many of these strings are not unique, and that there are a lot of config objects. These are reasonable assumptions, but verifying them is one reason for me to do some extensive profiling.<br>
The second reason to do profiling is to determine the benefit of the optimizations that we do write. For this reason I have already set up a heap profiler so that I can see some higher-level memory usage patterns (see [[Cjhopman profiling]]). I also plan to do some more intrusive profiling: basically, I want to be able to see the structure of the graph of config objects (with info such as the size of the current object, the size of all its children, etc.). I also want to see the distribution of strings in tag names, attribute keys and attribute values. This will be the first thing that I do for this project, currently scheduled to be done before the actual start of coding time.
  
 
==Timeline==
 
April 23rd - May 23rd<br>
Finish up in-depth profiling of loaded WML in preparation for coding to begin.<br>
Make changes to the config class to hide its internals, as described [http://dave.wesnoth.org/?p=9 here].<br><br>
  
 
May 10th - May 16th<br>
 
  
 
'''Profiling and some optimization'''<br>
 
I have [http://code.google.com/p/google-perftools/ google performance tools] working with Wesnoth to do some profiling. I originally set this up because being able to profile memory usage and CPU usage will be important to this project. I have some results up at another [[cjhopman_profiling|page]].<br>
While doing some profiling I found a couple of bottlenecks in the code.<br>
First, image::locator::locator() was using ~8-10% of the in-game time. I rewrote its lookup to use a hash-based map and cut the time the function used in half. [http://svn.gna.org/viewcvs/wesnoth?rev=34388&view=rev]<br>

Latest revision as of 03:31, 21 March 2013

Questionnaire

Basics

  1. Write a small introduction to yourself.
    • My name is Chris Hopman, I'm a student of computer science and mathematics.
  2. State your preferred email address.
    • cjhopman@gmail.com
  3. If you have chosen a nick for IRC and Wesnoth forums, what is it?
    • My nick is cjhopman pretty much everywhere.
  4. Why do you want to participate in summer of code?
    • Participating in summer of code will give me an opportunity to spend a lot of time doing something that I enjoy. I will get to work on an interesting project and will get to make a major contribution to a great project.
  5. What are you studying, subject, level and school?
    • I am currently an undergraduate studying mathematics and computer science at the University of Wisconsin - Madison. I will be graduating in May and will be attending the computer science PhD program here in the fall.
  6. If you have contributed any patches to Wesnoth, please list them below. You can also list patches that have been submitted but not committed yet and patches that have not been specifically written for Wesnoth. If you have gained commit access to our SVN (during the evaluation period or earlier) please state so.
    • I received commit access to the Wesnoth SVN early in 2008. There is a list of some of my contributions below.

Experience

  1. What programs/software have you worked on before?
    • The largest program that I have worked on is definitely Wesnoth. Other than that, and other than projects just for classes, I have worked on two other projects. The first is autoscanner. The goal of this program is to do automatic 3d reconstruction of a statue from a short video clip. I did this project with another student for a professor that we worked for. The second project is more of a library of various functions. It can be found here. It is meant primarily to be useful for algorithm competitions (TopCoder, ICPC, etc.) and so is mostly graph theory, computational geometry, linear algebra, and other similar stuff.
  2. Have you developed software in a team environment before? (As opposed to hacking on something on your own)
    • Other than Wesnoth, the two projects I just mentioned were both done in groups of 2-3. So, primarily my experience of developing software in a team environment is just that from my contributions to Wesnoth.
  3. Have you participated to the Google Summer of Code before? As a mentor or a student? In what project? Were you successful? If not, why?
    • I have not participated in Google Summer of Code before.

Open Source

  1. Are you already involved with any open source development projects? If yes, please describe the project and the scope of your involvement.
    • Wesnoth is the only open source project that I am involved in (actually both of my projects on code.google.com are also open source but it's a whole different level). Most of my contribution to Wesnoth has been bug fixes and other minor work, though I have had several larger contributions. There is a list of these below.

Gaming experience - Are you a gamer?

  1. What type of gamer are you?
    • I am a diverse gamer. I've been playing games for more than 80% of my life (wow, I just realized how long it has been).
  2. What type of games?
    • I like pretty much all genres. Yet, my favorites tend to be strategy games or rpgs. And those that find a good blend of the two are great. For example, Battle for Wesnoth (some others, too--Final Fantasy Tactics comes to mind).
  3. What type of opponents do you prefer?
    • Smart ones. I love the challenge of trying to outplay a smart player.
  4. Are you more interested in story or gameplay?
    • It depends. Generally when I am playing single-player games story and gameplay are both important though I am more likely to accept below average gameplay for an above average story than vice-versa. Playing multiplayer, particularly competitive multiplayer, gameplay is much more important.
  5. Have you played Wesnoth? If so, tell us roughly for how long and whether you lean towards single player or multiplayer.
    • I have played Wesnoth for a bit over a year. I had focused on single player campaigns but in the last two months have shifted to almost only multiplayer.

Communication skills

  1. Though most of our developers are not native English speakers, English is the project's working language. Describe your fluency level in written English.
    • I am fluent in written English. I had better be as I am definitely not in any others.
  2. Are you good at interacting with other players? Our developer community is friendly, but the player community can be a bit rough.
    • Yes, I am good at interacting with other players and with other people in general.
  3. Do you give constructive advice?
    • I think that I give constructive advice. I have been a tutor in math and computer science for a couple years and a sailing instructor for longer. Both of these have definitely improved my ability to give constructive advice.
  4. Do you receive advice well?
    • Yes, I do. I feel that a factor in this is that I am always interested in learning more and advice often offers an opportunity to do that.
  5. Are you good at sorting useful criticisms from useless ones?
    • Yes.

Project

  1. Did you select a project from our list? If that is the case, what project did you select? What do you want to especially concentrate on?
    • I chose to work on the problem of optimizing WML memory usage.
  2. Why did you choose this project?
    • I chose this project because I enjoy working with algorithms and data structures, and, in particular, I enjoy the challenge of finding better ways of doing things. I think that this project will be a chance for me to do that.
  3. Include an estimated timeline for your work on the project.
  4. Include as much technical detail about your implementation as you can
  5. What do you expect to gain from this project?
    • I expect to gain experience working with and redesigning a complex data structure. Also, I will get to be mentored by a person with more experience programming (and working on larger projects) which can only improve my own skills.
  6. What would make you stay in the Wesnoth community after the conclusion of SOC?
    • I already know and like the Wesnoth community. I will stay regardless of anything that happens for SOC.

Practical considerations

  1. Are you familiar with any of the following tools or languages? Subversion, C++, Python, build environments
    • I am very familiar with C++. I have used Subversion enough to do the basics, though I am not very familiar with creating and merging branches. I have only a little experience with Python and with build environments; with either of these, I can currently do only the most basic tasks.
  2. Which tools do you normally use for development? Why do you use them?
    • Currently, I do almost all of my development in Linux, with a text editor, gdb, and other command-line tools. Linux in general makes development much easier, and the tools it provides are very powerful. In Windows (which I haven't really developed in for almost a year) I use Visual Studio 2005/2008. For debugging I feel it is even better than the tools available in Linux, and it has a few other nice features that improve my productivity.
  3. What programming languages are you fluent in?
    • I am very fluent in C++, and significantly less so in Java. I have some experience with C and Scheme.
  4. What spoken languages are you fluent in?
    • English.
  5. At what hours are you awake and when will you be able to be in IRC (please specify in UTC)
    • I am generally awake from 1:00pm to 5:00 am UTC and will be available most of that time.
  6. Would you mind talking with your mentor on telephone / internet phone? We would like to have a backup way for communications for the case that somehow emails and IRC do fail.
    • I would not mind that at all.

Technical details

The in-memory storage of WML currently uses a significant amount of memory. In addition, the data structures involved often make many small memory allocations, which makes memory efficiency even worse. There are several ways we can improve the memory efficiency of loaded WML.
There are three main areas I intend to optimize for this GSoC project: first, the representation of strings in loaded WML; second, the representation of the config class; and third, a framework for both lazy loading and lazy construction of config objects. All three should improve the memory efficiency of loaded WML, but only if certain assumptions hold true. For this reason I will also do some in-depth WML profiling.

String Representation

A large part of the memory usage comes from strings (including t_strings). It is highly likely that many of these strings are the same (for example, "Id", "Name", etc.). If enough of them are shared, a lot of memory could be saved by sharing the string representations. The basic idea is to keep all strings in some global area, accessible by an index or pointer. For example, a naive approach would be an unsorted vector of strings, with the string representation in the config class being just an index into that vector. The obvious problem with that is that lookup of new strings is slow. Basically, our needs are low memory overhead, fast lookup of new (unknown) strings, fast lookup by "index", and fast insertion.
I have a couple of ideas for how to do this.

My primary idea is to use a hash table to store the strings. The problem is that a basic linear-probing hash table performs very poorly at high load factors, so it is not very memory-efficient if we want decent performance. There are a couple of other implementations that would be more memory-efficient; each of them remains efficient at high load factors.

Blocked cuckoo hashing
One variant that would be good for this project is blocked cuckoo hashing, a variant of cuckoo hashing where each position in the table can hold some fixed number of keys. There are several benefits to this implementation. Even at a very high (> 99.9%) load factor it is still efficient and has very little memory overhead, and the storage it needs can be allocated as one large contiguous block. One downside is that insertions can move elements to different buckets, so our "indices" need to be updatable; this likely means an extra level of indirection so that they point to something that doesn't move. Also, with blocked cuckoo hashing the table has a fixed size; if it fills up we would have to resize it, which can be quite slow. A better option than resizing the table would be a backup stash. This could be a simple vector, but it would likely be better as a smaller hash table that more gracefully handles dynamic size requirements.
More info available here (I was unable to find a free version of this paper).

Move-to-front chaining
In this variant the hash table is an array of linked lists. When we do a lookup, the node we find is moved to the front of its list. This move-to-front heuristic makes the hash table very efficient when the distribution of lookups is skewed (as is likely the case with WML). The downside to this method is that it has more memory overhead per distinct string (due to the linked-list pointer). We would likely hold a large block of list nodes in a vector or deque so that they don't each require a small memory allocation.
More info on efficiency of this implementation [1].

Chaining with dynamic array
Here we replace the linked list with a dynamically sized array. This has a bit less memory overhead, but the allocated memory will not be in a single contiguous allocation. Also, with this method it is more difficult to use the move-to-front optimization, as it would require updating the indices. Without that optimization, this is likely the least CPU-efficient of the three implementations.

I believe the best choice for this project will be one of the first two (though once one is working it will be quite simple to drop in another, as the interfaces are the same).

For this project, if profiling shows that there are many non-distinct strings, as expected, I will implement at least one of the first two variants.

Note: another option would be a B-tree with a large branching factor. This would likely have more overhead than the hashing options. I do not currently plan to implement it during GSoC, though again it would have the same (or a very similar) interface as the hash tables and should be easy to drop in and profile.

Finally, we can optimize the low-level representation of the strings themselves. I think the best option here would be to use or adapt an existing lightweight string class. A more important aspect of the low-level string representation is that it performs a lot of small memory allocations. To solve this we can use some type of pool allocation (possibly Boost.Pool) for our lightweight strings.

Config Representation

Moving up a level from strings, we get to the in-memory representation of the config class. Currently it is basically:

struct config {
    map<string, t_string> attributes;
    map<string, vector<config*> > children;
    vector<pair<map<..>::iterator, size_t> > ordered_children;
};

This is a significant amount of overhead per child config and per attribute.
We can probably replace this with two sorted vectors (basically, vector<pair<string_index, t_string> > and vector<pair<string_index, config*> >). This should cut the overhead per config object in half, with almost no overhead per attribute. Without further profiling, I believe the sorted vectors are the best approach: this change would have less memory overhead and would allocate the memory it does use in larger blocks than the current method.

Lazy Work

Finally, we load a lot of WML (and build the corresponding config objects) that we don't need to. I intend to implement a framework that will allow us to lazily load WML from disk, and that will possibly allow us to lazily construct config objects from the loaded WML.

Lazy Loading
The most aggressive lazy optimization would be to leave WML on disk until needed. This could save us a lot of memory, but could require us to minimally parse the WML multiple times, as there is no way to tell how long a WML tag/attribute is. Another, similar possibility is to load and parse the WML and then write it to a temporary cache. This would be simpler to do, and it's possible that cached WML is already in a format that allows easy lazy loading (I am not sure of the format of cached WML).

Lazy Construction
A slightly less aggressive, but still useful, optimization would be to not fully construct some config objects. With the previously discussed string representation, a config object becomes simply a stream of ints and associated types. In fact, the associated type identifiers could be packed into the high bits of a 32 bit index into the string hash table.
While lazy loading essentially cuts the memory usage of a config object by 100%, I expect lazy construction to cut it by as much as 50%. The benefit of lazy construction is that constructing the object once it is needed will not have to go to disk, and so will be faster.

We could have these lazy schemes store some extra information so that construction does not require full parsing. In particular, each tag could be stored with its length; then, when we need to build a config, we can do it much more quickly. This would also allow us to aggressively unload WML that we think won't be needed.
This optimization has the potential to greatly reduce memory usage. Implementing it will likely touch several parts of the code (config, config_cache, parser, ...).

EDIT 9/4/09 00:40 GMT
"Lazy optimizations" is not really a good title for these, as I actually intend for config objects to move back and forth between these states. That is, when constructing a config object we should be able to tell it which of the three states to be in, and we should be able to tell it at any time to move into any of the three states.

WML Profiling

EDIT: I have started doing some of this here.

Before doing any optimizations, we need to know whether they have a chance of making an improvement. Basically, the assumptions we have made are that there are a lot of strings, that many of these strings are not unique, and that there are a lot of config objects. These are reasonable assumptions, but verifying them is one reason for me to do some extensive profiling.
The second reason to profile is to measure the benefit of the optimizations we do write. To that end I have already set up a heap profiler so that I can see some higher-level memory usage patterns (see Cjhopman profiling). I also plan to do some more intrusive instrumentation: basically, I want to be able to see the structure of the graph of config objects (with information such as the size of the current node, the total size of its children, etc.). I will also want to see the distribution of strings among tag names, attribute keys, and attribute values. This will be the first thing I do for this project, currently scheduled to be completed before the official start of coding.

Timeline

April 3rd - April 15th
Convince Wesnoth mentors to accept my proposal.
Finish up some cpu optimizations that I am working on for Wesnoth.
Begin more in-depth profiling of loaded WML.

April 17th - April 23rd
I get to go to Stockholm for a week.

April 20th
Accepted proposals announced on GSOC site.

April 23rd - May 23rd
Finish up in-depth profiling of loaded WML in preparation for coding to begin.
Make changes to config class to hide internals as described here.

May 10th - May 16th
I have a couple of finals. I will likely actually have more time this week as I won't have class every day.

May 23rd
Coding begins. Start with string optimizations.

June 20th
String optimizations finished, tested, and documented.
Begin wml representation optimizations.

June 27th
WML representation optimizations finished, tested, and documented.
Begin working on lazy loading/construction.

July 6th
Submission of midterm evaluations begins.

July 27th
Framework for lazy loading/construction finished, tested, and documented.
Begin a pass of intensive debugging and documentation on all three parts. Each of the three parts will already have been tested and documented, so this is just a final cleaning pass.

August 3rd
Begin migration of current code to use lazy loading/construction where appropriate.

August 10th
"Pencils down" date.
Everything should actually be completed by this time; this final week is a buffer to ensure that everything is very well tested and documented.

August 17th
End of GSOC.
Final evaluation submission begins.

My contributions to Wesnoth

Larger patches and other substantial contributions

unit_map
I originally redesigned this in early 2008 to have iterators that are updated as a unit is moved and that know when they become invalidated. I also added some more powerful iterators/accessors.[2][3] Recently I have refactored it a bit, templatizing the iterators, and greatly improved the documentation of the interface. [4]

gui2 text box history
I added support for text box history to gui2. [5]

Profiling and some optimization
I have gotten Google Performance Tools working with Wesnoth to do some profiling. I originally set this up because being able to profile memory and CPU usage will be important to this project. Some results of this are up at another page.
While doing some profiling I found a couple of bottlenecks in the code.
First, image::locator::locator() was using ~8-10% of the time in-game. I rewrote its lookup to use a hash-based map and cut the time the function used in half. [6]
Second, I changed the algorithm that we used to determine what rectangles to redraw. This shows significant gains in some graphics-intensive situations (as much as 3x fps).[7]
Third, I am doing some work on the tokenizer used for parsing WML that shows a 10-20% speedup of WML loading, at the cost of significant complexity. I'm not yet sure whether this is worth it without first optimizing the underlying preprocessor stream from which the tokenizer gets its input.

Smaller changes and bug fixes

Report an error when a macro is not resolved and is not a filename.[8]
Very simple bug fix.[9]
Prevent duplicate advances_from entries.[10]
Fix bug #13003.[11]
Fix bug #12990.[12]
Began to refactor the attack class.[13]
Fix bug #11031.[14]
Fix bug with KNOCKBACK and similar wml.[15]
Simplified unit_map lookup by id.[16]
Improved error message when attempting to dereference invalid iterator.[17]

This page was last edited on 21 March 2013, at 03:31.