Cjhopman wml

From The Battle for Wesnoth Wiki

I've started doing more intrusive profiling of the config class to go along with my soc proposal.

I've collected some interesting info, but haven't waded through it all yet.
(if you want to see how info was collected, http://pages.cs.wisc.edu/~hopman/wml_collect.patch, but I warn you, it's not pretty)

To start off with, here is a list of strings used, frequency, and length. The strings are all cut off at 20 characters(or the first newline) and a +x is appended to show this. The first number in the file is the number of unique strings. These are all strings from attribute keys and values and child config keys. It's not exactly accurate as I just converted the t_strings to std::strings ignoring some of that structure.

So, let's do some calculations. First, let's assume that our current string is something like:

struct old_string {
    size_t len;
    char* str;
};

And also, let's say that size_t and char* are both 4 bytes. Then, not considering any other overhead we have 11,972 KB used for storing strings. The question is, how much could we save through string-sharing like I discussed in my proposal?

We will consider just the move-to-front hash map that I described. The stuff needed to implement it will look something like the following:

struct listnode {
    char* str;
    size_t len;
    size_t next_node;
    size_t ref_count;
};
struct hashmap {
    static deque<listnode> nodepool;
    size_t table[N]; // or vector<size_t> or similar
};
struct new_string {
    size_t node_index;
};

So, for each unique string we will have an overhead of 3 size_t and 1 char* or 16 bytes for the listnode. Let's say we have 1 bucket per string(I expect this to be more like 2-4 strings per bucket average, but whatever). Then we have further overhead of about 4 * #unique_strings. Also, each actual string in a config class has overhead of 4 bytes. With these assumptions, our memory usage for storing strings would drop to 3570 KB. That is, about a 75% reduction.

Again, this is a bit simplified as it ignores memory allocation overhead and such. I expect that this makes the sharing method even better. That is, adding 4 bytes per string of further overhead adds ~1700KB to the current method, and only ~50 KB to the shared method.

attribute map
We store about 190000 key-value attribute pairs. In an std::map that is 190000 nodes each with 3 pointers (left, right, parent) at 4 bytes per pointer that is over 2MB of overhead just from map nodes (again ignoring memory management overhead).

loaded WML structure graphed

So, starting to parse some more of the output gathered. Here is a graph of a lot of the loaded structure (Warning: it is only a 1.3 MB file but is a 20000x1700 pixel image, so it may take a bit to load). So, children with the same key are merged together (otherwise the graph gets huge), a +x is appended to the name to show this. length indicates the total length of all the child keys, attribute keys and attribute values of the node and all its children. That is, length is a good indicator of memory usage. total children is basically a sum of all the nodes that have this node in the path to them. The root nodes don't have names as the config class only stores names for its children. Edge thickness is determined by the percentage of the parents length that the child is.

With a bit more time I will make this visualization much more powerful. The biggest thing I plan to do is to allow making a graph focusing on particular nodes. Focusing on smaller parts of the graph we will also be able to display it without compressing it (e.g. wouldn't have to merge children, ...).
The goal with this type of information collection is primarily to help focus lazy loading/construction/etc. efforts.

This page was last modified on 7 April 2014, at 04:45.