ValidationDesign

From The Battle for Wesnoth Wiki

There are two levels of design:

  • abstract, which defines the Abstract Base Class, with a few methods that are called while the file is parsed.
  • realization. For now there is one realization, named "Schema validation".

Abstract

The validator should have access to the line number and file name of the config file, so that it can print human-readable warnings with the exact position of each error. This information is stored in the tokenizer while the input file is parsed.

The file is validated while it is parsed. Every time the parser opens or closes a tag, it calls the appropriate method of the validator; this is necessary so that the validator knows which tag is being read at the moment. The validator is assumed to be stack-based, since the parser has a stack-based architecture. Every time the parser reads a key, it calls validate_key().

There is also a special method that checks whether all mandatory keys and tags are present. It is meant to be called once all subtags and keys have been read, i.e. just before the tag is closed.

For the exact signatures of the validation methods, see the devdocs.

Currently the sequence is as follows. Every time the parser opens a tag (both [tag] and [+tag]), it calls open_tag(). Every time it closes a tag ([/tag]), it calls validate() and then close_tag(). When the parser has finished reading a key, it calls validate_key().
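The call order described above can be sketched as an abstract C++ interface. This is a sketch under assumptions: the class names abstract_validator and tracing_validator, the function parse_example, and all parameter lists are hypothetical (the real signatures are in the devdocs); only the four method names come from this page. The tracing validator just records the calls, so the order the parser produces for a one-key tag is visible.

```cpp
#include <string>
#include <vector>

// Hypothetical abstract validator. Method names follow the wiki text;
// the parameters are assumptions, not the real Wesnoth signatures.
class abstract_validator {
public:
    virtual ~abstract_validator() = default;
    // Called for both [tag] and [+tag]; line/file come from the tokenizer.
    virtual void open_tag(const std::string& name, int line, const std::string& file) = 0;
    // Called on [/tag] before close_tag(): checks mandatory keys and tags.
    virtual void validate(int line, const std::string& file) = 0;
    // Called once a key=value pair has been read.
    virtual void validate_key(const std::string& key, const std::string& value,
                              int line, const std::string& file) = 0;
    // Called on [/tag] after validate().
    virtual void close_tag() = 0;
};

// Trivial realization that only records the order of the calls.
class tracing_validator : public abstract_validator {
public:
    std::vector<std::string> calls;
    void open_tag(const std::string& name, int, const std::string&) override {
        calls.push_back("open_tag " + name);
    }
    void validate(int, const std::string&) override { calls.push_back("validate"); }
    void validate_key(const std::string& key, const std::string&, int,
                      const std::string&) override {
        calls.push_back("validate_key " + key);
    }
    void close_tag() override { calls.push_back("close_tag"); }
};

// Hypothetical stand-in for the parser, replaying the calls for:
//   [unit]
//       id=Konrad
//   [/unit]
void parse_example(abstract_validator& v) {
    v.open_tag("unit", 1, "example.cfg");
    v.validate_key("id", "Konrad", 2, "example.cfg");
    v.validate(3, "example.cfg");
    v.close_tag();
}
```

Any realization (such as schema validation) would derive from the abstract class and receive exactly this sequence of calls.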

The current design has one problem: the parser does not know whether [/tag] closes a [+tag] or a [tag]. Also, while validating a [tag] ... [/tag] section, the parser does not know whether the tag will later be extended with a [+tag] ... [/tag] section.

!Note to developers who want to write their own realization of the validator: the following situation is possible. If some mandatory key is missing in a [tag] ... [/tag] section, the validator will probably print an error, even though the mandatory key is present in a later [+tag] ... [/tag] section. Also, if the mandatory key is missing in the [+tag] section as well, a second error will be printed. (Schema validation deals with this by mapping errors by the address of the config object, which is unique for every [tag] and is the same for a [tag] and its [+tag].)
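The address-keyed error mapping mentioned in the note can be sketched as follows. Assumptions: the config struct here is a hypothetical stand-in (just a set of key names), error_cache and its methods are illustrative names, and only a single mandatory key is checked. Because [tag] and its [+tag] modify the same object, validating it again replaces the errors from the first pass instead of adding to them.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a WML config node: only its key names.
struct config {
    std::set<std::string> keys;
};

// Errors are mapped by the address of the config object they belong to.
class error_cache {
public:
    void validate(const config& cfg, const std::string& mandatory_key) {
        std::vector<std::string>& errors = cache_[&cfg];
        errors.clear();  // drop errors recorded for an earlier [tag] pass
        if (cfg.keys.count(mandatory_key) == 0)
            errors.push_back("missing mandatory key '" + mandatory_key + "'");
    }
    // Errors still present after all [+tag] sections have been read.
    std::vector<std::string> pending() const {
        std::vector<std::string> out;
        for (const auto& entry : cache_)
            out.insert(out.end(), entry.second.begin(), entry.second.end());
        return out;
    }
private:
    std::map<const config*, std::vector<std::string>> cache_;
};
```

Printing only pending() once the whole file is parsed avoids the false error for a key that is supplied by a later [+tag] section.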

Realization

Schema validation

A schema is a kind of tree whose nodes store information about the possible tags and keys.

At the moment the following information is stored:

  • tag: name, and the minimal required and maximal allowed number of occurrences in the context of its parent tag.
  • key: name, type and default value of the key. If a key has no default value, it is mandatory, and a tag without that key is invalid.
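The information listed above maps naturally onto two small node types. This is a hypothetical layout (schema_key, schema_tag, missing_keys and all field names are illustrative, not Wesnoth's classes); it shows the rule that a key without a default value is mandatory, and the check validate() would make for missing keys.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical schema node for a key.
struct schema_key {
    std::string name;
    std::string type;                          // name of a value type
    std::optional<std::string> default_value;
    // A key without a default value is mandatory.
    bool mandatory() const { return !default_value.has_value(); }
};

// Hypothetical schema node for a tag.
struct schema_tag {
    std::string name;
    int min = 0;  // minimal required occurrences in the parent tag
    int max = 1;  // maximal allowed occurrences in the parent tag
    std::map<std::string, schema_key> keys;
    std::vector<schema_tag> children;
};

// Names of mandatory keys that are absent from a tag instance.
std::vector<std::string> missing_keys(const schema_tag& tag,
                                      const std::set<std::string>& present) {
    std::vector<std::string> missing;
    for (const auto& [name, key] : tag.keys)
        if (key.mandatory() && present.count(name) == 0)
            missing.push_back(name);
    return missing;
}
```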

Schema validation consists of two parts: validation and schema generation.

Validation part

Schema validation requires a schema file.

The schema validator is stack-based. There are three different stacks:

  • Stack of opened tags. The tag on the top is the tag currently being validated.
  • Stack of counter maps. Each tag can have many siblings. Every time a tag is opened, the validator increments the counter of its occurrences. Counters are mapped by tag name, and the maps are organized in a stack.
  • Stack of error-cache maps. Error messages (errors that occurred while validating a config) are added to a list of messages. The lists are mapped by the address of the config object they belong to, and the maps are organized in a stack. Caching by the address of the validated object helps to deal with the [tag] [+tag] case.
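The stack of counter maps can be sketched on its own. Assumption: sibling_counter is a hypothetical class, and open_tag()/close_tag() here only do the counting part of the real methods. Each opened tag gets a fresh map that counts its children by name; closing the tag pops that map, so counts from one parent never leak into another.

```cpp
#include <map>
#include <stack>
#include <string>

// Hypothetical sketch of the stack of counter maps.
class sibling_counter {
public:
    sibling_counter() { counters_.emplace(); }  // map for top-level tags

    // Counts the tag inside its parent, then pushes a fresh map for
    // the children of the just-opened tag. Returns the new count.
    int open_tag(const std::string& name) {
        int count = ++counters_.top()[name];
        counters_.emplace();
        return count;
    }

    // Forgets the children of the tag being closed.
    void close_tag() { counters_.pop(); }

private:
    std::stack<std::map<std::string, int>> counters_;
};
```

The returned count is what would be compared against the maximum allowed number of occurrences in the schema.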


Let's review what the main methods do:

  • open_tag(). Checks whether a child tag with this name is allowed in the tag on the top of the stack. If not, it prints an error and pushes NULL onto the stack of opened tags. If so, it increments the counter and pushes a pointer to the schema tag onto the stack. It also pushes a new counter map onto the counter stack and a new message map onto the message stack (to handle the children of the just-opened tag).
  • validate(). Checks that all mandatory keys are present, and compares the numbers of child tags present with the required/allowed numbers. Before validating, it closes the previous error map (with the errors of the siblings) and checks whether this config object was validated before; if so, it clears its error list.
  • validate_key(). Checks whether the key is allowed and whether its value is correct (i.e. whether the value matches the type regex).
  • close_tag(). Pops the stack of tags and the counter stack.
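The two checks made by validate_key() can be sketched with std::regex. Assumptions: the function signature, the key_status enum and the "int" pattern are hypothetical examples, not the types defined in Wesnoth's schema. A key must be allowed in the current tag, and its value must match the regex of the key's type in full.

```cpp
#include <map>
#include <regex>
#include <string>

enum class key_status { ok, not_allowed, bad_value };

// allowed_keys: key name -> type name for the current schema tag.
// types: type name -> regex the whole value must match.
key_status validate_key(const std::map<std::string, std::string>& allowed_keys,
                        const std::map<std::string, std::regex>& types,
                        const std::string& key, const std::string& value) {
    auto k = allowed_keys.find(key);
    if (k == allowed_keys.end())
        return key_status::not_allowed;
    auto t = types.find(k->second);
    // In this sketch an unknown type is reported as a bad value.
    if (t == types.end() || !std::regex_match(value, t->second))
        return key_status::bad_value;
    return key_status::ok;
}
```

std::regex_match (rather than regex_search) is used so that a value like "2x" is rejected instead of matching on its leading digit.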

The developer creates a schema_validator object, initializes it with a schema file and then passes it to the parser.

If the validator cannot be initialized, its constructor throws an exception.

Error messages can be sent as WML exceptions or printed to the "validation" log. Printing to the validation log is the default. You can make validation errors fatal by running wesnoth --strict-validation.
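The two reporting modes can be sketched as one switch. Assumptions: error_reporter, its message format and the use of std::runtime_error are hypothetical stand-ins (the real code uses WML exceptions and Wesnoth's logging); the strict flag only mirrors the effect of --strict-validation. The file name and line number come from the tokenizer, as described in the Abstract section.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reporter: log by default, throw in strict mode.
class error_reporter {
public:
    explicit error_reporter(bool strict) : strict_(strict) {}

    void report(const std::string& file, int line, const std::string& message) {
        std::ostringstream out;
        out << file << ":" << line << ": " << message;
        if (strict_)
            throw std::runtime_error(out.str());  // validation error is fatal
        log_.push_back(out.str());                // stands in for the "validation" log
    }

    const std::vector<std::string>& log() const { return log_; }

private:
    bool strict_;
    std::vector<std::string> log_;
};
```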

Schema generation part

If you are interested in the exact format of the markup, see WML_Annotation_Format.

Special information, called "annotations" or "schema markup", is stored in wiki macros in the C++ source files. The schema generator tool, run from the console, parses the list of source files line by line, collects all available information, creates a tag tree, checks that the types of all keys have been set and prints the output.

Note: all error messages are cached, and some of them (unknown type) may be deleted while parsing continues.
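One plausible reading of the note above is sketched here, under the assumption that an "unknown type" error is cached because the type may still be defined later in the sources, and is deleted when that happens. The class type_checker and its methods are hypothetical; they are not the generator's real code.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical cache of "unknown type" errors, kept per type name.
class type_checker {
public:
    void define_type(const std::string& type) {
        types_.insert(type);
        unknown_.erase(type);  // delete cached errors for this type
    }
    void use_type(const std::string& type, const std::string& where) {
        if (types_.count(type) == 0)
            unknown_[type].push_back(where + ": unknown type '" + type + "'");
    }
    // Errors still present once all sources have been parsed.
    std::vector<std::string> errors() const {
        std::vector<std::string> out;
        for (const auto& [type, messages] : unknown_)
            out.insert(out.end(), messages.begin(), messages.end());
        return out;
    }
private:
    std::set<std::string> types_;
    std::map<std::string, std::vector<std::string>> unknown_;
};
```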

While parsing the files, schema_generator builds the schema tree. As mentioned earlier, the schema tree is a kind of tree whose nodes store information about the tags.

The schema tree is later saved to a config file with a [wml_schema] toplevel tag.
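Saving the tree amounts to a recursive walk that emits WML. This sketch assumes a layout of one [tag] per node with a name= attribute; that layout, tag_node, write_tag and write_schema are hypothetical, and the real schema file stores more attributes (keys, links, counts).

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal schema node: only what the writer needs.
struct tag_node {
    std::string name;
    std::vector<tag_node> children;
};

// Writes one node and, recursively, its children, indented by depth.
void write_tag(std::ostream& out, const tag_node& tag, int depth) {
    std::string indent(depth * 4, ' ');
    out << indent << "[tag]\n";
    out << indent << "    name=\"" << tag.name << "\"\n";
    for (const auto& child : tag.children)
        write_tag(out, child, depth + 1);
    out << indent << "[/tag]\n";
}

// Wraps the toplevel tags in a [wml_schema] toplevel tag.
std::string write_schema(const std::vector<tag_node>& toplevel) {
    std::ostringstream out;
    out << "[wml_schema]\n";
    for (const auto& tag : toplevel)
        write_tag(out, tag, 1);
    out << "[/wml_schema]\n";
    return out.str();
}
```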

Each tag stores a list of allowed keys and possible children. There are also some nuances, named Link and Super-tag.

Although I call it a schema "tree", it is honestly more like a hybrid of a *nix filesystem (with paths like gui/window/resolution/grid/row/column/button, and links to other tags, which avoid copying a tag that is a possible child of many tags) and hierarchical inheritance.

Path

Take a filesystem tree as an analogy: you can use the full path to a tag to distinguish tags unambiguously. For example, in GUI WML the tag [resolution] can be used both in the definition and in the instantiation of a widget, but these are very different [resolution]s.
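Resolving a full path is a walk down the tree, one '/'-separated component at a time. In this sketch node and find_tag are hypothetical names, and button_definition is only an illustrative tag name. It shows how two tags both named [resolution] stay distinct because their paths differ.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical schema node: a name and its child tags.
struct node {
    std::string name;
    std::vector<node> children;
};

// Resolves a path such as "gui/window/resolution" from the toplevel
// tags. Returns nullptr if any component of the path is missing.
const node* find_tag(const std::vector<node>& toplevel, const std::string& path) {
    const std::vector<node>* level = &toplevel;
    const node* found = nullptr;
    std::istringstream parts(path);
    std::string part;
    while (std::getline(parts, part, '/')) {
        found = nullptr;
        for (const node& n : *level)
            if (n.name == part) { found = &n; break; }
        if (!found) return nullptr;
        level = &found->children;
    }
    return found;
}
```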

Link

Another idea comes from the filesystem. If you want to use a file in many places, you can simply create a link to it; its content is the same no matter where you use it. A schema link is very similar. If a tag has a child link (gui/window/resolution/grid), it is allowed to have a [grid] child. All information about [grid] is taken from the [grid] child of gui/window/resolution, and the same grid definition can be used elsewhere. This saves a lot of space, and the time otherwise wasted on rewriting definitions.

A link name should be a path to a tag, without a '/' at the beginning or end, e.g. "gui/window/resolution/grid".
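A link can be sketched as a stored path that is resolved only when the child is looked up. Assumptions: node, find_path and find_child are hypothetical names, and the rule "own children first, then links whose last path component matches" is this sketch's choice. The [column] tag below holds no copy of [grid], only the link.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical schema node with real children and links to other tags.
struct node {
    std::string name;
    std::vector<node> children;
    std::vector<std::string> links;  // e.g. "gui/window/resolution/grid"
};

// Resolves a full path from the toplevel tags (nullptr if missing).
const node* find_path(const std::vector<node>& toplevel, const std::string& path) {
    const std::vector<node>* level = &toplevel;
    const node* found = nullptr;
    std::istringstream parts(path);
    std::string part;
    while (std::getline(parts, part, '/')) {
        found = nullptr;
        for (const node& n : *level)
            if (n.name == part) { found = &n; break; }
        if (!found) return nullptr;
        level = &found->children;
    }
    return found;
}

// Looks up an allowed child: own children first, then links.
const node* find_child(const std::vector<node>& toplevel, const node& parent,
                       const std::string& name) {
    for (const node& c : parent.children)
        if (c.name == name) return &c;
    for (const std::string& link : parent.links) {
        // Last path component; if there is no '/', rfind returns npos and
        // npos + 1 wraps to 0, so the whole link is used.
        std::string last = link.substr(link.rfind('/') + 1);
        if (last == name) return find_path(toplevel, link);
    }
    return nullptr;
}
```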

Super

The idea of super-tags came to me from inheritance, and there was an extremely great need for it. The situation was as follows: GUI WML contains widgets, and each widget has a definition and an instance. All definitions have a common list of keys and a common list of tags. Nearly every definition has extra keys that are allowed only in it (or in 2-3 other definitions). And even more (!): some definitions have an extended child [resolution] tag. Simple linking could never handle this.

I created a special feature to inherit already used tags. Every tag except the top-level tags has its parent, so I named this feature "super-tag". "Super-tag" is very similar to "super-class", and it is optional.

If a tag has a "super-tag" (e.g. generic/widget_definition), all keys, child tags and links of the super-tag are allowed in the current tag. (!Note: the current tag's own tags override the super-tag's.) You can still add custom keys.

When the schema file is read, all tags are expanded. Expanding means copying the lists of keys and links from the super-tag, and creating links to the super-tag's children.
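The expansion step can be sketched as below. Assumptions: the tag struct (children stored by name only), expand and contains are hypothetical, and "own definitions win" is implemented by skipping super keys and children the tag already has, following the note that the current tag's tags override the super-tag's. slider_definition is an illustrative name for a definition with its own extended [resolution].

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical tag used to illustrate super-tag expansion.
struct tag {
    std::string name;
    std::map<std::string, std::string> keys;  // key name -> type
    std::vector<std::string> links;           // paths of linked tags
    std::vector<std::string> children;        // names of own child tags
};

bool contains(const std::vector<std::string>& v, const std::string& s) {
    return std::find(v.begin(), v.end(), s) != v.end();
}

// Copies keys and links from the super-tag found at super_path, and
// turns each child of the super-tag into a link.
void expand(tag& t, const tag& super, const std::string& super_path) {
    for (const auto& [name, type] : super.keys)
        t.keys.emplace(name, type);  // emplace() keeps an own key unchanged
    for (const std::string& link : super.links)
        if (!contains(t.links, link)) t.links.push_back(link);
    for (const std::string& child : super.children)
        if (!contains(t.children, child))  // own child tag overrides super's
            t.links.push_back(super_path + "/" + child);
}
```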

When the schema file is written, two modes are available: short and expanded. In the expanded mode all tags are expanded, so you can see every allowed child tag and attribute of each tag, which can be useful for debugging schema markup. The short mode is the default.

If you are interested in how the problem with GUI WML was handled, the answer is: a new top-level tag, [generic], was created, but only in the schema markup. Some "abstract" tags were put into [generic], and the tags describing widgets were marked with "super". The same was done with resolutions.

Format

Some people will probably say that the current markup format is a bit ugly. I have some plans for a new format for dedicated schema files, but that will probably happen only after a thorough discussion on the forum and polishing of the new format's details. Integration with mainline will also take some time and work.

Future of schema validation

There is an idea of adding schema information inside the validated WML itself. There are some issues, such as "[wml_schema] should be the very first toplevel tag in the file", but I think they are not hard to solve, especially using macros. If this idea is adopted, enabling validation will be very easy: to validate your config, just put something like {gui/schema.cfg} at the very beginning of your file. That also needs discussion on the forum.

This page was last modified on 19 August 2011, at 20:54.