Difference between revisions of "GrammarWML"

From The Battle for Wesnoth Wiki
(Separate out preprocessor grammar)
(WML: Better grammar for WML)

Revision as of 23:40, 6 December 2016

This page contains a formal grammar of the Wesnoth domain-specific languages, including WML and its preprocessor. It does not attempt to capture the ways in which the Wesnoth engine may interpret a string, such as WML variable substitution. Nor does it fully capture the potential consequences of macros, such as the use of unbalanced WML tags. The syntax used is regular-expression-like (which is not quite the same as regex-like!), with the following conventions:

  • Literal values are enclosed in either 'single quotes' or "double quotes".
  • Square brackets enclose character classes, with an initial ^ inverting them.
  • Whitespace within an expression (unless quoted) is used only for readability or to separate non-terminals.
  • The meta-characters * + ? | have the same meaning as is typical in regular expressions.
  • The sequence «tab» represents a tab character, and «nl» represents an end-of-line character or character sequence.
  • Multiple definitions of a non-terminal are equivalent to alternation (i.e., x:=4 and x:=7 combine to produce x:=4|7).
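
The last convention can be illustrated with a small Python sketch. It is not part of any Wesnoth tool; the function name merge_definitions and the one-rule-per-line input format are assumptions made for this example only. It collects repeated definitions of a non-terminal and joins them with the alternation operator.

```python
import re

# One rule per line, written as "name := body" (assumed input format).
RULE = re.compile(r"^\s*(\w+)\s*:=\s*(.+?)\s*$")

def merge_definitions(lines):
    """Combine repeated definitions of a non-terminal into one alternation."""
    rules = {}
    for line in lines:
        m = RULE.match(line)
        if m is None:
            continue  # skip blank lines and prose
        name, body = m.groups()
        rules.setdefault(name, []).append(body)
    # Alternation has the lowest precedence, so the bodies can be joined
    # with '|' without adding parentheses.
    return {name: " | ".join(bodies) for name, bodies in rules.items()}

merged = merge_definitions(["x := 4", "x := 7", "y := x*"])
print(merged)  # {'x': '4 | 7', 'y': 'x*'}
```

Applied to the preprocessor grammar on this page, the same merge would fold the several macro_argument and simple_directive lines into a single rule each.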

WML Preprocessor

The WML preprocessor knows little of the grammar of the WML language itself; it is primarily a text-substitution engine. This grammar is currently a draft and may not be entirely accurate.

preproc_doc := (preproc_directive | preproc_line)*
preproc_directive := simple_directive | macro_definition | if_block
preproc_line := (preproc_text | '<<' macro_free_text '>>' | macro_inclusion)* comment? «nl»
preproc_text := (preproc_char | '<' preproc_char)* '<'?
preproc_char := [^<{#«nl»]
macro_free_text := (macro_free_char | '>' macro_free_char)*
macro_free_char := [^>]
macro_inclusion := '{' ([^}]+ | macro_function) '}'
macro_function := macro_name_char+ (macro_argument)*
macro_name_char := [^} «tab»]
macro_argument := (macro_name_char | macro_inclusion)+
macro_argument := '(' preproc_doc? ')'
macro_argument := '_'? '"' ([^}"] | '""' | macro_inclusion)* '"'
macro_argument := '<<' macro_free_text '>>'
comment := '#' [^«nl»]*
ws := ' ' | «tab»
simple_directive := '#undef' ws+ macro_name_char+ ws* «nl»
simple_directive := ('#warning' | '#error') ws+ [^«nl»]* «nl»
macro_definition := '#define' ws+ macro_name_char+ (ws+ macro_name_char+)* «nl» (simple_directive | if_block | preproc_line)+ '#enddef'
if_block := (ifdef_header | ifver_header | ifhave_header) «nl» preproc_doc ('#else' «nl» preproc_doc)? '#endif'
ifdef_header := ('#ifdef' | '#ifndef') ws+ macro_name_char+
ifver_header := ('#ifver' | '#ifnver') ws+ macro_name_char+ ws* comparison_op ws* version_string
ifhave_header := ('#ifhave' | '#ifnhave') ws+ [^«nl»]+
comparison_op := '<' | '<=' | '==' | '!=' | '>=' | '>'
version_string := integer ('.' integer)*
integer := [0-9]+
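
As an illustration of how the rules above compose, the following Python sketch transcribes ifver_header, comparison_op, version_string and integer into one regular expression. It is a hedged example, not the engine's implementation: the helper name parse_ifver is hypothetical, and the sketch assumes the header has already been isolated as a single line without its trailing «nl».

```python
import re

INTEGER = r"[0-9]+"
VERSION_STRING = rf"{INTEGER}(?:\.{INTEGER})*"
# Two-character operators are listed first: Python's regex alternation
# takes the first branch that matches, so '<' must not shadow '<='.
COMPARISON_OP = r"<=|>=|==|!=|<|>"
WS = r"[ \t]"
MACRO_NAME = r"[^} \t]+"  # macro_name_char+

IFVER_HEADER = re.compile(
    rf"(?P<directive>#ifver|#ifnver){WS}+"
    rf"(?P<name>{MACRO_NAME}){WS}*"
    rf"(?P<op>{COMPARISON_OP}){WS}*"
    rf"(?P<version>{VERSION_STRING})"
)

def parse_ifver(header):
    """Split an #ifver/#ifnver header into its parts, or return None."""
    m = IFVER_HEADER.fullmatch(header)
    if m is None:
        return None
    version = tuple(int(part) for part in m.group("version").split("."))
    return m.group("directive"), m.group("name"), m.group("op"), version

print(parse_ifver("#ifver WESNOTH_VERSION >= 1.13.5"))
# ('#ifver', 'WESNOTH_VERSION', '>=', (1, 13, 5))
```

Because macro_name_char also admits the characters used by comparison_op, a header such as A>>1 is ambiguous under the grammar as written; the sketch relies on regex backtracking and makes no attempt to resolve such cases.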

WML

This grammar describes WML after the preprocessor is finished with it, and as such does not account for macros, preprocessor directives, or comments. It also assumes tokenization has already occurred and thus does not specify whitespace. Note that it omits the requirement for opening and closing tags to match.

wml_doc := (wml_tag | wml_attribute)*
wml_tag := '[' wml_name ']' wml_doc '[/' wml_name ']'
wml_name := [a-zA-Z0-9_]+
wml_attribute := textdomain? wml_key_sequence '=' wml_value
wml_key_sequence := wml_name (',' wml_name)*
wml_value := wml_value_component ('+' («nl» textdomain)? wml_value_component)*
wml_value_component := text | '_'? string | '_'? raw_string

text := [^+«nl»]*
string := '"' ([^"] | '""')* '"'
raw_string := '<<' ([^>] | '>' [^>])* '>>'
textdomain := '#textdomain' [a-zA-Z0-9_-]+
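
The grammar above deliberately leaves out the requirement that opening and closing tags match. A minimal recursive-descent sketch in Python shows one way that check could be layered on top of wml_doc, wml_tag and wml_attribute. It rests on loud simplifications that are not part of the grammar: every tag or attribute sits on its own line, values are kept as raw text rather than split into wml_value_component parts, and textdomain lines are not handled. The names tokenize, parse_doc and parse are hypothetical.

```python
import re

NAME = r"[a-zA-Z0-9_]+"  # wml_name
TOKEN = re.compile(
    rf"\[/(?P<close>{NAME})\]"
    rf"|\[(?P<open>{NAME})\]"
    rf"|(?P<keys>{NAME}(?:\s*,\s*{NAME})*)\s*=\s*(?P<value>.*)"
)

def tokenize(source):
    """Yield one token per non-blank line (simplifying assumption)."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        m = TOKEN.fullmatch(line)
        if m is None:
            raise ValueError(f"line {lineno}: cannot tokenize {line!r}")
        if m.group("close"):
            yield ("close", m.group("close"))
        elif m.group("open"):
            yield ("open", m.group("open"))
        else:
            keys = tuple(k.strip() for k in m.group("keys").split(","))
            yield ("attr", (keys, m.group("value")))

def parse_doc(tokens, pos=0, enclosing=None):
    """wml_doc := (wml_tag | wml_attribute)*, plus the tag-match check."""
    node = {"attributes": [], "tags": []}
    while pos < len(tokens):
        kind, data = tokens[pos]
        if kind == "attr":
            node["attributes"].append(data)
            pos += 1
        elif kind == "open":
            child, pos = parse_doc(tokens, pos + 1, enclosing=data)
            node["tags"].append((data, child))
        else:  # closing tag: must match the tag we are currently inside
            if data != enclosing:
                raise ValueError(f"unexpected [/{data}], open tag is {enclosing!r}")
            return node, pos + 1
    if enclosing is not None:
        raise ValueError(f"missing [/{enclosing}]")
    return node, pos

def parse(source):
    doc, _ = parse_doc(list(tokenize(source)))
    return doc

SAMPLE = """
[unit]
    type=Elvish Archer
    x,y=3,4
    [modifications]
    [/modifications]
[/unit]
"""
doc = parse(SAMPLE)
print(doc["tags"][0][1]["attributes"])
# [(('type',), 'Elvish Archer'), (('x', 'y'), '3,4')]
```

Passing the enclosing tag name down the recursion is what turns the context-free rule into the matched-tag check; a document such as [a] followed by [/b] raises ValueError instead of being accepted.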