Difference between revisions of "GrammarWML"

From The Battle for Wesnoth Wiki
(Rough draft of WML grammar)
 
(Try writing a grammar for the WFL language)
 
(9 intermediate revisions by the same user not shown)
[[Category:WML Reference]]

Latest revision as of 16:24, 24 February 2024

This page contains a formal grammar of the Wesnoth domain-specific languages, including WML and its preprocessor. It does not attempt to capture any of the ways that the Wesnoth engine may interpret a string, such as WML variable substitution. It also doesn't fully capture the potential consequences of macros, for example the use of unbalanced WML tags. The syntax used is regular-expression-like (which is not quite the same as regex-like!), with the following conventions:

  • Literal values are enclosed in either 'single quotes' or "double quotes".
  • Square brackets enclose character classes, with initial ^ inverting them
  • Whitespace within an expression (unless quoted) is used only for readability or to separate non-terminals
  • The meta-characters * + ? | have the same meaning as is typical in regular expressions
  • The sequence «tab» represents a tab character, and «nl» represents an end-of-line character or character sequence
  • Multiple definitions of a non-terminal are equivalent to alternation (i.e., x:=4 and x:=7 combine to produce x:=4|7)
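
As a rough illustration of the last convention, the following Python sketch merges repeated definitions of a non-terminal into a single alternation. The helper name `combine_definitions` and the list-of-pairs input format are assumptions made for this example only; they are not part of any Wesnoth tool.

```python
from collections import defaultdict


def combine_definitions(rules):
    """Merge repeated definitions of a non-terminal into one alternation.

    `rules` is a list of (name, body) pairs in the order they appear
    on the page (a hypothetical input format for this sketch).
    """
    merged = defaultdict(list)
    for name, body in rules:
        merged[name].append(body.strip())
    # Join every body recorded for a name with the alternation operator.
    return {name: " | ".join(bodies) for name, bodies in merged.items()}


rules = [("x", "4"), ("x", "7"), ("integer", "[0-9]+")]
combined = combine_definitions(rules)
```

Here `combined["x"]` holds both alternatives joined by `|`, while a non-terminal defined only once is left unchanged.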

WML Preprocessor

The WML preprocessor knows little of the grammar of the WML language itself; it is primarily just a text-substitution engine. Currently this is just a draft and may not be entirely accurate.

preproc_doc := (preproc_directive | preproc_line)*
preproc_directive := simple_directive | macro_definition | if_block
preproc_line := (preproc_text | '<<' macro_free_text '>>' | macro_inclusion)* comment? «nl»
preproc_text := (preproc_char | '<' preproc_char)* '<'?
preproc_char := [^<{#«nl»]
macro_free_text := (macro_free_char | '>' macro_free_char)*
macro_free_char := [^>]
macro_inclusion := '{' ([^}]+ | macro_function) '}'
macro_function := macro_name_char+ (macro_argument)*
macro_name_char := [^} «tab»]
macro_argument := (macro_name_char | macro_inclusion)+
macro_argument := '(' preproc_doc? ')' | '_'? '"' ([^}"] | '""' | macro_inclusion)* '"'
macro_argument := '<<' macro_free_text '>>'
comment := '#' [^«nl»]+ «nl»
ws := ' ' | «tab»
simple_directive := '#undef' ws+ macro_name_char+ ws* «nl»
simple_directive := ('#warning' | '#error') ws+ [^«nl»]* «nl»
macro_definition := '#define' ws+ macro_name_char+ (ws+ macro_name_char+)* «nl» (opt_arg_definition)* (simple_directive | if_block | preproc_line)+ '#enddef' «nl»
opt_arg_definition := '#arg' ws+ macro_name_char+ «nl» (simple_directive | if_block | preproc_line)+ '#endarg' «nl»
if_block := (ifdef_header | ifver_header | ifhave_header) «nl» preproc_doc ('#else' «nl» preproc_doc)? '#endif' «nl»
ifdef_header := ('#ifdef' | '#ifndef') ws+ macro_name_char+
ifver_header := ('#ifver' | '#ifnver') ws+ macro_name_char+ ws* comparison_op ws* version_string
ifhave_header := ('#ifhave' | '#ifnhave') ws+ [^«nl»]+
comparison_op := '<' | '<=' | '==' | '!=' | '>=' | '>'
version_string := integer ('.' integer)*
integer := [0-9]+
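
The version-check rules above can be sketched as a Python regular expression. This is only a transcription of `ifver_header`, `comparison_op`, `version_string` and `integer` as written on this page, not the engine's actual preprocessor; the function name `parse_ifver` and the sample input lines are assumptions for illustration.

```python
import re

INTEGER = r"[0-9]+"
VERSION_STRING = rf"{INTEGER}(?:\.{INTEGER})*"
# Two-character operators come first so '<=' is not read as '<'.
COMPARISON_OP = r"<=|>=|==|!=|<|>"
MACRO_NAME = r"[^} \t]+"  # macro_name_char+
WS = r"[ \t]"

IFVER_HEADER = re.compile(
    rf"(#ifn?ver){WS}+({MACRO_NAME}){WS}*({COMPARISON_OP}){WS}*({VERSION_STRING})"
)


def parse_ifver(line):
    """Return (directive, macro, operator, version tuple), or None on no match."""
    m = IFVER_HEADER.fullmatch(line)
    if m is None:
        return None
    directive, macro, op, version = m.groups()
    return directive, macro, op, tuple(int(part) for part in version.split("."))
```

For example, `parse_ifver("#ifver WESNOTH_VERSION >= 1.17.0")` splits the header into its four parts, and a line with no comparison operator yields `None`.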

WML

This grammar describes WML after the preprocessor is finished with it, and as such does not account for macros, preprocessor directives, or comments. It also assumes tokenization has already occurred and thus does not specify whitespace, except for newlines. Note that it omits the requirement for opening and closing tags to match.

wml_doc := (wml_tag | wml_attribute)*
wml_tag := '[' '+'? wml_name ']' wml_doc '[/' wml_name ']'
wml_name := [a-zA-Z0-9_]+
wml_attribute := textdomain? wml_key_sequence '=' wml_value «nl»
wml_key_sequence := wml_name (',' wml_name)*
wml_value := wml_value_component ('+' («nl» textdomain?)? wml_value_component)*
wml_value_component := text | '_'? string | '_'? raw_string

text := [^+«nl»"]*
string := '"' ([^"] | '""')* '"'
raw_string := '<<' ([^>] | '>' [^>])* '>>'
textdomain := '#textdomain' [a-zA-Z0-9_-]+ «nl»
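
As a small sketch of the `string` rule, the Python snippet below recognises a quoted WML string and collapses each doubled quote into a single one. Treating `""` as an escaped quote character is how the rule is read here; the helper name `unquote_wml_string` is hypothetical.

```python
import re

# string := '"' ([^"] | '""')* '"'
WML_STRING = re.compile(r'"((?:[^"]|"")*)"')


def unquote_wml_string(token):
    """Strip the outer quotes and collapse each doubled quote into one."""
    m = WML_STRING.fullmatch(token)
    if m is None:
        raise ValueError(f"not a WML string: {token!r}")
    return m.group(1).replace('""', '"')
```

A token with an unpaired inner quote, or without a closing quote, does not match the rule and raises `ValueError`.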

WML Substitutions

This grammar describes the syntax of WML substitutions, the syntax used to specify that variables should be substituted into the value of a WML attribute. The grammar here describes a single placeholder, without regard to the fact that they can be nested. Thus, parsing a string using this grammar would only succeed if done from right to left while performing the substitutions.

wml_substitution := wml_var | wml_formula | '$|'
wml_var := '$' wml_var_path (wml_var_default | '|')?
wml_var_path := (wml_var_name wml_var_index? '.')* wml_var_name
wml_var_name := [a-zA-Z0-9_]+
wml_var_index := '[' [0-9]+ ']'
wml_var_default := '?' [^|]+ '|'
wml_formula := '$' '(' wfl_document ')'
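
The `wml_var` rule can be transcribed into a Python regular expression roughly as follows. This is a sketch of the grammar exactly as written on this page (so the final path component carries no index), not of the engine's substitution code; `parse_wml_var` is a name invented for this example.

```python
import re

NAME = r"[a-zA-Z0-9_]+"                      # wml_var_name
INDEX = r"\[[0-9]+\]"                        # wml_var_index
PATH = rf"(?:{NAME}(?:{INDEX})?\.)*{NAME}"   # wml_var_path
# wml_var := '$' wml_var_path (wml_var_default | '|')?
WML_VAR = re.compile(rf"\$(?P<path>{PATH})(?:\?(?P<default>[^|]+)\||\|)?")


def parse_wml_var(text):
    """Return (path, default) for a single placeholder, or None if it does not match."""
    m = WML_VAR.fullmatch(text)
    if m is None:
        return None
    return m.group("path"), m.group("default")
```

Note that the bare `$|` form is a separate alternative of `wml_substitution`, so this function deliberately returns `None` for it.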

Wesnoth Formula Language

This grammar describes the syntax of the Wesnoth Formula Language. Though it specifies the format of comments and file markers, they are not integrated into the main grammar, since the language treats them equivalently to whitespace. The grammar may not be completely accurate to the actual in-game parser.

wfl_comment := '#' [^#]* '#'
wfl_file_run := 'wfl' wfl_string «any token»* 'wflend'
wfl_document := wfl_function_definition* wfl_formula
wfl_function_definition := 'def' wfl_name '(' wfl_function_args ')' wfl_formula ';'
wfl_function_args := (wfl_function_arg (',' wfl_function_arg)*)?
wfl_function_arg := wfl_name '*'?
wfl_formula := 'not'? where_expression
wfl_formula := bracketed_expression
bracketed_expression := '(' wfl_formula ')'
where_expression := (boolean_or_expression | bracketed_expression) ('where' wfl_variables)*
wfl_variables := wfl_variable (',' wfl_variable)*
wfl_variable := wfl_name '=' wfl_formula
boolean_or_expression := (boolean_and_expression | bracketed_expression) ('or' wfl_formula)*
boolean_and_expression := (comparison_expression | bracketed_expression) ('and' wfl_formula)*
comparison_expression := (containment_expression | bracketed_expression) (comparison_op wfl_formula)*
comparison_op := '=' | '!=' | '<' | '>' | '<=' | '>='
containment_expression := (range_expression | bracketed_expression) ('in' wfl_formula)*
range_expression := (additive_expression | bracketed_expression) ('~' wfl_formula)*
additive_expression := negation_op? (multiplicative_expression | bracketed_expression) (additive_op wfl_formula)*
negation_op := '-' | '+'
additive_op := '-' | '+' | '..'
multiplicative_expression := (exponent_expression | bracketed_expression) (multiplicative_op wfl_formula)*
multiplicative_op := '*' | '/' | '%'
exponent_expression := (wfl_formula '^')* (dice_expression | bracketed_expression)
dice_expression := (dot_expression | bracketed_expression) ('d' wfl_formula)*
dot_expression := (wfl_value | bracketed_expression) ('.' wfl_formula)*
wfl_value := 'functions' | wfl_name | wfl_number | wfl_string | wfl_container | wfl_function_call
wfl_name := [a-zA-Z_]+
wfl_number := [0-9]+ ('.' [0-9]+)?
wfl_string := "'" ([^'[]+ | wfl_string_subst | wfl_string_escape)* "'"
wfl_string_subst := '[' wfl_formula ']'
wfl_string_escape := "[']" | '[(]' | '[)]'
wfl_container := '[' ('->' | wfl_expression_list | wfl_key_value_list)? ']'
wfl_expression_list := wfl_formula (',' wfl_formula)*
wfl_key_value_list := wfl_formula '->' wfl_formula (',' wfl_formula '->' wfl_formula)*
wfl_function_call := wfl_name '(' wfl_expression_list? ')'
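
To make the terminal symbols of this grammar more concrete, here is a minimal Python tokenizer sketch covering `wfl_name`, `wfl_number` and the operator literals listed above. It is an assumption-laden illustration: strings, comments and file markers are left out, the keyword set is simply collected from the quoted words in the rules, and the names `tokenize`, `TOKEN` and `KEYWORDS` are invented here rather than taken from the game's parser.

```python
import re

# Words that appear as quoted literals in the rules above.
KEYWORDS = {"not", "where", "or", "and", "in", "d", "def", "functions"}

TOKEN = re.compile(
    r"(?P<number>[0-9]+(?:\.[0-9]+)?)"   # wfl_number
    r"|(?P<name>[a-zA-Z_]+)"             # wfl_name (no digits allowed)
    r"|(?P<op><=|>=|!=|->|\.\.|[-+*/%^~=<>.,;()\[\]])"
    r"|(?P<ws>\s+)"
)


def tokenize(source):
    """Split a formula into (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        kind, text = m.lastgroup, m.group()
        if kind == "name" and text in KEYWORDS:
            kind = "keyword"
        if kind != "ws":
            tokens.append((kind, text))
        pos = m.end()
    return tokens
```

With this sketch, `tokenize("2d6 + size where size = 3")` reports `d` and `where` as keywords and `size` as a plain name, and `1..3` is split into a number, the `..` operator and another number.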