Difference between revisions of "GettextForWesnothDevelopers"

From The Battle for Wesnoth Wiki
m (recategorizing)
(Reusing mainline translations: the wesnoth-test textdomain has been removed)
(40 intermediate revisions by 7 users not shown)

Revision as of 13:22, 25 October 2021

This page helps Wesnoth developers and UMC authors work with the internationalization (i18n) system, which is based on GNU gettext.

General design of gettext use

Gettextized programs usually contain the English strings within the source code, with calls like puts (_("Hello world."));, so that the binary can work (in English) when the system does not support i18n.

Textdomains

Gettext splits translations into domains. For Wesnoth, the general idea is to use distinct textdomains for each campaign or add-on, so that UMC authors can easily ship translations together with their campaigns. These domains are covered in more depth in GettextForTranslators.

The convention is to name each domain after the add-on, using either its full name or just its initials, for example wesnoth-utbs or wesnoth-Son_of_Haldric. For UMC, it is best to use the full name so that the domain does not clash with another add-on's.

Caret hints

Some strings look the same in English but should not necessarily look identical in translations. To handle this, those strings can be prefixed with any descriptive string and a ^ character. For users viewing in en_US, these hints will be automatically removed from the string before showing it to the user.

(Version 1.15.2 and later only) If the string contains more than one ^, the descriptive string ends at the first ^, and everything after the first ^ is shown to the user.

(Version 1.15.18 and later only) When using gettext's Plural Forms, these prefixes can and should be used in both the singular and the plural.
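The display rule for a hinted string whose English text is shown can be illustrated with a small C++ sketch. The function name strip_caret_hint is hypothetical and this is not Wesnoth's own implementation; it only models the 1.15.2+ behaviour described above, where the hint ends at the first ^.

```cpp
#include <string>

// Hypothetical helper, not Wesnoth's actual API. Models the 1.15.2+ rule:
// everything up to and including the FIRST '^' is the hint and is dropped;
// the remainder (which may itself contain '^') is shown to the user.
std::string strip_caret_hint(const std::string& msgid)
{
    const std::string::size_type caret = msgid.find('^');
    if (caret == std::string::npos) {
        return msgid; // no hint present: show the string unchanged
    }
    return msgid.substr(caret + 1);
}
```

For example, "female^Young lady" becomes "Young lady", "hint^a^b" becomes "a^b", and a string without any ^ is returned as-is.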

UTF-8

For translation, all C++, WML and Lua files should be encoded in UTF-8. As noted in the Typography_Style_Guide, some of the recommended punctuation lies outside the ASCII subset.

Marking up strings in C++

In C++, you can mark up strings for translations using the _("A translation") and _n("Translation", "Translations", int) macros. The _n macro is to be used if the string has a singular and plural form.
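A dedicated plural macro exists because the choice between forms has to be made per language, not by the C++ code. As a rough illustration, the sketch below mirrors what GNU gettext's ngettext() returns when no translation is found: the English rule of using the singular only for a count of exactly 1. The name fallback_ngettext is hypothetical; translated catalogs supply their own plural rule, and some languages have more than two forms.

```cpp
#include <string>

// Hypothetical stand-in, not part of Wesnoth. Mirrors GNU gettext's
// untranslated fallback: singular only when the count is exactly 1.
// A translation's catalog may define a different rule and more forms.
std::string fallback_ngettext(const std::string& singular,
                              const std::string& plural,
                              unsigned long count)
{
    return count == 1 ? singular : plural;
}
```

This is also why writing count == 1 ? _("a") : _("b") by hand is discouraged: it hard-codes the English rule for every language.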

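If the string contains any placeholders, do not use snprintf. Use vgettext instead, or vngettext if the string also has singular and plural forms chosen by a number. Both use named $placeholders, so translators can reorder the parts of the sentence freely.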
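To show why named placeholders are preferred, here is a minimal, hypothetical sketch of $name substitution in the spirit of vgettext. The names interpolate and symbol_table are illustrative stand-ins (symbol_table plays the role of utils::string_map); the real implementation handles more cases than this.

```cpp
#include <cctype>
#include <map>
#include <string>

// Illustrative stand-in for utils::string_map.
using symbol_table = std::map<std::string, std::string>;

// Hypothetical sketch of "$name" substitution, not Wesnoth's implementation.
std::string interpolate(const std::string& text, const symbol_table& symbols)
{
    std::string out;
    std::string::size_type i = 0;
    while (i < text.size()) {
        if (text[i] != '$') {
            out += text[i++];
            continue;
        }
        // Read the identifier after '$' (letters, digits, underscores).
        std::string::size_type end = i + 1;
        while (end < text.size() &&
               (std::isalnum(static_cast<unsigned char>(text[end])) || text[end] == '_')) {
            ++end;
        }
        const std::string name = text.substr(i + 1, end - i - 1);
        const auto found = symbols.find(name);
        if (found != symbols.end()) {
            out += found->second;           // known symbol: substitute its value
        } else {
            out += text.substr(i, end - i); // unknown symbol: keep "$name" as written
        }
        i = end;
    }
    return out;
}
```

Because symbols are looked up by name, a translation may place $taste before $handfuls without any change to the C++ code, whereas with snprintf the caller fixes the order of the arguments.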

You can also add comments for translators directly above the string; start them with the keyword TRANSLATORS:. The comment must be placed on the line immediately above the translatable string, like this:

int handfuls = 2;
const std::string translated_text = vngettext(
    // TRANSLATORS: Yum!
    "$handfuls handful of $taste potatoes",
    "$handfuls handfuls of $taste potatoes",
    handfuls,
    utils::string_map({ {"handfuls", std::to_string(handfuls)}, {"taste", "yummy"} }));

The following code will not attach the comment to the string, because the comment is not on the line immediately above the string literal:

int handfuls = 2;
// TRANSLATORS: Yuck!
const std::string translated_text = vngettext(
    "$handfuls handful of $taste potatoes",
    "$handfuls handfuls of $taste potatoes",
    handfuls,
    utils::string_map({ {"handfuls", std::to_string(handfuls)}, {"taste", "yucky"} }));

You can also use multiline comments:

int handfuls = 2;
const std::string translated_text = vngettext(
    /* TRANSLATORS: Yum!
       Best potatoes ever! */
    "$handfuls handful of $taste potatoes",
    "$handfuls handfuls of $taste potatoes",
    handfuls,
    utils::string_map({ {"handfuls", std::to_string(handfuls)}, {"taste", "yummy"} }));

By default, all strings in C++ belong to the "wesnoth" textdomain. If a different textdomain is required, you can add a textdomain binding at the top of the source file, before any include statements. A textdomain binding looks like this:

#define GETTEXT_DOMAIN "wesnoth-lib"

Avoid placing translatable strings in C++ headers whenever possible. It may be unavoidable in a few places, for example when templates are involved, but if the header is included from source files bound to different textdomains, its strings may sometimes be looked up in the wrong textdomain. Where possible, factor the translatable strings out into a source file.
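The sketch below illustrates why the binding comes first and why headers are risky. It assumes that the _() marker is a macro that expands GETTEXT_DOMAIN at the point of use; the lookup function is a hypothetical stand-in that only reports the domain, not Wesnoth's real translation lookup.

```cpp
// Simulates a single source file bound to a non-default textdomain.
// As recommended above, the binding comes before any #include.
#define GETTEXT_DOMAIN "wesnoth-lib"

#include <string>

// Hypothetical stand-in for the real lookup: it only reports which
// textdomain the string would be looked up in.
inline std::string lookup(const char* domain, const char* msgid)
{
    return std::string("[") + domain + "] " + msgid;
}

// Assumed shape of the marker macro: GETTEXT_DOMAIN is expanded wherever
// _() is used, so each source file bakes in its own binding.
#define _(msgid) lookup(GETTEXT_DOMAIN, msgid)

std::string ice_label()
{
    return _("Ice"); // looked up in "wesnoth-lib"
}
```

Since the domain is substituted during preprocessing, an inline function in a header that calls _() gets a different body in each source file that includes it under a different binding. That violates C++'s one-definition rule, and callers may end up using whichever copy the compiler or linker happened to keep.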

Marking up strings in WML

The textdomain bindings

All files with translatable strings must declare which textdomain they use, which is normally done by putting #textdomain on the first line of each .cfg file. See the example below:

#textdomain wesnoth-Son_of_Haldric

[unit_type]
    id=Mu
    name= _ "Mu"
    # ...
[/unit_type]

It is highly recommended that the first textdomain binding be on the first line of the file. Otherwise, strings that appear before it may end up in an unexpected textdomain.

The translatable strings

To mark a string as translatable, just put an underscore ( _ ) in front of the string you wish to be marked as translatable, like the example below:

name= _ "Mu"

Notes to the translators

If you think a translatable string needs additional guidance to be translated properly, you can provide a special comment that will be seen by the translators. Some hints are generated automatically, but in general if you have to wonder whether a hint is needed then it probably is. The context of the scenario isn't obvious in the translation tools, and you can't assume that the strings are shown to the translator in the same order that they appear in the WML file.

Just begin the comment with '#po:' or '# po:' above the string in question. This must be on the line (or lines) immediately before the string that the hint applies to:

#po: "north marches" is *not* a typo for "north marshes" here.
#po: In archaic English, "march" means "border country".
story=_ "The orcs were first sighted from the north marches of the great forest of Wesmere."
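As a rough sketch of the rule above, the hypothetical hints_for function below collects the '#po:' or '# po:' lines that sit directly above a string. It is not how wmlxgettext is implemented; in this simplified model any other line, including a blank one, ends the run of hints.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collect the hint lines immediately above lines[string_line] (0-based),
// returned in file order with the "#po:" / "# po:" prefix removed.
std::vector<std::string> hints_for(const std::vector<std::string>& lines, std::size_t string_line)
{
    std::vector<std::string> hints;
    for (std::size_t i = string_line; i > 0; --i) {
        const std::string& line = lines[i - 1];
        std::string text;
        if (line.rfind("#po:", 0) == 0) {
            text = line.substr(4);
        } else if (line.rfind("# po:", 0) == 0) {
            text = line.substr(5);
        } else {
            break; // any other line ends the run of hints
        }
        if (!text.empty() && text[0] == ' ') text.erase(0, 1);
        hints.insert(hints.begin(), text); // keep the original order
    }
    return hints;
}
```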

The wmlxgettext tool will automatically generate hints for some tags, in addition to hints from '# po:' comments:

  • For [message]: the id, speaker, role or type used to choose the speaker
  • For [object]: the id
  • For [unit]: the id and unit_type
  • For [unit_type]: the id and race
  • For [objective]: whether it's condition=win or condition=lose

Things to avoid

There are certain things you should never do. For example, never mark an empty string as translatable: wmlxgettext (the tool that extracts strings from WML) aborts when it detects one. So never write this:

name= _ ""

Also, never put macro arguments in a translatable string; it will not work. The strings are extracted for the translation catalogues from the unpreprocessed files, but at runtime they are looked up after the preprocessor has substituted the arguments, so the string being looked up does not exist in the catalogue. So never write this:

name= _ "{TYPE} Mu"

To show why it will not work:

#define UNIT_NAME TYPE
    name= _ "{TYPE} Mu"
#enddef

{UNIT_NAME Sword}
{UNIT_NAME Bow}

The translation catalogues would contain the literal string "{TYPE} Mu", so that is the only version translators can translate. After the preprocessor has run, however, the strings that are actually looked up are:

name= _ "Sword Mu"
name= _ "Bow Mu"

Since those are not in the catalogues, they will not get translated.
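The mismatch can be reproduced with a small sketch. substitute_arg mimics the preprocessor's {NAME} substitution in a simplified way, and catalogue stands in for what the extractor produced from the unpreprocessed file; both names and the translated text are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Replace every "{NAME}" in text with value (simplified: no nested macros,
// no quoting rules).
std::string substitute_arg(std::string text, const std::string& name, const std::string& value)
{
    const std::string token = "{" + name + "}";
    for (auto pos = text.find(token); pos != std::string::npos;
         pos = text.find(token, pos + value.size())) {
        text.replace(pos, token.size(), value);
    }
    return text;
}

// Built from the file *before* preprocessing; the translation is made up.
const std::map<std::string, std::string> catalogue{ {"{TYPE} Mu", "Mu-{TYPE}"} };

// Fall back to the English string when the msgid is not in the catalogue.
std::string translate(const std::string& msgid)
{
    const auto it = catalogue.find(msgid);
    return it == catalogue.end() ? msgid : it->second;
}
```

The runtime string "Sword Mu" is never found, so it is always shown in English.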

Gender-specific strings

Several tags, including [message], [abilities] and [trait], can choose different strings based on the gender of the unit. In English the two versions are likely to be the same, but other languages may have gender-specific words for 'I' or 'me'.

[message]
    speaker=student
    message= _ "Have you found an orc for me to fight, huh? A troll?"
    female_message= _ "female^Have you found an orc for me to fight, huh? A troll?"
[/message]

The convention in WML is, as above, to use message= and female_message=, with the latter string including the prefix female^. The mechanism also supports male_message=, but all units will fall back to using the plain message= value if there isn't a gender-specific version that matches their gender.
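A minimal sketch of that fallback rule, assuming a hypothetical message_keys struct for the three keys. It returns the raw key value; translation and the handling of the female^ hint are not modelled.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class gender { male, female };

// Hypothetical model of a [message] tag's three keys.
struct message_keys {
    std::string message;                       // message=
    std::optional<std::string> male_message;   // male_message=
    std::optional<std::string> female_message; // female_message=
};

// Pick the version matching the speaker's gender, falling back to message=.
std::string choose_message(const message_keys& keys, gender g)
{
    const auto& specific = (g == gender::female) ? keys.female_message : keys.male_message;
    return specific ? *specific : keys.message;
}
```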

The message is chosen based on the gender of the speaking unit. Choosing the message based on the gender of another unit requires separate [message] tags:

[if]
    [have_unit]
        id=student
        gender=male
    [/have_unit]
    [then]
        [message]
            speaker=Delfador
            message= _ "Young man, you have $student_hp hitpoints and a sword. I’m fairly sure you’ll win."
        [/message]
    [/then]
    [else]
        [message]
            speaker=Delfador
            message= _ "female^Young lady, you have $student_hp hitpoints and a sword. I’m fairly sure you’ll win."
        [/message]
    [/else]
[/if]

Using a macro to encapsulate most of that can be useful. The example above is from the tutorial, after expanding the GENDER macro which is defined in data/campaigns/tutorial/utils/utils.cfg.

Reusing mainline translations

You can reuse translations for strings in mainline domains by using multiple textdomain bindings:

#textdomain wesnoth-Son_of_Haldric

[unit_type]
    id=Mu
    name= _ "Mu"
    # ...

    [attack]
        name=sword
        #textdomain wesnoth-units
        description= _ "sword"
        # ...
    [/attack]
   
    #textdomain wesnoth-Son_of_Haldric
    # ...
[/unit_type]

If you use bindings for multiple textdomains, make sure each part of the file is bound to the right domain. Never use the textdomains of mainline campaigns, because there is no guarantee that the mainline campaigns will be available on every setup. Only use the core domains: wesnoth, wesnoth-editor, wesnoth-lib, wesnoth-help, and wesnoth-units.
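To make the binding rules concrete, here is a simplified, hypothetical extractor that assigns each translatable string the most recent #textdomain directive. It assumes one string per line and no escaped quotes, and it treats "# textdomain" (with a space) as an ordinary comment, since directives are written without a space after the #.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Returns (textdomain, msgid) pairs in file order.
std::vector<std::pair<std::string, std::string>> strings_with_domains(const std::string& wml)
{
    const std::string directive = "#textdomain ";
    std::vector<std::pair<std::string, std::string>> result;
    std::string domain; // stays empty until the first binding
    std::istringstream in(wml);
    std::string line;
    while (std::getline(in, line)) {
        const auto first = line.find_first_not_of(" \t");
        if (first == std::string::npos) continue; // blank line
        const std::string body = line.substr(first);
        if (body.compare(0, directive.size(), directive) == 0) {
            domain = body.substr(directive.size()); // a new binding
            continue;
        }
        if (body[0] == '#') continue; // ordinary comment, e.g. "# textdomain ..."
        // Look for an underscore followed (after optional spaces) by a quote.
        for (auto pos = body.find('_'); pos != std::string::npos; pos = body.find('_', pos + 1)) {
            const auto open = body.find_first_not_of(' ', pos + 1);
            if (open == std::string::npos || body[open] != '"') continue;
            const auto close = body.find('"', open + 1);
            if (close != std::string::npos)
                result.emplace_back(domain, body.substr(open + 1, close - open - 1));
            break; // at most one string per line in this sketch
        }
    }
    return result;
}
```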

The gettext helper file

A gettext helper file makes reusing mainline translations easier by collecting all strings that should use a specific textdomain in a single file. It is also more wmllint-friendly.

Here is an example of a gettext helper file. The macro names start with 'SOH_' to ensure that they don't clash with another add-on's macros (assuming that this add-on is Son_of_Haldric).

#textdomain wesnoth-lib

#define SOH_STR_ICE
_"Ice" #enddef

#textdomain wesnoth-units

#define SOH_STR_SWORD
_"sword" #enddef

A typical name for gettext helper files is mainline-strings.cfg.

To use it, make sure your add-on loads the file before the macros are used, then use the macros:

[attack]
    name=sword
    description={SOH_STR_SWORD}
    # ...
[/attack]

[terrain_type]
    id=ice2
    name={SOH_STR_ICE}
    # ...
[/terrain_type]

Unbalanced WML macros

WML macros can be unbalanced, meaning that they either include a [tag] without the corresponding [/tag] or a [/tag] before the corresponding [+tag]. These macros are expected to be used in a place where the [tag] is already open. Writing new unbalanced macros isn't recommended; instead, please ask in the WML Workshop forum about better ways to achieve the same result.

When generating the .pot files for translation, wmlxgettext may stop with one of these errors:

  • error: Son_Of_Haldric/utils/abilities.cfg:29: unexpected closing tag '[/abilities]' outside any scope.
  • error: Son_Of_Haldric/utils/abilities.cfg:300: End of WML file reached, but some tags were not properly closed. (nearest unclosed tag is: [abilities])

Suppose abilities.cfg line 29 is in the definition of SOH_ABILITY_BLITZ. To get the .pot file generated, the simplest change is to use # wmlxgettext comments to add the missing opening or closing tags:

# wmllint: unbalanced-on
# wmlxgettext: [abilities]
#define SOH_ABILITY_BLITZ
    [dummy]
        id=soh_blitz

... several lines of code, none of which are an #enddef ...

[+abilities]
#enddef
# wmlxgettext: [/abilities]
# wmllint: unbalanced-off
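To see why the # wmlxgettext comments help, consider a toy balance check over [tag], [/tag] and [+tag] markers. check_tag_balance and its messages are hypothetical and only loosely modelled on the wmlxgettext errors above. It counts brackets anywhere in the text, so once the extra opening and closing tags from the comments are counted, the macro body becomes balanced.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns an empty string when balanced, otherwise a description of the
// first problem. Simplified: it does not check that a closing tag's name
// matches the open tag, and it ignores quoting rules.
std::string check_tag_balance(const std::string& wml)
{
    std::vector<std::string> open_tags;
    for (auto start = wml.find('['); start != std::string::npos; start = wml.find('[', start + 1)) {
        const auto end = wml.find(']', start);
        if (end == std::string::npos) break;
        const std::string tag = wml.substr(start + 1, end - start - 1);
        if (!tag.empty() && tag[0] == '/') {
            if (open_tags.empty())
                return "unexpected closing tag '[" + tag + "]' outside any scope";
            open_tags.pop_back();
        } else if (!tag.empty() && tag[0] == '+') {
            open_tags.push_back(tag.substr(1)); // [+tag] reopens a tag
        } else {
            open_tags.push_back(tag);
        }
    }
    if (!open_tags.empty())
        return "unclosed tags remain (nearest unclosed tag is: [" + open_tags.back() + "])";
    return "";
}
```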

Marking up strings in Lua

In Lua code, a textdomain is a callable object that looks up a string. It supports both singular and plural strings. By convention, the textdomain object is usually named _.

The following sample code demonstrates how to fetch translatable strings in Lua:

local _ = wesnoth.textdomain "wesnoth"

-- Look up a normal string:
local win_condition = _ "Defeat enemy leader(s)"

-- Hints for the translators start with "po:", as in WML:
-- po: Floating text shown when a unit with the "feeding" ability gets a kill
local text = stringx.vformat(_"+$value max HP", { value = feeding.value})

Plural strings are supported since Wesnoth 1.14:

local turn_count = 5
local turn_counter = _("this turn left", "%d turns left", turn_count)
turn_counter = tostring(turn_counter):format(turn_count)

-- For readability, the example's strings are slightly different to the real code.
-- The real strings have brackets in the text shown to the player.

In Wesnoth 1.15, variables can be interpolated using names:

-- Look up a plural string, using the preferred style (as of Wesnoth 1.15.3):
local turn_count = 5
local turn_counter = _("this turn left", "$remaining_turns turns left", turn_count)
turn_counter = turn_counter:vformat{remaining_turns = turn_count}
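For readers more familiar with C++, the callable textdomain object can be modelled as a class with two call operators. This textdomain class is hypothetical and only shows the calling convention: one argument for a normal lookup, or singular, plural and a count for a plural lookup, with gettext's untranslated fallback when no translation exists.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

class textdomain
{
public:
    explicit textdomain(std::string name) : name_(std::move(name)) {}

    void add(const std::string& msgid, const std::string& msgstr) { translations_[msgid] = msgstr; }

    // Singular lookup; falls back to the English msgid, as gettext does.
    std::string operator()(const std::string& msgid) const
    {
        const auto it = translations_.find(msgid);
        return it == translations_.end() ? msgid : it->second;
    }

    // Plural lookup. This sketch has no plural catalogue, so it always uses
    // the untranslated fallback: the singular msgid when n == 1, otherwise
    // the plural msgid. Real catalogues apply each language's plural rules.
    std::string operator()(const std::string& singular, const std::string& plural, int n) const
    {
        return n == 1 ? singular : plural;
    }

    const std::string& name() const { return name_; }

private:
    std::string name_;
    std::map<std::string, std::string> translations_;
};
```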

The textdomain tag

To tell the engine where to search for the .po and .mo files, each textdomain needs a [textdomain] tag. For add-ons and mainline campaigns, the tag is usually placed in the _main.cfg. It is a top-level tag, so it should be placed outside the [campaign] or [modification] tag.

Translatable strings from C++ and Lua use the same textdomains as WML; this WML tag tells the engine where to search for these strings irrespective of which programming language the string appeared in.

[textdomain]
    name="wesnoth-Son_of_Haldric"
    path="data/add-ons/Son_of_Haldric/translations"
[/textdomain]

The .po (or .mo) files will be loaded from a subdirectory of the translations directory.

Generating the .pot and .po files for UMC

For each language, Wesnoth will search for a .po file containing the translations. How to create that file is explained below; first, here is where it should go.

Continuing with the Son of Haldric example, the Swedish translation would be in the file data/add-ons/Son_of_Haldric/translations/wesnoth-Son_of_Haldric/sv.po.

  • data/add-ons/Son_of_Haldric/translations comes from the [textdomain] tag's path
  • wesnoth-Son_of_Haldric is the textdomain's name
  • sv is the language code for Swedish. The codes for each language are given in the big table on https://www.wesnoth.org/gettext/ .
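The location is simply these three parts joined together. po_file_location is a hypothetical helper written only to spell out that layout; it is not part of Wesnoth.

```cpp
#include <cassert>
#include <string>

// path and name come from the [textdomain] tag; lang is a language code such as "sv".
std::string po_file_location(const std::string& path, const std::string& name, const std::string& lang)
{
    std::string dir = path;
    if (!dir.empty() && dir.back() != '/') dir += '/'; // tolerate a trailing slash
    return dir + name + "/" + lang + ".po";
}
```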

Wesnoth 1.14 (but not 1.12) supports reading .po files directly, so when you add the .po file, the new translation should appear as soon as you refresh the cache.

Generating the .pot file

The template (.pot) file contains all of the strings that need to be translated in the .po files, but without the translations.

The .pot is generated from WML and Lua files using a tool called wmlxgettext. With Wesnoth 1.14.5 and later, this tool is shipped with Wesnoth itself as part of the Maintenance tools and can be used from the Maintenance tools' GUI. At the moment it's not documented on that page, but if you follow the instructions to get GUI.pyw running, you'll see a wmlxgettext tab.

Pre-1.13 instructions on how to get and use it are in Nobun's forum posting.

Error messages from wmlxgettext

If wmlxgettext reports "UTF-8 Format error. Can't decode byte 0x91 (invalid start byte).", and the line in question has a curly quotation mark, your text editor is probably saving the file in the Windows-1252 character set. Replace the Windows quotes with their Unicode equivalents; see Typography Style Guide and your editor's documentation for more information. The same applies if the error message says 0x92, 0x93 or 0x94.
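If you prefer to fix such a file with a small program rather than in an editor, the sketch below replaces the four Windows-1252 curly quotes with their UTF-8 encodings. cp1252_quotes_to_utf8 is hypothetical; it assumes the rest of the file is plain ASCII and refuses any other non-ASCII byte, because bytes such as 0x91 also occur inside valid UTF-8 sequences.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Returns the converted text, or std::nullopt if the input contains any
// non-ASCII byte other than the four Windows-1252 curly quotes.
std::optional<std::string> cp1252_quotes_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (const char ch : in) {
        const auto byte = static_cast<unsigned char>(ch);
        switch (byte) {
        case 0x91: out += "\xE2\x80\x98"; break; // U+2018 left single quotation mark
        case 0x92: out += "\xE2\x80\x99"; break; // U+2019 right single quotation mark
        case 0x93: out += "\xE2\x80\x9C"; break; // U+201C left double quotation mark
        case 0x94: out += "\xE2\x80\x9D"; break; // U+201D right double quotation mark
        default:
            if (byte >= 0x80) return std::nullopt; // encoding unclear, do not guess
            out += ch;
        }
    }
    return out;
}
```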

If you get either "unexpected closing tag '[/something]' outside any scope" or "End of WML file reached, but some tags were not properly closed. (nearest unclosed tag is: [something])", see Unbalanced WML macros above.

Generating the .po files for each language

Each .po file can start as a simple copy of the .pot file. Either the author or the translator copies the template to the language-specific filename, and the translation work described in GettextForTranslators is then done on that copy.

Some .po editors, for example poedit, will recognise that the .pot is a template, and automatically suggest saving to a different filename. The poedit editor can also update a .po file based on changes to the .pot file.

Generating the .mo files for UMC

For Wesnoth 1.14, it's generally not necessary to compile the .po files to .mo files. The mainline translations still use .mo files for better performance, but UMC authors can skip the .mo compilation stage.

See Also