TranslatorShellscript

From The Battle for Wesnoth Wiki

Revision as of 10:29, 10 July 2011

Feel free to post improvements as well. In which languages does it not work at all? What other kinds of shell scripts are needed?

First of all

Before using the shell scripts, the po file needs to be opened and saved with poEdit. The reason is that poEdit then saves the texts on separate lines without interfering \n's. As a result, a normal entry consists of 4 lines, where one line contains the character's text and an earlier line the character's name. Without saving the file with poEdit, the number of text lines would be undefined.

poEdit's Wrath

However, the changes introduced by poEdit should be undone before you send the new po file to your language maintainer. In particular, they must be undone before you "diff" the new version against the old one. This can be done by executing the following:

  msgattrib file.po > file.po1
  mv file.po1 file.po

Now, if you run

  diff -u SVN.file.po file.po > file.po.diff

your "diff" will contain only your changes and will look much nicer. Your language maintainer will like you even more :). (See GettextForTranslators#FAQ.)

Trailing Spaces

Avoid spaces at the end of each translated "string". You can check for them by running

  sed -n '/ "$/{N;s/\n$//p}' de.po

in a shell. (This checks whether a line ends in a trailing space and the next line is empty.)
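As a rough cross-check, the same idea can be sketched with grep, which also reports line numbers. This is only an illustration under stated assumptions: the sample file and its entries are made up for the demonstration, and the pattern only inspects single-line msgstr entries (continuation lines of multi-line strings are not checked).

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
msgid "Hello"
msgstr "Hallo "

msgid "Bye"
msgstr "Auf Wiedersehen"
EOF

# List msgstr lines whose quoted string ends in a space, with line numbers.
trailing=$(grep -n '^msgstr ".* "$' "$sample")
printf '%s\n' "$trailing"

# Remove the temporary sample file again.
rm -f "$sample"
```

On a real file, replace "$sample" with the name of your po file, e.g. de.po.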

Extracting a target language script

For proof-reading a campaign, it might be much easier to print it out in a more "scriptlike" format (<charactername>: <text>) (target language only), e.g. to read it on a bus, on a train or in a car. Thanks to the programmer(s) who included the "speaker" in the comments of the po files (Ivanovic?)! Those comments are definitely helpful and make this approach possible.

  cat ./de.po | grep -v "msgid \"" | grep -v "#: data" | sed "s/\[message\]: speaker//g" | sed "s/#. =//g" | sed "s/msgstr/:/g" |grep -v ": \"\"" | sed ':;s/\n:/:/;N;T' | sed ':;s/\n\n/\n/;N;T'| sed "s/#. \[scenario\]/\n#. \[scenario\]/g" > target-language-script.txt

An even nicer extraction method, which does not require saving via poEdit, is the following:

  sed -n 's/#\. \(\[scenario\]: id=.\+\)$/\n\n\1/p;s/#\. \(\[part\]\)$/\n\1/p;t msgid;s/#\. \[message\]: speaker=\(.\+\)$/\1/;t speaker;b;:speaker;H;n;s/#\. \[message\]: speaker=\(.\+\)$/, \1/;t speaker;x;s/$/:\t/;s/\n//g;x;t msgid;:msgid;s/msgid \(".*"\)/\1/;t nextmsgid;n;b msgid;:nextmsgid;H;n;/^".*"$/b nextmsgid;x;s/"\n"//g;s/\n//;;s/".*"//;x;s/msgstr \(".*"\)/\1/;:nextmsgstr;H;n;/^".*"$/b nextmsgstr;x;s/"\n"//g;s/\n//;s/$//;p' de.po

(Derived from the sed command below. Thanks to Soliton.)

I load this text file into an editor to print it in two columns with the name of the "speaker" indented (like in a script book).

Extracting an English script

For proof-reading a campaign, it might be much easier to print it out in a more "scriptlike" format (<charactername>: <text>) (English only), e.g. to read it on a bus. This one is even more preliminary than the one before. Its output still contains some target-language parts.

  cat de.po | grep -v "msgstr \"" | grep -v "#: data" | sed "s/\[message\]: speaker//g" | sed "s/#. =//g" | sed "s/msgid/:/g" |grep -v ": \"\"" | sed ':;s/\n:/:/;N;T' | sed ':;s/\n\n/\n/;N;T'| sed "s/#. \[scenario\]/\n#. \[scenario\]/g" > english.txt

Extracting a script with original and translated strings

Similar to the above: a sed script that extracts dialog and story parts in the form:

[scenario]: id=<id>
<original title>
<translated title>
<original story part>
<translated story part>
<speaker>: <original dialog>
<speaker>: <translated dialog>
One-line version:
sed -n 's/#\. \(\[scenario\]: id=.\+\)$/\n\n\1/p;s/#\. \(\[part\]\)$/\n\1/p;t msgid;s/#\. \[message\]: speaker=\(.\+\)$/\1/;t speaker;b;:speaker;H;n;s/#\. \[message\]: speaker=\(.\+\)$/, \1/;t speaker;x;s/$/:\t/;s/\n//g;x;t msgid;:msgid;s/msgid \(".*"\)/\1/;t nextmsgid;n;b msgid;:nextmsgid;H;n;/^".*"$/b nextmsgid;x;s/"\n"//g;s/\n//;p;s/".*"//;x;s/msgstr \(".*"\)/\1/;:nextmsgstr;H;n;/^".*"$/b nextmsgstr;x;s/"\n"//g;s/\n//;s/$/\n/;p' de.po

It works on unmodified po files as well (no re-saving in poEdit necessary).

Extracting a list of characters

Make sure that you are in the right directory: the one that contains the *.cfg files for each scenario of the campaign.

cd <wesnoth-installpath>/data/campaigns/Northern_Rebirth/scenarios
sed -n '/\[unit\]/,/\[\/unit\]/ba;/\[side\]/,/\[\/side\]/ba;b;:a s/^ *id *= *"\?\([^"]*\)"\?/\1/p' *.cfg | sort | uniq -c | sort -nr > ~/NRcharacterlist.txt

(This looks for id= keys in [side] and [unit] tags, extracts the values, and then sorts them by number of occurrences.)

You can leave off the "-c | sort -nr" part if you don't care about the number of occurrences.
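To see what the counting stage of that pipeline does, it can be tried in isolation. The following is a minimal sketch, assuming a made-up list of ids (the names are sample data, not real extraction output); the final awk step is an addition that only normalizes the padding which uniq -c puts in front of the counts.

```shell
# Hypothetical ids, as they might come out of the sed extraction step.
ids='Tallin
Camerin
Tallin
Krash
Camerin
Tallin'

# sort groups identical ids, uniq -c counts each group,
# sort -nr puts the most frequent id first.
counts=$(printf '%s\n' "$ids" | sort | uniq -c | sort -nr | awk '{print $1, $2}')
printf '%s\n' "$counts"
```

Note that the awk step as written keeps only the first word of an id, so drop it if your ids contain spaces.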

Translating without knowing English?

(This was copied from a thread in the Wesnoth translation forum; it is a brilliant idea from User:Viliam http://www.wesnoth.org/forum/viewtopic.php?f=7&t=8817)

Have you found someone willing to translate Wesnoth to a language X (or willing to help you translate to your language), but then found that the person does not speak English? This is no longer a problem!

My solution is a Perl script that enriches your PO file with translations into other language(s), inserted as comments on the English phrases. When you work with the "poEdit" application and click on a phrase, you will see the translations into the other language(s) in the comment window. So you can translate the phrase without a good knowledge of English, or you can take advantage of a translation into a language similar to your own.

You can immediately use the compiled MO files in game, because the PO file remains an "English to language X" translation. Only comments are changed, nothing else.


I have a Slovak translation "sk.po", and would like to use a Czech translation as help:

  multi_po.pl sk.po --add cs.po CS > sk_multi.po

In the file "sk_multi.po", the corresponding Czech translation from the "cs.po" file is added to every phrase, labeled as "CS". (My original comments from the "sk.po" file are there, too.)

Before sending to translation maintainers, I remove unnecessary comments:

  multi_po.pl sk_multi.po --remove > sk.po

The Czech translations are removed. (My original comments remained.)

It is possible to add more than one language to a PO file. To use a newer version of a translation, you must remove the translations and add them again. Your comments will remain (unless you use "{ ... }" as a substring in them).

If someone finds this helpful, please report to me any bugs found. (That is... be careful and backup your PO files. This is a program written today, not much tested yet.)


Style scripts following the typographic style guide

Here, I show some possibilities using a UNIX environment (Windows users: check "cygwin" or similar to get something like this). I like to use pipes ("|"), as you can easily chain those commands one after another as you need them for your purpose.

These scripts only work on po files re-saved with poEdit, which guarantees "single-line strings". The "single-line string" is important, as it is a simple way to separate the original string from the translated one. Otherwise you will introduce changes in the original strings as well, and subsequently patching your translation into the game might fail.

No double spaces

You should get rid of double (multiple) spaces in your translation strings:

  cat de.po | sed '/msgstr/s/ \+/ /g' > de.po1
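The effect is easy to try on a two-line sample before running it on a whole file. This is a sketch under two assumptions that differ from the command above: the address is anchored as /^msgstr/ so that only lines starting with msgstr are touched, and the pattern is written as two spaces followed by a star, because \+ is a GNU extension to basic regular expressions and other sed implementations may not accept it. The sample strings are made up.

```shell
# Hypothetical sample: both lines contain multiple spaces.
input='msgid "two  spaces"
msgstr "zwei   Leerzeichen"'

# Collapse runs of spaces to a single space, on msgstr lines only.
output=$(printf '%s\n' "$input" | sed '/^msgstr/s/  */ /g')
printf '%s\n' "$output"
```

The msgid line keeps its double space, which is the point of restricting the substitution to msgstr lines.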

Ellipsis instead of three dots

Use only if this applies to your language - English does not yet implement it.

  cat de.po | sed '/msgstr/s/\.\.\./…/g' > de.po1

Check all lines for "irregular leftovers" of multiple dots (as a line like "........" would leave)

  cat de.po1 | grep 'msgstr' | grep '…\.'

gives you a list of those.
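The two steps can be tried together on a small sample to see what such a leftover looks like. This is only a sketch with made-up strings; it anchors the address as /^msgstr/, which is an assumption and slightly stricter than the commands above.

```shell
# Hypothetical sample: eight dots in the first string, three in the second.
sample='msgstr "Wait........"
msgstr "Wait..."'

# Step 1: replace every group of three dots with an ellipsis character.
converted=$(printf '%s\n' "$sample" | sed '/^msgstr/s/\.\.\./…/g')

# Step 2: list msgstr lines where an ellipsis is still followed by a dot.
leftovers=$(printf '%s\n' "$converted" | grep '^msgstr' | grep '…\.')
printf '%s\n' "$leftovers"
```

Only the first string is reported, because two of its eight dots could not be grouped into an ellipsis.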


This is kind of tricky as the sophisticated use of different dashes explained in Typography_Style_Guide#Dashes might not be used in all translations.

The following command line introduces the "long" em-dash in all cases that involve spaces and minuses. This might be correct in most cases.

  cat de.po | sed '/msgstr/s/ - / — /g;/msgstr/s/ -/ —/g;/msgstr/s/- /— /g' > de.po1

Again, check the changed lines for occurrences of "real minuses" or "concatenated words"

  cat de.po1 | grep 'msgstr' | grep '—'

and change the "wrong dashes" back to "minus" using poEdit.
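A small sample shows why this review is needed. The sketch below runs the same three substitutions on a made-up German string that contains both a hyphenated word part ("Nord- und") and a real dash; the string is an assumption chosen for illustration.

```shell
# Hypothetical sample with a hyphenated word part and a real dash.
sample='msgstr "Nord- und Südküste - wirklich"'

# The same three substitutions as in the command line above.
converted=$(printf '%s\n' "$sample" \
  | sed '/msgstr/s/ - / — /g;/msgstr/s/ -/ —/g;/msgstr/s/- /— /g')
printf '%s\n' "$converted"
```

The dash between the two phrases is converted as intended, but the hyphen in "Nord-" is turned into an em-dash as well, which is exactly the kind of line that has to be corrected by hand.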


This command line replaces all occurrences of the keyboard apostrophe »'« with the curly one »’«. This might not be applicable to all languages, depending on how the apostrophe is used.

  cat de.po |  sed "/msgstr/s/'/’/g" > de.po1

Use of » or « for Quotes

If your language team decided on using » and « as quotation marks (instead of " or ' or similar) you may type >> and << in your po-file and the command line below changes it into the nice ones:

  cat de.po | sed '/msgstr/s/>>/»/g;/msgstr/s/<</«/g' > de.po1

Fusing commands

Use the following style:

  cat de.po | sedcommand1 | sedcommand2 | sedcommand3 > de.po1

e.g. changing apostrophes and quotes and ellipsis in one step and producing a new file called "de.po1":

  cat de.po |  sed "/msgstr/s/'/’/g" | sed '/msgstr/s/>>/»/g;/msgstr/s/<</«/g' | sed '/msgstr/s/\.\.\./…/g' > de.po1
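As an alternative to a chain of pipes, the same substitutions can be collected in a single sed call with several -e expressions. This is a minimal sketch, not a tested replacement: the function name style_po is made up for this example, the addresses are anchored as /^msgstr/ (an assumption, stricter than the commands above), and the sample strings are invented.

```shell
# Hypothetical helper: applies apostrophe, quote and ellipsis fixes
# to msgstr lines read from standard input.
style_po() {
  sed -e "/^msgstr/s/'/’/g" \
      -e '/^msgstr/s/>>/»/g' \
      -e '/^msgstr/s/<</«/g' \
      -e '/^msgstr/s/\.\.\./…/g'
}

# Hypothetical sample entry.
sample=$(cat <<'EOF'
msgid "It's >>fine<<..."
msgstr "So geht's >>gut<<..."
EOF
)

styled=$(printf '%s\n' "$sample" | style_po)
printf '%s\n' "$styled"
```

On a real file this would be used as: style_po < de.po > de.po1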

Reminder: use msgattrib (see above) to reformat the po file before using the "diff" command.


Disclaimer: These scripts are far from "flawless". Anyhow: try them, like them, use them; dislike them, improve them :).

And yes, I know that the programming (of the ones I wrote) is pathetic :). And yes, more knowledge of WML would definitely improve them. Or a higher frustration threshold... :). Or the accuracy a real programmer would have. Hence, improvements are very welcome: if anybody has suggestions, feel free to contact me or just post them here. Or even take them to a higher level: why not write a script that transforms the output into TeX? Per aspera ad astra :).