Difference between revisions of "TranslatorShellscript"

Latest revision as of 10:01, 2 April 2021

Feel free to post improvements. In which languages does it not work at all? What other kinds of shell scripts are needed?

1 First of all
2 poEdits Wrath
3 Trailing Spaces
4 Extracting a target language script
5 Extracting an English script
6 Extracting a script with original and translated strings
7 Extracting a list of characters
8 Translating without knowing English?
9 Style scripts following the typographic style guide
10 Comments

First of all

Before using the shell scripts, the po file needs to be opened and saved with poEdit. The reason is that the texts are then saved on separate lines without interfering \n sequences. As a result, a normal entry consists of 4 lines, where one line contains the character's text and an earlier line the character's name. Without saving the file with poEdit, the number of text lines would be undefined.

Alternatively, use msgcat (part of GNU gettext) and save the file with the --no-wrap switch:

 msgcat --no-wrap file1.po -o file2.po
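To check whether a re-saved file really has one line per string, one option is to list the remaining continuation lines, i.e. lines that begin with a double quote. The following is only a sketch: the helper name list_continuation_lines and the sample file are made up for illustration. In a real po file the header entry normally spans several lines, so expect those lines in the output as well.

```shell
# List continuation lines (lines starting with a double quote) with line numbers.
# list_continuation_lines is a hypothetical helper name, not part of gettext.
list_continuation_lines() {
    grep -n '^"' "$1"
}

# Made-up sample: the second entry is still wrapped over two lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
msgid "Hello"
msgstr "Hallo"

msgid "A long sentence"
msgstr ""
"Ein langer Satz"
EOF

list_continuation_lines "$sample"
```

An empty result (apart from the header lines of a real file) suggests that the scripts on this page can rely on single-line strings.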

poEdits Wrath

Anyhow, the changes introduced by poEdit should be undone before sending the new po file to your language maintainer. In particular, they must be undone before you "diff" the new version against the old one. This may be done by executing the following:

  msgcat file.po -o file.po

Now, if you run

  diff -u SVN.file.po file.po > file.po.diff

your "diff" will only contain your changes and will look much nicer. Your language maintainer will like you even more :). (See GettextForTranslators#FAQ.)

Trailing Spaces

Spaces at the end of a translated "string" should be avoided. This can be checked using

sed -n '/ "$/{N;s/\n$//p}' de.po

in a shell. (Checks whether there is a trailing space and the next line is empty.)
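To get a feeling for what the check reports, here is a small sketch that runs the same sed command on a made-up sample file (the file content is invented for illustration, and GNU sed is assumed). Only the string that ends in a space and is followed by an empty line gets printed.

```shell
# Made-up sample: only the first translation ends with a space.
po=$(mktemp)
cat > "$po" <<'EOF'
msgid "Hello"
msgstr "Hallo "

msgid "Goodbye"
msgstr "Auf Wiedersehen"

EOF

# Same check as above: print lines that end in a space before the closing
# quote and are followed by an empty line.
trailing=$(sed -n '/ "$/{N;s/\n$//p}' "$po")
echo "$trailing"
```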

Extracting a target language script

For proofreading a campaign, it may be much easier to print it out in a more script-like format (<charactername>: <text>, target language only), e.g. to read it on a bus, on a train or in a car. Thanks to the programmer(s) who included the "speaker" in the comments of the po files (Ivanovic?)! Those are definitely helpful and make this approach possible.

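 cat ./de.po | grep -v "msgid \"" | grep -v "#: data" | sed "s/\[message\]: speaker//g" | sed "s/#. =//g" | sed "s/msgstr/:/g" |grep -v ": \"\"" | sed ':;s/\n:/:/;N;T' | sed ':;s/\n\n/\n/;N;T'| sed "s/#. \[scenario\]/\n#. \[scenario\]/g"   > target-language-script.txt

An even nicer extraction method, which does not require saving via poEdit, is the following: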

 sed -n 's/#\. \(\[scenario\]: id=.\+\)$/\n\n\1/p;s/#\. \(\[part\]\)$/\n\1/p;t msgid;s/#\. \[message\]: speaker=\(.\+\)$/\1/;t speaker;b;:speaker;H;n;s/#\. \[message\]: speaker=\(.\+\)$/, \1/;t speaker;x;s/$/:\t/;s/\n//g;x;t msgid;:msgid;s/msgid \(".*"\)/\1/;t nextmsgid;n;b msgid;:nextmsgid;H;n;/^".*"$/b nextmsgid;x;s/"\n"//g;s/\n//;;s/".*"//;x;s/msgstr \(".*"\)/\1/;:nextmsgstr;H;n;/^".*"$/b nextmsgstr;x;s/"\n"//g;s/\n//;s/$//;p' ./de.po > target-language-script.txt

(Derived from the sed command below; thanks to Soliton.)

I load this text file into an editor to print it in two columns with the name of the "speaker" indented (like in a script book).

Extracting an English script

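For proofreading a campaign, it may be much easier to print it out in a more script-like format (<charactername>: <text>, English only), e.g. to read it on a bus. This one is even more preliminary than the one before. It still contains some target language parts.

 cat de.po | grep -v "msgstr \"" | grep -v "#: data" | sed "s/\[message\]: speaker//g" | sed "s/#. =//g" | sed "s/msgid/:/g" |grep -v ": \"\"" | sed ':;s/\n:/:/;N;T' | sed ':;s/\n\n/\n/;N;T'| sed "s/#. \[scenario\]/\n#. \[scenario\]/g"   > english.txt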

Extracting a script with original and translated strings

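Similar to the above: a sed script to extract dialog and story parts in the form:

[scenario]: id=<id>
<original title>
<translated title>

[part]
<original story part>
<translated story part>

<speaker>: <original dialog>
<speaker>: <translated dialog>

One-line version: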

sed -n 's/#\. \(\[scenario\]: id=.\+\)$/\n\n\1/p;s/#\. \(\[part\]\)$/\n\1/p;t msgid;s/#\. \[message\]: speaker=\(.\+\)$/\1/;t speaker;b;:speaker;H;n;s/#\. \[message\]: speaker=\(.\+\)$/, \1/;t speaker;x;s/$/:\t/;s/\n//g;x;t msgid;:msgid;s/msgid \(".*"\)/\1/;t nextmsgid;n;b msgid;:nextmsgid;H;n;/^".*"$/b nextmsgid;x;s/"\n"//g;s/\n//;p;s/".*"//;x;s/msgstr \(".*"\)/\1/;:nextmsgstr;H;n;/^".*"$/b nextmsgstr;x;s/"\n"//g;s/\n//;s/$/\n/;p' de.po

It works on unmodified po files as well (no re-saving in poEdit necessary).
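For anyone who wants to follow or adapt the one-liner, the same commands can be written one per line in a sed script file and run with "sed -n -f". The sketch below does that and tries it on two made-up entries. The file names, the sample entries and the comments are my reading of the script and not part of the original; GNU sed is assumed (for \n and \t in replacements and for \+).

```shell
# Working directory and file names are made up for this sketch.
tmp=$(mktemp -d)

cat > "$tmp/extract.sed" <<'EOF'
# Scenario and [part] markers: print them, then go and look for the msgid.
s/#\. \(\[scenario\]: id=.\+\)$/\n\n\1/p
s/#\. \(\[part\]\)$/\n\1/p
t msgid
# Speaker comment: keep only the name; any other line is skipped.
s/#\. \[message\]: speaker=\(.\+\)$/\1/
t speaker
b
# Collect one or more speakers in the hold space, separated by ", ".
:speaker
H
n
s/#\. \[message\]: speaker=\(.\+\)$/, \1/
t speaker
# Turn the collected names into a "name:<TAB>" prefix kept in the hold space.
x
s/$/:\t/
s/\n//g
x
t msgid
# Skip ahead to the msgid line.
:msgid
s/msgid \(".*"\)/\1/
t nextmsgid
n
b msgid
# Append the msgid and its continuation lines, then join and print them.
:nextmsgid
H
n
/^".*"$/b nextmsgid
x
s/"\n"//g
s/\n//
p
# Drop the original text but keep the prefix, then do the same for msgstr.
s/".*"//
x
s/msgstr \(".*"\)/\1/
:nextmsgstr
H
n
/^".*"$/b nextmsgstr
x
s/"\n"//g
s/\n//
s/$/\n/
p
EOF

# Two made-up entries; note the empty line after the last entry.
cat > "$tmp/de.po" <<'EOF'
#. [scenario]: id=01_Intro
#: data/scenarios/01_Intro.cfg:5
msgid "The Beginning"
msgstr "Der Anfang"

#. [message]: speaker=Konrad
#: data/scenarios/01_Intro.cfg:12
msgid "Hello"
msgstr "Hallo"

EOF

sed -n -f "$tmp/extract.sed" "$tmp/de.po" > "$tmp/script.txt"
cat "$tmp/script.txt"
```

The script reads one line past each msgstr before it prints the translation, so an entry at the very end of a file is only printed if at least one more line (for example an empty one) follows it.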

Extracting a list of characters

Make sure that you are in the right directory: the one containing the *.cfg files for each campaign scenario.

cd <wesnoth-installpath>/data/campaigns/Northern_Rebirth/scenarios

sed -n '/\[unit\]/,/\[\/unit\]/ba;/\[side\]/,/\[\/side\]/ba;b;:a s/^ *id *= *"\?\([^"]*\)"\?/\1/p' *.cfg | sort | uniq -c | sort -nr > ~/NRcharacterlist.txt

(Looks for id= keys in [side] and [unit] tags, extracts the values, then sorts them by number of occurrences.)

You can leave off the "-c | sort -nr" part if you don't care about the number of occurrences.
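The sketch below tries the same idea on a made-up miniature scenario directory. The sed program is split over several -e options purely for readability; this is my own rewrite, intended to behave like the one-liner above. The file names and ids are invented, and GNU sed is assumed (for "\?").

```shell
# Made-up miniature scenario directory for illustration.
dir=$(mktemp -d)
cat > "$dir/01_first.cfg" <<'EOF'
[side]
    side=1
    id=Tallin
    [unit]
        id="Camerin"
    [/unit]
[/side]
EOF
cat > "$dir/02_second.cfg" <<'EOF'
[unit]
    id=Tallin
[/unit]
EOF

# Inside [unit] or [side] blocks jump to label a, skip everything else,
# and at label a print the value of any id= key (quotes removed).
(
    cd "$dir" &&
    sed -n \
        -e '/\[unit\]/,/\[\/unit\]/ba' \
        -e '/\[side\]/,/\[\/side\]/ba' \
        -e 'b' \
        -e ':a' \
        -e 's/^ *id *= *"\?\([^"]*\)"\?/\1/p' \
        *.cfg | sort | uniq -c | sort -nr > characterlist.txt
)
cat "$dir/characterlist.txt"
```

Note that the pattern only allows spaces in front of "id", so lines indented with tabs would not be picked up.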

Translating without knowing English?

(This was copied from the Wesnoth translation forum: a thread and a brilliant idea from User:Viliam, http://www.wesnoth.org/forum/viewtopic.php?f=7&t=8817)

Have you found someone willing to translate Wesnoth into a language X (or willing to help you translate into your language), but then found that the person does not speak English? This is no longer a problem!

My solution is a Perl script that enriches your PO file with translations into other language(s), inserted as comments on the English phrases. When you work with the "poEdit" application and click on a phrase, you will see the translations in the other language(s) in the comment window; so you can translate the phrase without a good knowledge of English, or you can take advantage of a translation into a language similar to your own.

You can immediately use the compiled MO files in game, because the PO file remains an "English to language X" translation. Only comments are changed, nothing else.

Examples:

I have a Slovak translation "sk.po", and would like to use a Czech translation as help:

  multi_po.pl sk.po --add cs.po CS > sk_multi.po

In the file "sk_multi.po", the corresponding Czech translation from the "cs.po" file is added to every phrase, labeled as "CS". (My original comments from the "sk.po" file are there, too.)

Before sending to translation maintainers, I remove unnecessary comments:

  multi_po.pl sk_multi.po --remove > sk.po

The Czech translations are removed. (My original comments remain.)

It is possible to add more than one language to a PO file. To use a newer version of a translation, you must remove the translations and add them again. Your comments will remain (unless you use "{ ... }" as a substring in them).

If someone finds this helpful, please report any bugs you find to me. (That is... be careful and back up your PO files. This is a program written today, not much tested yet.)

http://www.wesnoth.org/forum/download/file.php?id=5872

Style scripts following the typographic style guide

Here, I show possibilities using a UNIX environment (Windows users: check "cygwin" or similar to get something like this). I like to use pipes ("|"), as you can easily chain those commands one after another as you need them for your purpose.

Those scripts only work on po files re-saved with poEdit, which guarantees "single-line strings". The "single-line string" is important, as it is a simple way to separate the original from the translated string. Otherwise you will introduce changes in the original strings as well, and subsequently patching your translation into the game might fail.

No double spaces

You should get rid of double (multiple) spaces in your translation strings:

  cat de.po | sed '/msgstr/s/ \+/ /g' > de.po1
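The sketch below illustrates why the "single-line strings" matter, using a made-up sample (the entries are invented; GNU sed is assumed for "\+"). The /msgstr/ address restricts the substitution to lines containing "msgstr", so the original string stays untouched, but a translation that is still wrapped onto a continuation line is missed as well.

```shell
# Made-up sample: double spaces in the original, in a single-line translation,
# and in a translation that is still wrapped onto a continuation line.
po=$(mktemp)
cat > "$po" <<'EOF'
msgid "Wait  here"
msgstr "Warte  hier"

msgid "Go  now"
msgstr ""
"Geh  jetzt"
EOF

# The command from above, reading the sample instead of de.po.
fixed=$(cat "$po" | sed '/msgstr/s/ \+/ /g')
echo "$fixed"
```

In the output only the line msgstr "Warte  hier" loses its double space; the msgid lines and the continuation line "Geh  jetzt" are left as they were.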

Ellipsis instead of three dots

Use only if this applies to your language - English does not yet implement it.

  cat de.po | sed '/msgstr/s/\.\.\./…/g' > de.po1

Then check all lines for "irregular leftovers" of multiple dots (such as a string like "........" would leave):

  cat de.po1 | grep 'msgstr' | grep '…\.'

This gives you a list of those.

Dashes

This is kind of tricky as the sophisticated use of different dashes explained in Typography_Style_Guide#Dashes might not be used in all translations.

The following command line introduces the "long" em-dash in all cases that "involve spaces and minuses". This might be correct in most cases.

  cat de.po | sed '/msgstr/s/ - / — /g;/msgstr/s/ -/ —/g;/msgstr/s/- /— /g' > de.po1

Again, check the changed lines for occurrences of "real minuses" or "concatenated words"

  cat de.po1 | grep 'msgstr' | grep '—'

and revert the "wrong dashes" back to "minus" using poEdit.

Apostrophes

This command line exchanges all occurrences of the keyboard »'« with the curly one »’«. This might not be applicable to all languages, depending on how the apostrophe is used.

  cat de.po |  sed "/msgstr/s/'/’/g" > de.po1

Use of » or « for Quotes

If your language team has decided to use » and « as quotation marks (instead of " or ' or similar), you may type >> and << in your po file, and the command line below changes them into the nice ones:

  cat de.po | sed '/msgstr/s/>>/»/g;/msgstr/s/<</«/g' > de.po1

Fusing commands

Use the following style:

  cat de.po | sedcommand1 | sedcommand2 | sedcommand3 > de.po1

e.g. changing apostrophes and quotes and ellipsis in one step and producing a new file called "de.po1":

  cat de.po |  sed "/msgstr/s/'/’/g" | sed '/msgstr/s/>>/»/g;/msgstr/s/<</«/g' | sed '/msgstr/s/\.\.\./…/g' > de.po1
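If you prefer a single sed call over several piped ones, the same substitutions can also be passed as separate -e expressions. This is just a sketch of an alternative: the function name style_po and the sample entry are made up for illustration.

```shell
# Apply the apostrophe, quote and ellipsis replacements in one sed call.
# style_po is a hypothetical helper name; it prints the result to stdout.
style_po() {
    sed -e "/msgstr/s/'/’/g" \
        -e '/msgstr/s/>>/»/g' \
        -e '/msgstr/s/<</«/g' \
        -e '/msgstr/s/\.\.\./…/g' "$1"
}

# Made-up single-line entry to try it on.
po=$(mktemp)
cat > "$po" <<'EOF'
msgid ">>Don't...<<"
msgstr ">>Don't...<<"
EOF

styled=$(style_po "$po")
echo "$styled"
```

Used like the pipe version, this would be: style_po de.po > de.po1. The expressions are applied in the order given, just like the piped sed commands.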

Reminder: use msgcat (see above) to reformat the po file before using the "diff" command.

Comments

Disclaimer: These scripts are far from "flawless". Anyhow, try them, like them, use them; dislike them, improve them :).

And, yes, I know, the programming (of the ones I did) is pathetic :). And, yes, more knowledge about WML would definitely improve it. Or a higher frustration threshold... :). Or the accuracy a real programmer would have. Hence, improvements might be a very good idea. If anybody has suggestions, feel free to contact me or just post them here. Or even realize them on a higher level: why not write a script to transform it into TeX? Per aspera ad astra :).

@@ Line 1: / Line 1: @@
 Feel free to post improvements as well. In which languages it does not work at all? What kind of shell script are needed as well?
-Thanks to the programmer(s) that included the "speaker" in the comments of the po-files (Ivanovic?)! Those are definitely helpful and make this approach possible.
 == First of all ==
 Prior to use the shell scripts, the po-File needs to be opened and saved with poEdit. The reason is, that then the texts are saved in separated lines without interfering  /n's. As result, a normal entry consists of 4 lines, where one line contains the "characters text" and a prior line the "characters name". Without saving the file with poEdit, the number of text-lines would be undefined.
+Or using msgcat (part of GNU gettext), save the file with the ''--no-wrap'' switch,
+  msgcat --no-wrap file1.po -o file2.po
 == poEdits Wrath ==
 Anyhow, those changes introduced by poEdit should be undone prior sending the new po-File to your language maintainer. Especially, it must be undone prior to "diff" the new version against the old one. This may be done by executing the following:
-    msgattrib file.po > file.po1
+    msgcat file.po -o file.po
-   mv file.po1 file.po
 Now, if you
@@ Line 20: / Line 20: @@
 At the end of each translated "string", spaces should be avoided. Checking might be done using
-::cat de.po | grep '\. \"'
+::sed -n '/ "$/{N;s/\n$//p}' de.po
-in a shell.
+in a shell. (Checks whether there is a trailing space and the next line is empty.)
 == Extracting a target language script ==
-For proof-reading of a campaign it might be much easier to print it out in a more "scriptlike" format (<charactername>: <text>) (target language only), e.g. to read it in a bus, a train or in a car.
+For proof-reading of a campaign it might be much easier to print it out in a more "scriptlike" format (<charactername>: <text>) (target language only), e.g. to read it in a bus, a train or in a car. Thanks to the programmer(s) that included the "speaker" in the comments of the po-files (Ivanovic?)! Those are definitely helpful and make this approach possible.
+::>cat ./de.po | grep -v "msgid \"" | grep -v "#: data" | sed "s/\[message\]: speaker//g" | sed "s/#. =//g" | sed "s/msgstr/:/g" |grep -v ": \"\"" | sed ':;s/\n:/:/;N;T' | sed ':;s/\n\n/\n/;N;T'| sed "s/#. \[scenario\]/\n#. \[scenario\]/g"   > target-language-script.txt
+An even nicer extraction method without saving via poedit is the following :
+::> sed -n 's/#\. \(\[scenario\]: id=.\+\)$/\n\n\1/p;s/#\. \(\[part\]\)$/\n\1/p;t msgid;s/#\. \[message\]: speaker=\(.\+\)$/\1/;t speaker;b;:speaker;H;n;s/#\. \[message\]: speaker=\(.\+\)$/, \1/;t speaker;x;s/$/:\t/;s/\n//g;x;t msgid;:msgid;s/msgid \(".*"\)/\1/;t nextmsgid;n;b msgid;:nextmsgid;H;n;/^".*"$/b nextmsgid;x;s/"\n"//g;s/\n//;;s/".*"//;x;s/msgstr \(".*"\)/\1/;:nextmsgstr;H;n;/^".*"$/b nextmsgstr;x;s/"\n"//g;s/\n//;s/$//;p' ./de.po > target-language-script.txt
-I load this textfile into a editor to print it in two columns with the name of the "speaker" intended (like in script book).
+(derived from the SED-command below- thanks to Soliton)
-::>cat ./de.po | grep -v "msgid \"" | grep -v "#: data" | sed "s/\[message\]: speaker//g" | sed "s/#. =//g" | sed "s/msgstr/:/g" |grep -v ": \"\"" | sed ':;s/\n:/:/;N;T' | sed ':;s/\n\n/\n/;N;T'| sed "s/#. \[scenario\]/\n#. \[scenario\]/g"   > target-language-script.txt
+I load this text file into a editor to print it in two columns with the name of the "speaker" intended (like in script book).
 == Extracting an English script ==
@@ Line 38: / Line 43: @@
 ::>cat de.po | grep -v "msgstr \"" | grep -v "#: data" | sed "s/\[message\]: speaker//g" | sed "s/#. =//g" | sed "s/msgid/:/g" |grep -v ": \"\"" | sed ':;s/\n:/:/;N;T' | sed ':;s/\n\n/\n/;N;T'| sed "s/#. \[scenario\]/\n#. \[scenario\]/g"   > english.txt
-== Extracting a list of characters ==
+== Extracting a script with original and translated strings ==
+Similar as above. A sed script to extract dialog and story parts in the form:
+:[scenario]: id=<id>
+:<original title>
+:<translated title>
+:[part]
+:<original story part>
+:<translated story part>
-=== using the command line ===
+:<speaker>: <original dialog>
-Be aware, that you should be in the right directory. It contains *.cfg files for each campaign-scenaio.
+:<speaker>: <translated dialog>
-::> cd <wesnoth-installpath>/data/campaigns/Northern_Rebirth/scenarios
+::one-line version:
-::>cat *.cfg | grep "id=" | grep -v "_" | sed 's/^[ \t]*//;s/[ \t]*$//' | sort |uniq -c | sort -nr > ~/NRcharacterlist.txt
+::sed -n 's/#\. \(\[scenario\]: id=.\+\)$/\n\n\1/p;s/#\. \(\[part\]\)$/\n\1/p;t msgid;s/#\. \[message\]: speaker=\(.\+\)$/\1/;t speaker;b;:speaker;H;n;s/#\. \[message\]: speaker=\(.\+\)$/, \1/;t speaker;x;s/$/:\t/;s/\n//g;x;t msgid;:msgid;s/msgid \(".*"\)/\1/;t nextmsgid;n;b msgid;:nextmsgid;H;n;/^".*"$/b nextmsgid;x;s/"\n"//g;s/\n//;p;s/".*"//;x;s/msgstr \(".*"\)/\1/;:nextmsgstr;H;n;/^".*"$/b nextmsgstr;x;s/"\n"//g;s/\n//;s/$/\n/;p' de.po
-=== using a shell script ===
+It works on unmodified po files as well. (no re-saving in poedit necessary)
-save the following as script, e.g. named "extChar.sh". Use it within the correct directory (see above) with the target file as option. Use "chmod" to make it executable.
+== Extracting a list of characters ==
--------
-::''#bash
-::''# extract characters/personnames from wesnoth cfg files
+Be aware, that you should be in the right directory. It contains *.cfg files for each campaign-scenario.
-::cat *.cfg | grep "description=" | grep -v "_" | sed 's/^[ \t]*//;s/[ \t]*$//' | sort |uniq -c | sort -nr > ~/Desktop/$1
--------
-step 2 - if you want to have the characters list only
+::cd <wesnoth-installpath>/data/campaigns/Northern_Rebirth/scenarios
+::sed -n '/\[unit\]/,/\[\/unit\]/ba;/\[side\]/,/\[\/side\]/ba;b;:a s/^ *id *= *"\?\([^"]*\)"\?/\1/p' *.cfg | sort | uniq -c | sort -nr > ~/NRcharacterlist.txt
+(Looks for id= keys in [side] and [unit] tags and extracts the value then sorts them by number of occurrences.)
-::>cat NRcharacterlist.txt | cut -d '=' -f 2
+You can leave off the "-c | sort -nr" part if you don't care about the number of occurrences.
 == Translating without knowing English? ==
@@ Line 75: / Line 88: @@
 I have a Slovak translation "sk.po", and would like to use a Czech translation as help:
-multi_po.pl sk.po --add cs.po CS > sk_multi.po
+   multi_po.pl sk.po --add cs.po CS > sk_multi.po
 In file "sk_multi.po" there is for every phrase added corresponding Czech translation from "cs.po" file, labeled as "CS". (My original comments in "sk.po" file are there, too.)
@@ Line 81: / Line 94: @@
 Before sending to translation maintainers, I remove unnecessary comments:
-multi_po.pl sk_multi.po --remove > sk.po
+   multi_po.pl sk_multi.po --remove > sk.po
 The Czech translations are removed. (My original comments remained.)
@@ Line 92: / Line 105: @@
 http://www.wesnoth.org/forum/download/file.php?id=5872
+== Style scripts following the typographic style guide ==
+Here, I show possibilities using an UNIX-environment (Windows-user - check "cygwin" or similar for getting something like this). I like to use pipes ("|"), as you may easily put those commandos one after another as you need it for your purpose.
+Those scripts only work on po-files re-saved with po-Edit, thus guaranteeing "single-line strings". The "single-line string" is important, as it is a simple way to separate the original from the translated string. Otherwise you will introduce changes in the original strings as well and subsequent patching your translation to the game might fail.
+=== No double spaces ===
+You should get rid of double (multiple) spaces in your translation strings:
+   cat de.po | sed '/msgstr/s/ \+/ /g' > de.po1
+=== Ellipsis instead of three dots ===
+Use only if this applies to your language - English does not yet implement it.
+   cat de.po | sed '/msgstr/s/\.\.\./…/g' > de.po1
+Check all lines for "irregular leftovers" of multiple dots (as a line like "........" would leave)
+   cat de.po1 | grep 'msgstr' | grep '…\.'
+gives you a list of those.
+=== Dashes ===
+This is kind of tricky as the sophisticated use of different dashes explained in [[Typography_Style_Guide#Dashes]] might not be used in all translations.
+The following command line introduces the "long" em-dash in all cases that "involves spaces and minuses". This might be correct in most cases.
+   cat de.po | sed '/msgstr/s/ - / — /g;/msgstr/s/ -/ —/g;/msgstr/s/- /— /g' > de.po1
+Again, check the changed lines on occurrences of "real minuses" or "concatenated words"
+   cat de.po1 | grep 'msgstr' | grep '—'
+and reverse the "wrong dashes" back to "minus" using po-Edit.
+=== Apostrophes ===
+This command line exchanges all occurrences of the keyboard »'« with the curly one »’«. This might not be applicable for all languages depending on its use.
+   cat de.po |  sed "/msgstr/s/'/’/g" > de.po1
+=== Use of » or « for Quotes ===
+If your language team decided on using » and « as quotation marks (instead of " or ' or similar) you may type >> and << in your po-file and the command line below changes it into the nice ones:
+   cat de.po | sed '/msgstr/s/>>/»/g;/msgstr/s/<</«/g' > de.po1
+=== Fusing commands ===
+Use the following style:
+   cat de.po | sedcommand1 | sedcommand2 | sedcommand3 > de.po1
+e.g. changing apostrophes and quotes and ellipsis in one step and producing a new file called "de.po1":
+   cat de.po |  sed "/msgstr/s/'/’/g" | sed '/msgstr/s/>>/»/g;/msgstr/s/<</«/g' | sed '/msgstr/s/\.\.\./…/g' > de.po1
+Reminder: use msgattrib (see above) for reformatting the po-file prior using the "diff command".
 == Comments ==
@@ Line 102: / Line 158: @@
 [[Category:Translations]]
+[[Category:Tools]]