Difference between revisions of "TranslatorShellscript"
Latest revision as of 10:01, 2 April 2021
Feel free to post improvements as well. In which languages do these scripts not work at all? What other kinds of shell scripts are needed?
Contents
- 1 First of all
- 2 poEdits Wrath
- 3 Trailing Spaces
- 4 Extracting a target language script
- 5 Extracting an English script
- 6 Extracting a script with original and translated strings
- 7 Extracting a list of characters
- 8 Translating without knowing English?
- 9 Style scripts following the typographic style guide
- 10 Comments
First of all
Before using the shell scripts, open the po file in poEdit and save it again. poEdit then writes each text on a single line, without interfering line breaks. As a result, a normal entry consists of 4 lines, where one line contains the character's text and an earlier line the character's name. Without saving the file with poEdit, the number of text lines per entry would be undefined.
Alternatively, use msgcat (part of GNU gettext) and save the file with the --no-wrap switch:
msgcat --no-wrap file1.po -o file2.po
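To see whether a file still contains wrapped strings, you could count the continuation lines outside the header. This is only a sketch: the function name count_wrapped_lines is made up for illustration, and it assumes that the header is the first entry and ends at the first blank line.

```shell
# Count continuation lines (lines starting with a double quote) after the header.
# Assumption: the po header is the first entry and ends at the first blank line.
count_wrapped_lines() {
  awk '
    header_done && /^"/ { n++ }            # a wrapped piece of a msgid/msgstr
    /^$/                { header_done = 1 } # first blank line closes the header
    END                 { print n + 0 }     # print 0 if nothing was counted
  ' "$1"
}
```

A result of 0 suggests that every string sits on one line. Strings with embedded line breaks may still be split even after --no-wrap, so a small non-zero count is not necessarily a problem.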
poEdits Wrath
The changes introduced by poEdit should be undone before you send the new po file to your language maintainer. In particular, they must be undone before you diff the new version against the old one. This may be done by executing the following:
msgcat file.po -o file.po
Now, if you run
diff -u SVN.file.po file.po > file.po.diff
your diff will contain only your changes and will look much nicer. Your language maintainer will like you even more :). (see GettextForTranslators#FAQ)
Trailing Spaces
Spaces at the end of a translated string should be avoided. You can check for them with
sed -n '/ "$/{N;s/\n$//p}' de.po
in a shell. (This checks whether a line ends with a trailing space and the next line is empty.)
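If you also want the line numbers of the offending strings, the same check can be sketched in awk. The function name find_trailing_spaces is hypothetical, and, like the sed command, it assumes single-line strings.

```shell
# Report strings that end with a space right before the closing quote
# and are followed by an empty line (or by the end of the file).
find_trailing_spaces() {
  awk '
    prev ~ / "$/ && $0 == "" { print NR - 1 ": " prev }  # previous line was the culprit
    { prev = $0 }                                        # remember the current line
    END { if (prev ~ / "$/) print NR ": " prev }         # last line of the file
  ' "$1"
}
```

Called as find_trailing_spaces de.po, it prints one "line number: text" pair per hit, which makes the strings easy to find in an editor.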
Extracting a target language script
For proofreading a campaign, it can be much easier to print it out in a more script-like format (<charactername>: <text>, target language only), e.g. to read it on a bus, on a train or in a car. Thanks to the programmer(s) who included the "speaker" in the comments of the po files (Ivanovic?)! Those comments are definitely helpful and make this approach possible.
cat ./de.po | grep -v "msgid \"" | grep -v "#: data" | sed "s/\[message\]: speaker//g" | sed "s/#. =//g" | sed "s/msgstr/:/g" |grep -v ": \"\"" | sed ':;s/\n:/:/;N;T' | sed ':;s/\n\n/\n/;N;T'| sed "s/#. \[scenario\]/\n#. \[scenario\]/g" > target-language-script.txt
An even nicer extraction method, which works without saving via poEdit first, is the following:
sed -n 's/#\. \(\[scenario\]: id=.\+\)$/\n\n\1/p;s/#\. \(\[part\]\)$/\n\1/p;t msgid;s/#\. \[message\]: speaker=\(.\+\)$/\1/;t speaker;b;:speaker;H;n;s/#\. \[message\]: speaker=\(.\+\)$/, \1/;t speaker;x;s/$/:\t/;s/\n//g;x;t msgid;:msgid;s/msgid \(".*"\)/\1/;t nextmsgid;n;b msgid;:nextmsgid;H;n;/^".*"$/b nextmsgid;x;s/"\n"//g;s/\n//;;s/".*"//;x;s/msgstr \(".*"\)/\1/;:nextmsgstr;H;n;/^".*"$/b nextmsgstr;x;s/"\n"//g;s/\n//;s/$//;p' ./de.po > target-language-script.txt
(Derived from the sed command below; thanks to Soliton.)
I load this text file into an editor and print it in two columns with the name of the speaker indented (as in a script book).
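For readers who find the sed one-liners hard to follow, the basic idea (remember the speaker from the comment, then print it in front of the translated text) can be sketched in awk. This is not the method used above, only an illustration. The function name extract_script is made up, and the sketch assumes single-line strings (re-saved with poEdit or msgcat --no-wrap) and exactly one "#. [message]: speaker=" comment per entry.

```shell
# Print "<speaker>: <translated text>" for every entry that has a speaker comment.
# Assumption: each msgstr is on one line and starts with: msgstr "
extract_script() {
  awk '
    /^#\. \[message\]: speaker=/ {
      sub(/^#\. \[message\]: speaker=/, "")   # keep only the name
      speaker = $0
      next
    }
    /^msgstr "/ {
      text = $0
      sub(/^msgstr "/, "", text)              # strip the keyword and opening quote
      sub(/"$/, "", text)                     # strip the closing quote
      if (speaker != "" && text != "") print speaker ": " text
      speaker = ""                            # do not carry the name to the next entry
    }
  ' "$1"
}
```

Unlike the commands above, this sketch skips scenario titles and story parts and does not merge several speakers of one entry, so it is a starting point rather than a replacement.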
Extracting an English script
For proofreading a campaign, it can be much easier to print it out in a more script-like format (<charactername>: <text>, English only), e.g. to read it on a bus. This one is even more preliminary than the one before: its output still contains some target-language parts.
cat de.po | grep -v "msgstr \"" | grep -v "#: data" | sed "s/\[message\]: speaker//g" | sed "s/#. =//g" | sed "s/msgid/:/g" |grep -v ": \"\"" | sed ':;s/\n:/:/;N;T' | sed ':;s/\n\n/\n/;N;T'| sed "s/#. \[scenario\]/\n#. \[scenario\]/g" > english.txt
Extracting a script with original and translated strings
Similar to the above: a sed script that extracts dialog and story parts in the form:
- [scenario]: id=<id>
- <original title>
- <translated title>
- [part]
- <original story part>
- <translated story part>
- <speaker>: <original dialog>
- <speaker>: <translated dialog>
One-line version:
sed -n 's/#\. \(\[scenario\]: id=.\+\)$/\n\n\1/p;s/#\. \(\[part\]\)$/\n\1/p;t msgid;s/#\. \[message\]: speaker=\(.\+\)$/\1/;t speaker;b;:speaker;H;n;s/#\. \[message\]: speaker=\(.\+\)$/, \1/;t speaker;x;s/$/:\t/;s/\n//g;x;t msgid;:msgid;s/msgid \(".*"\)/\1/;t nextmsgid;n;b msgid;:nextmsgid;H;n;/^".*"$/b nextmsgid;x;s/"\n"//g;s/\n//;p;s/".*"//;x;s/msgstr \(".*"\)/\1/;:nextmsgstr;H;n;/^".*"$/b nextmsgstr;x;s/"\n"//g;s/\n//;s/$/\n/;p' de.po
It works on unmodified po files as well (no re-saving in poEdit necessary).
Extracting a list of characters
Make sure you are in the right directory: the one that contains the *.cfg files for each scenario of the campaign.
cd <wesnoth-installpath>/data/campaigns/Northern_Rebirth/scenarios
sed -n '/\[unit\]/,/\[\/unit\]/ba;/\[side\]/,/\[\/side\]/ba;b;:a s/^ *id *= *"\?\([^"]*\)"\?/\1/p' *.cfg | sort | uniq -c | sort -nr > ~/NRcharacterlist.txt
(Looks for id= keys in [side] and [unit] tags, extracts the values, and sorts them by number of occurrences.)
You can leave off the "-c | sort -nr" part if you don't care about the number of occurrences.
Translating without knowing English?
(This was copied from a thread on the Wesnoth translation forum, a brilliant idea from User:Viliam: http://www.wesnoth.org/forum/viewtopic.php?f=7&t=8817)
Have you found someone willing to translate Wesnoth into a language X (or willing to help you translate into your language), only to find that the person does not speak English? This is no longer a problem!
My solution is a Perl script that enriches your PO file with translations into other language(s), inserted as comments on the English phrases. When you work with the poEdit application and click on a phrase, you will see the translations into the other language(s) in the comment window. So you can translate the phrase without a good knowledge of English, or you can take advantage of a translation into a language similar to your own.
You can immediately use the compiled MO files in game, because the PO file remains an "English to language X" translation. Only comments are changed, nothing else.
Examples:
I have a Slovak translation "sk.po", and would like to use a Czech translation as help:
multi_po.pl sk.po --add cs.po CS > sk_multi.po
In the file "sk_multi.po", the corresponding Czech translation from the "cs.po" file is added to every phrase, labeled as "CS". (My original comments from the "sk.po" file are there, too.)
Before sending to translation maintainers, I remove unnecessary comments:
multi_po.pl sk_multi.po --remove > sk.po
The Czech translations are removed. (My original comments remain.)
It is possible to add more than one language to a PO file. To use a newer version of a translation, you must remove the translations and add them again. Your comments will remain (unless you use "{ ... }" as a substring in them).
If someone finds this helpful, please report any bugs you find to me. (That is... be careful and back up your PO files. This is a program written today, not much tested yet.)
http://www.wesnoth.org/forum/download/file.php?id=5872
Style scripts following the typographic style guide
Here I show some possibilities using a UNIX environment (Windows users: check "cygwin" or similar to get something like this). I like to use pipes ("|"), as you can easily chain those commands one after another as you need them for your purpose.
These scripts only work on po files re-saved with poEdit, which guarantees "single-line strings". The single-line string is important because it is a simple way to separate the original from the translated string. Otherwise you will introduce changes in the original strings as well, and subsequently patching your translation into the game might fail.
No double spaces
You should get rid of double (multiple) spaces in your translation strings:
cat de.po | sed '/msgstr/s/ \+/ /g' > de.po1
Ellipsis instead of three dots
Use only if this applies to your language; English does not yet implement it.
cat de.po | sed '/msgstr/s/\.\.\./…/g' > de.po1
Check all lines for "irregular leftovers" of multiple dots (as a line like "........" would leave)
cat de.po1 | grep 'msgstr' | grep '…\.'
gives you a list of those.
Dashes
This is kind of tricky, as the sophisticated use of different dashes explained in Typography_Style_Guide#Dashes might not be used in all translations.
The following command line introduces the "long" em-dash in all cases that involve spaces and minuses. This might be correct in most cases.
cat de.po | sed '/msgstr/s/ - / — /g;/msgstr/s/ -/ —/g;/msgstr/s/- /— /g' > de.po1
Again, check the changed lines for occurrences of "real minuses" or "concatenated words"
cat de.po1 | grep 'msgstr' | grep '—'
and change the wrong dashes back to minuses using poEdit.
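It can also help to look at the candidate lines before running the replacement, so that real minuses are spotted early. A minimal sketch, assuming single-line strings; the function name list_dash_candidates is hypothetical:

```shell
# List translated lines (with line numbers) that contain a minus next to a space,
# i.e. the lines the dash replacement would touch.
list_dash_candidates() {
  grep -nE '^msgstr.*( -|- )' "$1"
}
```

Hyphenated words such as "Nord-Süd" are not listed, because the replacement leaves them alone; something like "10 - 3" is listed and would need to stay a minus.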
Apostrophes
This command line exchanges all occurrences of the keyboard »'« with the curly one »’«. This might not be applicable to all languages, depending on how the apostrophe is used.
cat de.po | sed "/msgstr/s/'/’/g" > de.po1
Use of » or « for Quotes
If your language team decided on using » and « as quotation marks (instead of " or ' or similar), you may type >> and << in your po file, and the command line below changes them into the nice ones:
cat de.po | sed '/msgstr/s/>>/»/g;/msgstr/s/<</«/g' > de.po1
Fusing commands
Use the following style:
cat de.po | sedcommand1 | sedcommand2 | sedcommand3 > de.po1
e.g. changing apostrophes, quotes and ellipses in one step and producing a new file called "de.po1":
cat de.po | sed "/msgstr/s/'/’/g" | sed '/msgstr/s/>>/»/g;/msgstr/s/<</«/g' | sed '/msgstr/s/\.\.\./…/g' > de.po1
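The same chain can also be written as a single sed call with several -e expressions. The sketch below is a variation, not the command above: the function name style_po is made up, and the address is anchored as /^msgstr/ so that a msgid line that merely contains the word "msgstr" is left alone.

```shell
# Apply the apostrophe, quote and ellipsis replacements to translated lines only.
# Assumption: single-line strings, so every translation starts with "msgstr".
style_po() {
  sed -e "/^msgstr/s/'/’/g" \
      -e '/^msgstr/s/>>/»/g' \
      -e '/^msgstr/s/<</«/g' \
      -e '/^msgstr/s/\.\.\./…/g' "$1"
}
```

Used as style_po de.po > de.po1, it writes the styled copy to a new file and leaves de.po untouched.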
Reminder: use msgcat (see above) to reformat the po file before using the diff command.
Comments
Disclaimer: These scripts are far from flawless. Anyhow, try them, like them, use them; dislike them, improve them :).
And, yes, I know, their programming (the ones I did) is pathetic :). And, yes, more knowledge about WML would definitely improve them. Or a higher frustration threshold... :). Or the accuracy a real programmer would have. Hence, improvements would be a very good idea. If anybody has suggestions, feel free to contact me or just post them here. Or even realize them on a higher level: why not write a script to transform the output into TeX? Per aspera ad astra :).