TranslationSpellcheckingOnWindows

From The Battle for Wesnoth Wiki
Revision as of 06:56, 6 September 2009 by Szymek

WARNING: This page is still a draft.

This text assumes you work on Windows XP and that the tool you use for editing PO files cannot check spelling. To follow this guide you should also have some experience with "managing computers": installing programs, changing system settings and using the command line.

Preparations

  1. Check whether you already have the following command-line programs from the GnuWin32 collection installed: cat, tr, tee and msgexec. If you do, skip to step 3.
  2. Download and install the CoreUtils and GetText packages. Each is offered both as an installer and as a ZIP archive; use whichever you can.
  3. Find out which folder these utilities are in; let's call it A. If you used the installers and always clicked Next/Yes without changing anything, it should be C:\Program Files\GnuWin32\bin. Note that the installation folder does not contain the programs directly; they are in its subfolder bin.
  4. Make sure A is in your PATH, so that the utilities can be called directly without a full path prefix. PATH is an environment variable, and environment variables are edited in different ways on different versions of Windows. For XP, Microsoft itself offers a short tutorial, also available in other languages (button names will differ if you don't have an English version of Windows). You can always find other tutorials by searching the web for "windows change path". Once you know how, check the variable Path and, if it does not already contain A, add it. The syntax is simple: entry1;entry2;entry3 and so on. Entries that contain spaces do not need quotation marks. An example Path could look like this:
    C:\Windows;C:\Windows\System32;C:\Program Files\Perl\bin
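Once Path is set, it is worth confirming that Windows actually finds all four programs. The following is a minimal sketch of such a check, assuming you have Python 3 installed (this guide does not otherwise need it); the names missing_tools and REQUIRED_TOOLS are made up for illustration.

```python
import shutil

# The four GnuWin32 programs this guide relies on.
REQUIRED_TOOLS = ["cat", "tr", "tee", "msgexec"]


def missing_tools(names):
    """Return the names that cannot be found via the PATH variable.

    shutil.which() searches the folders listed in PATH; on Windows it
    also tries the extensions listed in PATHEXT, such as .EXE.
    """
    return [name for name in names if shutil.which(name) is None]
```

Running print(missing_tools(REQUIRED_TOOLS)) should print an empty list; any name it prints is not reachable yet, so recheck folder A and the Path variable. Programs started before you edited Path keep the old value, so run the check from a newly opened command prompt.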

Now the utilities are ready to use, so let's see what they do.

Ingredients

You can skip this section; it only describes the four utilities involved:

  • cat simply copies the specified file to the console; in other words, it prints it to the "DOS screen". That alone is not very useful, but the console output of one program can be passed on as the input of another, which allows more elaborate processing. We use cat in the usual way: as the entry point of the data into the chain of programs.
  • msgexec reads a PO file from console input, splits it into individual translations and runs a command on each of them. Here we use it merely as a filter that reduces the PO file to its translated texts, so no real command is needed; the parameter 0 is a built-in pseudo-command that just outputs each translation.
  • tr replaces characters in the data passing through it. With the parameter 0, msgexec ends every translation with a zero (NUL) byte, which text editors don't like, so it must be turned into something harmless: tr \0 \n turns each zero byte into a line ending.
  • tee copies the data passing through it to a file (while still showing it on the screen). There is a problem with msgexec, though: it "restarts" its output for every translation item. That can be cured by calling tee with the parameter -a, which tells it to append to its output file rather than rewrite it from scratch.
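To make the data flow concrete, here is a small Python sketch that imitates the last two links of the chain on data shaped like msgexec 0 output, where each translation is followed by a zero byte. The zero bytes become line endings, as with tr \0 \n, and the result is appended to the output file, as with tee -a. The function name is hypothetical, and the sketch does not call the real tools.

```python
from pathlib import Path


def append_translations(raw, output):
    r"""Imitate the `tr \0 \n | tee -a OUTPUT.TXT` part of the chain.

    `raw` is bytes shaped like `msgexec 0` output: every translation is
    followed by a zero (NUL) byte. Unlike tee, this only writes the file
    and does not echo the text to the screen.
    """
    text = raw.replace(b"\0", b"\n")   # what `tr \0 \n` does
    with Path(output).open("ab") as f:   # "ab" = append, like `tee -a`
        f.write(text)
```

Calling it twice on the same file leaves both batches of text in it, which is exactly why the batch file in the next section deletes OUTPUT.TXT first.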

Processing

Get the text

To get a plain text file with the translated texts from a PO file, all these programs must be chained together. The complete command is:

cat INPUT.PO | msgexec 0 | tr \0 \n | tee -a OUTPUT.TXT

However, typing this monster into the command line every time you want to check a translation would be tedious. Instead, create a batch file (a plain text file with the extension BAT) containing:

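rem tee -a appends, so start from a fresh file (skipped if none exists yet)
if exist OUTPUT.TXT del OUTPUT.TXT
rem %~1 is the dropped PO file's path without surrounding quotes
cat "%~1" | msgexec 0 | tr \0 \n | tee -a OUTPUT.TXT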

Why delete the output file first? If it already exists, its old content won't disappear (tee is told to append), so the file would end up containing old, unwanted texts.

Since chaining programs on Windows seems to reset the "current folder" to the command line's default, right-click the batch file and create a shortcut to it. The shortcut's starting folder ("Start in") will be that of the batch file, so OUTPUT.TXT will appear in the same folder as the batch file.

If you now drag your PO file onto this shortcut, you'll (magically!) get a plain text file OUTPUT.TXT with all your translated texts.
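If you happen to have Python 3 installed, roughly the same OUTPUT.TXT can be produced without the GnuWin32 tools. The sketch below is not how msgexec works internally; it is a deliberately simplified PO reader that assumes well-formed UTF-8 input. It joins continuation lines, handles plural forms (msgstr[N]), skips comments (including obsolete #~ entries), the header entry and untranslated (empty) strings, and writes one translation per line. All function names are hypothetical.

```python
import re
from pathlib import Path

# A keyword line such as: msgid "...", msgstr "...", msgstr[1] "..."
KEYWORD_RE = re.compile(r'^(msgctxt|msgid_plural|msgid|msgstr(?:\[\d+\])?)\s+"(.*)"$')
# A continuation line: nothing but a quoted string
STRING_RE = re.compile(r'^"(.*)"$')
# The backslash escapes that commonly appear in PO strings
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unescape(s):
    """Undo the backslash escapes used inside PO strings (simplified)."""
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(0)), s)


def parse_fields(po_text):
    """Yield (keyword, text) pairs, joining continuation lines.

    Comment lines and blank lines end the current field and are
    otherwise ignored.
    """
    keyword, parts = None, []
    for raw in po_text.splitlines():
        line = raw.strip()
        key = KEYWORD_RE.match(line)
        cont = STRING_RE.match(line)
        if key:
            if keyword is not None:
                yield keyword, unescape("".join(parts))
            keyword, parts = key.group(1), [key.group(2)]
        elif cont and keyword is not None:
            parts.append(cont.group(1))
        else:
            if keyword is not None:
                yield keyword, unescape("".join(parts))
            keyword, parts = None, []
    if keyword is not None:
        yield keyword, unescape("".join(parts))


def extract_translations(po_text):
    """Return the non-empty translations in file order, header excluded."""
    translations = []
    msgid = None
    for keyword, text in parse_fields(po_text):
        if keyword == "msgid":
            msgid = text
        elif keyword.startswith("msgstr") and msgid != "" and text:
            translations.append(text)  # msgid "" marks the header entry
    return translations


def write_output(po_path, out_path):
    """Write one translation per line as UTF-8, replacing any old file."""
    text = Path(po_path).read_text(encoding="utf-8")
    lines = extract_translations(text)
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, write_output("INPUT.PO", "OUTPUT.TXT") overwrites OUTPUT.TXT on every run, so no separate delete step is needed. In a script of your own, building the output path from the script's location, e.g. Path(__file__).resolve().parent / "OUTPUT.TXT", sidesteps the current-folder issue mentioned above.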

And load it

Now you must transfer the text from the resulting file into a word processor that can check spelling in your language, such as MS Word or OpenOffice Writer. This last step seems easy, but beware: the national characters in OUTPUT.TXT are encoded in UTF-8 (the encoding Wesnoth's translation catalogs use), while the word processor will most likely expect something else (to make it all complicated, of course). This final transfer must therefore go through an "advanced" text editor such as Notepad++ or PSPad, which recognizes the text's encoding correctly and won't show it as gibberish. Just open OUTPUT.TXT in it, select all (Ctrl-A), copy (Ctrl-C) and paste into the word processor. Phew, done.
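Another option is to mark the file itself as UTF-8. The Python sketch below rewrites OUTPUT.TXT with a byte order mark (BOM) at the start, using Python's "utf-8-sig" codec; many Windows programs, Notepad among them, use that mark to recognize UTF-8. Whether your word processor honours it is an assumption you should test with a few national characters. The function name is hypothetical.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def add_utf8_bom(path):
    """Rewrite a UTF-8 text file so that it starts with a byte order mark.

    Assumption: the program that opens the file afterwards uses the BOM
    to recognize UTF-8. Reading with "utf-8-sig" drops an existing BOM,
    so running this twice does not add a second one. Line endings are
    rewritten in the platform's style (CRLF on Windows).
    """
    p = Path(path)
    text = p.read_text(encoding="utf-8-sig")
    p.write_text(text, encoding="utf-8-sig")  # the encoder emits the BOM first
```

After add_utf8_bom("OUTPUT.TXT"), try opening the file directly in the word processor; if the characters still look wrong, fall back to the copy-and-paste route.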

In OpenOffice there is another way, too: open the "Open" dialog, select the file type "encoded text" and open the file. Another window will pop up, asking for options. Only the first one, the encoding, is important: choose "Unicode (UTF-8)" and confirm, and the text will display correctly. The last selected encoding is remembered, so after the first use you won't have to look for it in the list again.

Spellchecking

Well, now that the text is in your word processor, press the right button and run the spell check! You will have to keep the PO file open at the same time and search it manually for the texts your spellchecker does not like, but that's a relatively small inconvenience.
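The manual search can be shortened with a small helper. This Python sketch, with a made-up function name, returns the line numbers of the translation lines in a PO file (msgstr lines and their continuation lines) that contain a word your spellchecker flagged. It does a plain substring search, so it also finds the word inside longer words.

```python
def find_word(po_text, word):
    """Return 1-based numbers of translation lines containing `word`.

    A line counts as a translation line if it starts with msgstr or is a
    quoted continuation line following a msgstr line. Only the part after
    the first quote is searched, so the keyword itself never matches.
    """
    hits = []
    in_msgstr = False
    for number, line in enumerate(po_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("msgstr"):
            in_msgstr = True
        elif not stripped.startswith('"'):
            in_msgstr = False  # comment, blank line, msgid, msgctxt, ...
        text_part = stripped.partition('"')[2]
        if in_msgstr and word in text_part:
            hits.append(number)
    return hits
```

Most text editors, Notepad++ included, can jump straight to a given line number, so the returned numbers take you directly to the flagged text.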

Further advice

If you know how, it is a good idea to create a separate custom dictionary for the entries you mark as correct while spellchecking your translations: Wesnoth's language usually requires translators to use old and unusual words that are missing from ordinary spellchecking dictionaries. An even better idea is to create another one for frequently appearing names such as Kalenz and Delfador. You can then leave it out when checking catalogs that contain no names (wesnoth, wesnoth-lib), which helps because short Wesnoth names can coincide with common typos and hide them.
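To get the names dictionary started, a rough heuristic can help: a word that occurs in a catalog only with a capital initial is often a name. The Python sketch below (hypothetical function name, simple regex-based word splitting) collects such words from the extracted text. Review the result by hand before adding anything to a dictionary, since it will also pick up ordinary words that happen to appear only at the start of sentences.

```python
import re

# A "word" is a run of letters; \w would also match digits and "_",
# so those are excluded. Accented letters count as letters.
WORD_RE = re.compile(r"[^\W\d_]+")


def name_candidates(text):
    """Return words that never occur in lowercase, sorted alphabetically."""
    words = WORD_RE.findall(text)
    lowercase_seen = {w for w in words if w.islower()}
    return sorted({
        w for w in words
        if w[0].isupper() and w.lower() not in lowercase_seen
    })
```

Fed with the contents of OUTPUT.TXT (read as UTF-8), it produces a list you can save one word per line; how such a list is imported into a custom dictionary depends on your word processor.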