Saturday, November 28, 2009

Introducing Glossary Kit

I wrote some scripts to help a translation project I took part in recently. I think they are generic enough to benefit other translation projects, so I decided to give them a name and advertise them here :)

You can find a repo for the scripts on Github, as well as the package for direct download.

Below is the README file, rendered to HTML with Markdown.

Glossary Kit




What is glossary-kit?


This is a set of scripts that helps a translation project
check its collaborative work against a glossary, typically
a bilingual word list that fixes a single translation for each
term across the whole project.


How does it work?


Given a glossary of terms from the original text (simply "the
text" below), glossary-kit gathers every occurrence of each
listed word, together with the corresponding translation, into
a single file per word, where the translators can examine the
result and modify the translation if necessary. Once the
translation of a word is satisfactory, glossary-kit can apply
all the changes made in the generated files back to where they
belong in the translation.
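
The gathering step can be pictured with a minimal sketch. This is not the code of gkfind.py; the function name `find_keyword`, the UTF-8 encoding, and the tuple layout are assumptions made for illustration. It relies on the translation mirroring the text line by line.

```python
import os


def find_keyword(keyword, text_root, translation_root):
    """Collect (relative path, line number, original line, translated line)
    for every line of the original text that contains the keyword.

    Hypothetical helper for illustration, not part of glossary-kit.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(text_root):
        for name in sorted(filenames):
            source_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(source_path, text_root)
            target_path = os.path.join(translation_root, rel_path)
            if not os.path.isfile(target_path):
                continue  # no translated counterpart for this file
            # Assumption: both trees are UTF-8 encoded text.
            with open(source_path, encoding="utf-8") as src:
                source_lines = [line.rstrip("\n") for line in src]
            with open(target_path, encoding="utf-8") as dst:
                target_lines = [line.rstrip("\n") for line in dst]
            # Pair line N of the text with line N of the translation.
            pairs = zip(source_lines, target_lines)
            for number, (original, translated) in enumerate(pairs, start=1):
                if keyword in original:
                    hits.append((rel_path, number, original, translated))
    return hits
```

Each tuple carries enough information (file, line number, both versions of the line) to review a term in one place and later write a correction back to the right line.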


How to use it?


glossary-kit makes three assumptions about the translation project:


  1. It is stored in text files, including text source formats
    such as Markdown or TeX.

  2. The original files and the translated files are stored under
    two separate directories that have the same substructure.

  3. The translation keeps the original line positions of the text.
    In particular, every translated file must have the same number
    of lines as the original file.
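
Assumptions 2 and 3 can be verified before running the scripts. The following is a minimal sketch of such a check, not something shipped with glossary-kit; `check_layout` and `count_lines` are hypothetical names. It only walks the original tree, so extra files that exist only in the translation are not reported.

```python
import os


def count_lines(path):
    """Count lines by reading the file in binary mode (splits on b'\\n')."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def check_layout(text_root, translation_root):
    """Return a sorted list of (relative path, problem) pairs.

    Hypothetical helper: reports original files that have no translated
    counterpart or whose counterpart has a different number of lines.
    """
    problems = []
    for dirpath, _dirnames, filenames in os.walk(text_root):
        for name in filenames:
            source_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(source_path, text_root)
            target_path = os.path.join(translation_root, rel_path)
            if not os.path.isfile(target_path):
                problems.append((rel_path, "missing translation"))
            elif count_lines(source_path) != count_lines(target_path):
                problems.append((rel_path, "line count differs"))
    return sorted(problems)
```

An empty result means every original file has a counterpart with a matching line count.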


Usage


To check a certain keyword:


gkfind.py keyword rootToText rootToTranslation [outputFile]


To check a list of words:


gkfind.py -l listname rootToText rootToTranslation


The list should be a text file consisting of keywords, one per
line.


gkfind.py will generate files named keyword.checklist, each of which
contains the lines where keyword appears, in both the text and the
translation, as well as meta info such as the location of the files
and the line numbers of the occurrences. Edit the translation in these
file(s) until it is satisfactory, then run


gkapply.py keyword.checklist [keyword2.checklist] ...


The changes made are now applied to the translation files.
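
Conceptually, applying a change means replacing one line of a translated file in place. The sketch below illustrates that idea only. It does not parse the real .checklist format, which is not described here; the input is a hypothetical list of (file path, line number, new line) tuples, and `apply_changes` is an assumed name.

```python
def apply_changes(changes):
    """Replace individual lines in translated files.

    `changes` is an iterable of (path, 1-based line number, new text)
    tuples. This layout is an assumption for illustration, not the
    format gkapply.py reads.
    """
    by_file = {}
    for path, number, new_text in changes:
        by_file.setdefault(path, {})[number] = new_text

    for path, edits in by_file.items():
        # newline="" keeps the file's original line endings untouched.
        with open(path, encoding="utf-8", newline="") as handle:
            lines = handle.readlines()
        for number, new_text in edits.items():
            if not 1 <= number <= len(lines):
                raise ValueError(f"{path}: no line {number}")
            old = lines[number - 1]
            ending = old[len(old.rstrip("\r\n")):]  # reuse the old line ending
            lines[number - 1] = new_text + ending
        with open(path, "w", encoding="utf-8", newline="") as handle:
            handle.writelines(lines)
```

Because only whole lines are swapped, the file keeps its line count, so the line-position assumption still holds after the changes are applied.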


Why not Bash?


I'm aware that a lot of wheel-reinventing seems to be happening
here. With a shell language like Bash and some typical commands
on *nix systems like grep, diff and patch, this toolkit might
be redundant. Nevertheless, I wrote it to take advantage of
the portability of Python. Specifically, it might benefit
translators using MS Windows more than anybody else.


Copyright


These scripts are released under the Apache License 2.0; see COPYING for more details.
