i18n: rules, tips & tricks
SliTaz has high-class homemade tools and scripts, but, sadly, the way they are internationalized does not meet high i18n standards. I want to improve this and give some tips and tricks to script developers.
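So, we all know about gettext (http://www.gnu.org/software/gettext/manual/gettext.html#Programmers). But do you know about eval_gettext? And eval_ngettext? Both are shell functions defined in gettext.sh, which ships with GNU gettext and must be sourced by a script (. gettext.sh) before it calls them. I'm both a programmer and a translator, and I want to clarify how the gettext suite should be used.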
The most important rule: DON'T split a message that has to be translated into pieces. A translator may well need to reorder those pieces to produce a correct translation in their native, non-English language.
So, please, don't use:
gettext "Unpacking : "; echo $PACKAGE
Use this instead:
eval_gettext "Unpacking : \$PACKAGE"
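Now I can translate it into, say, Russian: "Пакет $PACKAGE распаковывается" (literally "Package $PACKAGE is being unpacked"; note that the variable has moved into the middle of the sentence).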
You can use either variant in your scripts (they are equivalent):
eval_gettext "text \$variable again text"
eval_gettext 'text $variable again text'
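A minimal runnable sketch of what these calls need, assuming GNU gettext's gettext.sh is on the PATH, as it is on SliTaz. The fallback branch and the text domain name demo are hypothetical, added only so the sketch also runs where gettext is missing. It checks that both quoting styles hand gettext the same msgid, and that the variable is filled in only after translation.

```shell
#!/bin/sh
# Use GNU gettext's shell helpers when gettext.sh is on the PATH.
# Otherwise (assumption, demo only) define a crude stand-in that substitutes
# variables but never translates; it breaks on messages containing double quotes.
if command -v gettext.sh >/dev/null 2>&1; then
	. gettext.sh
else
	eval_gettext() { eval "printf '%s' \"$1\""; }
fi
export TEXTDOMAIN=demo   # hypothetical domain; no catalog exists, so text stays English

PACKAGE=nano   # example value

# Both lines pass the msgid "Unpacking : $PACKAGE" to gettext;
# $PACKAGE is replaced only after the (possibly translated) text comes back.
double=$(eval_gettext "Unpacking : \$PACKAGE")
single=$(eval_gettext 'Unpacking : $PACKAGE')
printf '%s\n' "$double" "$single"
```

With a Russian catalog installed for the domain, gettext would return the translated string, and $PACKAGE would be substituted wherever the translator placed it.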
But when commands are nested inside one another, I personally prefer double quotes outside and single quotes inside:
boldify "$(eval_gettext 'Now: $date')"
Using double quotes everywhere is also correct, but my Geany editor gets confused at that point ;)
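A sketch comparing the two nested forms. boldify here is a hypothetical stand-in that wraps text in ANSI bold (SliTaz's own helper may differ), the fallback branch is an assumption for systems without gettext.sh, and date is just an example variable.

```shell
#!/bin/sh
# Use gettext.sh if available; otherwise a crude, non-translating stand-in
# (assumption, demo only).
if command -v gettext.sh >/dev/null 2>&1; then
	. gettext.sh
else
	eval_gettext() { eval "printf '%s' \"$1\""; }
fi
export TEXTDOMAIN=demo   # hypothetical domain with no catalog

# Hypothetical stand-in for boldify: wrap the text in ANSI bold.
boldify() { printf '\033[1m%s\033[0m\n' "$1"; }

date=$(date +%H:%M)   # any value will do

# Preferred: double quotes outside, single quotes inside.
a=$(boldify "$(eval_gettext 'Now: $date')")

# Also correct: double quotes throughout. Inside $( ) a new quoting context
# starts, so the inner "..." does not close the outer string, but $date must
# be escaped so that gettext receives it literally.
b=$(boldify "$(eval_gettext "Now: \$date")")

printf '%s\n' "$a" "$b"
```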
Second rule: use plurals.
I once saw a funny piece of code. It looked something like this (from memory):
# pkg - number of packages
[ $pkg -gt 1 ] && ss="s" || ss=""
echo "$pkg package$ss installed"
How can I internationalize this code?
1 package installed; 2 packages installed…
It's pretty simple:
eval_ngettext '$pkg package installed' \
'$pkg packages installed' $pkg
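Here is that call wrapped in a small function and run for a few counts, assuming gettext.sh is available (the fallback branch and the demo text domain are hypothetical additions so the sketch runs without gettext). With no translation catalog installed, ngettext returns the first msgid when the count is 1 and the second one otherwise, which is exactly the English behaviour; a translator's catalog can supply as many forms as the language needs.

```shell
#!/bin/sh
# Use gettext.sh if available; otherwise crude, non-translating stand-ins
# (assumption, demo only) that mimic ngettext's no-catalog behaviour.
if command -v gettext.sh >/dev/null 2>&1; then
	. gettext.sh
else
	eval_gettext() { eval "printf '%s' \"$1\""; }
	eval_ngettext() {
		if [ "$3" -eq 1 ]; then eval_gettext "$1"; else eval_gettext "$2"; fi
	}
fi
export TEXTDOMAIN=demo   # hypothetical domain with no catalog

# Hypothetical helper: print the "N package(s) installed" message for count $1.
installed_msg() {
	pkg=$1
	eval_ngettext '$pkg package installed' \
		'$pkg packages installed' "$pkg"
	echo
}

for n in 1 2 5; do
	installed_msg "$n"
done
```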
English has only two forms of a noun: singular and plural. So you can write «package(s)», or assume that there is more than one package and always write «packages». But Russian, for example, has three forms. Should I then translate «package(s)» as «пакет(а,ов)»? That is very ugly, and unnecessary now that we have eval_ngettext.
Example in Russian:
1, 21, 31 пакет
2, 3, 4, 22, 23, 24 пакета
5-20, 25-30 пакетов
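Gettext solves all of these plural problems: the translator supplies every form the language needs, and the right one is picked at run time from the number.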
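For the curious, here is roughly what happens behind the scenes. A translation catalog declares in its Plural-Forms header how many forms the language has and a C-like expression that maps a number n to one of them. The expression in the comment below is the one commonly used in Russian catalogs; the sketch re-implements it in plain shell only to check it against the table above. In real scripts eval_ngettext does this for you, and ru_plural_index and ru_package_word are hypothetical names.

```shell
#!/bin/sh
# Index of the Russian plural form for count $1, following the rule
#   nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 :
#                       n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
ru_plural_index() {
	n=$1
	if [ $((n % 10)) -eq 1 ] && [ $((n % 100)) -ne 11 ]; then
		echo 0
	elif [ $((n % 10)) -ge 2 ] && [ $((n % 10)) -le 4 ] &&
	     { [ $((n % 100)) -lt 10 ] || [ $((n % 100)) -ge 20 ]; }; then
		echo 1
	else
		echo 2
	fi
}

# Pick the matching form of the word "package" for count $1.
ru_package_word() {
	case $(ru_plural_index "$1") in
		0) echo 'пакет' ;;
		1) echo 'пакета' ;;
		*) echo 'пакетов' ;;
	esac
}

for n in 1 2 5 11 21 22 25; do
	printf '%s %s\n' "$n" "$(ru_package_word "$n")"
done
```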
And again: don't split a message into pieces.
Today I saw the following code in tazpkg:
echo -n $(colorize 32 "$packages ")
echo -n $(boldify $(gettext "packages installed of category:"))
colorize 34 " $ASKED_CATEGORY_I18N"
I'm very sorry, but I have to remove this coloured beauty to get correct i18n. Here is how I do it:
The message consists of the coloured variable $packages, then some text, then the coloured variable $ASKED_CATEGORY_I18N.
The text has to change to use a plural form (package/packages). Below is an uncoloured, but correctly internationalized, variant:
eval_ngettext '$packages package installed of category: $ASKED_CATEGORY_I18N' \
'$packages packages installed of category: $ASKED_CATEGORY_I18N' $packages
Now I can translate it into Russian (all three forms mean "In category $ASKED_CATEGORY_I18N, $packages package(s) installed"):
Form 1: В категории $ASKED_CATEGORY_I18N установлен $packages пакет.
Form 2: В категории $ASKED_CATEGORY_I18N установлены $packages пакета.
Form 3: В категории $ASKED_CATEGORY_I18N установлены $packages пакетов.
(For comparison, here is the bad old form: В категории $ASKED_CATEGORY_I18N установлен(ы) $packages пакет(а,ов).)
Now I'm happy: the program speaks my language! And speaks it correctly!
But how can we keep the coloured messages? I think we can use some sort of markup. I proposed a working function on the mailing list. Using this function, we can write the example above like this:
emsg "$(eval_ngettext \
'<c 32>$packages</c> package installed of category: <c 34>$ASKED_CATEGORY_I18N</c>' \
'<c 32>$packages</c> packages installed of category: <c 34>$ASKED_CATEGORY_I18N</c>' \
$packages)"
And the translations:
Form 1: В категории <c 34>$ASKED_CATEGORY_I18N</c> установлен <c 32>$packages</c> пакет.
Form 2: В категории <c 34>$ASKED_CATEGORY_I18N</c> установлены <c 32>$packages</c> пакета.
Form 3: В категории <c 34>$ASKED_CATEGORY_I18N</c> установлены <c 32>$packages</c> пакетов.
We have coloured output again! And correct i18n as well!
The emsg() function is not complicated: it is just a few sed calls, and it can produce different output forms (HTML, GTK, xterm).
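The actual emsg() from the mailing-list proposal is not reproduced here. As a rough idea of how little is involved, here is a hypothetical, much smaller sketch that handles only the <c N>...</c> markup used above and only two output forms: ANSI colours for a terminal, and plain text with the markup stripped. The names emsg_term and emsg_plain are made up for this sketch.

```shell
#!/bin/sh
# Hypothetical sketch of the emsg() idea, not the mailing-list version.
# Markup: <c N>text</c>, where N is an ANSI colour code (e.g. 32 = green).

# Terminal output: <c N> becomes ESC[Nm, </c> becomes ESC[0m (reset).
emsg_term() {
	esc=$(printf '\033')
	printf '%s\n' "$1" | sed -e "s|<c \([0-9]*\)>|${esc}[\1m|g" \
		-e "s|</c>|${esc}[0m|g"
}

# Plain output (logs, pipes): drop the markup, keep the text.
emsg_plain() {
	printf '%s\n' "$1" | sed -e 's|<c [0-9]*>||g' -e 's|</c>||g'
}

msg='<c 32>5</c> packages installed of category: <c 34>network</c>'
emsg_term "$msg"
emsg_plain "$msg"
```

Because the markup sits inside the msgid, translators move it together with the words it colours, as in the Russian forms above.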
Next, some examples to reinforce the rules above:
1. Before:
repository_name="`gettext \"Undigest\"` $(basename $path)"
echo "$repository_name `gettext \"is up to date.\"`"
1. After:
undigest_path="$(basename "$path")"
repository_name="$(eval_gettext 'Undigest $undigest_path')"
eval_gettext '$repository_name is up to date.'; echo
2. Before:
eval_gettext "\$new_pkgs new packages on the mirror."; echo
2. After:
eval_ngettext '$new_pkgs new package on the mirror.' \
'$new_pkgs new packages on the mirror.' $new_pkgs; echo
3. Before:
echo -n "$pkgs "; gettext "installed packages scanned in"; echo " ${time}s"
3. After:
eval_ngettext '$pkgs installed package scanned in ${time}s' \
'$pkgs installed packages scanned in ${time}s' $pkgs; echo
4. Hidden plural
4. Before:
if [ "$blocked_count" -gt 0 ]; then
blocks=$(eval_gettext " (\$blocked_count blocked)")
fi
eval_gettext "You have \$upnb available upgrades\$blocks"
4. After:
The message "(xxx blocked)" refers to a package or packages, so English needs only one form, but other languages may need a singular or plural form of the word "blocked":
if [ "$blocked_count" -gt 0 ]; then
blocks=$(eval_ngettext ' ($blocked_count blocked)' \
' ($blocked_count blocked)' $blocked_count)
fi
eval_ngettext 'You have $upnb available upgrade$blocks' \
'You have $upnb available upgrades$blocks' $upnb
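Here is example 4 as a small runnable function, assuming gettext.sh is available (the fallback branch and the demo text domain are hypothetical additions so it runs without gettext). Both messages go through eval_ngettext, so a translator can choose the form of "blocked" from $blocked_count and the form of "upgrade" from $upnb independently. upgrades_msg is a hypothetical helper name; upnb and blocked_count are the variables from the example.

```shell
#!/bin/sh
# Use gettext.sh if available; otherwise crude, non-translating stand-ins
# (assumption, demo only) that mimic ngettext's no-catalog behaviour.
if command -v gettext.sh >/dev/null 2>&1; then
	. gettext.sh
else
	eval_gettext() { eval "printf '%s' \"$1\""; }
	eval_ngettext() {
		if [ "$3" -eq 1 ]; then eval_gettext "$1"; else eval_gettext "$2"; fi
	}
fi
export TEXTDOMAIN=demo   # hypothetical domain with no catalog

# Hypothetical helper: $1 = available upgrades, $2 = blocked packages.
upgrades_msg() {
	upnb=$1 blocked_count=$2 blocks=''
	if [ "$blocked_count" -gt 0 ]; then
		# English has one form here; other languages may need two or more.
		blocks=$(eval_ngettext ' ($blocked_count blocked)' \
			' ($blocked_count blocked)' "$blocked_count")
	fi
	eval_ngettext 'You have $upnb available upgrade$blocks' \
		'You have $upnb available upgrades$blocks' "$upnb"
	echo
}

upgrades_msg 1 0
upgrades_msg 3 1
```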
Enjoy writing scripts that follow standards and best practices!