Giving CGI::Application internationalization (I18N)
Part 1
My project is planned in CGI::Application. I would use Catalyst, DBIx::Class and Moose, but I don't fancy the learning curve or the weight of the framework. CGI::Application is lightweight and easy to distribute with my software. I've also been working on my own database module that just makes doing SQL queries a bit easier, rather than trying to apply a full abstraction layer. If I use an object system it'll be Mouse rather than Moose.
I think a lightweight, easy-to-use alternative to the very popular Catalyst - Moose - DBIx::Class stack is a very good thing. Finally I've got something Perlish to blog about :)
1st Hurdle
I want my project to be language packable. There is no I18N plugin for CGI::App, so it looks like the first thing I am going to have to do is create one. I've looked into internationalization and localization a few times before and it seems like a total minefield, but now is the time to finally battle through it.
Checking out CPAN
Catalyst does have an I18N plugin, so my first stop is to check out how they've done it.
http://search.cpan.org/~bricas/Catalyst-Plugin-I18N-0.08/lib/Catalyst/Plugin/I18N.pm
Quick check of the copyright:-
"This program is free software, you can redistribute it and/or modify it under the same terms as Perl itself."
Great! I can use this as the template and adapt it to CGI::App.
How are they doing it?
It's basically a wrapper around Locale::Maketext::Simple with some extra features. Modules it uses:-
I18N::LangTags
I18N::LangTags::Detect
Locale::Maketext::Simple
After a bit of playing about and trying different bits of the module it seems there is a lot more to it, as some subroutines are called that certainly aren't a part of the above three modules. Not wanting to use any of the code without fully understanding it seems I'm going to have to follow the dependency tree and figure out how it all fits together.
>I18N::LangTags
->Doesn't appear to lean on anything
>I18N::LangTags::Detect
->I18N::LangTags
->Win32::Locale (if available)
>Locale::Maketext::Simple
->Locale::Maketext::Lexicon
-->I18N::Langinfo
-->Win32::Console (used as Windows alternative to I18N::Langinfo)
-->File::Glob
-->Locale::Maketext::Lexicon::Gettext
--->Locale::Maketext::Lexicon (already done)
--->Encode (Encode::compat on Perls < 5.7.1)
-->File::Spec
->File::Spec
->Locale::Maketext (as base, but set to a different package's namespace)
-->I18N::LangTags::Detect
-->I18N::LangTags
-->integer
-->Locale::Maketext::GutsLoader (right below a line saying 'This is where most people should stop reading')
(This dependency list was created by manually checking the .pm files for use and require statements. Not all of these are listed in prerequisites and not all are used depending on the features you need. Some were not followed any further as it didn't seem necessary).
I'm going to read through all the Perl docs from these, starting with the ---> modules and moving my way back up the chain to the modules I really want. I'll also save a local copy of some of these .pm files for quick reference later.
--->Encode
Read the first 1/4 of the perldoc. Vaguely happy with what this does. UTF-8 (8-bit UCS/Unicode Transformation Format) is pretty much the standard format for storing and transferring character data 'strings'. Perl has its own internal format for strings, which you shouldn't treat as being UTF-8. Both have ASCII at their core. This module encodes from Perl's internal format (the one you work with in memory) to UTF-8 or another encoding (the one you save or transfer over the internet). It also decodes back and has a number of assisting functions.
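To make the encode/decode idea concrete, here is a minimal sketch of a round trip. The sample string is just an assumption for illustration; encode and decode are the standard functions exported by Encode.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string as Perl holds it in memory: 4 characters,
# the last one being e-acute (code point U+00E9).
my $chars = "caf\x{e9}";

# Encode to UTF-8 bytes for saving or sending over the wire.
# The e-acute becomes two bytes, so we end up with 5 bytes.
my $bytes = encode('UTF-8', $chars);

# Decode the bytes back into a character string.
my $round_trip = decode('UTF-8', $bytes);

printf "%d characters, %d bytes\n", length($chars), length($bytes);
print "Round trip OK\n" if $round_trip eq $chars;
```

The point to take away is that length() gives different answers before and after encoding, which is why you decode on the way in and encode on the way out.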
--->Encode::compat
Provides the basic functions from Encode.pm to earlier versions of Perl.
-->I18N::Langinfo
Short docs. Gives you basic locale information for things like yes/no, time format, days of week, months, etc. Only exports the function langinfo by default, which returns results based on numeric input. Exportable constants make it easier to use (so you can use things like MON_1 rather than looking up numbers). I'll be keeping the perldoc open to refer to later if needed.
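As a rough sketch of how the constants are used (what comes back depends entirely on the locale the script runs under, so I'm not showing output; the module also isn't available on every platform):

```perl
use strict;
use warnings;
use I18N::Langinfo qw(langinfo MON_1 DAY_1 YESEXPR);

# Each constant names one piece of locale information.
my $first_month = langinfo(MON_1);     # name of the first month
my $first_day   = langinfo(DAY_1);     # name of the first day of the week
my $yes_pattern = langinfo(YESEXPR);   # regex-style pattern for a "yes" answer

print "First month: $first_month\n";
print "First day:   $first_day\n";
print "Yes pattern: $yes_pattern\n";
```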
-->Win32::Console
Has console functions which won't interest us in this context. But it also has character mode functions.
-->File::Glob
Implements a BSD glob routine, giving you the function bsd_glob and updating the way <*.c> works. You use it to get a list of matching files (files that match the pattern between the <>).
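A small sketch of bsd_glob picking out matching files. The temporary directory and the file names (en.po, fr.po, notes.txt) are made up for the example.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Throwaway directory holding some hypothetical language files.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(en.po fr.po notes.txt)) {
    open my $fh, '>', "$dir/$name" or die "Can't create $name: $!";
    close $fh;
}

# Only the files ending in .po match the pattern.
my @po_files = bsd_glob("$dir/*.po");
print "$_\n" for @po_files;
```

This is the sort of call I'd expect a lexicon loader to make when it goes looking for every .po file in a directory.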
-->Locale::Maketext::Lexicon::Gettext
PO and MO file parser for Maketext. This is a pure Perl parser that turns standard gettext po and mo files into the format maketext understands.
This is the first file I've made a backup of so I can reference the functions quickly.
-->File::Spec
This module allows you to portably perform operations on file names. It loads the appropriate sub module for your system, be that Unix, Win32, etc. I'll be keeping the perldoc open to refer to later if needed.
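For example, building a path without hard-coding the separator might look like this. The directory layout (locale/fr/messages.po) is an assumption for illustration only.

```perl
use strict;
use warnings;
use File::Spec;

# Join the pieces using whatever separator the current system wants.
my $path = File::Spec->catfile('locale', 'fr', 'messages.po');
print "$path\n";

# And split a path back into its parts.
my @parts = File::Spec->splitdir($path);
print join(' | ', @parts), "\n";
```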
->Locale::Maketext::Lexicon
Expands Maketext so that it can read other localization formats, such as Gettext, Msgcat, etc. I'll be saving a copy of this one as well.
>Locale::Maketext::Simple
Simple interface to Locale::Maketext::Lexicon. It basically makes using Locale::Maketext::Lexicon easier for us. This is the module I'll be using directly. I'll be saving a copy of this one as well.
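Here is a minimal sketch of what using it looks like. It assumes a plain script with no lexicon files anywhere, in which case (as far as I can tell from the source) loc just interpolates the arguments into the string rather than translating anything.

```perl
use strict;
use warnings;

# Style => 'gettext' means placeholders are written %1, %2, ...
# rather than Maketext's own [_1], [_2], ...
use Locale::Maketext::Simple ( Style => 'gettext' );

# loc() is exported by default. With no lexicon found, it falls
# back to filling in the placeholders and returning the string.
my $greeting = loc( 'Hello, %1!', 'World' );
print "$greeting\n";
```

Once real .po or .mo files are in place, the same loc call would be the one returning translated text.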
->I18N::LangTags
Functions for dealing with RFC3066-style language tags.
http://www.faqs.org/rfcs/rfc3066.html
http://www.i18nguy.com/unicode/language-identifiers.html
Such as en for English, en-US for American English, en-GB for the Queen's English, fr for French and so on...
Although the routines do not consult a definitive list of language tags, they do check the format and tell you if it looks like a valid tag. I'll be saving a copy of this one as well.
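A quick sketch of the format checking, using two functions the module exports on request. The tags tested are just examples.

```perl
use strict;
use warnings;
use I18N::LangTags qw(is_language_tag super_languages);

# Format check only: no list of real languages is consulted.
my $looks_ok  = is_language_tag('en-US');       # well-formed tag
my $looks_bad = is_language_tag('esperanto');   # first subtag too long

# The more general tags that a specific tag falls under.
my @supers = super_languages('en-US');

print "en-US looks valid\n"         if $looks_ok;
print "esperanto looks invalid\n"   unless $looks_bad;
print "Supers of en-US: @supers\n";
```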
->Win32::Locale
Gets the current MSWin locale or language. Has the function Win32::Locale::get_language, which returns an RFC3066 language tag.
>I18N::LangTags::Detect
Has a routine aimed to automatically detect the user's language preferences. Checks for several environment variables to make the determination.
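To see the detection in action, this sketch fakes a CGI environment by setting two environment variables by hand. The Accept-Language value is an assumption for the example; in a real request the web server sets these.

```perl
use strict;
use warnings;
use I18N::LangTags::Detect;

# Pretend we are running as a CGI script (assumed values).
local $ENV{REQUEST_METHOD}       = 'GET';
local $ENV{HTTP_ACCEPT_LANGUAGE} = 'fr-CA,fr;q=0.8,en;q=0.5';

# detect() returns the language tags in order of preference.
my @langs = I18N::LangTags::Detect::detect();
print "Preferred languages: @langs\n";
```

Outside of CGI the same call looks at variables such as LANGUAGE, LC_ALL, LC_MESSAGES and LANG instead.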
-->integer
Switches Perl's arithmetic to integer mode rather than floating point. No more .XX in calculations.
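A tiny sketch of the difference. The pragma is lexically scoped, so it only applies inside the block where it is switched on.

```perl
use strict;
use warnings;

# Normal floating point division.
my $float = 10 / 3;

# Integer arithmetic, limited to this block.
my $int = do { use integer; 10 / 3 };

print "Floating point: $float\n";
print "Integer:        $int\n";
```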
-->Locale::Maketext::GutsLoader
I'll skip this one for now...
At this point I've got a good view of the base of the tree, the top, and all of its branches. Now I know enough to look at the main trunk that holds it all together.
>Locale::Maketext
Gives you a base class to build your project's localization from. Locale::Maketext::Simple, Locale::Maketext::Lexicon and Locale::Maketext::Lexicon::Gettext all build upon and extend this base class.
The idea is that you have a project class as a subclass of your main project, such as MyProject::Localize. This has Locale::Maketext as its base class. The project class then has subclasses for the different language tags (LangTags) which use it as a base, such as MyProject::Localize::it, MyProject::Localize::fr, MyProject::Localize::en, etc.
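Squeezed into a single file, that class layout might look something like the sketch below. Normally each package would live in its own .pm file; the French translation is a made-up lexicon entry for illustration.

```perl
use strict;
use warnings;

# Project class: every language class inherits from this one.
package MyProject::Localize;
use parent 'Locale::Maketext';

# One subclass per language tag, each holding its own %Lexicon.
package MyProject::Localize::en;
use parent -norequire, 'MyProject::Localize';
our %Lexicon = ( _AUTO => 1 );    # unknown keys pass through as-is

package MyProject::Localize::fr;
use parent -norequire, 'MyProject::Localize';
our %Lexicon = ( 'Hello World' => 'Bonjour le monde' );

package main;

# Ask for French and translate a phrase through the handle.
my $lh = MyProject::Localize->get_handle('fr')
    or die "Couldn't make a language handle";
my $text = $lh->maketext('Hello World');
print "$text\n";
```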
The Locale::Maketext base class gives you a method called get_handle. This is used to select the appropriate lexicon and return an object. You can then call the method maketext on that object to create text in the user's language.
Such as:-
use MyProject::Localize;
# The language tag must be a quoted string, not a bareword
my $lh = MyProject::Localize->get_handle('en-us');
die "Couldn't make a language handle??" unless $lh;
print $lh->maketext("Hello World");
You might be wondering what a lexicon is. In this context it's easiest to think of a lexicon as a hash that contains all the translations. In reality it's a bit more complicated than that, as it deals with things like quantities, etc.
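To show both sides of that, here is a sketch of a lexicon as a plain hash, with one entry using Maketext's bracket notation to handle a quantity. The class names follow the ones above; the key files_found is a hypothetical one I've picked for the example.

```perl
use strict;
use warnings;

package MyProject::Localize;
use parent 'Locale::Maketext';

package MyProject::Localize::en;
use parent -norequire, 'MyProject::Localize';

# The lexicon really is just a hash: key => text in this language.
# [quant,_1,file,files] picks singular or plural from the first argument.
our %Lexicon = (
    _AUTO       => 1,
    files_found => '[quant,_1,file,files] found',
);

package main;

my $lh = MyProject::Localize->get_handle('en')
    or die "Couldn't make a language handle";

my $one  = $lh->maketext( 'files_found', 1 );
my $many = $lh->maketext( 'files_found', 3 );
print "$one\n$many\n";
```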
You can supply get_handle with a list of language tags. If a lexicon for the first tag isn't found, the next is checked. Superordinates (en is a superordinate of en-us, for example) are automatically checked if their subordinate is not found.
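Both kinds of fallback can be sketched with a project that only ships English and French. The class names are the same hypothetical ones as before, and there is deliberately no de or en_us class.

```perl
use strict;
use warnings;

package MyProject::Localize;
use parent 'Locale::Maketext';

package MyProject::Localize::en;
use parent -norequire, 'MyProject::Localize';
our %Lexicon = ( _AUTO => 1 );

package MyProject::Localize::fr;
use parent -norequire, 'MyProject::Localize';
our %Lexicon = ( 'Hello World' => 'Bonjour le monde' );

package main;

# No German class exists, so the next tag in the list is tried.
my $first_match = MyProject::Localize->get_handle( 'de', 'fr' );

# No en_us class exists, so the superordinate en is used instead.
my $via_super = MyProject::Localize->get_handle('en-us');

print ref($first_match), "\n";
print ref($via_super),   "\n";
```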
This entry is getting a bit long for a blog post, so I'll name it part 1 and continue in another post.
Lyle