[Perl] Lingua::*::Romanize::* - Romanization of CJK characters

The Lingua::*::Romanize::* modules generate Roman letters from CJK characters.

The Lingua::ZH::Romanize::Pinyin module parses Chinese characters, for both Mandarin and Cantonese.

The Lingua::JA::Romanize::Japanese module parses Japanese characters, both Kanji and Kana.

The Lingua::KO::Romanize::Hangul module parses Korean Hangul characters.

Download

You can download these modules from CPAN or from the links below.

Chinese: Lingua-ZH-Romanize-Pinyin-0.23.tar.gz TARGZ CPAN SVN
Japanese: Lingua-JA-Romanize-Japanese-0.23.tar.gz TARGZ CPAN SVN
Korean: Lingua-KO-Romanize-Hangul-0.20.tar.gz TARGZ CPAN SVN

These modules run on Perl 5.005, 5.6.x, and 5.8.x.

Module Dependencies

The Lingua::JA::Romanize::Japanese module requires the DB_File module.
The Jcode module is also used on Perl 5.005 through 5.8.0 (it is not required on Perl 5.8.1 or later).
Optionally, the LWP::UserAgent module is needed to download external dictionaries from SKK.

The Lingua::ZH::Romanize::Pinyin module requires the Storable module.
The Unicode::Map and Unicode::String modules are also used on Perl 5.005 and 5.6.x (they are not needed on Perl 5.8.x).

The Lingua::KO::Romanize::Hangul module does not require any external modules.
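
One way to see which of the modules named above are already installed is to try loading each one. The following is a minimal sketch; module_available() is a hypothetical helper written for this example and is not part of the Romanize modules.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub module_available {
    my ($module) = @_;
    # Accept only plain package names before passing them to string eval.
    return 0 unless $module =~ /\A\w+(?:::\w+)*\z/;
    my $ok = eval "require $module; 1";
    return $ok ? 1 : 0;
}

# Modules mentioned in the dependency notes above.
foreach my $module (qw( DB_File Jcode LWP::UserAgent Storable Unicode::Map Unicode::String )) {
    printf "%-16s %s\n", $module,
        module_available($module) ? "installed" : "missing";
}
```

Which of these are actually needed depends on the Perl version, as described above.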

Methods and Usage

Each Romanize module provides a new() constructor and the three methods below:

    use Lingua::JA::Romanize::Japanese;

    my $conv = Lingua::JA::Romanize::Japanese->new();

    my $kanji = "字";                           # one CJK character
    my $roman = $conv->char( $kanji );
    printf( "<ruby><rb>%s</rb><rt>%s</rt></ruby>", $kanji, $roman );

The char() method parses one CJK character and returns its romanized letters.
It returns undef when the argument is not a CJK character or when the conversion fails.
When multiple candidates are found, it returns them separated by slashes ("/").
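
Because char() may return undef or several slash-separated candidates, callers often want the result as a list. The sketch below shows one way to do that; split_candidates() is a hypothetical helper, and the sample strings are made-up return values, since the real readings depend on each module's dictionary.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a char()-style result into a list of candidates.
# Returns an empty list when the result is undef (no conversion).
sub split_candidates {
    my ($roman) = @_;
    return () unless defined $roman;
    return split m{/}, $roman;
}

# Made-up sample values standing in for $conv->char(...) results.
my @many = split_candidates("ji/aza");
my @none = split_candidates(undef);

print scalar(@many), " candidate(s): @many\n";
print scalar(@none), " candidate(s)\n";
```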

    my $string = "文字列の場合";                # multiple CJK characters
    print $conv->chars( $string ), "\n";

The chars() method parses a string of CJK characters and returns their romanized letters, separated by spaces.

This method was added in version 0.12.

    my @array = $conv->string( $string );
    foreach my $pair ( @array ) {
        my( $raw, $ruby ) = @$pair;
        if ( defined $ruby ) {
            printf( "<ruby><rb>%s</rb><rt>%s</rt></ruby>", $raw, $ruby );
        } else {
            print $raw;
        }
    }

The string() method parses CJK characters and returns an array of pairs, each holding the original CJK character(s) and their romanized letters.

    $array[0]           # first pair (array reference)
    $array[1][0]        # original character(s) of the second pair
    $array[1][1]        # romanized letters of the second pair
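
The pair structure can also be flattened into plain romanized text. The following sketch assumes, judging from the defined check in the loop above, that the second element of a pair is undefined when no romanization applies. The sample data and the pairs_to_roman() helper are hypothetical and do not come from the modules.

```perl
use strict;
use warnings;

# Hypothetical helper: join pairs into one string, using the romanized
# letters when present and the raw text otherwise.
sub pairs_to_roman {
    my @pairs = @_;
    return join "", map { defined $_->[1] ? $_->[1] : $_->[0] } @pairs;
}

# Made-up sample shaped like string() output: [ raw, romanized ] pairs.
my @array = (
    [ "漢字", "kanji" ],
    [ " ABC " ],          # no romanized element for this pair
    [ "かな", "kana" ],
);

print pairs_to_roman(@array), "\n";
```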

Memo

Hangul character mapping table for Lingua::KO::Romanize::Hangul is below:

Initial Letter (0~18):
    g kk n d tt r m b pp s ss - j jj ch k t p h

Peak Letter (0~20):
    a ae ya yae eo e yeo ye o wa wae oe yo u wo we wi yu eu ui i

Final Letter (0~27):
    - g kk ks n nj nh d r lg lm lb ls lt lp lh m b ps s ss ng j c k t p h

This follows the Revised Romanization of Korean which was released on July 7, 2000 as the official romanization system in South Korea.
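
The index ranges in the table match the way Unicode arranges precomposed Hangul syllables (U+AC00 to U+D7A3): each syllable is numbered as (initial * 21 + peak) * 28 + final, counting from U+AC00. The sketch below applies that arithmetic to the table, assuming it splits into 19, 21, and 28 entries as listed in the code and that "-" marks an empty position. It illustrates the lookup only and is not the module's actual implementation; romanize_syllable() is a hypothetical name.

```perl
use strict;
use warnings;

# Table entries as read from the mapping above (an assumption about how
# the flattened strings split); "-" is treated as an empty position.
my @INITIAL = qw( g kk n d tt r m b pp s ss - j jj ch k t p h );
my @PEAK    = qw( a ae ya yae eo e yeo ye o wa wae oe yo u wo we wi yu eu ui i );
my @FINAL   = qw( - g kk ks n nj nh d r lg lm lb ls lt lp lh
                  m b ps s ss ng j c k t p h );

# Hypothetical helper: romanize one precomposed Hangul syllable,
# given as a Unicode code point. Returns nothing for other code points.
sub romanize_syllable {
    my ($code) = @_;
    return if $code < 0xAC00 || $code > 0xD7A3;
    my $index   = $code - 0xAC00;
    my $initial = int( $index / ( 21 * 28 ) );   # 0..18
    my $peak    = int( $index / 28 ) % 21;       # 0..20
    my $final   = $index % 28;                   # 0..27
    my $roman   = $INITIAL[$initial] . $PEAK[$peak] . $FINAL[$final];
    $roman =~ tr/-//d;                           # drop empty positions
    return $roman;
}

print scalar( romanize_syllable(0xD55C) ), "\n";   # U+D55C
print scalar( romanize_syllable(0xAC15) ), "\n";   # U+AC15
```

This per-syllable lookup ignores any sound changes between adjacent syllables.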

Kawa.netxp © Copyright 2006-2008 Yusuke Kawasaki