The Lingua::*::Romanize::* modules generate Roman letters from CJK characters.
The Lingua::ZH::Romanize::Pinyin module parses Chinese characters, in both Mandarin and Cantonese.
The Lingua::JA::Romanize::Japanese module parses Japanese characters, both Kanji and Kana.
The Lingua::KO::Romanize::Hangul module parses Korean characters (Hangul).
You can download these modules from CPAN or from the links below.
Chinese: Lingua-ZH-Romanize-Pinyin-0.23.tar.gz (TARGZ, CPAN, SVN)
Japanese: Lingua-JA-Romanize-Japanese-0.23.tar.gz (TARGZ, CPAN, SVN)
Korean: Lingua-KO-Romanize-Hangul-0.20.tar.gz (TARGZ, CPAN, SVN)
These modules run on Perl 5.005, 5.6.x, and 5.8.x.
The Lingua::JA::Romanize::Japanese module requires the DB_File module.
The Jcode module is also used on Perl 5.005 through 5.8.0 (it is not required on Perl 5.8.1 or above).
Optionally, the LWP::UserAgent module is needed for downloading external dictionaries from SKK.
The Lingua::ZH::Romanize::Pinyin module requires the Storable module.
The Unicode::Map and Unicode::String modules are also used on Perl 5.005 and 5.6.x (but not on Perl 5.8.x).
Lingua::KO::Romanize::Hangul module does not require any other external modules.
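Before installing, it can help to see which of the prerequisite modules named above are already present. The following is a minimal sketch, not part of the Romanize distributions; the helper name module_available is hypothetical.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded on this Perl, 0 otherwise.
# (module_available is a hypothetical helper, not part of any Romanize module.)
sub module_available {
    my ($name) = @_;
    ( my $file = "$name.pm" ) =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Prerequisites mentioned in the requirements above.
for my $module (qw( DB_File Jcode LWP::UserAgent Storable Unicode::Map Unicode::String )) {
    printf "%-16s %s\n", $module, module_available($module) ? 'found' : 'missing';
}
```

Which of these modules actually matter depends on the Perl version and on the Romanize module being installed, as described above.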
The Romanize modules provide a new() constructor and the three methods below:
    use Lingua::JA::Romanize::Japanese;
    my $conv = Lingua::JA::Romanize::Japanese->new();
    my $kanji = "字";    # one CJK character
    my $roman = $conv->char( $kanji );
    printf( "<ruby><rb>%s</rb><rt>%s</rt></ruby>", $kanji, $roman );
The char() method parses one CJK character and returns its romanized letters.
It returns undef when the argument is not a CJK character or when the conversion fails.
When multiple candidates are found, it returns them separated by slashes ("/").
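Because char() can return undef or several slash-separated candidates, callers usually need to normalize the result before using it. A minimal sketch of that step follows; the helper name candidates is hypothetical, and the sample values are hand-written placeholders rather than real module output.

```perl
use strict;
use warnings;

# Turn a char()-style return value into a list of candidate readings.
# undef (not a CJK character, or conversion failed) becomes an empty list.
# (candidates is a hypothetical helper, not a method of the modules.)
sub candidates {
    my ($roman) = @_;
    return () unless defined $roman;
    return split m{/}, $roman;
}

# Placeholder values standing in for the results of $conv->char(...).
for my $roman ( "first/second", undef ) {
    my @list = candidates($roman);
    print @list ? "candidates: @list\n" : "no reading\n";
}
```

With a real converter object, the call would be candidates( $conv->char( $kanji ) ).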
    my $string = "文字列の場合";    # multiple CJK characters
    print $conv->chars( $string ), "\n";
The chars() method parses a string of CJK characters and returns their romanized letters, separated by spaces.
This method was added in version 0.12.
    my @array = $conv->string( $string );
    foreach my $pair ( @array ) {
        my( $raw, $ruby ) = @$pair;
        if ( defined $ruby ) {
            printf( "<ruby><rb>%s</rb><rt>%s</rt></ruby>", $raw, $ruby );
        } else {
            print $raw;
        }
    }
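The string() method parses a string and returns an array of pairs. Each pair is an array reference holding the original character(s) and their romanized letters; the romanized element is undefined for text that was not converted.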
    $array[0]       # first pair (reference for array)
    $array[1][0]    # second CJK character(s)
    $array[1][1]    # its romanized letters
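The same pair structure can be rendered in other ways than ruby markup. The following is a minimal sketch that produces plain text with the reading in parentheses; the helper name annotate is hypothetical, and the pairs are hand-written placeholders shaped like the return value of string(), not real module output.

```perl
use strict;
use warnings;

# Render string()-style pairs as plain text: "raw(ruby)" when a reading
# exists, the raw text alone otherwise.
# (annotate is a hypothetical helper, not a method of the modules.)
sub annotate {
    my (@pairs) = @_;
    return join '', map {
        my ( $raw, $ruby ) = @$_;
        defined $ruby ? "$raw($ruby)" : $raw;
    } @pairs;
}

# Placeholder pairs; the middle one has no reading, so $ruby is undef.
my @pairs = ( [ 'AB', 'first' ], [ ', ' ], [ 'C', 'second' ] );
print annotate(@pairs), "\n";    # prints "AB(first), C(second)"
```

With a real converter object, the call would be annotate( $conv->string( $string ) ).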
Hangul character mapping table for Lingua::KO::Romanize::Hangul is below:
Initial Letter (0~18)

| g | kk | n | d | tt | r | m | b | pp | s | ss | - | j | jj | ch | k | t | p | h |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ㄱ | ㄲ | ㄴ | ㄷ | ㄸ | ㄹ | ㅁ | ㅂ | ㅃ | ㅅ | ㅆ | ㅇ | ㅈ | ㅉ | ㅊ | ㅋ | ㅌ | ㅍ | ㅎ |

Peak Letter (0~20)

| a | ae | ya | yae | eo | e | yeo | ye | o | wa | wae | oe | yo | u | wo | we | wi | yu | eu | ui | i |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ㅏ | ㅐ | ㅑ | ㅒ | ㅓ | ㅔ | ㅕ | ㅖ | ㅗ | ㅘ | ㅙ | ㅚ | ㅛ | ㅜ | ㅝ | ㅞ | ㅟ | ㅠ | ㅡ | ㅢ | ㅣ |

Final Letter (0~27)

| - | g | kk | ks | n | nj | nh | d | r | lg | lm | lb | ls | lt | lp | lh | m | b | ps | s | ss | ng | j | c | k | t | p | h |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ㅤ | ㄱ | ㄲ | ㄳ | ㄴ | ㄵ | ㄶ | ㄷ | ㄹ | ㄺ | ㄻ | ㄼ | ㄽ | ㄾ | ㄿ | ㅀ | ㅁ | ㅂ | ㅄ | ㅅ | ㅆ | ㅇ | ㅈ | ㅊ | ㅋ | ㅌ | ㅍ | ㅎ |
This follows the Revised Romanization of Korean which was released on July 7, 2000 as the official romanization system in South Korea.
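The index ranges in the mapping table (0~18, 0~20, 0~27) line up with the way Unicode arranges precomposed Hangul syllables: starting at U+AC00, each syllable's offset encodes an initial, a peak, and a final index. The following is a minimal sketch of a per-syllable lookup built on that arithmetic and on the table above. It is not the module's actual implementation, which may apply additional rules; the function names hangul_indexes and romanize_syllable are hypothetical.

```perl
use strict;
use warnings;

# Romanization tables copied from the mapping table above.
my @INITIAL = qw( g kk n d tt r m b pp s ss - j jj ch k t p h );
my @PEAK    = qw( a ae ya yae eo e yeo ye o wa wae oe yo u wo we wi yu eu ui i );
my @FINAL   = qw( - g kk ks n nj nh d r lg lm lb ls lt lp lh m b ps s ss ng j c k t p h );

# Split one precomposed Hangul syllable (U+AC00..U+D7A3) into its
# initial, peak, and final indexes. Returns an empty list otherwise.
sub hangul_indexes {
    my ($char) = @_;
    my $offset = ord($char) - 0xAC00;
    return () if $offset < 0 || $offset > 11171;    # 19 * 21 * 28 - 1
    my $initial = int( $offset / ( 21 * 28 ) );
    my $peak    = int( $offset / 28 ) % 21;
    my $final   = $offset % 28;
    return ( $initial, $peak, $final );
}

# Look up the three parts in the tables; "-" entries contribute no letter.
sub romanize_syllable {
    my ($char) = @_;
    my @idx = hangul_indexes($char) or return;
    my ( $i, $p, $f ) = @idx;
    return join '', grep { $_ ne '-' } $INITIAL[$i], $PEAK[$p], $FINAL[$f];
}

print romanize_syllable("\x{AC00}"), "\n";    # U+AC00 (가) prints "ga"
print romanize_syllable("\x{D55C}"), "\n";    # U+D55C (한) prints "han"
```

This sketch treats each syllable in isolation, so it ignores any sound changes that occur across syllable boundaries.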
Kawa.netxp © Copyright 2006-2008 Yusuke Kawasaki