Kawa.netxp [Perl] Lingua::*::Romanize::* - Romanization of CJK characters

Lingua::*::Romanize::* modules generate Roman letters from CJK characters.

The Lingua::ZH::Romanize::Pinyin module parses Chinese characters, in both Mandarin and Cantonese.

The Lingua::JA::Romanize::Japanese module parses Japanese characters, both Kanji and Kana.

The Lingua::KO::Romanize::Hangul module parses Korean Hangul characters.

Download
Module Dependencies
Methods and Usage

Download

You can download these modules from CPAN or from the links below.

Chinese: Lingua-ZH-Romanize-Pinyin-0.23.tar.gz (TARGZ, CPAN, SVN)
Japanese: Lingua-JA-Romanize-Japanese-0.23.tar.gz (TARGZ, CPAN, SVN)
Korean: Lingua-KO-Romanize-Hangul-0.20.tar.gz (TARGZ, CPAN, SVN)

These modules run on Perl 5.005, 5.6.x, and 5.8.x.

Module Dependencies

The Lingua::JA::Romanize::Japanese module requires the DB_File module.
The Jcode module is also used on Perl 5.005 through 5.8.0 (it is not required on Perl 5.8.1 or later).
Optionally, the LWP::UserAgent module is needed to download external dictionaries from SKK.

The Lingua::ZH::Romanize::Pinyin module requires the Storable module.
The Unicode::Map and Unicode::String modules are also used on Perl 5.005 and 5.6.x (they are not needed on Perl 5.8.x).

The Lingua::KO::Romanize::Hangul module does not require any external modules.
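
To see which of the dependencies listed above are present on a given Perl installation, a small check like the following may help. It is only a sketch: the helper name have_module() is hypothetical and is not part of any of the Romanize modules.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded by this Perl, 0 otherwise.
# have_module() is a hypothetical helper written for this example.
sub have_module {
    my ($name) = @_;
    ( my $file = "$name.pm" ) =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Dependencies named in this section, grouped by the module that uses them.
my %deps = (
    'Lingua::JA::Romanize::Japanese' => [ 'DB_File', 'Jcode', 'LWP::UserAgent' ],
    'Lingua::ZH::Romanize::Pinyin'   => [ 'Storable', 'Unicode::Map', 'Unicode::String' ],
    'Lingua::KO::Romanize::Hangul'   => [],
);

foreach my $module ( sort keys %deps ) {
    print "$module\n";
    foreach my $dep ( @{ $deps{$module} } ) {
        printf "    %-16s %s\n", $dep, have_module($dep) ? 'found' : 'missing';
    }
}
```

A dependency reported as missing is not necessarily a problem: as noted above, several of these modules are only used on older Perl versions or for optional features.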

Methods and Usage

The Romanize modules provide a new() constructor and the three methods below:

    use Lingua::JA::Romanize::Japanese;

    my $conv = Lingua::JA::Romanize::Japanese->new();

    my $kanji = "字";                           # one CJK character
    my $roman = $conv->char( $kanji );
    printf( "<ruby><rb>%s</rb><rt>%s</rt></ruby>", $kanji, $roman );

The char() method parses one CJK character and returns its romanized letters.
It returns undef when the argument is not a CJK character or when the conversion fails.
When multiple candidates are found, it returns them separated by a slash "/".
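
Because char() can return undef or several slash-separated candidates, callers may want to normalize its result into a list. The sketch below assumes only the return format described above. The helper name candidates() and the sample value "ji/aza" are made up for illustration and are not actual module output.

```perl
use strict;
use warnings;

# Turn a value in the format described for char() into a list of candidates.
# undef (not a CJK character, or conversion failed) gives an empty list.
# candidates() is a hypothetical helper, not a method of the Romanize modules.
sub candidates {
    my ($roman) = @_;
    return () unless defined $roman;
    return split m{/}, $roman;
}

# Hand-written sample value standing in for a char() result.
my @list = candidates("ji/aza");
print scalar(@list), " candidate(s): @list\n";

# With the real module this would be called as:
#     my @list = candidates( $conv->char($kanji) );
```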

    my $string = "文字列の場合";                # multiple CJK characters
    print $conv->chars( $string ), "\n";

The chars() method parses a string of CJK characters and returns their romanized letters separated by spaces.

This method was added in version 0.12.

    my @array = $conv->string( $string );
    foreach my $pair ( @array ) {
        my( $raw, $ruby ) = @$pair;
        if ( defined $ruby ) {
            printf( "<ruby><rb>%s</rb><rt>%s</rt></ruby>", $raw, $ruby );
        } else {
            print $raw;
        }
    }

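The string() method parses CJK characters and returns an array of pairs, each holding the original CJK character(s) and their romanized letters. As the example above shows, the romanized element may be undefined for text that was not converted.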

    $array[0]           # first pair (array reference)
    $array[1][0]        # original character(s) of the second pair
    $array[1][1]        # their romanized letters
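
The same pair structure can be rendered as plain text instead of ruby markup. The following is a sketch that works on a hand-written list of pairs. The helper name annotate() and the sample data are assumptions for illustration, and real string() output may be segmented differently.

```perl
use strict;
use warnings;

# Render [ raw, roman ] pairs as plain text: "raw(roman)" for converted
# segments, and the raw text unchanged where roman is undef.
# annotate() is a hypothetical helper, not part of the Romanize modules.
sub annotate {
    my @pairs = @_;
    return join '', map {
        my ( $raw, $roman ) = @$_;
        defined $roman ? "$raw($roman)" : $raw;
    } @pairs;
}

# Hand-written stand-in for the list returned by string().
my @sample = ( [ 'ABC', undef ], [ '漢字', 'kanji' ] );
print annotate(@sample), "\n";

# With the real module this would be called as:
#     print annotate( $conv->string($string) ), "\n";
```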

Memo

The Hangul character mapping table for Lingua::KO::Romanize::Hangul is shown below:

Initial Letter
(0～18) g kk n d tt r m b pp s ss - j jj ch k t p h

ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ

Peak Letter
(0～20) a ae ya yae eo e yeo ye o wa wae oe yo u wo we wi yu eu ui i

ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ

Final Letter
(0～27) - g kk ks n nj nh d r lg lm lb ls lt lp lh m b ps s ss ng j c k t p h

ㅤ ㄱ ㄲ ㄳ ㄴ ㄵ ㄶ ㄷ ㄹ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁ ㅂ ㅄ ㅅ ㅆ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ

This follows the Revised Romanization of Korean, which was released on July 7, 2000 as the official romanization system of South Korea.
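
The index ranges in the table (0 to 18, 0 to 20, 0 to 27) match the way Unicode arranges precomposed Hangul syllables: a syllable at code point U+AC00 + (initial * 21 + peak) * 28 + final combines the three letters with those indices. The sketch below applies the table to a single syllable using that arithmetic. It is an illustration, not the module's own code: the function name romanize_syllable() is hypothetical, and treating "-" as an empty string is an assumption.

```perl
use strict;
use warnings;

# Letter tables copied from the mapping table above.
# Assumption: "-" marks a position that contributes no letters.
my @INITIAL = map { $_ eq '-' ? '' : $_ } qw(
    g kk n d tt r m b pp s ss - j jj ch k t p h
);
my @PEAK = qw(
    a ae ya yae eo e yeo ye o wa wae oe yo u wo we wi yu eu ui i
);
my @FINAL = map { $_ eq '-' ? '' : $_ } qw(
    - g kk ks n nj nh d r lg lm lb ls lt lp lh m b ps s ss ng j c k t p h
);

# Romanize one precomposed Hangul syllable given as a Unicode code point.
# Returns undef for code points outside U+AC00..U+D7A3.
sub romanize_syllable {
    my ($code) = @_;
    return undef if $code < 0xAC00 || $code > 0xD7A3;
    my $index   = $code - 0xAC00;
    my $initial = int( $index / ( 21 * 28 ) );    # 0..18
    my $peak    = int( $index / 28 ) % 21;        # 0..20
    my $final   = $index % 28;                    # 0..27
    return $INITIAL[$initial] . $PEAK[$peak] . $FINAL[$final];
}

# U+AC00 is initial 0, peak 0, final 0; U+D55C is initial 18, peak 0, final 4.
printf "U+%04X -> %s\n", $_, romanize_syllable($_) for 0xAC00, 0xD55C;
# prints:
#   U+AC00 -> ga
#   U+D55C -> han
```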
