Convert Cyrillic to Latin

Hello everyone! 

Recently I searched for a way to convert Cyrillic to Latin with Caché ObjectScript but didn't find anything, so I decided to write it myself.

Here is the code:

// Create the class method

ClassMethod convertRussionToEnglish(word As %String) As %String
{

    // Transliteration table: a list of (Cyrillic, Latin) pairs
    Set convertArray = $LB(
        $LB("а","a"),$LB("б","b"),$LB("в","v"),$LB("г","g"),$LB("д","d"),$LB("е","e"),$LB("ё","e"),$LB("ж","zh"),$LB("з","z"),
        $LB("и","i"),$LB("й","y"),$LB("к","k"),$LB("л","l"),$LB("м","m"),$LB("н","n"),$LB("о","o"),$LB("п","p"),
        $LB("р","r"),$LB("с","s"),$LB("т","t"),$LB("у","u"),$LB("ф","f"),$LB("х","kh"),$LB("ц","ts"),$LB("ч","ch"),
        $LB("ш","sh"),$LB("щ","shch"),$LB("ы","y"),$LB("э","e"),$LB("ю","yu"),$LB("я","ya"),$LB("ъ",""),$LB("ь",""),
        $LB("А","A"),$LB("Б","B"),$LB("В","V"),$LB("Г","G"),$LB("Д","D"),$LB("Е","E"),$LB("Ё","E"),$LB("Ж","ZH"),$LB("З","Z"),
        $LB("И","I"),$LB("Й","Y"),$LB("К","K"),$LB("Л","L"),$LB("М","M"),$LB("Н","N"),$LB("О","O"),$LB("П","P"),
        $LB("Р","R"),$LB("С","S"),$LB("Т","T"),$LB("У","U"),$LB("Ф","F"),$LB("Х","KH"),$LB("Ц","TS"),$LB("Ч","CH"),
        $LB("Ш","SH"),$LB("Щ","SHCH"),$LB("Ы","Y"),$LB("Э","E"),$LB("Ю","YU"),$LB("Я","YA"),$LB("Ъ",""),$LB("Ь","")    
    )
    

    // Text to convert; falls back to a sample phrase when no argument is passed
    Set wordToConvert = $Get(word, "Пример для Кода")
    Set wordToConvertLength = $L(wordToConvert)
    
    Set cnt=$ListLength(convertArray)
    Set latinWord = ""
    

    // Loop over each character and look it up in the transliteration table
    for i=1:1:wordToConvertLength {
        
        Set cyrillicWord = $E(wordToConvert,i)
        
        for j=1:1:cnt {
            Set codes=$ListGet(convertArray,j)
            Set cyrillicLetter=$ListGet(codes,1)
            Set latinLetter=$ListGet(codes,2)

            if cyrillicLetter=cyrillicWord {
                Set cyrillicWord = latinLetter    
            }
        }
        Set latinWord = latinWord_cyrillicWord

        
    }
    // Return the converted string
    Quit latinWord

}

Comments

Interesting. Why did you duplicate the lowercase and uppercase letters? I'm also not sure it's good to uppercase every letter of a multi-letter replacement when only the single original letter was uppercase: Юла -> YUla looks weird. I think it should check the case of the original word: if it is entirely uppercase, the resulting word should be uppercased too, but if only the first letter is uppercase, the resulting string should go through $zconvert(word, "W").
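The case rule suggested above could be sketched roughly as follows. This is only an illustration: `fixCase` is a hypothetical helper that is not part of the original post, it handles a single word at a time, and it assumes a Unicode instance where $zconvert changes the case of Cyrillic letters.

```objectscript
/// Hypothetical helper: adjust the case of one transliterated word
/// so that it follows the case pattern of the Cyrillic source word.
ClassMethod fixCase(source As %String, latin As %String) As %String
{
    // Source is entirely uppercase (and contains at least one cased letter)
    if (source = $zconvert(source, "U")) && (source '= $zconvert(source, "L")) {
        quit $zconvert(latin, "U")
    }
    // Only the first letter is uppercase: lowercase everything, then capitalize
    set first = $extract(source, 1)
    if first '= $zconvert(first, "L") {
        quit $zconvert($zconvert(latin, "L"), "W")
    }
    // Otherwise leave the transliteration as it is
    quit latin
}
```

With this sketch, fixCase("Юла", "YUla") would be expected to give Yula, while a one-letter word such as "Я" counts as fully uppercase and stays YA.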

I was looking for a quick solution for my task and took the letter mapping from the Internet.

Here is a version that does less searching:

ClassMethod getDict(Output dict)
{
    kill dict
    set dict("а")="a"
    set dict("б")="b"
    set dict("в")="v"
    set dict("г")="g"
    set dict("д")="d"
    set dict("е")="e"
    set dict("ё")="e"
    set dict("ж")="zh"
    set dict("з")="z"
    set dict("и")="i"
    set dict("й")="y"
    set dict("к")="k"
    set dict("л")="l"
    set dict("м")="m"
    set dict("н")="n"
    set dict("о")="o"
    set dict("п")="p"
    set dict("р")="r"
    set dict("с")="s"
    set dict("т")="t"
    set dict("у")="u"
    set dict("ф")="f"
    set dict("х")="kh"
    set dict("ц")="ts"
    set dict("ч")="ch"
    set dict("ш")="sh"
    set dict("щ")="shch"
    set dict("ъ")=""
    set dict("ы")="y"
    set dict("ь")=""
    set dict("э")="e"
    set dict("ю")="yu"
    set dict("я")="ya"
}

/// w ##class(Test.Cyr).convertRussionToEnglish()
ClassMethod convertRussionToEnglish(word As %String = "Привет") As %String
{
    do ..getDict(.dict)
    set out = ""
    for i=1:1:$l(word) {
        set letter = $e(word, i)
        set letterL = $zcvt(letter, "l")
        // Characters without a mapping (spaces, punctuation, Latin) pass through unchanged
        set outLetter = $get(dict(letterL), letter)
        set:letter'=letterL outLetter = $zcvt(outLetter, "U")
        set out = out _ outLetter
    }
    quit out
}

There is a way to do something similar in Caché and IRIS. In a Russian locale, you have access to the "KOI8R" I/O translation table. KOI8-R has the funny property that if you mask out the high-order bit, you get a sort of readable transliteration. Here's an example using a Unicode instance in the "rusw" locale:

    USER>s koi8=$zcvt("Пример для Кода","O","KOI8R")

    USER>s ascii="" f i=1:1:$l(koi8) s ascii=ascii_$c($zb($a(koi8,i),127,1))

    USER>zw ascii
    ascii="pRIMER DLQ kODA"
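
The case comes out inverted because KOI8-R places lowercase Cyrillic where ASCII has uppercase letters, and vice versa. A minimal sketch of flipping it back with $translate, continuing the same session (the `lower` and `upper` variables are introduced here only for illustration):

```objectscript
    USER>s lower="abcdefghijklmnopqrstuvwxyz", upper="ABCDEFGHIJKLMNOPQRSTUVWXYZ"

    USER>w $tr(ascii, lower_upper, upper_lower)
    Primer dlq Koda
```

The letter mapping itself stays KOI8-R specific, so "я" still comes out as "q".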

Mine is faster ;)

ClassMethod RussianToEnglish(russian = "привет") As %String
{
    set rus="абвгдезийклмнопрстуфхыэАБВГДЕЗИЙКЛМНОПРСТУФХЫЭьЬъЪ"
    set eng="abvgdeziyklmnoprstufhyeABVGDEZIYKLMNOPRSTUFHYE"
    set rus("ж")="zh"
    set rus("ц")="ts"
    set rus("ч")="ch"
    set rus("ш")="sh"
    set rus("щ")="shch"
    set rus("ю")="yu"
    set rus("я")="ya"
    set rus("Ж")="Zh"
    set rus("Ц")="Ts"
    set rus("Ч")="Ch"
    set rus("Ш")="Sh"
    set rus("Щ")="Shch"
    set rus("Ю")="Yu"
    set rus("Я")="Ya"
    set english=$tr(russian,rus,eng)

    set wow=$O(rus(""))
    while wow'="" {
        set english=$Replace(english,wow,rus(wow))
        set wow=$O(rus(wow))
    }
    return english
}

USER>w ##class(Example.ObjectScript).RussianToEnglish("Я вас любил: любовь еще, быть может, В душе моей угасла не совсем;")
Ya vas lyubil: lyubov eshche, byt mozhet, V dushe moey ugasla ne sovsem;
USER>

Here's my new one-liner. Now 6 times faster.

ClassMethod convertRussionToEnglish4(russian = "привет") As %String [ CodeMode = expression ]
{
$tr($zcvt($replace($replace($tr(russian, "абвгдезийклмнопрстуфхыэАБВГДЕЗИЙКЛМНОПРСТУФХЫЭЖЦЧШЮЯжцчшюяьЬъЪ", "abvgdeziyklmnoprstufhyeABVGDEZIYKLMNOPRSTUFHYE婨味䍨卨奵奡穨瑳捨獨祵祡"),"щ","shch"),"Ш","Sh"),"O","UnicodeBig"),$c(0))
}
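
Reading the one-liner, the trick appears to be that each two-letter replacement is packed into a single 16-bit character whose high byte is the first Latin letter and whose low byte is the second. Converting to "UnicodeBig" then splits every character into two bytes, and the final $tr drops the $c(0) bytes left over from plain ASCII characters. A minimal sketch of that idea, assuming a Unicode instance; `pack` and `unpack` are hypothetical helper names, not part of the original code:

```objectscript
/// Hypothetical helper: pack a two-letter ASCII pair into one 16-bit character,
/// e.g. "Zh" becomes $char(90*256+104).
ClassMethod pack(pair As %String) As %String [ CodeMode = expression ]
{
$char($ascii(pair, 1) * 256 + $ascii(pair, 2))
}

/// Hypothetical helper: split every character into its two big-endian bytes
/// and remove the zero bytes produced by plain ASCII characters.
ClassMethod unpack(text As %String) As %String [ CodeMode = expression ]
{
$translate($zconvert(text, "O", "UnicodeBig"), $char(0))
}
```

Under these assumptions, ..unpack(..pack("Zh")) should give back "Zh". Three- and four-letter replacements such as "shch" do not fit into one character, which is why they still need $replace.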

Here are the tests:

do ##class(Test.Cyr).Time()
Method: convertRussionToEnglish1, time: .009022 <- original
Method: convertRussionToEnglish2, time: .000689 <- my first idea
Method: convertRussionToEnglish3, time: .000417 <- Evgeny
Method: convertRussionToEnglish4, time: .000072 <- this version
Method: convertRussionToEnglish5, time: .000124 <- Jon
 
Complete code

Eduard,

You just forgot about "Щ" in your awesome one-liner, while the $replace of "Ш" is redundant. So it should look like this:

$tr($zcvt($replace($replace($tr(russian, "абвгдезийклмнопрстуфхыэАБВГДЕЗИЙКЛМНОПРСТУФХЫЭЖЦЧШЮЯжцчшюяьЬъЪ", "abvgdeziyklmnoprstufhyeABVGDEZIYKLMNOPRSTUFHYE婨味䍨卨奵奡穨瑳捨獨祵祡"),"щ","shch"),"Щ","Shch"),"O","UnicodeBig"),$c(0))

You're right, I need to change the $replace of Ш to a $replace of Щ. Ш is already handled by $translate anyway.

ClassMethod convertRussionToEnglish4(russian = "привет") As %String [ CodeMode = expression ]
{
$tr($zcvt($replace($replace($tr(russian, "абвгдезийклмнопрстуфхыэАБВГДЕЗИЙКЛМНОПРСТУФХЫЭЖЦЧШЮЯжцчшюяьЬъЪ", "abvgdeziyklmnoprstufhyeABVGDEZIYKLMNOPRSTUFHYE婨味䍨卨奵奡穨瑳捨獨祵祡"),"щ","shch"),"Щ","Shch"),"O","UnicodeBig"),$c(0))
}

The actual rules used for transliterating names and surnames are more complex, as they can be phonetically dependent. E.g. "Егор" -> "Egor", but "Иеремия" -> "Iyeremiya".
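
As a rough illustration of such a context-dependent rule, the sketch below rewrites "е" as "ye" when it follows a vowel and leaves everything else untouched, so the result can then be passed to any of the converters above that let Latin letters through. The rule itself is an assumption inferred from the two examples, not an official transliteration standard, and `preprocessE` is a hypothetical name. Uppercase "Е" after a vowel also becomes lowercase "ye" in this simplified version.

```objectscript
/// Hypothetical pre-processing step: "е" after a vowel becomes "ye".
ClassMethod preprocessE(word As %String) As %String
{
    set vowels = "аеёиоуыэюя"
    set out = ""
    set prev = ""
    for i=1:1:$length(word) {
        set ch = $extract(word, i)
        set lower = $zconvert(ch, "L")
        // prev '= "" is needed because the contains operator is true for an empty string
        if (lower = "е") && (prev '= "") && (vowels [ prev) {
            set out = out _ "ye"
        } else {
            set out = out _ ch
        }
        set prev = lower
    }
    quit out
}
```

Under this assumption "Иеремия" becomes "Иyeремия" before the main conversion, while "Егор" is left unchanged because its "Е" is word-initial.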