Converting System of Phonetics Transcriptions to Myanmar Text Using N-grams Language Models

Authors

  • Kyaw Kyaw Maung  University of Computer Studies, Mandalay, Mandalay Division, Myanmar

Keywords:

N-grams, Unigram, Bi-grams, Trigrams, 4-grams, 5-grams, Phonetics Transcriptions, Myanmar Text

Abstract

This work converts between sequences of phonetic transcriptions and Myanmar text. Phonetic transcription is based on the pronunciation of the language, whereas Myanmar text is based on its written form. One phonetic symbol can correspond to many possible written forms, which leads to a word-sense ambiguity problem. A second problem is that neither phonetic transcriptions nor Myanmar text use spaces to mark the boundaries of syllables and words; this is a segmentation problem for matching and mapping between the two. To resolve the word-sense ambiguity, the research builds n-gram language models from correct Myanmar training data and uses the trained models to convert phonetic transcriptions to Myanmar text. Instead of computing probabilities over the trained n-gram data, the system matches the input against the entries of the trained n-gram models. Unigram, bigram, trigram, 4-gram, and 5-gram models were built to train and convert between phonetic transcriptions and Myanmar text. To solve the segmentation problem, the system breaks the input text into individual tokens, each of which may represent a consonant, a consonant cluster, or a vowel. To segment the input correctly, whether it is Myanmar text or phonetic transcriptions, the proposed system uses Unicode fonts for both.
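
The matching-based conversion described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the longest-match-first order (5-grams down to unigrams), the function names build_tables and convert, and the placeholder tokens ("p1", "M1a", and so on) are all assumptions made for illustration, and the input is assumed to be already segmented into tokens.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical table type: for each n, a phonetic n-gram maps to a Myanmar n-gram.
NGramTables = Dict[int, Dict[Tuple[str, ...], Tuple[str, ...]]]


def build_tables(
    pairs: Sequence[Tuple[List[str], List[str]]], max_n: int = 5
) -> NGramTables:
    """Collect 1..max_n-gram lookup tables from token-aligned training pairs."""
    tables: NGramTables = {n: {} for n in range(1, max_n + 1)}
    for phonetic, myanmar in pairs:
        if len(phonetic) != len(myanmar):
            raise ValueError("training pair must be aligned token by token")
        for n in range(1, max_n + 1):
            for i in range(len(phonetic) - n + 1):
                key = tuple(phonetic[i:i + n])
                # Keep the first mapping seen for each n-gram (an assumption).
                tables[n].setdefault(key, tuple(myanmar[i:i + n]))
    return tables


def convert(tokens: List[str], tables: NGramTables, max_n: int = 5) -> List[str]:
    """Convert by table matching, trying the longest n-gram first."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in tables[n]:
                out.extend(tables[n][key])
                i += n
                break
        else:
            # No n-gram matched: pass the unknown token through unchanged.
            out.append(tokens[i])
            i += 1
    return out


# Placeholder data: "p1" is ambiguous and is written "M1a" before "p2"
# but "M1b" before "p3". These are not real transcriptions.
training_pairs = [
    (["p1", "p2"], ["M1a", "M2"]),
    (["p1", "p3"], ["M1b", "M3"]),
]
tables = build_tables(training_pairs)
print(convert(["p1", "p3", "p1", "p2"], tables))
```

In this toy setup the bigram context decides which written form the ambiguous token receives, while the unigram table acts as a fallback when no longer match exists. No probabilities are computed, in line with the matching approach the abstract describes.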

Published

2015-06-25

Section

Research Articles

How to Cite

[1]
Kyaw Kyaw Maung, "Converting System of Phonetics Transcriptions to Myanmar Text Using N-grams Language Models", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 1, Issue 3, pp. 260-264, May-June 2015.