Myanmar Continuous Speech to Isolated Word Segmentation

Taryar Myo Tun; Khin Thida Lynn

doi:10.32628/IJSRSET151339

Authors

Taryar Myo Tun, University of Computer Studies, Mandalay, Myanmar
Khin Thida Lynn

Keywords:

Myanmar language, Energy based VAD, ZCR, Myanmar Tone Length

Abstract

This paper proposes a word segmentation method for Myanmar continuous speech. The system performs speech processing that includes segment boundary detection for isolated words, using zero-crossing, duration, and energy techniques. Inaccurate segment boundaries are a major cause of errors in automatic speech recognition, so a pre-processing stage that segments the speech signal into periods of speech and non-speech is invaluable for improving recognition accuracy. We propose a combination of three audio features for speech/non-speech detection: energy-based voice activity detection (VAD), zero-crossing rate (ZCR), and duration length. Each feature has unique properties that help differentiate speech from non-speech segments. We evaluate the method by dividing each speech sample into segments and using the zero-crossing rate, energy-based VAD, and Myanmar tone length to separate the speech into parts. The algorithm is tested on speech samples recorded as sentences of Myanmar speech. The results show that the algorithm segments almost 98.5% of the Myanmar words across all recorded sentences.
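As an illustration of how the three features named in the abstract might be combined, the following is a minimal sketch and not the authors' implementation. The frame length, both thresholds, the minimum run length, and the decision rule itself (high energy together with low ZCR, followed by a duration filter standing in for the tone-length check) are assumptions chosen for the example; the function names are hypothetical.

```python
def frame_features(samples, frame_len):
    """Return (mean energy, zero-crossing rate) for each non-overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Short-time energy: mean of squared amplitudes in the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


def segment_speech(samples, frame_len=160, energy_thresh=0.01,
                   zcr_thresh=0.5, min_frames=3):
    """Return (start_sample, end_sample) pairs for sufficiently long speech runs.

    All default values are illustrative assumptions, not values from the paper.
    """
    segments = []
    run_start = None
    feats = frame_features(samples, frame_len)
    # A trailing non-speech sentinel frame closes a run that reaches the end.
    for i, (energy, zcr) in enumerate(feats + [(0.0, 1.0)]):
        # Assumed rule: a frame counts as speech if it is loud and not noise-like.
        is_speech = energy > energy_thresh and zcr < zcr_thresh
        if is_speech and run_start is None:
            run_start = i
        elif not is_speech and run_start is not None:
            # Duration check: discard runs shorter than min_frames.
            if i - run_start >= min_frames:
                segments.append((run_start * frame_len, i * frame_len))
            run_start = None
    return segments
```

Calling `segment_speech(samples)` on a list of amplitude values scaled to the range -1 to 1 returns sample-index boundaries for each detected segment. In practice the thresholds would need tuning against the recording conditions and the tone durations of the language.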

