A Study on Automatic Speech Recognition

Authors

  • Viren Nivangune  Department of Computer Engineering, Zeal College of Engineering and Research, Pune, Maharashtra, India
  • Argade Siddhi Anil  Department of Computer Engineering, Zeal College of Engineering and Research, Pune, Maharashtra, India
  • Mrudula Milind Patankar  Department of Computer Engineering, Zeal College of Engineering and Research, Pune, Maharashtra, India

Abstract

Speech is a natural and convenient means of communication between humans, but nowadays humans are no longer limited to communicating with each other: they also interact with the various machines in their lives, the most important of which is the computer. Speech can therefore serve as a communication technique between computers and humans. This interaction takes place through interfaces, an area known as Human-Computer Interaction (HCI). This paper gives an overview of the main definitions of Automatic Speech Recognition (ASR), an important domain of artificial intelligence, and of the factors that should be taken into account in any related research (type of speech, vocabulary size, etc.). It also summarizes important research on speech processing from the last few years, presents a general idea of our proposal, which can be considered a contribution to this area of research, and concludes by pointing to certain enhancements that could be addressed in future work.

References

  1. Laszlo, T. (2018). Deep Neural Networks with Linearly Augmented Rectifier Layers for Speech Recognition, SAMI 2018, IEEE 16th World Symposium on Applied Machine Intelligence and Informatics, February 7-10, Košice, Herl’any, Slovakia.
  2. Yuki, S., Shinnosuke, T. (2018). Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26 (1).
  3. Lilia, L., Mohamed, T. L., Rachid, B. (2017). Discriminant Learning for Hybrid HMM/MLP Speech Recognition System using a Fuzzy Genetic Clustering, Intelligent Systems Conference 2017, 7-8 September, London, UK.
  4. Abhijit, M., Vinay, K. M. (2017). Human Emotional States Classification Based upon Changes in Speech Production Features in Vowel Regions, 2017 2nd International Conference on Telecommunication and Networks (TEL-NET 2017).
  5. Michael, P., James, G., Anantha, P. C. (2018). A Low-Power Speech Recognizer and Voice Activity Detector Using Deep Neural Networks, IEEE Journal of Solid-State Circuits, 53(1).
  6. Stefan, H., Marco, D., Christian, R., Fabrice, L., Patrick, L., Renato, D., Alessandro, M., Hermann, N., Giuseppe, R. (2011). Comparing stochastic approaches to spoken language understanding in multiple languages, IEEE Transactions on Audio, Speech, and Language Processing, 19 (6) 1569–1583.
  7. Edwin, S., Sahar, G., Nathalie, C., Yannick, E., Renato, D. (2017). ASR error management for improving spoken language understanding, arXiv: 1705.09515v1[cs.CL].
  8. Gregory, G., Jean-Luc, G. (2018). Optimization of RNN-Based Speech Activity Detection, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26 (3).
  9. Gregory, G., Jean-Luc, G. (2015). Minimum Word Error Training of RNN-based Voice Activity Detection, INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany.
  10. Dominique, F., Odile, M., Irina, I. (2017). New Paradigm in Speech Recognition: Deep Neural Networks, the ContNomina project, supported by the French National Research Agency (ANR).
  11. Luiza, O. (2015). Reconnaissance de la parole pour l’aide à la communication pour les sourds et malentendants [Speech recognition as a communication aid for the deaf and hard of hearing], Université de Lorraine, Laboratoire Lorrain de Recherche en Informatique et ses Applications - UMR 7503.
  12. https://www.voicebox.com/wp-content/uploads/2017/05/Automatic-Speech-Recognition-Overview-and-Core-Technology.pdf, © 2017 Voicebox Technologies Corporation, voicebox.com.
  13. Xuedong, H., Li, D. (2009). An Overview of Modern Speech Recognition, Indurkhya/Handbook of Natural Language Processing C5921_C01, 339 -344, Microsoft Corporation.
  14. Julien, A. (2003). Approche de la Reconnaissance Automatique de la Parole [An Approach to Automatic Speech Recognition], Rapport cycle probatoire, CNAM.
  15. Anusuya, M. A., Katti, S. K. (2009). Speech Recognition by Machine: A Review, (IJCSIS) International Journal of Computer Science and Information Security, 6(3).
  16. Preeti, S., Parneet, K. (2013). Automatic Speech Recognition: A Review, International Journal of Engineering Trends and Technology, 4(2), 2013, http://www.internationaljournalssrg.org
  17. Santosh, K. G., Bharti, W. G., Pravin, Y. (2010). A Review on Speech Recognition Technique, International Journal of Computer Applications (0975 – 8887) 10(3), (November).
  18. Vrinda, Shekhar, C. (2013). Speech Recognition System for English Language, International Journal of Advanced Research in Computer and Communication Engineering, 2(1), January 2013, ISSN (Print): 2319-5940, ISSN (Online): 2278-1021, www.ijarcce.com.
  19. http://www.speech.cs.cmu.edu/comp.speech/Section6/Q6.1.html, date of consultation: 29/08/2018.
  20. Pegah, G., Jasha, D., Michael, L. S. (2016). Linearly augmented deep neural network, in Proc. ICASSP, 5085–5089.
  21. Sylvain, G., Guillaume, G., Laura, C. (2009). The ESTER2 Evaluation Campaign for the Rich Transcription of French Radio Broadcasts, Brighton, UK, Proceedings of Interspeech.
  22. Yannick, E., Thierry, B., Jean-Yves, A., Frédéric, B. (2010). The EPAC corpus: manual and automatic annotations of conversational speech in French broadcast news, In: Proceedings of the International Conference on Language Resources and Evaluation (LREC).
  23. Guillaume, G., Gilles, A., Niklas, P., Matthieu, C., Aude, G., Olivier, G. (2012). The ETAPE corpus for the evaluation of speech-based TV content processing in the French language, In: Proceedings of the International Conference on Language Resources, Evaluation and Corpora (LREC).
  24. Nicolás, M., John, H. L. H., Doroteo, T. T. (2005). MFCC Compensation for improved recognition of filtered and band-limited speech, Center for Spoken Language Research, University of Colorado at Boulder, Boulder (CO), USA.
  25. Santosh, K. G., Bharti, W. G., Pravin, Y. (2010). A Review on Speech Recognition Technique, International Journal of Computer Applications (0975 – 8887), 10(3).
  26. Manoj, K. S., Omendri, K. (2015). Speech Recognition: A Review, Special Conference Issue: National Conference on Cloud Computing & Big Data.
  27. Benjamin, B. (2016). Reconnaissance Automatique de la Parole pour la transcription et le sous-titrage de contenus audio et vidéo [Automatic Speech Recognition for the transcription and subtitling of audio and video content], 52 av. P. Sémard - 94200 Ivry-sur-Seine, Authôt.com - November 2016, 64.
  28. Gravier, G., Adda, G., Paulson, N., Carré, M., Giraudel, A., Galibert, O. (2012). The ETAPE corpus for the evaluation of speech-based TV content processing in the French language, LREC Eighth International Conference on Language Resources and Evaluation.
  29. Kahn, J., Galibert, O., Quintard, L., Carré, M., Giraudel, A., Joly, P. (2012). A presentation of the REPERE challenge, Content-Based Multimedia Indexing (CBMI), 2012 10th International Workshop on, IEEE, 1-6.
  30. Zied, E., Benjamin, L., Olivier, G., Laurent, B. (2019). Prédiction de performance des systèmes de reconnaissance automatique de la parole à l’aide de réseaux de neurones convolutifs [Performance prediction of automatic speech recognition systems using convolutional neural networks], HAL Id: hal-01976284, TAL, Volume 59, no 2/2018.
  31. Galliano, S., Geoffrois, E., Mostefa, D., Choukri, K., Bonastre, J.-F., Gravier, G. (2005). The ESTER phase II evaluation campaign for the rich transcription of French broadcast news, Interspeech, 1149-1152.

Published

2022-02-28

Section

Research Articles

How to Cite

[1]
Viren Nivangune, Argade Siddhi Anil, Mrudula Milind Patankar, "A Study on Automatic Speech Recognition", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 9, Issue 1, pp. 318-326, January-February 2022.