Development of an AI-Powered Voice Assistant: Enhancing Speech Recognition and User Interaction

Ms Shivani Gupta; Mohd Haider; Md Shabbir

doi:10.32628/IJSRSET251296

Authors

Ms Shivani Gupta, Assistant Professor, Department of Computer Science, Lingaya’s Vidyapeeth, Faridabad, India
Mohd Haider, Department of Computer Science, Lingaya’s Vidyapeeth, Faridabad, India
Md Shabbir, Department of Computer Science, Lingaya’s Vidyapeeth, Faridabad, India

Keywords:

Artificial Intelligence, Voice Assistant, Speech Recognition, Natural Language Processing, Text-to-Speech, Human-Computer Interaction, Offline Voice Assistant

Abstract

Artificial Intelligence (AI) and Natural Language Processing (NLP) have significantly transformed human-computer interaction, enabling intelligent systems to process voice commands efficiently. Voice assistants such as Google Assistant, Amazon Alexa, and Apple Siri have set industry benchmarks, but they still face challenges in real-time response accuracy, ambient-noise handling, and offline functionality. This paper presents the development of a custom AI-powered voice assistant that focuses on improving speech capture, noise filtering, and command execution efficiency. The proposed system integrates the Google Speech Recognition API for real-time speech-to-text conversion, pyttsx3 for text-to-speech synthesis, and natural language processing techniques to interpret and execute commands. Unlike cloud-dependent voice assistants, the system provides offline capabilities for essential commands, keeping it usable in low-connectivity environments. Experimental results show that the assistant achieves over 91% speech recognition accuracy in controlled environments, with an average command execution time of less than one second. Planned enhancements include deep learning-based NLP models, real-time wake-word detection, and a graphical user interface. The proposed system serves as a foundation for customizable, intelligent, and efficient AI-powered voice assistants.
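The pipeline described in the abstract (speech-to-text, command interpretation, text-to-speech, with a reduced offline path) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the command table, the helper names (`normalize`, `interpret`, `handle`, `listen_once`, `speak`, `run_once`), the keyword-matching approach, and the use of PocketSphinx as the offline recognizer are all hypothetical, since the abstract does not specify them. The audio functions assume the `SpeechRecognition` package (with PyAudio) and `pyttsx3` are installed; they are imported lazily so the command logic can be exercised without a microphone.

```python
import re
from datetime import datetime
from typing import Callable, Optional


def tell_time(_: str) -> str:
    """Hypothetical offline command: report the local time."""
    return "The time is " + datetime.now().strftime("%H:%M")


def greet(_: str) -> str:
    """Hypothetical offline command: respond to a greeting."""
    return "Hello, how can I help?"


# Assumed command table: keyword pattern -> handler. All entries run locally.
OFFLINE_COMMANDS = [
    (re.compile(r"\b(time|clock)\b"), tell_time),
    (re.compile(r"\b(hello|hi|hey)\b"), greet),
]


def normalize(text: str) -> str:
    """Lowercase the transcript and drop punctuation before matching."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).strip()


def interpret(transcript: str) -> Optional[Callable[[str], str]]:
    """Return the first handler whose pattern matches, or None."""
    cleaned = normalize(transcript)
    for pattern, handler in OFFLINE_COMMANDS:
        if pattern.search(cleaned):
            return handler
    return None


def handle(transcript: str) -> str:
    """Map a transcript to the text the assistant should speak."""
    handler = interpret(transcript)
    if handler is None:
        return "Sorry, I did not understand that."
    return handler(transcript)


def listen_once() -> Optional[str]:
    """Capture one utterance and return its transcript, or None."""
    import speech_recognition as sr  # third-party; imported lazily

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate the energy threshold against background noise.
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)  # needs connectivity
    except sr.UnknownValueError:
        return None  # speech was unintelligible
    except sr.RequestError:
        # Assumption: fall back to PocketSphinx when the web API is
        # unreachable. The abstract does not name an offline recognizer.
        try:
            return recognizer.recognize_sphinx(audio)
        except (sr.UnknownValueError, sr.RequestError):
            return None


def speak(text: str) -> None:
    """Speak text aloud with pyttsx3, which runs offline."""
    import pyttsx3  # third-party; imported lazily

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


def run_once() -> None:
    """One listen -> interpret -> speak cycle."""
    transcript = listen_once()
    speak(handle(transcript) if transcript else "I did not catch that.")
```

Calling `run_once()` performs a single interaction; wrapping it in a loop would give a continuously listening assistant. Keeping `handle` free of audio dependencies means the command logic can be checked with plain strings, for example `handle("Hello there!")`.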
