Development of an AI-Powered Voice Assistant: Enhancing Speech Recognition and User Interaction

Authors

  • Ms Shiwani Gupta, Assistant Professor, Department of Computer Science, Lingaya’s Vidyapeeth, Faridabad, India
  • Mohd Haider, Department of Computer Science, Lingaya’s Vidyapeeth, Faridabad, India
  • Md Shabbir, Department of Computer Science, Lingaya’s Vidyapeeth, Faridabad, India

DOI:

https://doi.org/10.32628/IJSRSET251296

Keywords:

Artificial Intelligence, Voice Assistant, Speech Recognition, Natural Language Processing, Text-to-Speech, Human-Computer Interaction, Offline Voice Assistant

Abstract

Artificial Intelligence (AI) and Natural Language Processing (NLP) have significantly transformed human-computer interaction, enabling intelligent systems to process voice commands efficiently. Voice assistants such as Google Assistant, Amazon Alexa, and Apple Siri have set industry benchmarks, but they still face challenges in real-time response accuracy, handling of ambient noise, and offline functionality. This paper presents the development of a custom AI-powered voice assistant, focusing on improved listening, noise filtration, and command execution efficiency. The proposed system integrates the Google Speech Recognition API for real-time speech-to-text conversion, pyttsx3 for text-to-speech synthesis, and natural language processing techniques to interpret and execute commands. Unlike cloud-dependent voice assistants, this system provides offline capabilities for essential commands, so it remains usable in low-connectivity environments. Experimental results show that the assistant achieves over 91% speech recognition accuracy in controlled environments, with an average command execution time of less than one second. Planned enhancements include deep learning-based NLP models, real-time wake-word detection, and a graphical user interface for better interaction. The proposed system serves as a foundation for customizable, intelligent, and efficient AI-powered voice assistants.
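
The pipeline described in the abstract (speech-to-text, command interpretation, text-to-speech) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the intent table, the function names (`interpret`, `respond`, `listen_once`, `speak`), and the keyword-matching approach are assumptions made for the example. It relies on the third-party SpeechRecognition and pyttsx3 packages, which are imported lazily so that the interpretation logic can be exercised without a microphone or network access.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical keyword-to-intent table; the paper's actual command set is not listed here.
INTENT_PATTERNS = {
    "time": re.compile(r"\b(time|clock)\b"),
    "date": re.compile(r"\b(date|day)\b"),
    "stop": re.compile(r"\b(stop|exit|quit)\b"),
}


def interpret(text: str) -> Optional[str]:
    """Return the first intent whose pattern matches the transcript, else None."""
    lowered = text.lower()
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(lowered):
            return intent
    return None


def respond(intent: Optional[str], now: Optional[datetime] = None) -> str:
    """Build the spoken reply for an intent; runs locally with no network access."""
    now = now or datetime.now()
    if intent == "time":
        return now.strftime("The time is %H:%M")
    if intent == "date":
        return now.strftime("Today is %Y-%m-%d")
    if intent == "stop":
        return "Goodbye"
    return "Sorry, I did not understand that"


def listen_once() -> str:
    """Capture one utterance from the microphone and transcribe it.

    Uses Google's web speech endpoint via SpeechRecognition, so it needs a
    network connection and a working microphone (PyAudio).
    """
    import speech_recognition as sr  # third-party, imported lazily

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample background noise briefly to calibrate the energy threshold.
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)
    except (sr.UnknownValueError, sr.RequestError):
        # Unintelligible speech or API unreachable: return an empty transcript.
        return ""


def speak(text: str) -> None:
    """Read the reply aloud with pyttsx3, which synthesizes speech offline."""
    import pyttsx3  # third-party, imported lazily

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
```

A single interaction would then be `speak(respond(interpret(listen_once())))`. In this sketch only the transcription step depends on the network; interpretation and speech synthesis run locally, which is one plausible reading of the offline capability the abstract mentions.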

Published

01-06-2025

Section

Research Articles

How to Cite

[1]
Ms Shiwani Gupta, Mohd Haider, and Md Shabbir, “Development of an AI-Powered Voice Assistant: Enhancing Speech Recognition and User Interaction”, Int J Sci Res Sci Eng Technol, vol. 12, no. 3, pp. 717–727, Jun. 2025, doi: 10.32628/IJSRSET251296.
