Automated Image Captioning: Harnessing Machine Learning for Image Description Generation
Keywords:
Computer Vision, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Xception, Flickr 8K, LSTM, Preprocessing
Abstract
Advancements in computer vision have led to its widespread application across various domains. This project focuses on a specific aspect of computer vision: image captioning. Generating descriptive natural language for images remains a challenging task, but recent research has made significant progress, particularly for still images. Whereas earlier efforts concentrated primarily on video content, attention has since shifted towards producing image descriptions in natural language that is readily understandable to humans. Our project leverages convolutional neural networks (CNNs) and explores various hyperparameters, training on the extensive Flickr8k dataset with pretrained CNN architectures such as ResNet. By feeding the outputs of these image encoders into recurrent neural networks (RNNs), we seek to generate accurate captions for images. This paper provides a comprehensive overview of the architecture and methodology employed in our image captioning model.
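As a concrete illustration of this encoder–decoder design, the following is a minimal sketch in Keras, assuming a frozen Xception backbone that yields 2048-dimensional image features, a merge-style LSTM decoder, and placeholder values for the vocabulary size and maximum caption length; these specifics are illustrative assumptions, not details taken from the paper itself.

```python
from tensorflow.keras.applications.xception import Xception
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 8000  # assumed vocabulary size after text preprocessing
max_len = 34       # assumed maximum caption length in tokens

# Encoder: Xception without its classification head; global average
# pooling yields one 2048-d feature vector per image. Typically used
# offline to precompute features for every image in the dataset.
cnn = Xception(include_top=False, pooling='avg')
cnn.trainable = False

# Decoder branch 1: project the precomputed image feature to 256-d.
img_feat = Input(shape=(2048,), name='image_feature')
fe = Dropout(0.5)(img_feat)
fe = Dense(256, activation='relu')(fe)

# Decoder branch 2: embed the partial caption and summarize it with an LSTM.
seq_in = Input(shape=(max_len,), name='partial_caption')
se = Embedding(vocab_size, 256, mask_zero=True)(seq_in)
se = Dropout(0.5)(se)
se = LSTM(256)(se)

# Merge both branches and predict the next word of the caption.
merged = add([fe, se])
merged = Dense(256, activation='relu')(merged)
out = Dense(vocab_size, activation='softmax')(merged)

model = Model(inputs=[img_feat, seq_in], outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

In a setup like this, the Xception features for each training image would be computed once with `cnn.predict(...)`, and the decoder trained to predict the next word of each partial caption; at inference time, captions are generated word by word (e.g., greedily) until an end-of-sequence token is produced.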