Automated Image Captioning: Harnessing Machine Learning for Image Description Generation
Keywords:
Computer Vision, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Xception, Flickr 8K, LSTM, Preprocessing
Abstract
Advancements in computer vision have led to its widespread application across various domains. This project focuses on a specific aspect of computer vision: image captioning. Generating descriptive natural language for images remains a challenging task, but recent research has made significant progress, particularly for still images. Whereas earlier efforts concentrated primarily on video content, attention has since shifted towards producing image descriptions in natural language that is readily understandable to humans. Our project leverages convolutional neural networks (CNNs) and explores various hyperparameters, training on the extensive Flickr8k dataset with pretrained CNN architectures such as ResNet. By feeding the outputs of these image encoders into recurrent neural networks (RNNs), we seek to generate accurate captions for images. This paper provides a comprehensive overview of the architecture and methodology employed in our image captioning model.
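As a concrete illustration of this encoder–decoder design, the following is a minimal sketch in Keras, assuming a frozen Xception backbone that yields 2048-dimensional image features, a merge-style LSTM decoder, and placeholder values for the vocabulary size and maximum caption length; these specifics are illustrative assumptions, not details taken from the paper itself.

```python
from tensorflow.keras.applications.xception import Xception
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 8000  # assumed vocabulary size after text preprocessing
max_len = 34       # assumed maximum caption length in tokens

# Encoder: Xception without its classification head; global average
# pooling yields one 2048-d feature vector per image. Typically used
# offline to precompute features for every image in the dataset.
cnn = Xception(include_top=False, pooling='avg')
cnn.trainable = False

# Decoder branch 1: project the precomputed image feature to 256-d.
img_feat = Input(shape=(2048,), name='image_feature')
fe = Dropout(0.5)(img_feat)
fe = Dense(256, activation='relu')(fe)

# Decoder branch 2: embed the partial caption and summarize it with an LSTM.
seq_in = Input(shape=(max_len,), name='partial_caption')
se = Embedding(vocab_size, 256, mask_zero=True)(seq_in)
se = Dropout(0.5)(se)
se = LSTM(256)(se)

# Merge both branches and predict the next word of the caption.
merged = add([fe, se])
merged = Dense(256, activation='relu')(merged)
out = Dense(vocab_size, activation='softmax')(merged)

model = Model(inputs=[img_feat, seq_in], outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

In a setup like this, the Xception features for each training image would be computed once with `cnn.predict(...)`, and the decoder trained to predict the next word of each partial caption; at inference time, captions are generated word by word (e.g., greedily) until an end-of-sequence token is produced.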