Manuscript Number : IJSRSET162346
Object Detection and Sentence Generation from Images
Authors (4) : Anakha P. J., Devika Hari, Rinku Roy, Prof. Joby George
Automatically describing the content of an image in well-formed English sentences is a challenging task. The goal of this work is to generate descriptions of image regions, and to that end a model is developed that generates natural language descriptions of images and their regions. The approach leverages datasets of images and their sentence descriptions to learn the inter-modal correspondences between language and visual data. The alignment model is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. A Multimodal Recurrent Neural Network architecture is then described that uses the inferred alignments to learn to generate novel descriptions of image regions. The alignment model produces state-of-the-art results in retrieval experiments on the Flickr8K dataset. The generated descriptions significantly outperform retrieval baselines both on full images and on a new dataset of region-level annotations.
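As a rough illustration of what a structured objective over a shared embedding can look like, the sketch below scores an image against a sentence and penalizes mismatched pairs that score too close to matched ones. The abstract does not state the scoring rule or the loss, so everything here is an assumption made for illustration: the word-to-best-region score, the max-margin ranking loss, the function names (`image_sentence_score`, `ranking_loss`), and the toy vectors, which stand in for CNN region embeddings and bidirectional RNN word embeddings.

```python
# Illustrative sketch only. The scoring rule, the loss, and all names are
# assumptions; the abstract does not specify them.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def image_sentence_score(regions, words):
    """Match every word to its best-scoring region and sum those scores."""
    return sum(max(dot(r, w) for r in regions) for w in words)

def ranking_loss(images, sentences, margin=1.0):
    """Max-margin ranking loss for a batch where images[k] pairs with sentences[k]."""
    n = len(images)
    scores = [[image_sentence_score(images[k], sentences[l]) for l in range(n)]
              for k in range(n)]
    loss = 0.0
    for k in range(n):
        for l in range(n):
            if l == k:
                continue
            # Image k should prefer its own sentence over sentence l.
            loss += max(0.0, scores[k][l] - scores[k][k] + margin)
            # Sentence k should prefer its own image over image l.
            loss += max(0.0, scores[l][k] - scores[k][k] + margin)
    return loss

# Toy 2-D embeddings (hypothetical values).
images = [
    [[1.0, 0.0], [0.8, 0.2]],  # regions of image 0
    [[0.0, 1.0], [0.1, 0.9]],  # regions of image 1
]
sentences = [
    [[1.0, 0.0], [0.9, 0.1]],  # words of sentence 0
    [[0.0, 1.0]],              # words of sentence 1
]

matched = image_sentence_score(images[0], sentences[0])
mismatched = image_sentence_score(images[0], sentences[1])
batch_loss = ranking_loss(images, sentences)
```

In this toy batch the matched pair scores higher than the mismatched one, and the loss is nonzero only where a mismatched pair comes within the margin of a matched pair. In a trained system the embeddings would be learned by minimizing such a loss rather than fixed by hand.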
Keywords : Computer vision, Object detection, RNN
Publication Details
Published in : Volume 2 | Issue 3 | May-June 2016
Anakha P. J.
Department of Computer Science, M. G. University, Kerala, India
Devika Hari
Department of Computer Science, M. G. University, Kerala, India
Rinku Roy
Department of Computer Science, M. G. University, Kerala, India
Prof. Joby George
Department of Computer Science, M. G. University, Kerala, India
Date of Publication : 2016-06-30
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 277-280
Publisher : Technoscience Academy
Journal URL : https://ijsrset.com/IJSRSET162346