Manuscript Number : IJSRSET162146
Hadoop based Information Extract from Text Document
Authors (3): Deepak Motwani, V. K. Chaubey, A. S. Saxena
Hadoop is one of the most widely adopted cluster computing frameworks for processing Big Data. Although Hadoop has arguably become the standard solution for managing Big Data, it is not free from limitations. Today, researchers and students prefer documents in txt and doc formats. Most text files, however, are made available in PDF format on demand. Research papers in particular are available only in PDF format, and extracting text from PDF is one of the most difficult tasks. To extract text from multiple PDF files, we therefore have to apply algorithms that make the extraction process straightforward. Text extraction is the basic step that must be completed before any further processing. We begin with a concise discussion of keywords and then describe the steps involved in extracting text from any txt file. In this paper, we use a keyword-based extraction method to extract text from txt files; with the help of these keywords, we can retrieve all the details of the corresponding part of a research paper or any PDF file. We also use a multithreading approach. Our approach extracts text in very little time. The aim of this paper is to extract text on the basis of a particular keyword, which is useful for new researchers.
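The abstract does not give implementation details, so the following is only a minimal sketch of what keyword-based extraction from txt files with multithreading might look like. The function names extract_by_keyword and extract_from_files, the line-level matching rule, and the max_workers parameter are assumptions made for illustration; they are not taken from the paper, and the sketch uses plain Python threads rather than Hadoop MapReduce.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_by_keyword(text, keyword):
    """Return the lines of `text` that contain `keyword` (case-insensitive).

    Matching whole lines is an assumption; the paper may use another unit.
    """
    needle = keyword.lower()
    return [line.strip() for line in text.splitlines() if needle in line.lower()]


def extract_from_files(paths, keyword, max_workers=4):
    """Scan several txt files in parallel threads.

    Returns a dict mapping each file path to its matching lines.
    """
    def scan(path):
        # Read one file and keep only the lines that mention the keyword.
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        return str(path), extract_by_keyword(text, keyword)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(scan, paths))
```

Each file is scanned independently, which is why the work can be spread across threads; the same per-file step could in principle serve as the map phase of a MapReduce job.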
Deepak Motwani
Hadoop Big Data, Text Extraction, Keyword Based Extraction, Map Reduce
Publication Details
Published in :
Volume 2 | Issue 1 | January-February 2016
Department of Computer Science and Engineering, Mewar University, Rajasthan, India
V. K. Chaubey
Department of Computer Science and Engineering, Mewar University, Rajasthan, India
A. S. Saxena
Department of Computer Science and Engineering, Mewar University, Rajasthan, India
Date of Publication :
2015-02-25
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) :
156-160
Publisher : Technoscience Academy
Journal URL :
https://ijsrset.com/IJSRSET162146