Classifying Blocks of Page Layout of a Document

Prof. Prashant Gadakh; Prof. Ramkrushna M; Prof. Bailappa Bhovi; Prof. Malayaj Kumar

doi:10.32628/IJSRSET184124

Authors

Prof. Prashant Gadakh, International Institute of Information Technology, Hinjewadi, Pune, Maharashtra, India
Prof. Ramkrushna M, International Institute of Information Technology, Hinjewadi, Pune, Maharashtra, India
Prof. Bailappa Bhovi, International Institute of Information Technology, Hinjewadi, Pune, Maharashtra, India
Prof. Malayaj Kumar, International Institute of Information Technology, Hinjewadi, Pune, Maharashtra, India

Keywords:

Classification Algorithm, Page-Blocks, Blackand, Blackpix

Abstract

In this work we address a classification task on the page-blocks dataset from the University of California, Irvine (UCI) Machine Learning Repository. The dataset describes blocks of a document's page layout obtained through a segmentation process. After visualizing the data, we first applied the Naive Bayes classification algorithm and found that its accuracy was not good. We then classified the data again using the Decision Tree algorithm. In this report we discuss the structure of the dataset, its visualization, and the two classification algorithms, and we contrast their outputs. The report ends with a brief section on possible future work on this dataset.
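The workflow described above (fit Naive Bayes, fit a decision tree, compare accuracy) can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the eight training rows and two test rows are invented toy values, not rows from page-blocks; the Gaussian variant of Naive Bayes is an assumption, since the abstract does not say which variant was used; and a depth-1 decision stump stands in for a full decision tree. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy blocks: (height, fraction of black pixels) -> label.
# These values are invented for illustration; they are NOT page-blocks rows.
TRAIN = [
    ((10, 0.20), "text"), ((12, 0.25), "text"),
    ((11, 0.30), "text"), ((9, 0.22), "text"),
    ((50, 0.70), "graphic"), ((55, 0.80), "graphic"),
    ((60, 0.75), "graphic"), ((52, 0.65), "graphic"),
]
TEST = [((13, 0.28), "text"), ((48, 0.60), "graphic")]


def fit_gaussian_nb(rows, eps=1e-9):
    """Estimate per-class priors and per-feature mean/variance."""
    by_class = defaultdict(list)
    for features, label in rows:
        by_class[label].append(features)
    model = {}
    for label, items in by_class.items():
        stats = []
        for column in zip(*items):
            mean = sum(column) / len(column)
            # eps keeps the variance strictly positive for constant columns.
            var = sum((v - mean) ** 2 for v in column) / len(column) + eps
            stats.append((mean, var))
        model[label] = (len(items) / len(rows), stats)
    return model


def predict_gaussian_nb(model, features):
    """Return the class with the highest log posterior (up to a constant)."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        for value, (mean, var) in zip(features, stats):
            total -= 0.5 * math.log(2 * math.pi * var)
            total -= (value - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


def fit_stump(rows):
    """Depth-1 decision tree: pick the split with the fewest training errors."""
    best = None  # (errors, feature index, threshold, left label, right label)
    for j in range(len(rows[0][0])):
        values = sorted({features[j] for features, _ in rows})
        # Candidate thresholds are midpoints between consecutive values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = Counter(lbl for f, lbl in rows if f[j] <= threshold)
            right = Counter(lbl for f, lbl in rows if f[j] > threshold)
            left_label = left.most_common(1)[0][0]
            right_label = right.most_common(1)[0][0]
            errors = len(rows) - left[left_label] - right[right_label]
            if best is None or errors < best[0]:
                best = (errors, j, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, features):
    """Follow the single split and return the majority label of that side."""
    j, threshold, left_label, right_label = stump
    return left_label if features[j] <= threshold else right_label


def accuracy(predict, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(f) == label for f, label in rows) / len(rows)


nb_model = fit_gaussian_nb(TRAIN)
stump = fit_stump(TRAIN)
nb_accuracy = accuracy(lambda f: predict_gaussian_nb(nb_model, f), TEST)
stump_accuracy = accuracy(lambda f: predict_stump(stump, f), TEST)
print(f"Naive Bayes accuracy: {nb_accuracy:.2f}")
print(f"Decision stump accuracy: {stump_accuracy:.2f}")
```

Because the toy classes are cleanly separated, both classifiers label the two held-out blocks correctly here. That outcome says nothing about the accuracies reported for the real dataset. The sketch only shows the shape of the comparison.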
