Classifying Blocks of Page Layout of a Document
Keywords:
Classification Algorithm, Page-Blocks, Blackand, Blackpix
Abstract
In this work, we classify the page-blocks dataset from the University of California, Irvine (UCI) Machine Learning Repository. The dataset describes the blocks of a document's page layout, as detected by a segmentation process. After visualizing the data, we first applied the Naive Bayes classification algorithm and found its accuracy unsatisfactory. We then classified the data again using a Decision Tree algorithm. This report discusses the structure of the dataset, its visualization, and the two classification algorithms, and contrasts their outputs. It ends with a brief section on possible future work on this dataset.
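The workflow described above (fit Naive Bayes, then a Decision Tree, and compare accuracy) might be sketched roughly as follows. This is a minimal illustration and not the authors' code: the two features (height, length), the three class labels, every data value, and all helper names are hypothetical stand-ins invented for the example, and the real page-blocks data is not loaded.

```python
import math
from collections import Counter, defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        # The small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(cols, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest Gaussian log posterior."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_tree(X, y, depth=0, max_depth=3):
    """Greedy binary tree using Gini impurity; leaves hold the majority label."""
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    best = None
    for j in range(len(X[0])):
        # Dropping the largest value guarantees a non-empty right branch.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # every row is identical, so no split exists
        return Counter(y).most_common(1)[0][0]
    _, j, t = best
    left_X = [row for row in X if row[j] <= t]
    left_y = [lab for row, lab in zip(X, y) if row[j] <= t]
    right_X = [row for row in X if row[j] > t]
    right_y = [lab for row, lab in zip(X, y) if row[j] > t]
    return (j, t,
            fit_tree(left_X, left_y, depth + 1, max_depth),
            fit_tree(right_X, right_y, depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def accuracy(predict, model, X, y):
    return sum(predict(model, row) == label for row, label in zip(X, y)) / len(y)

# Hypothetical toy blocks as (height, length) pairs; NOT the real page-blocks data.
rows = [((h, l), "text") for h in range(8, 13) for l in range(20, 100, 10)]
rows += [((h, l), "horiz_line") for h in (1, 2, 3) for l in (100, 150, 200)]
rows += [((h, l), "vert_line") for h in (100, 150, 200) for l in (1, 2, 3)]

# Even-indexed rows train the models, odd-indexed rows are held out.
train, test = rows[::2], rows[1::2]
X_train, y_train = [r for r, _ in train], [lab for _, lab in train]
X_test, y_test = [r for r, _ in test], [lab for _, lab in test]

nb_model = fit_gaussian_nb(X_train, y_train)
tree = fit_tree(X_train, y_train)
nb_acc = accuracy(predict_gaussian_nb, nb_model, X_test, y_test)
tree_acc = accuracy(predict_tree, tree, X_test, y_test)
print(f"Naive Bayes accuracy:   {nb_acc:.3f}")
print(f"Decision Tree accuracy: {tree_acc:.3f}")
```

The toy classes are cleanly separable, so the scores here say nothing about the accuracy either classifier reaches on the real dataset. The sketch only shows how the two models can be fitted and compared on the same held-out rows.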
License
Copyright (c) IJSRSET

This work is licensed under a Creative Commons Attribution 4.0 International License.