Document Categorization by using Weighted J48 Classifier

Authors: Sonali Suskar, Dr. S. D. Babar

Text categorization is currently a key research area in information retrieval. It assigns to a document one or more entries from a set of predefined categories. Learning in a high-dimensional data space is a central challenge in text categorization: high-dimensional features can incur heavy computational overhead and can degrade classifier performance because of irrelevant and redundant features. To mitigate the "curse of dimensionality" and to speed up classifier training, it is important to apply feature reduction to decrease the number of features. This paper introduces a Bayesian classification approach and a WeightedJ48 classifier for automatic text categorization using class-specific features. For text classification, the proposed method selects a specific feature subset for each class. The presented system reconstructs the probability density function (PDF) in the raw data space from the class-specific PDFs in the low-dimensional feature space and builds a Bayes classification rule using Baggenstoss's PDF Projection Theorem. A notable advantage of this methodology is that many feature selection criteria can be incorporated into it. The WeightedJ48 classifier saves time and memory. The proposed system also uses term weighting for pre-processing. Together, these methods improve classification accuracy, the feature selection process, and overall system performance.
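As a rough illustration of the class-specific idea described above, the following minimal sketch selects a separate term subset for each class and classifies with a likelihood-ratio form of the Bayes rule. Under the PDF Projection Theorem, p(x|H_i) = p(x|H_0) p(z_i|H_i) / p(z_i|H_0), where z_i is the class-specific feature of x and H_0 is a common reference hypothesis, so the factor p(x|H_0) cancels when classes are compared. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names `train` and `classify`, the multinomial term model with Laplace smoothing, the pooled corpus used as H_0, and the KL-contribution score used to rank terms. The WeightedJ48 classifier and the term weighting step are not modeled.

```python
import math
from collections import Counter


def train(docs, labels, k=2, alpha=1.0):
    """Fit per-class term models and pick k class-specific terms per class.

    docs is a list of token lists, labels the matching class names.
    Requires 0 < k < vocabulary size so the "other terms" bin is never empty.
    """
    vocab = sorted({t for d in docs for t in d})
    if not 0 < k < len(vocab):
        raise ValueError("k must be between 1 and len(vocab) - 1")

    def smoothed(counts):
        # Laplace-smoothed term probabilities over the full vocabulary.
        n = sum(counts.values())
        return {t: (counts[t] + alpha) / (n + alpha * len(vocab)) for t in vocab}

    # Reference hypothesis H0 (assumption): all classes pooled together.
    p0 = smoothed(Counter(t for d in docs for t in d))

    model = {}
    for c in sorted(set(labels)):
        pc = smoothed(Counter(t for d, y in zip(docs, labels) if y == c for t in d))
        # Illustrative criterion: each term's contribution to KL(pc || p0).
        score = {t: pc[t] * math.log(pc[t] / p0[t]) for t in vocab}
        selected = sorted(vocab, key=lambda t: (-score[t], t))[:k]
        model[c] = {
            "selected": selected,
            "log_ratio": {t: math.log(pc[t] / p0[t]) for t in selected},
            # All unselected terms collapse into a single "other" bin.
            "log_ratio_other": math.log(
                sum(pc[t] for t in vocab if t not in selected)
                / sum(p0[t] for t in vocab if t not in selected)
            ),
            "log_prior": math.log(labels.count(c) / len(labels)),
        }
    return model


def classify(model, doc):
    """Return argmax_i of log p(z_i|H_i) - log p(z_i|H0) + log P(H_i)."""
    counts = Counter(doc)
    best, best_score = None, -math.inf
    for c, m in model.items():
        n_selected = sum(counts[t] for t in m["selected"])
        s = m["log_prior"]
        s += sum(counts[t] * m["log_ratio"][t] for t in m["selected"])
        # Tokens outside this class's subset (including unseen ones) are "other".
        s += (len(doc) - n_selected) * m["log_ratio_other"]
        if s > best_score:
            best, best_score = c, s
    return best
```

Calling `train` on tokenized documents with their labels returns, for each class, its own selected terms and log-ratios; `classify` then scores a new token list against each class using only that class's subset. Because each class's feature is the count vector over its selected terms plus one pooled bin, the multinomial coefficients are identical under H_i and H_0 and drop out of the ratio.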

Authors and Affiliations

Sonali Suskar
Department of Computer Engineering SIT College of Engineering, Lonavala, Maharashtra, India
Dr. S. D. Babar

Keywords: Text categorization, class-specific features, feature selection, PDF projection and estimation, dimension reduction, WeightedJ48, term weighting.

  1. B. Tang, H. He, P. M. Baggenstoss, and S. Kay, "A Bayesian classification approach using class-specific features for text categorization," IEEE Transactions on Knowledge and Data Engineering, 2015.
  2. P. M. Baggenstoss, "The pdf projection theorem and the class-specific method," IEEE Transactions on Signal Processing, vol. 51, no. 3, pp. 672-685, 2003.
  3. W. Lam, M. Ruiz, and P. Srinivasan, "Automatic text categorization and its application to text retrieval," IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 6, pp. 865-879, 1999.
  4. F. Sebastiani, "Machine learning in automated text categorization," ACM Computing Surveys (CSUR), vol. 34, no. 1, pp. 1-47, 2002.
  5. H. Liu and L. Yu, "Toward integrating feature selection algorithms for classification and clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 491-502, 2005.
  6. P. M. Baggenstoss, "Class specific feature sets in classification," IEEE Transactions on Signal Processing, vol. 47, no. 12, pp. 3428-3432, 1999.
  7. B. Tang and H. He, "ENN: Extended nearest neighbor method for pattern recognition [research frontier]," IEEE Computational Intelligence Magazine, vol. 10, no. 3, pp. 52-60, 2015.
  8. I.-S. Oh, J.-S. Lee, and C. Y. Suen, "Analysis of class separation and combination of class-dependent features for handwriting recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 10, pp. 1089-1094, 1999.
  9. J. J. Patil and N. Bogiri, "Automatic text categorization: Marathi documents," 2015 International Conference on Energy Systems and Applications, Pune, 2015, pp. 689-694.
  10. F. S. Al-Anzi and D. AbuZeina, "Stemming impact on Arabic text categorization performance: A survey," 2015 5th International Conference on Information Communication Technology and Accessibility (ICTA), IEEE, Marrakech, 2015, pp. 1-7.

Publication Details

Published in : Volume 4 | Issue 9 | July-August 2018
Date of Publication : 2018-07-30
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 51-58
Manuscript Number : IJSRSET18495
Publisher : Technoscience Academy

Print ISSN : 2395-1990, Online ISSN : 2394-4099

Cite This Article :

Sonali Suskar, Dr. S. D. Babar, "Document Categorization by using Weighted J48 Classifier," International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN : 2395-1990, Online ISSN : 2394-4099, Volume 4, Issue 9, pp. 51-58, July-August 2018.