A Review on Improving the Clustering Performance in Text Mining

Authors

  • Ashwini Harishchandra Ghonge  M. Tech, Computer Science Engineering, Guru Nanak Institute of Engineering and Technology, Nagpur, Maharashtra, India
  • Prof. Vijaya Kamble  Computer Science Engineering, Guru Nanak Institute of Engineering and Technology, Nagpur, Maharashtra, India

Keywords:

Clustering Techniques, Co-Clustering, Constrained Clustering, Semi-supervised Learning, Text Mining.

Abstract

In recent years, the growth of information systems in fields such as business, academia, and medicine has led to a steady increase in the amount of stored data. Most of this data is held in documents that are largely unstructured. Text mining helps people process large volumes of information by imposing structure on text. Clustering is a well-known technique for automatically organizing a large collection of text. In real application domains, however, the experimenter often has background knowledge that can help in clustering the data. Traditional clustering techniques are poorly suited to multiple data types and cannot handle sparse, high-dimensional data. Co-clustering techniques are adopted to overcome these limitations by clustering documents and words simultaneously, which addresses both deficiencies. Semantic understanding has become an essential element of information extraction, and it is achieved by adopting constraints as a semi-supervised learning technique. This survey reviews the constrained co-clustering techniques adopted by researchers to improve clustering performance.
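To make the idea of constrained co-clustering concrete, the following is a minimal sketch and not the method of any particular paper surveyed here. It assumes a small document-word count matrix, scores a candidate co-clustering by the squared residue of each entry against the mean of its (document cluster, word cluster) block, and adds a fixed penalty for every violated must-link or cannot-link constraint between documents. All function names, the toy matrix, and the penalty weight are illustrative assumptions.

```python
def block_means(matrix, row_labels, col_labels, k, l):
    """Mean entry of each (row cluster, column cluster) block; 0.0 if a block is empty."""
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            g, h = row_labels[i], col_labels[j]
            sums[g][h] += value
            counts[g][h] += 1
    return [[sums[g][h] / counts[g][h] if counts[g][h] else 0.0
             for h in range(l)] for g in range(k)]


def squared_residue(matrix, row_labels, col_labels, k, l):
    """Sum of squared differences between each entry and its block mean."""
    means = block_means(matrix, row_labels, col_labels, k, l)
    return sum((value - means[row_labels[i]][col_labels[j]]) ** 2
               for i, row in enumerate(matrix)
               for j, value in enumerate(row))


def violations(labels, must_link, cannot_link):
    """Number of pairwise constraints that a labeling breaks."""
    broken_must = sum(1 for a, b in must_link if labels[a] != labels[b])
    broken_cannot = sum(1 for a, b in cannot_link if labels[a] == labels[b])
    return broken_must + broken_cannot


def constrained_cost(matrix, row_labels, col_labels, k, l,
                     must_link, cannot_link, penalty=1.0):
    """Squared residue plus a penalty for each violated document constraint."""
    residue = squared_residue(matrix, row_labels, col_labels, k, l)
    return residue + penalty * violations(row_labels, must_link, cannot_link)


# Toy data (hypothetical): rows are documents, columns are word counts.
docs = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
]
word_labels = [0, 0, 1, 1]
must_link = [(0, 1)]    # background knowledge: documents 0 and 1 belong together
cannot_link = [(0, 2)]  # background knowledge: documents 0 and 2 must be separated

good = constrained_cost(docs, [0, 0, 1, 1], word_labels, 2, 2,
                        must_link, cannot_link, penalty=5.0)
bad = constrained_cost(docs, [0, 1, 0, 1], word_labels, 2, 2,
                       must_link, cannot_link, penalty=5.0)
print(good, bad)  # 2.0 37.0
```

The labeling that respects both the block structure and the constraints scores far lower than the one that mixes the two topics. A constrained co-clustering algorithm would search over document and word labelings to reduce a cost of this general kind; the search procedure itself is omitted here.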

Published

2019-02-28

Section

Research Articles

How to Cite

[1]
Ashwini Harishchandra Ghonge, Prof. Vijaya Kamble, "A Review on Improving the Clustering Performance in Text Mining," International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 6, Issue 1, pp. 380-385, January-February 2019.