Self-Tuned Descriptive Document Clustering using a Predictive Network

Authors

  • Dr. K. Syed Kousar Niasi, Associate Professor, Department of Computer Science, Jamal Mohamed College, Trichy, Tamil Nadu, India
  • P. Sidheshwari, Research Scholar, Department of Computer Science, Jamal Mohamed College, Trichy, Tamil Nadu, India

DOI:

https://doi.org/10.32628/IJSRSET21841135

Keywords:

Text Ranking, Text Mining, FLSA, Document Clustering, Text Summarization

Abstract

A document network is defined as a collection of documents connected by links. Document clustering has become ubiquitous owing to the widespread use of online databases such as academic search engines. Topic modeling has become a widely used tool for document management because of its superior performance. However, few topic models differentiate the importance of documents on different topics. In this work, we apply text ranking algorithms to documents to improve topic modeling and propose incorporating link-based ranking into topic modeling. Text summarization plays an important role in information retrieval. The snippets that web search engines generate for each query result are one application of text summarization. In existing text summarization techniques, indexing is based on the words present in the document, and the index consists of an array of posting lists. Document features such as term frequency and text length are used to assign indexing weights to words. Specifically, topical rank is used to compute the topic-level ranking of documents, which indicates the importance of documents on specific topics. By taking the topical ranking of a document as the probability that the document is involved in the corresponding topic, a generalized relation is established between ranking and topic modeling. We also implement a topic discovery model for large medical databases. The model is trained on the datasets, and key terms are extracted using text mining and fuzzy latent semantic analysis (FLSA), a novel approach to topic modeling from a fuzzy perspective. FLSA can handle the redundancy problem in health and medical corpora and provides a new method for estimating the number of topics.
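
The indexing scheme mentioned in the abstract (posting lists keyed by word, with weights derived from term frequency and text length) can be illustrated with a minimal sketch. The function name `build_index`, the whitespace tokenizer, the sample documents, and the weight formula (term frequency divided by document length) are all assumptions made for illustration; the abstract does not specify the weighting the authors used.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to a posting list of (doc_id, weight) pairs.

    Assumption: weight = term frequency / document length. This is an
    illustrative choice, not the formula from the paper.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        tokens = text.lower().split()  # naive whitespace tokenizer
        if not tokens:
            continue  # skip empty documents to avoid dividing by zero
        counts = Counter(tokens)
        for term, count in sorted(counts.items()):
            index[term].append((doc_id, count / len(tokens)))
    return dict(index)


# Hypothetical two-document corpus.
docs = [
    "fever cough fever",
    "cough headache",
]
index = build_index(docs)
for term, postings in sorted(index.items()):
    print(term, postings)
```

Because documents are processed in order, each posting list stays sorted by document id, which is the usual layout for merging lists at query time.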

Published

2019-06-30

Section

Research Articles

How to Cite

[1]
Dr. K. Syed Kousar Niasi and P. Sidheshwari, "Self-Tuned Descriptive Document Clustering using a Predictive Network," International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 6, Issue 3, pp. 320-331, May-June 2019. Available at doi: https://doi.org/10.32628/IJSRSET21841135