Self-Tuned Descriptive Document Clustering using a Predictive Network

Authors (2): Dr. K. Syed Kousar Niasi, P. Sidheshwari

A document network is a collection of documents connected by links. Document clustering has become ubiquitous with the widespread use of online databases such as academic search engines. Topic modeling has become a widely used tool for document management because of its superior performance. However, few topic models differentiate the importance of documents on different topics. In this survey, we apply text rank algorithms to documents to improve topic modeling and propose incorporating link-based ranking into topic modeling. Text summarization plays an important role in information retrieval; the snippets that web search engines generate for each query result are one application of it. In existing text summarization techniques, indexing is based on the words present in the document, and the index consists of an array of posting lists. Document features such as term frequency and text length are used to assign indexing weights to words. Specifically, topical rank is used to compute the topic-level ranking of documents, which indicates the importance of documents on different topics. By interpreting the topical ranking of a document as the probability that the document is involved in the corresponding topic, a generalized relation is established between ranking and topic modeling. In this thesis, we implement a topic discovery model for a large medical database. The model is trained on the datasets, and key terms are extracted using text mining and fuzzy latent semantic analysis (FLSA), a novel approach to topic modeling from a fuzzy perspective. FLSA can handle the redundancy problem in health and medical corpora and provides a new method to estimate the number of topics.
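
The word-based index described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_index`, the whitespace tokenizer, and the weighting (term frequency divided by text length) are assumptions chosen only to show how posting lists and indexing weights fit together.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Map each word to a posting list of (doc_id, weight) pairs.

    Assumed weighting: term frequency divided by document length.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        if not tokens:
            continue  # skip empty documents to avoid division by zero
        counts = Counter(tokens)
        for word, tf in counts.items():
            index[word].append((doc_id, tf / len(tokens)))
    return dict(index)


# Two toy documents, invented for illustration only.
docs = [
    "topic models rank documents",
    "documents link to documents",
]
index = build_index(docs)
print(index["documents"])
```

Each posting list stays ordered by document id because documents are processed in sequence, which is the usual layout for merging lists at query time.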

Authors and Affiliations

Dr. K. Syed Kousar Niasi
Associate Professor, Department of Computer Science, Jamal Mohamed College, Trichy, Tamil Nadu, India
P. Sidheshwari
Research Scholar, Department of Computer Science, Jamal Mohamed College, Trichy, Tamil Nadu, India

Keywords: Text Ranking, Text Mining, FLSA, Document Clustering, Text Summarization

Publication Details

Published in : Volume 6 | Issue 3 | May-June 2019
Date of Publication : 2019-06-30
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 320-331
Manuscript Number : IJSRSET21841135
Publisher : Technoscience Academy

Print ISSN : 2395-1990, Online ISSN : 2394-4099

Cite This Article :

Dr. K. Syed Kousar Niasi, P. Sidheshwari, "Self-Tuned Descriptive Document Clustering using a Predictive Network", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN : 2395-1990, Online ISSN : 2394-4099, Volume 6, Issue 3, pp. 320-331, May-June 2019. Available at doi : https://doi.org/10.32628/IJSRSET21841135
Journal URL : http://ijsrset.com/IJSRSET21841135
