Handling WSD using Hierarchical Clustering Algorithm with sentences

Authors

  • Mohana Priya K  ME CSE, Jansons Institute of Technology, Coimbatore, Tamil Nadu, India
  • Pooja Ragavi S  MBA, GRG School of Management Studies, Coimbatore, Tamil Nadu, India
  • Krishna Priya G  Assistant Professor, Department of CSE, Jansons Institute of Technology, Coimbatore, Tamil Nadu, India

DOI:

https://doi.org/10.32628/IJSRSET1841120

Keywords:

NLP (natural language processing), POS (part of speech), sentence clustering, K-means.

Abstract

Clustering is the process of grouping objects into subsets that are meaningful in the context of a particular problem. It does not rely on predefined classes and is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed, and the choice among them depends on the application. Sentence clustering is one of the best clustering techniques. In this work, a hierarchical clustering algorithm is applied at multiple levels to improve accuracy. A POS tagger and the Porter stemmer are used for tagging and preprocessing. The WordNet dictionary is used to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value, subject to a mean threshold. The paper incorporates several parameters for finding similarity between words. To disambiguate words, sense identification is performed for the adjectives and the results are compared. The SemCor and machine learning datasets are employed. Compared with previous results for WSD, this work shows a substantial improvement, with a reported result of 91.2%.
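The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes sentences have already been POS-tagged, stemmed and reduced to content words, uses plain bag-of-words cosine similarity, merges clusters by average linkage, and takes the mean pairwise similarity as the threshold (one possible reading of "mean threshold"). The Jiang-Conrath function follows the standard definition, but the information-content values it receives are hypothetical inputs; in practice they would come from WordNet and a corpus such as SemCor. All function names are illustrative.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jiang_conrath_similarity(ic1: float, ic2: float, ic_lcs: float) -> float:
    """Standard Jiang-Conrath similarity: 1 / (IC(c1) + IC(c2) - 2 * IC(lcs)).

    The IC (information content) values are assumed to be supplied by the
    caller; this sketch does not look them up in WordNet.
    """
    distance = ic1 + ic2 - 2 * ic_lcs
    return math.inf if distance <= 0 else 1.0 / distance


def cluster_sentences(sentences):
    """Agglomerative (average-linkage) clustering of sentences.

    Assumption: merging stops once the best average similarity between two
    clusters no longer exceeds the mean of all pairwise similarities.
    Returns clusters as lists of sentence indices.
    """
    vectors = [Counter(s.lower().split()) for s in sentences]
    n = len(vectors)
    sim = [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
           for i in range(n)]

    pairs = [sim[i][j] for i in range(n) for j in range(i + 1, n)]
    threshold = sum(pairs) / len(pairs) if pairs else 0.0

    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        # Average similarity over all cross-cluster sentence pairs.
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_score, a, b = max(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best_score <= threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


if __name__ == "__main__":
    # Toy, already-preprocessed sentences using two senses of "bank".
    toy = [
        "bank approved loan",
        "bank raised loan rate",
        "river bank muddy",
        "fishing river bank",
    ]
    print(cluster_sentences(toy))  # [[0, 1], [2, 3]]
```

On this toy input the financial and river senses of "bank" fall into separate clusters. A WordNet-based measure such as Jiang-Conrath could replace the word-overlap cosine when building the similarity matrix.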

Published

2018-11-30

Section

Research Articles

How to Cite

[1]
Mohana Priya K, Pooja Ragavi S, Krishna Priya G, "Handling WSD using Hierarchical Clustering Algorithm with sentences", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 4, Issue 11, pp. 83-88, November-December 2018. Available at DOI: https://doi.org/10.32628/IJSRSET1841120