Handling WSD using Hierarchical Clustering Algorithm with sentences
DOI:
https://doi.org/10.32628/IJSRSET1841120
Keywords:
NLP (natural language processing), POS (part of speech), sentence clustering, K-means
Abstract
Clustering is the process of grouping objects into subsets that are meaningful in the context of a particular problem. It does not rely on predefined classes, and it is called an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed, and the choice among them depends on the application. Sentence clustering is one of the most effective of these techniques. In this work, a hierarchical clustering algorithm is applied at multiple levels to improve accuracy. A POS tagger and the Porter stemmer are used for preprocessing. The WordNet dictionary is used to determine similarity through the Jiang-Conrath and cosine similarity measures. Grouping is performed on the highest similarity value, with a mean threshold. The approach incorporates several parameters for measuring similarity between words. To disambiguate words, sense identification is performed for adjectives and the results are compared. SemCor and machine learning datasets are employed. Compared with previous results for word sense disambiguation (WSD), our work shows a substantial improvement, reaching 91.2%.
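The grouping step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it uses only bag-of-words cosine similarity and leaves out the WordNet-based Jiang-Conrath measure, POS tagging, and stemming. The function names `cosine` and `cluster_sentences` are hypothetical, average linkage is an assumed choice, and "mean threshold" is read here as the mean of all pairwise similarities.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_sentences(sentences):
    """Agglomerative clustering of sentences; returns lists of sentence indices.

    Assumption: merging stops once the best cluster pair is no more similar
    than the mean pairwise similarity (one reading of "mean threshold").
    """
    vectors = [Counter(s.lower().split()) for s in sentences]
    n = len(vectors)
    if n < 2:
        return [[i] for i in range(n)]

    # Pairwise similarities between all sentences, keyed by (i, j) with i < j.
    sim = {(i, j): cosine(vectors[i], vectors[j])
           for i, j in combinations(range(n), 2)}
    threshold = sum(sim.values()) / len(sim)

    def linkage(c1, c2):
        # Average linkage (assumed): mean similarity over cross-cluster pairs.
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(sim[p] for p in pairs) / len(pairs)

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        # Pick the pair of clusters with the highest similarity.
        score, x, y = max(
            (linkage(clusters[x], clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if score <= threshold:
            break
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters


if __name__ == "__main__":
    examples = [
        "bank approved loan",
        "bank rejected loan",
        "river bank muddy",
        "muddy river flooded",
    ]
    print(cluster_sentences(examples))
```

On the four example sentences, the two about loans end up in one cluster and the two about the river in another, even though "bank" appears in three of them. A WordNet-based measure such as Jiang-Conrath could replace `cosine` to compare word senses rather than surface tokens.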
License
Copyright (c) IJSRSET

This work is licensed under a Creative Commons Attribution 4.0 International License.