A Graph Based Approach for Efficient Document Similarity Detection

Authors

  • G. Padmaja, PG Scholar, Department of MCA, St. Ann's College of Engineering and Technology, Chirala, Andhra Pradesh, India
  • M. Sarada, Assistant Professor, Department of MCA, St. Ann's College of Engineering and Technology, Chirala, Andhra Pradesh, India

Keywords:

Commonsense Knowledge Representation and Reasoning, Natural Language Processing, Semantic Similarity

Abstract

Commonsense knowledge representation and reasoning support a wide variety of potential applications in fields such as document auto-categorization, Web search enhancement, topic gisting, social process modeling, and concept-level opinion and sentiment analysis. Solutions to these problems, however, demand robust knowledge bases capable of supporting flexible, nuanced reasoning. Populating such knowledge bases is highly time-consuming, which makes it necessary to develop techniques for deconstructing natural language texts into commonsense concepts. In this work, we propose an approach for effective multi-word commonsense expression extraction from unrestricted English text, together with a semantic similarity detection technique that allows additional matches to be found for specific concepts not already present in knowledge bases.
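The abstract does not spell out how the semantic similarity step works. As a rough illustration only, the sketch below scores concepts in a small graph by the Jaccard overlap of their neighbour sets, and handles a multi-word concept that is missing from the knowledge base by pooling the neighbours of its constituent words. The toy graph `GRAPH` and the helpers `jaccard`, `neighbours` and `most_similar` are hypothetical, and Jaccard overlap is a stand-in chosen for simplicity, not the authors' published method.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy knowledge graph (illustrative data only, not a real KB):
# each known concept maps to the set of nodes it is linked to.
GRAPH: Dict[str, Set[str]] = {
    "car": {"vehicle", "drive", "road", "wheel"},
    "bus": {"vehicle", "drive", "road", "passenger"},
    "bicycle": {"vehicle", "ride", "road", "wheel"},
    "apple": {"fruit", "eat", "tree"},
    "buy": {"shop", "money", "purchase"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbours(concept: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """Neighbour set of a concept. For a multi-word concept that is not in
    the graph, pool the neighbours of its known constituent words."""
    if concept in graph:
        return graph[concept]
    pooled: Set[str] = set()
    for word in concept.split():
        pooled |= graph.get(word, set())
    return pooled


def most_similar(concept: str, graph: Dict[str, Set[str]], k: int = 3) -> List[Tuple[str, float]]:
    """Rank the other concepts in the graph by neighbour overlap with `concept`."""
    query = neighbours(concept, graph)
    scores = [
        (other, jaccard(query, feats))
        for other, feats in graph.items()
        if other != concept
    ]
    # Highest score first; ties broken alphabetically for a stable order.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


if __name__ == "__main__":
    print(most_similar("car", GRAPH))
    print(most_similar("buy car", GRAPH))
```

For the unseen concept "buy car", the pooled neighbour set overlaps most with "car" (4/7), then "buy" (3/7), then "bicycle" (3/8), so the concept still receives candidate matches even though it has no entry of its own. A real system would replace the toy graph with an actual commonsense knowledge base and would likely weight relation types rather than treating all links equally.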

Published

2018-04-30

Section

Research Articles

How to Cite

G. Padmaja and M. Sarada, "A Graph Based Approach for Efficient Document Similarity Detection," International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 4, Issue 7, pp. 65-69, March-April 2018.