Manuscript Number : IJSRSET162361
A Research on Web Content Extraction and Noise Reduction through Text Density Using Malicious URL Pattern Detection
Authors (2): Charmi Patel, Prof. Hiteishi Diwanji
A web page carries a large amount of information, including additional content such as hyperlinks, headers and footers, navigation panels, and advertisements, which can complicate content extraction. Page segmentation is used to detect noisy content blocks by detecting malicious URLs in web pages. The main aim of this research is to detect malicious URLs during content extraction by checking different URL patterns. Performance is analysed in terms of precision, recall, execution time, and noise detected using the proposed algorithm.
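As a rough illustration of the idea in the abstract, the sketch below scores an HTML block by a simple text density (visible characters per opening tag) and checks the block's links against a few URL patterns. It is not the authors' algorithm: the density definition, the pattern list, the threshold, and the names `suspicious_url_reasons`, `analyse_block`, and `is_noise` are all assumptions made for this example.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit


def suspicious_url_reasons(url):
    """Return the names of the (hypothetical) URL patterns that match."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    reasons = []
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host):
        reasons.append("ip_host")                # raw IPv4 address as host
    if "@" in parts.netloc:
        reasons.append("userinfo_in_authority")  # e.g. http://user@host/
    if host.count(".") > 4:
        reasons.append("many_subdomains")
    if len(url) > 100:
        reasons.append("very_long")
    return reasons


class BlockStats(HTMLParser):
    """Counts visible text characters, opening tags, and link targets."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.tags = 0
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.tags += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_chars += len(data.strip())


def analyse_block(html):
    """Return (text density, {url: reasons}) for one HTML block."""
    stats = BlockStats()
    stats.feed(html)
    stats.close()
    density = stats.text_chars / max(stats.tags, 1)
    flagged = {}
    for url in stats.links:
        reasons = suspicious_url_reasons(url)
        if reasons:
            flagged[url] = reasons
    return density, flagged


def is_noise(html, min_density=10.0):
    """Assumed rule: low density or any flagged URL marks the block as noise."""
    density, flagged = analyse_block(html)
    return density < min_density or bool(flagged)


article = "<p>" + "word " * 40 + "</p>"
nav = '<div><a href="http://192.168.0.1/login">x</a><a href="/home">Home</a></div>'
print(is_noise(article), is_noise(nav))
```

Here the link-heavy navigation block scores a low density and contains a link to a raw IP address, so it is treated as noise, while the long paragraph is kept. The threshold of 10 is arbitrary and would need tuning on real pages.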
Keywords: Page segmentation, Malicious URL, URL patterns, Text density
Charmi Patel
Information Technology, L. D. Engineering College, Ahmedabad, Gujarat, India
Prof. Hiteishi Diwanji
Information Technology, L. D. Engineering College, Ahmedabad, Gujarat, India
Publication Details
Published in : Volume 2 | Issue 3 | May-June 2016
Date of Publication : 2016-06-30
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 128-132
Publisher : Technoscience Academy
Journal URL : https://ijsrset.com/IJSRSET162361