2015
DOI: 10.1007/s11277-015-2596-7

Effectual Web Content Mining using Noise Removal from Web Pages

Cited by 21 publications (17 citation statements)
References: 12 publications
“…The Jaccard coefficient [29] was, for example, used by Sivakumar [30] in 2015. The purpose of the study was to ultimately improve search results by comparing blocks of text within web pages, as well as identifying and removing duplicate advertisements, headers, and other recurring features present in web sites.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
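
The statement above describes comparing blocks of text with the Jaccard coefficient in order to drop recurring page elements. As a minimal sketch of that general idea (not the cited paper's implementation), the following assumes blocks are plain strings compared on word shingles; the function names, the shingle size `k`, and the `threshold` value are all hypothetical choices made for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short blocks become a single shingle; empty blocks give an empty set.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        # Assumption: two empty sets are treated as dissimilar,
        # which also avoids dividing by zero.
        return 0.0
    return len(a & b) / len(a | b)


def drop_repeated_blocks(blocks, threshold=0.8, k=3):
    """Keep the first occurrence of each block and skip later blocks whose
    shingle set is at least `threshold` similar to a block already kept."""
    kept, kept_sets = [], []
    for block in blocks:
        s = shingles(block, k)
        if any(jaccard(s, t) >= threshold for t in kept_sets):
            continue  # near-duplicate of an earlier block (e.g. a repeated header)
        kept.append(block)
        kept_sets.append(s)
    return kept
```

Comparing every block against every kept block is quadratic in the number of blocks, which is acceptable for a single page but not for a large crawl.
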
“…Several hashing techniques such as minhash [24], simhash [25] and hybrid hash [26] techniques are widely used to eliminate the noises present in the web pages and also extracting the duplicates and near-duplicate blocks present in the web pages. Noisy Data Cleaner (NDC) algorithm [27] was introduced to extract core content and to eliminate the noises present in the web pages.…”
Section: Literature Review
Citation type: mentioning
confidence: 99%
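
The statement above names minhash among the hashing techniques used to find duplicate and near-duplicate blocks. A minimal sketch of the MinHash idea follows: the fraction of matching positions in two signatures estimates the Jaccard coefficient of the underlying token sets. This is a generic illustration, not the method of any cited work; the salted SHA-1 hash family, the helper names, and `num_hashes=64` are assumptions.

```python
import hashlib


def _hash(token, seed):
    """64-bit hash of a token under a given seed (salted SHA-1, an assumption)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_signature(tokens, num_hashes=64):
    """For each seeded hash function, record the minimum hash over the token set."""
    tokens = set(tokens)
    if not tokens:
        raise ValueError("cannot build a signature for an empty token set")
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates the Jaccard coefficient."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

More hash functions reduce the variance of the estimate at the cost of a longer signature.
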
“…The comparative analysis has also been made with the proposed method by comparing it with other existing methods such as N-gram approach [7], sentence level features with fingerprints (SLF-FP) [22], SimSeerX [23], enhanced weighted approach [19], Simhash [25], hybrid hash [26], and NDC algorithm [27]. The comparative analysis for the proposed model and the other mentioned existing model is carried out and 100 documents from the three datasets DS1, DS2, and DS3 are taken for the analysis having varied number of RD and DD documents.…”
Section: Performance Analysis
Citation type: mentioning
confidence: 99%
“…are discussed. Duplicate contents and noise content as per block importance are removed [16]. Global and local noises are removed [17].…”
Section: Noise Removal From Web Pages
Citation type: mentioning
confidence: 99%