2021
DOI: 10.14569/ijacsa.2021.0120886

A Systematic Review Web Content Mining Tools and its Applications

Abstract: In recent years, the emergence of the WWW (World Wide Web) has led to the accumulation of a huge amount of information and data. As a result, the web holds both unstructured and structured information that affects the day-to-day life of society. Because so much information is available, making use of the information actually required has become more challenging. This paper provides a comprehensive survey of the current situation and recent trends in web content mining (WCM) and its applications, thereby contributin…

Cited by 3 publications (2 citation statements)
References 8 publications
“…Naturally, solutions should limit human-in-the-loop presence by, e.g., handling multi-domain data, and be flexible with regard to possible data sources. Specifically, considering that many systems have web interfaces, the concept of web content mining (mining information from web documents, often using NLP techniques) is applicable [3]. The solution, outlined in this work, extends web content mining, since it is applicable to screenshots that can originate from any system, supporting business analytics and process mining [4].…”
Section: Introduction (mentioning, confidence: 99%)
“…The extension has several functionalities, such as creating a dataset, curating data, executing JavaScript/TypeScript extraction algorithms, and evaluating those algorithms using multiple measures. Additionally, we found in the literature several systematic reviews [98,89] and comparisons [106,115] of content extraction algorithms. We used the datasets and metrics proposed by those comparisons to evaluate our page-level content extraction algorithm so we could compare the obtained results with them.…”
Section: Part V (mentioning, confidence: 99%)