DERIN: A data extraction method based on rendering information and n-gram

Figueiredo, Leandro Neiva Lopes; Assis, Guilherme Tavares de; Ferreira, Anderson A.

doi:10.1016/j.ipm.2017.04.007

Cited by 16 publications

(5 citation statements)

References 12 publications


“…In the literature, there are many proposals to extract data from HTML documents in general, not specifically tables (Ferrara, de Meo, Fiumara, & Baumgartner, 2014; Sleiman & Corchuelo, 2013a). They rely on text alignment (Sleiman & Corchuelo, 2013b), neural networks (Sleiman & Corchuelo, 2014), learning first-order rules (Jiménez & Corchuelo, 2016a), inferring propositio-relational rules (Jiménez & Corchuelo, 2016b), learning decision trees (Uzun, Agun, & Yerlikaya, 2013), embedding graphs (Jiménez, Roldán, Gallego, & Corchuelo, 2020), or using n-grams and rendering information (Figueiredo, Assis, & Ferreira, 2017), to mention a few. Unfortunately, they do not seem to be appropriate to extract the underlying relationships between the cells in HTML tables (Cafarella et al., 2018), which motivated much work on table-understanding (Roldán et al., 2020; Zhang & Balog, 2020).…”

Section: Context and Motivation (mentioning)

confidence: 99%

A clustering approach to extract data from HTML tables

Jiménez, Roldán, Corchuelo

2021

Information Processing & Management


“…B_u is the backlink start pointed towards node u, and N_v is the number of links of each node v. One node v divides its own ranking by N_v and delivers to page u, which is connected through the links. Nodes with backlinks from important nodes (high ranking) are ranked high [35].…”

Section: = (mentioning

confidence: 99%

Preemptive Prediction-Based Automated Cyberattack Framework Modeling

et al. 2021


As the development of technology accelerates, the Fourth Industrial Revolution, which combines various technologies and provides them as one service, has been in the spotlight, and services using big data, Artificial Intelligence (AI) and Internet of Things (IoT) are becoming more intelligent and helpful to users. As these services are used in various fields, attacks by attackers also occur in various areas and ways. However, cyberattacks by attackers may vary depending on the attacking pattern of the attacker, and the same vulnerability can be attacked from different perspectives. Therefore, in this study, by constructing a cyberattack framework based on preemptive prediction, we can collect vulnerability information based on big data existing on the network and increase the accuracy by applying machine learning to the mapping of keywords frequently mentioned in attack strategies. We propose an attack strategy prediction framework.


“…Data mining principles can be independent of a particular domain for knowledge extraction [11] since their methods are able to learn how to extract the data, perform a given analysis domain independently and detect different record structures and their attributes based on rendering information [18]. It is increased the importance of understanding correlations between data, and data mining methods are interesting to find some patterns and association rules for various analyses and decision aids such as product category recommendations and determination of possible behavioral changes [31].…”

Section: Data Mining and Meteorology (mentioning)

confidence: 99%

Explainability with Association Rule Learning for Weather Forecast

2021


The reliability of the weather forecast models is a complex issue since it depends on numerous parameters and the technical infrastructure which supports them. In doing so, there is a need for advanced works oriented towards a better understanding of these models and the analysis of main associated parameters. Our approach is to study the applicability of the extracted association rules to provide a clearer understanding of atmospheric exchanges. In this work, the proposed methodology is based on the discovery of the interesting interpretable relationships between measured meteorological parameters at the Atmospheric Research Center of Lannemezan (South-West of France). In the preprocessing step, the proposed method is considered to be effectively flexible to account for data uncertainties, unlike the majority of classical evaluation methods mainly directed towards the reduction of variables and data redundancy. In postprocessing, the advantage of our approach is that the extracted rules are a metamodeling of interpretable useful knowledge for the clarity and conciseness of its representation. Moreover, in the processing, the interpretability in data sciences is recent and still in its infancy. The generated association rules with their statistical and semantic interpretations have globally highlighted the possibilities of explicit analysis of meteorological parameters. This study showed that among the generated relevant rules, three parameters (temperature, humidity, wind speed) have a high frequency in the antecedents of the rules and that the only consequence is rain. This is useful for the identification of potential improvements and gaps in the existing models of atmospheric observations, in particular, to understand the related parameterizations to the productivity of the rain phenomenon.

