DOI: 10.1007/978-3-540-72667-8_3

The Lixto Systems Applications in Business Intelligence and Semantic Web

Abstract: This paper shows how technologies for Web data extraction, syndication and integration allow for new applications and services in the Business Intelligence and the Semantic Web domain. First, we demonstrate how knowledge about market developments and competitor activities on the market can be extracted dynamically and automatically from semi-structured information sources on the Web. Then, we show how the data can be integrated in Business Intelligence Systems and how data can be classified, re-assig…

Cited by 15 publications
(9 citation statements)
References 4 publications
“…To tackle this challenge of extracting document internal structures, we will integrate our approach with the wrapper induction for information extraction approach proposed by (Kushmerick et al, 1997). The approach has been successfully applied to web site data extraction such as the Lixto system (Baumgartner et al, 2007). Since such wrapper induction approaches can dynamically and automatically extract structured knowledge from semi-structured information sources (Baumgartner et al, 2007;Kushmerick et al, 1997), there will be little manual labor involved when applying our expert finding approach to web sites of different organizations.…”
Section: Discussion (mentioning)
confidence: 99%
“…This is in contrast to HTML-aware systems, which exploit the tree-structure of HTML explicitly. This began with some interactive programming approaches where the user provided various structural constraints [4,27,38], and since then there has been greater focus on learning wrappers from examples in standard HTML query languages such as XPath or CSS [2, 10, 28-30, 32, 42], which has also been our focus in this work. XPath alignment approaches [28,29] work by aligning and merging the steps within the XPaths of sample nodes based on edit distances, while least general generalization methods [32] produce largest conjunctions of all common node attributes.…”
Section: Related Work (mentioning)
confidence: 99%
“…Although many specialized automated web extraction tools and services have become available in recent years (e.g. WIEN [21], STALKER [26], Lixto [4], Mozenda [18], import.io [16], SelectorGadget [39]), such technologies have generally targeted web extraction as an isolated task in specialized tools and have seen little adoption within the environments that data analysts commonly work in, as is evident from numerous online discussions in help forums as well as requests made to product teams. For example, data scientists working in Python environments (e.g.…”
Section: Introduction (mentioning)
confidence: 99%
“…There are a few proposals that are a little surprising because they do not report on any experimental results [32][33][34] or report on very few [35,36], which does not contribute at all to drawing solid conclusions. Most of the remaining proposals provide enough empirical results, which helps support the conclusions better, but the methods used to evaluate and compare them were not solid enough.…”
Section: Review of the Literature (mentioning)
confidence: 99%