Intelligent integration of information from semistructured web data sources on the basis of ontology and meta-models

Arnicans, Guntis; Karnītis, Ģirts

doi:10.1109/dbis.2006.1678494

Cited by 8 publications

(12 citation statements)

References 8 publications

“…Sometimes one text (web pages in text form) has the same information found in other text or lesser for the same product, for example text 8 and text 10 have the same number of sub attributes. In this case, RIA deletes one of the texts for reducing the space of storage in the universal database (IS UDB) (Guntis Arnicans & Girts Karnitis 2006). IS UDB receives the rest of the texts from RIA and saved them in universal database.…”

Section: Relevant Information Analyzer (RIA), mentioning

confidence: 99%

“…David Buttler et al (2001) observed in their tests of 50 web sites with over 2000 web pages that the tag <TABLE> is used as object separator (18% of time) more than the other tags such as tag <P> 10% of time, tag <li> 8% of time, tag <hr> 6% of time, tag <ul> 2% of time, tag <DIV> 2% of time, and tag <a> 2% of time. Therefore, the relevant information in a web page that the user needs which must be extracted by IE are found between the tag <TABLE> and </TABLE> (Guntis Arnicans and Girts Karnitis 2006;Fatima Ashraf et al 2008). Each table is formatted in rows and columns, whereas it is distinguished in head and body according to meaning.…”

Section: Concepts of Information Extraction (IE), mentioning

confidence: 99%

“…Step 1: Based on the standard classification of Nokia products such as General, Size, Display, Ringtones, Memory, Data, Features, and Battery (Guntis Arnicans & Girts Karnitis 2006;Domenico Beneventano & Stefania Magnani 2004) (the attributes are shown in Figure 5) which is stored in database, IE extracts and classifies the web pages. Each kind of product is classified depending on the attributes.…”

Section: Information Extraction (IE), mentioning

confidence: 99%

“…When Internet users want to get information about Nokia products for example, they first visit search engines such as Yahoo and Google, and then visit all web sites suggested by the search engine. Many researchers such as Guntis Arnicans and Girts Karnitis 2006;Sung Won Jung et al 2001;Srinivas Vadrevu et al 2007;and Horacio Saggion et al 2008 work on extraction of information from web data sources in different domains (traveling, products, business intelligence) but these researches deal with limited web data sources and users still need to use the search engines such as Yahoo and Google to collect more information. We proposed a framework for extracting information from different web data sources.…”

Section: Introduction, mentioning

confidence: 99%

A Framework for Extracting Information from Semi-Structured Web Data Sources

Shaker, Ibrahim, Nurliyana

2010

Convergence and Hybrid Information Technologies

“…Many researchers such as [7,10,16,17] research on extraction of information from web pages in different domains (traveling, products, business intelligence) but these researches deal with limited web pages and the user still need to use the search engines such as Yahoo and Google to collect more information.…”

Section: Introduction, mentioning

confidence: 99%

Information Extraction from Hypertext Mark-Up Language Web Pages

Shaker, Ibrahim, Abdullah

2009

J. of Computer Science

Problem statement: Many users now rely on web search engines to find and gather information, and they face a growing number of diverse HTML information sources. Correlating, integrating and presenting related information to users has therefore become important. When a user queries a search engine such as Yahoo or Google for specific information, the results report not only where the desired information is available but also other pages on which it is merely mentioned, so the number of returned pages is enormous. The performance of web search engines, the overlap among their results for the same queries and their limitations are consequently an important and large area of research. Extracting information from web pages has also become very important: the massive and growing amount of diverse HTML information sources available on the internet, together with the variety of web pages, makes information extraction from the web a challenging problem.

Approach: This study proposed an approach for extracting information from HTML web pages that was able to extract relevant information from different web pages based on standard classifications.

Results: The proposed approach was evaluated through experiments on a number of web pages from different domains and achieved an increase in precision and F-measure as well as a decrease in recall.

Conclusion: The experiments demonstrated that the approach extracted the attributes, the sub-attributes that describe them, and the values of those sub-attributes from various web pages. The approach was also able to extract attributes that appear under different names on some web pages.

Multidimensional modeling driven from a domain language

2022

The multidimensional model is based on the concepts of facts (business phenomena to be analyzed), dimensions (coordinates for analyzing a fact), hierarchies (descriptions of each dimension at progressively coarser levels of aggregation), and measures (numerical attributes that quantify a fact), and it is commonly adopted for representing data to support the decision-making process. Though multidimensional modeling has been widely investigated, requirements elicitation is still an open issue mainly due to the poor knowledge end-users have of the multidimensional model on the one hand, to the lack of a domain language shared with designers on the other. In the direction of bridging this gap, this paper proposes an approach to obtain a multidimensional schema from the language of the domain captured through a Language Extended Lexicon (LEL). LELs have been introduced as structured glossaries to describe the language used in the application domain, aimed at facilitating requirements elicitation in software engineering. Methods: Our approach consists of two steps. In the first one, end-users apply a set of derivation rules to the LEL in order to obtain draft multidimensional schemata. The second step relies on the interaction of end-users and designers to review and edit these draft multidimensional schemata so as to obtain the final ones. Results: The approach is validated via an experiment made on a case study, showing that end-users who apply our rules tend to produce multidimensional schemata that are more correct than those produced by end-users who work freely. Conclusion: Our rules provide a structured context where subjectivity has a smaller impact than in the case of designing with no guidelines, thus effectively supporting the collaboration between end-users and designers.
