2003
DOI: 10.28945/509

HTML Tags as Extraction Cues for Web Page Description Construction

Abstract: Using four previously identified samples of Web pages containing meta-tagged descriptions, the value of meta-tagged keywords, the first 200 characters of the body, and text marked with common HTML tags as extracts helpful for writing summaries was estimated by applying two measures: density of description words and density of two-word description phrases. Generally, titles and keywords showed the highest densities. Parts of the body showed densities not much different from the body as a whole: somewhat higher …
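
The two measures named in the abstract can be illustrated with a small Python sketch. The abstract does not give the exact definitions, so the ones used here are assumptions: word density is taken as the share of an extract's words that also occur in the meta-tagged description, and phrase density as the share of its adjacent word pairs that also occur in the description. The tokenizer, the `ExtractCollector` class, and all function names are hypothetical and are not taken from the paper.

```python
import re
from html.parser import HTMLParser


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bigrams(tokens):
    """Return the adjacent two-word phrases in a token list."""
    return list(zip(tokens, tokens[1:]))


def word_density(extract, description):
    """Assumed definition: share of extract words found in the description."""
    extract_tokens = tokenize(extract)
    if not extract_tokens:
        return 0.0
    description_words = set(tokenize(description))
    hits = sum(1 for token in extract_tokens if token in description_words)
    return hits / len(extract_tokens)


def phrase_density(extract, description):
    """Assumed definition: share of extract word pairs found in the description."""
    extract_pairs = bigrams(tokenize(extract))
    if not extract_pairs:
        return 0.0
    description_pairs = set(bigrams(tokenize(description)))
    hits = sum(1 for pair in extract_pairs if pair in description_pairs)
    return hits / len(extract_pairs)


class ExtractCollector(HTMLParser):
    """Collect the title, meta keywords, meta description, and body text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title_parts = []
        self.body_parts = []
        self._open = []  # currently open <title>/<body> elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attrs.get("content") or ""
        elif tag in ("title", "body"):
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            self._open.remove(tag)

    def handle_data(self, data):
        if "title" in self._open:
            self.title_parts.append(data)
        elif "body" in self._open:
            self.body_parts.append(data)


# Made-up page used only to exercise the sketch.
page = """<html><head><title>Garden Tools Catalogue</title>
<meta name="keywords" content="garden, tools, spades">
<meta name="description" content="Catalogue of garden tools and spades.">
</head><body><h1>Welcome</h1><p>Browse our garden tools today.</p></body></html>"""

collector = ExtractCollector()
collector.feed(page)
description = collector.meta["description"]
body_text = " ".join(" ".join(collector.body_parts).split())
extracts = {
    "title": "".join(collector.title_parts),
    "keywords": collector.meta["keywords"],
    "body_first_200": body_text[:200],
}
for name, text in extracts.items():
    print(name, word_density(text, description), phrase_density(text, description))
```

Dividing by the length of the extract rather than of the description is a design choice made for this sketch; a real replication would need the paper's own definitions, including any stop-word or stemming rules.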

Cited by 10 publications (7 citation statements)
References 19 publications
“…Raghavan and Garcia-Molina [16] discuss the techniques to extract information from HTML form (i.e., web form). Craven [15] proposes techniques for processing information in the meta tags of a web page. Xiao et al [4] describe an approach to extract descriptive tags from WSDL file for SOAP-based Web Services.…”
Section: Analysis and Discussion Of Results (mentioning)
confidence: 99%
“…Specifically, objectives of the case study are: (1) verify if our approach can correctly describe web resources. The techniques for extracting information from SOAP-based Web Service, web form and informational web page have been discussed and evaluated in the existing literature (e.g., [4][16][15]). Hence, we focus on evaluating our technique that extracts information and describe HTTP-based API; (2) evaluate whether the unified description schema can help discover similar web resources of different types.…”
Section: Case Study (mentioning)
confidence: 99%
“…Researchers interested in this question, such as Craven and Sokvitne, focus their studies on determining the type, amount, and quality of the metadata produced by those posting web pages on the Internet. Craven studies metadata use in general and the description and Title fields in particular (Craven 2000, Craven 2001a, Craven 2001b, Craven 2001c, Craven 2001d, Craven 2002a, Craven 2002b, Craven 2003). This research shows that the content of the description field is very similar to traditional abstracts in terms of language characteristics (Craven 2000) and that these descriptions often change over time as the site is updated (Craven 2001a).…”
Section: Introduction (mentioning)
confidence: 82%
“…Earlier work has shown some indications that other elements may also have positive or negative effects on the relevance of their contents, such as text style elements (bold, italic, etc.) or anchor text [11,30]. Additionally, we could experiment with different weights for the different sizes of headers (h1 to h6).…”
Section: Discussion (mentioning)
confidence: 99%