A segmentation method for web page analysis using shrinking and dividing

Cao, Jiuxin; Mao, Bo; Luo, Junzhou

doi:10.1080/17445760802429585

Cited by 46 publications

(30 citation statements)

References 9 publications


“…Our segmentation technique involves directing the Web page classifier to bounded areas of a Web Page via recursive division; a technique also utilized by Cao et al [2010] in what they described as their "iterative shrinking and dividing" strategy. These bounded areas are defined by the longest frequent patterns (LFPs) of HTML sequences within each region.…”

Section: Discussion (mentioning)

confidence: 99%

Automated classification and localization of daily deal content from the Web

Cuzzola, Jovanović, Bagheri et al. 2015

Applied Soft Computing


Please cite this article as: J. Cuzzola, J. Jovanovic, E. Bagheri, D. Gasevic, Automated classification and localization of daily deal content from the Web, Applied Soft Computing Journal (2015), http://dx.

Abstract: Websites offering daily deal offers have received widespread attention from the end-users. The objective of such Websites is to provide time limited discounts on goods and services in the hope of enticing more customers to purchase such goods or services. The success of daily deal Websites has given rise to meta-level daily deal aggregator services that collect daily deal information from across the Web. Due to some of the unique characteristics of daily deal Websites such as high update frequency, time sensitivity, and lack of coherent information representation, many deal aggregators rely on human intervention to identify and extract deal information. In this paper, we propose an approach where daily deal information is identified, classified and properly segmented and localized. Our approach is based on a semi-supervised method that uses sentence-level features of daily deal information on a given Web page. Our work offers i) a set of computationally inexpensive discriminative features that are able to effectively distinguish Web pages that contain daily deal information; ii) the construction and systematic evaluation of machine learning techniques based on these features to automatically classify daily deal Web pages; and iii) the development of an accurate segmentation algorithm that is able to localize and extract individual deals from within a complex Web page. We have extensively evaluated our approach from different perspectives, the results of which show notable performance.


“…The results for this experiment are given in Table 2. The highlighted rows (Scene 1, 2, 17, 18, 19 and 20) refer to scenes in which no errors were introduced. As in the first experiment, the two last columns are the most interesting ones.…”

Section: Results (mentioning)

confidence: 99%

Proceedings of the Third Workshop on Vision and Language

2014


Preface: The Workshop on Vision and Language 2014 (VL'14) took place in Dublin on 23rd July 2014, as part of COLING'14. It was the joint 3rd meeting of the EPSRC Network On Vision and Language and 1st technical meeting of the new European Network on Integrating Vision and Language which is funded as a European COST Action. The VL workshops have the general aims: 1. to provide a forum for reporting and discussing planned, ongoing and completed research that involves both language and vision; and 2. to enable NLP and computer vision researchers to meet, exchange ideas, expertise and technology, and form new research partnerships.

As funding for the V&L EPSRC Network (EP/H018557) ends and funding for the iV&L Net European COST Action (IC1307) starts, the focus of the VL workshops will shift onto integration and joint modelling of language and vision. iV&L Net will take over the organisation of annual VL workshops for the next four years as the flagship workshop of this new COST Action.

The call for papers for VL'14 was issued in May 2014 and elicited a good number of high-quality submissions, each of which was peer-reviewed by three members of the programme committee. The interest in the workshop from leading NLP and computer vision researchers and the quality of submissions was high, so we aimed to be as inclusive as possible within the practical constraints of the workshop. In the end we accepted 14 submissions as long papers, and eight as short papers.

The resulting workshop programme packed a lot of exciting content into one day. We were delighted to be able to include in the programme a keynote presentation by Alex Jaimes of Yahoo! Inc., an internationally leading vision researcher. Our technical programme combined seven oral papers, seven long poster papers and seven short poster papers. Some thematic clusters emerged: combined text and image processing (Nguyen et al., Sakaki et al., Jones et al., Zhang et al., HaCohen-Kerner et al.), image description, annotation and labelling (Elliott, Liparas et al., Wang et al., Jokinen and Wilcock), data set creation (Weiland et al., Le et al., McGuinness et al.), situated dialogue (Summers-Stay et al., Schütte et al.), video analysis (Bhat and Olszewska, Shrestha et al.), aids for visually impaired people (Safi et al., Belz and Bharath), and visual analysis supported by text/speech features (Anbarjafari and Aabloo). The programme also included a discussion session on future directions for the VL community and workshops, including plans for shared task competitions.

We would like to thank all the people who have contributed to the organisation and delivery of this workshop: the authors who submitted such high quality papers; the programme committee for their prompt and effective reviewing; our keynote speaker, Alex Jaimes; the COLING 2014 organising committee, especially the workshops chairs, Jennifer Foster, Dan Gildea, and Tim Baldwin; the participants in the workshop; and future readers of these proceedings for your shared interest in this exciting new area of research. Aug...


“…An image-processing-based segmentation approach is illustrated in [19]. A segmentation process based on the text density of the contents is explained in [20].…”

Section: Web Page Segmentation (mentioning)

confidence: 99%