NET – A System for Extracting Web Data from Flat and Nested Data Records

Liu, Bing; Zhai, Yanhong

doi:10.1007/11581062_39

Cited by 58 publications

(42 citation statements)

References 19 publications


“…It develops the new technique that correlates HTML pages and produces a wrapper with respect to their similarities and variations. Bing Liu, Yanhong Zhai [9], it explains the issue of automatic web data retrieval from several structured data records. They also explain how to segment the QRR, extracting the records from the data region and put them in Tabular form.…”

Section: Literature Review (mentioning)

confidence: 99%

“…The case when QRR contains the multi valued attribute, then a few of the data values may not be arranged to other data values. The proposed system does not use the this type of arrangement before the data records are arranged as it is aligned in DeLa [8] and NET [9], it uses it later the data records are arranged. Using this arrangement before the data records are arranged, it makes them unsafe to optional attribute so due to what it makes the tag structure irregular.…”

Section:  Nested Structure Processingmentioning

confidence: 99%


Data Extraction and Alignment by using Combining Tag and Values Similarity

Pathak, Chidrawar

2017

International Journal of Advanced Research in Computer and Comm


“…This feature helps the parsers of search engines to interact with the web pages' contents more efficiently (Ma et al, 2003). One of the useful techniques is wrappers as specified by Palmieri et al (2004), and Liu and Zhai (2005). Wrappers are responsible for converting HTML documents into semantically meaningful XML files to simplify the operation of extracting data.…”

Section: Related Work (mentioning)

confidence: 99%

“…The suggested method by Park and Barbosa (2007) avoids those weaknesses by using the web data extractor algorithm which depends on clustering and the weighted tree matching metric to extract data. Liu and Zhai (2005) realised the importance of extracting data records that were retrieved from databases and displayed on web pages. They analysed the disadvantage of the approaches that were used for extracting data i.e., wrapper induction and automatic extraction, then they proposed a method called nested data extraction using tree matching and visual cues (NET) for extracting flat or nested data records automatically.…”

Section: Related Work (mentioning)

confidence: 99%
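The citation statement above says that NET relies on tree matching. As a rough illustration of that idea, the sketch below implements Simple Tree Matching, a commonly used dynamic-programming tree matching algorithm, on a toy tree type. The `Node` class, the function names, and the normalization by average tree size are assumptions made for this example; this is not the authors' implementation, and it leaves out the visual cues the statement also mentions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal stand-in for a DOM element: a tag plus ordered children."""
    tag: str
    children: list[Node] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Return the number of node pairs in a maximum order-preserving matching."""
    if a.tag != b.tag:
        return 0  # roots with different tags cannot be matched at all
    m, n = len(a.children), len(b.children)
    # best[i][j]: best score using the first i children of a and the first j of b
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = simple_tree_matching(a.children[i - 1], b.children[j - 1])
            best[i][j] = max(best[i - 1][j], best[i][j - 1], best[i - 1][j - 1] + pair)
    return best[m][n] + 1  # +1 counts the matched roots


def tree_size(node: Node) -> int:
    """Count the nodes in a tree."""
    return 1 + sum(tree_size(child) for child in node.children)


def normalized_similarity(a: Node, b: Node) -> float:
    """Matching score divided by the average tree size (one common normalization)."""
    return simple_tree_matching(a, b) / ((tree_size(a) + tree_size(b)) / 2)


# Two toy "data records": the second has one extra, optional cell.
record_1 = Node("tr", [Node("td", [Node("img")]), Node("td", [Node("a")])])
record_2 = Node("tr", [Node("td", [Node("img")]), Node("td", [Node("a")]),
                       Node("td", [Node("span")])])

score = simple_tree_matching(record_1, record_2)
similarity = normalized_similarity(record_1, record_2)
```

Here `score` counts the matched node pairs, and a system of this kind would typically treat two sub-trees as instances of the same record when `similarity` exceeds a chosen threshold. The threshold is a tuning parameter that the citation statement does not specify.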

Online social network profile data extraction for vulnerability analysis

Alim, Abdulrahman, Neagu

et al. 2011

IJITST


Abstract: The increase in social computing has provided the situation where large amounts of personal information are being posted online. This makes people vulnerable to social engineering attacks because their personal details are readily available. Our automated approach for personal data extraction was developed to extract personal details and top friends from MySpace profiles and place them into a repository. An online social network graph was generated from the repository data where nodes represent peoples' profiles. Analysis was carried out into what factors affect node vulnerability. The graph analysis identified structural features of the nodes, e.g., clustering coefficient, indegree and outdegree, which contribute towards vulnerability. From this, it was found that the number of neighbours and the clustering coefficient were major factors in making a node vulnerable because of the potential to spread personal details around the network. These results provide a good foundation for future work on online vulnerability in online social networks (OSNs).

Keywords: online social network; OSN; vulnerability; information disclosure; automated data retrieval.

Reference to this paper should be made as follows: Alim, S., Abdulrahman, R., Neagu, D. and Ridley, M. (2011) 'Online social network profile data extraction for vulnerability analysis', Int. J. Internet Technology and Secured Transactions, Vol. 3, No. 2.

Biographical notes: Sophia Alim graduated in 2006 with BSc (Hons.) in Business Information Systems from the University of Salford, UK. In 2007, she received her MSc in Computing from the University of Bradford, UK. At the same university, currently, she is working towards a PhD with Dr. Daniel Neagu and Mr. Mick Ridley as her supervisors. Her research focuses on the ever evolving area of social networking and how the issue of privacy is going to affect the structure and information disclosure of these networks. Her motivation for her research comes from her desire to reflect the multidisciplinary areas of computing. Her research interests include web accessibility and social networking.

Ruqayya Abdulrahman is a Lecturer in Computer Science at Taibah University, Saudi Arabia. In 2002, she obtained her BSc (Hons.) in Computer Science from King Abdulaziz University in Saudi Arabia. In 2007, she was awarded an MSc with distinction in Software Engineering by the University of Bradford, UK. Currently, she is a PhD student at the School of Computing, Informatics and Media of the University of Bradford. Her research addresses software agents, web database processing, data retrieval, online social network and software engineering.

Daniel Neagu is a Senior Lecturer in Computing at the University of Bradford. His research interests include knowledge discovery, information retrieval, data mining applications in multidisciplinary projects (with a focus in online social networks, healthcare and web profiling) by fusion of human experts knowledge and...
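The abstract above names clustering coefficient, indegree and outdegree as the structural features used in the graph analysis. A minimal sketch of how such features could be computed on a small directed "top friends" graph follows. The edge list, the profile names, and the choice to ignore edge direction when computing the local clustering coefficient are all assumptions made for illustration; the cited paper may define and compute these measures differently.

```python
from collections import defaultdict
from itertools import combinations


def degree_counts(edges):
    """Return (indegree, outdegree) dictionaries for a set of directed edges."""
    indegree, outdegree = defaultdict(int), defaultdict(int)
    for source, target in set(edges):  # set() drops repeated edges
        outdegree[source] += 1
        indegree[target] += 1
    return dict(indegree), dict(outdegree)


def clustering_coefficient(edges, node):
    """Local clustering coefficient of `node`, ignoring edge direction."""
    neighbours = defaultdict(set)
    for source, target in edges:
        if source != target:  # skip self-loops
            neighbours[source].add(target)
            neighbours[target].add(source)
    k = len(neighbours[node])
    if k < 2:
        return 0.0  # fewer than two neighbours: no neighbour pair can be linked
    linked_pairs = sum(1 for u, v in combinations(neighbours[node], 2)
                       if v in neighbours[u])
    return 2 * linked_pairs / (k * (k - 1))


# Hypothetical profiles; an edge (a, b) means "a lists b as a top friend".
friend_edges = [("alice", "bob"), ("alice", "carol"),
                ("bob", "carol"), ("dave", "alice")]

indegree, outdegree = degree_counts(friend_edges)
alice_clustering = clustering_coefficient(friend_edges, "alice")
```

In this toy graph `alice` has three neighbours, of which only one pair (`bob` and `carol`) is linked, so her coefficient is one third.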


“…These nodes constitute a similar sub-tree and then are divided into different data region, Where each node corresponds to a data record, through the analysis of the DOM structure of the page define some extraction rules for data extraction. Based on MDR, Zhai Y [2], Liu B [3], Simon K [4], Lausen G and other algorithms have been proposed DEPTA, NET, and VIPER algorithm. These algorithms are all based on the analysis of DOM structure to define corresponding rules for extraction, which need to traverse a large number of DOM nodes and cost a lot of time.…”

Section: Introduction (mentioning)

confidence: 99%

A Vision Recognition Based Method for Web Data Extraction

Cai, Liu, Xu

et al. 2017

Advanced Science and Technology Letters


Abstract. This paper proposes a data extraction method based on visual recognition and Document Object Model (DOM) tree for Deep Pages to extract a large number of Deep Web data information. By utilizing the characteristics of the presentation of Deep Web data and the characteristics of the visual information of the web page, the data region of multiple targets is located, and the data of the data region is extracted accurately by DOM analysis. Experiments were conducted on several travel websites, and test results show that efficiency and accuracy of the extraction are higher than those of the traditional methods.

