A Survey of Automatic Deep Web Classification Techniques

Noor, Umara; Rashid, Zahid; Rauf, Abdul

doi:10.5120/2362-3099

Cited by 24 publications

(20 citation statements)

References 15 publications


“…A number of query interface clustering algorithms have been developed [Noor et al, 2011]. They can be distinguished by the type of data they employ for clustering and by their underlying clustering approach.…”

Section: Query Interface Clustering (mentioning)

confidence: 99%

Deep Web Query Interface Understanding and Integration

Dragut

Meng

2012

Synthesis Lectures on Data Management

“…Such similarity matching is also called as semantic mapping. Semantic mapping between query schemas is not an easy job as discussed in the context of data integration in [24].…”

Section: Visible Form Features (mentioning)

confidence: 99%

Todweb

Noor

Rashid

Rauf

2011

Proceedings of the 13th International Conference on Information Integration and Web-Based Applications and Services

Self Cite

Today, the deep web makes up a large part of web content. Because of this volume of data, deep web technologies have gained increasing attention in recent years. The deep web mostly consists of online domain-specific databases, which are accessed through web query interfaces. These highly relevant domain-specific databases are well suited to satisfying users' information needs. To make the extraction of relevant information easier, deep web databases need to be classified into subject-specific, self-descriptive categories. In this paper we present TODWEB, a novel training-less classification approach for automatic deep web source classification that is based on common-sense world knowledge (in the form of an ontology or any external lexical resource). It will help in building highly scalable, domain-focused, and efficient semantic information retrieval systems (i.e., metasearch engines and search engine directories). An important aspect of this approach is that the classification method is completely training-less and uses the Wikipedia category network and domain-independent ontologies to analyze the semantics in the meta-information of deep web sources. The large number of fine-grained Wikipedia categories is employed to analyze semantic relatedness among concepts, and finally the URL of the deep web search source is mapped to the category hierarchy offered by Wikipedia. Experiments conducted on a collection of search sources show that this approach yields a highly accurate and fine-grained classification compared to existing approaches, nearly identical to the results achieved by manual classification.
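To make the idea of training-less classification more concrete, the following is a minimal sketch and not the TODWEB algorithm itself. It assumes a tiny hand-written dictionary (`CATEGORY_TERMS`, hypothetical) standing in for the Wikipedia category network, and it swaps in plain Jaccard overlap for the semantic relatedness measure the abstract mentions. The function names `tokenize` and `classify` are likewise illustrative.

```python
import re
from typing import Optional

# Hypothetical stand-in for a category network: category -> related concept terms.
# A real system would draw these from Wikipedia categories or an ontology.
CATEGORY_TERMS = {
    "Books": {"book", "author", "isbn", "publisher", "title"},
    "Automobiles": {"car", "make", "model", "mileage", "dealer"},
    "Air travel": {"flight", "airline", "departure", "arrival", "passenger"},
}


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(meta_text: str) -> Optional[str]:
    """Return the category whose terms overlap most with the source's
    meta-information, or None if nothing overlaps. No training data is used."""
    tokens = tokenize(meta_text)
    # Jaccard overlap is an assumed, simplified relatedness measure.
    scores = {
        category: len(tokens & terms) / len(tokens | terms)
        for category, terms in CATEGORY_TERMS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify("Search books by title, author or ISBN"))
print(classify("Find a flight: departure city, arrival city, airline"))
```

Because the category vocabulary comes from an external resource rather than labeled examples, adding a new category in this sketch only requires a new dictionary entry, which loosely mirrors the scalability argument made in the abstract.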

“…Generally, users search for their desired information by looking at the default groups or directories that appeared on the search engine screen, such as entertainment, sports, computers, books, etc. The advantage of these directories is the development of a high quality directory service; however, the approach cannot be applied to this research due to the limitation of scalability [3].…”

Section: Introduction (mentioning)

confidence: 99%

Enhancing the Performance of Proxy Cache Management through Browsing Behavior-based Learning Mechanism

Hiranpongsin

Bhattarakosol

2013

IJIPM

Proxy caching is an effective technique for improving quality of service (QoS) over the Internet. However, existing cache replacement methods cannot effectively support this process when the number of requests grows rapidly and without bound. This paper proposes a new caching architecture model, called the Web Usage Pattern-Based Caching Architecture (WUPCA). The WUPCA implements a behavior-based learning mechanism that applies recommender system concepts to the browsing procedure. This learning mechanism also leads to a grouping of caches that better exploits browsing characteristics and improves the performance of Internet services. The experiments indicate that the proposed technique performs much better than the traditional one on quantitative metrics such as hit rate, byte hit rate, and average response time of accessed websites. For example, the hit rate and the average response time of the WUPCA improve by approximately 30%, while the byte hit rate increases by more than 52%.
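The two cache metrics named in this abstract have standard definitions: hit rate is the fraction of requests served from the cache, and byte hit rate is the fraction of requested bytes served from the cache. The sketch below computes both over a small made-up request log; the `Request` record and the sample data are assumptions for illustration and are not taken from the WUPCA experiments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """One proxy request (hypothetical record layout)."""
    url: str
    size_bytes: int
    served_from_cache: bool


def hit_rate(requests: List[Request]) -> float:
    """Fraction of requests that were served from the cache."""
    if not requests:
        return 0.0
    hits = sum(1 for r in requests if r.served_from_cache)
    return hits / len(requests)


def byte_hit_rate(requests: List[Request]) -> float:
    """Fraction of requested bytes that were served from the cache."""
    total_bytes = sum(r.size_bytes for r in requests)
    if total_bytes == 0:
        return 0.0
    hit_bytes = sum(r.size_bytes for r in requests if r.served_from_cache)
    return hit_bytes / total_bytes


# Made-up log: two small cached objects, two large uncached ones.
log = [
    Request("/a", 1000, True),
    Request("/b", 3000, False),
    Request("/a", 1000, True),
    Request("/c", 5000, False),
]

print(f"hit rate: {hit_rate(log):.2f}")            # 2 of 4 requests
print(f"byte hit rate: {byte_hit_rate(log):.2f}")  # 2000 of 10000 bytes
```

The sample log shows why the two metrics can diverge: a cache that mostly holds small objects can have a high hit rate but a low byte hit rate, which is one reason both are usually reported together.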
