Web Page Classification with an Ant Colony Algorithm

Holden, Nicholas; Freitas, Alex A.

doi:10.1007/978-3-540-30217-9_110

Cited by 39 publications

(15 citation statements)

References 11 publications


“…A number of techniques have been proposed based on extracted terms and textual information [2,3,4]. A set of words extracted from web documents are used as features for classification algorithms.…”

Section: Related Work (mentioning)

confidence: 99%

Classification of News Web Documents Based on Structural Features

Tongchim

Sornlertlamvanich

Isahara

2006

Advances in Natural Language Processing


Abstract. The motivation of this work comes from the need for a Thai web corpus for testing our information retrieval algorithm. Two collections of news web documents are gathered from two different Thai newspaper web sites. Our goal is to find a simple yet effective method to extract news articles from these web collections. We explore the use of machine learning methods to distinguish article pages from non-article pages, e.g. tables of contents and advertisements. Then, the selected web articles are compared in a fine-grained manner in order to find informative structures. Both steps of information extraction utilize the structural features of web documents rather than the extracted keywords or terms. Thus, the inherent errors of word segmentation, one of the major problems in Thai text processing, do not affect this method.


“…TF × IDF weight system and the paper [18], in which the authors describe a web page classification system using an ant colony algorithm for classification but relying heavily on WordNet for processing of web pages.…”

Section: Wordnet (mentioning)

confidence: 99%
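
The TF × IDF weighting named in the statement above can be illustrated with a minimal sketch. The excerpt is truncated and does not say which variant the citing paper uses, so the formula below (raw term count times the natural log of corpus size over document frequency) is an assumption, as are the function name `tf_idf` and the toy corpus.

```python
import math

def tf_idf(term, doc, corpus):
    """Weight of `term` in `doc`: raw term count times log(N / df).

    This is one textbook variant, assumed here for illustration only.
    """
    tf = doc.count(term)                       # occurrences in this document
    df = sum(1 for d in corpus if term in d)   # documents containing the term
    if df == 0:
        return 0.0                             # term never seen in the corpus
    return tf * math.log(len(corpus) / df)

# Hypothetical toy corpus of tokenised pages.
corpus = [["ant", "colony", "ant"], ["web", "page"], ["web", "ant"]]
weight_ant = tf_idf("ant", corpus[0], corpus)
weight_colony = tf_idf("colony", corpus[0], corpus)
```

A term that occurs in fewer documents ("colony") gets a larger IDF factor than one that is spread across the corpus ("ant"), which is the point of the weighting.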

AISIID: An artificial immune system for interesting information discovery on the web

Secker

Freitas

Timmis

2008

Applied Soft Computing

Self Cite


There exist numerous systems for mining the web in search of relevant information but few exist for the discovery of interesting information. The discovery of interesting information is an advance on basic text mining in that it aims to identify text that is novel, unexpected or surprising to a user, whilst still being relevant. This article investigates the use of Artificial Immune Systems (AIS) applied to discovery of interesting information. AIS are thought to confer the adaptability and learning required for this task. AISIID (Artificial Immune system for Interesting Information Discovery) is described in some detail, then an evaluative study is undertaken involving the subjective evaluation of the results by users. AISIID is found to discover pages rated more interesting by users than a comparative system.


“…The algorithm implements the basic idea of awarding the best attributes (used by the ants to construct the best rules) with pheromone, which increases the probability of those attributes being selected by the next ants to construct other rules. A simple high-level pseudocode of Ant-Miner is shown in Pseudocode 1, adapted from [7]. A more detailed description of Ant-Miner can be found in [3].…”

Section: The Original Ant-miner Classification Algorithm (mentioning)

confidence: 99%
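
The pheromone idea in the statement above can be sketched as follows. This is not the Pseudocode 1 that the citing paper refers to, which is not reproduced in the excerpt. It is a minimal illustration that assumes each candidate attribute (term) carries a pheromone level, that terms are chosen with probability proportional to that level, and that terms used in a good rule are reinforced in proportion to rule quality before renormalising. The heuristic component that Ant-Miner combines with pheromone is omitted, and all names and values are hypothetical.

```python
import random

def choose_term(pheromone, rng):
    """Pick one term with probability proportional to its pheromone level."""
    terms = list(pheromone)
    weights = [pheromone[t] for t in terms]
    return rng.choices(terms, weights=weights, k=1)[0]

def reinforce(pheromone, rule_terms, quality):
    """Raise pheromone on the terms of a good rule, then renormalise.

    Terms not in the rule lose pheromone only through the normalisation,
    which plays the role of evaporation in this sketch.
    """
    updated = dict(pheromone)
    for t in rule_terms:
        updated[t] += updated[t] * quality
    total = sum(updated.values())
    return {t: v / total for t, v in updated.items()}

# Hypothetical starting state: four terms with equal pheromone.
pheromone = {"word=ant": 0.25, "word=colony": 0.25,
             "word=web": 0.25, "word=page": 0.25}
updated = reinforce(pheromone, ["word=ant", "word=colony"], quality=0.8)
picked = choose_term(updated, random.Random(0))
```

After the update, the two rewarded terms hold a larger share of the pheromone, so later ants are more likely to select them when building their own rules.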

“…All attributes within these datasets are binary, where each attribute denotes whether or not a given word occurs in a given web page (example). These datasets have been collected by and previously been experimented with Ant-Miner by Holden & Freitas [7].…”

Section: Experimental Setup and Datasets Used In The Experiments (mentioning)

confidence: 99%
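
The binary representation described in the statement above, one attribute per word that records whether the word occurs in a page, can be sketched as follows. The whitespace tokenisation, the lower-casing, and the names `binary_features` and `vocab` are assumptions made for illustration; the excerpt does not describe how the datasets were actually preprocessed.

```python
def binary_features(page_text, vocabulary):
    """Map a page to {word: 1 if the word occurs in the page, else 0}."""
    words = set(page_text.lower().split())   # naive tokenisation (assumed)
    return {w: int(w in words) for w in vocabulary}

# Hypothetical vocabulary and page text.
vocab = ["ant", "colony", "football"]
features = binary_features("Ant colony optimisation for web mining", vocab)
```

Each page becomes a fixed-length vector of 0/1 values, which is the form a rule learner needs in order to test conditions such as "word is present".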

A New Classification-Rule Pruning Procedure for an Ant Colony Algorithm

Chan

Freitas

2006

Lecture Notes in Computer Science


Abstract. This work proposes a new rule pruning procedure for Ant-Miner, an Ant Colony algorithm that discovers classification rules in the context of data mining. The performance of Ant-Miner with the new pruning procedure is evaluated and compared with the performance of the original Ant-Miner across several datasets. The results show that the new pruning procedure has a mixed effect on the performance of Ant-Miner. On one hand, overall it tends to decrease the classification accuracy more often than it improves it. On the other hand, the new pruning procedure in general leads to the discovery of classification rules that are considerably shorter, and so simpler (more easily interpretable by the users) than the rules discovered by the original Ant-Miner.

