Gehad Galal scite author profile

The large amount of data collected today is quickly overwhelming researchers' abilities to interpret the data and discover interesting patterns. Knowledge discovery and data mining approaches hold the potential to automate the interpretation process, but these approaches frequently utilize computationally expensive algorithms. In particular, scientific discovery systems focus on the utilization of richer data representation, sometimes without regard for scalability. This research investigates approaches for scaling a particular knowledge discovery in databases (KDD) system, SUBDUE, using parallel and distributed resources. SUBDUE has been used to discover interesting and repetitive concepts in graph‐based databases from a variety of domains, but requires a substantial amount of processing time. Experiments that demonstrate scalability of parallel versions of the SUBDUE system are performed using CAD circuit databases and artificially‐generated databases, and potential achievements and obstacles are discussed.

Exploiting parallelism in knowledge discovery systems to improve scalability

Galal

Cook

Holder

Discovering Concepts in Structural Data

Cook, Holder, Galal

1999

The large amount of data collected today is quickly overwhelming researchers’ abilities to interpret the data and discover interesting patterns. In response to this problem, a number of researchers have developed techniques for discovering concepts in databases. These techniques work well for data expressed in a nonstructural, attribute-value representation and address issues of data relevance, missing data, noise and uncertainty, and utilization of domain knowledge (Fisher, 1987; Cheeseman and Stutz, 1996). However, recent data acquisition projects are collecting structural data describing the relationships among the data objects. Correspondingly, there exists a need for techniques to analyze and discover concepts in structural databases (Fayyad et al., 1996b). One method for discovering knowledge in structural data is the identification of common substructures. The goal is to find substructures capable of compressing the data and to identify conceptually interesting substructures that enhance the interpretation of the data. Substructure discovery is the process of identifying concepts describing interesting and repetitive substructures within structural data. Once discovered, the substructure concept can be used to simplify the data by replacing instances of the substructure with a pointer to the newly discovered concept. The discovered substructure concepts allow abstraction over detailed structure in the original data and provide new, relevant attributes for interpreting the data. Iteration of the substructure discovery and replacement process constructs a hierarchical description of the structural data in terms of the discovered substructures. This hierarchy provides varying levels of interpretation that can be accessed based on the goals of the data analysis. We describe a system called Subdue that discovers interesting substructures in structural data based on the minimum description length (MDL) principle. 
The Subdue system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously discovered substructures, multiple passes of Subdue produce a hierarchical description of the structural regularities in the data. Subdue uses a computationally bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints.
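The compress-and-replace idea in this abstract can be illustrated with a minimal sketch. It is not Subdue's implementation: it assumes a candidate substructure is a single labeled edge, uses exact rather than inexact matching, and replaces the MDL bit encoding with a crude size of one unit per vertex and per edge. All names (find_instances, compression, the example labels) are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, str]        # (source vertex, target vertex, edge label)
Pattern = Tuple[str, str, str]     # (source label, edge label, target label)


def size(num_vertices: int, num_edges: int) -> int:
    """Crude stand-in for description length: one unit per vertex and edge."""
    return num_vertices + num_edges


def find_instances(labels: Dict[int, str], edges: List[Edge],
                   pattern: Pattern) -> List[Tuple[int, int]]:
    """Greedily collect non-overlapping exact matches of a one-edge pattern."""
    src_label, edge_label, dst_label = pattern
    used = set()
    instances = []
    for u, v, lab in edges:
        if u == v or u in used or v in used:
            continue  # skip self-loops and vertices already claimed
        if lab == edge_label and labels[u] == src_label and labels[v] == dst_label:
            instances.append((u, v))
            used.update((u, v))
    return instances


def compression(labels: Dict[int, str], edges: List[Edge],
                pattern: Pattern) -> float:
    """Ratio of original size to (substructure size + compressed graph size).

    Each instance collapses two vertices into one placeholder vertex and
    drops the matched edge. Values above 1 mean the pattern compresses.
    """
    n = len(find_instances(labels, edges, pattern))
    original = size(len(labels), len(edges))
    substructure = size(2, 1)
    compressed = size(len(labels) - n, len(edges) - n)
    return original / (substructure + compressed)


# Toy graph: three "A on B" pairs, each B also "near" a shared C vertex.
labels = {0: "A", 1: "B", 2: "A", 3: "B", 4: "A", 5: "B", 6: "C"}
edges = [(0, 1, "on"), (2, 3, "on"), (4, 5, "on"),
         (1, 6, "near"), (3, 6, "near"), (5, 6, "near")]

candidates = [("A", "on", "B"), ("B", "near", "C")]
best = max(candidates, key=lambda p: compression(labels, edges, p))
print(best, compression(labels, edges, best))
```

In this toy graph the repeated "A on B" pattern scores above 1, while "B near C" does not, because its matches all share the C vertex and only one non-overlapping instance can be replaced. Re-running the same selection on the compressed graph would give the kind of hierarchical description the abstract mentions.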

Gehad Galal

Approaches to Parallel Graph-Based Knowledge Discovery

Exploiting parallelism in a structural scientific discovery system to improve scalability
