2002
DOI: 10.1007/3-540-45681-3_1

Optimized Substructure Discovery for Semi-structured Data

Abstract: In this paper, we consider the problem of discovering interesting substructures from a large collection of semi-structured data in the framework of optimized pattern discovery. We model semi-structured data and patterns with labeled ordered trees, and present an efficient algorithm that discovers the best labeled ordered trees that optimize a given statistical measure, such as the information entropy and the classification accuracy, in a collection of semi-structured data. We give theoretical analyses of the c…

Cited by 68 publications (57 citation statements)
References 22 publications
“…The binding sites of all the proteins are known, but the binding site extraction is conducted under the assumption that their binding sites are unknown. We ranked the optimal graphs extracted from each pocket based on the value of the evaluation function defined in (6). Fig.…”
Section: Experiments and Discussion (mentioning)
confidence: 99%
“…These intuitive statements are formalized as follows. Certain subgraph I of a pocket in target protein T divides a set of referential proteins R into two subsets, We introduce an evaluation function, where a subgraph satisfying the above requirements has better value, based on a method proposed by Abe et al [6]:…”
Section: A Detecting Optimal Graphs (mentioning)
confidence: 99%
“…This results in a high memory consumption [17,1]. To control memory consumption in our analysis we restrict ourselves to mining neighborhoods of a particular size l, i.e., neighborhoods that include no more than l function calls.…”
Section: Filtering Procedures (mentioning)
confidence: 99%
“…In our experimental tests we use the frequent pattern mining algorithm FREQT, independently introduced by Asai et al [1] and Zaki [17]. It efficiently extracts frequent ordered labeled subtrees from a database of rooted ordered labeled trees.…”
Section: Time Complexity (mentioning)
confidence: 99%