Proceedings of the 22nd International Database Engineering & Applications Symposium (IDEAS 2018)
DOI: 10.1145/3216122.3216126

Practical Study of Deterministic Regular Expressions from Large-scale XML and Schema Data

Abstract: Regular expressions are a fundamental concept in computer science and are widely used in various applications. In this paper we focus on deterministic regular expressions (DREs). Because researchers previously lacked large datasets as evidence, we first harvested a large corpus of real data from the Web and then conducted a practical study to investigate the usage of DREs. One feature of our work is that the dataset is large compared with those of previous work; it was obtained using several data co…

Cited by 10 publications (4 citation statements)
References 27 publications
“…A regular expression is said to be deterministic if, when we match a string from left to right against the expression, we always know exactly which symbol of the expression the next input symbol matches, without looking ahead in the string [31]. For instance, "(a|b)*a" is not deterministic, as the first symbol of the string "aaa" could be matched by either the first or the second a in the expression.…”
Section: Determinism (mentioning)
confidence: 99%
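The definition above can be checked mechanically. A standard characterization (due to Brüggemann-Klein and Wood) is that an expression is deterministic exactly when its Glushkov (position) automaton is deterministic: after marking every symbol occurrence with a distinct position, no two positions carrying the same symbol may appear together in the set of first positions or in any follow set. The Python sketch below assumes a hypothetical AST (`Sym`, `Alt`, `Cat`, `Star`) covering only |, concatenation and *; it illustrates the criterion and is not claimed to be the procedure used in the cited work.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical AST covering symbols, |, concatenation and Kleene star only.
@dataclass(frozen=True)
class Sym:
    ch: str

@dataclass(frozen=True)
class Alt:
    left: "Re"
    right: "Re"

@dataclass(frozen=True)
class Cat:
    left: "Re"
    right: "Re"

@dataclass(frozen=True)
class Star:
    inner: "Re"

Re = Union[Sym, Alt, Cat, Star]

def glushkov(e: Re, pos_sym: list, follow: dict):
    """Number each symbol occurrence (position) and return (nullable, first, last).

    pos_sym[p] is the symbol at position p; follow[p] is filled in as a side effect.
    """
    if isinstance(e, Sym):
        p = len(pos_sym)
        pos_sym.append(e.ch)
        follow[p] = set()
        return False, {p}, {p}
    if isinstance(e, Alt):
        n1, f1, l1 = glushkov(e.left, pos_sym, follow)
        n2, f2, l2 = glushkov(e.right, pos_sym, follow)
        return n1 or n2, f1 | f2, l1 | l2
    if isinstance(e, Cat):
        n1, f1, l1 = glushkov(e.left, pos_sym, follow)
        n2, f2, l2 = glushkov(e.right, pos_sym, follow)
        for p in l1:  # a last position of the left part can be followed by a first of the right
            follow[p] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    if isinstance(e, Star):
        n, f, l = glushkov(e.inner, pos_sym, follow)
        for p in l:  # the star lets a last position loop back to a first position
            follow[p] |= f
        return True, f, l
    raise TypeError(f"unsupported node: {e!r}")

def has_clash(positions, pos_sym) -> bool:
    """True if two distinct positions in the set carry the same symbol."""
    seen = set()
    for p in positions:
        if pos_sym[p] in seen:
            return True
        seen.add(pos_sym[p])
    return False

def is_deterministic(e: Re) -> bool:
    pos_sym, follow = [], {}
    _, first, _ = glushkov(e, pos_sym, follow)
    if has_clash(first, pos_sym):
        return False
    return not any(has_clash(fs, pos_sym) for fs in follow.values())

# (a|b)*a from the text: the first input 'a' could match position 0 or position 2.
nondet = Cat(Star(Alt(Sym("a"), Sym("b"))), Sym("a"))
# b*a(b*a)*: an equivalent expression (nonempty words over {a, b} ending in a).
det = Cat(Cat(Star(Sym("b")), Sym("a")), Star(Cat(Star(Sym("b")), Sym("a"))))
print(is_deterministic(nondet), is_deterministic(det))  # False True
```

The second expression shows that the language of (a|b)*a does have a deterministic expression; determinism is a property of the expression, not of the language alone.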
“…14: The proportion of subclasses in Relax NG. The dataset used for this statistical experiment was acquired from [28] and contains 509,267 regular expressions from 4,526 Relax NG schemas.…”
Section: Usage of SOIRE in Practice (mentioning)
confidence: 99%
“…Most existing subclasses of regular expressions for XML are defined on standard regular expressions, e.g., [5,7,6,16,35], which were analyzed together in [31,28]. For single occurrence regular expressions (SOREs), in which each symbol occurs at most once, and their subclass, chain regular expressions (CHAREs), Bex et al. proposed two inference algorithms, RWR and CRX [7,8].…”
Section: Introduction (mentioning)
confidence: 99%
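The two classes named in this statement can be illustrated with simple membership checks. The Python sketch below assumes a hypothetical compact syntax (single lowercase letters as element names; operators |, ?, *, + and parentheses; no whitespace) and encodes the usual definitions: a SORE has every symbol at most once, and a CHARE is a SORE written as a sequence of factors, each a single symbol or a disjunction (a1|...|ak), optionally followed by ?, * or +. The names `is_sore` and `is_chare` are illustrative only; the RWR and CRX inference algorithms themselves are not shown.

```python
import re
from collections import Counter

# Hypothetical syntax: single lowercase letters as symbols, operators | ? * + and
# parentheses, no whitespace. One CHARE factor: a symbol or (a1|...|ak), plus an
# optional ?, * or + quantifier.
FACTOR = r"(?:[a-z]|\([a-z](?:\|[a-z])*\))[?*+]?"

def is_sore(expr: str) -> bool:
    """SORE check: every symbol occurs at most once.

    This only counts symbol occurrences; it does not validate the syntax.
    """
    counts = Counter(ch for ch in expr if ch.isalpha())
    return all(n == 1 for n in counts.values())

def is_chare(expr: str) -> bool:
    """CHARE check: a SORE that is a chain of one or more factors."""
    return is_sore(expr) and re.fullmatch(f"(?:{FACTOR})+", expr) is not None

print(is_sore("(a|b)*a"))      # False: 'a' occurs twice
print(is_sore("(ab|c)*"))      # True
print(is_chare("(ab|c)*"))     # False: 'ab' is not a disjunction of single symbols
print(is_chare("a?(b|c)+d*"))  # True
```

The last two calls show why CHAREs form a proper subclass of SOREs: (ab|c)* is single occurrence but is not a chain of disjunctive factors.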
“…Such applications motivate us to investigate the problem of learning RE(&) from positive and negative examples. Most researchers have studied subclasses of REs, which are expressive enough to cover the vast majority of real-world applications [6,7,22] and behave better on several decision problems than general REs [6,7,19,20,25,27]. Bex et al. [3] proposed learning algorithms for two subclasses of REs, SOREs and CHAREs, which capture many practical DTDs/XSDs and are both single occurrence REs.…”
Section: Introduction (mentioning)
confidence: 99%
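Learning RE(&) from positive and negative examples requires, at minimum, checking that a candidate expression accepts every positive example and rejects every negative one. The Python sketch below does this with Brzozowski derivatives extended to the interleaving operator &, where the derivative of r & s by a symbol lets either operand consume it. This matcher is our illustrative choice, not the learning algorithm of the cited or citing works; the AST classes and the functions `deriv`, `matches` and `consistent`, as well as the sample data, are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Empty:
    """Matches nothing."""

@dataclass(frozen=True)
class Eps:
    """Matches only the empty string."""

@dataclass(frozen=True)
class Sym:
    ch: str

@dataclass(frozen=True)
class Alt:
    left: "Re"
    right: "Re"

@dataclass(frozen=True)
class Cat:
    left: "Re"
    right: "Re"

@dataclass(frozen=True)
class Star:
    inner: "Re"

@dataclass(frozen=True)
class Inter:
    """r & s: any interleaving of a word of r with a word of s."""
    left: "Re"
    right: "Re"

Re = Union[Empty, Eps, Sym, Alt, Cat, Star, Inter]

def nullable(r: Re) -> bool:
    """True if the empty string is in L(r)."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Sym)):
        return False
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    # Cat and Inter: both operands must accept the empty string.
    return nullable(r.left) and nullable(r.right)

def deriv(r: Re, a: str) -> Re:
    """Brzozowski derivative: an expression for {w : a·w in L(r)}."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Sym):
        return Eps() if r.ch == a else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.left, a), deriv(r.right, a))
    if isinstance(r, Cat):
        head = Cat(deriv(r.left, a), r.right)
        return Alt(head, deriv(r.right, a)) if nullable(r.left) else head
    if isinstance(r, Star):
        return Cat(deriv(r.inner, a), r)
    # Inter: the next symbol is consumed by either operand.
    return Alt(Inter(deriv(r.left, a), r.right), Inter(r.left, deriv(r.right, a)))

def matches(r: Re, word: str) -> bool:
    for a in word:
        r = deriv(r, a)
    return nullable(r)

def consistent(r: Re, positives, negatives) -> bool:
    """Accept every positive example and reject every negative example."""
    return all(matches(r, w) for w in positives) and not any(matches(r, w) for w in negatives)

# Hypothetical candidate a & b & c? (c? written as c | ε) and hypothetical samples.
candidate = Inter(Sym("a"), Inter(Sym("b"), Alt(Sym("c"), Eps())))
positives = ["ab", "ba", "cab", "bca"]
negatives = ["a", "aab", "abcc"]
print(consistent(candidate, positives, negatives))  # True
```

No simplification is applied to the derivatives, so the terms grow with input length; this is acceptable for short sample words but a practical learner would normalize them.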