On the complexity of sampling query feedback restricted database repair of functional dependency violations

Miao, Dongjing; Liu, Xianmin; Li, Jianzhong

doi:10.1016/j.tcs.2015.02.010

Cited by 26 publications

(4 citation statements)

References 14 publications


“…Many sampling methods are first proposed to process static data, like those stored in database systems [27]. Conventional methods include random sampling, weight sampling [28], stratified sampling [29], etc.…”

Section: Related Work (mentioning)

confidence: 99%

Online Adaptive Approximate Stream Processing With Customized Error Control

et al. 2019


In approximate processing on stream data, most works focus on how to approximate online arrival data. However, the efficiency of approximation needs to consider multiple aspects. Generally, customers submit their requests with specific quality requirements (e.g., maximum error). This raises a critical problem that online quality control is required to meet the desired quality of service. Since the continuous arriving data may not be entirely stored and needs to be processed immediately, it brings the difficulty of acquiring knowledge online which significantly affects the quality of results. To address these problems, we present an online adaptive approximate processing framework with a delicate combination of data learning, sampling, and quality control. We first design an online data learning strategy for stream data. With the real-time learning results, we propose a dynamic sampling strategy that switches to different sampling methods based on the change of the load. Finally, we present a double-check error control strategy to monitor and correct large errors. Each operation module is correlated through online learning and feedback. The experiments with both synthetic and real-world datasets show that the proposed approximate framework is not only applicable to different data distributions but also provides a customized error control.


“…The algorithm has been used to generate CFDs automatically. Miao et al. [12] systematically studied the problem of data consistency determination using CFD and measured the consistency quality of data sets. The ratio of the tuples in the most substantial subset of the rule set that satisfies CFD to the tuple number of the data set.…”

Section: Related Work (mentioning)

confidence: 99%

DRAV: Detection and repair of data availability violations in Internet of Things

Wang et al. 2019

International Journal of Distributed Sensor Networks


The application of the Internet of Things has produced large amounts of data in different scenarios, which are accompanied with problems, such as consistency and integrity violations. Existing research on dealing with data availability violations is insufficient. In this work, the detection and repair of data availability violations (DRAV) framework is proposed to detect and repair data violations in Internet of Things with a distributed parallel computing environment. DRAV uses algorithms in the MapReduce programming framework, and these include detection and repair algorithms based on enhanced conditional function dependency for data consistency violation, MapJoin, and ReduceJoin algorithms based on master data for k-nearest neighbor–based integrity violation detection, and repair algorithms. Experiments are conducted to determine the effect of the algorithms. Results show that DRAV improves data availability in Internet of Things compared with existing methods by detecting and repairing violations.


“…Technically, to the best of our knowledge, there is no existing work considering this aspect. There are some detection techniques [18][19][20][27] but they are not able to reveal how dirty the data is directly. For confidence computation [28], our problem generalizes the confidence of a single CFD; actually, this measurement is also the confidence of a set of CFDs.…”

Section: Motivation (mentioning)

confidence: 99%

“…Given the data edit operations (including tuple-level and cell-level), minimum cost repair will output repaired data with minimizing the difference between it and the original one. Our problem can be seen as a special case of [29], because the complementary minimum culprit can be seen as C-repair (cardinality repair) of an inconsistent database; however, it is much more expensive using the techniques of the authors of [27] directly, especially for dynamic data, and the algorithm given in this paper is more efficient and seems optimal. There are some other repair definitions, such as “minimum description length (MDL)” [23] and “relative trust” [21].…”

Section: Related Work (mentioning)

confidence: 99%

Data Inconsistency Evaluation for Cyberphysical System

Wang, Gao 2016

International Journal of Distributed Sensor Networks

Self Cite


Cyberphysical systems (CPSs) have been widely applied in a variety of applications to collect data, while data is often dirty in reality. We pay attention to the way of evaluating data inconsistency which is a major concern for evaluating quality of data and its source. This paper is the first study on data inconsistency evaluation problem for CPS based on conditional functional dependencies. Given a database instance including tuples and a CFD set Σ including CFDs, data inconsistency is defined as the ratio of the size of minimum culprit in , where a culprit is a set of tuples leading to integrity errors. Firstly, we give a sufficient analysis on the complexity and inapproximability of minimum culprit problem. Then, we provide a practical algorithm that gives a 2-approximation of the data dirtiness in ( log ) time based on independent residual subgraph. To deal with the large dynamic data, we provide a compact structure based on B-tree for storing independent residual subgraph in order to update inconsistency efficiently. At last, we test our algorithm on both synthetic and real-life datasets; the experiment results show the scalability of our algorithm and the quality of the evaluation result.
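The idea of a culprit set and a 2-approximation of inconsistency can be illustrated with a minimal sketch. It is not the algorithm of the cited paper: it handles plain functional dependencies rather than CFDs, swaps in the textbook maximal-matching approximation for vertex cover on the tuple conflict graph, and runs in quadratic time. The names `violates`, `approx_culprit`, and `inconsistency` are hypothetical, and dividing the culprit size by the number of tuples is an assumption about the elided definition above.

```python
from itertools import combinations


def violates(t1, t2, lhs, rhs):
    """True if t1 and t2 agree on every lhs attribute but differ on some rhs attribute."""
    return all(t1[a] == t2[a] for a in lhs) and any(t1[a] != t2[a] for a in rhs)


def approx_culprit(rows, fds):
    """Return row indices that cover every conflicting pair.

    Greedy maximal matching on the conflict graph: whenever two rows that are
    both still uncovered conflict, take both. The chosen pairs are disjoint,
    so the result is at most twice the size of a minimum culprit.
    """
    culprit = set()
    for i, j in combinations(range(len(rows)), 2):
        if i in culprit or j in culprit:
            continue  # this pair is already covered
        if any(violates(rows[i], rows[j], lhs, rhs) for lhs, rhs in fds):
            culprit.update((i, j))
    return culprit


def inconsistency(rows, fds):
    """Approximate inconsistency: culprit size over tuple count (assumed denominator)."""
    return len(approx_culprit(rows, fds)) / len(rows) if rows else 0.0


# Illustrative data: the FD zip -> city is violated by rows 0/3 versus row 1.
rows = [
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "Boston"},
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "10001", "city": "NYC"},
]
fds = [(("zip",), ("city",))]
```

Here the minimum culprit is the single Boston row, while the sketch returns two rows, which stays within the factor-2 bound.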

