Runtime optimization of join location in parallel data management systems

Chandra, Bikash; Sudarshan, S.

doi:10.14778/3137628.3137656

Cited by 3 publications

(3 citation statements)

References 25 publications


“…Health ○ ● Cao et al. (2010) Proceedings of the VLDB Endowment Computer Science ● ○

Chai and Nayak (2018) Electronic Journal of Statistics Statistics & Probability ○ ● Chandra and Sudarshan (2017) Proceedings of the VLDB Endowment Computer Science ○

Chen et al. (2012) Communications of the Association for Information Systems Information Systems ○ Cheng et al.…”

Section: Methodology Development (mentioning)

confidence: 99%

Towards a new era of mass data collection: Assessing pandemic surveillance technologies to preserve user privacy

Ribeiro‐Navarrete

Saura

Palacios‐Marqués

2021

Technological Forecasting and Social Change

162


Controlling the coronavirus pandemic is triggering a cross-border strategy by which national governments attempt to control the spread of the COVID-19 pandemic. A response based on sharing facts about millions of private movements and a call to study information behavior during the global health crisis has been advised worldwide. The present study aims to identify the technologies to control the COVID-19 and future pandemics with massive data collection from users’ mobile devices. This research undertakes a Systematic Literature Review (SLR) of the studies about the currently available methods, strategies, and actions to collect and analyze data from users’ mobile devices. In a total of 76 relevant studies, 13 technologies that are classified based on the following aspect of data and data management have been identified: (1) security; (2) destruction; (3) voluntary access; (4) time span; and (5) storage. In addition, in order to understand how these technologies can affect user privacy, 25 data points that these technologies could have access to if installed through mobile applications have been detected. The paper concludes with a discussion of important theoretical and practical implications of preserving user privacy and curbing COVID-19 infections in the global public health emergency situation.



“…This can be mitigated by prefetching asynchronously, and dynamically deciding to prefetch only after a certain number of accesses to minimize the overhead of prefetching. This is similar to the classical ski-rental problem [19] and has been applied earlier in the context of join optimizations in parallel data management systems [20]. Extending COBRA to adapt heuristics from [14] to efficiently handle alternatives generated due to caching is part of future work, and dynamic approaches for prefetching are part of future work.…”

Section: Transformations (mentioning)

confidence: 98%

Cobra: A Framework for Cost-Based Rewriting of Database Applications

Emani

Sudarshan

2018

2018 IEEE 34th International Conference on Data Engineering (ICDE)

Self Cite


Database applications are typically written using a mixture of imperative languages and declarative frameworks for data processing. Application logic gets distributed across the declarative and imperative parts of a program. Often, there is more than one way to implement the same program, whose efficiency may depend on a number of parameters. In this paper, we propose a framework that automatically generates all equivalent alternatives of a given program using a given set of program transformations, and chooses the least cost alternative. We use the concept of program regions as an algebraic abstraction of a program and extend the Volcano/Cascades framework for optimization of algebraic expressions, to optimize programs. We illustrate the use of our framework for optimizing database applications. We show through experimental results, that our framework has wide applicability in real world applications and provides significant performance benefits.
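The citation statement for this record compares the decision of when to prefetch to the classical ski-rental problem. A minimal sketch of the textbook break-even rule follows, under stated assumptions: each individual access costs `per_access_cost`, a one-time prefetch costs `prefetch_cost` and makes all later accesses free, and the number of accesses is not known in advance. The function names and the cost model are hypothetical and chosen for illustration; they are not taken from the cited papers.

```python
def should_prefetch(accesses_paid: int, per_access_cost: float, prefetch_cost: float) -> bool:
    """Break-even rule (illustrative): switch to prefetching once paying for one
    more individual access would push the total spent on individual accesses
    above the one-time prefetch cost."""
    return (accesses_paid + 1) * per_access_cost > prefetch_cost


def online_cost(num_accesses: int, per_access_cost: float, prefetch_cost: float) -> float:
    """Total cost paid by the break-even rule over a run of num_accesses accesses.
    The rule does not look at num_accesses; it is only used to end the run."""
    cost = 0.0
    for accesses_paid in range(num_accesses):
        if should_prefetch(accesses_paid, per_access_cost, prefetch_cost):
            # Pay once for the prefetch; remaining accesses are assumed free.
            return cost + prefetch_cost
        cost += per_access_cost
    return cost


def hindsight_cost(num_accesses: int, per_access_cost: float, prefetch_cost: float) -> float:
    """Cost of the best choice if the number of accesses were known up front."""
    return min(num_accesses * per_access_cost, prefetch_cost)


if __name__ == "__main__":
    for n in (5, 10, 11, 100):
        print(n, online_cost(n, 1.0, 10.0), hindsight_cost(n, 1.0, 10.0))
```

Under this cost model the rule never pays more than twice what the hindsight choice would pay, which is the usual argument for deferring the prefetch until enough accesses have been observed.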


“…The approach was provided better performance, but however, it failed in enhancing query processing performance. To address this issue, a systematic process that was considered by load testing and profiling data was presented in [10,11] that utilized a software refactoring process to reduce the run time.…”

Section: Related Work (mentioning)

confidence: 99%

Random Forest Bagging and X‐Means Clustered Antipattern Detection from SQL Query Log for Accessing Secure Mobile Data

Dhanaraj

Ramakrishnan

Poongodi

et al.

2021

Wireless Communications and Mobile Computing


In the current ongoing crisis, people mostly rely on mobile phones for all the activities, but query analysis and mobile data security are major issues. Several research works have been made on efficient detection of antipatterns for minimizing the complexity of query analysis. However, more focus needs to be given to the accuracy aspect. In addition, for grouping similar antipatterns, a clustering process was performed to eradicate the design errors. To address the above-said issues and further enhance the antipattern detection accuracy with minimum time and false positive rate, in this work, Random Forest Bagging X-means SQL Query Clustering (RFBXSQLQC) technique is proposed. Different patterns or queries are initially gathered from the input SQL query log, and bootstrap samples are created. Then, for each pattern, various weak clusters are constructed via X-means clustering and are utilized as the weak learner (clusters). During this process, the input patterns are categorized into different clusters. Using the Bayesian information criterion, the similarity measure is employed to evaluate the similarity between the patterns and cluster weight. Based on the similarity value, patterns are assigned to either relevant or irrelevant groups. The weak learner results are aggregated to form strong clusters, and, with the aid of voting, a majority vote is considered for designing strong clusters with minimum time. Experiments are conducted to evaluate the performance of the RFBXSQLQC technique using the IIT Bombay dataset using the metrics like antipattern detection accuracy, time complexity, false-positive rate, and computational overhead with respect to the differing number of queries. The results revealed that the RFBXSQLQC technique outperforms the existing algorithms by 19% with pattern detection accuracy, 34% minimized time complexity, 64% false-positive rate, and 31% in terms of computational overhead.

