Abstract. The requirements of wide-area distributed database systems differ dramatically from those of local-area network systems. In a wide-area network (WAN) configuration, individual sites usually report to different system administrators, have different access and charging algorithms, install site-specific data type extensions, and have different constraints on servicing remote requests. Typical of the last point are production transaction environments, which are fully engaged during normal business hours, and cannot take on additional load. Finally, there may be many sites participating in a WAN distributed DBMS. In this world, a single program performing global query optimization using a cost-based optimizer will not work well. Cost-based optimization does not respond well to site-specific type extension, access constraints, charging algorithms, and time-of-day constraints. Furthermore, traditional cost-based distributed optimizers do not scale well to a large number of possible processing sites. Since traditional distributed DBMSs have all used cost-based optimizers, they are not appropriate in a WAN environment, and a new architecture is required. We have proposed and implemented an economic paradigm as the solution to these issues in a new distributed DBMS called Mariposa. In this paper, we present the architecture and implementation of Mariposa and discuss early feedback on its operating characteristics.
The Mariposa distributed data manager uses an economic model for managing the allocation of both storage objects and queries to servers. In this paper, we present extensions to the economic model which support replica management, as well as our mechanisms for propagating updates among replicas. We show how our replica control mechanism can be used to provide consistent, although potentially stale, views of data across many machines without expensive per-transaction synchronization. We present a rule-based conflict resolution mechanism, which can be used to enhance traditional time-stamp serialization. We discuss the effects of our replica system on query processing for both read-only and read-write queries. We further demonstrate how the replication model and mechanisms naturally support name service in Mariposa.
Figure 1. A scalable scatterplot visualization created by Kyrix-S and its Kyrix-S specifications. One billion comments made by users on Reddit.com from Jan 2013 to Feb 2015 are visualized on 15 zoom levels. On every level, X and Y axes are respectively the posting time and length of the comments. Each circle represents a cluster of comments. The number inside each circle is the size of the cluster and also encodes the radius of the circle. Using pan or zoom, the user can get either an overview (left) or inspect an area of interest (middle). One can hover over a circle to see three highest-scored comments in the cluster, as well as a bounding box showing the boundary of the cluster.
To reduce the possibility of neural injury from seizures and spare neurologists from spending hours manually reviewing EEG recordings, it is critical to automatically detect and classify "interictal-ictal continuum" (IIC) patterns in EEG data. However, existing IIC classification techniques have been shown to be neither accurate nor robust enough for clinical use, because high-quality labels of EEG segments are lacking as training data. Obtaining high-quality labeled data is traditionally a manual process carried out by trained clinicians, which can be tedious, time-consuming, and error-prone. In this work, we propose Smile, an industrial-scale system that provides an end-to-end solution to the IIC pattern classification problem. The core components of Smile include a visualization-based time series labeling module and a deep-learning-based active learning module. The labeling module enables users to explore and label 350 million EEG segments (30TB) at interactive speed. Multiple coordinated views allow users to examine the EEG signals in the time domain and the frequency domain simultaneously. The active learning module first trains a deep neural network that automatically extracts both the local features of each segment and the long-term dynamics of the EEG signals to classify IIC patterns. Then, leveraging the output of the deep learning model, the EEG segments that can best improve the model are selected and presented to clinicians for labeling. This process is iterated until the clinicians and the model show a high degree of agreement. Our initial experimental results show that Smile allows clinicians to label EEG segments at will with a response time below 500 ms. The accuracy of the model improves progressively as more high-quality labels are acquired over time.
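The selection step of the active learning loop described above can be illustrated with a minimal sketch. The abstract does not state which criterion Smile uses to pick the segments that "can best improve the model"; the sketch below assumes plain entropy-based uncertainty sampling, a common generic choice, and the function names, segment ids, and probabilities are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a class-probability vector (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, batch_size):
    """Return the ids of the `batch_size` segments with the most uncertain predictions.

    `predictions` maps a (hypothetical) segment id to the model's class
    probabilities for that segment. This is an assumed selection rule,
    not the criterion used by Smile.
    """
    ranked = sorted(
        predictions,
        key=lambda seg_id: entropy(predictions[seg_id]),
        reverse=True,
    )
    return ranked[:batch_size]


# Made-up model outputs for four EEG segments over three pattern classes.
predictions = {
    "seg-a": [0.98, 0.01, 0.01],  # confident prediction
    "seg-b": [0.40, 0.35, 0.25],  # fairly uncertain
    "seg-c": [0.70, 0.20, 0.10],  # moderately confident
    "seg-d": [0.34, 0.33, 0.33],  # close to uniform, most uncertain
}

# Segments queued for clinician labeling in this round.
queue = select_for_labeling(predictions, batch_size=2)
print(queue)
```

In a full loop, the newly labeled segments would be added to the training set, the model retrained, and the selection repeated until the model and the clinicians agree often enough.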