Pinpointing subcellular protein localizations from microscopy images is easy for the trained eye but challenging to automate. Based on the Human Protein Atlas image collection, we held a competition to identify deep learning solutions to this task. Challenges included training on highly imbalanced classes and predicting multiple labels per image. Over 3 months, 2,172 teams participated. Despite convergence on popular networks and training techniques, there was considerable variety among the solutions. Participants applied strategies for modifying neural networks and loss functions, augmenting data, and using pretrained networks. The winning models outperformed our previous effort at multi-label classification of protein localization patterns by ~20%. These models can be used as classifiers to annotate new images, as feature extractors to measure pattern similarity, or as pretrained networks for a wide range of biological applications.
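The abstract mentions multi-label prediction, highly imbalanced classes, and modified loss functions without naming any specific loss. One widely used option for that combination is focal loss applied to independent per-class sigmoid outputs, which down-weights labels the model already predicts confidently. The sketch below is illustrative only: it is not claimed to be the loss used by any competing team, and the function names and the 0.5 decision threshold are assumptions.

```python
import math
from typing import List, Sequence


def binary_focal_loss(probs: Sequence[float], targets: Sequence[int],
                      gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Mean focal loss over the labels of one image (one sigmoid per class).

    probs:   predicted probability that each class is present
    targets: 1 if the class is present in the image, else 0
    gamma:   focusing parameter; gamma = 0 reduces to binary cross-entropy
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p    # probability given to the true outcome
        # (1 - p_t)^gamma shrinks the loss of well-classified labels
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def predict_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Multi-label decision: every class whose probability clears the threshold."""
    return [i for i, p in enumerate(probs) if p >= threshold]
```

With `gamma=0.0` the function returns ordinary binary cross-entropy, which makes it easy to compare the two losses on the same predictions. In practice, per-class thresholds are often tuned on validation data when classes are rare.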
Measuring the indirect cost of context switches is a challenging problem. In this paper, we present our results from experimentally quantifying the indirect cost of context switches using a synthetic workload. Specifically, we measure the impact of program data size and access stride on context switch cost. We also demonstrate how OS background interrupt handling can affect measurement accuracy. This effect can be alleviated on a multiprocessor system by dedicating one processor to the context switch measurement while another runs OS background tasks.
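A minimal sketch of the two ingredients the abstract describes: a synthetic workload parameterized by data size and access stride, and the arithmetic that separates indirect cost from the workload's own run time and the direct switch cost. The subtraction model and all names are assumptions for illustration, not the paper's measurement code. Python's interpreter overhead hides cache effects, so real measurements would use a compiled language with a contiguous array; this only shows the structure.

```python
def strided_traversal(data, stride, passes=1):
    """Synthetic workload: read every `stride`-th element of `data`, `passes` times.

    In a compiled language with a contiguous array, a larger `data` grows the
    working set and `stride` controls how many cache lines each pass touches.
    Returns a checksum so the reads perform real work.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    checksum = 0
    for _ in range(passes):
        for i in range(0, len(data), stride):
            checksum += data[i]
    return checksum


def indirect_cost_per_switch(time_with_switches, time_without_switches,
                             direct_cost_per_switch, n_switches):
    """Average indirect cost of one context switch (same time unit as inputs).

    Assumed model:
        time_with_switches = time_without_switches
                             + n_switches * (direct + indirect)
    """
    if n_switches <= 0:
        raise ValueError("n_switches must be positive")
    extra = (time_with_switches - time_without_switches
             - n_switches * direct_cost_per_switch)
    return extra / n_switches
```

For example, if a run with 1,000 switches takes 1,500 µs, the same traversal without switches takes 1,000 µs, and each direct switch costs 0.1 µs, the estimated indirect cost is 0.4 µs per switch.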
During concurrent I/O workloads, sequential access to one I/O stream can be interrupted by accesses to other streams in the system. Frequent switching between multiple sequential I/O streams may severely affect I/O efficiency due to the long seek and rotational delays of disk-based storage devices. Aggressive prefetching can improve the granularity of sequential data access in such cases, but it comes with a higher risk of retrieving unneeded data. This paper proposes a competitive prefetching strategy that controls the prefetching depth so that the overhead of disk I/O switching and the overhead of unnecessary prefetching are balanced. The proposed strategy does not require a priori information on the data access pattern, and it achieves at least half the performance (in terms of I/O throughput) of the optimal offline policy. We also analyze the optimality of our competitiveness result and extend it to capture prefetching for random-access workloads. We have implemented the proposed competitive prefetching policy in Linux 2.6.10 and evaluated its performance on both standalone disks and a disk array using a variety of workloads (including two common file utilities, Linux kernel compilation, the TPC-H benchmark, the Apache web server, and index searching). Compared to the original Linux kernel, our competitive prefetching system improves performance by up to 53%. At the same time, it trails the performance of an oracle prefetching strategy by no more than 42%.
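The balancing idea can be illustrated with a simplified cost model. Assume (this is our assumption, not a description of the Linux implementation) that every prefetch of a stream is followed by a switch to another stream, and that the competitive depth is the amount of data the disk can transfer in the time of one I/O switch (seek plus rotational delay). Under that model each fetch costs at most twice the switch time, while an oracle that knows the stream length pays one switch plus the transfer time, so the prefetcher never takes more than twice as long. All function names are hypothetical.

```python
import math


def competitive_depth_bytes(seek_s: float, rotation_s: float,
                            rate_bytes_per_s: float) -> float:
    """Prefetch depth whose transfer time equals one I/O switch cost."""
    return (seek_s + rotation_s) * rate_bytes_per_s


def stream_cost_s(stream_bytes: int, depth_bytes: float,
                  switch_s: float, rate_bytes_per_s: float) -> float:
    """Time to read a stream when every prefetch of `depth_bytes` pays one
    switch (another stream is served between consecutive prefetches)."""
    fetches = math.ceil(stream_bytes / depth_bytes)
    return fetches * (switch_s + depth_bytes / rate_bytes_per_s)


def offline_cost_s(stream_bytes: int, switch_s: float,
                   rate_bytes_per_s: float) -> float:
    """Oracle that knows the stream length: one switch, then only needed bytes."""
    return switch_s + stream_bytes / rate_bytes_per_s
```

With a 5 ms seek, a 3 ms rotational delay, and a 50 MB/s transfer rate, the depth is about 400,000 bytes. The factor-of-two bound follows from the model: a stream shorter than the depth costs 2S against at least S for the oracle, and a stream needing k fetches costs 2kS while the oracle pays more than S + (k - 1)S = kS.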
Today's processors provide a rich source of statistical information on application execution through hardware counters. In this paper, we explore the use of these statistics as request signatures in server applications for identifying requests and inferring high-level request properties (e.g., CPU and I/O resource needs). Our key finding is that effective request signatures can be constructed from a small number of hardware statistics while the request is still in an early stage of its execution. Such on-the-fly request identification and property inference allow guided operating system adaptation at request granularity (e.g., resource-aware request scheduling and on-the-fly request classification). We address the challenges of selecting hardware counter metrics for signature construction and providing the necessary operating system support for per-request statistics management. Our implementation in the Linux 2.6.10 kernel suggests that our approach incurs overhead low enough for runtime deployment. Our on-the-fly inference of request resource consumption (averaging 7%, 3%, 20%, and 41% prediction errors for four server workloads: TPC-C, TPC-H, J2EE-based RUBiS, and a trace-driven index search, respectively) is much more accurate than online running-average-based prediction (73-82% errors). Its use for resource-aware request scheduling results in a 15-70% response time reduction for three CPU-bound applications. Its use for on-the-fly request classification and anomaly detection exhibits high accuracy for the TPC-H workload with synthetically generated anomalous requests following a typical SQL-injection attack pattern.
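To make the idea of a request signature concrete, the sketch below builds a small feature vector from counters sampled early in a request and predicts the request's total CPU time from the most similar previously profiled requests. The abstract does not specify the inference method; nearest-neighbor matching, the two chosen metrics (instructions per cycle, L2 misses per thousand instructions), and all names here are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ProfiledRequest:
    """A past request: its early-execution signature and measured CPU time."""
    signature: Tuple[float, ...]
    cpu_ms: float


def make_signature(cycles: int, instructions: int,
                   l2_misses: int) -> Tuple[float, float]:
    """Hypothetical signature from counters sampled early in a request:
    instructions per cycle and L2 misses per thousand instructions."""
    return (instructions / cycles, 1000.0 * l2_misses / instructions)


def infer_cpu_ms(signature: Tuple[float, ...],
                 history: List[ProfiledRequest], k: int = 1) -> float:
    """Predict total CPU time as the mean over the k past requests whose
    signatures are closest (Euclidean distance) to the new signature."""
    if not history:
        raise ValueError("history must not be empty")
    nearest = sorted(history,
                     key=lambda r: math.dist(signature, r.signature))[:k]
    return sum(r.cpu_ms for r in nearest) / len(nearest)
```

The metrics are left unscaled for brevity; with real counter data, normalizing each dimension would keep one metric from dominating the distance.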
Complex system software allows a wide variety of execution conditions in terms of system configurations and workload properties. This paper explores a principled use of reference executions (those whose execution conditions are similar to the target's) to help identify the symptoms and causes of performance anomalies. First, to identify anomaly symptoms, we construct change profiles that probabilistically characterize the expected performance deviations between target and reference executions. By synthesizing several single-parameter change profiles, we can scalably identify anomalous reference-to-target changes in a complex system with multiple execution parameters. Second, to narrow the scope of anomaly root cause analysis, we identify anomaly-related low-level system metrics as those that manifest very differently between target and reference executions. Our anomaly identification approach requires little expert knowledge or detailed models of system internals and can therefore be easily deployed. Using empirical case studies on the Linux I/O subsystem and a J2EE-based distributed online service, we demonstrate our approach's effectiveness in identifying performance anomalies over a wide range of execution conditions as well as multiple system software versions. In particular, we discovered five previously unknown performance anomaly causes in the Linux 2.6.23 kernel. Additionally, our preliminary results suggest that online anomaly detection and system reconfiguration may help evade performance anomalies in complex online systems.
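The two steps in this approach can be sketched as follows, under assumptions of our own: a single-parameter change profile is represented as an empirical sample of target/reference performance ratios from normal runs, a change is flagged when its observed ratio falls in a low-probability tail, and related low-level metrics are those whose relative change between the two executions exceeds a threshold. The paper's actual profile construction and synthesis across parameters are not shown; the names, the 0.05 significance level, and the 0.5 relative-change threshold are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Sequence


def tail_probability(profile: Sequence[float], observed: float) -> float:
    """Two-sided empirical tail probability of `observed` under a change
    profile (a sample of target/reference ratios seen in normal runs)."""
    s = sorted(profile)
    n = len(s)
    if n == 0:
        raise ValueError("profile must not be empty")
    p_le = bisect_right(s, observed) / n       # fraction of samples <= observed
    p_ge = (n - bisect_left(s, observed)) / n  # fraction of samples >= observed
    return min(1.0, 2.0 * min(p_le, p_ge))


def is_anomalous(profile: Sequence[float], target_perf: float,
                 reference_perf: float, alpha: float = 0.05) -> bool:
    """Flag a reference-to-target change whose performance ratio is improbable."""
    return tail_probability(profile, target_perf / reference_perf) < alpha


def divergent_metrics(target: Dict[str, float], reference: Dict[str, float],
                      min_rel_change: float = 0.5) -> List[str]:
    """Metrics present in both runs whose relative change is at least
    `min_rel_change`, ordered from largest to smallest change."""
    changes = {}
    for name, t in target.items():
        if name in reference:
            r = reference[name]
            changes[name] = abs(t - r) / max(abs(r), 1e-12)
    picked = [name for name, c in changes.items() if c >= min_rel_change]
    return sorted(picked, key=lambda name: changes[name], reverse=True)
```

For instance, if normal runs show ratios between 0.9 and 1.1 for a given parameter change, a target that reaches only half the reference throughput would be flagged, and the metrics with the largest relative change would be the first candidates for root cause analysis.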