This paper describes the design rationale, a C implementation, and conformance testing of a subset of the new Standard for the BLAS (Basic Linear Algebra Subroutines): Extended and Mixed Precision BLAS. Permitting higher internal precision and mixed input/output types and precisions permits us to implement some algorithms that are simpler, more accurate, and sometimes faster than possible without these features. The new BLAS are challenging to implement and test because there are many more subroutines than in the existing Standard, and because we must be able to assess whether a higher precision is used for internal computations than is used either for input or output variables. So we have developed an automated process of generating and systematically testing these routines. Our methodology is applicable to languages besides C. In particular, our algorithms used in the testing code would be very valuable to all the other BLAS implementors. Our extra precision routines achieve excellent performance-close to half of the machine peak Megaflop rate even for the Level 2 BLAS, when the data access is stride one. *
Cloud computing offers users the ability to access large pools of computational and storage resources on-demand without the burden of managing and maintaining their own IT assets. Today's cloud providers charge users based upon the amount of resources used or reserved, with only minimal guarantees of the quality-of-service (QoS) experienced by the users applications. As virtualization technologies proliferate among cloud providers, consolidating multiple user applications onto multi-core servers increases revenue and improves resource utilization. However, consolidation introduces performance interference between co-located workloads, which significantly impacts application QoS.A critical requirement for effective consolidation is to be able to predict the impact of application performance in the presence of interference from on-chip resources, e.g., CPU and last-levelcache (LLC)/memory bandwidth sharing, to storage devices and network bandwidth contention. In this work, we propose an interference model which predicts the application QoS metric.The key distinctive feature is the consideration of time-variant inter-dependency among different levels of resource interference. We use applications from a test suite and SPECWeb2005 to illustrate the effectiveness of our model and an average prediction error of less than 8% is achieved. Furthermore, we demonstrate using the proposed interference model to optimize the cloud provider's metric (here the number of successfully executed applications) to realize better workload placement decisions and thereby maintaining the user's application QoS.
Abstract-Traditional distributed algorithms for QoS routing in wired networks may not work in the ad-hoc domain, due to interference between links. We propose a heuristic interferenceaware QoS routing algorithm (IQRouting) that chooses candidate paths based on localized information at the source nodes. The candidate paths are compared in a distributed manner using probe packets, with the best path confirmed by the destination node. Simulations demonstrate significant (up to 30%) improvements in admission ratio over traditional shortest path algorithms, and better performance than other QoS routing algorithms in literature as well.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.