Jan S. Rellermeyer scite author profile

The demand for articial intelligence has grown signicantly over the last decade and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, in order to increase the quality of predictions and render machine learning solutions feasible for more complex applications, a substantial amount of training data is required. Although small machine learning models can be trained with modest amounts of data, the input for training larger models such as neural networks grows exponentially with the number of parameters. Since the demand for processing training data has outpaced the increase in computation power of computing machinery, there is a need for distributing the machine learning workload across multiple machines, and turning the centralized into a distributed system. These distributed systems present new challenges, rst and foremost the ecient parallelization of the training process and the creation of a coherent model. This article provides an extensive overview of the current state-of-the-art in the eld by outlining the challenges and opportunities of distributed machine learning over conventional (centralized) machine learning, discussing the techniques used for distributed machine learning, and providing an overview of the systems that are available. 3:3200x over conventional CPUs for an image recognition algorithm using a pretrained multilayer perceptron (MLP).An alternative to generic GPUs for acceleration is the use of Application Specic Integrated Circuits (ASICs) which implement specialized functions through a highly optimized design. In recent times, the demand for such chips has risen signicantly [100]. When applied to e.g. Bitcoin mining, ASICs have a signicant competitive advantage over GPUs and CPUs due to their high performance and power eciency [145]. Since matrix multiplications play a prominent role in many machine learning algorithms, these workloads are highly amenable to acceleration through ASICS. Google applied this concept in their Tensor Processing Unit (TPU) [129], which, as the name suggests, is an ASIC that specializes in calculations on tensors (n-dimensional arrays), and is designed to accelerate their Tensorow [1][2] framework, a popular building block for machine learning models. The most important component of the TPU is its Matrix Multiply unit based on a systolic array. TPUs use a MIMD (Multiple Instructions, Multiple Data) [51] architecture which, unlike GPUs, allows them to execute diverging branches eciently. TPUs are attached to the server system through the PCI Express bus. This provides them with a direct connection with the CPU which allows for a high aggregated bandwidth of 63GB/s (PCI-e5x16). Multiple TPUs can be used in a data center and the individual units can collaborate in a distributed setting. The benet of the TPU over regular CPU/GPU setups is not only its increased processing power but also its power eciency, which is important in large-scale applications due to the cost of energy and the lim...

show abstract

R-OSGi: Distributed Applications Through Software Modularization

Rellermeyer

Alonso

Roscoe

2007

130

View full text Add to dashboard Cite

Abstract. In this paper we take advantage of the concepts developed for centralized module management, such as dynamic loading and unloading of modules, and show how they can be used to support the development and deployment of distributed applications. We do so through R-OSGi, a distributed middleware platform that extends the centralized, industry-standard OSGi specification to support distributed module management. To the developer, R-OSGi looks like a conventional module management tool. However, at deployment time, R-OSGi can be used to turn the application into a distributed application by simply indicating where the different modules should be deployed. At run time, R-OSGi represents distributed failures as module insertion and withdrawal operations so that the logic to deal with failures is the same as that employed to deal with dependencies among software modules. In doing so, R-OSGi greatly simplifies the development of distributed applications with no performance cost. In the paper we describe R-OSGi and several use cases. We also show with extensive experiments that R-OSGi has a performance comparable or better than that of RMI or UPnP, both commonly used distribution mechanisms with far less functionality than R-OSGi.

show abstract

An empirical study of the robustness of Inter-component Communication in Android

Maji

Arshad

Bagchi

et al. 2012

View full text Add to dashboard Cite

Abstract-Over the last three years, Android has established itself as the largest-selling operating system for smartphones. It boasts of a Linux-based robust kernel, a modular framework with multiple components in each application, and a securityconscious design where each application is isolated in its own virtual machine. However, all of these desirable properties would be rendered ineffectual if an application were to deliver erroneous messages to targeted applications and thus cause the target to behave incorrectly. In this paper, we present an empirical evaluation of the robustness of Inter-component Communication (ICC) in Android through fuzz testing methodology, whereby, parameters of the inter-component communication are changed to various incorrect values. We show that not only exception handling is a rarity in Android applications, but also it is possible to crash the Android runtime from unprivileged user processes. Based on our observations, we highlight some of the critical design issues in Android ICC and suggest solutions to alleviate these problems.

show abstract

AlfredO: An Architecture for Flexible Interaction with Electronic Devices

Rellermeyer

Riva

Alonso

2008

View full text Add to dashboard Cite

Abstract. Mobile phones are rapidly becoming the universal access point for computing, communication, and digital infrastructures. In this paper we explore the software architectures necessary to make the mobile phone a truly universal access point to any electronic infrastructure. We propose AlfredO, a lightweight middleware architecture that allows developers to construct applications in a modular way, organizing the applications into detachable tiers that can be distributed at will to dynamically configure multi-tier architectures between mobile phones and service providers. Through AlfredO, a phone can lease on-the-fly the client side of an application and immediately become a fully tailored client. Our experimental results indicate that AlfredO has very little overhead, it is scalable, and yields very low latency. To demonstrate the feasibility and potential of the platform, in the paper we also describe AlfredOShop, a prototype application for spontaneously controlling information screens from a mobile phone.

show abstract

The Software Fabric for the Internet of Things

Rellermeyer

Duller

Gilmer³

et al.

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.