The demand for artificial intelligence has grown significantly over the last decade, and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, in order to increase the quality of predictions and render machine learning solutions feasible for more complex applications, a substantial amount of training data is required. Although small machine learning models can be trained with modest amounts of data, the input for training larger models such as neural networks grows exponentially with the number of parameters. Since the demand for processing training data has outpaced the increase in computation power of computing machinery, there is a need for distributing the machine learning workload across multiple machines, turning a centralized system into a distributed one. These distributed systems present new challenges, first and foremost the efficient parallelization of the training process and the creation of a coherent model. This article provides an extensive overview of the current state of the art in the field by outlining the challenges and opportunities of distributed machine learning over conventional (centralized) machine learning, discussing the techniques used for distributed machine learning, and providing an overview of the systems that are available.

…200x over conventional CPUs for an image recognition algorithm using a pre-trained multilayer perceptron (MLP). An alternative to generic GPUs for acceleration is the use of Application-Specific Integrated Circuits (ASICs), which implement specialized functions through a highly optimized design. In recent times, the demand for such chips has risen significantly [100]. When applied to, e.g., Bitcoin mining, ASICs have a significant competitive advantage over GPUs and CPUs due to their high performance and power efficiency [145]. Since matrix multiplications play a prominent role in many machine learning algorithms, these workloads are highly amenable to acceleration through ASICs. Google applied this concept in their Tensor Processing Unit (TPU) [129], which, as the name suggests, is an ASIC that specializes in calculations on tensors (n-dimensional arrays), and is designed to accelerate their TensorFlow [1][2] framework, a popular building block for machine learning models. The most important component of the TPU is its Matrix Multiply unit, which is based on a systolic array. TPUs use a MIMD (Multiple Instructions, Multiple Data) [51] architecture which, unlike GPUs, allows them to execute diverging branches efficiently. TPUs are attached to the server system through the PCI Express bus, which provides them with a direct connection to the CPU and allows for a high aggregated bandwidth of 63 GB/s (PCIe 5 x16). Multiple TPUs can be used in a data center, and the individual units can collaborate in a distributed setting. The benefit of the TPU over regular CPU/GPU setups is not only its increased processing power but also its power efficiency, which is important in large-scale applications due to the cost of energy and the lim…
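Since the survey's central theme is parallelizing training while still producing a single coherent model, the following sketch illustrates one of the simplest techniques in that family: synchronous data-parallel SGD, in which every worker computes a gradient on its own data shard and the results are averaged. This is a minimal illustration, not code from the survey; the quadratic toy loss, the sequential loop standing in for parallel workers, and all names are assumptions.

```python
# Minimal sketch (not from the survey): synchronous data-parallel SGD.
# Each "worker" computes a gradient on its own shard of the data and the
# coordinator averages the gradients to keep one coherent model.
import numpy as np

def local_gradient(w, X, y):
    """Gradient of the average squared error 0.5*||Xw - y||^2 on one shard."""
    return X.T @ (X @ w - y) / len(y)

def train_data_parallel(shards, dim, lr=0.1, steps=100):
    w = np.zeros(dim)                       # shared model parameters
    for _ in range(steps):
        # In a real system these gradients are computed in parallel on
        # separate machines and combined via all-reduce or a parameter server.
        grads = [local_gradient(w, X, y) for X, y in shards]
        w -= lr * np.mean(grads, axis=0)    # gradient averaging step
    return w

# Toy usage: split one synthetic regression dataset across 4 "workers".
rng = np.random.default_rng(0)
X, true_w = rng.normal(size=(400, 5)), np.arange(5.0)
y = X @ true_w
shards = [(X[i::4], y[i::4]) for i in range(4)]
print(train_data_parallel(shards, dim=5))   # converges towards true_w
```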
As the popularity of smartphones and tablets increases, the mobile platform is becoming a very important target for application developers. Despite recent advances in mobile hardware, most mobile devices fail to execute complex multimedia applications (such as image processing) with an acceptable level of user experience. Cyber foraging is a well-known computing technique to enhance the capabilities of mobile devices, where the mobile device offloads parts of the application to a nearby server discovered in the network. Although first introduced in 2001, cyber foraging is still not widely adopted in current smartphone platforms or applications. In this respect, two major challenges are to be tackled. First, a suitable adaptive decision engine is needed to determine the optimal offloading decision, one that takes into account the potentially high and variable latency between the device and the server. Second, an integrated cyber foraging platform with sufficient support for application developers is not publicly available on popular mobile platforms such as Android. In this paper, we present AIOLOS, a mobile middleware framework for cyber foraging on the Android platform. AIOLOS uses an estimation model that takes into account server resources and network state to decide at runtime whether or not a method call should be offloaded. We also introduce developer tools to integrate the AIOLOS framework into the Android platform, enabling easy development of cyber foraging-enabled applications. A prototype implementation is presented and evaluated in detail by means of both a chess application and a newly developed photo editor application.
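To make the runtime offloading decision concrete, the sketch below shows the kind of cost comparison such a decision engine can perform: offload a method call only when the estimated remote execution time plus transfer time beats local execution. It is not the actual AIOLOS estimation model; the function name, parameters, and example numbers are illustrative assumptions.

```python
# Minimal sketch (not the AIOLOS estimation model): decide at call time
# whether offloading a method pays off, given rough estimates of the
# server speed-up, the payload size, and the measured network state.

def should_offload(local_exec_s: float,
                   server_speedup: float,
                   payload_bytes: int,
                   bandwidth_bps: float,
                   rtt_s: float) -> bool:
    """True if the estimated remote time beats local execution."""
    remote_exec_s = local_exec_s / server_speedup            # faster remote CPU
    transfer_s = payload_bytes * 8 / bandwidth_bps + rtt_s   # arguments/result over the network
    return remote_exec_s + transfer_s < local_exec_s

# Example: a 2 s image filter, a 5x faster server, 1 MB of image data,
# 10 Mbit/s Wi-Fi with 50 ms round-trip latency -> offloading wins.
print(should_offload(2.0, 5.0, 1_000_000, 10e6, 0.05))
```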
As cloud computing is gaining popularity, an important question is how to optimally deploy software applications on the infrastructure offered in the cloud. Especially in the context of mobile computing, where software components can be offloaded from the mobile device to the cloud, it is important to optimize the deployment by minimizing the network usage. Therefore, we have designed and evaluated graph partitioning algorithms that allocate software components to machines in the cloud while minimizing the required bandwidth. Contrary to the traditional graph partitioning problem, our algorithms are not restricted to balanced partitions and take infrastructure heterogeneity into account. To benchmark our algorithms, we evaluated their performance and found that they produce 10 to 40% smaller graph cut sizes than METIS 4.0 for typical mobile computing scenarios.
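As an illustration of the problem setting (components as graph nodes, inter-component bandwidth as edge weights, machines with heterogeneous capacities), the sketch below shows a simple greedy allocation that tries to keep chatty components on the same machine. It is not the paper's algorithm, and the component names, weights, and capacities in the example are made up.

```python
# Minimal sketch (not the paper's algorithm): greedily assign software
# components to machines so that the bandwidth crossing machine boundaries
# stays small, while respecting per-machine capacities. Partitions need not
# be balanced, and machine capacities may differ (heterogeneous infrastructure).
def greedy_allocate(edges, weights, capacities):
    """
    edges:      {(a, b): bandwidth} between components a and b
    weights:    {component: resource demand}
    capacities: {machine: resource capacity}
    returns     {component: machine}
    """
    placement, load = {}, {m: 0.0 for m in capacities}
    # Place heavy communicators first so their neighbours can follow them.
    order = sorted(weights, key=lambda c: -sum(bw for e, bw in edges.items() if c in e))
    for comp in order:
        best_m, best_gain = None, None
        for m, cap in capacities.items():
            if load[m] + weights[comp] > cap:
                continue                      # machine too small for this component
            # Bandwidth kept local by co-locating comp with neighbours already on m.
            gain = sum(bw for (a, b), bw in edges.items()
                       if (a == comp and placement.get(b) == m) or
                          (b == comp and placement.get(a) == m))
            if best_gain is None or gain > best_gain:
                best_m, best_gain = m, gain
        placement[comp] = best_m
        load[best_m] += weights[comp]
    return placement

# Toy example: four components, one small phone and one large cloud VM.
edges = {("gui", "logic"): 1.0, ("logic", "db"): 8.0, ("db", "cache"): 9.0}
weights = {"gui": 1, "logic": 2, "db": 4, "cache": 2}
capacities = {"phone": 3, "cloud": 100}
print(greedy_allocate(edges, weights, capacities))
```

Because no balance constraint is imposed, the greedy pass is free to place most components on the larger machine whenever that reduces the cut bandwidth.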
In this contribution, the results of the development of a structural health monitoring approach for the foundations of an offshore wind turbine, based on its resonance frequencies, are presented. Key problems are the operational and environmental variability of the turbine's resonance frequencies, which can potentially conceal any structural change. This article uses a (non-)linear regression model to compensate for the environmental variations. An operational case-by-case monitoring strategy is suggested to cope with the dynamic variability between different operational cases of the turbine. Real-life data obtained from an offshore turbine on a monopile foundation are used to validate the presented strategy and to demonstrate the performance of the approach. As a first result, the data indicate an overall stiffening of the investigated structure.
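The compensation idea (regress the resonance frequency on environmental and operational variables from a healthy reference period, then watch the residuals of new data) can be sketched as follows. This is a generic illustration, not the paper's exact (non-)linear model; the temperature-only regressor, the simulated frequency shift, and the 3-sigma control limit are assumptions.

```python
# Minimal sketch (not the paper's exact model): fit a linear regression of a
# resonance frequency against an environmental variable over a healthy
# reference period, then monitor the residuals of new data. A persistent
# residual shift suggests a structural change rather than an environmental effect.
import numpy as np

def fit_compensation(env, freq):
    """Least-squares model freq ~ [1, env...] over the reference period."""
    A = np.column_stack([np.ones(len(freq)), env])
    coeffs, *_ = np.linalg.lstsq(A, freq, rcond=None)
    return coeffs

def residuals(coeffs, env, freq):
    A = np.column_stack([np.ones(len(freq)), env])
    return freq - A @ coeffs

# Toy reference period: frequency driven by temperature plus noise.
rng = np.random.default_rng(1)
temp_ref = rng.uniform(0, 25, 500)
freq_ref = 0.35 - 1e-4 * temp_ref + rng.normal(0, 1e-4, 500)
coeffs = fit_compensation(temp_ref[:, None], freq_ref)
sigma = residuals(coeffs, temp_ref[:, None], freq_ref).std()

# New data with a simulated structural shift of +5e-4 Hz: flag if the mean
# residual drifts beyond a simple 3-sigma control limit.
temp_new = rng.uniform(0, 25, 100)
freq_new = 0.35 - 1e-4 * temp_new + rng.normal(0, 1e-4, 100) + 5e-4
alarm = abs(residuals(coeffs, temp_new[:, None], freq_new).mean()) > 3 * sigma / np.sqrt(100)
print(alarm)
```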
Most of the research on deep neural networks (DNNs) so far has been focused on obtaining higher accuracy levels by building increasingly large and deep architectures. Training and evaluating these models is only feasible when large amounts of resources such as processing power and memory are available. Typical applications that could benefit from these models are, however, executed on resource-constrained devices. Mobile devices such as smartphones already use deep learning techniques, but they often have to perform all processing on a remote cloud. We propose a new architecture called a Cascading network that is capable of distributing a deep neural network between a local device and the cloud while keeping the required communication network traffic to a minimum. The network begins processing on the constrained device and only relies on the remote part when the local part does not provide an accurate enough result. The Cascading network allows for an early-stopping mechanism during the recall phase of the network. We evaluated our approach in an Internet of Things (IoT) context where a deep neural network adds intelligence to a large number of heterogeneous connected devices. This technique enables a whole variety of autonomous systems where sensors, actuators and computing nodes can work together. We show that the Cascading architecture allows for a substantial improvement in evaluation speed on constrained devices while the loss in accuracy is kept to a minimum.
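The early-stopping idea behind the Cascading network (answer locally when the small on-device part is confident, fall back to the remote part otherwise) can be sketched as a confidence-thresholded cascade. The sketch below is illustrative, not the paper's implementation: the two random linear "models", the 0.9 confidence threshold, and the return convention are assumptions.

```python
# Minimal sketch (not the paper's implementation): a cascading classifier in
# which a small local model answers immediately when it is confident enough,
# and only uncertain inputs are sent on to a larger remote model.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cascade_predict(x, local_model, remote_model, confidence=0.9):
    """Early-exit inference: stop locally if the top class probability is high enough."""
    probs = softmax(local_model(x))
    if probs.max() >= confidence:
        return int(probs.argmax()), "local"                         # no network traffic needed
    return int(np.argmax(softmax(remote_model(x)))), "remote"       # offloaded to the cloud

# Toy usage with random linear "models" standing in for the two network parts.
rng = np.random.default_rng(0)
W_local, W_remote = rng.normal(size=(10, 64)), rng.normal(size=(10, 64))
x = rng.normal(size=64)
print(cascade_predict(x, lambda v: W_local @ v, lambda v: W_remote @ v))
```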