The representation of words by means of vectors, also called Word Embeddings (WE), has been receiving great attention from the Natural Language Processing (NLP) field. WE models are able to express syntactic and semantic similarities, as well as relationships and contexts of words within a given corpus. Although the most popular implementations of WE algorithms exhibit low scalability, newer approaches apply High-Performance Computing (HPC) techniques. This creates an opportunity to analyze the main differences among the existing implementations in terms of performance and scalability metrics. In this paper, we present a study that addresses resource utilization and performance aspects of known WE algorithms found in the literature. To improve scalability and usability, we propose a wrapper library for local and remote execution environments that bundles a set of optimized implementations, including pWord2vec, pWord2vec MPI, Wang2vec, and the original Word2vec algorithm. With these optimizations, it is possible to achieve an average performance gain of 15x on multicore systems and 105x on multinode systems compared to the original version, along with a substantial reduction in memory footprint relative to the most popular Python versions.
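For context, the abstract's memory-footprint comparison is against popular Python implementations of Word2vec. The proposed wrapper library's own API is not described in this excerpt, so the sketch below instead shows a minimal training run with gensim, one widely used Python implementation; the toy corpus, hyperparameters, and worker count are illustrative assumptions, not values from the paper.

    # Minimal sketch: training word embeddings with gensim's Word2Vec,
    # a popular Python implementation of the kind the abstract compares against.
    # The tiny corpus and hyperparameters below are illustrative assumptions.
    from gensim.models import Word2Vec

    sentences = [
        ["the", "quick", "brown", "fox"],
        ["jumps", "over", "the", "lazy", "dog"],
    ]

    model = Word2Vec(
        sentences,
        vector_size=100,  # embedding dimensionality
        window=5,         # context window size
        min_count=1,      # keep every token in this toy corpus
        workers=4,        # shared-memory multicore parallelism
        sg=1,             # skip-gram; sg=0 would select CBOW
    )

    # Query neighbors of a word in the learned vector space; with a real
    # corpus these reflect syntactic and semantic similarity.
    print(model.wv.most_similar("fox", topn=3))

The `workers` parameter is gensim's thread-level parallelism, which is exactly the dimension where optimized multicore and MPI-based variants such as pWord2vec aim to scale further.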