Samir Anter scite author profile

Hidden Markov models (HMMs) are machine learning models that have been widely used and have proven efficient in many conventional applications. This paper proposes a modified posterior decoding algorithm that solves the HMM decoding problem using the MapReduce paradigm and Spark's resilient distributed datasets (RDDs) for large-scale data processing. The objective of this work is to improve the performance of HMMs in the face of big data challenges. The proposed algorithm substantially reduces time complexity and gives good results in terms of running time, speedup, and parallelization efficiency on large amounts of data, i.e., a large number of states and a large number of sequences.
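
For orientation, here is a minimal sketch of textbook posterior decoding (forward-backward) for a discrete HMM. It is not the modified algorithm from the paper, whose details the abstract does not give. The names `posterior_decode` and `decode_many`, and the parameters `pi` (initial distribution), `A` (transition matrix), and `B` (emission matrix), are illustrative assumptions. The sketch assumes every observation sequence has nonzero probability under the model. The per-sequence `map` is a local stand-in for the kind of map step one might run over a Spark RDD.

```python
from typing import List, Sequence


def _normalize(v: List[float]) -> List[float]:
    # Rescale a vector to sum to 1 (avoids underflow on long sequences).
    # Assumes the sum is nonzero, i.e. the observations are possible.
    s = sum(v)
    return [x / s for x in v]


def posterior_decode(
    obs: Sequence[int],
    pi: Sequence[float],
    A: Sequence[Sequence[float]],
    B: Sequence[Sequence[float]],
) -> List[int]:
    """Return, for each time step, the state with the highest posterior."""
    n, T = len(pi), len(obs)

    # Forward pass: alpha[t][j] is proportional to P(o_1..o_t, q_t = j).
    alpha = [_normalize([pi[i] * B[i][obs[0]] for i in range(n)])]
    for t in range(1, T):
        prev = alpha[-1]
        alpha.append(_normalize([
            B[j][obs[t]] * sum(prev[i] * A[i][j] for i in range(n))
            for j in range(n)
        ]))

    # Backward pass: beta[t][i] is proportional to P(o_{t+1}..o_T | q_t = i).
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = _normalize([
            sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
            for i in range(n)
        ])

    # Posterior gamma[t][i] is proportional to alpha[t][i] * beta[t][i];
    # the per-step scaling does not change the argmax.
    path = []
    for t in range(T):
        gamma = [alpha[t][i] * beta[t][i] for i in range(n)]
        path.append(max(range(n), key=gamma.__getitem__))
    return path


def decode_many(sequences, pi, A, B) -> List[List[int]]:
    # Each sequence is decoded independently, so this step parallelizes
    # trivially. With PySpark one could write something like
    # sc.parallelize(sequences).map(...).collect() instead (assumption:
    # a SparkContext named sc is available).
    return list(map(lambda o: posterior_decode(o, pi, A, B), sequences))
```

Because sequences are decoded independently, the work scales with the number of sequences by distributing the map step. Parallelizing over the number of states, which the abstract also mentions, would need a different decomposition that this sketch does not attempt.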

A Study on Big Data Frameworks and Machine Learning Tool Kits

Sassi, Anter

2019

Big Data is an extremely large amount of structured and unstructured data, gathered from a wide range of sources, which often requires fast processing and real-time analysis. In this new context, the performance of traditional techniques is limited. To handle these bulky quantities of data, new technologies have emerged, called Big Data technologies. The characteristics of Big Data make exploring these data, a process called Big Data Analytics, a painful task. One of the important challenges of Big Data is to find new technologies, or to improve and extend the existing platforms, infrastructures, and standard techniques, to manage it. The Hadoop/MapReduce paradigm and the Spark framework are among the most prominent solutions for large-scale parallel distributed data processing, alongside Machine Learning techniques, in particular Deep Learning, for performing powerful statistical and predictive analysis. In this paper, we first give an overview, a classification, and a comparison of the main Big Data technologies. Then, we focus on Machine Learning platforms and libraries, especially those for Deep Learning. The results show that Spark is a general-purpose computation engine thanks to its very generalized solutions.

Towards a Data Quality Assessment in Big Data

Reda, Sassi, Zellou, et al.

2020

An Overview of Big Data and Machine Learning Paradigms

Sassi, Anter, Bekkhoucha

2019

Samir Anter

Adaptation of Classical Machine Learning Algorithms to Big Data Context: Problems and Challenges : Case Study: Hidden Markov Models Under Spark

A spark-based parallel distributed posterior decoding algorithm for big data hidden Markov models decoding problem
