Hidden Markov models (HMMs) are machine learning algorithms that have been widely used and have demonstrated their efficiency in many conventional applications. This paper proposes a modified posterior decoding algorithm to solve the HMM decoding problem, based on the MapReduce paradigm and Spark's resilient distributed datasets (RDDs), for large-scale data processing. The objective of this work is to improve the performance of HMMs in dealing with big data challenges. The proposed algorithm substantially reduces time complexity and gives good results in terms of running time, speedup, and parallelization efficiency for large amounts of data, i.e., a large number of states and a large number of sequences.
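For orientation, the following is a minimal sketch of standard posterior decoding (forward-backward followed by a per-position argmax), not the modified algorithm the paper proposes, whose details are not given in this abstract. The function names `posterior_decode` and `decode_all`, the list-of-lists model layout, and the per-step scaling are assumptions made for illustration. It also assumes every observation has non-zero probability under the model. The per-sequence `map` only mimics the shape of a MapReduce map step; in Spark it would presumably correspond to mapping the decoder over an RDD of sequences.

```python
def posterior_decode(obs, init, trans, emit):
    """Standard posterior decoding: most probable state at each position.

    obs   : list of observation indices
    init  : init[s]     = P(first state is s)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[s][o]  = P(observation o | state s)
    """
    n_states = len(init)
    length = len(obs)

    # Forward pass, rescaled at every step to avoid numerical underflow.
    first = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    total = sum(first)
    alpha = [[v / total for v in first]]
    for t in range(1, length):
        cur = [
            emit[j][obs[t]]
            * sum(alpha[t - 1][i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
        total = sum(cur)
        alpha.append([v / total for v in cur])

    # Backward pass, also rescaled at every step.
    beta = [[1.0] * n_states for _ in range(length)]
    for t in range(length - 2, -1, -1):
        cur = [
            sum(
                trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                for j in range(n_states)
            )
            for i in range(n_states)
        ]
        total = sum(cur)
        beta[t] = [v / total for v in cur]

    # The posterior at position t is proportional to alpha[t][s] * beta[t][s];
    # the per-step scaling does not change which state attains the maximum.
    path = []
    for t in range(length):
        gamma = [alpha[t][s] * beta[t][s] for s in range(n_states)]
        path.append(max(range(n_states), key=gamma.__getitem__))
    return path


def decode_all(sequences, init, trans, emit):
    """Decode each sequence independently (illustrative 'map' step)."""
    return list(map(lambda seq: posterior_decode(seq, init, trans, emit), sequences))
```

Because each sequence is decoded independently of the others, the work over a large set of sequences can be split across workers without coordination, which is the property a MapReduce or RDD formulation relies on.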
Big Data refers to extremely large amounts of structured and unstructured data, gathered from a wide range of sources, which often require fast processing and real-time analysis. In this new context, the performance of traditional techniques is limited. To handle these bulky quantities of data, new technologies, called Big Data technologies, have emerged. The characteristics of Big Data make exploring these data, a process called Big Data Analytics, a painful task. One of the important challenges of Big Data is to find new technologies, or to improve and extend existing platforms, infrastructures, and standard techniques, for managing it. The Hadoop/MapReduce paradigm and the Spark framework are among the most prominent solutions for large-scale parallel distributed data processing, alongside Machine Learning techniques, in particular Deep Learning, for performing powerful statistical and predictive analysis. In this paper, we first give an overview, a classification, and a comparison of the main Big Data technologies. Then, we focus in particular on Machine Learning platforms and libraries, especially those for Deep Learning. The results show that Spark is a general-purpose computation engine thanks to its very generalized solutions.