Abstract. This paper presents a programmable system-on-chip implementation for the acceleration of computations within hidden Markov models (HMMs). High-level synthesis (HLS) and "divide-and-conquer" approaches are presented for the parallelization of the Baum-Welch and Viterbi algorithms. To avoid arithmetic underflow, all computations are performed in logarithmic space. Additionally, in order to carry out computations efficiently (i.e. directly in an FPGA system or in a processor cache), we propose reducing the floating-point representations of HMMs. We state and prove a lemma about the length of numerically unsafe sequences for such reduced-precision models. Finally, special attention is devoted to the design of a multiple logarithm and exponent approximation unit (MLEAU). Using associative mapping, this unit allows simultaneous conversion of multiple values and thereby compensates for the computational overhead of logarithmic-space operations. The design evaluation reveals the absolute stall delay caused by multiple hardware conversions to logarithms and exponents, and the experimental evaluation reveals the computational boundaries of HMMs related to their probabilities and floating-point representation. The performance differences at each stage of computation are summarized in a comparison between hardware acceleration using the MLEAU and typical software implementations on ARM and Intel processors.

Offloading the computation to the network is an alternative, but as the operation of the network is affected by external factors, real-time computation is not guaranteed. Hence, for low-latency applications, full HMM processing and computation within a dedicated embedded system is a reliable choice, as has been shown for systems as diverse as speech recognition [11][12], pattern detection [13] and AAV/AUV [14]. Unfortunately, the increasing complexity of HMMs is paralleled by a growing demand for computational resources, especially memory and data throughput. Hence, when considering the hardware acceleration of the HMM algorithms (forward-backward, Viterbi, Baum-Welch), the size of the HMM has to be minimized while the numerical stability of the calculations is still guaranteed. As mentioned in [15], the typical calculations associated with HMMs rapidly exhaust the precision of the numerical representation. This stems from the need to perform long sequences of multiplications of probability values (all close to zero) for state transitions and observation emissions. Therefore, the key algorithms are computed in logarithmic space rather than with scaling factors [16].

Decreasing the size of an HMM can be achieved immediately by reducing the precision of its numerical representation (the transition and emission matrices), e.g. down to 32 bits (single), 16 bits (half) or 8 bits (quarter precision). This directly restricts the maximum length of the observation sequences that can be examined: the smaller the exponent range, the sooner a running product of probabilities underflows, so computations on longer sequences may become numerically unstable.
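As a back-of-the-envelope illustration of this restriction (not the lemma proved in the paper), assume every factor in the running product is at most $p_{\max} < 1$ and the floating-point format can represent magnitudes down to $2^{e_{\min}}$. The product of $T$ such factors underflows once

\[
p_{\max}^{T} < 2^{e_{\min}} \iff T > \frac{e_{\min}}{\log_2 p_{\max}}.
\]

For IEEE 754 half precision ($e_{\min} = -24$, counting subnormals) and $p_{\max} = 0.5$, this yields $T > 24$, i.e. barely two dozen observations before direct-space computation becomes unsafe, whereas double precision ($e_{\min} = -1074$) tolerates roughly a thousand steps under the same assumptions.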
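The logarithmic-space computation avoids this underflow by replacing products with sums, but the sums inside the forward recursion then require log-add operations. The following is a minimal software sketch of one forward step in log space; the function and variable names are illustrative, not taken from the paper:

```c
#include <math.h>

/* Numerically stable log(exp(x) + exp(y)). */
static double log_add(double x, double y) {
    if (x == -INFINITY) return y;
    if (y == -INFINITY) return x;
    double hi = (x > y) ? x : y;
    double lo = (x > y) ? y : x;
    return hi + log1p(exp(lo - hi));
}

/* One step of the forward algorithm in logarithmic space.
 * log_alpha_prev[i] = log alpha_{t-1}(i); log_A is the N x N
 * row-major log-transition matrix; log_b[j] = log b_j(o_t). */
void forward_step(int N, const double *log_alpha_prev,
                  const double *log_A, const double *log_b,
                  double *log_alpha) {
    for (int j = 0; j < N; ++j) {
        double acc = -INFINITY;          /* log(0) */
        for (int i = 0; i < N; ++i)      /* sum_i alpha_{t-1}(i) * a_ij */
            acc = log_add(acc, log_alpha_prev[i] + log_A[i * N + j]);
        log_alpha[j] = acc + log_b[j];   /* times b_j(o_t) */
    }
}
```

Note that each log_add costs one exponential and one logarithm evaluation; these conversions are exactly the overhead that a dedicated conversion unit such as the MLEAU is meant to offset.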
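As a purely software analogue of table-based conversion (the MLEAU itself is a hardware unit whose internals are not reproduced here), a logarithm can be approximated by splitting a float into its exponent and mantissa and looking the leading mantissa bits up in a precomputed table. The names and the 8-bit table width below are assumptions for illustration:

```c
#include <stdint.h>
#include <math.h>
#include <string.h>

#define LUT_BITS 8                       /* index width: top mantissa bits */
static float lut[1 << LUT_BITS];         /* log2(1.m) per mantissa prefix */

void lut_init(void) {
    for (int i = 0; i < (1 << LUT_BITS); ++i)
        lut[i] = log2f(1.0f + (float)i / (1 << LUT_BITS));
}

/* Approximate log2(x) for positive normal x by splitting x = 2^e * 1.m,
 * so that log2(x) = e + log2(1.m), with log2(1.m) read from the table. */
float log2_lut(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    int e = (int)((bits >> 23) & 0xFF) - 127;             /* unbiased exponent */
    uint32_t idx = (bits >> (23 - LUT_BITS)) & ((1u << LUT_BITS) - 1);
    return (float)e + lut[idx];
}
```

Because each conversion reduces to an independent table access, several values can be converted simultaneously in hardware, which is how a lookup-based unit can compensate for the cost of log-space operations.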