The rapid emergence of low-power embedded devices and modern machine learning (ML) algorithms has opened a new Internet of Things (IoT) era in which lightweight ML frameworks such as TinyML allow ML algorithms to run directly on edge devices. In particular, TinyML on such devices aims to deliver reduced latency, efficient bandwidth consumption, improved data security, increased privacy, and lower overall network costs in cloud environments. Because it enables IoT devices to work effectively without constant connectivity to cloud services while still providing accurate ML services, it offers a viable alternative for IoT applications seeking cost-effective solutions. TinyML is intended to deliver on-premises analytics that bring significant value to IoT services, particularly in environments with limited connectivity. This review article defines TinyML, presents an overview of its benefits and uses, and provides background information based on up-to-date literature. We then present the TensorFlow Lite framework, which supports TinyML, along with the analytical steps for creating an ML model. In addition, we explore the integration of TinyML with network technologies such as 5G and LPWAN. Ultimately, we anticipate that this analysis will serve as an informational pillar for the IoT/Cloud research community and pave the way for future studies.
In this work, we introduce a Markov Chain Monte Carlo (MCMC) classifier that combines Bayesian machine learning with Apache Spark, and we highlight the novel use of this methodology in big data management and environmental analysis. Using a large dataset of air pollutant concentrations in Madrid from 2001 to 2018, we developed a Bayesian Logistic Regression model that classifies the Air Quality Index (AQI) as safe or hazardous. The formulation combines prior beliefs and observed data into posterior distributions, which helps control overfitting, improves predictive accuracy, and scales to large-scale data processing. The proposed model achieved a maximum accuracy of 87.91% and a recall of 99.58% at a decision threshold of 0.505, reflecting its proficiency in identifying nearly all positive cases and limiting false negatives, even though it slightly underperformed the traditional Frequentist Logistic Regression in terms of accuracy and AUC score. Ultimately, this research underscores the efficacy of Bayesian machine learning for big data management and environmental analysis, and it points to the role of the first-ever MCMC classifier and Apache Spark in handling large, high-dimensional datasets, with broader implications both for fields such as statistics, mathematics, and physics and for practical, real-world applications.
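To make the idea of an MCMC-based Bayesian logistic regression classifier concrete, the following is a minimal single-machine sketch, not the authors' implementation. It assumes a random-walk Metropolis sampler, an independent Gaussian prior on the weights, and small synthetic data in place of the Madrid measurements; Apache Spark is not used. All function names and parameters (`metropolis`, `prior_sd`, `step`, and so on) are hypothetical and chosen for illustration. Only the decision threshold of 0.505 is taken from the abstract.

```python
import numpy as np

def log_posterior(w, X, y, prior_sd=10.0):
    """Unnormalized log posterior: Bernoulli likelihood plus Gaussian prior (assumed)."""
    z = X @ w
    # Logistic log-likelihood sum(y*z - log(1 + e^z)), computed stably.
    log_lik = np.sum(y * z - np.logaddexp(0.0, z))
    log_prior = -0.5 * np.sum(w ** 2) / prior_sd ** 2
    return log_lik + log_prior

def metropolis(X, y, n_samples=4000, burn_in=1000, step=0.1, seed=0):
    """Random-walk Metropolis sampler over the regression weights."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    lp = log_posterior(w, X, y)
    draws = []
    for i in range(n_samples):
        proposal = w + step * rng.normal(size=w.shape)
        lp_new = log_posterior(proposal, X, y)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_new - lp:
            w, lp = proposal, lp_new
        if i >= burn_in:
            draws.append(w.copy())
    return np.array(draws)

def predict_proba(X, draws):
    """Probability of the positive class, averaged over posterior draws."""
    return (1.0 / (1.0 + np.exp(-(X @ draws.T)))).mean(axis=1)

def accuracy_and_recall(y_true, proba, threshold=0.505):
    """Accuracy and recall (TP / (TP + FN)) at a given decision threshold."""
    y_pred = (proba >= threshold).astype(int)
    accuracy = float(np.mean(y_pred == y_true))
    recall = float(np.mean(y_pred[y_true == 1] == 1))
    return accuracy, recall

# Synthetic stand-in data: an intercept column plus two made-up features.
rng = np.random.default_rng(42)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_w = np.array([-0.5, 2.0, -1.0])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(X @ true_w)))).astype(int)

draws = metropolis(X, y)
proba = predict_proba(X, draws)
accuracy, recall = accuracy_and_recall(y, proba)
print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
```

Averaging the predicted probability over posterior draws, rather than plugging in a single point estimate, is what distinguishes this from a frequentist fit. The figures printed here come from synthetic data and are not comparable to the results reported in the abstract.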