With the advent of the Internet and computing, we entered an era in which more people exchange information over the Internet using devices such as desktops, laptops, tablets, mobile phones, and similar data-transmitting and data-receiving gadgets. This was a host-centric communication approach. The Internet of Things (IoT) is the next stage of technological advancement in computation, networking, and communication, in which physical objects around the world connect to the network and exchange data. This is an information-centric approach. IoT can thus be defined as an expanding network of dynamically increasing physical objects. These objects share information derived from their environments reliably and securely over communication media, leveraging multiple protocols. The protocols involved are being standardized to address compatibility and interoperability issues. Each connected object is uniquely identified and controlled within the network. IoT finds applications in many fields, such as environmental monitoring, logistics, health care, automobiles, controlled industrial environments, and smart cities. As the devices, their number and arrangement, the data types, the data rates, and the domain specifics of IoT applications vary, there is also a need to define architectures that incorporate the devices, communication media, storage and analysis capabilities, and the consumers of the derived output. Advancements in data analytics, machine learning, and digital technologies offer possibilities for deriving meaningful intelligence for actionable output and for creating useful, context-aware applications.
This article is categorized under:
Algorithmic Development > Web Mining
Application Areas > Internet
Application Areas > Data Mining Software Tools
Application Areas > Industry Specific Applications
In the era of automated task processing and complex algorithm design for data analysis, it is always pertinent to find real-life solutions using cutting-edge tools and techniques to generate insights from data. Data-driven machine learning models now offer worthwhile results when the input datasets are balanced. Imbalanced data occurs when the classes in the input datasets are unequally distributed. Building a predictive model on an imbalanced dataset produces a model that appears to yield high accuracy but does not generalize well to new data in the minority class. The time has come to look into datasets that are not 'balanced' in nature, since such datasets are frequently encountered in practice. To prevent creating models with misleading levels of accuracy, imbalanced data should be rebalanced before a predictive model is built. Such data are sometimes voluminous, heterogeneous, and complex in nature, and are generated by different autonomous sources with distributed and decentralized control. The driving force is to handle these datasets efficiently, using the latest tools and techniques, for research and commercial insights. The present article surveys such tools and techniques, across different computing frameworks, for handling Internet of Things and other related datasets; reviews common techniques for handling imbalanced data in data ecosystems; and offers a comparative data modelling framework in Keras for balanced and imbalanced datasets. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited.
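As a concrete illustration of the rebalancing idea above, the sketch below trains a small Keras classifier with class weighting, one common technique for imbalanced data. The synthetic 95:5 dataset, network shape, and hyperparameters are illustrative assumptions, not the comparative framework described in the article.

# A minimal sketch of class weighting for an imbalanced binary
# classification task in Keras. The synthetic 95:5 split and the
# network are illustrative assumptions, not the article's setup.
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(42)

# Synthetic imbalanced dataset: ~95% majority class, ~5% minority class.
X = rng.normal(size=(10_000, 20)).astype("float32")
y = (rng.random(10_000) < 0.05).astype("float32")

# Weight each class inversely to its frequency so that minority-class
# errors contribute proportionally more to the loss.
n_neg, n_pos = np.bincount(y.astype(int))
class_weight = {0: len(y) / (2.0 * n_neg), 1: len(y) / (2.0 * n_pos)}

model = keras.Sequential([
    keras.layers.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Accuracy alone is misleading on imbalanced data, so track
# precision and recall as well.
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[keras.metrics.Precision(), keras.metrics.Recall()])

model.fit(X, y, epochs=5, batch_size=128,
          class_weight=class_weight, validation_split=0.2)

Weighting the loss this way typically trades a little majority-class accuracy for substantially better minority-class recall, which is the generalization failure the abstract describes.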
With the advancement of general-purpose graphics processing units (GPGPUs) for parallel processing, the central processing unit (CPU) based architecture industry has focused its development on software-accelerator-based performance maximizers, so as to provide a parallel environment in software. These accelerator libraries support all stages of the data science or data analysis pipeline: preprocessing, transformation, analysis, modelling, validation, and decision-making. The recent trend in computational models over large datasets is to use all available features to create rule bases for validation, rather than to summarize or reduce features with statistical subset-selection techniques. Deep learning allows such computational models, composed of multiple processing layers, to learn and train from large datasets at multiple levels of abstraction without feature reduction. Intel's Data Analytics Acceleration Library (Intel DAAL) enables applications to process large datasets and make faster, better predictions on available Intel processors in a lab environment. In this paper, we discuss the enabling factors of Intel DAAL and how it optimizes performance in a Python computational environment, with and without the Anaconda framework.
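As a hedged sketch of what a DAAL-backed Python workflow can look like, the example below uses the daal4py bindings (installable with pip or conda, so usable with or without Anaconda) to train and apply a linear regression model. The random data and the choice of model are assumptions for illustration, not the benchmark configuration evaluated in the paper.

# A minimal sketch of calling Intel DAAL from Python through the
# daal4py bindings. The random regression data and the linear model
# are illustrative assumptions, not the paper's benchmark.
import numpy as np
import daal4py as d4p

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 10))
true_w = rng.normal(size=(10, 1))
y = X @ true_w + 0.01 * rng.normal(size=(100_000, 1))

# Train: daal4py algorithms are objects whose compute() call runs the
# internally parallelized DAAL kernel directly on NumPy arrays.
train_result = d4p.linear_regression_training().compute(X, y)

# Predict on new data using the trained model.
X_new = rng.normal(size=(5, 10))
pred = d4p.linear_regression_prediction().compute(X_new, train_result.model)
print(pred.prediction)  # (5, 1) array of predictions

Because the parallelism lives inside the library's compute() kernels, the same script benefits from DAAL's acceleration without any change to the surrounding Python code, which is the software-accelerator approach the abstract describes.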