Design Issues of Big Data Parallelisms

Mondal, Koushik

doi:10.1007/978-81-322-2752-6_20

Cited by 7 publications (5 citation statements)

References 10 publications

“…Moraes and Martínez (2015) considered data science knowledge generalization and distillation out of different elements, techniques, and theories from interdisciplinary fields to create new knowledge products. Mondal (2016) considered data science as big data modeling, mainly through applying computation, statistical analysis, and visualization, to gain insights into data. As can be seen, despite the differences, all of the definitions above derived from the core of data science generate knowledge from data to support decision making and indicate that data science is the principles, techniques, and methods around this core.…”

Section: Data Science and Data Scientist (mentioning, confidence: 99%)

What should we teach? A human‐centered data science graduate curriculum model design for iField schools

Sun et al., 2022, Association for Information Science & Technology

The information schools, also referred to as iField schools, are leaders in data science education. This study aims to develop a data science graduate curriculum model from an information science perspective to support iField schools in developing data science graduate education. In June 2020, information about 96 data science graduate programs from iField schools worldwide was collected and analyzed using a mixed research method based on inductive content analysis. The analysis identified a wide range of data science competencies and skills to be developed, as well as 12 knowledge topics covered by the curricula. Taking the humanistic model as the theoretical and methodological basis for course model construction, the 12 knowledge topics are reorganized into 4 course modules: (a) data-driven methods and techniques; (b) domain knowledge; (c) legal, moral, and ethical aspects of data; and (d) shaping and developing personal traits. Together these modules form a human-centered data science graduate curriculum model. The study concludes by discussing the broad application prospects of this model.

“…The authors in [5] crafted detailed deliberations on different IoT-driven engineering fields and their use-cases with the ever-increasing demands of the application areas. Authors, in [6], presented different issues in designing big data models in parallel environments. Nonstandard machine learning models provide more fruitful results in different big data domains as presented in [7].…”

Section: Review of Literature (mentioning, confidence: 99%)

Different hybrid machine intelligence techniques for handling IoT‐based imbalanced data

Mohindru, Mondal, Banka, 2021, CAAI Transactions on Intelligence Technology (self-citation)

In an era of automated task processing and complex algorithm design, finding real-life solutions that generate insights from data calls for cutting-edge tools and techniques. Data-driven machine learning models generally deliver worthwhile results when the classes in the input datasets are balanced. Imbalanced data arises when the classes in an input dataset are unequally distributed. A predictive model built on an imbalanced dataset may appear to achieve high accuracy yet generalize poorly to new data from the minority class. Datasets that are not "balanced" in this sense are nevertheless encountered frequently in practice and deserve closer attention. To avoid building models with misleadingly high accuracy, imbalanced data should be rebalanced before a predictive model is created. These data are sometimes voluminous, heterogeneous, and complex, and they originate from different autonomous sources with distributed and decentralized control. The driving goal is to handle such datasets efficiently with the latest tools and techniques for research and commercial insight. This article reviews common tools and techniques, across different computing frameworks, for handling imbalanced Internet of Things and related datasets in data ecosystems, and offers a comparative data modelling framework in Keras for balanced and imbalanced datasets.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
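The abstract's central point, that a model trained on imbalanced data can report high accuracy while never recognizing the minority class, and that the data should be rebalanced first, can be illustrated with a small sketch. It is not the article's Keras framework: it uses plain Python, a hypothetical toy dataset, and hypothetical helper names, and it applies random oversampling, which is only one of several common rebalancing techniques (random undersampling and synthetic methods such as SMOTE are others).

```python
import random
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a trivial 'model' that always predicts the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def random_oversample(samples, labels, seed=0):
    """Random oversampling: duplicate randomly chosen rows of each smaller
    class until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


# Hypothetical toy dataset: 95 "normal" sensor readings (0) and 5 "fault" readings (1).
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

print(majority_baseline_accuracy(y))  # 0.95, even though no fault is ever detected
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # Counter({0: 95, 1: 95})
```

The 0.95 baseline is exactly the "false level of accuracy" the abstract warns about. In practice, resampling should be applied only to the training split, after the train/test split, so that duplicated minority rows do not leak into the evaluation data.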

“…Intel DAAL-induced gradient boosting can achieve 6.5 times faster results, in comparison with XGBoost library, on the same training data. The general gradient-boosting decision tree algorithm is computationally intensive and resource expensive [17] when it is dealing with large datasets and continuous features. Intel DAAL provides a highly tuned implementation of gradient boosting algorithm for classification and regression problem domains.…”

Section: Literature Survey (mentioning, confidence: 99%)

Performance Analysis of Software Enabled Accelerator Library for Intel Architecture

Mohindru, Mondal, Banka, 2021, Lecture Notes in Electrical Engineering (self-citation)

With the advancement of General-Purpose Graphics Processing Units (GPGPUs) for parallel processing, the Central Processing Unit (CPU) architecture industry has focused its development on software accelerators that maximize performance, providing a parallel environment through software. These accelerator libraries support all stages of data science and data analysis pipelines, namely preprocessing, transformation, analysis, modelling, validation, and decision-making. Recent large-dataset computation models tend to use all available features to build rule bases for validation, rather than summarizing or reducing features through statistical subset-selection techniques. Deep learning allows computational models composed of multiple processing layers to learn from large datasets at multiple levels of abstraction without reducing the feature set. Intel's Data Analytics Acceleration Library (Intel DAAL) enables applications to process large datasets and make faster, better predictions on the Intel processors available in a lab environment. In this paper, we discuss the enabling factors of Intel DAAL and how it optimizes performance in a Python computational environment, with and without the Anaconda framework.
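The citing passage for this entry observes that the general gradient-boosting decision tree algorithm is computationally intensive on large datasets with continuous features. The following is a deliberately minimal, single-feature gradient-boosting regressor built from depth-one trees (stumps), written in plain Python to show where that cost comes from. It is not Intel DAAL's or XGBoost's implementation; the function names, the squared-error loss, and the toy data are assumptions chosen for illustration.

```python
def fit_stump(xs, residuals):
    """Exhaustive split search on one continuous feature.

    Every midpoint between consecutive distinct sorted values is a candidate
    threshold, so the work grows with the number of distinct feature values.
    Returns (threshold, left_value, right_value) minimizing squared error."""
    pairs = sorted(zip(xs, residuals))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal feature values
        right_sum = total - left_sum
        # For squared loss, minimizing the split's SSE equals maximizing this score.
        score = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if best is None or score > best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (score, threshold, left_sum / i, right_sum / (n - i))
    if best is None:  # all feature values identical: predict the mean everywhere
        mean = total / n
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left, right = stump
    return left if x <= threshold else right


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared-error regression: each round fits a stump
    to the residuals y - F(x), the negative gradient of 0.5 * (y - F(x))**2,
    and adds it to the ensemble scaled by the learning rate (shrinkage)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Hypothetical step-shaped data: y jumps from 1 to 5 at x = 10.
xs = [float(i) for i in range(20)]
ys = [1.0] * 10 + [5.0] * 10
model = gradient_boost(xs, ys, n_rounds=100, learning_rate=0.1)
print(round(predict(model, 3.0), 2), round(predict(model, 15.0), 2))  # 1.0 5.0
```

Each round sorts the feature and scans every distinct value, so one round costs roughly O(n log n) per feature, and that cost repeats for every round and, in deeper trees, for every node. This is why optimized libraries invest in faster split finding, for example by pre-sorting, by binning continuous features into histograms with a fixed number of candidate thresholds, and by parallelizing the search across features and threads.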
