2017 International Conference on Computer, Communication and Signal Processing (ICCCSP) 2017
DOI: 10.1109/icccsp.2017.7944072
Survey of pre-processing techniques for mining big data

Cited by 34 publications (19 citation statements) · References 16 publications
“…To solve this problem and clean the data, a double smoothing method (DSM) is utilized, in which the binning method is followed by a clustering-based technique. The binning method is used for data smoothing to eliminate noisy data [20], and it consists of two main steps: (1) sorting the data and partitioning them into (equal-frequency) bins; and (2) smoothing the data (the boundary-based method is used in this study [21]). The goal of the clustering method is to detect and eliminate outliers, which are considered the most harmful type of noise that can occur in the data.…”
Section: B. Used Data Set
confidence: 99%
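The two-step binning procedure quoted above (sort, partition into equal-frequency bins, then smooth each value to its nearer bin boundary) can be sketched as follows. This is a minimal illustration, not the cited authors' implementation; the function name and bin count are assumptions.

```python
import numpy as np

def smooth_by_bin_boundaries(values, n_bins):
    """Equal-frequency binning with boundary smoothing: each value
    in a bin is replaced by the nearer of the bin's min or max."""
    data = np.sort(np.asarray(values, dtype=float))  # step 1: sort
    bins = np.array_split(data, n_bins)              # step 1: equal-frequency partition
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]
        # step 2: replace each value by the closer bin boundary
        smoothed.extend(lo if v - lo <= hi - v else hi for v in b)
    return smoothed

result = smooth_by_bin_boundaries([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34], 3)
print(result)  # → [4.0, 4.0, 4.0, 15.0, 21.0, 21.0, 25.0, 25.0, 26.0, 26.0, 26.0, 34.0]
```

Ties are broken toward the lower boundary here; the quoted passage does not specify a tie-breaking rule, so this is a design choice of the sketch.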
“…However, the present paper focuses only on notable findings about the techniques and technologies involved in preparing big web data for analytics. In this direction, several authors [10,11,12,14,27] described methodologies developed by both the research and industry communities to pre-process weblog data efficiently in a Big Data environment. The research works of [10,20,27,31] confirm that big data storage, big data cleansing, unique user identification, session identification and so on are important and crucial tasks in the big data preprocessing model.…”
Section: Related Work
confidence: 99%
“…Only the records with 200 as the status code are treated as relevant data. Many authors in the literature [10,11,12,14,20,27] also state prominently that removing failure requests, irrelevant file requests, inappropriate access-method requests, web-robot requests, internal dummy-connection requests and irrelevant log fields extracts more relevant data for weblog analysis.…”
Section: Architecture Of Hadoop For Processing Of Web Logs
confidence: 99%
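The cleaning rules described in this citation statement (keep only status-200 records, drop asset requests and non-standard access methods) can be sketched with a simple filter over Apache-style log lines. This is a hypothetical illustration: the regex, field layout, and the list of irrelevant file suffixes are assumptions, not taken from the cited papers.

```python
import re

# Assumed Apache common-log layout; field positions are illustrative.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) \S+'
)

IRRELEVANT_SUFFIXES = ('.css', '.js', '.png', '.jpg', '.gif', '.ico')

def is_relevant(line):
    """Keep only successful (status 200) GET page requests, dropping
    failure requests, asset requests and other access methods."""
    m = LOG_PATTERN.match(line)
    if not m:
        return False
    if m.group('status') != '200':         # remove failure requests
        return False
    if m.group('method') != 'GET':         # remove inappropriate access methods
        return False
    if m.group('url').lower().endswith(IRRELEVANT_SUFFIXES):
        return False                       # remove irrelevant file requests
    return True

logs = [
    '1.2.3.4 - - [10/Oct/2017:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '1.2.3.4 - - [10/Oct/2017:13:55:37 +0000] "GET /style.css HTTP/1.1" 200 512',
    '5.6.7.8 - - [10/Oct/2017:13:55:38 +0000] "GET /missing HTTP/1.1" 404 209',
    '5.6.7.8 - - [10/Oct/2017:13:55:39 +0000] "POST /form HTTP/1.1" 200 100',
]
relevant = [l for l in logs if is_relevant(l)]
print(relevant)  # only the /index.html request survives
```

Web-robot and internal dummy-connection detection, also mentioned above, would need extra signals (user-agent strings, robots.txt hits) and is omitted from this sketch.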
“…In the last few years, data cleaning and preprocessing have come to be viewed as an important activity and topic of research, as various supervised and unsupervised methods have been tested across a number of domains for analysis purposes. Hariharakrishnan, J. et al. [15] reviewed numerous text-preprocessing techniques and highlighted the significance and the multiple aspects of preprocessing, such as noise reduction, outlier identification, and inconsistent data. They raised a very logical point: most text-preprocessing experiments are performed either in the data-collection phase or on homogeneous data.…”
Section: Related Work
confidence: 99%