2021
DOI: 10.1109/tkde.2019.2946162

A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective

Abstract: Data collection is a major bottleneck in machine learning and an active research topic in multiple communities. There are largely two reasons data collection has recently become a critical issue. First, as machine learning is becoming more widely used, we are seeing new applications that do not necessarily have enough labeled data. Second, unlike traditional machine learning, deep learning techniques automatically generate features, which saves feature engineering costs, but in return may require larger amount…

Cited by 643 publications (327 citation statements)
References: 132 publications (117 reference statements)

“…As a future work, Hadoop approach able to concentrates comprehensively on both the stages: distributed data storage & parallel processing of weblog data and to leverage the strengths of techniques and technologies of individual stages [11,16,21]. In addition, the comprehensive approach is planned to test with different weblogs that cover a large spectrum of various applications, such as, web usage analysis for improvements in fraud detection, product analysis and customer segmentation.…”
Section: Conclusion and Future Scope And Results; citation type: mentioning (confidence: 99%)

“…However, in the present paper, the authors pay an attention only on remarkable findings of techniques and technologies involved in preparing the big web data suitable to analytics. In this direction, some of the authors [10,11,12,14,27] described various methodologies discovered by both research and industry community to pre-process the weblog data efficiently in Big Data environment. According to the research works made by the authors [10,20,27,31] are endorsed that big data storage, big data cleansing, unique user identification, session identification and so on are important and crucial tasks in the big data preprocessing model.…”
Section: Related Work; citation type: mentioning (confidence: 99%)

“…Option 6 -Various models of data collections for the purpose of running ML algorithms [33] could also be investigated. However, the sensitivity of this problem domain and the fact that we need the highest level of classifier's accuracy upon the training data set might not open a door to data from shared sources.…”
Section: Would Rail Companies Need This?; citation type: mentioning (confidence: 99%)