Machine learning (ML) is increasingly useful as data increase in both volume and accessibility. Broadly, ML uses computational methods and algorithms to learn to perform tasks, such as categorisation, decision making or anomaly detection, through experience and without explicit instruction. ML is most effective in situations where non-computational means or conventional algorithms are impractical or impossible, such as when the data are vast, complex, highly variable and/or full of errors [1,2]. Thus, ML is useful for analysing natural language, images, or other types of complex and messy data that are now available in ever-growing and impractically large volumes. Some ML methods are suitable for analysing explicitly ordered or time-dependent data, although these tend to be less tolerant of errors or data asymmetry. Nevertheless, most data have at least an implicit order or temporal context. Selecting an appropriate ML algorithm depends on the properties of the data to be analysed and the aims of the project, as algorithms vary in the supervision needed, tolerable error levels, and ability to account for order or temporal context, among many other things. Using non-temporal ML algorithms may obscure but not remove the order or temporal features of the data, potentially allowing the hidden 'arrow of time' to affect performance. This research takes the first step in exploring the interaction of ML algorithms and implicit temporal representations in training data. Thus, this research addresses the suitability of ML for analysing the kind of data that is accumulating daily from every social media platform, Internet of Things device, business report, transport tracker or other modern data source. Two supervised ML algorithms are selected and described before experiments are run to train those ML algorithms to perform automatic classification tasks under a variety of conditions that balance volume and complexity of data.
In this way, the experiments explore whether more data are always better for ML models or whether implicit temporal features of data can influence performance. The research shows that ML algorithms can be sensitive to subtle or implicit temporal context, with consequences for accuracy in classification tasks. This means that researchers should carefully consider the implications of time within their data when selecting appropriate algorithms, even when the algorithms of choice are not expected to explicitly address order or temporal context.