Multivariate time series classification is a machine learning problem that can be applied to automate a wide range of real-world data analysis tasks. ROCKET has proven to be an outstanding algorithm, capable of classifying time series accurately and quickly. The textbook variant of the multivariate time series classification problem assumes that all time series to be classified have the same length, while in real-world applications this assumption does not necessarily hold. The literature in this domain pays too little attention to data processing pipelines for variable-length time series. Thus, in this paper, we present a thorough analysis of three preprocessing pipelines for variable-length time series that must be classified with a method requiring data of equal length. These three methods are truncation, padding, and forecasting of missing values. Experiments conducted on benchmark datasets showed that the recommended procedure involves padding. Forecasting yields similar classification accuracy but comes at a much higher computational cost. Truncation is not a viable option. Furthermore, we present a novel application domain for multivariate time series classification algorithms, namely incident detection in cash transactions. This area poses substantial challenges for automated model training procedures, since the data is not only variable-length but also heavily imbalanced. In the study, we list various incident types and present trained classifiers capable of aiding human auditors in their daily work.
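The three preprocessing pipelines named in the abstract can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation. It assumes that each multivariate series is stored as a `(n_channels, length)` array and that padding and forecasting extend a series at its end. The function names `truncate`, `pad`, and `forecast_naive` are hypothetical. The persistence forecast, which repeats the last observed value of each channel, is only a stand-in for whatever forecasting model is used in the study.

```python
import numpy as np
from typing import List


def truncate(series: List[np.ndarray]) -> np.ndarray:
    """Cut every series to the length of the shortest one.

    Each element of `series` is a (n_channels, length) array; lengths may differ.
    Returns an array of shape (n_series, n_channels, min_length).
    """
    min_len = min(s.shape[1] for s in series)
    return np.stack([s[:, :min_len] for s in series])


def pad(series: List[np.ndarray], value: float = 0.0) -> np.ndarray:
    """Append the constant `value` to the end of each series up to the longest length.

    Assumption: padding is applied at the end of the series with a constant.
    """
    max_len = max(s.shape[1] for s in series)
    return np.stack([
        np.pad(s, ((0, 0), (0, max_len - s.shape[1])), constant_values=value)
        for s in series
    ])


def forecast_naive(series: List[np.ndarray]) -> np.ndarray:
    """Extend each series to the longest length with a persistence forecast.

    Hypothetical stand-in for a real forecasting model: every missing step
    repeats the last observed value of its channel (NumPy's "edge" mode).
    """
    max_len = max(s.shape[1] for s in series)
    return np.stack([
        np.pad(s, ((0, 0), (0, max_len - s.shape[1])), mode="edge")
        for s in series
    ])


if __name__ == "__main__":
    a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # 2 channels, length 3
    b = np.array([[7.0, 8.0], [9.0, 10.0]])           # 2 channels, length 2
    print(truncate([a, b]).shape)        # (2, 2, 2): tail of `a` is discarded
    print(pad([a, b])[1])                # b padded with zeros to length 3
    print(forecast_naive([a, b])[1])     # b extended by repeating 8.0 and 10.0
```

All three functions return a single `(n_series, n_channels, length)` array, the equal-length input format that a method such as ROCKET expects. The sketch also makes the trade-off visible: truncation discards observed data, padding keeps all of it at negligible cost, and forecasting keeps all of it but requires running a model for every shortened series.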
INDEX TERMS classification, incident detection, multivariate time series, ROCKET, varying-length time series
I. INTRODUCTION
Time series classification has become a vital domain of machine learning. The multitude of exciting real-life applications drives the development of the field and inspires fruitful research that aims at delivering new approaches, improving existing ones, and adapting them to new types of data. This paper focuses on the task of classifying multivariate time series of unequal lengths; that is, each multivariate series can have a different length.
The study presented in this paper is related to our project that deals with incident detection in cash transaction data. The business context of the project is loss prevention in quick-service restaurant operations. In our context, an incident occurs when a cashier (server, employee) under-rings items, voids items, or performs other intentional operations that cause monetary losses for the restaurant.
Thanks to many years of successful cooperation with our clients in the quick-service restaurant industry, we have accumulated an adequate database of transactions, and we rely on human experts to conduct incident audits. The loss prevention scheme based on human auditors, employed at the moment, is the cornerstone of the daily operations of our company. However, there is a pressing need to automate audit tasks.
We want to build an automated classifier that detects operations in which a cashier intentionally mishandles t...