Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few million time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact: in large datasets we can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We show that our ideas allow us to solve higher-level time series data mining problems such as motif discovery and clustering at scales that would otherwise be untenable. In addition to mining massive datasets, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower-powered devices than are currently possible.
Time series motifs are pairs of individual time series, or subsequences of a longer time series, which are very similar to each other. As with their discrete analogues in computational biology, this similarity hints at structure which has been conserved for some reason and may therefore be of interest. Since the formalization of time series motifs in 2002, dozens of researchers have used them for diverse applications in many different domains. Because the obvious algorithm for computing motifs is quadratic in the number of items, more than a dozen approximate algorithms to discover motifs have been proposed in the literature. In this work, for the first time, we show a tractable exact algorithm to find time series motifs. As we shall show through extensive experiments, our algorithm is up to three orders of magnitude faster than brute-force search in large datasets. We further show that our algorithm is fast enough to be used as a subroutine in higher-level data mining algorithms for anytime classification, near-duplicate detection and summarization, and we consider detailed case studies in domains as diverse as electroencephalograph interpretation and entomological telemetry data mining.
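To make the quadratic baseline concrete, the following is a minimal Python sketch of the "obvious" brute-force search mentioned above: it compares every pair of equal-length series and keeps the closest one. It is not the exact algorithm the abstract proposes. The function names `euclidean` and `brute_force_motif`, the use of plain Euclidean distance without normalization, and the toy data are all assumptions made for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_motif(series):
    """Return (distance, i, j) for the closest pair of series.

    Examines all n*(n-1)/2 pairs, so the cost is quadratic in the
    number of series. Hypothetical helper, for illustration only.
    """
    best = (math.inf, None, None)
    for i, j in combinations(range(len(series)), 2):
        d = euclidean(series[i], series[j])
        if d < best[0]:
            best = (d, i, j)
    return best


# Toy data (made up): series 0 and 2 are nearly identical.
toy = [
    [0.0, 1.0, 0.0],
    [5.0, 5.0, 5.0],
    [0.0, 1.1, 0.0],
    [9.0, 0.0, 9.0],
]
dist, i, j = brute_force_motif(toy)
print(i, j, round(dist, 3))
```

On the toy input the closest pair is series 0 and 2. With n series the loop runs n(n-1)/2 distance computations, which is the cost that motivates faster approaches on large datasets.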
Tail biting is a major welfare and economic problem for indoor pig producers worldwide. Low tail posture is an early warning sign which could reduce tail biting unpredictability. Taking a precision livestock farming approach, we used time-of-flight 3D cameras, processing data with machine vision algorithms, to automate the measurement of pig tail posture. Validation of the 3D algorithm found an accuracy of 73.9% at detecting low vs. not low tails (sensitivity 88.4%, specificity 66.8%). Twenty-three groups of 29 pigs per group were reared with intact (not docked) tails under typical commercial conditions over 8 batches. Fifteen groups had tail biting outbreaks, following which enrichment was added to pens and biters and/or victims were removed and treated. 3D data from outbreak groups showed the proportion of low tail detections increased pre-outbreak and declined post-outbreak. Pre-outbreak, the increase in low tails occurred at an increasing rate over time, and the proportion of low tails was higher 1 week pre-outbreak (-1) than 2 weeks pre-outbreak (-2). Within each batch, an outbreak and a non-outbreak control group were identified. Outbreak groups had more 3D low tail detections in weeks -1, +1 and +2 than their matched controls. Comparing 3D tail posture and tail injury scoring data, a greater proportion of low tails was associated with more injured pigs. Low tails might indicate more than just tail biting, as tail posture varied between groups and over time and the proportion of low tails increased when pigs were moved to a new pen. Our findings demonstrate the potential for a 3D machine vision system to automate tail posture detection and provide early warning of tail biting on farm.
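For readers unfamiliar with the validation metrics quoted above, here is a minimal Python sketch of how accuracy, sensitivity and specificity are computed from a binary confusion matrix, treating "low tail" as the positive class. The function name `binary_metrics` and the counts in the example are hypothetical; they are not the study's data and do not reproduce its figures.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: low tails detected / missed; tn/fp: not-low tails correctly
    rejected / wrongly flagged as low.
    """
    sensitivity = tp / (tp + fn)   # share of true low tails detected
    specificity = tn / (tn + fp)   # share of not-low tails correctly rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return accuracy, sensitivity, specificity


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
acc, sens, spec = binary_metrics(tp=8, fp=3, tn=7, fn=2)
print(acc, sens, spec)
```

A high sensitivity paired with a lower specificity, as in the reported validation, means the detector misses few low tails but flags a fair number of not-low tails as low.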
This paper proposes a new approach to dynamically determine the tree span for tree kernel-based semantic relation extraction. It exploits constituent dependencies to keep the nodes and their head children along the path connecting the two entities, while removing the noisy information from the syntactic parse tree, eventually leading to a dynamic syntactic parse tree. This paper also explores entity features and their combined features in a unified parse and semantic tree, which integrates both structured syntactic parse information and entity-related semantic information. Evaluation on the ACE RDC 2004 corpus shows that our dynamic syntactic parse tree outperforms all previous tree spans, and that the composite kernel combining this tree kernel with a state-of-the-art linear feature-based kernel achieves the best performance reported so far.