Spaced seeds have recently been shown not only to detect more alignments, but also to give a more accurate measure of phylogenetic distances and to provide a lower misclassification rate when used with Support Vector Machines (SVMs). We confirm these last two results by independent experiments, and propose a coverage criterion to measure seed efficiency in both cases in order to design better seed patterns. We first show how this coverage criterion can be measured directly by a full automaton-based approach. We then illustrate how this criterion performs compared with two other frequently used criteria, namely the single-hit and multiple-hit criteria, through correlation coefficients with the correct classification or the true distance. Finally, for alignment-free distances, we propose an extension that adopts the coverage criterion, show how it performs, and indicate how it can be computed efficiently.
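To make the three criteria concrete, here is a minimal sketch under stated assumptions, not the automaton-based method the abstract describes. It assumes a seed is written as a string of `1` (must-match) and `0` (don't-care) positions, that an alignment is a string of `1` (match) and `0` (mismatch), and that coverage counts the alignment positions lying under a must-match position of at least one hit. The function names `seed_hits` and `seed_coverage` are hypothetical and chosen for illustration only.

```python
def seed_hits(seed, alignment):
    """Return the start positions where the spaced seed hits the alignment.

    Assumed encoding: seed uses '1' for must-match and '0' for don't-care;
    alignment uses '1' for a match column and '0' for a mismatch column.
    """
    care = [j for j, c in enumerate(seed) if c == "1"]
    last_start = len(alignment) - len(seed)
    return [
        i
        for i in range(last_start + 1)
        if all(alignment[i + j] == "1" for j in care)
    ]


def seed_coverage(seed, alignment):
    """Count alignment positions covered by a must-match position of any hit."""
    care = [j for j, c in enumerate(seed) if c == "1"]
    covered = set()
    for i in seed_hits(seed, alignment):
        covered.update(i + j for j in care)
    return len(covered)


alignment = "1111011011"
hits = seed_hits("1101", alignment)
single_hit = len(hits) > 0                      # single-hit criterion
multiple_hit = len(hits)                        # multiple-hit criterion
coverage = seed_coverage("1101", alignment)     # coverage criterion
```

On this toy alignment the spaced seed `1101` hits at three positions and covers seven match columns, whereas the contiguous seed `111` hits twice and covers four. The criteria can therefore rank seeds differently from a simple hit count.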
Standard signal extraction results for both stationary and nonstationary time series are expressed as linear filters applied to the observed series. Computation of the filter weights, and of the corresponding frequency response function, is relevant for studying properties of the filter and of the resulting signal extraction estimates. Methods for such computations for symmetric, doubly infinite filters are well established. This study develops an algorithm for computing filter weights for asymmetric, semi-infinite signal extraction filters, including the important case of the concurrent filter (for signal extraction at the current time point). The setting is one in which the time series components being estimated follow autoregressive integrated moving-average (ARIMA) models. The algorithm provides expressions for the asymmetric signal extraction filters as rational polynomial functions of the backshift operator. The filter weights are then readily generated by simple expansion of these expressions, and the corresponding frequency response function is directly evaluated. Recursive expressions are also developed that relate the weights for filters that use successively increasing amounts of data. The results for the filter weights are then used to develop methods for computing mean squared error results for the asymmetric signal extraction estimates.
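The expansion step mentioned above can be sketched generically. This is an illustration only, not the study's algorithm for deriving the filter: it assumes the filter is already given as a ratio of two polynomials in the backshift operator B, with coefficients listed in ascending powers of B, and it expands that ratio into the weights on the current and past observations. The function names `expand_rational` and `frequency_response` and the example coefficients are hypothetical.

```python
import cmath


def expand_rational(num, den, n_weights):
    """Expand num(B)/den(B) as w_0 + w_1 B + w_2 B^2 + ... (first n_weights terms).

    num and den hold polynomial coefficients in ascending powers of B.
    Uses the recursion w_k = (num_k - sum_{j>=1} den_j * w_{k-j}) / den_0.
    """
    weights = []
    for k in range(n_weights):
        acc = num[k] if k < len(num) else 0.0
        for j in range(1, min(k, len(den) - 1) + 1):
            acc -= den[j] * weights[k - j]
        weights.append(acc / den[0])
    return weights


def frequency_response(num, den, lam):
    """Evaluate num(z)/den(z) at z = exp(-i*lam), with lam in radians."""
    z = cmath.exp(-1j * lam)
    n = sum(c * z**k for k, c in enumerate(num))
    d = sum(c * z**k for k, c in enumerate(den))
    return n / d


# Assumed toy filter 0.5 / (1 - 0.5 B): geometrically decaying weights.
weights = expand_rational([0.5], [1.0, -0.5], 4)
gain_at_zero = abs(frequency_response([0.5], [1.0, -0.5], 0.0))
```

For the toy filter the weights are 0.5, 0.25, 0.125, 0.0625 and the gain at frequency zero is 1. The weights decay only when the roots of the denominator lie outside the unit circle, which this sketch takes for granted.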