The top-k dominating (TKD) query returns the k objects that dominate the largest number of objects in a given dataset. Incomplete data arises in a wide spectrum of real datasets because of device failure, privacy preservation, data loss, and similar causes. In this paper, for the first time, we carry out a systematic study of TKD queries on incomplete data, that is, data in which some dimensional values are missing. We formalize this problem and propose a suite of efficient algorithms to support it. Our methods use novel techniques, such as upper-bound score pruning and a bitmap binning strategy, to boost query efficiency. Extensive experiments on both real and synthetic datasets demonstrate the efficiency of the presented algorithms.
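To make the query concrete, the following is a minimal brute-force sketch, not the pruning or bitmap-based algorithms proposed in the paper. It rests on assumptions the abstract does not state: missing values are encoded as `None`, smaller values are preferred, and one object dominates another if it is no worse on every dimension observed in both and strictly better on at least one of them. The names `dominates` and `top_k_dominating` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# An object is a sequence of dimensional values; None marks a missing value.
Obj = Sequence[Optional[float]]


def dominates(a: Obj, b: Obj) -> bool:
    """Assumed dominance on incomplete data (smaller is better).

    Only dimensions observed in both objects are compared.
    """
    strictly_better = False
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # skip dimensions missing in either object
        if x > y:
            return False  # a is worse on a commonly observed dimension
        if x < y:
            strictly_better = True
    return strictly_better


def top_k_dominating(data: Sequence[Obj], k: int) -> List[Tuple[int, int]]:
    """Return (index, score) for the k objects dominating the most others.

    Naive O(n^2 * d) scan; ties are broken by the lower index.
    """
    scores = []
    for i, a in enumerate(data):
        score = sum(dominates(a, b) for j, b in enumerate(data) if j != i)
        scores.append((i, score))
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


data = [
    (1, None, 2),
    (2, 3, None),
    (None, 1, 3),
    (3, 4, 4),
]
print(top_k_dominating(data, 2))
```

Under this assumed definition, dominance on incomplete data need not be transitive, which is one reason a quadratic scan like this is hard to shortcut and why score bounds are useful for pruning.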
Missing values, which are widespread in multivariate time series data, hinder effective data analysis. Existing time series imputation methods do not make full use of the label information in real-life time series data. In this paper, we propose a novel semi-supervised generative adversarial network model, named SSGAN, for missing value imputation in multivariate time series data. It consists of three players: a generator, a discriminator, and a classifier. The classifier predicts the labels of the time series data, and thereby drives the generator to estimate the missing values (or components) conditioned on both the observed components and the data labels. We introduce a temporal reminder matrix to help the discriminator better distinguish the observed components from the imputed ones. Moreover, we theoretically prove that SSGAN, using the temporal reminder matrix and the classifier, learns to estimate missing values that converge to the true data distribution when the Nash equilibrium is achieved. Extensive experiments on three public real-world datasets demonstrate that SSGAN yields a performance gain of more than 15% compared with state-of-the-art methods.
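The distinction between observed and imputed components can be illustrated with a small sketch. This is not the SSGAN model: it shows only the mask-based composition commonly used in GAN-style imputation, where observed entries are kept and missing entries are filled from a generator's output. The helper names are hypothetical, `None` is an assumed encoding for missing entries, and `carry_forward_generator` is a trivial stand-in for a trained generator. The temporal reminder matrix and the classifier are not modeled here.

```python
from typing import List, Optional

# A series is a list of time steps, each a list of variables; None = missing.
Series = List[List[Optional[float]]]


def make_mask(x: Series) -> List[List[int]]:
    """Binary mask: 1 where a component is observed, 0 where it is missing."""
    return [[0 if v is None else 1 for v in row] for row in x]


def carry_forward_generator(x: Series) -> List[List[float]]:
    """Hypothetical stand-in for a generator: last observation carried forward.

    A variable with no earlier observation is filled with 0.0.
    """
    n_vars = len(x[0]) if x else 0
    last = [0.0] * n_vars
    out = []
    for row in x:
        last = [prev if v is None else v for v, prev in zip(row, last)]
        out.append(list(last))
    return out


def compose_imputation(x: Series, generated: List[List[float]]) -> List[List[float]]:
    """Keep observed components; take generated values only where data is missing."""
    mask = make_mask(x)
    return [
        [xv if m == 1 else gv for xv, gv, m in zip(x_row, g_row, m_row)]
        for x_row, g_row, m_row in zip(x, generated, mask)
    ]


x = [[1.0, None], [None, 2.0], [3.0, None]]
imputed = compose_imputation(x, carry_forward_generator(x))
print(make_mask(x))
print(imputed)
```

In an adversarial setup of this kind, the discriminator would receive the composed series and try to recover the mask, that is, to tell which entries were observed and which were imputed.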