Synthesizing tabular data is attracting much attention these days for various purposes. With sophisticate synthetic data, for instance, one can augment its training data. For the past couple of years, tabular data synthesis techniques have been greatly improved. Recent work made progress to address many problems in synthesizing tabular data, such as the imbalanced distribution and multimodality problems. However, the data utility of state-of-the-art methods is not satisfactory yet. In this work, we significantly improve the utility by designing our generator and discriminator based on neural ordinary differential equations (NODEs). After showing that NODEs have theoretically preferred characteristics for generating tabular data, we introduce our designs. The NODE-based discriminator performs a hidden vector evolution trajectory-based classification rather than classifying with a hidden vector at the last layer only. Our generator also adopts an ODE layer at the very beginning of its architecture to transform its initial input vector (i.e., the concatenation of a noisy vector and a condition vector in our case) onto another latent vector space suitable for the generation process. We conduct experiments with 13 datasets, including but not limited to insurance fraud detection, online news article prediction, and so on, and our presented method outperforms other state-of-the-art tabular data synthesis methods in many cases of our classification, regression, and clustering experiments.
CCS CONCEPTS• Computing methodologies → Machine learning; Neural networks.