A major impediment to the application of deep learning to real-world problems is the scarcity of labeled data. Small training sets are of little use to deep networks: given their large number of trainable parameters, these models are very likely to overfit. On the other hand, enlarging the training set through further manual or semi-automatic labeling can be costly, if not impossible at times. The standard techniques to address this issue are therefore transfer learning and data augmentation, the latter consisting of applying some sort of "transformation" to existing labeled instances to let the training set grow in size. Although this approach works well in applications such as image classification, where it is relatively simple to design suitable transformation operators, it is not obvious how to apply it in more structured scenarios. Motivated by the observation that in virtually all application domains it is easy to obtain unlabeled data, in this paper we take a different perspective and propose a label augmentation approach. We start from a small, curated labeled dataset and let the labels propagate through a larger set of unlabeled data using graph transduction techniques. This allows us to naturally exploit (second-order) similarity information residing in the data, a source of information typically neglected by standard augmentation techniques. In particular, we show that by using known game-theoretic transductive processes we can create larger and sufficiently accurate labeled datasets whose use results in better-trained neural networks. Preliminary experiments are reported which demonstrate a consistent improvement on standard image classification datasets.
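As a rough illustration of the label-augmentation pipeline described above (small curated labeled set, propagation over unlabeled data, enlarged training set), the following sketch substitutes scikit-learn's LabelSpreading for the game-theoretic transductive process actually used in the paper; the function name, neighborhood size, and confidence threshold are hypothetical choices, not the paper's settings.

```python
# Illustrative sketch only: the paper relies on game-theoretic graph
# transduction; here standard label spreading stands in for it to show the
# general pipeline (small labeled set -> propagated labels -> larger set).
import numpy as np
from sklearn.semi_supervised import LabelSpreading

def augment_labels(X_labeled, y_labeled, X_unlabeled, confidence=0.9):
    """Propagate labels from a small curated set onto unlabeled data."""
    X = np.vstack([X_labeled, X_unlabeled])
    # -1 marks unlabeled points for the transductive solver
    y = np.concatenate([y_labeled, -np.ones(len(X_unlabeled), dtype=int)])

    model = LabelSpreading(kernel="knn", n_neighbors=10, alpha=0.2)
    model.fit(X, y)

    # Keep only confidently propagated labels for the augmented training set
    probs = model.label_distributions_[len(X_labeled):]
    pseudo = model.transduction_[len(X_labeled):]
    keep = probs.max(axis=1) >= confidence

    X_aug = np.vstack([X_labeled, X_unlabeled[keep]])
    y_aug = np.concatenate([y_labeled, pseudo[keep]])
    return X_aug, y_aug  # enlarged training set for the neural network
```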
In this paper we analyze the practical implications of Szemerédi's regularity lemma for the preservation of metric information contained in large graphs. To this end, we present a heuristic algorithm for finding regular partitions. Our experiments show that this method is quite robust to the natural sparsification of proximity graphs; moreover, this robustness can be enforced by graph densification. The aim of this work is to analyze the ideal density regime in which the regularity lemma can find useful applications. In particular, we use the regularity lemma to reduce an input graph and then exploit the key lemma to obtain an expanded version that preserves some topological properties of the original graph. If we are outside the ideal density regime, we have to densify the graph before applying the regularity lemma. Among the many topological measures, we test the effective resistance (or, equivalently, the scaled commute time), one of the most important metrics between the vertices of a graph, whose usefulness has very recently been questioned. In [12] it is argued that this measure is meaningless for large graphs. However, recent experimental results show that the graph can be pre-processed (densified) to provide an informative estimate of this metric [5], [4]. Therefore, in this paper we analyze the practical implications of the key lemma for the estimation of commute time in large graphs.
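For reference, the effective resistance and commute time discussed above can be computed from the Moore-Penrose pseudoinverse of the graph Laplacian. The sketch below is a minimal, generic implementation under the assumption of a small, connected, undirected graph given by a dense adjacency matrix; it is not the paper's reduction/expansion pipeline.

```python
# Minimal sketch, not the paper's pipeline: effective resistance and commute
# time from the pseudoinverse of the combinatorial Laplacian. Assumes a small,
# connected, undirected graph given by a dense adjacency matrix A.
import numpy as np

def effective_resistance(A):
    """Return R with R[u, v] = L+_uu + L+_vv - 2 * L+_uv."""
    L = np.diag(A.sum(axis=1)) - A    # combinatorial Laplacian
    L_pinv = np.linalg.pinv(L)        # Moore-Penrose pseudoinverse
    d = np.diag(L_pinv)
    return d[:, None] + d[None, :] - 2 * L_pinv

def commute_time(A):
    """Commute time CT(u, v) = vol(G) * R(u, v), with vol(G) = sum of degrees."""
    return A.sum() * effective_resistance(A)

# Example: path graph on 3 vertices; adjacent vertices have resistance 1,
# so their commute time equals vol(G) = 4.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
print(commute_time(A))
```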