Automatic classification of structured data is a challenging task that is relevant to many domains. However, collecting labeled data can be expensive and is sometimes prone to mislabeling. A technical solution to this problem is to train a classifier on a small number of labeled samples combined with a large number of unlabeled samples. This paper addresses the classification of partially labeled tree-structured data. To carry out this task, we propose an adapted variant of recursive neural networks (RNNs) equipped with semi-supervision mechanisms that can learn from both labeled and unlabeled tree-structured data. Specifically, the RNNs rely on self-learning to actively pre-label unlabeled data, which are then combined with the originally labeled data during the learning process. The semi-supervised RNN approach is presented and evaluated on a real-world collection of eXtensible Markup Language (XML) documents in the context of digital libraries. Initial experiments show high-quality results.
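The self-learning idea described above (pre-label unlabeled samples, then retrain on the enlarged labeled set) can be illustrated with a minimal sketch. This is not the paper's method: a nearest-centroid classifier stands in for the recursive neural network, plain feature vectors stand in for encoded trees, and the names `self_train`, `threshold`, and `max_rounds`, as well as the margin-based confidence score, are assumptions made purely for illustration.

```python
from math import dist


def fit_centroids(points, labels):
    """Return {label: centroid} computed from the labeled points."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def predict_with_confidence(centroids, p):
    """Return (label, confidence) for point p.

    Confidence is a hypothetical margin score in [0, 1]: 0 when the two
    nearest centroids are equally far away, 1 when p sits on a centroid.
    Assumes at least two classes.
    """
    ranked = sorted((dist(p, c), y) for y, c in centroids.items())
    (d1, y1), (d2, _) = ranked[0], ranked[1]
    conf = 1.0 - d1 / d2 if d2 > 0 else 0.0
    return y1, conf


def self_train(labeled, unlabeled, threshold=0.5, max_rounds=10):
    """Self-training loop: pre-label confident samples, then retrain.

    labeled   -- list of (point, label) pairs
    unlabeled -- list of points
    Returns the final centroids and the points that were never pre-labeled.
    """
    points = [p for p, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(points, labels)
        remaining = []
        for p in pool:
            y, conf = predict_with_confidence(centroids, p)
            if conf >= threshold:
                # Confident prediction: treat it as a label from now on.
                points.append(p)
                labels.append(y)
            else:
                remaining.append(p)
        if len(remaining) == len(pool):
            break  # nothing new was pre-labeled; stop early
        pool = remaining
    return fit_centroids(points, labels), pool


labeled = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
unlabeled = [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)]
model, leftover = self_train(labeled, unlabeled)
```

In this toy run the two points near a labeled sample are pre-labeled in the first round, while the point midway between the classes never reaches the confidence threshold and stays unlabeled. A fixed threshold is only one possible selection rule; the abstract does not state how pre-labels are selected.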