The workshops in the TextGraphs series have published and promoted the synergy between the fields of Graph Theory (GT) and Natural Language Processing (NLP) for over a decade. The target audience of our workshop comprises researchers working on problems related to either Graph Theory or graph-based algorithms applied to Natural Language Processing, social media, and the Semantic Web.

TextGraphs addresses a broad spectrum of research areas within NLP. This is because, besides traditional NLP applications like parsing, word sense disambiguation, semantic role labeling, and information extraction, graph-based solutions also target web-scale applications like information propagation in social networks, rumor proliferation, e-reputation, language dynamics learning, and future events prediction. Following this tradition, this year's TextGraphs also presents research on diverse topics such as lexical and computational semantics, text clustering and classification, and text compression and summarization, to name a few.

The selection process was competitive: we received 17 submissions (9 long and 8 short) and accepted 8 of them for presentation (5 long and 3 short papers), resulting in an overall acceptance rate of 47%.

We are pleased to have two excellent invited speakers for this year's event. We thank Gabor Melli and Maximilian Nickel for their enthusiastic acceptance of our invitation. Finally, we are thankful to the members of the program committee for their valuable and high-quality reviews. All submissions have benefited from their expert feedback. Their timely contribution was the basis for accepting an excellent list of papers and making the twelfth edition of TextGraphs a success.
Abstract

We introduce a machine learning approach for the identification of "white spaces" in scientific knowledge. Our approach addresses this task as link prediction over a graph that contains over 2M influence statements such as "CTCF activates FOXA1", which were automatically extracted using open-domain machine reading. We model this prediction task using graph-based features extracted from this influence graph, as well as from a citation graph that captures scientific communities. We evaluated the proposed approach through backtesting. Although the data is heavily unbalanced (50 times more negative examples than positive ones), our approach predicts which influence links will be discovered in the "near future" with an F1 score of 27 points and a mean average precision of 68%.
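To make the backtesting setup concrete, the following is a minimal sketch of link prediction evaluated by backtesting on a toy graph. It is not the authors' feature set or model: the edge list, the node names, the cutoff year, and the use of a single Jaccard neighborhood-overlap score are all assumptions made for illustration. Links first reported before the cutoff form the known graph, unlinked node pairs are ranked by the score, and the ranking is checked against the links that appear at or after the cutoff.

```python
from itertools import combinations

# Hypothetical toy data: (entity, entity, year the influence was first reported).
EDGES = [
    ("A", "B", 2000), ("A", "C", 2001), ("B", "C", 2002),
    ("B", "D", 2003), ("C", "D", 2004), ("D", "E", 2005),
    ("A", "D", 2010), ("C", "E", 2011),  # discovered after the cutoff used below
]


def build_neighbors(pairs):
    """Undirected adjacency sets for a collection of node pairs."""
    nbrs = {}
    for u, v in pairs:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def jaccard(nbrs, u, v):
    """Neighborhood overlap of u and v; an assumed stand-in for richer features."""
    a, b = nbrs.get(u, set()), nbrs.get(v, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(edges, cutoff):
    """Score every unlinked pair in the pre-cutoff graph and label it by later discovery."""
    past = {tuple(sorted((u, v))) for u, v, year in edges if year < cutoff}
    future = {tuple(sorted((u, v))) for u, v, year in edges if year >= cutoff} - past
    nbrs = build_neighbors(past)
    ranked = [
        (pair, jaccard(nbrs, *pair), pair in future)
        for pair in combinations(sorted(nbrs), 2)
        if pair not in past
    ]
    # Highest score first; ties are broken by pair name so the order is deterministic.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


def average_precision(ranked):
    """Average of precision@k taken at each rank k that holds a true future link."""
    hits, total = 0, 0.0
    for k, (_, _, is_future) in enumerate(ranked, start=1):
        if is_future:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


if __name__ == "__main__":
    ranking = rank_candidates(EDGES, cutoff=2006)
    for pair, score, is_future in ranking:
        print(pair, round(score, 3), is_future)
    print("average precision:", round(average_precision(ranking), 3))
```

Ranking metrics such as average precision are a natural fit for this setting because negative pairs vastly outnumber positives. A learned classifier over many such features would replace the single score used here.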
Introduction

The amount of scientific knowledge that is publicly available has increased dramatically in the past few years. For example, PubMed, a search engine of biomedical publications, now indexes over 25 million papers, 17 million of which were published between 1990 and the present. This information overload yields two critical problems. First, this volume exceeds the human capacity to aggregate and interpret the fragments of knowledge published in these papers, which may result in existing solutions to critical problems being overlooked. Swan...