Introduction

There is a vast wealth of information available in textual format that the Semantic Web cannot yet tap into: 80% of data on the Web and on internal corporate intranets is unstructured, hence analysing and structuring the data (social analytics and next-generation analytics) is a large and growing endeavour. Here, the Information Extraction community could help, as it specialises in mining nuggets of information from text. Information Extraction techniques could in turn be enhanced by annotated data or domain-specific resources. The Semantic Web community has taken great strides in making these resources available through the Linked Open Data cloud, and they are now ready for uptake by the Information Extraction community. Following the previous two SWAIE workshops, at EKAW 2012 and RANLP 2013 respectively, we have focused our attention on fostering awareness of how Semantic Web technologies can benefit the traditional IE and NLP communities.

The workshop invited contributions around three particular topics: 1) Semantic Web-driven Information Extraction, 2) Information Extraction for the Semantic Web, and 3) applications and architectures at the intersection of Semantic Web and Information Extraction. SWAIE 2014 received a number of high-quality submissions, from which six papers were selected.
Abstract

Ontologies have proven useful for enhancing NLP-based applications such as information extraction. In the biomedical domain, rich ontologies are available and are used for the semantic annotation of texts. However, most of them have no or only a few non-English concept labels and therefore cannot be used to annotate non-English texts. Since translations need expert review, a full translation of large ontologies is often not feasible. For the purpose of semantic annotation, we propose to use the corpus to be annotated to identify frequently occurring terms and their translations, and to extend the respective ontology concepts with them. Using our approach, translating a subset of ontology concepts is sufficient to significantly enhance annotation coverage. For evaluation, we automatically translated RadLex ontology concepts from English into German. We show that by translating a rather small set of concepts (in our case 433), identified by corpus analysis, we are able to increase the proportion of annotated words from 27.36% to 42.65%.
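
The idea of picking concepts by corpus frequency and then measuring annotation coverage can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the single-token matching, the toy German sentence, and the small translation table are all assumptions made for illustration, and none of the values come from the RadLex evaluation.

```python
from collections import Counter


def annotation_coverage(tokens, labels):
    """Fraction of corpus tokens that match a known concept label."""
    if not tokens:
        return 0.0
    annotated = sum(1 for tok in tokens if tok.lower() in labels)
    return annotated / len(tokens)


def select_concepts_by_frequency(tokens, translations, k):
    """Pick up to k concepts whose candidate translation occurs most often.

    `translations` maps an English concept label to one candidate
    translation (hypothetical data; a real system would need expert
    review of these candidates).
    """
    freq = Counter(tok.lower() for tok in tokens)
    scored = [(eng, freq[ger.lower()]) for eng, ger in translations.items()]
    # Ignore concepts whose translation never occurs in the corpus.
    scored = [(eng, n) for eng, n in scored if n > 0]
    # Highest frequency first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [eng for eng, _ in scored[:k]]


# Toy corpus and candidate translations (illustrative only).
corpus = "die Leber ist normal die Milz ist normal keine Läsion in der Leber".split()
translations = {
    "liver": "leber",
    "spleen": "milz",
    "lesion": "läsion",
    "kidney": "niere",
}

german_labels = set()  # no German labels before translation
before = annotation_coverage(corpus, german_labels)

selected = select_concepts_by_frequency(corpus, translations, k=2)
german_labels.update(translations[eng] for eng in selected)
after = annotation_coverage(corpus, german_labels)

print(selected, round(before, 4), round(after, 4))
```

In this toy setting, translating only the two most frequent concepts already raises coverage from zero, which mirrors the argument that a small, corpus-driven subset can account for much of the gain. Real annotation would also need multi-word labels, inflection handling, and compound splitting, which this sketch leaves out.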