We propose a lightly-supervised approach for information extraction, in particular named entity classification, which combines the benefits of traditional bootstrapping, i.e., use of limited annotations and interpretability of extraction patterns, with the robust learning approaches proposed in representation learning. Our algorithm iteratively learns custom embeddings for both the multi-word entities to be extracted and the patterns that match them from a few example entities per category. We demonstrate that this representation-based approach outperforms three other state-of-the-art bootstrapping approaches on two datasets: CoNLL-2003 and OntoNotes. Additionally, using these embeddings, our approach outputs a globally-interpretable model consisting of a decision list, by ranking patterns based on their proximity to the average entity embedding in a given class. We show that this interpretable model performs close to our complete bootstrapping model, demonstrating that representation learning can be used to produce interpretable models with a small loss in performance. This decision list can be edited by human experts to mitigate some of that loss and, in some cases, outperform the original model.
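The decision-list step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the use of cosine similarity as the "proximity" measure, the `@ENTITY` pattern notation, the function names, and the two-dimensional toy embeddings are all assumptions made for illustration only.

```python
import math


def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_decision_list(entity_embeddings, pattern_embeddings):
    """Assign each pattern to its closest class centroid, then rank by similarity."""
    centroids = {label: mean_vector(vecs) for label, vecs in entity_embeddings.items()}
    rules = []
    for pattern, vec in pattern_embeddings.items():
        label, score = max(
            ((lab, cosine(vec, c)) for lab, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        rules.append((pattern, label, score))
    # Highest-similarity rules come first in the decision list.
    rules.sort(key=lambda rule: rule[2], reverse=True)
    return rules


def classify(matched_patterns, decision_list, default=None):
    """Return the label of the first rule whose pattern matched the entity."""
    for pattern, label, _score in decision_list:
        if pattern in matched_patterns:
            return label
    return default


# Toy (hypothetical) embeddings: two classes, three patterns.
entity_embeddings = {
    "PER": [[1.0, 0.0], [0.8, 0.2]],
    "LOC": [[0.0, 1.0], [0.2, 0.8]],
}
pattern_embeddings = {
    "mr. @ENTITY": [0.9, 0.1],
    "city of @ENTITY": [0.1, 0.9],
    "visited @ENTITY": [0.4, 0.6],
}

decision_list = build_decision_list(entity_embeddings, pattern_embeddings)
for pattern, label, score in decision_list:
    print(f"{pattern!r} -> {label} ({score:.3f})")
```

Because the resulting model is just an ordered list of (pattern, label) rules, a human expert could reorder or delete entries by hand, which is the kind of editing the abstract refers to.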
Coherence is the semantic property of a text that allows it to make sense to readers or listeners, and it is crucial for any text. Various coherence measures have been developed to assess discourse abilities in different clinical populations. However, decades of research on the coherence of speech in individuals with brain damage have yielded contradictory results. We suggest that this might be due to the differing sensitivity of the methods. In this study we apply two measures of global coherence and five measures of local coherence to the same set of texts by healthy speakers of Russian and people with dynamic aphasia, in order to find which methods distinguish between the two groups and how their results correlate. The material for the study consists of texts from the Russian CliPS corpus, a collection of oral retellings of the Pear film by individuals with brain damage and healthy speakers of Russian.
The goal of this study is to confirm a correlation between asymmetric object marking in Modern Hebrew and two factors that license the marking of the referential expression encoding the object, namely “referential status” and “animacy”. To achieve this goal, interrogative and relative pronouns that encode the O-participant in a transitive clause in Hebrew are considered. They constitute the subject of the study and justify its scientific novelty, since this type of referential expression encoding the patient participant of the situation in Modern Hebrew has not been the subject of research until now. To conduct a quantitative and comparative analysis, the author compiled an experimental Hebrew Objects Targeted Corpus of about 49,000 words. The study concluded that there is a correlation between asymmetric object marking, referential status, and animacy in Modern Hebrew. It showed that the asymmetric object marking of referents encoded by interrogative pronouns is, in the vast majority of cases (98%), regulated by the animacy of the referent, while the variability in the marking of referents encoded by relative pronouns is licensed by both the animacy and the referential status of the object. A hypothesis was also put forward about the existence of an additional factor that licenses the asymmetric object marking of interrogative and relative pronouns. This factor lies in the area of the pragmatic characteristics of the statement, in particular the degree of topicality of the referent encoding the patient participant in the situation.