This article presents the main results of a project, which explored ways to recognize and classify a narrative feature-speech, thought, and writing representation (ST&WR)-automatically, using surface Information and methods of computational linguistics. The task was to detect and distinguish four typesdirect, free indirect, indirect, and reported ST&WR-in a corpus of manually annotated German narrative texts. Rule-based as well as machine-leaming methods were tested and compared. The results were best for recognizing direct ST&WR (best Fl score: 0.87), followed by indirect (0.71), reported (0.58), and finally free indirect ST&WR (0.40). The rule-based approach worked best for ST&WR types with clear patterns, like indirect and marked direct ST&WR, and offen gave the most accurate results. Machine leaming was most successful for types without clear indicators, like free indirect ST&WR, and proved more stable. When looking at the percentage of ST&WR in a text, the results of machineleaming methods always correlated best with the results of manual annotation. Creating a union or intersection of the results of the two approaches did not lead to striking improvements. A stricter definition of ST&WR, which excluded borderline cases, made the task harder and led to worse results for both approaches.
This contribution presents the newest version of our 'Wortverbindungsfelder' (fields of multi-word expressions), an experimental lexicographic resource that focusses on aspects of MWEs that are rarely addressed in traditional descriptions: Contexts, patterns and interrelations. The MWE fields use data from a very large corpus of written German (over 6 billion word forms) and are created in a strictly corpus-based way. In addition to traditional lexicographic descriptions, they include quantitative corpus data which is structured in new ways in order to show the usage specifics. This way of looking at MWEs gives insight in the structure of language and is especially interesting for foreign language learners.
The paper explores factors that influence the distribution of constituent words of compounds over the head and modifier position. The empirical basis for the study is a large database of German compounds, annotated with respect to the morphological structure of
the compound and the semantic category of the constituents. The study shows that the polysemy of the constituent word, its constituent family size, and its semantic category account for tendencies of the constituent word to occur in either modifier or head position. Furthermore, the paper
explores the degree to which the semantic category combination of head and modifier word, e.g., x=substance and y=artifact, indicates the semantic relation between the constituents, e.g., y_consists_of_x.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.