In this article, we present an automatic semantic role labeling system for Persian consisting of two modules: argument identification, which specifies argument spans, and argument classification, which assigns their semantic roles. Both modules were trained on the Persian Proposition Bank, in which predicate-argument information is manually added as a layer on top of the Persian Dependency Treebank, which contains about 30,000 sentences. The training data comprises 216,871 verbal predicates and 42,386 nonverbal ones (40,813 nouns and 1,573 adjectives) with 33 semantic classes. Both modules are supervised: we used maximum entropy to build the argument identifier, which reaches a human-level accuracy of 99%, and a support vector machine for the argument classifier, which reaches an F1 of 84. Given that the system covers both verbal and nonverbal predicates with an expanded role set, we consider these results reasonable.
Word sense disambiguation is used in many areas of natural language processing. One approach to disambiguation is the decision list algorithm, a supervised method. Supervised methods are considered the most accurate machine learning algorithms, but they are strongly affected by the knowledge acquisition bottleneck: their efficiency depends on the size of the tagged training set, which is difficult, time-consuming, and costly to prepare. The method proposed in this article improves the efficiency of the decision list algorithm when only a small tagged training set is available. It uses a statistical method to extract collocations from a large untagged corpus, thereby identifying the more important collocations, which are the features used to create learning hypotheses. Weighting these features improves the efficiency and accuracy of a decision list algorithm trained on a small training corpus.
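The idea of weighting decision list features with collocation statistics can be illustrated with a minimal sketch. The abstract does not name the statistic or the weighting formula, so everything specific here is an assumption made for illustration: sentence-level pointwise mutual information (PMI) as the collocation measure, the weight `1 + max(0, PMI)`, the smoothing constant `alpha`, and the function names `pmi_weights`, `train_decision_list`, and `classify`. The decision list itself follows the common smoothed log-likelihood-ratio formulation, which may differ from the article's exact variant.

```python
import math
from collections import Counter, defaultdict


def pmi_weights(untagged, target):
    """Weight context words by sentence-level PMI with the target word.

    Assumption: weight = 1 + max(0, PMI), so words that are not positively
    associated with the target keep a neutral weight of 1.0.
    """
    n = len(untagged)
    doc_freq = Counter()  # number of sentences containing each word
    joint = Counter()     # number of sentences containing word and target
    for sentence in untagged:
        words = set(sentence)
        doc_freq.update(words)
        if target in words:
            joint.update(words - {target})
    weights = {}
    for word, count in joint.items():
        pmi = math.log(count * n / (doc_freq[word] * doc_freq[target]))
        weights[word] = 1.0 + max(0.0, pmi)
    return weights


def train_decision_list(tagged, weights, alpha=0.1):
    """Build a decision list from (context_words, sense) pairs.

    Each rule is scored by a smoothed log-likelihood ratio of its best sense
    against all other senses, multiplied by the collocation weight.
    """
    counts = defaultdict(Counter)  # feature -> sense -> count
    for context, sense in tagged:
        for word in set(context):
            counts[word][sense] += 1
    rules = []
    for feature, by_sense in counts.items():
        best_sense, best_count = max(
            by_sense.items(), key=lambda kv: (kv[1], kv[0])
        )
        other_count = sum(by_sense.values()) - best_count
        llr = math.log((best_count + alpha) / (other_count + alpha))
        rules.append((llr * weights.get(feature, 1.0), feature, best_sense))
    # Strongest evidence first; ties are broken alphabetically by feature.
    rules.sort(key=lambda rule: (-rule[0], rule[1]))
    return rules


def classify(rules, context, default):
    """Return the sense of the first (highest-scoring) rule that matches."""
    words = set(context)
    for _score, feature, sense in rules:
        if feature in words:
            return sense
    return default


# Toy data for the ambiguous word "bank" (hypothetical, for illustration only).
untagged = [
    ["bank", "money"],
    ["bank", "river"],
    ["tree", "river"],
    ["bank", "money", "the"],
    ["the", "tree"],
]
tagged = [
    (["the", "river"], "shore"),
    (["the", "money"], "finance"),
    (["the", "loan"], "finance"),
]
weights = pmi_weights(untagged, "bank")
rules = train_decision_list(tagged, weights)
```

In this toy setup the untagged corpus raises the rank of "money" in the list even though the tagged set contains it only once, which is the effect the abstract attributes to feature weighting. A real system would use a much larger untagged corpus and positional collocation features rather than whole-sentence co-occurrence.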