Part-of-speech tagging using decision trees

Màrquez, Lluís; Rodríguez, Horacio

doi:10.1007/bfb0026668

Cited by 55 publications (22 citation statements); references 13 publications


“…Book and audio book consist of a total number of 40460 tokens (number of words) and 6117 types (number of unique words). The distribution of single word classes and bi-gram word class combinations occurring in the (audio) book were analysed and compared to a number of German reference corpora [59], and in addition, other German novels, by applying part-of-speech (POS) tagging [60][61][62] as implemented in the python library spaCy [63]. The similarities, or dissimilarities respectively, of all distributions are visualized using multi-dimensional scaling (MDS) [64][65][66][67].…”

Section: Speech Stimuli and Natural Language Text Data (mentioning)

confidence: 99%

Analysis of continuous neuronal activity evoked by natural speech with computational corpus linguistics methods

Schilling, Tomasello, Henningsen-Schomers et al. (2020), Preprint


In the field of neurobiology of language, neuroimaging studies are generally based on stimulation paradigms consisting of at least two different conditions. Depending on the desired evaluation, these conditions, in turn, have to contain dozens of items to achieve a good signal-to-noise ratio. Designing such paradigms can be very time-consuming. Subsequently, a group of participants is stimulated with the new paradigm while brain activity is recorded, e.g. with EEG/MEG. The measured data are then pre-processed and finally contrasted according to the different stimulus conditions. In this way, only a limited number of analyses and hypothesis tests can be performed, while alternative or further analyses usually require completely new paradigms. This traditional approach is necessarily data-limited, and its cost-benefit ratio is therefore rather poor. In contrast, analyses in computational linguistics are based on text corpora, which allow a vast variety of hypotheses to be tested by repeatedly re-evaluating the data set. Furthermore, text corpora also allow exploratory data analysis in order to generate new hypotheses. By combining the two approaches, we present a unified approach combining continuous natural speech and MEG to generate a corpus-like database of speech-evoked neuronal activity.
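The corpus comparison quoted in the Schilling et al. statement (token and type counts, distributions of single word classes and word-class bigrams, and dissimilarities between texts that are then visualized with MDS) can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's code: the input is a list of (word, tag) pairs from any POS tagger (with spaCy this could be built as `[(t.text, t.pos_) for t in nlp(text)]`), types are counted as case-sensitive word forms, and total variation distance is an assumed dissimilarity measure, since the statement does not name one. The helper names are hypothetical, and the MDS step itself is not shown.

```python
from collections import Counter

# Hypothetical helpers; total variation distance is an assumed dissimilarity
# measure, not necessarily the one used in the quoted study.

def pos_profile(tagged):
    """Summarize a tagged text given as a list of (word, tag) pairs."""
    words = [word for word, _ in tagged]
    tags = [tag for _, tag in tagged]
    n = len(tags)
    bigrams = Counter(zip(tags, tags[1:]))  # adjacent word-class pairs
    n_bigrams = max(n - 1, 1)  # guard against division by zero
    return {
        "tokens": n,  # number of words
        "types": len(set(words)),  # distinct word forms, case-sensitive (assumption)
        "unigram": {t: c / n for t, c in Counter(tags).items()},
        "bigram": {b: c / n_bigrams for b, c in bigrams.items()},
    }

def total_variation(p, q):
    """Distance between two relative-frequency dicts, in [0, 1]."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def dissimilarity_matrix(profiles, key="unigram"):
    """Pairwise distances between texts, e.g. as input for an MDS routine."""
    return [[total_variation(a[key], b[key]) for b in profiles] for a in profiles]

# Toy input standing in for real tagger output.
book = [("Der", "DET"), ("Hund", "NOUN"), ("bellt", "VERB"), (".", "PUNCT")]
novel = [("Die", "DET"), ("Katze", "NOUN"), ("schläft", "VERB"),
         ("tief", "ADV"), (".", "PUNCT")]
profiles = [pos_profile(book), pos_profile(novel)]
print(round(dissimilarity_matrix(profiles)[0][1], 3))  # 0.2
```

Passing `key="bigram"` compares the word-class bigram distributions instead of the single-class ones.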


“…In principle, however, it would be possible to produce a complete tagger on the basis of a learned statistical decision tree. Recently, this approach has indeed been explored (Marquez & Rodriguez, 1998). (Rumelhart et al 1986) are the most popular neural network architecture.…”

Section: Discussion (mentioning)

confidence: 99%

Syntactic Wordclass Tagging

Halteren (1999), Text, Speech and Language Technology


“…Márquetz et al [38] develop decision tree tagger for English POS tagging. They use non-incremental supervised learning from examples of TDIDT (Top Down Induction of Decision Tree) to construct the decision tree.…”

Section: Decision Tree (DT) Model (mentioning)

confidence: 99%
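The TDIDT procedure named in the statement above (top-down induction of a decision tree from labeled examples, built in one non-incremental pass) can be illustrated with a minimal ID3-style learner that picks, for one ambiguous word, a tag from its context. This is a sketch under assumptions, not the authors' tagger: the attribute-selection criterion here is plain information gain (the original paper's criterion may differ), and the context attributes `prev` and `next` (neighboring tags), the helper names, and the toy training data are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr):
    """Entropy reduction obtained by splitting `examples` on `attr`."""
    groups = {}
    for feats, tag in examples:
        groups.setdefault(feats[attr], []).append(tag)
    remainder = sum(len(g) / len(examples) * entropy(g) for g in groups.values())
    return entropy([tag for _, tag in examples]) - remainder

def build_tree(examples, attributes):
    """Top-down induction over (features, tag) pairs; leaves are tags.

    Uses information gain (ID3-style); this criterion is an assumption.
    """
    tags = [tag for _, tag in examples]
    majority = Counter(tags).most_common(1)[0][0]
    if len(set(tags)) == 1 or not attributes:
        return majority
    best = max(attributes, key=lambda a: information_gain(examples, a))
    if information_gain(examples, best) <= 0:
        return majority  # no attribute separates the tags any further
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in {feats[best] for feats, _ in examples}:
        subset = [(f, t) for f, t in examples if f[best] == value]
        branches[value] = build_tree(subset, rest)
    return {"attr": best, "branches": branches, "default": majority}

def classify(tree, feats):
    """Follow branches; unseen values fall back to the node's majority tag."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(feats.get(tree["attr"]), tree["default"])
    return tree

# Hypothetical training data for one ambiguous word ("run": noun vs. verb).
examples = [
    ({"prev": "DT", "next": "IN"}, "NN"),
    ({"prev": "DT", "next": "."}, "NN"),
    ({"prev": "PRP", "next": "IN"}, "VB"),
    ({"prev": "TO", "next": "."}, "VB"),
    ({"prev": "PRP", "next": "."}, "VB"),
]
tree = build_tree(examples, ["prev", "next"])
print(classify(tree, {"prev": "DT", "next": "."}))  # NN
```

Here the preceding tag alone separates the two readings, so the learned tree has a single split on `prev`; a full tagger along these lines would learn one such tree per ambiguity class.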

Classifiers combination to Arabic morphosyntactic disambiguation

Albared, Omar, Aziz (2009), 2009 International Conference on Electrical Engineering and Informatics


Part-of-speech tagging is an important preprocessing step in many natural language processing applications, such as text summarization, question answering and information retrieval systems. Morphosyntactic disambiguation (part-of-speech tagging) is the process of assigning each word in a given context its appropriate part of speech. In this paper, we first review all the supervised machine learning approaches that have been used in part-of-speech tagging. Then we review all the work on Arabic, to compare it and to confirm our need to develop an accurate and efficient Arabic morphosyntactic disambiguation system. Finally, we propose an experimental classifier-combination framework for an Arabic part-of-speech tagger in which three diverse probabilistic classifiers (Hidden Markov, Maximum Entropy and Transformation Based Learning) are combined using many different combination strategies to exploit their advantages.

Keywords: natural language processing, morphosyntactic disambiguation, machine learning.
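The Albared et al. abstract combines the outputs of three taggers but does not spell out its combination strategies. Simple majority voting per token, with a fixed tie-breaking order, is one common strategy and is sketched below. The function name, tagger names, tags, and priority order are hypothetical; this is not the authors' framework.

```python
from collections import Counter

def majority_vote(predictions, priority):
    """Combine per-token tags from several taggers by simple majority voting.

    predictions: tagger name -> list of tags, one per token (same length for all).
    priority: every tagger name, ordered for tie-breaking (e.g. best single tagger first).
    """
    if set(priority) != set(predictions):
        raise ValueError("priority must name every tagger")
    lengths = {len(tags) for tags in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all taggers must tag the same number of tokens")
    (n,) = lengths
    combined = []
    for i in range(n):
        votes = Counter(predictions[name][i] for name in predictions)
        top = max(votes.values())
        tied = {tag for tag, count in votes.items() if count == top}
        # On a tie, take the tag of the highest-priority tagger among the tied tags.
        combined.append(next(predictions[name][i] for name in priority
                             if predictions[name][i] in tied))
    return combined

# Hypothetical outputs of HMM, maximum-entropy and TBL taggers for three tokens.
predictions = {
    "hmm": ["NOUN", "VERB", "ADJ"],
    "maxent": ["NOUN", "NOUN", "ADV"],
    "tbl": ["NOUN", "VERB", "NOUN"],
}
print(majority_vote(predictions, ["maxent", "hmm", "tbl"]))
# ['NOUN', 'VERB', 'ADV']
```

With three taggers, a tie can only occur when all three disagree, so the priority order decides exactly those tokens; weighting each vote by a tagger's held-out accuracy is a straightforward variant.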
