We describe a text categorization approach that combines distributional clusters of features with a support vector machine (SVM) classifier. Our feature selection approach employs distributional clustering of words via the recently introduced information bottleneck method, which generates a more efficient word-cluster representation of documents. Combined with the classification power of an SVM, this method yields high-performance text categorization that can outperform other recent methods in terms of categorization accuracy and representation efficiency. Comparing the accuracy of our method with that of other techniques, we observe that the results depend significantly on the data set. We discuss the potential reasons for this dependency.
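The word-cluster representation described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation. As a stand-in for the information bottleneck procedure, it uses a simple greedy agglomerative merge whose cost is the weighted Jensen-Shannon divergence between the class distributions of two word clusters. The toy corpus and all function names are hypothetical, and the SVM stage is omitted: the resulting cluster-count vectors are what one would pass to an SVM.

```python
import math
from collections import Counter, defaultdict


def word_class_stats(docs):
    """Estimate P(w) and P(class | w) from (tokens, label) pairs."""
    counts = defaultdict(Counter)  # word -> Counter of class labels
    for tokens, label in docs:
        for tok in tokens:
            counts[tok][label] += 1
    classes = sorted({label for _, label in docs})
    total = sum(sum(c.values()) for c in counts.values())
    p_w = {w: sum(c.values()) / total for w, c in counts.items()}
    p_c_w = {w: [c[k] / sum(c.values()) for k in classes] for w, c in counts.items()}
    return p_w, p_c_w


def kl(p, q):
    """Kullback-Leibler divergence in nats; terms with p == 0 contribute 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def merge_cost(pi, di, pj, dj):
    """Weighted Jensen-Shannon divergence between two clusters."""
    total = pi + pj
    wi, wj = pi / total, pj / total
    mix = [wi * a + wj * b for a, b in zip(di, dj)]
    return total * (wi * kl(di, mix) + wj * kl(dj, mix))


def best_pair(clusters):
    """Return the indices (i, j) of the cheapest pair of clusters to merge."""
    pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
    return min(pairs, key=lambda ij: merge_cost(*clusters[ij[0]][1:], *clusters[ij[1]][1:]))


def cluster_words(docs, n_clusters):
    """Greedily merge words until n_clusters remain; return word -> cluster id."""
    p_w, p_c_w = word_class_stats(docs)
    # Each cluster is a tuple: (set of words, probability mass, class distribution).
    clusters = [({w}, p_w[w], p_c_w[w]) for w in sorted(p_w)]
    while len(clusters) > n_clusters:
        i, j = best_pair(clusters)
        (wi, pi, di), (wj, pj, dj) = clusters[i], clusters[j]
        total = pi + pj
        merged = (wi | wj, total, [(pi * a + pj * b) / total for a, b in zip(di, dj)])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return {w: idx for idx, (words, _, _) in enumerate(clusters) for w in words}


def to_cluster_counts(tokens, word_to_cluster, n_clusters):
    """Represent a document as counts over word clusters; unseen words are skipped."""
    vec = [0] * n_clusters
    for tok in tokens:
        if tok in word_to_cluster:
            vec[word_to_cluster[tok]] += 1
    return vec


# Hypothetical toy corpus: two classes with disjoint vocabularies.
docs = [
    (["goal", "match", "team"], "sport"),
    (["team", "goal", "score"], "sport"),
    (["vote", "election", "party"], "politics"),
    (["party", "vote", "senate"], "politics"),
]
word_to_cluster = cluster_words(docs, n_clusters=2)
features = to_cluster_counts(["goal", "vote", "vote", "unseen"], word_to_cluster, 2)
```

On this toy corpus, words that occur only in one class have identical class distributions, so they merge at essentially zero cost and the two final clusters line up with the two topics. The eight-word vocabulary is thereby reduced to a two-dimensional document vector, which is the kind of compact representation the abstract refers to.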
An investigation of the logical flexibility principles needed for a formal semantic account of coordination, plurality, and scope in natural language.
Since the early work of Montague, Boolean semantics and its subfield of generalized quantifier theory have become the model-theoretic foundation for the study of meaning in natural languages. This book uses this framework to develop a new semantic theory of central linguistic phenomena involving coordination, plurality, and scope. The proposed theory makes use of the standard Boolean interpretation of conjunction, a choice-function account of indefinites, and a novel semantics of plurals that is not based on the distributive/collective distinction. The key to unifying these mechanisms is a version of Montagovian semantics that is augmented by flexibility principles: semantic operations that have no counterpart in phonology.
This is the first book to cover these areas in a way that is both linguistically comprehensive and formally explicit. On the one hand, it addresses questions of primarily linguistic concern: the semantic functions of words like "and" and "or" in different languages, the interpretation of indefinites and their scope, and the semantic typology of noun phrases and predicates. On the other hand, it addresses formal questions that are motivated by the treatment of these linguistic problems: the use of Boolean algebras in linguistics, the proper formalization of choice functions within generalized quantifier theory, and the extension of this theory to the domain of plurality. While primarily intended for readers with a background in theoretical linguistics, the book will also be of interest to researchers and advanced students in logic, computational linguistics, philosophy of language, and artificial intelligence.
The Strongest Meaning Hypothesis of Dalrymple et al. (1994, 1998), which was originally proposed as a principle for the interpretation of reciprocals, is extended in this paper into a general principle of plural predication. This principle applies to complex predicates that are composed of lexical predicates that hold of atomic entities, and determines the pluralities in the extension of the predicate. The meaning of such a complex predicate is claimed to be the truth-conditionally strongest meaning that does not contradict lexical properties of the simple predicates it contains. Weak interpretations of reciprocals (as in "the books are stacked on top of each other"), plural predicate conjunction (e.g. "the books are old and new") and 'atomic' distributivity in general are derived by a unified mechanism, which 'weakens' the basic universal meanings of strong reciprocals, Boolean conjunction and quantification over atomic entities.