Abstract. This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and behave robustly over a variety of different learning tasks. Furthermore, they are fully automatic, eliminating the need for manual parameter tuning.
Introduction

With the rapid growth of online information, text categorization has become one of the key techniques for handling and organizing text data. Text categorization techniques are used to classify news stories, to find interesting information on the WWW, and to guide a user's search through hypertext. Since building text classifiers by hand is difficult and time-consuming, it is advantageous to learn classifiers from examples.

In this paper I will explore and identify the benefits of Support Vector Machines (SVMs) for text categorization. SVMs are a new learning method introduced by V. Vapnik et al. [9] [1]. They are well-founded in terms of computational learning theory and very open to theoretical understanding and analysis.

After reviewing the standard feature vector representation of text, I will identify the particular properties of text in this representation in section 4. I will argue that SVMs are very well suited for learning in this setting. The empirical results in section 5 will support this claim. Compared to state-of-the-art methods, SVMs show substantial performance gains. Moreover, in contrast to conventional text classification methods, SVMs will prove to be very robust, eliminating the need for expensive parameter tuning.
Text Categorization

The goal of text categorization is the classification of documents into a fixed number of predefined categories. Each document can be in multiple, exactly one, or no category at all. Using machine learning, the objective is to learn classifiers
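The multi-label setting described above is commonly handled by training one binary classifier per category. The following sketch is not from the paper: it illustrates the idea using scikit-learn's `LinearSVC` in a one-vs-rest arrangement over a standard bag-of-words feature vector representation; the documents, category names, and labels are invented for illustration.

```python
# Illustrative sketch (assumed setup, not the paper's experiments):
# one binary linear SVM per category, so each document can receive
# multiple, exactly one, or no category labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

docs = [
    "stocks fell sharply on wall street today",
    "the team won the championship game",
    "new trade bill passed by the senate",
    "quarterly earnings beat market expectations",
]
# Invented labels: a document may belong to several, one, or no category.
labels = [{"finance"}, {"sports"}, {"politics", "finance"}, {"finance"}]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)           # binary indicator matrix, one column per category

vec = TfidfVectorizer()                 # standard feature-vector (bag-of-words) representation
X = vec.fit_transform(docs)

clf = OneVsRestClassifier(LinearSVC())  # trains one linear SVM per category
clf.fit(X, Y)

# Predictions come back as a 0/1 indicator row per new document.
pred = clf.predict(vec.transform(["market stocks rise after earnings report"]))
print(mlb.inverse_transform(pred))
```

Each column of `Y` defines an independent binary learning task, so a document whose row is all zeros simply belongs to no category, matching the task definition above.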