Interactive high-quality text classification

Liu, Rey-Long

doi:10.1016/j.ipm.2007.11.002

Cited by 11 publications

(11 citation statements)

References 22 publications


“…The identification is based on the $\chi^2$ (chi-square) statistic, which is popular in TC (Himmel, Reincke, & Michelmann, 2009; Liu, 2008; Yang & Pedersen, 1997). For a term $t$ and a category $c$, $\chi^2(t,c) = \frac{N \times (A \times D - B \times C)^2}{(A+B) \times (A+C) \times (B+D) \times (C+D)}$, where $N$ is the total number of training documents, $A$ is the number of training documents that are in $c$ and contain $t$, $B$ is the number of training documents that are not in $c$ but contain $t$, $C$ is the number of training documents that are in $c$ but do not contain $t$, and $D$ is the number of training documents that are not in $c$ and do not contain $t$ (Liu, 2008; Yang & Pedersen, 1997). The term-category correlation falls into two types: positively correlated type and negatively correlated type.…”

Section: CTFA (mentioning)

confidence: 99%
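The chi-square formula quoted above can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the cited papers: the function names `chi_square` and `correlation_type` are hypothetical, N is taken to be A + B + C + D (which follows from the four definitions in the quote), and reading the correlation type off the sign of A×D − B×C is a common convention assumed here rather than something the quoted passage spells out.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term t for a category c.

    a: training documents in c that contain t
    b: training documents not in c that contain t
    c: training documents in c that do not contain t
    d: training documents not in c that do not contain t
    """
    n = a + b + c + d  # total number of training documents
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    if denom == 0:
        # A marginal count is zero, so the statistic is undefined;
        # returning 0.0 is an assumption made for this sketch.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def correlation_type(a, b, c, d):
    """Assumed convention: the sign of A*D - B*C gives the correlation type."""
    diff = a * d - b * c
    if diff > 0:
        return "positive"
    if diff < 0:
        return "negative"
    return "none"


# Example: t appears in 40 of 50 documents in c and in 10 of 50 outside c.
score = chi_square(40, 10, 10, 40)
kind = correlation_type(40, 10, 10, 40)
```

With these counts the term is strongly and positively associated with the category; swapping the in-category and out-of-category counts gives the same score with a negative correlation type.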

“…To make the acceptance and rejection decisions, the classifier needs to estimate the degree of acceptance (DOA) of each document d with respect to c (e.g., similarity between d and c , or probability of d belonging to c ). However, perfect DOA estimation cannot be expected (Liu, 2008; Zhang & Callan, 2001; Arampatzis, Beney, Koster, & Weide, 2000), mainly due to the difficulties in identifying and properly encoding all helpful TC evidence with limited computational resources (e.g., memory and training documents).…”

Section: Introduction (mentioning)

confidence: 99%

Context‐based term frequency assessment for text classification

Liu

2009

J. Am. Soc. Inf. Sci.

Self Cite

Automatic text classification (TC) is essential for the management of information. To properly classify a document d, it is essential to identify the semantics of each term t in d, and these semantics depend heavily on the context (neighboring terms) of t in d. Therefore, we present a technique, CTFA (Context-based Term Frequency Assessment), that improves text classifiers by considering term contexts in test documents. The results of the term context recognition are used to assess the frequencies of terms, and hence CTFA may easily work with various kinds of text classifiers that base their TC decisions on term frequencies, without needing to modify the classifiers. Moreover, CTFA is efficient, and neither huge memory nor domain-specific knowledge is required. Empirical results show that CTFA successfully enhances the performance of several kinds of text classifiers on different experimental data.

“…The test results are assessed by recall, precision and F-index [11]. Specific calculation formulas are shown as:…”

Section: Test and Analysis (mentioning)

confidence: 99%
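The quoted statement breaks off before its formulas, so the citing paper's exact definitions are not reproduced here. As a hedged illustration only, the sketch below uses the standard textbook definitions of precision, recall, and the F-measure, and assumes that "F-index" refers to the F-measure. The function name `precision_recall_f` and the `beta` parameter are hypothetical choices for this example.

```python
def precision_recall_f(tp, fp, fn, beta=1.0):
    """Standard precision, recall, and F-beta from raw counts.

    tp: documents correctly assigned to the category (true positives)
    fp: documents wrongly assigned to the category (false positives)
    fn: documents of the category that were missed (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        # Avoid dividing by zero when nothing was classified correctly.
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Example: 8 correct assignments, 2 wrong assignments, 8 missed documents.
p, r, f1 = precision_recall_f(8, 2, 8)
```

With `beta=1.0` the F-measure is the harmonic mean of precision and recall; other values of `beta` weight recall more or less heavily than precision.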

Patent Retrieval Method Based on Similarity Calculation of a Single Patent

Qiu, Yao, Ji, et al.

2017

dtcse

For reverse engineering or infringement-avoiding design, designers have to access relevant patents efficiently and accurately, starting from a specific patent in hand (referred to below as "the target patent"). A retrieval method based on a single patent is proposed. A batch of patents (referred to below as "the preliminary patents") is obtained by keyword retrieval and then searched again using the target patent. The patents most similar to the target patent are then identified by similarity calculation and sorted by similarity value. The effect of keyword form and keyword location on the similarity calculation is considered. The efficiency and accuracy of the improved algorithm are verified by tests.

“…There are other text classification techniques that have been explored in other applications, such as in [19][20][21].…”

Section: Related Work (mentioning)

confidence: 99%

A Novel Rhetorical Structure Approach for Classifying Arabic Security Documents

Mathkour

2009

IJCTE

Security document classification aims to protect documents from illegal disclosure. Classifying a portion of a document as 'secret' depends on the effect its disclosure would have on an organization. In this respect, information is classified according to its critical semantics (i.e., its context or value and its intended uses or audience at a particular time or in a particular situation). Understanding the semantics of a document is not an easy task. Rhetorical structure theory (RST) is one of the leading theories that have been applied successfully in text processing and understanding. In this paper, we describe a novel approach to automatically classify Arabic security documents using RST.
