“…The identification is based on the $\chi^2$ (chi-square) statistic, which is popular in TC (Himmel, Reincke, & Michelmann, 2009; Liu, 2008; Yang & Pedersen, 1997). For a term t and a category c, $\chi^2(t,c) = \frac{N \times (A \times D - B \times C)^2}{(A+B) \times (A+C) \times (B+D) \times (C+D)}$, where N is the total number of training documents, A is the number of training documents that are in c and contain t, B is the number of training documents that are not in c but contain t, C is the number of training documents that are in c but do not contain t, and D is the number of training documents that are not in c and do not contain t (Liu, 2008; Yang & Pedersen, 1997). The term-category correlation falls into two types: positively correlated and negatively correlated.…”
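
The chi-square score defined in the excerpt can be sketched in a few lines of Python. This is only an illustrative sketch, not the cited authors' implementation. The function names `chi_square` and `correlation_type` are hypothetical, N is computed as A + B + C + D because the four counts partition the training set, and returning 0.0 for a degenerate table is an assumption. The excerpt does not say how the two correlation types are told apart; the sketch assumes the common convention of using the sign of A×D − B×C.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square score of a term for a category from the four document counts.

    a: documents in the category that contain the term
    b: documents not in the category that contain the term
    c: documents in the category that do not contain the term
    d: documents not in the category that do not contain the term
    """
    n = a + b + c + d  # total number of training documents
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    if denom == 0:
        # Assumption: a table with an empty row or column carries no signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def correlation_type(a: int, b: int, c: int, d: int) -> str:
    """Classify the correlation by the sign of a*d - b*c (assumed convention)."""
    diff = a * d - b * c
    if diff > 0:
        return "positive"
    if diff < 0:
        return "negative"
    return "none"


if __name__ == "__main__":
    # Hypothetical counts for one term and one category.
    print(round(chi_square(40, 10, 20, 30), 2))
    print(correlation_type(40, 10, 20, 30))
```

Because the difference is squared, the score alone cannot distinguish a term that signals membership in c from one that signals non-membership, which is why a separate sign check is needed.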