Summary
Text representation is a necessary first step in text classification (TC): an information-rich term weighting scheme must be applied to obtain representations that yield high TC performance. To date, term frequency-inverse document frequency (TF-IDF) is the most widely used term weighting scheme, but it suffers from two deficiencies. First, the global weighting factor IDF approaches infinity if a term occurs in no text of the corpus. Second, IDF equals zero if a term appears in every text. To overcome these drawbacks, we first conduct an in-depth analysis of current term weighting schemes and then propose an improved scheme, term frequency-inverse exponential frequency (TF-IEF), together with several variants. The proposed method replaces the log-like global weighting factor IDF with a new global weighting factor, IEF, which greatly reduces the influence of terms with a high local weighting factor (TF) on the final term weight, so that more representative features are generated. We carried out a series of experiments on two commonly used data sets (corpora) with Naïve Bayes and support vector machine classifiers to validate the performance of the proposed schemes. The experimental results clearly show that the proposed term weighting schemes outperform the compared schemes.
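The two deficiencies of IDF noted above follow directly from its standard log form, idf(t) = log(N / df(t)), where N is the corpus size and df(t) is the number of texts containing term t. A minimal sketch (illustrative only; the function names and the toy numbers are ours, not from the paper):

```python
import math

def idf(df, n_docs):
    """Standard log-form inverse document frequency: log(N / df).

    df: number of texts in the corpus containing the term.
    n_docs: total number of texts N in the corpus.
    """
    if df == 0:
        # Deficiency 1: a term occurring in no text drives IDF to infinity.
        return math.inf
    return math.log(n_docs / df)

def tf_idf(tf, df, n_docs):
    """TF-IDF weight: local factor TF times global factor IDF."""
    return tf * idf(df, n_docs)

# Deficiency 1: term absent from all 100 texts -> IDF diverges.
print(idf(0, 100))        # inf
# Deficiency 2: term present in every text -> IDF (and hence TF-IDF) is 0,
# regardless of how large the local factor TF is.
print(tf_idf(50, 100, 100))  # 0.0
```

TF-IEF replaces the log-like IDF with an exponential global factor precisely to avoid these degenerate values and to damp the contribution of terms with very high TF; its exact formulation is given in the body of the paper.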