Summary
A wide range of online content on Internet platforms and search engines bears on corporate reputation. Given the huge volume of this content, we need a mining method that can automatically extract and analyze large amounts of online information and assess how reliable the claims made by companies actually are. In this paper, we propose a ranking model that verifies whether the sales rankings claimed by companies are trustworthy. The key idea is that a company with a higher confidence score should receive more support from online media. We use a unique data set of public-opinion data related to a specific company. We supplement it with data from various online news platforms and retrieved Web pages, collected by a distributed, generic Web crawler. Basic information and open financial data about the companies are also collected for auxiliary analysis. We present a Maximal Marginal Relevance (MMR)-based ranking model that computes a confidence score for each company, using word embeddings and Kullback-Leibler (KL) divergence to filter out irrelevant documents. Extensive experiments show that the proposed method outperforms the state-of-the-art MMR-based method. We also present three representative cases from the corporate-reputation data we built, in which the online evidence gives positive, neutral, and negative support, respectively, to a company's sales-ranking claim.
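The summary names the building blocks (word embeddings, KL-divergence filtering, MMR ranking) but not how they are combined. The following minimal Python sketch is an illustration under stated assumptions, not the authors' model. It drops documents whose smoothed unigram distribution is far, in KL divergence, from the claim's wording. It then represents texts as averaged word vectors and greedily ranks the remaining documents with the standard MMR criterion, λ·sim(d, q) − (1 − λ)·max sim(d, d′) over already selected d′. All function names, the toy 2-D vectors, the KL threshold of 0.15, λ = 0.7, the use of the query as the KL reference, and scoring a company by the mean MMR score of its selected documents are hypothetical choices.

```python
import math
from collections import Counter


def unigram_dist(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(P || Q) in nats; both distributions cover the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def kl_filter(query_tokens, docs, threshold):
    """Hypothetical filter: keep documents whose word distribution is close to the query's."""
    vocab = set(query_tokens).union(*docs)
    ref = unigram_dist(query_tokens, vocab)
    return [d for d in docs if kl_divergence(unigram_dist(d, vocab), ref) <= threshold]


def embed(tokens, vectors):
    """Mean word vector over tokens that have an embedding (None if none do)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_rank(query_vec, doc_vecs, k, lam=0.7):
    """Greedy MMR selection; returns [(doc_index, mmr_score), ...] in selection order."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        scores = {
            i: lam * cosine(query_vec, doc_vecs[i])
            - (1 - lam) * max((cosine(doc_vecs[i], doc_vecs[j]) for j, _ in selected), default=0.0)
            for i in candidates
        }
        best = max(candidates, key=scores.get)  # ties go to the earliest candidate
        selected.append((best, scores[best]))
        candidates.remove(best)
    return selected


def confidence_score(query_tokens, docs, vectors, k=5, lam=0.7, kl_threshold=0.15):
    """Hypothetical aggregation: mean MMR score of the top-k documents that survive the KL filter."""
    kept = kl_filter(query_tokens, docs, kl_threshold)
    query_vec = embed(query_tokens, vectors)
    doc_vecs = [v for v in (embed(d, vectors) for d in kept) if v is not None]
    if query_vec is None or not doc_vecs:
        return 0.0  # no relevant online support for the claim
    picks = mmr_rank(query_vec, doc_vecs, k, lam)
    return sum(score for _, score in picks) / len(picks)


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real systems would use trained word vectors.
    vectors = {"sales": [1.0, 0.0], "ranking": [0.9, 0.1], "top": [0.8, 0.2],
               "weather": [0.0, 1.0], "rain": [0.1, 0.9]}
    query = ["sales", "ranking"]
    docs = [["sales", "ranking", "top"], ["sales", "top"], ["weather", "rain"]]
    print(kl_filter(query, docs, 0.15))  # the off-topic weather document is dropped
    print(confidence_score(query, docs, vectors))
```

Because the MMR term subtracts similarity to documents already selected, near-duplicate reports (for example, reposts of one press release) add less to the score in this sketch than independent supporting sources would.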