“…For instance, clustering quality in [12] is evaluated using the purity metric. In [17], the accuracy of the solutions retrieved from a case-base in response to a query is estimated by the average word similarity score and the mean average precision (MAP). The performance of other applications, such as text categorization [16,15], sentiment analysis [12], and named entity recognition [14], is estimated using the standard measures of precision, recall, and F1-score, the last being the harmonic mean of precision and recall, which balances the two [47].…”
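
As an illustration of the measures named above, the following is a minimal sketch of how they are commonly computed. It is not taken from any of the cited works. The function names and input conventions are assumptions made here for illustration: labels are given as parallel lists, and MAP takes one list of 0/1 relevance flags per query, in rank order. Average precision is normalized by the number of relevant items actually retrieved, which is one of several conventions in use.

```python
from collections import Counter


def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall, and F1 for the class `positive` (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def purity(cluster_ids, class_labels):
    """Fraction of items that belong to the majority class of their cluster."""
    if not class_labels:
        return 0.0
    clusters = {}
    for cluster, label in zip(cluster_ids, class_labels):
        clusters.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in clusters.values())
    return majority_total / len(class_labels)


def mean_average_precision(ranked_relevance):
    """MAP over queries; each query is a rank-ordered list of 0/1 relevance flags.

    Assumption: average precision is normalized by the number of relevant
    items that appear in the ranking, not by all relevant items in the collection.
    """
    if not ranked_relevance:
        return 0.0
    ap_values = []
    for flags in ranked_relevance:
        hits = 0
        precisions = []
        for rank, relevant in enumerate(flags, start=1):
            if relevant:
                hits += 1
                precisions.append(hits / rank)  # precision at this rank
        ap_values.append(sum(precisions) / len(precisions) if precisions else 0.0)
    return sum(ap_values) / len(ap_values)


if __name__ == "__main__":
    print(precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
    print(purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "a"]))
    print(mean_average_precision([[1, 0, 1], [0, 1]]))
```

For multi-class tasks such as text categorization or named entity recognition, these binary scores are typically computed per class and then micro- or macro-averaged; the excerpt does not say which averaging the cited works use.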