“…We re-use features that are commonly used for mention-pair classification (see, e.g., [23], [4]), including grammatical type and subtypes, string and substring matches, apposition and copula, distance (number of intervening mentions, sentences and words), gender and number agreement, synonymy/hypernymy and animacy (based on WordNet), family name (based on closed lists), named-entity types, syntactic features and anaphoricity detection.

Evaluation metrics

The systems' outputs are evaluated using the three standard coreference resolution metrics: MUC [29], B³ [2], and entity-based CEAF (CEAFₑ) [20]. Following the convention used in CoNLL-2012, we report a global F1 score (henceforth, the CoNLL score), which is the unweighted average of the MUC, B³ and CEAFₑ F1 scores.…”
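To make the CoNLL score concrete, here is a minimal sketch of the three metrics and their unweighted average. It assumes that each entity is a set of disjoint mention ids and that a clustering is a list of such entities. It also adopts one common convention for mentions missing from the other side: they count as singletons when MUC partitions an entity, and they contribute nothing to the B³ overlap sums. For CEAFₑ it uses the φ4 entity similarity. The optimal one-to-one alignment is found by exhaustive dynamic programming over bitmasks, which is exact but exponential, so it is only suitable for toy examples. The function names are illustrative. This is not the scorer used in the paper.

```python
from functools import lru_cache
from typing import FrozenSet, List, Tuple

Entity = FrozenSet[str]          # assumption: an entity is a set of mention ids
Clustering = List[Entity]        # assumption: entities are disjoint
PRF = Tuple[float, float, float]  # (precision, recall, F1)


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def _muc_recall(key: Clustering, response: Clustering) -> float:
    """Fraction of key links kept when each key entity is split by the response."""
    num = den = 0
    for k in key:
        covered, parts = set(), 0
        for r in response:
            overlap = k & r
            if overlap:
                parts += 1
                covered |= overlap
        parts += len(k - covered)  # unmatched mentions count as singleton parts
        num += len(k) - parts
        den += len(k) - 1
    return num / den if den > 0 else 0.0


def muc(key: Clustering, response: Clustering) -> PRF:
    r, p = _muc_recall(key, response), _muc_recall(response, key)
    return p, r, f1(p, r)


def _b3_recall(key: Clustering, response: Clustering) -> float:
    """Mention-averaged recall: sum over (K, R) of |K & R|^2 / |K|, over total key mentions."""
    num = sum(len(k & r) ** 2 / len(k) for k in key for r in response)
    den = sum(len(k) for k in key)
    return num / den if den > 0 else 0.0


def b_cubed(key: Clustering, response: Clustering) -> PRF:
    r, p = _b3_recall(key, response), _b3_recall(response, key)
    return p, r, f1(p, r)


def phi4(k: Entity, r: Entity) -> float:
    """Entity similarity used by CEAF_e: 2|K & R| / (|K| + |R|)."""
    return 2 * len(k & r) / (len(k) + len(r))


def _best_alignment(key: Clustering, response: Clustering) -> float:
    """Max total phi4 over one-to-one alignments (exhaustive DP; toy sizes only)."""
    @lru_cache(maxsize=None)
    def best(i: int, used: int) -> float:
        if i == len(key):
            return 0.0
        score = best(i + 1, used)  # leave key entity i unaligned
        for j, r in enumerate(response):
            if not used & (1 << j):
                score = max(score, phi4(key[i], r) + best(i + 1, used | (1 << j)))
        return score
    return best(0, 0)


def ceaf_e(key: Clustering, response: Clustering) -> PRF:
    total = _best_alignment(key, response)
    r = total / len(key) if key else 0.0        # phi4(K, K) = 1 for every key entity
    p = total / len(response) if response else 0.0
    return p, r, f1(p, r)


def conll_score(key: Clustering, response: Clustering) -> float:
    """Unweighted average of the MUC, B^3 and CEAF_e F1 scores."""
    return (muc(key, response)[2] + b_cubed(key, response)[2]
            + ceaf_e(key, response)[2]) / 3


if __name__ == "__main__":
    key = [frozenset("abc"), frozenset("de")]
    response = [frozenset("ab"), frozenset("cde")]
    print(muc(key, response), b_cubed(key, response), ceaf_e(key, response))
    print(conll_score(key, response))
```

On this toy pair, the MUC F1 is 2/3, the B³ F1 is 11/15 and the CEAFₑ F1 is 0.8, so the CoNLL score is 11/15. A realistic scorer would replace the exponential alignment search with a polynomial-time assignment solver, such as the Hungarian algorithm.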