Same as existing efforts (Yang et al., 2015, 2019), we use precision (P), recall (R), and F-score (F) as metrics when evaluating the performance of dropped pronoun recovery models.

Baselines. We compare DiscProReco against existing baselines, including: (1) MEPR (Yang et al., 2015), which leverages a maximum entropy classifier to predict the type of dropped pronoun before each token; (2) NRM, which employs two MLPs to predict the position and the type of a dropped pronoun separately; (3) Bi-GRU, which utilizes a bidirectional GRU to encode each token in a pro-drop sentence and then makes predictions; (4) NDPR (Yang et al., 2019), which models the referents of dropped pronouns over a large context with a structured attention mechanism; (5) Transformer-GCRF (Yang et al., 2020), which jointly recovers the dropped pronouns in a conversational snippet with general conditional random fields; (6) XLM-RoBERTa-NDPR, which utilizes the pre-trained multilingual masked language model (Conneau et al., 2020) to encode the pro-drop utterance and its context, and then employs the attention mechanism of NDPR to model the referent semantics.
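For concreteness, the metrics follow their standard definitions; here we sketch them under the usual assumption in this line of work that a recovered pronoun counts as correct only when both its position and its type match the gold annotation:

```latex
\begin{align}
P &= \frac{\#\,\text{correctly recovered pronouns}}{\#\,\text{pronouns recovered by the model}},\\
R &= \frac{\#\,\text{correctly recovered pronouns}}{\#\,\text{gold dropped pronouns}},\\
F &= \frac{2PR}{P + R}.
\end{align}
```

F is thus the harmonic mean of precision and recall, penalizing models that trade one heavily for the other.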