“…The corpora of clinical records used in previous studies ranged from admission notes (Gundersen et al, 1996) to radiology or pathology reports (Aronson et al, 2007; Crammer et al, 2007; Farkas and Szarvas, 2008; Goldstein et al, 2007; Matykiewicz et al, 2006; Oleynik et al, 2017; Rizzo et al, 2015; Suominen et al, 2008; Zhang, 2008), discharge summaries (Delamarre et al, 1995; Dinwoodie and Howell, 1973; Franz et al, 2000; Friedman et al, 2004; Kevers and Medori, 2010; Kukafka et al, 2006; Larkey and Croft, 1995; Li et al, 2011; Lussier et al, 2000,0; Medori and Fairon, 2010), death certificates (Koopman et al, 2015,1) and entire medical records (Kavuluru et al, 2015; Lita et al, 2008; Morris et al, 2000; Pakhomov et al, 2006; Ruch et al, 2008), with variable structure and level of curation. Moreover, the majority of studies have been based on English texts, with the exception of particular studies in French (Kevers and Medori, 2010; Medori and Fairon, 2010; Pereira et al, 2006; Ruch et al, 2008), Spanish (Pérez et al, 2015), Italian (Chiaravalloti et al, 2014; Rizzo et al, 2015) or German (Franz et al, 2000), while information extraction from Portuguese medical texts is still emerging (Ferreira, 2011; Rijo et al, 2014). The scope of clinical conditions covered in each study also varied greatly, ranging from limited sets of respiratory…”