“…In flat classification (e.g., FIGER (Ling and Weld, 2012), Attentive Encoder (Shimaoka et al, 2016;Shimaoka et al, 2017)), the task is formalized as a flat multiclass multilabel problem. In local classification (Gillick et al, 2014;Yosef et al, 2012;Yogatama et al, 2015), a separate local classifier is learned for each node of the hierarchy. In both approaches, some form of postprocessing is necessary to make the decisions consistent, e.g., an entity can only be a celebrity if they are also a person.…”