“…Hierarchical information is important in many other applications such as food recognition [55], [25], protein function prediction [6], [7], [56], [57], [58], [59] , image annotation [60], text classification [61], [62], [63]. Some major approaches include imposing logical constraints [4], using hyperbolic embeddings [64], prototype learning [14], label smearing and soft labels, loss modifications [3], multiple learning heads for different levels of the hierarchy [5], hierarchical post-processing [65] and others [66], [67], [68].…”