“…In some tasks, such as data labelling [373]-[375], [383], text classification [144], relation extraction [156], question answering [132], [179], and keyphrase generation [217], these models even achieved SOTA results. However, some recent research works exposed the brittleness of these models towards out-of-distribution inputs [456], [461], adversarial prompts [458]-[460], and adversarial inputs [425], [455], [457], [462]. For example, Liu et al. [461] reported that ChatGPT and GPT-4 perform well in multiple-choice question answering but struggle to answer out-of-distribution questions.…”