“…Selection of n-grams as fragments of NP chunks that can form part of multi-token concepts. For this task, we formed PoS patterns based on the Penn Treebank tagset. They were inherited from the patterns for multiword expression detection introduced in [4] and expanded here, resulting in the following set: P = {N N, J N, V N, N J, J J, V J, N of N, N of DT N, N of J, N of DT J, N of V, N of DT V, CD N, CD J}, where N stands for "noun", i.e., NN|NNS|NNP|NNPS; J stands for "adjective", i.e., JJ|JJR|JJS; V stands for "verb", limited to VBD|VBG|VBN; CD stands for "cardinal number"; DT stands for "determiner"; and "of" is matched as the exact preposition. Each pattern matches an n-gram with two open-class lexical items and at most two auxiliary tokens between them.…”
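
The pattern set P can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `PATTERNS`, `slot_matches`, and `extract_ngrams` are hypothetical, the input is assumed to be already tokenized and tagged with Penn Treebank tags, and the verb class is assumed to be VBD|VBG|VBN (VBN being the standard Penn Treebank past-participle tag).

```python
from typing import List, Tuple

# Tag classes as defined in the text (the verb class assumes VBN).
NOUN = {"NN", "NNS", "NNP", "NNPS"}
ADJ = {"JJ", "JJR", "JJS"}
VERB = {"VBD", "VBG", "VBN"}

# The pattern set P, one slot per token.
PATTERNS = [p.split() for p in (
    "N N", "J N", "V N", "N J", "J J", "V J",
    "N of N", "N of DT N", "N of J", "N of DT J",
    "N of V", "N of DT V", "CD N", "CD J",
)]


def slot_matches(slot: str, token: str, tag: str) -> bool:
    """Check one pattern slot against one (token, tag) pair."""
    if slot == "N":
        return tag in NOUN
    if slot == "J":
        return tag in ADJ
    if slot == "V":
        return tag in VERB
    if slot == "of":
        # "of" is matched on the surface form, not on its tag.
        return token.lower() == "of"
    # Remaining slots (CD, DT) are literal Penn Treebank tags.
    return tag == slot


def extract_ngrams(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (pattern, n-gram) pairs for every pattern match in a tagged sentence."""
    hits = []
    for start in range(len(tagged)):
        for pattern in PATTERNS:
            window = tagged[start:start + len(pattern)]
            if len(window) != len(pattern):
                continue  # pattern would run past the end of the sentence
            if all(slot_matches(s, tok, tag) for s, (tok, tag) in zip(pattern, window)):
                hits.append((" ".join(pattern), " ".join(tok for tok, _ in window)))
    return hits


if __name__ == "__main__":
    sentence = [("number", "NN"), ("of", "IN"), ("the", "DT"),
                ("extracted", "VBN"), ("concepts", "NNS")]
    for pattern, ngram in extract_ngrams(sentence):
        print(pattern, "->", ngram)
```

Each pattern is checked at every start position, so overlapping matches (here an "N of DT V" match and a "V N" match sharing the token "extracted") are all returned; whether overlaps should be kept or filtered is left open in this sketch.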