“…systematic collections of artificially constructed and manually annotated reference data) have long been acknowledged as suitable for fine-grained diagnosis, progress evaluation, and benchmarking (see e.g. Flickinger, Nerbonne, Sag & Wasow, 1987; Nerbonne, Netter, Diagne, Dickmann & Klein, 1993; and Sparck Jones & Galliers, 1995). Most of the available data sets, however, follow the traditional design as flat text files listing test sentences annotated with, if at all, grammaticality judgements plus, in some cases, informal section headings grouping sets of sentences according to linguistic phenomena (or sometimes application-specific criteria).…”