In 2014, leading publishers withdrew more than 120 nonsensical publications automatically generated with the SCIgen program. Casual observations suggested that similar problematic papers are still published and sold, without follow-up retractions. No systematic screening has been performed, and the prevalence of such nonsensical publications in the scientific literature is unknown. Our contribution is twofold. First, we designed a detector that combs the scientific literature for grammar-based computer-generated papers. Applied to SCIgen, it reaches 83.6% precision. Second, we performed a scientometric study of the 243 detected SCIgen-papers from 19 publishers. We estimate the prevalence of SCIgen-papers to be 75 per million papers in Information and Computing Sciences. Only 19% of the 243 problematic papers were dealt with, through formal retraction (12) or silent removal (34). Publishers still serve and sometimes sell the remaining 197 papers without any caveat. We found evidence of citation manipulation via edited SCIgen bibliographies. This work reveals metric gaming up to the point of absurdity: fraudsters publish nonsensical algorithmically generated papers featuring genuine references. It stresses the need to screen papers for nonsense before peer review and to chase citation manipulation in published papers. Overall, this is yet another illustration of the harmful effects of the pressure to publish or perish.
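The handling figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: the function name `handling_summary` and its dictionary keys are hypothetical, and the only inputs are the three counts stated in the abstract (243 detected papers, 12 formal retractions, 34 silent removals).

```python
def handling_summary(detected, retracted, silently_removed):
    """Summarize how many detected papers publishers acted on.

    Hypothetical helper for illustration; the inputs are the counts
    quoted in the abstract, not additional data.
    """
    handled = retracted + silently_removed
    return {
        "handled": handled,
        "still_available": detected - handled,
        # Share of detected papers dealt with, rounded to a whole percent.
        "handled_share_pct": round(100 * handled / detected),
    }


summary = handling_summary(detected=243, retracted=12, silently_removed=34)
print(summary)
# {'handled': 46, 'still_available': 197, 'handled_share_pct': 19}
```

The 46 handled papers are about 19% of the 243 detected, which leaves the 197 papers that are still served.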
INTRODUCTION

Science is a cumulative process: new discoveries and developments build on the body of literature. The quality and credibility of future scientific results depend on the soundness of past published research. This soundness also influences the trust people place in science.

And yet, despite having passed peer review, nonsensical published papers get retracted regularly. More than 120 nonsensical papers in the field of engineering were retracted from major publishers such as IEEE and Springer (Van Noorden, 2014b). These papers passed peer review, were included in conference proceedings, and were distributed for a fee on the publishers' platforms. Any reader with cursory knowledge of engineering instantly notices the nonsensical nature of these papers: they were generated by SCIgen,¹ a program designed by three MIT PhD students in 2005 to "maximize amusement rather than coherence" (Ball, 2005). It takes authors' names as input and generates meaningless sentences full of technical jargon, diagrams with random data, and nonexistent references with random titles and venues. It