Background: Fraud is a pervasive problem and can take the form of fabrication, falsification, plagiarism, or theft. The scientific community is not exempt from this universal problem, and several studies have recently been found to contain manipulated or fabricated data. Current measures to prevent and deter scientific misconduct come in the form of the peer-review process and on-site clinical trial auditors. As recent advances in high-throughput omics technologies have moved biology into the realm of big data, fraud detection methods must be updated to address sophisticated computational fraud. In the financial sector, machine learning and digit-preference analysis are successfully used to detect fraud.

Results: Drawing from these sources, we develop methods of fabrication detection in biomedical research and show that machine learning can be used to detect fraud in large-scale omic experiments. Using the raw data as input, the best machine learning models correctly predicted fraud with 84-95% accuracy. With digit frequencies as input features, the best models detected fraud with 98-100% accuracy. All of the data and analysis scripts used in this project are available at https://github.com/MSBradshaw/FakeData.

Conclusions: Using digit frequencies as a generalized representation of the data, multiple machine learning methods were able to identify fabricated data with near-perfect accuracy.
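To make the idea of digit frequencies as a generalized representation concrete, the following is a minimal sketch of how a list of measurements might be reduced to a ten-element feature vector. It is an illustration under stated assumptions, not the project's actual code: the function name `digit_frequencies` and its `position` parameter are hypothetical, and the abstract does not specify which digit positions are used.

```python
from collections import Counter


def digit_frequencies(values, position=0):
    """Relative frequency of the digits 0-9 at one significant-digit position.

    Hypothetical helper for illustration only. position=0 means the leading
    significant digit, position=1 the second significant digit, and so on.
    """
    counts = Counter()
    for v in values:
        if v == 0:
            continue  # zero has no leading significant digit
        # Scientific notation puts the leading significant digit first,
        # e.g. 0.0187 -> "1.8700000000e-02".
        mantissa = f"{abs(v):.10e}".split("e")[0]
        digits = mantissa.replace(".", "")
        if position < len(digits):
            counts[int(digits[position])] += 1
    total = sum(counts.values())
    # One feature per digit; all zeros if no usable values were supplied.
    return [counts[d] / total if total else 0.0 for d in range(10)]


# Made-up example values, not data from the study.
sample = [123.45, 0.0187, 1999.0, 24.0]
first_digit_features = digit_frequencies(sample, position=0)
second_digit_features = digit_frequencies(sample, position=1)
```

A vector like this could be computed per sample and passed to a classifier in place of the raw measurements. Because it has the same length regardless of how many values were measured, it is plausibly what lets the representation generalize across datasets.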