Machine learning-guided protein engineering is a rapidly advancing field. Despite major experimental and computational advances, however, collecting protein genotype (sequence) and phenotype (function) data remains time- and resource-intensive. As a result, the quality and quantity of training data are often limiting factors in developing machine learning models. Data augmentation techniques have been successfully applied in computer vision and natural language processing; however, such techniques are lacking for biological sequence data. To this end, we develop nucleotide augmentation (NTA), which leverages natural nucleotide codon degeneracy to augment protein sequence data in a biologically meaningful way. As a proof of concept for protein engineering, we apply NTA to train machine learning models on benchmark data sets of protein genotype and phenotype, revealing performance gains, with results on par with or surpassing benchmark models, even when only a fraction of the training data is used. NTA also enables substantial improvements for classification tasks under heavy class imbalance. Availability and implementation: The code to use NTA and to reproduce the analyses in this study is publicly available at https://github.com/minotm/NTA
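
To illustrate the idea of augmenting through codon degeneracy, the following is a minimal sketch and not the implementation from the linked repository. It assumes the standard genetic code and uniform random sampling of synonymous codons. The names `augment` and `translate` are hypothetical and chosen for illustration only. Each protein sequence is back-translated into several distinct nucleotide sequences that all encode the same protein, so each augmented sequence would presumably carry the same phenotype label as the original.

```python
import itertools
import random

# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: amino acid -> list of synonymous codons.
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def translate(nt_seq):
    """Translate a nucleotide sequence (length divisible by 3) to protein."""
    return "".join(CODON_TO_AA[nt_seq[i:i + 3]] for i in range(0, len(nt_seq), 3))


def augment(protein, n_aug, seed=0):
    """Return up to n_aug distinct nucleotide sequences encoding `protein`.

    Hypothetical helper: synonymous codons are sampled uniformly at random,
    which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    # The number of distinct encodings is the product of the codon counts.
    max_unique = 1
    for aa in protein:
        max_unique *= len(AA_TO_CODONS[aa])
    target = min(n_aug, max_unique)
    variants = set()
    while len(variants) < target:
        variants.add("".join(rng.choice(AA_TO_CODONS[aa]) for aa in protein))
    return sorted(variants)


if __name__ == "__main__":
    for nt in augment("MKTL", 5):
        print(nt, translate(nt))
```

Because the sketch rejects duplicates by repeated sampling, it suits cases where the requested number of variants is small relative to the number of possible encodings. Sequences made only of single-codon residues such as methionine and tryptophan yield just one encoding.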