Neuroprosthetics have demonstrated the potential to decode speech from intracranial brain signals, and hold promise for one day returning the ability to speak to those who have lost it. However, data in this domain is scarce, highly variable, and costly to label for supervised modeling. In order to address these constraints, we present brain2vec, a transformer-based approach for learning feature representations from intracranial electroencephalogram data. Brain2vec combines a self-supervised learning methodology, neuroanatomical positional embeddings, and the contextual representations of transformers to achieve three novelties: (1) learning from unlabeled intracranial brain signals, (2) learning from multiple participants simultaneously, all while (3) utilizing only raw unprocessed data. To assess our approach, we use a leave-one-participant-out validation procedure to separate brain2vec's feature learning from the holdout participant's speech-related supervised classification tasks. With only two linear layers, we achieve 90% accuracy on a canonical speech detection task, 42% accuracy on a more challenging 4-class speech-related behavior recognition, and 53% accuracy when applied to a 10-class, few-shot word classification task. Combined with visualizations of unsupervised class separation in the learned features, our results evidence brain2vec's ability to learn highly generalized representations of neural activity without the need for labels or consistent sensor location.