Hypergraphs capture multi-way relationships in data, and they have consequently seen a number of applications in higher-order network analysis, computer vision, geometry processing, and machine learning. In this paper, we develop theoretical foundations for studying the space of hypergraphs using ingredients from optimal transport. By enriching a hypergraph with probability measures on its nodes and hyperedges, as well as relational information capturing local and global structure, we obtain a general and robust framework for studying the collection of all hypergraphs. First, we introduce a hypergraph distance based on the co-optimal transport framework of Redko et al. and study its theoretical properties. Second, we formalize common methods for transforming a hypergraph into a graph as maps from the space of hypergraphs to the space of graphs, and we study their functorial properties and Lipschitz bounds. Finally, we demonstrate the versatility of our Hypergraph Co-Optimal Transport (HyperCOT) framework through various examples.

Recent years have seen the extension of the Gromov-Wasserstein (GW) framework, originally developed as a tool for comparing metric measure spaces [27, 28], to probabilistic graph matching tasks [36, 20, 46, 45, 9, 43, 10]. The numerous benefits of this approach include computability via gradient descent [31, 17] or backpropagation [44], state-of-the-art performance in tasks such as graph partitioning [45, 12], and an underlying theoretical Riemannian framework [38, 11]. These successes motivate the development of a GW framework for hypergraphs, which is the goal of this paper. Our contributions include: