The optimal use of MU-MIMO (Multi-User Multiple-Input Multiple-Output) and OFDMA (Orthogonal Frequency Division Multiple Access) for a given set of network resources has been investigated in a rich body of literature, with the aim of realizing the full benefits of both technologies. However, most of these studies either maximize the performance of only one of the two schemes, or consider both but only for single-hop networks, in which interference between nodes is relatively limited, which causes network performance to be overestimated. In addition, node heterogeneity has not been sufficiently considered; in particular, the joint use of OFDMA and MU-MIMO has been assumed to be always available at all nodes. In this paper, we propose a cross-layer optimization framework that considers both OFDMA and MU-MIMO for heterogeneous wireless networks. Our model not only assumes that nodes have different capabilities, in terms of bandwidth and number of antennas, but also supports practical use cases in which a node can support OFDMA, MU-MIMO, or both at the same time. The optimization model carefully accounts for the interactions between the key elements from the physical layer up to the network layer. In addition, we consider multi-hop networks and capture the complicated interference relationships between nodes, as well as multi-path routing via multi-user transmissions. We formulate the proposed model as a Mixed Integer Linear Programming (MILP) problem. We first model the case in which each node can selectively use either OFDMA or MU-MIMO, and then extend the model to scenarios in which the two are used jointly. As a case study, we apply the proposed model to sum-rate maximization and max–min fair allocation, and verify through numerical evaluations in MATLAB that it takes appropriate advantage of each technology for a given set of network resources.
The optimization results also show that when the two technologies are used jointly, flexible resource allocation enables more multi-user transmissions, so that the link capacity is used more fully.