Base station cooperation in heterogeneous wireless networks (HetNets) is a promising approach to improving network performance, but it places a significant burden on the backhaul. Caching at small base stations (SBSs), on the other hand, is considered an efficient way to reduce the backhaul load in HetNets. In this paper, we jointly consider SBS caching and cooperation in a downlink large-scale HetNet. We propose two SBS cooperative transmission schemes under random caching at SBSs, with the caching distribution as a design parameter. Using tools from stochastic geometry and adopting appropriate integral transformations, we first derive a tractable expression for the successful transmission probability under each scheme. Then, under each scheme, we maximize the successful transmission probability by optimizing the caching distribution, which is a challenging optimization problem with a non-convex objective function. By exploring optimality properties and using optimization techniques, we obtain, under each scheme, a locally optimal solution in the general case and globally optimal solutions in some special cases. Compared with existing caching designs in the literature, such as most popular caching, i.i.d. caching, and uniform caching, the optimal random caching under each scheme achieves a higher successful transmission probability. The analysis and optimization results provide valuable design insights for practical HetNets.
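
To make the baseline caching designs named above concrete, the following minimal sketch expresses each one as a vector of per-file caching probabilities at a single SBS. It is an illustration under stated assumptions, not the paper's formulation: the Zipf popularity model, the parameter names (`num_files`, `cache_size`, `gamma`), and the reading of i.i.d. caching as `cache_size` independent draws from the popularity distribution are all assumptions introduced here.

```python
import numpy as np


def zipf_popularity(num_files, gamma):
    """Assumed Zipf popularity: file n is requested with probability ~ n^(-gamma)."""
    ranks = np.arange(1, num_files + 1, dtype=float)
    weights = ranks ** (-gamma)
    return weights / weights.sum()


def most_popular_caching(popularity, cache_size):
    """Every SBS stores the cache_size most popular files with probability 1."""
    probs = np.zeros_like(popularity)
    top = np.argsort(-popularity, kind="stable")[:cache_size]
    probs[top] = 1.0
    return probs


def uniform_caching(popularity, cache_size):
    """Every file is stored with the same probability cache_size / num_files."""
    return np.full_like(popularity, cache_size / popularity.size)


def iid_caching(popularity, cache_size):
    """Assumed model: each cache slot draws a file i.i.d. from the popularity
    distribution, so file n is present with probability 1 - (1 - a_n)^cache_size."""
    return 1.0 - (1.0 - popularity) ** cache_size


if __name__ == "__main__":
    a = zipf_popularity(num_files=10, gamma=0.8)  # hypothetical parameters
    for name, design in [
        ("most popular", most_popular_caching),
        ("uniform", uniform_caching),
        ("i.i.d.", iid_caching),
    ]:
        print(name, np.round(design(a, cache_size=3), 3))
```

Under these assumptions, most popular caching and uniform caching use the full cache (the probabilities sum to `cache_size`), whereas i.i.d. caching can waste slots on duplicate files, so its probabilities sum to at most `cache_size`. An optimized random caching design would choose a probability vector of this same shape to maximize the successful transmission probability.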