Ultra-densification, millimeter wave (mmW) communications, and proactive network-edge caching, combined in mmW fog networks (mmFNs), are expected to provide tangible gains in broadband access, network capacity, and latency. However, implementing caching in an mmFN incurs high capital expenditure (CAPEX) because of the ultra-high density of base stations (BSs). For a given caching CAPEX, it may be more efficient to install higher-capacity caches in a fraction of the BSs than to install smaller-capacity caches in every BS. In the former case, the wireless self-backhauling of mmW systems can be exploited to share the contents stored in cache-enabled BSs (CE-BSs) with the other BSs in the network. In this regard, this paper develops a mathematical model, based on stochastic geometry, to study how the tradeoff between the cache size and the intensity of CE-BSs affects the probability that requested popular contents are retrieved from the network edge, denoted as the hit probability. Assuming a power-law inverse relationship between the cache size and the intensity of CE-BSs, an optimization problem is formulated and solved for the intensity of CE-BSs and the probabilistic file placement in caches such that the hit probability is maximized. The results show that neither installing small caches in every BS nor installing caches large enough to hold all popular files in a small number of BSs exploits the full potential of the mmFN. Instead, there exists an optimal balance between the cache size and the intensity of CE-BSs, which depends on network parameters such as the applied caching strategy, the required rate, the total intensity of BSs, the popular content distribution, and the cache size/intensity relationship.
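The cache size versus CE-BS intensity tradeoff can be illustrated with a deliberately simplified numerical sketch. It is not the model developed in this paper: the Zipf popularity law, the most-popular-first placement, the fixed one-hop backhaul range (which stands in for the required-rate constraint), and every function and parameter name below are assumptions made only for illustration. The one standard fact it uses is that, for a Poisson point process of intensity λ, at least one point lies within distance R with probability 1 - exp(-λπR²).

```python
import math


def zipf_popularity(n_files, exponent):
    """Request probabilities p_i proportional to i**(-exponent) (assumed Zipf law)."""
    weights = [rank ** -exponent for rank in range(1, n_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def cache_size(fraction, size_at_full_density, alpha, n_files):
    """Assumed power-law inverse relation: fewer CE-BSs allow larger caches.

    `fraction` is the share of BSs that are cache enabled (0 < fraction <= 1).
    The result is capped at the catalogue size n_files.
    """
    size = size_at_full_density * fraction ** -alpha
    return min(n_files, int(size))


def reach_probability(fraction, bs_intensity, backhaul_range):
    """Probability that at least one CE-BS lies within backhaul_range.

    Uses the void probability of a Poisson point process; the fixed range is a
    crude stand-in for the rate requirement on the self-backhaul link.
    """
    ce_intensity = fraction * bs_intensity
    return 1.0 - math.exp(-ce_intensity * math.pi * backhaul_range ** 2)


def hit_probability(fraction, popularity, size_at_full_density, alpha,
                    bs_intensity, backhaul_range):
    """Toy hit probability: a CE-BS is reachable AND it stores the requested file."""
    slots = cache_size(fraction, size_at_full_density, alpha, len(popularity))
    cached_mass = sum(popularity[:slots])  # most-popular-first placement (assumed)
    return reach_probability(fraction, bs_intensity, backhaul_range) * cached_mass


def best_fraction(grid, popularity, size_at_full_density, alpha,
                  bs_intensity, backhaul_range):
    """Grid search for the CE-BS fraction that maximizes the toy hit probability."""
    return max(grid, key=lambda f: hit_probability(
        f, popularity, size_at_full_density, alpha, bs_intensity, backhaul_range))


if __name__ == "__main__":
    popularity = zipf_popularity(n_files=100, exponent=0.8)
    grid = [i / 100 for i in range(1, 101)]
    # Hypothetical parameters: 1e-4 BSs per square metre, 100 m backhaul range.
    f_star = best_fraction(grid, popularity, 10, 1.0, 1e-4, 100.0)
    print(f_star, hit_probability(f_star, popularity, 10, 1.0, 1e-4, 100.0))
```

Sweeping `fraction` over a grid shows the two competing effects in this toy setting: a larger fraction makes a CE-BS easier to reach, while a smaller fraction lets each cache hold more of the popular catalogue. Where the maximum falls depends entirely on the assumed parameters.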