To align with climate initiatives, multiple reporting programs are transitioning from generic activity-based emission factors to site-specific measured emissions data to estimate greenhouse gas emissions at oil and gas facilities. Building upon previous research in the field, this study contemporaneously deployed two top-down (TD) aerial methods across 14 midstream facilities. The methods produced multiple whole-facility estimates at each facility, resulting in 773 individual paired estimates (same facility, same day) and robust mean estimates for each facility. Aggregated across all facilities, the per-facility mean estimates differed by nearly 2:1 (49% [32% to 69%]). At 6 of 14 facilities, the methods produced mean estimates that differed by more than a factor of two. These data suggest that one or both methods did not produce accurate facility-level estimates at a majority of facilities and in aggregate across all facilities. The overall results are augmented with two case studies in which TD estimates at two pre-selected facilities were coupled with comprehensive onsite measurements to understand the factors driving the divergence between TD and bottom-up (BU) emissions estimates. In 3 of 4 paired comparisons between the intensive onsite estimates and one of the TD methods, the intensive onsite surveys did not conclusively diagnose the difference in estimates. In these cases, our work suggests that the TD methods mis-estimate emissions an unknown fraction of the time, for unknown reasons. Although only two methods were selected for this study, the issues identified here are unlikely to be confined to these two methods; comparable issues may exist for other whole-facility methods at midstream and/or other facility types. 
These findings have important implications for the construction of voluntary and regulatory reporting programs that rely on emissions estimates for reporting fees or penalties, and for studies that use whole-facility estimates to aggregate TD emissions to basin or regional estimates.