This paper makes two new observations that lead to a new heterogeneous core design. First, we observe that most serial code exhibits fine-grained heterogeneity: at the scale of tens or hundreds of instructions, regions of code fit different microarchitectures better (at the same point or at different points in time). Second, we observe that by grouping contiguous regions of instructions into blocks that are executed atomically, a core can exploit this heterogeneity: atomicity allows each block to be executed independently on its own execution backend that fits its characteristics best. Based on these observations, we propose a fine-grained heterogeneous design that combines heterogeneous execution backends into one core. Our core design, the heterogeneous block architecture (HBA), breaks the program into blocks of code, determines the best backend for each block, and specializes the block for that backend. As an initial, concrete design, we combine out-of-order, VLIW, and in-order backends, using simple heuristics to choose backends. We compare HBA to multiple baseline core designs (including monolithic out-of-order, clustered out-of-order, in-order, and a state-of-the-art heterogeneous core design) and show that HBA can provide significantly better energy efficiency than all designs at similar performance. Averaged across 184 traces from a wide variety of workloads, HBA reduces core power by 36.4% and energy per instruction by 31.9% compared to a 4-wide out-of-order core. We conclude that HBA provides a flexible substrate for exploiting fine-grained heterogeneity, enabling new energy-performance tradeoff points in core design.
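To make the per-block backend choice concrete, the following is a minimal sketch, not the selection logic evaluated in this paper. The abstract states only that simple heuristics choose among out-of-order, VLIW, and in-order backends; the `BlockProfile` fields, the `ILP_THRESHOLD` value, and the decision order below are all hypothetical, introduced purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    """The three backend types named in the abstract."""
    OUT_OF_ORDER = "out-of-order"
    VLIW = "vliw"
    IN_ORDER = "in-order"


@dataclass(frozen=True)
class BlockProfile:
    """Hypothetical per-block statistics (assumed, not from the paper)."""
    num_instructions: int
    ilp_estimate: float       # assumed: estimated independent instructions per cycle
    schedule_is_stable: bool  # assumed: a previously recorded schedule still fits


# Hypothetical cutoff below which a block is treated as mostly serial.
ILP_THRESHOLD = 1.5


def choose_backend(profile: BlockProfile) -> Backend:
    """Pick a backend for one atomic block using illustrative heuristics."""
    if profile.ilp_estimate < ILP_THRESHOLD:
        # Little parallelism to extract: the simplest backend suffices.
        return Backend.IN_ORDER
    if profile.schedule_is_stable:
        # Parallelism exists and a fixed schedule fits: reuse it statically.
        return Backend.VLIW
    # Parallelism exists but the schedule varies: schedule dynamically.
    return Backend.OUT_OF_ORDER


def dispatch(profiles: list[BlockProfile]) -> list[Backend]:
    """Choose a backend independently for each block, in program order."""
    return [choose_backend(p) for p in profiles]
```

Because each block is treated as atomic, `dispatch` can decide per block without reference to neighboring blocks, which is the property the second observation relies on. A real design would also need to specialize each block for its chosen backend, which this sketch omits.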