Virtualizing Hadoop clusters provides many benefits, including rapid deployment, on-demand elasticity, and secure multi-tenancy. However, a simple migration of Hadoop to a virtualized environment does not fully exploit these benefits. The dual role of a Hadoop worker, acting as both a compute node and a data node, makes it difficult to achieve efficient IO processing, maintain data locality, and exploit resource elasticity in the cloud. We find that decoupling per-node storage from its computation opens up opportunities for IO acceleration, locality improvement, and on-the-fly cluster resizing. To fully exploit these opportunities, we propose StoreApp, a shared storage appliance for virtual Hadoop worker nodes co-located on the same physical host. To completely separate storage from computation and prioritize IO processing, StoreApp proactively pushes intermediate data generated by map tasks to the storage node. StoreApp also implements late-binding task creation to take advantage of data prefetched because of misaligned records. Experimental results show that StoreApp achieves up to 61% performance improvement compared to stock Hadoop and resizes the cluster to the (near) optimal degree of parallelism.