Asynchronous task-based programming models are gaining popularity as a way to address the programmability and performance challenges of contemporary large-scale high-performance computing systems. In this paper we present AceMesh, a task-based, data-driven language extension targeting legacy MPI applications. Its language features include data-centric parallelizing templates and aggregated task dependences for parallel loops. These features not only relieve the programmer of tedious refactoring details but also enable structured execution of complex task graphs, exploitation of data locality through data tile templates, and a reduction in the system complexity incurred by complex array sections. We present the prototype implementation, including task shifting, data management, and communication-related analyses and transformations. The language extension is evaluated on two supercomputing platforms. We compare the performance of AceMesh with that of existing programming models. The results show that NPB/MG achieves speedups of up to 1.2X and 1.85X on TaihuLight and TH-2, respectively, and that the Tend_lin benchmark attains more than 2X speedup on average, with maximum speedups of 3.0X and 2.2X on the two platforms, respectively.