The Fork-Join task graph is one of the basic modeling structures for parallel processing. However, many previous scheduling algorithms fail to economize on processors while minimizing the total completion time. Moreover, many algorithms do not account for the communication contention that arises in bus-based clusters or for the heterogeneity of processors in real applications. This paper presents a new scheduling algorithm for Fork-Join task graphs that considers processor economy, minimization of the total completion time, non-parallel communication, and a heterogeneous environment. The proposed algorithm is based on task duplication. To evaluate it, we randomly generate a number of Fork-Join task graphs by randomly producing the task execution times and communication times. Simulation results show that the proposed algorithm achieves a shorter total completion time and uses fewer processors than the compared algorithms, which makes it more suitable for practical applications.
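To make the setting concrete, the following minimal sketch generates a random Fork-Join task graph and shows why duplicating the fork task can shorten the total completion time. It is an illustration only and not the proposed algorithm: it assumes homogeneous processors, contention-free (parallel) communication, and one middle task per processor, all of which are simplifications relative to the environment considered in this paper. All names (`ForkJoinGraph`, `random_fork_join`, `serial_makespan`, `spread_makespan`) and the value ranges are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ForkJoinGraph:
    fork_time: float         # execution time of the fork (entry) task
    join_time: float         # execution time of the join (exit) task
    exec_times: List[float]  # execution time of each middle task
    comm_in: List[float]     # communication time: fork -> middle task i
    comm_out: List[float]    # communication time: middle task i -> join


def random_fork_join(n_tasks: int, max_exec: int = 20, max_comm: int = 20,
                     seed: Optional[int] = None) -> ForkJoinGraph:
    """Draw execution and communication times uniformly at random (assumed ranges)."""
    rng = random.Random(seed)
    return ForkJoinGraph(
        fork_time=rng.randint(1, max_exec),
        join_time=rng.randint(1, max_exec),
        exec_times=[rng.randint(1, max_exec) for _ in range(n_tasks)],
        comm_in=[rng.randint(1, max_comm) for _ in range(n_tasks)],
        comm_out=[rng.randint(1, max_comm) for _ in range(n_tasks)],
    )


def serial_makespan(g: ForkJoinGraph) -> float:
    """All tasks on one processor: no communication, no parallelism."""
    return g.fork_time + sum(g.exec_times) + g.join_time


def spread_makespan(g: ForkJoinGraph, duplicate_fork: bool) -> float:
    """One middle task per processor; the join runs on processor 0.

    Without duplication the fork runs only on processor 0, so every other
    middle task waits for its incoming message. With duplication each
    processor runs its own copy of the fork, so no incoming message is needed.
    """
    arrivals = []
    for i, exec_time in enumerate(g.exec_times):
        local_fork = duplicate_fork or i == 0
        start = g.fork_time + (0 if local_fork else g.comm_in[i])
        finish = start + exec_time
        # Results produced on processor 0 are already local to the join.
        arrivals.append(finish + (0 if i == 0 else g.comm_out[i]))
    return max(arrivals) + g.join_time


if __name__ == "__main__":
    g = random_fork_join(n_tasks=6, seed=42)
    print("serial:", serial_makespan(g))
    print("spread, no duplication:", spread_makespan(g, duplicate_fork=False))
    print("spread, fork duplicated:", spread_makespan(g, duplicate_fork=True))
```

Under these assumptions, duplicating the fork task never increases the completion time, but the one-task-per-processor placement uses as many processors as there are middle tasks. Reducing that processor count, and handling contention and heterogeneity, is outside the scope of this sketch.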