Scientific workflows benefit from the cloud computing paradigm, which offers access to virtual resources provisioned on pay-as-you-go and on-demand basis. Minimizing resources costs to meet user's budget is very important in a cloud environment. Several optimization approaches have been proposed to improve the performance and the cost of data-intensive scientific Workflow Scheduling (DiSWS) in cloud computing. However, in the literature, the majority of the DiSWS approaches focused on the use of heuristic and metaheuristic as an optimization method. Furthermore, the tasks hierarchy in data-intensive scientific workflows has not been extensively explored in the current literature. Specifically, in this paper, a data-intensive scientific workflow is represented as a hierarchy, which specifies hierarchical relations between workflow tasks, and an approach for data-intensive workflow scheduling applications is proposed. In this approach, first, the datasets and workflow tasks are modeled as a conditional probability matrix (CPM). Second, several data transformation and hierarchical clustering are applied to the CPM structure to determine the minimum number of virtual machines needed for the workflow execution. In this approach, the hierarchical clustering is done with respect to the budget imposed by the user. After data transformation and hierarchical clustering, the amount of data transmitted between clusters can be reduced, which can improve cost and makespan of the workflow by optimizing the use of virtual resources and network bandwidth. The performance and cost are analyzed using an extension of Cloudsim simulation tool and compared with existing multi-objective approaches. The results demonstrate that our approach reduces resources cost with respect to the user budgets.