Detecting malicious networks (botnets) is a growing concern because botnets pose a serious threat to network security. However, botnet detection methods often perform poorly on real-life data because they were not developed on real-life botnet datasets. A key reason is the scarcity of large-scale, real-life botnet datasets: owing to security and privacy concerns, organizations do not publish their own. To address this need, we develop a methodology that simulates a large-scale botnet dataset from a real-life botnet dataset. The methodology is based on Markov chains and role mining. In addition to the degree distribution, it reproduces triangles (community structures). We propose a novel, scalable algorithm that uses parallel computing to generate large-scale botnet graphs from a small input dataset. To evaluate the methodology, we compare our simulated graph with the original graph and with a graph simulated by the preferential attachment (PA) algorithm, using the distributions of triangles, indegrees, and outdegrees. The results show that the distributions of our simulated graph closely match those of the original graph, with minor random variations typical of real-life data. The results also show that our algorithm substantially outperforms the PA algorithm in simulating the distribution of triangles and the botnet subgraphs. To further assess the accuracy of the botnet simulation, we separately compare the botnet subgraphs of the simulated and original graphs and show that the simulated botnet subgraphs are similar to the original botnet subgraph.
Comparing our scaled-up simulated graph with the original graph shows that our methodology preserves both the triangle distribution and the botnet subgraphs of the original graph, whereas the PA algorithm preserves neither in its scaled-up graph.
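
As an illustration of the kind of comparison described above, the following minimal sketch computes indegree, outdegree, and per-node triangle distributions from a directed edge list and measures how far two such distributions are from each other. It is not the paper's implementation: the function names, the choice to count triangles on the undirected projection of the graph, and the L1 distance between normalized histograms are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def degree_distributions(edges):
    """Return (indegree histogram, outdegree histogram) for a directed edge list.

    Each histogram maps a degree value to the number of nodes with that degree.
    """
    indeg, outdeg = Counter(), Counter()
    nodes = set()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
        nodes.update((u, v))
    in_hist = Counter(indeg[n] for n in nodes)
    out_hist = Counter(outdeg[n] for n in nodes)
    return in_hist, out_hist


def triangle_counts(edges):
    """Count triangles per node, ignoring edge direction and self-loops
    (an assumption of this sketch)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    counts = {n: 0 for n in adj}
    for n, nbrs in adj.items():
        # A triangle through n exists for every pair of neighbours
        # that are themselves connected.
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                counts[n] += 1
    return counts


def triangle_distribution(edges):
    """Histogram: triangle count -> number of nodes with that count."""
    return Counter(triangle_counts(edges).values())


def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms (hypothetical metric)."""
    t1, t2 = sum(h1.values()), sum(h2.values())
    keys = set(h1) | set(h2)
    return sum(abs(h1[k] / t1 - h2[k] / t2) for k in keys)


if __name__ == "__main__":
    # Toy graph: one directed 3-cycle plus a pendant edge.
    original = [(1, 2), (2, 3), (3, 1), (3, 4)]
    in_hist, out_hist = degree_distributions(original)
    print("indegree histogram:", dict(in_hist))
    print("outdegree histogram:", dict(out_hist))
    print("triangle histogram:", dict(triangle_distribution(original)))
```

Under these assumptions, a simulated graph could be judged close to the original when `histogram_distance` is small for all three distributions; the same functions can be applied to a PA-generated edge list for a baseline comparison.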