In large-scale distributed computing clusters, such as Amazon EC2, several types of "system noise" can severely degrade performance: system failures, bottlenecks due to limited communication bandwidth, latency due to straggler nodes, etc. On the other hand, these systems enjoy an abundance of redundancy: a vast number of computing nodes and large storage capacity. Recent results have demonstrated that coding can efficiently exploit computation and storage redundancy to alleviate the effects of stragglers and communication bottlenecks in homogeneous clusters. In this paper, we focus on general heterogeneous distributed computing clusters consisting of a variety of computing machines with different capabilities. We propose a coding framework for speeding up distributed computing in heterogeneous clusters by trading redundancy for reduced computation latency. In particular, we propose the Heterogeneous Coded Matrix Multiplication (HCMM) algorithm for performing distributed matrix multiplication over heterogeneous clusters, which is provably asymptotically optimal for a broad class of processing time distributions. Moreover, we show that HCMM is unboundedly faster than uncoded schemes that partition the total workload among the workers. To demonstrate how HCMM can be applied in practice, we provide numerical results showing speedups of up to 90% and 35% for HCMM compared with the "uncoded" and "coded homogeneous" schemes, respectively. Furthermore, we carry out real experiments over Amazon EC2 clusters that corroborate our numerical studies, in which HCMM is up to 17% faster than the uncoded scheme. We also observe that machines rarely become stragglers, and when they do, they continue to exhibit slower performance for some time. In our worst-case experiments with artificial stragglers, HCMM provides speedups of up to 12× over the uncoded scheme. 
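The idea of trading redundancy for latency can be illustrated with a toy coded matrix-vector product. The sketch below is not HCMM itself: the Vandermonde code over the rationals, the proportional-to-speed load split (HCMM derives its own asymptotically optimal allocation), the deterministic finish-time model, and all function names (`encode`, `allocate`, `solve`, `coded_matvec`) are illustrative assumptions. The m rows of A are encoded into n > m coded rows, the coded rows are spread over heterogeneous workers, and the master decodes A x from whichever m coded results arrive first, so a straggler's rows need not be waited for.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Matrix = List[List[Fraction]]


def encode(A: Matrix, n: int) -> Matrix:
    """Vandermonde code (assumed): coded row j is sum_i (j+1)**i * A[i].

    Any m of the n coded rows form an invertible Vandermonde system
    (distinct nodes 1..n), so any m returned results suffice to decode.
    """
    m, d = len(A), len(A[0])
    return [[sum(Fraction(j + 1) ** i * A[i][c] for i in range(m))
             for c in range(d)] for j in range(n)]


def allocate(n: int, speeds: Sequence[float]) -> List[int]:
    """Hypothetical rule: split n coded rows proportionally to worker speed."""
    total = sum(speeds)
    loads = [int(n * s / total) for s in speeds]
    fastest_first = sorted(range(len(speeds)), key=lambda w: -speeds[w])
    for k in range(n - sum(loads)):  # give rounding leftovers to fast workers
        loads[fastest_first[k % len(speeds)]] += 1
    return loads


def solve(M: Matrix, b: List[Fraction]) -> List[Fraction]:
    """Exact Gauss-Jordan elimination over the rationals."""
    size = len(M)
    aug = [list(row) + [b[r]] for r, row in enumerate(M)]
    for col in range(size):
        piv = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [aug[r][size] / aug[r][r] for r in range(size)]


def coded_matvec(A, x, n: int, speeds: Sequence[float],
                 slowdown: Sequence[float]) -> Tuple[List[Fraction], float]:
    """Simulate coded A @ x on heterogeneous workers; return (result, time).

    Toy latency model (assumed): worker w returns all of its coded rows
    at time loads[w] / speeds[w] * slowdown[w].
    """
    m = len(A)
    coded = encode(A, n)
    events, start = [], 0
    for w, load in enumerate(allocate(n, speeds)):
        finish = load / speeds[w] * slowdown[w]
        events += [(finish, j) for j in range(start, start + load)]
        start += load
    events.sort()
    used = [j for _, j in events[:m]]  # the first m coded rows to arrive
    y = [sum(c * xi for c, xi in zip(coded[j], x)) for j in used]
    G = [[Fraction(j + 1) ** i for i in range(m)] for j in used]
    return solve(G, y), events[m - 1][0]  # solve G @ (A x) = y


if __name__ == "__main__":
    result, t = coded_matvec([[1, 2], [3, 4], [5, 6]], [1, 1], n=5,
                             speeds=[1.0, 2.0, 2.0], slowdown=[1.0, 1.0, 10.0])
    print([int(v) for v in result], t)  # [3, 7, 11] 1.0
```

In this example worker 2 is slowed down tenfold, yet the master recovers A x = [3, 7, 11] at t = 1.0 from the three coded rows held by workers 0 and 1; an uncoded split that leaves any original row on the slow worker would have to wait for it.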
We further generalize the problem of optimal load allocation for heterogeneous clusters to scenarios with budget constraints and develop a heuristic algorithm for efficient load allocation. Finally, we discuss the decoding complexity and describe how LDPC codes can be combined with HCMM to control the complexity of decoding as the problem size increases.