Abstract: A suitable mapping of processes to the nodes of a massively parallel machine can substantially improve communication performance by reducing network congestion. Several recent works have used the hop-byte metric as a measure of the quality of such a mapping. Optimizing this metric is NP-hard, so heuristics are applied. However, the heuristics proposed so far do not try to optimize this metric directly. Rather, they use intuitive methods for reducing congestion and use the metric only to evaluate the quality of the resulting mapping. Heuristics that target other metrics behave similarly: they do not optimize those metrics directly, but use them only to evaluate the results. In contrast, we pose the mapping problem with the hop-byte metric as a quadratic assignment problem and use a heuristic that optimizes this metric directly. We evaluate our approach on realistic node allocations obtained on the Kraken system at NICS. Our approach yields values for the metric that are up to 75% lower than those of the default mapping and 66% lower than those of existing heuristics. However, producing the mapping can take substantially longer, which makes this approach suitable for somewhat static, though possibly irregular, communication patterns. We also introduce new heuristics that reduce the mapping time to a level comparable to that of existing fast heuristics, while still producing mappings of higher quality than existing ones. In addition, we use theoretical lower bounds to suggest that our mapping may be close to optimal, at least for medium-sized problems. Consequently, our work can also provide insight into the tradeoff between mapping quality and the time taken by other mapping heuristics.
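
To make the objective concrete, the following is a minimal sketch of the hop-byte metric in its quadratic-assignment form: the sum, over process pairs, of the bytes exchanged multiplied by the number of network hops between the nodes the two processes are mapped to. The function names `hop_bytes` and `brute_force_best`, the 4-process communication matrix, and the 4-node line topology are all hypothetical and chosen only for illustration. The exhaustive search is a stand-in that works only at toy sizes; it is not the heuristic evaluated in this work.

```python
from itertools import permutations


def hop_bytes(comm, dist, mapping):
    """Hop-byte total: sum of comm[i][j] * dist[mapping[i]][mapping[j]].

    comm[i][j]  -- bytes sent from process i to process j (assumed input)
    dist[a][b]  -- network hops between node a and node b (assumed input)
    mapping[i]  -- node assigned to process i (a permutation of the nodes)
    The sum runs over ordered pairs, so each direction is counted once.
    """
    n = len(comm)
    return sum(
        comm[i][j] * dist[mapping[i]][mapping[j]]
        for i in range(n)
        for j in range(n)
    )


def brute_force_best(comm, dist):
    """Exhaustive search over all mappings; feasible only for tiny inputs."""
    n = len(comm)
    return min(permutations(range(n)), key=lambda p: hop_bytes(comm, dist, p))


# Hypothetical example: processes 0 and 3 exchange 10 bytes each way,
# processes 1 and 2 exchange 5 bytes each way.
comm = [
    [0, 0, 0, 10],
    [0, 0, 5, 0],
    [0, 5, 0, 0],
    [10, 0, 0, 0],
]

# Hypothetical topology: four nodes on a line, hops = |a - b|.
dist = [[abs(a - b) for b in range(4)] for a in range(4)]

default_mapping = (0, 1, 2, 3)  # process i placed on node i
best_mapping = brute_force_best(comm, dist)

print("default hop-bytes:", hop_bytes(comm, dist, default_mapping))
print("best hop-bytes:   ", hop_bytes(comm, dist, best_mapping))
```

In this toy case the identity mapping places the heavily communicating pair three hops apart, while the best mapping puts both communicating pairs on adjacent nodes. The number of candidate mappings grows factorially with the number of processes, which is why exhaustive search is impractical beyond toy sizes and heuristics are needed.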