We examine the role of big data and machine learning in cancer research. As an example, we describe how gene-level data from The Cancer Genome Atlas (TCGA) consortium can be interpreted using a pathway-level model. As the complexity of computational models increases, their sample requirements grow exponentially, because the number of combinations of variables grows exponentially with the number of variables. Fitting such models reliably therefore requires a large sample size. The number of variables in a computational model can be reduced by incorporating biological knowledge. One particularly successful way of doing this is to use available gene regulatory, signaling, metabolic, or context-specific pathway information. We conclude that incorporating existing biological knowledge is essential for progress in using big data for cancer research.
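
The combinatorial argument can be made concrete with a rough sketch. It assumes, purely for illustration, that each variable is discretized to a small number of levels (for example a gene called "low" or "high"). The function name, the variable counts, and the gene-to-pathway grouping below are hypothetical and are not taken from the TCGA analysis.

```python
def joint_configurations(n_variables: int, levels: int = 2) -> int:
    """Count the distinct joint states of n discrete variables.

    Each variable independently takes one of `levels` values, so the
    number of joint states is levels ** n_variables.
    """
    return levels ** n_variables


# Hypothetical sizes, chosen only to show the growth rate.
for n_genes in (10, 20, 50):
    print(n_genes, "gene-level variables:", joint_configurations(n_genes))

# Hypothetical reduction: 50 genes summarized as 5 pathway-level variables.
print("5 pathway-level variables:", joint_configurations(5))
```

Under these assumptions, moving from 50 gene-level variables to 5 pathway-level variables shrinks the space of joint states from 2^50 to 2^5, which is the sense in which pathway information can reduce sample requirements.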