Random forest is an ensemble machine learning algorithm, built primarily for classification, that makes predictions by aggregating decision trees. It has been widely applied to the analysis of different cancer diseases to understand their complex characteristics and behaviour. However, owing to the massive and complex data generated for such diseases, running random forest on a single machine has become impractical, and scalable tools are required to analyse data of this size. In this paper, the random forest algorithm is implemented with Apache Mahout on a Hadoop cluster running over software-defined networking (SDN) to perform prediction and analysis on large lung cancer datasets. Several experiments are conducted on nine virtual nodes to evaluate the proposed system. The results show that the proposed implementation outperforms a traditional single-machine environment in terms of execution time. A comparison between the proposed system using Hadoop over SDN and Hadoop alone is also performed, and the results show that random forest on Hadoop over SDN achieves shorter execution times than on Hadoop alone. Furthermore, the experiments show that the implemented system improves efficiency in terms of execution time, accuracy and reliability.