The field of data mining and machine learning has grown manyfold over the last two decades. Data mining can be applied to a wide range of problems, which makes it highly attractive to scientists and researchers all over the world. Data mining can be viewed as a process of knowledge discovery: it starts with the collection of data and ends with knowledge acquired in the form of patterns. Data collection therefore lays the foundation for the whole process. In this paper, various secondary data sources from which data can be collected for rainfall prediction are studied and analyzed in depth. These sources include NCDC (National Climatic Data Center), Kaggle, Datahub.io, the UCI Machine Learning Repository, and Earth Data, among others. The data collected from these secondary sources for rainfall prediction are critically analyzed and compared on the parameters of accuracy, completeness, reliability, relevance, and timeliness.