Data mining and machine learning techniques for processing raster data consider a single spectral band at a time, and the individual results are then combined to obtain the final output. The essence of the related multispectral information is lost when the bands are considered independently. The proposed platform is based on the Apache Hadoop ecosystem and supports analysis of large amounts of multispectral raster data using MapReduce. A novel technique for transforming the spectral space into the geometrical space is also proposed; it allows multiple bands to be considered coherently. Clustering results for 10^6 pixels of multiband imagery have been compared with those from widely used GIS software, and other machine learning methods are planned for incorporation into the platform. The platform scales to support tens of spectral bands. The results from our platform were found to be better, and they are also available faster owing to distributed processing.
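To make the idea of treating bands coherently concrete, the following is a minimal sketch, not the platform's implementation. It assumes k-means as the clustering method (the abstract does not name one) and assumes that each pixel is represented as a point whose coordinates are its band values, which is only one plausible reading of the spectral-to-geometrical transformation. The function names `map_pixel`, `reduce_cluster`, and `kmeans_iteration` are hypothetical, and the map and reduce steps run in plain Python rather than on Hadoop.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two band vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_pixel(pixel, centroids):
    """Map step (assumed): emit (index of nearest centroid, pixel vector).

    All bands of the pixel contribute to a single distance, so the
    bands are considered together rather than one at a time.
    """
    nearest = min(range(len(centroids)),
                  key=lambda i: sq_dist(pixel, centroids[i]))
    return nearest, pixel


def reduce_cluster(pixels):
    """Reduce step (assumed): average the pixel vectors of one cluster."""
    n = len(pixels)
    return tuple(sum(band) / n for band in zip(*pixels))


def kmeans_iteration(pixels, centroids):
    """Run one map/shuffle/reduce round and return the updated centroids."""
    groups = defaultdict(list)
    for pixel in pixels:
        key, value = map_pixel(pixel, centroids)
        groups[key].append(value)  # stands in for the shuffle phase
    # A centroid with no assigned pixels is left unchanged.
    return [reduce_cluster(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))]


if __name__ == "__main__":
    # Four illustrative three-band pixels and two starting centroids.
    pixels = [(0, 0, 0), (2, 2, 2), (10, 10, 10), (12, 12, 12)]
    centroids = [(0, 0, 0), (10, 10, 10)]
    print(kmeans_iteration(pixels, centroids))
```

In a distributed setting the map calls would run in parallel over blocks of the raster and the reduce calls would run once per cluster; repeating the iteration until the centroids stop moving gives the final clustering.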
As several authors have noted in the literature, there has been an explosion in the volume of data acquired in recent times, and this is especially true of geographical or geospatial data. The huge volume of data, acquired in different formats, both structured and unstructured, together with its high complexity and continuous generation, has posed a seemingly insurmountable challenge to the scientific and business worlds alike. The conventional tools, techniques, and hardware that existed about a decade ago have reached their limits in handling such data; hence, such data are termed big data. This has necessitated the invention of new software tools and techniques, as well as parallel computing hardware architectures, to meet the requirement of timely and efficient handling of big data. The field of data mining has benefited from these developments as well. This article reviews the evolution of data mining techniques over the last two decades and the efforts made in developing big data analytics, especially as applied to geospatial big data. The field is still evolving very actively, and it would be no surprise if some new techniques were published before this article appears in print.