Hydrological analyses based on Digital Elevation Models (DEMs) derive hydrological properties from high-resolution topographic data represented as an elevation grid. Flow direction is one of the most computationally intensive functions in the current implementation of TauDEM, a high-performance hydrological analysis package widely used in the hydrology community. Hydrologic flow direction defines a flow field on the DEM that directs flow from each grid cell to one or more of its neighbors. For the majority of grid cells this is a local computation, but it becomes a global calculation in the geomorphologically motivated procedure that TauDEM uses to route flow across flat regions. As DEM resolution increases, the computational bottleneck of this function hinders the use of these data in large-scale studies. This paper presents an efficient parallel flow direction algorithm that identifies spatial features (e.g., flats) and reduces the number of sequential and parallel iterations needed to compute their geomorphologically motivated flow direction. Numerical experiments show that our algorithm outperformed the existing parallel D8 algorithm in TauDEM by two orders of magnitude. The new parallel algorithm exhibited desirable scalability on the Stampede and ROGER supercomputers.
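To make the local part of the flow direction computation concrete, the following is a minimal serial sketch of the D8 rule: each cell drains to the one neighbor with the steepest downward slope. It is not TauDEM's implementation or the paper's parallel algorithm. The function name `d8_flow_direction`, the compass labels, and the `cell_size` parameter are assumptions made for illustration. Cells with no lower neighbor (pits and flats) are left as `None`, since those are the cells that need the global flat-routing procedure described above.

```python
import math

# Neighbor offsets as (label, row step, column step).
# The compass labels are illustrative, not TauDEM's numeric encoding.
NEIGHBORS = [
    ("E", 0, 1), ("NE", -1, 1), ("N", -1, 0), ("NW", -1, -1),
    ("W", 0, -1), ("SW", 1, -1), ("S", 1, 0), ("SE", 1, 1),
]


def d8_flow_direction(dem, cell_size=1.0):
    """Return a grid of compass labels giving each cell's steepest-descent neighbor.

    dem is a list of equal-length rows of elevations. A cell with no strictly
    lower neighbor (a pit or part of a flat) gets None.
    """
    rows, cols = len(dem), len(dem[0])
    directions = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_label, best_slope = None, 0.0
            for label, dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue  # neighbor lies outside the grid
                # Diagonal neighbors are sqrt(2) cell widths away.
                distance = cell_size * math.hypot(dr, dc)
                slope = (dem[r][c] - dem[nr][nc]) / distance
                if slope > best_slope:
                    best_label, best_slope = label, slope
            directions[r][c] = best_label
    return directions


# A plane tilted toward the lower-right corner.
tilted = [[9, 8, 7],
          [8, 7, 6],
          [7, 6, 5]]
print(d8_flow_direction(tilted))
```

On a perfectly flat grid every cell comes back as `None`, which illustrates why flats cannot be resolved from a cell's immediate neighborhood alone.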
Data reduction is perhaps the most critical component in retrieving information from big data (i.e., petascale-sized data) in many data-mining processes. The central aim of data reduction techniques is to save time and bandwidth, enabling users to work with larger datasets even in minimal-resource environments such as desktops or small cluster systems. In this chapter, the authors examine why these reduction techniques are important in the analysis of big datasets. They then present several basic reduction techniques in detail, stressing the advantages and disadvantages of each. The authors also consider signal processing techniques for mining big data using the discrete wavelet transform, as well as server-side data reduction techniques. Lastly, they include a general discussion of parallel algorithms for data reduction, with special emphasis on parallel wavelet-based multi-resolution data reduction on distributed-memory systems using MPI and on shared-memory GPU architectures, along with a case study demonstrating the improvement in performance and scalability.
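As a rough illustration of wavelet-based multi-resolution reduction, the sketch below applies an unnormalized Haar transform (pairwise averages and differences) to a one-dimensional signal. It is a serial toy example, not the chapter's MPI or GPU method. The function names `haar_step`, `haar_inverse`, and `reduce_signal` are hypothetical, and the choice of the Haar wavelet is an assumption, since the abstract does not name a specific wavelet.

```python
def haar_step(values):
    """One level of an unnormalized Haar transform.

    Returns (approx, detail): pairwise averages and pairwise half-differences.
    """
    if len(values) % 2:
        raise ValueError("length must be even")
    approx = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the finer-resolution signal from one level of coefficients."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out


def reduce_signal(values, levels):
    """Keep only the coarse approximation after the given number of levels.

    Each level halves the data volume; the discarded detail coefficients
    are what would be needed to recover the finer resolution.
    """
    approx = list(values)
    for _ in range(levels):
        approx, _detail = haar_step(approx)
    return approx


signal = [4, 2, 6, 8]
approx, detail = haar_step(signal)
print(approx, detail)              # coarse view plus the detail needed to undo it
print(haar_inverse(approx, detail))
print(reduce_signal(signal, 2))
```

Keeping only the approximation trades accuracy for size, while storing the detail coefficients alongside it allows the original resolution to be restored on demand.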