When studying large multiparametric databases with very heterogeneous parameters (microbiological, chemical, and physicochemical) that cover a wide and heterogeneous area, the probability of observing extreme values (Z-score > 2.5) is high. These few samples carry a disproportionate share of the information conveyed by the entire database, which strongly degrades both the study of the spatial structure of the data and the identification of the mechanisms responsible for water quality. Data transformation can be proposed to overcome these problems. This study deals with a database of 8110 groundwater analyses (Occitanie region, France), in which the bacteriological load was measured as Escherichia coli and Enterococci, in addition to electrical conductivity, major ions, Mn, Fe, As and pH. Three modes of data conditioning were tested and compared to the treatment of raw data. The results show that log transformation is the best option, revealing a relationship between E. coli content and all the other parameters. By reducing the impact of extreme values without eliminating them, it concentrated the information on the first factorial axes of the PCA and consequently allowed a better definition of the associated processes. The spatial structure of the principal components and their cartographic representation are improved. Conditioning the data with the square root function led to an improvement intermediate between the logarithmic transformation and the absence of conditioning. Applying these results should allow targeted, more efficient, and therefore less expensive monitoring of water quality by Regional Health Agencies.
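The effect of conditioning on extreme values can be illustrated with a small sketch. It is not the authors' procedure: the counts below are invented, the `log10(x + 1)` offset is an assumption made here to handle zero counts, and `top_share` is a hypothetical helper that measures how much of the total squared deviation from the mean is carried by the most deviant samples.

```python
import math
from statistics import fmean

def top_share(values, k=1):
    """Share of the total squared deviation from the mean carried by
    the k most deviant samples (between 0 and 1)."""
    mean = fmean(values)
    squared_dev = sorted(((v - mean) ** 2 for v in values), reverse=True)
    return sum(squared_dev[:k]) / sum(squared_dev)

# Hypothetical bacterial counts: mostly low values plus two extreme samples.
counts = [0, 1, 0, 2, 3, 1, 0, 5, 2, 1, 4, 0, 2, 1, 3, 0, 1, 2, 800, 1500]

raw = [float(c) for c in counts]
sqrt_t = [math.sqrt(c) for c in counts]        # square-root conditioning
log_t = [math.log10(c + 1) for c in counts]    # log conditioning (offset is an assumption)

for label, series in (("raw", raw), ("sqrt", sqrt_t), ("log", log_t)):
    print(f"{label:>4}: share carried by the most extreme sample = {top_share(series):.2f}")
```

On this toy series the share carried by the single most extreme sample shrinks from the raw data to the square-root data to the log data, while every sample is kept in the series.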
In France, the data resulting from monitoring water intended for human consumption are integrated into a national database called SISE-Eaux, a useful and relevant tool for studying the quality of raw and distributed water. A previous study carried out on all the data from the Provence-Alpes-Côte d’Azur (PACA) region in south-eastern France (1061 sampling points, 5295 analyses and 15 parameters) revealed that the dilution of the information in a heterogeneous environment is an obstacle to the analysis of the ongoing processes that are sources of variability. In this article, cross-referencing this information with the compartmentalization into groundwater bodies (MESO) provides a hydrogeological constraint on the dataset that can help to define more homogeneous subsets and improve the interpretation. The approach involves three steps: (1) a principal component analysis conducted on the whole dataset, aimed at eliminating information redundancy; (2) an unsupervised grouping of groundwater bodies having similar sources of variability; (3) a principal component analysis carried out within the main groups and sub-groups identified, aiming to define and prioritize the sources of variability and the associated processes. The results, supported by discriminant analysis and machine learning, show that the grouping of MESO is the best-suited scale to study ongoing processes because of its greater homogeneity. One of the eight main groups identified in PACA, corresponding to the accompanying aquifers of the main rivers, is analyzed by way of illustration. Water–rock interactions, redox processes and their effects on the release of metals, arsenic and fecal contamination along different pathways were specifically identified, with varying impacts according to the subgroups. We discuss both the significance of the principal components and the mean values of the bacteriological parameters, which provide information on the causes and on the state of contamination, respectively. Based on the results from two different groups of MESO, some guidelines for a resource quality monitoring strategy are proposed.
In France, and more generally in Europe, the high number of groundwater bodies (GWBs) per administrative region is an obstacle to the management and monitoring of water for human consumption by regional health agencies. Moreover, GWBs show high spatial, temporal, physico-chemical, and bacteriological variability. The objective is to establish groupings of GWBs that are homogeneous in terms of water quality and of the processes responsible for this quality. In the Occitanie region in southwestern France, the cross-referencing of two databases, namely the French reference system for groundwater bodies and SISE-EAUX, provided a dataset of 8110 observations and 15 parameters distributed over 106 GWBs. The 8-step approach, including data conditioning, dimensional reduction by Principal Component Analysis, and hierarchical clustering, resulted in 20 homogeneous groups of GWBs over the whole region. The loss of information caused by this grouping is quantified by the evolution of the explained variance. Splitting the region into two large basins (Adour-Garonne and Rhône Méditerranée) according to the recommendations of the European community does not result in a significant additional loss of the information contained in the data. A quick study of a few groups highlights the specificities of each one, thus enabling targeted guidelines or recommendations for water quality management and monitoring. In the future, the method will have to be tested on the scale of large European watersheds, as well as with a larger number of parameters.
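The clustering step of such an approach can be sketched as follows. This is a minimal illustration, not the authors' 8-step method: the abstract does not state the linkage criterion, so average linkage on Euclidean distance is an assumption, and the GWB names and their scores on two principal components are invented.

```python
from itertools import combinations
from math import dist

def average_linkage(a, b, points):
    """Mean Euclidean distance between all members of clusters a and b."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerate(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.
    Returns clusters as sorted lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical mean scores of five GWBs on the first two principal components.
gwb_scores = {
    "GWB_A": (0.0, 0.0),
    "GWB_B": (0.1, 0.2),
    "GWB_C": (5.0, 5.0),
    "GWB_D": (5.3, 4.9),
    "GWB_E": (10.0, 0.0),
}
names = list(gwb_scores)
points = [gwb_scores[n] for n in names]
groups = [[names[i] for i in cluster] for cluster in agglomerate(points, 3)]
print(groups)
```

The brute-force search over cluster pairs is adequate for a few hundred groundwater bodies; a dedicated library routine would be preferable for larger inputs.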
Delineating the pollution plumes generated by household waste landfills is not easy, particularly where water tables are discontinuous or intricately extended, such as those developed in fractured crystalline bedrock. In Ouagadougou (Burkina Faso), there are many uncontrolled landfills throughout the urban area. The water table, generally located between 3 and 10 m deep, is likely to be contaminated by the leachate from these landfills. More than 1000 GPS-referenced measurements of spontaneous potential (self-potential) were carried out on a landfill and its immediate surroundings to the south of the urban area. Geostatistical processing based on variograms and correlograms shows that the prospecting technique is well suited and that the resulting mapping is reliable. The response seems to be mainly due to the electrochemical component, with hot spots within the landfill and a plume heading towards the north-east. The distribution of the spontaneous potential seems to be controlled not by the topography of the site but by the fracturing of the parent rock, whose dominant direction is 15° N, and by the parent rock/saprolite contact. Thus, the plume does not flow towards the market gardens just below the landfill but rather towards a residential area where monitoring of borehole water quality is required.
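The variogram analysis mentioned above can be illustrated with the classical empirical semivariogram, γ(h) = Σ (z_i − z_j)² / (2 N(h)), computed over the N(h) pairs of points whose separation falls in each distance bin. The sketch below is an assumption about the general technique, not the authors' processing chain, and the coordinates and self-potential readings are invented.

```python
from itertools import combinations
from math import dist

def empirical_variogram(coords, values, bin_width, n_bins):
    """Omnidirectional empirical semivariogram.
    Returns one semivariance per distance bin (None for empty bins)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(coords)), 2):
        k = int(dist(coords[i], coords[j]) // bin_width)  # index of the distance bin
        if k < n_bins:
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

# Hypothetical survey: positions in metres and self-potential readings in mV.
coords = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (0.0, 10.0), (10.0, 10.0), (20.0, 10.0)]
sp_mv = [-12.0, -8.0, -3.0, -15.0, -9.0, -2.0]

gamma = empirical_variogram(coords, sp_mv, bin_width=10.0, n_bins=3)
print(gamma)
```

A semivariance that grows with distance before levelling off indicates spatially structured data, which is the property that makes a kriged map of the measurements meaningful.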
Defining homogeneous units to optimize the monitoring and management of groundwater is a key challenge for organizations responsible for the protection of water for human consumption. However, the number of groundwater bodies (GWBs) is too large for targeted monitoring and recommendations. This study, carried out in the Provence-Alpes-Côte d’Azur region of France, is based on the intersection of two databases, one grouping together the physicochemical and bacteriological analyses of water and the other delimiting the boundaries of groundwater bodies. The extracted dataset contains 8627 measurements from 1143 observation points distributed over 63 GWBs. Data conditioning through logarithmic transformation, dimensional reduction through principal component analysis, and hierarchical classification allow the grouping of GWBs into 11 homogeneous clusters. The fractions of unexplained variance (FUV) and ANOVA R² were calculated to assess the performance of the method at each scale. For example, for the total dissolved load (TDS) parameter, the temporal variance was quantified at 0.36, and the clustering causes a loss of information, with R² going from 0.63 at the scale of the sampling point to 0.4 at that of the GWB cluster. The results show that the logarithmic transformation reduces the effect of outliers and improves the quality of the GWB clustering. The groups of GWBs are homogeneous and clearly distinguishable from each other. The results can be used to define specific management and protection strategies for each group. The study also highlights the need to take into account the temporal variability of groundwater quality when implementing monitoring and management programs.
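The loss of information between scales can be illustrated with the usual one-way ANOVA definitions, R² = 1 − SS_within / SS_total and FUV = 1 − R². Whether the study computes them in exactly this way is an assumption, and the values and point-to-cluster mapping below are invented.

```python
from statistics import fmean

def anova_r2(groups):
    """One-way ANOVA R^2: share of the total variance explained by group membership.
    `groups` maps a group label to its list of measurements."""
    values = [v for g in groups.values() for v in g]
    grand_mean = fmean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)
    return 1.0 - ss_within / ss_total

def regroup(groups, mapping):
    """Pool measurements into coarser groups using a label-to-label mapping."""
    merged = {}
    for label, vals in groups.items():
        merged.setdefault(mapping[label], []).extend(vals)
    return merged

# Hypothetical log-transformed TDS values, repeated in time at four sampling points.
by_point = {"P1": [2.0, 2.2], "P2": [2.4, 2.6], "P3": [3.0, 3.1], "P4": [3.5, 3.3]}
point_to_cluster = {"P1": "C1", "P2": "C1", "P3": "C2", "P4": "C2"}
by_cluster = regroup(by_point, point_to_cluster)

r2_point = anova_r2(by_point)
r2_cluster = anova_r2(by_cluster)
print(f"point scale:   R2 = {r2_point:.2f}, FUV = {1 - r2_point:.2f}")
print(f"cluster scale: R2 = {r2_cluster:.2f}, FUV = {1 - r2_cluster:.2f}")
```

Because the clusters are built by pooling sampling points, R² at the cluster scale cannot exceed R² at the point scale; the gap between the two is the information lost by grouping, and the FUV at the point scale corresponds to the variance within each point's own time series.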