Identifying the origin of groundwater samples in a multi-layer aquifer system with Random Forest classification

Identifying the origin of groundwater samples is not always possible in complex multilayered aquifers, which poses a major difficulty for the reliable interpretation of geochemical results. The problem is especially severe when information on tubewell design is hard to obtain. This paper presents a supervised classification method based on the Random Forest (RF) machine learning technique to identify the layer from which groundwater samples were extracted. The classification rules were based on the major ion composition of the samples. We applied this method to the Campo de Cartagena multi-layer aquifer system in southeastern Spain. A large amount of hydrogeochemical data was available, but only a limited fraction of the sampled tubewells had a reliable determination of the borehole design and, consequently, of the aquifer layer being exploited. An added difficulty was the very similar composition of water samples extracted from different aquifer layers. Moreover, not all groundwater samples included the same geochemical variables. Despite these difficulties, the Random Forest classification reached accuracies over 90%. These results were much better than those of the Linear Discriminant Analysis (LDA) and Decision Trees (CART) supervised classification methods. Of a total of 1549 samples, 805 came from a single identified aquifer, 409 came from a possible blend of waters from several aquifers, and 335 were of unknown origin. Only 468 of the 805 single-aquifer samples included all the chemical variables needed to calibrate and validate the models. Finally, 107 of the groundwater samples of unknown origin could be classified. The uncertainty in the identification of training samples was taken into account to improve the model. Most of the samples that could not be identified had an incomplete dataset.
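The workflow summarized in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy, which the record does not name as the authors' tools, and it runs on synthetic data only. The ion names, layer labels, sample sizes, and hyperparameters are hypothetical placeholders, so the printed accuracies say nothing about the reported results or about how the three methods rank on real data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical major-ion features; the record does not list the variables used.
IONS = ["Ca", "Mg", "Na", "K", "Cl", "SO4", "HCO3"]
# Placeholder aquifer-layer labels, not the real Campo de Cartagena layer names.
LAYERS = ["layer_A", "layer_B", "layer_C"]

rng = np.random.default_rng(0)


def synthetic_samples(n_per_layer=150):
    """Draw made-up ion concentrations; each layer gets a slightly shifted mean."""
    X, y = [], []
    for k, layer in enumerate(LAYERS):
        means = 100.0 + 15.0 * k * np.linspace(-1.0, 1.0, len(IONS))
        X.append(rng.normal(means, 20.0, size=(n_per_layer, len(IONS))))
        y += [layer] * n_per_layer
    return np.vstack(X), np.array(y)


# Training set: samples whose layer is known and whose ion data are complete.
X, y = synthetic_samples()

# Unknown-origin samples: every fourth one lacks a variable (NaN).
X_unknown, _ = synthetic_samples(n_per_layer=10)
X_unknown[::4, 2] = np.nan
complete = ~np.isnan(X_unknown).any(axis=1)

# Compare the three supervised classifiers named in the abstract.
models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
    "CART": DecisionTreeClassifier(random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: mean 5-fold accuracy = {acc:.3f}")

# Fit RF on all labelled samples and classify only the complete unknown samples.
rf = models["RF"].fit(X, y)
predicted = rf.predict(X_unknown[complete])
print(f"classified {complete.sum()} of {len(X_unknown)} unknown-origin samples")
```

Dropping incomplete samples before prediction mirrors the abstract's remark that most unclassified samples lacked a complete dataset. Imputing the missing variables would be a different design choice and is not implied by the source.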

Data and Resources

This dataset has no data

Metadata

Basic information
Resource type Text
Creation date 17-09-2024
Last modified date 17-09-2024
Metadata identifier 56efc3e5-e4cf-5cac-8492-fe4bff17aa28
Metadata language Spanish
Themes (NTI-RISP)
High-value dataset (HVD) category
ISO 19115 topic category
Alternative identifier DOI 10.1016/j.jhydrol.2013.07.009
Keyword URI
Encoding UTF-8
Spatial information
INSPIRE identifier ESPMITECOIEPNBMMENOR540
INSPIRE themes
Geographic identifier Murcia
Coordinate Reference System
Spatial representation type
Spatial extent
{"type": "Polygon", "coordinates": [[[-2.34, 37.38], [-0.69, 37.38], [-0.69, 38.76], [-2.34, 38.76], [-2.34, 37.38]]]}
Spatial resolution of the dataset (m)
Provenance
Lineage statement
Metadata profile
Conformity
Source dataset
Update frequency
Sources
  1. Journal of Hydrology
  2. vol. 499
  3. pp. 303-315
Purpose
Process steps
Temporal coverage (start)
Temporal coverage (end)
Version notes
Version
Dataset validity
Responsible party
Author name Baudron, P., Alonso Sarría, F., García Aróstegui, J.L., Cánovas García, F., Martínez Vicente, D. and Moreno Brotóns, J.
Maintainer name
Author identifier
Author email paul.baudron@baudron.com
Author website
Maintainer identifier
Maintainer email
Maintainer website