GRACE-Based Water Storage Predictions in Sri Lanka
Using Machine Learning and Downscaled Modeling
GRACE (Gravity Recovery and Climate Experiment) is a joint mission of NASA and the German Aerospace Center (DLR) that monitors variations in terrestrial water storage. Sri Lanka, a monsoon-dependent and largely agricultural island, is vulnerable to hydrological extremes such as floods and droughts. Downscaled modeling and prediction of Liquid Water Equivalent (LWE) thickness can therefore improve disaster preparedness and water-resource management.
In this study, GRACE data from 2004-04-18 to 2025-03-16 were filtered to the extent of Sri Lanka. Additional features were engineered from Sri Lankan district and provincial boundaries obtained from the National Spatial Data Infrastructure (NSDI). Because GRACE's native resolution (~0.25°) is too coarse for district-scale analysis, a synthetic dataset simulating LWE across Sri Lanka was generated using the boundary shapefiles, random coordinate generation, and linear regression to mimic realistic data-retrieval patterns. The synthetic dataset covers the period 2025-08-10 to 2026-08-09.
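The synthetic-data step can be pictured with a minimal sketch. It assumes a single district outline given as (lon, lat) vertices, rejection-samples random coordinates inside it with a ray-casting point-in-polygon test, fits an ordinary least-squares plane LWE ≈ b0 + b1·lon + b2·lat to hypothetical grid-cell values, and predicts LWE at the sampled points. The polygon, grid values, and helper names (point_in_polygon, sample_points, fit_linear_lwe, predict_lwe) are illustrative assumptions, not the study's data or code; a real pipeline would read the NSDI shapefiles with a GIS library.

```python
import numpy as np


def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal ray through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_points(poly, n, rng):
    """Rejection-sample n uniform random (lon, lat) points inside the polygon."""
    xs, ys = zip(*poly)
    pts = []
    while len(pts) < n:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, poly):
            pts.append((x, y))
    return np.array(pts)


def fit_linear_lwe(coords, lwe):
    """Least-squares fit of LWE ~ b0 + b1*lon + b2*lat; returns [b0, b1, b2]."""
    X = np.column_stack([np.ones(len(coords)), coords])
    coef, *_ = np.linalg.lstsq(X, lwe, rcond=None)
    return coef


def predict_lwe(coef, coords):
    """Evaluate the fitted plane at the given (lon, lat) coordinates."""
    X = np.column_stack([np.ones(len(coords)), coords])
    return X @ coef


rng = np.random.default_rng(0)
# Hypothetical district outline (lon, lat), not an NSDI boundary.
district = [(80.0, 7.0), (80.6, 7.0), (80.6, 7.5), (80.0, 7.5)]
# Hypothetical 0.25-degree cell centres and LWE values standing in for GRACE cells.
grid = np.array([(lon, lat) for lon in (80.125, 80.375, 80.625)
                 for lat in (7.125, 7.375, 7.625)])
grid_lwe = 2.0 + 0.5 * (grid[:, 0] - 80) - 1.0 * (grid[:, 1] - 7) + rng.normal(0, 0.05, len(grid))

coef = fit_linear_lwe(grid, grid_lwe)
points = sample_points(district, 100, rng)
synthetic_lwe = predict_lwe(coef, points)
```

Rejection sampling keeps points uniformly distributed within irregular district outlines, at the cost of discarded draws for thin or concave shapes.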
Data were standardized, and feature selection was applied using mutual information (MI). Three tree-based models, XGBoost (XGB), Random Forest Regressor (RFR), and Extra Trees Regressor (ETR), were trained and evaluated with and without MI-based feature selection. Hyperparameters were tuned with GridSearchCV, and SHAP (SHapley Additive exPlanations) analysis was used to explain model behavior.
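A minimal scikit-learn sketch of the with/without-MI comparison might look like the following. It chains StandardScaler, an optional SelectKBest step scored by mutual_info_regression, and ExtraTreesRegressor (standing in for all three models) inside a Pipeline tuned by GridSearchCV. The synthetic data, the parameter grid, and the build_search helper are assumptions for illustration only. XGBoost's XGBRegressor and scikit-learn's RandomForestRegressor follow the same estimator interface and could be placed in the "model" step, and a fitted tree model could then be passed to shap.TreeExplainer for the SHAP analysis.

```python
from functools import partial

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_search(use_mi: bool, random_state: int = 0) -> GridSearchCV:
    """Hypothetical helper: scaler -> (optional MI selection) -> tree model, tuned by grid search."""
    steps = [("scale", StandardScaler())]
    if use_mi:
        # Fix the seed of the k-NN based MI estimator so feature selection is reproducible.
        mi = partial(mutual_info_regression, random_state=random_state)
        steps.append(("mi", SelectKBest(score_func=mi)))
    steps.append(("model", ExtraTreesRegressor(random_state=random_state)))
    # Illustrative search space; the study's actual grid is not stated.
    param_grid = {
        "model__n_estimators": [50, 100],
        "model__max_depth": [None, 8],
    }
    if use_mi:
        param_grid["mi__k"] = [3, 5]  # number of top-MI features to keep
    return GridSearchCV(Pipeline(steps), param_grid, scoring="r2", cv=3)


# Synthetic stand-in data: 8 features, only the first two drive the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

results = {}
for use_mi in (False, True):
    search = build_search(use_mi).fit(X, y)
    results["with MI" if use_mi else "no MI"] = (search.best_score_, search.best_params_)

for name, (score, params) in results.items():
    print(f"{name}: CV R2 = {score:.3f}, best params = {params}")
```

Putting selection inside the Pipeline means MI scores are recomputed on each training fold, so the cross-validated scores are not inflated by selecting features on held-out data.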
Key Results
XGB without feature selection performed best on the validation set (R² = 0.9178, RMSE = 1.4572).
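The two reported metrics follow their standard definitions, R² = 1 − SS_res / SS_tot and RMSE = sqrt(mean squared residual). A small NumPy sketch with made-up values makes the arithmetic concrete; the helper names are illustrative, and scikit-learn's sklearn.metrics.r2_score and mean_squared_error compute the same quantities.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up observed vs. predicted LWE values, not study data.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(observed, predicted))  # 1 - 0.10 / 5.0, about 0.98
print(rmse(observed, predicted))       # sqrt(0.10 / 4), about 0.158
```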
Statistical validation with one-way ANOVA (F = 593.05, p < 0.001) and Tukey's post-hoc tests confirmed significant differences among the models' predictions. XGB predictions were closest to those of ETR but remained statistically distinct from both RFR and ETR; together with its superior validation metrics, this supports selecting XGB as the most reliable of the three models.
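For readers checking the ANOVA step, the one-way F statistic compares between-group to within-group variance, F = [SS_between / (k − 1)] / [SS_within / (n − k)], where k is the number of groups (here, the models' prediction sets) and n is the total number of predictions. The sketch below computes it with NumPy on toy groups; the helper name and values are illustrative, not the study's code or data. scipy.stats.f_oneway returns the same statistic together with its p-value, and scipy.stats.tukey_hsd (SciPy 1.8 and later) performs Tukey's post-hoc pairwise comparisons.

```python
import numpy as np


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic for k independent groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n = sum(g.size for g in groups)
    grand_mean = np.concatenate(groups).mean()
    # Spread of group means around the grand mean, weighted by group size.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Spread of observations around their own group mean.
    ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return float(ms_between / ms_within)


# Toy prediction sets standing in for three models' outputs (hypothetical values).
xgb_pred = [1.0, 2.0, 3.0]
rfr_pred = [4.0, 5.0, 6.0]
etr_pred = [7.0, 8.0, 9.0]
print(one_way_anova_f(xgb_pred, rfr_pred, etr_pred))  # 27.0
```

Here SS_between = 54 with k − 1 = 2 and SS_within = 6 with n − k = 6, giving F = 27 / 1 = 27.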
A multilayer perceptron (MLP) was also tested but generalized poorly to regional LWE variations, so it was excluded from the final analysis.
This study demonstrates the first application of GRACE data for LWE downscaling in Sri Lanka using machine learning, and provides a foundation for future hydrological modeling in the region.
Synthetic dataset generation and modeling helped overcome the spatial resolution limits of GRACE and enabled district-scale LWE predictions to support improved water management.