Groundsource is an open-access global dataset of about 2.6 million high-resolution historical flood events, curated from the automated processing of over 5 million news articles across more than 150 countries (2000–present).
High-quality historical flood data is critical for disaster risk management, model training, and climate attribution. Existing global archives are limited by sparse coverage, coarse resolution, or reliance on prolonged satellite observation. Groundsource addresses these gaps by turning unstructured news into a structured, localized event archive suitable for training predictive models, quantifying exposure, and supporting disaster research.
Methodology
The dataset is built with a four-stage automated pipeline described in the paper:
- Raw news ingestion — Publicly available articles mentioning floods are aggregated (e.g. via topicality scoring). Article text and publication dates are extracted; geographic entities are annotated.
- LLM-based extraction (Gemini) — The text is processed to determine whether it reports a specific flood event and to extract precise dates and affected locations. Relative time references are anchored to publication date.
- Geocoding — Extracted location names are mapped to spatial polygons (WGS 84) using geocoding services.
- Aggregation and filtering — Location–date pairs are aggregated into spatiotemporal events. Events with area > 5,000 km², diameter > 500 km, or duration > 7 days are filtered out. The result is 2.6 million localized events.
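The filtering rule in the last stage can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the `CandidateEvent` class, its field names, and the `is_localized` helper are hypothetical, and it assumes duration is counted inclusively (a single-day event lasts one day). Only the three thresholds come from the description above.

```python
from dataclasses import dataclass
from datetime import date

# Thresholds quoted in the pipeline description above.
MAX_AREA_KM2 = 5_000.0
MAX_DIAMETER_KM = 500.0
MAX_DURATION_DAYS = 7


@dataclass(frozen=True)
class CandidateEvent:
    """Hypothetical container for one aggregated event (not the dataset schema)."""
    area_km2: float
    diameter_km: float
    start_date: date
    end_date: date

    @property
    def duration_days(self) -> int:
        # Assumption: both endpoints count, so a single-day event lasts 1 day.
        return (self.end_date - self.start_date).days + 1


def is_localized(event: CandidateEvent) -> bool:
    """Return True if the event passes all three filters."""
    return (
        event.area_km2 <= MAX_AREA_KM2
        and event.diameter_km <= MAX_DIAMETER_KM
        and event.duration_days <= MAX_DURATION_DAYS
    )


# Made-up candidates for illustration only.
events = [
    CandidateEvent(12.5, 6.0, date(2021, 7, 14), date(2021, 7, 15)),       # kept
    CandidateEvent(8_200.0, 140.0, date(2021, 7, 14), date(2021, 7, 14)),  # area too large
    CandidateEvent(40.0, 9.0, date(2022, 8, 1), date(2022, 8, 20)),        # lasts too long
]
kept = [e for e in events if is_localized(e)]
print(len(kept))
```

Whether the published pipeline counts duration inclusively is not stated here, so the off-by-one choice in `duration_days` should be checked against the paper before reuse.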
Data records
Each record in the dataset corresponds to one spatiotemporal flood observation. The dataset is entity-based: a single large real-world flood can appear as multiple entries (e.g. different neighborhoods or towns). Records from 2000 onward are included.
| Column | Description |
| --- | --- |
| uuid | Unique identifier for the record. |
| area_km2 | Area of the reported location polygon (km²). |
| start_date | First day (YYYY-MM-DD) with documented evidence of flooding. |
| end_date | Last consecutive day with evidence; equals start_date if single-day. |
| geometry | Spatial boundary (WGS 84 / EPSG:4326): polygon or buffered point. |
About 82% of events have a footprint under 50 km²; the average footprint is ~142 km², supporting street-level and municipal-scale use cases.
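The gap between the share of small events and the much larger average points to a right-skewed area distribution. The sketch below shows how such summary figures could be computed from the `area_km2` column. It assumes the records are available as CSV with the columns listed above (the actual distribution format may differ), omits `geometry` for brevity, and uses made-up rows; `footprint_summary` is a hypothetical helper.

```python
import csv
import io
from statistics import mean, median

# Made-up rows for illustration only; real values come from the dataset.
# Assumption: records are available as CSV with the documented column names.
SAMPLE_CSV = """uuid,area_km2,start_date,end_date
a1,3.2,2019-06-01,2019-06-01
a2,11.0,2019-06-01,2019-06-02
a3,27.5,2020-09-10,2020-09-10
a4,48.0,2021-03-04,2021-03-06
a5,950.0,2022-08-15,2022-08-17
"""


def footprint_summary(csv_text: str, threshold_km2: float = 50.0) -> dict:
    """Summarize footprint sizes: share below a threshold, mean, and median."""
    reader = csv.DictReader(io.StringIO(csv_text))
    areas = [float(row["area_km2"]) for row in reader]
    share_small = sum(a < threshold_km2 for a in areas) / len(areas)
    return {
        "share_below_threshold": share_small,
        "mean_km2": mean(areas),
        "median_km2": median(areas),
    }


summary = footprint_summary(SAMPLE_CSV)
print(summary)
```

In the toy sample a single large polygon pulls the mean far above the median, which is the same pattern the reported figures suggest for the full dataset.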
Technical validation
Manual review of 400 randomly selected entries yielded a practical precision of 82%, counting entries judged either accurate or approximate/partial. Spatiotemporal matching against external databases shows that Groundsource captures 85%–100% of severe flood events in the Global Disaster Alert and Coordination System (GDACS) between 2020 and 2026. Recall is higher for more severe events (e.g. GDACS orange/red alerts). Geographic recall varies with media coverage and language support.
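To make the idea of spatiotemporal matching concrete, here is a minimal sketch of a recall computation against a reference catalogue. It is not the validation procedure used in the paper: events are reduced to a representative point, and the `Event` class, the 100 km distance tolerance, and the 3-day date slack are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Event:
    """Hypothetical simplified event: a representative point plus a date range."""
    lat: float
    lon: float
    start_date: date
    end_date: date


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two WGS 84 points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_match(ref: Event, obs: Event, max_km: float, slack_days: int) -> bool:
    """True if the date ranges overlap (within slack) and the points are close."""
    slack = timedelta(days=slack_days)
    overlaps = (
        obs.start_date <= ref.end_date + slack
        and obs.end_date >= ref.start_date - slack
    )
    return overlaps and haversine_km(ref.lat, ref.lon, obs.lat, obs.lon) <= max_km


def recall(reference, observed, max_km: float = 100.0, slack_days: int = 3) -> float:
    """Fraction of reference events matched by at least one observed event."""
    if not reference:
        raise ValueError("reference catalogue is empty")
    hits = sum(
        any(is_match(r, o, max_km, slack_days) for o in observed) for r in reference
    )
    return hits / len(reference)


# Made-up events for illustration only.
reference = [
    Event(10.0, 20.0, date(2022, 6, 17), date(2022, 6, 20)),
    Event(-30.0, 31.0, date(2022, 4, 11), date(2022, 4, 13)),
]
observed = [
    Event(10.1, 20.0, date(2022, 6, 18), date(2022, 6, 18)),   # near the first, same dates
    Event(-30.0, 31.0, date(2022, 5, 20), date(2022, 5, 20)),  # right place, wrong month
    Event(40.0, 20.0, date(2022, 6, 18), date(2022, 6, 18)),   # right dates, far away
]
print(recall(reference, observed))
```

The result depends heavily on the chosen tolerances, and a real comparison would use the polygon geometries rather than points.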
Use in Google Flood Hub (urban flash floods)
Groundsource is used by Google Research to train and evaluate urban flash flood forecasts. In March 2026, Google announced the roll-out of Urban Flash Flood forecasts on Flood Hub, providing up to 24 hours advance notice in urban areas.
Flash floods account for a large share of flood-related fatalities and often occur far from stream gauges, so traditional gauge-based training data is insufficient. The AI-driven flash flood forecasting work uses Groundsource as ground truth: Gemini-analyzed news reports were used to build a historical flash-flood dataset that trains and evaluates a new model. The model uses global weather products and runs at 20×20 km resolution, with initial focus on urban areas (population density > 100/km²). Precision and recall in many regions of the Global South are reported to be comparable to those in well-instrumented countries, helping narrow the “warning gap” for life-saving early warning systems.
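One way to picture how event records could serve as ground truth on a grid is to turn them into binary labels per cell and day. The sketch below is an assumption-laden illustration, not the published training procedure: the cell identifiers, the `build_labels` helper, and the idea that flood observations have already been assigned to grid cells are all hypothetical. Only the density threshold of 100 people per km² comes from the text above.

```python
from datetime import date

URBAN_DENSITY_THRESHOLD = 100.0  # people per km², as quoted above


def build_labels(cell_density, flood_observations, days):
    """Return {(cell_id, day): 0 or 1} for urban cells only.

    cell_density: mapping of cell_id to population density (people per km²).
    flood_observations: set of (cell_id, day) pairs with a reported flood.
    days: iterable of dates to label.
    """
    urban_cells = sorted(
        cell for cell, density in cell_density.items()
        if density > URBAN_DENSITY_THRESHOLD
    )
    return {
        (cell, day): int((cell, day) in flood_observations)
        for cell in urban_cells
        for day in days
    }


# Made-up grid cells and observations for illustration only.
cell_density = {"cell_a": 850.0, "cell_b": 40.0, "cell_c": 120.0}
flood_observations = {
    ("cell_a", date(2023, 7, 1)),
    ("cell_b", date(2023, 7, 1)),  # ignored: cell_b is below the urban threshold
}
days = [date(2023, 7, 1), date(2023, 7, 2)]

labels = build_labels(cell_density, flood_observations, days)
print(sum(labels.values()), len(labels))
```

Treating cell-days without a report as negatives is itself an assumption, since recall varies with media coverage and language support.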
References
Mayo, R., Zlydenko, O., Bootbool, M., Fronman, S., Gilon, O., Hassidim, A., Kratzert, F., Loike, G., Matias, Y., Nakar, Y., Nearing, G., Sayag, R., Sicherman, A., Zemach, I., & Cohen, D. (2026). Groundsource: A Dataset of Flood Events from News. Google Research. EarthArXiv (unreviewed preprint), March 9, 2026.
Protecting cities with AI-driven flash flood forecasting. Google Research Blog, March 12, 2026. Urban Flash Flood forecasts on Flood Hub; use of Groundsource for training and evaluation. research.google/blog
Introducing Groundsource: Turning news reports into data with Gemini. Google Research Blog, March 12, 2026. Methodology and dataset introduction.