Drinking Water Data for the Nation: Version 1.0 of the National Drinking Water Dataset + Explorer Tool Is Here
the tl;dr — After piloting in Texas, building a national data pipeline covering 821 variables from 30+ datasets, and engaging stakeholders across the drinking water ecosystem, EPIC is releasing Version 1.0 of the National Drinking Water Dataset + Explorer Tool. It's free, open source, and built for anyone who needs to understand drinking water systems across the U.S. Explore the tool here.
All Americans deserve access to safe, reliable, and affordable drinking water. But turning these principles into reality requires untangling a complex web of interdependent factors — and that requires data. Not just any data: connected, usable, trustworthy data that doesn't require a data science degree to put to work.
That's the gap we've been working to close. Today, we're releasing Version 1.0 of the National Drinking Water Dataset + Explorer Tool, a free, web-based platform that brings together 30+ state and national datasets — covering water system compliance, funding history, environmental hazards, census demographics, climate vulnerability, and more — into a single, harmonized, and openly accessible resource. This effort is part of the Digital Service for the Planet (DSP) initiative with our partners, New America.
Access the tool via our website
From Texas to the Nation
This work started in 2024 with a pilot data pipeline and application in Texas — a state with enormous demand for water infrastructure investment. That tool helped utilities, policymakers, and technical assistance providers answer questions that previously required juggling data from a half-dozen sources. The feedback was strong, and the most common comment we heard was: "I don't live in Texas."
So we built it for everyone.
Over the past year, we undertook national discovery — learning from researchers, regulators, utilities, technical assistance providers, community organizations, and funders about the data they use, the questions they need to answer, and what they wished existed. Then we built a transparent and durable data pipeline capable of delivering it, and a tool to put it in people's hands.
The Data: 821 Variables, 30+ Datasets, One Unified Pipeline
The dataset is a product in its own right. Our data pipeline pulls from sources across three interconnected domains:
Water systems — EPA's Safe Drinking Water Information System (SDWIS), disaggregated rule violations over 5 and 10 years, State Revolving Fund award history, EPA Service Area Boundaries, and drinking water advisories collected from 13 states (boil water, do-not-drink, and do-not-use orders — the largest publicly accessible inventory of its kind in the U.S.).
Communities served — Census demographics and 10-year change for 50+ variables, annual median household income, estimated water rates, the CDC Social Vulnerability Index, and the Climate and Economic Justice Screening Tool (CEJST).
Environmental context — Well and intake watersheds, impaired waterways, underground storage tanks, facilities with Risk Management Plans, and NPDES permits — all linked to the water systems drawing from those sources.
We investigated 821 variables and distilled them into approximately 130 well-documented fields available in the tool. Everything is summarized at the water system level using Public Water System ID (PWSID), meaning data from watersheds, census geographies, and tabular sources has all been harmonized to a common unit. We used EPA's crosswalk and a mix of areal, household-weighted, and population-weighted interpolation methods to do this right — and we documented every step.
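As a rough illustration of the areal and population-weighted approaches mentioned above, here is a minimal Python sketch. It is not the pipeline's actual code. The `Overlap` record, its field names, the example PWSID, and all numbers are hypothetical, and it assumes the geometric intersection of tracts and service areas has already been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Overlap:
    """One piece of a census tract that falls inside a water system's service area.

    All field names are hypothetical, chosen for illustration only.
    """
    pwsid: str               # water system the piece belongs to
    tract_id: str            # census tract the piece comes from
    overlap_area: float      # area of the tract/service-area intersection
    tract_area: float        # total area of the tract
    overlap_pop: float       # estimated population living in the intersection
    tract_households: float  # a count variable reported for the whole tract
    tract_income: float      # a rate-like variable reported for the whole tract


def areal_sum(overlaps):
    """Areal interpolation: split each tract's count by its share of area."""
    totals = {}
    for o in overlaps:
        share = o.overlap_area / o.tract_area
        totals[o.pwsid] = totals.get(o.pwsid, 0.0) + o.tract_households * share
    return totals


def population_weighted_mean(overlaps):
    """Population-weighted interpolation: average a rate, weighted by people."""
    weighted = {}
    weights = {}
    for o in overlaps:
        weighted[o.pwsid] = weighted.get(o.pwsid, 0.0) + o.tract_income * o.overlap_pop
        weights[o.pwsid] = weights.get(o.pwsid, 0.0) + o.overlap_pop
    # Skip systems with no estimated population rather than divide by zero.
    return {p: weighted[p] / w for p, w in weights.items() if w > 0}


# Hypothetical system overlapping half of one tract and a quarter of another.
pieces = [
    Overlap("XX0000001", "tract_a", 2.0, 4.0, 300.0, 1000.0, 50000.0),
    Overlap("XX0000001", "tract_b", 1.0, 4.0, 100.0, 800.0, 70000.0),
]
households = areal_sum(pieces)
income = population_weighted_mean(pieces)
print(households, income)
```

Counts such as households are summed in proportion to area, while rate-like values such as income are averaged using the population in each overlap as the weight. A weighted average of tract medians is only an approximation of the true system-level median.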
The pipeline runs on AWS, updating key datasets daily, quarterly, or annually. The tool itself updates on a quarterly basis, after datasets pass data quality checks and expert review. All code is open source and available on our public GitHub repository.
How the National Drinking Water Dataset spatially combines different sources
Built for Real Work
Snapshot of EPIC’s Public Comment using the National Drinking Water Dataset - Texas’s HB500 $1B investment in water infrastructure
The tool is designed for the full spectrum of drinking water stakeholders — and different users will use it differently. Think of the dataset and tool as an engine that can power all kinds of vehicles.
For technical assistance providers, it eliminates the "too many tabs open" problem. Before engaging a community, our Funding Navigator team used to piece together Census data, SRF records, compliance history, and more from separate sources. The tool puts all of that in one place, freeing up time to actually support communities.
For policymakers, it enables more rigorous, data-grounded analysis. Using an earlier iteration of our national tool, EPIC submitted public comment on Texas HB500, a $1 billion grant investment in water infrastructure. Our analysis showed that proposed funding allocations disproportionately favored large systems, while small and very small systems serve the greatest share of Texans. Read more about how the Texas Water Development Board acted on our analysis by dedicating $42 million to systems serving under 1,000 people and increasing project proposal caps.
For researchers and academics, the tool lowers the barrier to move from question to analysis. Teams at ASU, Stanford, and UCLA have already been testing methods and generating insights with earlier versions.
For regulators, it supports more equitable deployment of funds by making system performance, community characteristics, and funding history visible in one place.
What You Can Do With It
The tool's main features include an interactive map with filters, a data table, dataset cards with documentation, and data downloads. Some quick use cases:
Find all drinking water systems in a given state, filtered by size, ownership type, or compliance status
Identify systems with open violations and climate vulnerability
Find small systems serving low-income communities that haven't received SRF funding
Download filtered datasets for your own analysis
Link directly to authoritative EPA compliance data for any system
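One of the use cases above, finding small, low-income systems without SRF funding, could be reproduced on a downloaded file along the following lines. This is a sketch under stated assumptions: the column names, sample rows, and income cutoff are hypothetical and are not the tool's actual schema. The 3,300-person ceiling follows EPA's usual "small system" size category.

```python
import csv
import io

# Stand-in for a downloaded file; column names and rows are hypothetical.
SAMPLE_CSV = """pwsid,population_served,median_household_income,srf_award_count
XX0000001,450,38000,0
XX0000002,2500,41000,2
XX0000003,85000,36000,0
XX0000004,1200,72000,0
XX0000005,3000,,0
"""


def find_unfunded_small_low_income(rows, max_population=3300, max_income=45000):
    """Return PWSIDs of small, low-income systems with no recorded SRF award."""
    matches = []
    for row in rows:
        income = row["median_household_income"]
        if income == "":
            continue  # no income estimate, so the system cannot be classified
        is_small = int(row["population_served"]) <= max_population
        is_low_income = float(income) < max_income
        is_unfunded = int(row["srf_award_count"]) == 0
        if is_small and is_low_income and is_unfunded:
            matches.append(row["pwsid"])
    return matches


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
result = find_unfunded_small_low_income(rows)
print(result)
```

Rows with a missing income value are skipped rather than treated as zero, so a gap in the data does not show up as a low-income community.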
The tool was developed in collaboration with the Center for Neighborhood Technology (CNT).
What We've Learned
This work is more ambitious than we initially imagined. Scaling from Texas to all 50 states wasn't 50 times the work — it was to the 50th power. A few things have become clear along the way:
Iteration matters, and it has to happen in the open. We won't get it right the first time. The pipeline is modular and open source precisely to make improvement efficient and community-driven.
Different users need different things. The drinking water space spans utility operators, community advocates, federal policymakers, and academics — all with different questions and different data needs. Walking the line between too much and too little complexity is core to the project's success.
The dataset and tool are both products. We built a data pipeline, not just a one-off dataset. When you turn on the tap here, the data should flow.
What's Next
We're now moving into Version 1.1 development this spring and summer, driven by feedback from users like you. We're also gearing up to launch usability sessions and group steering opportunities: dedicated spaces to engage practitioners, researchers, and advocates on what works now and what to build next.
Here's how you can get involved:
💻 Check it out! National Drinking Water Dataset + Explorer Tool
👋 Interested in collaborating? Sign up to get invited to events and sessions!
🎥 Want to learn more? View the release webinar recording and slide deck.
We built this with you, and we're continuing to build it with you. If you've found the work useful, or if you see what else is possible, we want to hear from you.
National Approach to Wildfire Data and Technology: Operations-Centered Innovation Pathways
Co-Written by James Puerini
Somewhere midway down the mountainous scrapheap of failed startups, one can find the dented chassis of a countertop juicer named the Juicero. The device cost between $400 and $700, was only compatible with proprietary QR-coded juice packs, and was ultimately superfluous, as the packs could be snipped open and squeezed by hand. It is a classic example of an over-designed solution that no one asked for, built for a problem that did not exist.
To avoid Juicero debacles, our national wildfire intelligence capability needs to establish an operations-centered innovation cycle from day one. Data and technology initiatives must be rooted in the explicit needs of the people who use those systems to discover new solutions and improve wildfire resiliency.
Not doing so will waste time and resources developing innovations that answer the wrong questions, fail when deployed, or simply don’t get adopted at all. Beyond that, leaders of this initiative face a particular challenge as integrators of operations housed in other institutions. They must evaluate, adapt, and coordinate needs, resources, and capabilities from across those institutions to deliver on the opportunity in front of them.
Committed leaders dedicated to personnel-identified solutions can spin up a sustainable cycle of solutioneering that focuses the power of nation-scale data for acutely scoped needs.
Making the Juice(ro) Worth the Squeeze: Three T's for an Operations-Centered Innovation Pathway
To modernize wildfire management effectively, the needs of forest management, frontline operations, analytics, and research must take precedence when setting national priorities for data and technology. In a high-stakes, capacity-constrained ecosystem, every decision, from which data to collect to which products to procure, must be tethered to end-user needs, translating investments into action and forging a culture of trust.
The cycle relies on three critical structures with analogs from the National Weather Service (NWS) as reference points.
*For more on these NOAA analogs, see the following: Hazardous Weather Testbed, Shadow Forecast Program, Tech Readiness Levels.
Operations-centered does not mean monodirectional; while practitioners push, leaders must pull. Executives in the national wildfire intelligence capability and their partner agencies need to step into the fray to ensure that money, clout, and time are allocated to projects with real promise.
For example, for large wildfire incidents, some experts envision the need for dedicated “Intelligence Officers” within Incident Management Teams to operationalize data for real-time decision-making. Expanding that logic to detail similar roles for prevention, detection, preparedness, mitigation, and recovery will improve the overall coordination of our response apparatus and save money, communities, natural places, and lives.
As they reach down their organizational chains, executives must also engage with the private sector, research institutions, and NGOs. The agility, ingenuity, and alternative resourcing these groups bring to the table pair with practitioner expertise to drive entirely new lines of inquiry for impact. Beyond direct engagement, a key element of national wildfire intelligence is the creation and maintenance of an open-data ecosystem that enables external innovators to access and shape that data into value-added services. Open data and the interoperability that supports its use will be covered in more detail in the two blogs that follow.
Resounding Calls for the Operations-Centered Innovation Pathway
The calls for an operations-centered innovation pathway are clarion from experts across the board. Across our conversations, this was the single most cited priority. And the rankings weren’t particularly close.
Wildland fire is an inherently place-based issue. Individual geographies ignite, burn, and recover differently. Community preparedness varies widely. Response resources are distributed unevenly. While centralizing data and technology (including their governance and administration) has significant upside, doing so without prioritizing on-the-ground knowledge and needs will backfire. Testing, evaluation, and customization must prioritize regional and local reality. This work gets done on the ground, and the needs of the people doing it should steer the actions of the organizations designed to support them.

