Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Electric utilities report a huge amount of information to the US government and other public agencies. This includes yearly, monthly, and even hourly data about fuel burned, electricity generated, operating expenses, power plant usage patterns and emissions. Unfortunately, much of this data is not released in well documented, ready-to-use, machine readable formats. Data from different agencies tends not to be standardized or easily used in tandem. Several commercial data services clean, package, and re-sell this data, but at prices which are too high to be accessible to many smaller stakeholders.
The Public Utility Data Liberation (PUDL) project takes information that’s already publicly available, and makes it publicly usable, by cleaning, standardizing, and cross-linking utility data from different sources in a single database. Thus far our primary focus has been on fuel use, generation, operating costs, and operation history. It currently includes data from:
We archive snapshots of the raw inputs on Zenodo and all our data processing uses those snapshots as a starting place for reproducibility.
You can find the source code that generates this database in the PUDL repository on GitHub. The PUDL project is coordinated by Catalyst Cooperative.
The data is updated by our automated nightly builds. When a build succeeds, the new data is uploaded to the AWS Open Data Registry.
We publish PUDL Data Dictionaries on Read the Docs which provide more descriptive information about the data.
Dataset header image courtesy of Gerry Machen via Flickr under a CC-BY-ND license
PUDL v2025.2.0 Data Release
This is our regular quarterly release for 2025Q1. It includes updates to all the datasets that are published with quarterly or higher frequency, plus initial versions of a few new data sources that have been in the works for a while.
One major change this quarter is that we are now publishing all processed PUDL data as Apache Parquet files, alongside our existing SQLite databases. See Data Access for more on how to access these outputs.
Some potentially breaking changes to be aware of:
In the EIA Form 930 – Hourly and Daily Balancing Authority Operations Report a number of new energy sources have been added, and some old energy sources have been split into more granular categories. See Changes in energy source granularity over time.
We are now running the EPA’s CAMD to EIA unit crosswalk code for each individual year starting from 2018, rather than just 2018 and 2021, resulting in more connections between these two datasets and changes to some sub-plant IDs. See the note below for more details.
Many thanks to the organizations who make these regular updates possible, especially GridLab, RMI, and the ZERO Lab at Princeton University. If you rely on PUDL and would like to help ensure that the data keeps flowing, please consider joining them as a PUDL Sustainer, as we are still fundraising for 2025.
New Data
EIA 176
Add a couple of semi-transformed interim EIA-176 (natural gas sources and dispositions) tables. They aren’t yet being written to the database, but are one step closer. See #3555 and PRs #3590, #3978. Thanks to @davidmudrauskas for moving this dataset forward.
Extracted these interim tables up through the latest 2023 data release. See #4002 and #4004.
EIA 860
Added EIA 860 Multifuel table. See #3438 and #3946.
FERC 1
Added three new output tables containing granular utility accounting data. See #4057, #3642 and the table descriptions in the data dictionary:
out_ferc1_yearly_detailed_income_statements
out_ferc1_yearly_detailed_balance_sheet_assets
out_ferc1_yearly_detailed_balance_sheet_liabilities
SEC Form 10-K Parent-Subsidiary Ownership
We have added some new tables describing the parent-subsidiary company ownership relationships reported in the SEC’s Form 10-K, Exhibit 21 “Subsidiaries of the Registrant”. Where possible these tables link the SEC filers or their subsidiary companies to the corresponding EIA utilities. This work was funded by a grant from the Mozilla Foundation. Most of the ML models and data preparation took place in the mozilla-sec-eia repository separate from the main PUDL ETL, as it requires processing hundreds of thousands of PDFs and the deployment of some ML experiment tracking infrastructure. The new tables are handed off as nearly finished products to the PUDL ETL pipeline. Note that these are preliminary, experimental data products and are known to be incomplete and to contain errors. Extracting data tables from unstructured PDFs and the SEC to EIA record linkage are necessarily probabilistic processes.
See PRs #4026, #4031, #4035, #4046, #4048, #4050 and check out the table descriptions in the PUDL data dictionary:
out_sec10k_parents_and_subsidiaries
core_sec10k_quarterly_filings
core_sec10k_quarterly_exhibit_21_company_ownership
core_sec10k_quarterly_company_information
Expanded Data Coverage
EPA CEMS
Added 2024 Q4 of CEMS data. See #4041 and #4052.
EPA CAMD EIA Crosswalk
In the past, the crosswalk in PUDL has used the EPA’s published crosswalk (run with 2018 data), and an additional crosswalk we ran with 2021 EIA 860 data. To ensure that the crosswalk reflects updates in both EIA and EPA data, we re-ran the EPA R code which generates the EPA CAMD EIA crosswalk with 4 new years of data: 2019, 2020, 2022 and 2023. Re-running the crosswalk pulls the latest data from the CAMD FACT API, which results in some changes to the generator and unit IDs reported on the EPA side of the crosswalk, which feeds into the creation of core_epa_assn_eia_epacamd.
The changes only result in the addition of new units and generators in the EPA data, with no changes to matches at the plant level. However, the updates to generator and unit IDs have resulted in changes to the subplant IDs: some EIA boilers and generators which previously had no matches to EPA data have now been matched to EPA unit data, resulting in an overall reduction in the number of rows in the core_epa_assn_eia_epacamd_subplant_ids table. See issue #4039 and PR #4056 for a discussion of the changes observed in the course of this update.
EIA 860M
Added EIA 860m through December 2024. See #4038 and #4047.
EIA 923
Added EIA 923 monthly data through September 2024. See #4038 and #4047.
EIA Bulk Electricity Data
Updated the EIA Bulk Electricity data to include data published up through 2024-11-01. See #4042 and PR #4051.
EIA 930
Updated the EIA 930 data to include data published up through the beginning of February 2025. See #4040 and PR #4054. 10 new energy sources were added and 3 were retired; see Changes in energy source granularity over time for more information.
Bug Fixes
Fix an accidentally swapped set of starting balance / ending balance column rename parameters in the pre-2021 DBF derived data that feeds into core_ferc1_yearly_other_regulatory_liabilities_sched278. See issue #3952 and PRs #3969, #3979. Thanks to @yolandazzz13 for making this fix.
Added preliminary data validation checks for several FERC 1 tables that were missing them. See #3860.
Fix spelling of Lake Huron and Lake Saint Clair in out_vcerare_hourly_available_capacity_factor and related tables. See issue #4007 and PR #4029.
Quality of Life Improvements
We added a sources parameter to pudl.metadata.classes.DataSource.from_id() in order to make it possible to use the pudl-archiver repository to archive datasets that won’t necessarily be ingested into PUDL. See this PUDL archiver issue and PRs #4003 and #4013.
Other PUDL v2025.2.0 Resources
PUDL v2025.2.0 Data Dictionary
PUDL v2025.2.0 Documentation
PUDL in the AWS Open Data Registry
PUDL v2025.2.0 in a free, public AWS S3 bucket: s3://pudl.catalyst.coop/v2025.2.0/
PUDL v2025.2.0 in a requester-pays GCS bucket: gs://pudl.catalyst.coop/v2025.2.0/
Zenodo archive of the PUDL GitHub repo for this release
PUDL v2025.2.0 release on GitHub
PUDL v2025.2.0 package in the Python Package Index (PyPI)
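Because processed tables are published as Apache Parquet files and each release has its own prefix in the public S3 bucket, a table's location can be assembled from the release tag and the table name. The sketch below is only an illustration, not the project's documented access method: it assumes each table is stored as `<table_name>.parquet` directly under the release prefix, and the helper names `parquet_uri` and `read_table` are hypothetical. Check the Data Access documentation for the actual layout.

```python
BUCKET = "s3://pudl.catalyst.coop"  # bucket named in the release notes


def parquet_uri(table: str, release: str = "v2025.2.0") -> str:
    """Build the URI of one table, ASSUMING a <release>/<table>.parquet layout."""
    return f"{BUCKET}/{release}/{table}.parquet"


def read_table(table: str, release: str = "v2025.2.0"):
    """Load one table with pandas (requires pandas, pyarrow and s3fs)."""
    import pandas as pd  # imported lazily so the URI helper has no dependencies

    # anon=True asks s3fs for unsigned requests, which a public bucket allows.
    return pd.read_parquet(
        parquet_uri(table, release), storage_options={"anon": True}
    )


uri = parquet_uri("out_ferc1_yearly_detailed_income_statements")
print(uri)
```

Only `parquet_uri` runs when the snippet is executed; `read_table` needs network access and the optional dependencies listed in its docstring.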
Contact Us
If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. Here's a bunch of different ways to get in touch:
Follow us on GitHub
Use the PUDL GitHub issue tracker to let us know about any bugs or data issues you encounter
GitHub Discussions is where we provide user support.
Watch our GitHub Project to see what we're working on.
Email us at hello@catalyst.coop for private communications.
On Mastodon: @CatalystCoop@mastodon.energy
On BlueSky: @catalyst.coop
On Twitter: @CatalystCoop
Connect with us on LinkedIn
Play with our data and notebooks on Kaggle
Combine our data with ML models on HuggingFace
Learn more about us on our website: https://catalyst.coop
Subscribe to our announcements list for email updates.
The Utility Energy Registry (UER) is a database platform that provides streamlined public access to aggregated community-scale utility-reported energy data. The UER is intended to promote and facilitate community-based energy planning and energy use awareness and engagement. On April 19, 2018, the New York State Public Service Commission (PSC) issued the Order Adopting the Utility Energy Registry under regulatory CASE 17-M-0315. The order requires utilities under its regulation to develop and report community energy use data to the UER.
This dataset includes electricity and natural gas usage data reported at the city, town, and village level collected under a data protocol in effect between 2016 and 2021. Other UER datasets include energy use data reported at the county and ZIP code level. Data collected after 2021 were collected according to a modified protocol. Those data may be found at https://data.ny.gov/Energy-Environment/Utility-Energy-Registry-Monthly-Community-Energy-U/4txm-py4p.
Data in the UER can be used for several important purposes such as planning community energy programs, developing community greenhouse gas emissions inventories, and relating how certain energy projects and policies may affect a particular community. It is important to note that the data are subject to privacy screening and fields that fail the privacy screen are withheld.
EIA previously collected sales and revenue data in a category called "Other." This category was defined as including activities such as public street and highway lighting, other sales to public authorities, sales to railroads and railways, and interdepartmental sales. EIA has revised its survey to separate the transportation sales and reassign the other activities to the commercial and industrial sectors as appropriate.
This is an electric utility data file that includes utility-level retail sales of electricity and associated revenue by end-use sector, State, and reporting month. The data source is the survey: Form EIA-826, "Monthly Electric Utility Sales and Revenue Report with State Distributions." The Form EIA-826 is used to collect retail sales of electricity and associated revenue, each month, from a statistically chosen sample of electric utilities in the United States. The respondents to the Form EIA-826 are chosen from the Form EIA-861, "Annual Electric Utility Report." The data also include, for each State, a record (UTILITYID "000000") containing data values which represent the arithmetic differences between the "estimated" State totals and the sum of the retail sales and associated revenue data reported by the respondents to the Form EIA-826.
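The UTILITYID "000000" record described above is a balancing entry: the estimated State total minus the sum of what the sampled respondents reported. A minimal sketch of that arithmetic follows; the function name, the sample values and the MWh unit are hypothetical and are not taken from any EIA file.

```python
def state_adjustment(estimated_total: float, reported: list[float]) -> float:
    """Value held by the UTILITYID "000000" record for one State and month:
    the estimated State total minus the sum of respondent-reported values."""
    return estimated_total - sum(reported)


# Hypothetical retail sales (MWh) reported by three sampled utilities.
reported_sales = [1200.0, 850.0, 430.0]
# Hypothetical estimated State total for the same sector and month.
estimated_state_total = 3000.0

adjustment = state_adjustment(estimated_state_total, reported_sales)
print(adjustment)  # 520.0
```

Adding the adjustment back to the reported values recovers the estimated State total, which is why the record is needed when summing the file by State.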
The data are compressed into a self-extracting zip file (f826yyyy.exe). This file expands into one DBF file (f826utilyyyy.dbf) that contains the yearly data and an ASCII text file (f826layoutyyyy.txt) that contains the file description and record layout for the database structure. The current year's file is a year-to-date file and is maintained in this monthly format until the data for the final month is finalized.
To expand the self-extracting zip file, type f826yyyy.exe from a DOS window, or double-click the file name in File Manager (Windows 3.x) or in Windows Explorer (Windows 95, 98, 2000, XP, or ME). Alternatively, click Start, then Run, select the name of the .EXE file to open, and click "OK." Expansion requires approximately 600K of space.
*Note: Substitute the applicable year for "yyyy" in the file name.
File Size: 200 k
Methodology is based on "Model-Based Sampling, Inference and Imputation."
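The file naming convention described above (substituting a four-digit year for "yyyy") can be sketched as a small helper. The function name and the example year are hypothetical; the three name patterns are the ones given in the description.

```python
def eia826_filenames(year: int) -> dict[str, str]:
    """Return the archive, data and layout file names for one data year."""
    yyyy = f"{year:04d}"
    return {
        "archive": f"f826{yyyy}.exe",        # self-extracting zip file
        "data": f"f826util{yyyy}.dbf",       # DBF file with the yearly data
        "layout": f"f826layout{yyyy}.txt",   # record layout description
    }


names = eia826_filenames(2005)  # 2005 is only an illustrative year
for role, filename in names.items():
    print(role, filename)
```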
Contact:
Charlene Harris-Russell
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The databases contain all the technical, financial, and tariff data collected through the study "Making power affordable in Africa and viable for its utilities." The final study and background papers are available at http://www.worldbank.org/affordableviablepowerforafrica. The objective of making the database public is to make data collected through the study available to utility companies, regulators, and practitioners to provide benchmarks and help inform analysis. The databases will be updated from time to time to make corrections or updates for latest data available and therefore may differ from data that appears in the reports. This database is a publication of the African Renewable Energy Access Program (AFREA), a World Bank Trust Fund Grant Program funded by the Kingdom of the Netherlands through ESMAP. It was prepared by staff of the International Bank for Reconstruction and Development / The World Bank.
Investments in infrastructure have been on the development agenda of Latin American and Caribbean (LCR) countries as they move towards economic and social progress. Investing in infrastructure is investing in human welfare by providing access to quality basic infrastructure services. Improving the performance of the electricity sector is one such major infrastructure initiative and the focus of this benchmarking data. A key initiative for both publicly and privately owned distribution utilities has been to upgrade their efficiency as well as to increase the coverage and quality of service. In order to accomplish this goal, this initiative serves as a clearing house for information regarding the country and utility level performance of the electricity distribution sector. This initiative allows countries and utilities to benchmark their performance in relation to other comparator utilities and countries. In doing so, this benchmarking data contributes to the improvement of the electricity sector by filling in knowledge gaps for the identification of the best performers (and practices) of the region. This benchmarking database consists of detailed information on 25 countries and 249 utilities in the region. The data collected for this benchmarking project is representative of 88 percent of the electrification in the region. Through in-house and field data collection, consultants compiled data based on accomplishments in output, coverage, input, labor productivity, operating performance, the quality of service, prices, and ownership. By serving as a mirror of good performance, the report allows for a comparative analysis and the ranking of utilities and countries according to the indicators used to measure performance. Although significant efforts have been made to ensure data comparability and consistency across time and utilities, the World Bank and the ESMAP do not guarantee the accuracy of the data included in this work.
Acknowledgement: This benchmarking database was prepared by a core team consisting of Luis Alberto Andres (Co-Task Team Leader), Jose Luis Guasch (Co-Task Team Leader), Julio A. Gonzalez, Georgeta Dragoiu, and Natalie Giannelli. The team benefited from data contributions from Jordan Z. Schwartz (Senior Infrastructure Specialist, LCSTR), Lucio Monari (Lead Energy Economist, LCSEG), Katharina B. Gassner (Senior Economist, FEU), and Martin Rossi (consultant). Funding was provided by the Energy Sector Management Assistance Program (ESMAP) and the World Bank. Comments and suggestions are welcome and can be sent to Luis Andres (landres@worldbank.org).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The databases contain all the technical, financial, and tariff data collected through the study "Making power affordable in Africa and viable for its utilities." The WB study uses national household expenditure surveys conducted since 2008 in 22 countries; it makes use of tariff schedules in effect as of July 2014 in 39 countries, including all of the 22 countries with household surveys. The objective of making the database public is to make data collected through the study available to utility companies, regulators, and practitioners to provide benchmarks and help inform analysis. The databases will be updated from time to time to make corrections or updates for latest data available and therefore may differ from data that appears in the reports. This database is a publication of the African Renewable Energy Access Program (AFREA), a World Bank Trust Fund Grant Program funded by the Kingdom of the Netherlands through ESMAP. It was prepared by staff of the International Bank for Reconstruction and Development / The World Bank. The full report is available at https://openknowledge.worldbank.org/handle/10986/25091
Last Updated: 26-Oct-2016
Citation: Trimble, Chris; Kojima, Masami; Perez Arroyo, Ines; Mohammadzadeh, Farah. 2016. Financial Viability of Electricity Sectors in Sub-Saharan Africa: Quasi-Fiscal Deficits and Hidden Costs. Policy Research Working Paper; No. 7788.
There are limited open source data available for determining water production/treatment and required energy for cities across the United States. This database represents the culmination of a two-year effort to obtain data from cities across the United States via open records requests in order to determine the state of the U.S. urban energy-water nexus. Data were requested at the daily or monthly scale when available for 127 cities across the United States, represented by 253 distinct water and sewer districts. Data were requested from cities larger than 100,000 people and from each state. In the case of states that did not have cities that met these criteria, the largest cities in those states were selected. The resulting database represents a drinking water service population of 81.4 million and a wastewater service population of 86.2 million people. Average daily demands for the United States were calculated to be 560 liters per capita for drinking water and 500 liters per capita for wastewater. The embedded energy within each of these resources is 340 kWh/1000 m3 and 430 kWh/1000 m3, respectively. Drinking water data at the annual scale are available for production volume (89 cities) and for embedded energy (73 cities). Annual wastewater data are available for treated volume (104 cities) and embedded energy (90 cities). Monthly data are available for drinking water volume and embedded energy (73 and 56 cities) and wastewater volume and embedded energy (88 and 70 cities). Please see the two related papers cited below. Metadata are included with this submission. Each folder name is a city that contributed data to the collection effort (City+State Abbreviation). Within each folder is a .csv file with drinking water and wastewater volume and energy data. A READ-ME file within each folder details the contents of the folder along with any relevant information pertaining to data collection. Data are on a monthly timescale when available, and yearly if not.
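As a rough worked example of how the per-capita volumes and embedded-energy intensities quoted above combine, the sketch below converts liters per capita per day into kWh per capita per day. It simply multiplies the two national averages from the description, which is an illustrative simplification rather than a figure reported by the dataset; the function name is hypothetical.

```python
LITERS_PER_M3 = 1000.0


def daily_energy_kwh_per_capita(liters_per_capita_day: float,
                                kwh_per_1000_m3: float) -> float:
    """Embedded energy per person per day, from a daily volume in liters
    and an energy intensity expressed in kWh per 1000 cubic meters."""
    volume_m3 = liters_per_capita_day / LITERS_PER_M3
    return volume_m3 * kwh_per_1000_m3 / 1000.0


# Averages quoted in the dataset description.
drinking = daily_energy_kwh_per_capita(560, 340)    # about 0.19 kWh
wastewater = daily_energy_kwh_per_capita(500, 430)  # about 0.215 kWh

print(round(drinking, 4), round(wastewater, 4))
```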
Please cite the following papers when using the database:
Chini, C.M. and Stillwell, A.S. (2017). The State of U.S. Urban Water: Data and the Energy-Water Nexus. Water Resources Research. 54(3). DOI: https://doi.org/10.1002/2017WR022265
Chini, C.M. and Stillwell, A. (2016). Where are all the data? The case for a comprehensive water and wastewater utility database. Journal of Water Resources Planning and Management. 143(3). DOI: 10.1061/(ASCE)WR.1943-5452.0000739
The Utility Rate Database (URDB) is a free storehouse of rate structure information from utilities in the United States. Here, you can search for your utilities and rates to find out exactly how you are charged for your electric energy usage. Understanding this information can help reduce your bill, for example, by running your appliances during off-peak hours (times during the day when electricity prices are less expensive) and help you make more informed decisions regarding your energy usage.
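To illustrate why shifting usage to off-peak hours lowers a bill, here is a minimal time-of-use sketch. The two-period rate structure, the prices and the usage figures are all hypothetical; real URDB rates can include tiers, demand charges and fixed fees that this example ignores.

```python
def energy_charge(kwh_by_period: dict[str, float],
                  rate_by_period: dict[str, float]) -> float:
    """Sum of usage times price over each time-of-use period."""
    return sum(kwh * rate_by_period[period]
               for period, kwh in kwh_by_period.items())


# Hypothetical prices in dollars per kWh.
rates = {"peak": 0.30, "off_peak": 0.10}

# Same 300 kWh of monthly usage, before and after shifting 100 kWh off-peak.
usage_before = {"peak": 200.0, "off_peak": 100.0}
usage_after = {"peak": 100.0, "off_peak": 200.0}

cost_before = energy_charge(usage_before, rates)
cost_after = energy_charge(usage_after, rates)
savings = cost_before - cost_after

print(round(cost_before, 2), round(cost_after, 2), round(savings, 2))
```

Total consumption is unchanged between the two cases; only the timing differs, so the whole saving comes from the price gap between periods.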
Rates are also extremely important to the energy analysis community for accurately determining the value and economics of distributed generation such as solar and wind power. In the past, collecting rates has been an effort duplicated across many institutions. Rate collection can be tedious and slow. With the introduction of the URDB, however, OpenEI aims to change how analysis of rates is performed. The URDB allows anyone to access these rates in a computer-readable format for use in their tools and models. OpenEI provides an API for software to automatically download the appropriate rates, thereby allowing detailed economic analysis to be done without ever having to directly handle complex rate structures. Essentially, rate collection and processing that used to take weeks or months can now be done in seconds!
NREL’s System Advisor Model (SAM, formerly Solar Advisor Model) currently has the ability to communicate with the OpenEI URDB over the internet. SAM can download any rate from the URDB directly into the program, thereby enabling users to conduct detailed studies on various power systems ranging in size from a small residential rooftop solar system to large utility scale installations. Other applications available at NREL, such as OpenPV and IMBY, will also utilize the URDB data.
Upcoming features include better support for entering net metering parameters, maps to summarize the data, geolocation capabilities, and hundreds of additional rates!
Find a list of moving companies regulated by the DPU, along with their rates (when available).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data about power company service areas and their announcements about outages are critical for the effective coordination of resources after disasters, and also for building community and neighborhood resilience. As part of the 2015 White House Mapathon, the Department of Energy's Office of Electricity created a national geospatial database of power company service areas with pointers to public outage information (e.g., through Twitter, web sites, and toll-free telephone numbers).
Mapathon participants researched public outage information state by state, and populated a lookup table so that disaster-impacted residents, tourists, first responders and relief volunteers can easily get to the information they need on scope and estimated restore times for power outages. This project benefited from participation of private and public sector folks who need this data for their work, and of third party app developers such as Red Cross and The Weather Channel who will incorporate this data into the information services they offer their users.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database represents a list of community solar projects identified through various sources as of Spring 2018. The list has been reviewed but errors may exist and the list may not be comprehensive. Errors in the sources (e.g., press releases) may be duplicated in the list. Blank spaces represent missing information. NREL invites input to improve the database, including to correct erroneous information, add missing projects, fill in missing information, and remove inactive projects. Updated information can be submitted to Eric O'Shaughnessy at eric.oshaughnessy@nrel.gov.
REVISED 1/2/2019. SEE UPDATE LINK BELOW. This database contains unit cost information for different components that may be used to integrate distributed photovoltaic (D-PV) systems onto distribution systems. Some of these upgrades and costs may also apply to integration of other distributed energy resources (DER). Which components are required, and how many of each, is system-specific and should be determined by analyzing the effects of distributed PV at a given penetration level on the circuit of interest, in combination with engineering assessments of the efficacy of different solutions to increase the ability of the circuit to host additional PV as desired. The current state of the distribution system should always be considered in these types of analysis. The data in this database was collected from a variety of utilities, PV developers, technology vendors, and published research reports. Where possible, we have included information on the source of each data point and relevant notes. In some cases where the data provided is sensitive or proprietary, we were not able to specify the source but provide other information that may be useful to the user (e.g., year, location where equipment was installed). NREL has carefully reviewed these sources prior to inclusion in this database. Additional information about the database, data sources, and assumptions is included in the Unit_cost_database_guide.doc file included in this submission. This guide provides important information on what costs are included in each entry. Please refer to this guide before using the unit cost database for any purpose.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
BD-L-TC - water reservoirs (point) from the official carto-/topographic database. The BD-L-TC is a vector dataset at a scale of 1:5000 that represents objects on the earth's surface across the national territory, with attributes in German, French and English. Data transformed into the INSPIRE data model. Description copied from catalog.inspire.geoportail.lu.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Russia Public Utility Electricity Generation: Diesel Power Stations data was reported at 360.571 kWh mn in Dec 2016. This records an increase from the previous number of 318.089 kWh mn for Nov 2016. Russia Public Utility Electricity Generation: Diesel Power Stations data is updated monthly, with a median of 362.486 kWh mn from Jan 2010 to Dec 2016, with 84 observations. The data reached an all-time high of 1,490.000 kWh mn in Dec 2011 and a record low of 223.595 kWh mn in Jun 2016. Russia Public Utility Electricity Generation: Diesel Power Stations data remains in active status in CEIC and is reported by Federal State Statistics Service. The data is categorized under Russia Premium Database’s Energy Sector – Table RU.RBD002: Electricity Generation: Public Utility Electricity Generation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Russia Public Utility Electricity Generation: Hydroelectric Pumped Storage Power Stations data was reported at 163.842 kWh mn in Dec 2016. This records an increase from the previous number of 159.590 kWh mn for Nov 2016. Russia Public Utility Electricity Generation: Hydroelectric Pumped Storage Power Stations data is updated monthly, with a median of 154.652 kWh mn from Jan 2010 to Dec 2016, with 84 observations. The data reached an all-time high of 168.900 kWh mn in Jul 2010 and a record low of 0.200 kWh mn in Mar 2013. Russia Public Utility Electricity Generation: Hydroelectric Pumped Storage Power Stations data remains in active status in CEIC and is reported by Federal State Statistics Service. The data is categorized under Russia Premium Database’s Energy Sector – Table RU.RBD002: Electricity Generation: Public Utility Electricity Generation.