9 datasets found

Public Utility Data Liberation Project (PUDL) Data Release
data.niaid.nih.gov
Updated Feb 14, 2025
Cite
Belfer, Ella (2025). Public Utility Data Liberation Project (PUDL) Data Release [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3653158
Dataset updated
Feb 14, 2025
Dataset provided by
Catalyst Cooperative
Authors
Selvans, Zane A.; Gosnell, Christina M.; Sharpe, Austen; Norman, Bennett; Schira, Zach; Lamb, Katherine; Xia, Dazhong; Belfer, Ella
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PUDL v2025.2.0 Data Release

This is our regular quarterly release for 2025Q1. It includes updates to all the datasets that are published with quarterly or higher frequency, plus initial versions of a few new data sources that have been in the works for a while.

One major change this quarter is that we are now publishing all processed PUDL data as Apache Parquet files, alongside our existing SQLite databases. See Data Access for more on how to access these outputs.
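As a rough illustration of what working with the Parquet outputs might look like, the sketch below builds the location of a single table's file inside the public S3 bucket listed under "Other PUDL v2025.2.0 Resources". The helper name `parquet_uri` is hypothetical, and the one-file-per-table `<table>.parquet` layout directly under the version prefix is an assumption; the Data Access documentation is the authority on the actual layout.

```python
# Hypothetical helper, not part of the PUDL package.
# ASSUMPTION: each table is published as "<table>.parquet" directly under
# the versioned prefix of the bucket named in these release notes.
S3_BASE = "s3://pudl.catalyst.coop"


def parquet_uri(table: str, version: str = "v2025.2.0", base: str = S3_BASE) -> str:
    """Return the assumed URI of one table's Parquet file for a release."""
    if not table or "/" in table:
        raise ValueError(f"not a plain table name: {table!r}")
    return f"{base.rstrip('/')}/{version}/{table}.parquet"


if __name__ == "__main__":
    uri = parquet_uri("out_ferc1_yearly_detailed_income_statements")
    print(uri)
    # With pandas and an S3-capable filesystem package installed, the file
    # could then be loaded with pandas.read_parquet(uri); anonymous access
    # may need extra storage options, which are not shown here.
```

Keeping the version in the path pins an analysis to one release, so results do not shift when a later quarterly release changes a table.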

Some potentially breaking changes to be aware of:

In the EIA Form 930 – Hourly and Daily Balancing Authority Operations Report a number of new energy sources have been added, and some old energy sources have been split into more granular categories. See Changes in energy source granularity over time.

We are now running the EPA’s CAMD to EIA unit crosswalk code for each individual year starting from 2018, rather than just 2018 and 2021, resulting in more connections between these two datasets and changes to some sub-plant IDs. See the note below for more details.

Many thanks to the organizations who make these regular updates possible! Especially GridLab, RMI, and the ZERO Lab at Princeton University. If you rely on PUDL and would like to help ensure that the data keeps flowing, please consider joining them as a PUDL Sustainer, as we are still fundraising for 2025.

New Data

EIA 176

Add a couple of semi-transformed interim EIA-176 (natural gas sources and dispositions) tables. They aren’t yet being written to the database, but are one step closer. See #3555 and PRs #3590, #3978. Thanks to @davidmudrauskas for moving this dataset forward.

Extracted these interim tables up through the latest 2023 data release. See #4002 and #4004.

EIA 860

Added EIA 860 Multifuel table. See #3438 and #3946.

FERC 1

Added three new output tables containing granular utility accounting data. See #4057, #3642 and the table descriptions in the data dictionary:

out_ferc1_yearly_detailed_income_statements

out_ferc1_yearly_detailed_balance_sheet_assets

out_ferc1_yearly_detailed_balance_sheet_liabilities

SEC Form 10-K Parent-Subsidiary Ownership

We have added some new tables describing the parent-subsidiary company ownership relationships reported in the SEC’s Form 10-K, Exhibit 21 “Subsidiaries of the Registrant”. Where possible these tables link the SEC filers or their subsidiary companies to the corresponding EIA utilities. This work was funded by a grant from the Mozilla Foundation. Most of the ML models and data preparation took place in the mozilla-sec-eia repository separate from the main PUDL ETL, as it requires processing hundreds of thousands of PDFs and the deployment of some ML experiment tracking infrastructure. The new tables are handed off as nearly finished products to the PUDL ETL pipeline. Note that these are preliminary, experimental data products and are known to be incomplete and to contain errors. Extracting data tables from unstructured PDFs and the SEC to EIA record linkage are necessarily probabilistic processes.

See PRs #4026, #4031, #4035, #4046, #4048, #4050 and check out the table descriptions in the PUDL data dictionary:

out_sec10k_parents_and_subsidiaries

core_sec10k_quarterly_filings

core_sec10k_quarterly_exhibit_21_company_ownership

core_sec10k_quarterly_company_information

Expanded Data Coverage

EPA CEMS

Added 2024 Q4 of CEMS data. See #4041 and #4052.

EPA CAMD EIA Crosswalk

In the past, the crosswalk in PUDL has used the EPA’s published crosswalk (run with 2018 data), and an additional crosswalk we ran with 2021 EIA 860 data. To ensure that the crosswalk reflects updates in both EIA and EPA data, we re-ran the EPA R code which generates the EPA CAMD EIA crosswalk with 4 new years of data: 2019, 2020, 2022 and 2023. Re-running the crosswalk pulls the latest data from the CAMD FACT API, which results in some changes to the generator and unit IDs reported on the EPA side of the crosswalk, which feeds into the creation of core_epa_assn_eia_epacamd.

The changes only result in the addition of new units and generators in the EPA data, with no changes to matches at the plant level. However, the updates to generator and unit IDs have resulted in changes to the subplant IDs: some EIA boilers and generators which previously had no matches to EPA data have now been matched to EPA unit data, resulting in an overall reduction in the number of rows in the core_epa_assn_eia_epacamd_subplant_ids table. See issue #4039 and PR #4056 for a discussion of the changes observed in the course of this update.

EIA 860M

Added EIA 860m through December 2024. See #4038 and #4047.

EIA 923

Added EIA 923 monthly data through September 2024. See #4038 and #4047.

EIA Bulk Electricity Data

Updated the EIA Bulk Electricity data to include data published up through 2024-11-01. See #4042 and PR #4051.

EIA 930

Updated the EIA 930 data to include data published up through the beginning of February 2025. See #4040 and PR #4054. 10 new energy sources were added and 3 were retired; see Changes in energy source granularity over time for more information.

Bug Fixes

Fix an accidentally swapped set of starting balance / ending balance column rename parameters in the pre-2021 DBF derived data that feeds into core_ferc1_yearly_other_regulatory_liabilities_sched278. See issue #3952 and PRs #3969, #3979. Thanks to @yolandazzz13 for making this fix.

Added preliminary data validation checks for several FERC 1 tables that were missing them. See #3860.

Fix spelling of Lake Huron and Lake Saint Clair in out_vcerare_hourly_available_capacity_factor and related tables. See issue #4007 and PR #4029.

Quality of Life Improvements

We added a sources parameter to pudl.metadata.classes.DataSource.from_id() in order to make it possible to use the pudl-archiver repository to archive datasets that won’t necessarily be ingested into PUDL. See this PUDL archiver issue and PRs #4003 and #4013.

Other PUDL v2025.2.0 Resources

PUDL v2025.2.0 Data Dictionary

PUDL v2025.2.0 Documentation

PUDL in the AWS Open Data Registry

PUDL v2025.2.0 in a free, public AWS S3 bucket: s3://pudl.catalyst.coop/v2025.2.0/

PUDL v2025.2.0 in a requester-pays GCS bucket: gs://pudl.catalyst.coop/v2025.2.0/

Zenodo archive of the PUDL GitHub repo for this release

PUDL v2025.2.0 release on GitHub

PUDL v2025.2.0 package in the Python Package Index (PyPI)

Contact Us

If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. Here's a bunch of different ways to get in touch:

Follow us on GitHub

Use the PUDL GitHub issue tracker to let us know about any bugs or data issues you encounter.

GitHub Discussions is where we provide user support.

Watch our GitHub Project to see what we're working on.

Email us at hello@catalyst.coop for private communications.

On Mastodon: @CatalystCoop@mastodon.energy

On BlueSky: @catalyst.coop

On Twitter: @CatalystCoop

Connect with us on LinkedIn

Play with our data and notebooks on Kaggle

Combine our data with ML models on HuggingFace

Learn more about us on our website: https://catalyst.coop

Subscribe to our announcements list for email updates.
Public Utility Data Liberation Project (PUDL) Data Release
zenodo.org
bin, json, zip
Updated Aug 20, 2024
Cite
Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Bennett Norman; Trenton Bush; Zach Schira; Katherine Lamb; Dazhong Xia; Ella Belfer (2024). Public Utility Data Liberation Project (PUDL) Data Release [Dataset]. http://doi.org/10.5281/zenodo.13346011
Available download formats: zip, json, bin
Unique identifier
https://doi.org/10.5281/zenodo.13346011
Dataset updated
Aug 20, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Bennett Norman; Trenton Bush; Zach Schira; Katherine Lamb; Dazhong Xia; Ella Belfer
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PUDL v2024.8.0 Data Release

This is our regular quarterly release for 2024Q3. It includes quarterly updates to all datasets that are updated with quarterly or higher frequency by their publishers, including EIA-860M, EIA-923 (YTD data), EIA-930, the EIA’s bulk electricity API data (used to fill in missing fuel prices), and the EPA CEMS hourly emissions data.

Annual datasets which have been published since our last quarterly release have also been integrated. These include FERC Forms 1, 2, 6, 60, and 714, and the NREL ATB.

This release also includes provisional versions of the annual 2023 EIA-860 and EIA-923 datasets, whose final release will not happen until the fall.

New Data Coverage

FERC Form 1

Integrated FERC Form 1 data from 2023 into the main PUDL SQLite DB. See issue #3700 and PR #3701. This required updating to a new version of the catalystcoop.ferc_xbrl_extractor package because there are now multiple XBRL taxonomies in use by FERC in different years, or even within the same year. See this PR for more details, as well as issue #3544 and PR #3710.

FERC Forms 2, 6, 60, & 714

Updated the ferc_to_sqlite settings to extract 2023 XBRL data for FERC Forms 2, 6, 60, and 714 and add them to their respective SQLite databases. Note that this data is not yet being processed beyond the conversion from XBRL to SQLite. See PR #3710.

EIA AEO

Added new tables from EIA AEO table 54:

core_eiaaeo_yearly_projected_fuel_cost_in_electric_sector_by_type contains fuel costs for the electric power sector. These are broken out by fuel type, and include both nominal USD per MMBtu as well as real 2022 USD per MMBtu. See issue #3649 and PR #3656.

EIA 860

Added EIA 860 early release data from 2023. This included adding a new tab with proposed energy storage generators as well as adding a number of new columns regarding energy storage and solar generators. See issue #3676 and PR #3681.

Added EIA 860m data through June 2024. See issue #3759 and PR #3767.

EIA 923

Added EIA 923 early release data from 2023. See #3719 and PR #3721.

Added EIA 923 monthly data through May as part of the Q2 quarterly release. See #3760 and #3768.

EIA 930

Added EIA 930 hourly data through the end of July as part of the Q2 quarterly release. See #3761 and #3789.

EPA CEMS

Added 2024 Q2 of CEMS data. See #3762 and #3769.

EIA Bulk Electricity Data

Updated the EIA Bulk Electricity data archive to include data that was available as of 2024-08-01, which covers up through 2024-05-01 (3 months more than the previously used archive). See #3763 and PR #3785.

FERC 714

Added core_ferc714_yearly_planning_area_demand_forecast based on FERC Form 714, Part III, Schedule 2b. Data includes forecasted demand and net energy load. See issue #3519 and PR #3670.

NREL ATB

Added 2024 NREL ATB data. This includes adding a new tax credit case, model_tax_credit_case_nrelatb, a breakout of capex_grid_connection_per_kw for all technologies, and more detailed nuclear breakdowns of fuel_cost_per_mwh. Simultaneously, updated the docs.dev.existing_data_updates documentation to make it easier to add future years of data. See #3706 and #3719.

Updated NREL ATB data to include error corrections in the 2024 data. See #3777 and PR #3778.

Data Cleaning

When generator_operating_date values are too inconsistent to be harvested successfully, we now take the last reported date in EIA 860 and 860M. See #423 and PR #3967.

Added the generator_operating_date field into core_eia860m_changelog_generators, adding 860M reported generator operating dates into the changelog table. This table is not harvested, and thus does not affect the generator_operating_date values reported in other core EIA tables. See #3722 and PR #3751.

Bug Fixes

Disabled filling of missing values using rolling averages for the fuel_cost_per_mmbtu column in the out_eia923_fuel_receipts_costs table, as it was resulting in some anomalously high fuel prices. See #3716. This results in about 2% more records in the table being left NA after filling with the average prices for that fuel type for the state and month found in the bulk EIA API data.
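The remaining fill step described above can be pictured roughly as follows. This is a minimal sketch and not the PUDL implementation: the function name `fill_fuel_cost` and the `state`, `report_month`, and `fuel_type` keys are hypothetical, and the bulk EIA API averages are assumed to be already available as a plain dictionary.

```python
# Illustrative sketch only. Names other than fuel_cost_per_mmbtu are
# hypothetical; bulk_avg stands in for averages from the bulk EIA API data.
def fill_fuel_cost(records, bulk_avg):
    """Fill missing fuel costs from (state, month, fuel type) averages.

    Records with no matching average stay None, i.e. they are left NA
    rather than being filled from a rolling average.
    """
    filled = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's records are not mutated
        if rec["fuel_cost_per_mmbtu"] is None:
            key = (rec["state"], rec["report_month"], rec["fuel_type"])
            rec["fuel_cost_per_mmbtu"] = bulk_avg.get(key)
        filled.append(rec)
    return filled


if __name__ == "__main__":
    records = [
        {"state": "CO", "report_month": "2023-01", "fuel_type": "gas",
         "fuel_cost_per_mmbtu": None},
        {"state": "WY", "report_month": "2023-01", "fuel_type": "gas",
         "fuel_cost_per_mmbtu": None},
    ]
    bulk_avg = {("CO", "2023-01", "gas"): 4.5}
    for row in fill_fuel_cost(records, bulk_avg):
        print(row)
```

Leaving unmatched records empty trades a little coverage for fewer implausible prices, which matches the roughly 2% increase in NA values noted above.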

Quality of Life Improvements

The full ETL settings are now read directly from etl_full.yml instead of using default values defined in the settings classes. This also results in the settings showing up in the Dagster UI Launchpad, which previously they didn’t, leading to confusion when trying to re-run the FERC to SQLite conversions. See #3710.

mlflow experiment tracking has been disabled by default when running the DAG, since it is only really helpful during development of new record linkage or other ML workflows. See #3710.

Other PUDL v2024.8.0 Resources

PUDL v2024.8.0 Data Dictionary

PUDL v2024.8.0 Documentation

PUDL in the AWS Open Data Registry

PUDL v2024.8.0 in a free, public AWS S3 bucket: s3://pudl.catalyst.coop/v2024.8.0/

PUDL v2024.8.0 in a requester-pays GCS bucket: gs://pudl.catalyst.coop/v2024.8.0/

Zenodo archive of the PUDL GitHub repo for this release

PUDL v2024.8.0 release on GitHub

PUDL v2024.8.0 package in the Python Package Index (PyPI)

Contact Us

If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data.
Public Utility Data Liberation Project (PUDL) Data Release
zenodo.org
bin, json, zip
Updated Aug 15, 2025
Cite
Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Zach Schira; Katherine Lamb; Dazhong Xia; Ella Belfer; Kathryn Mazaitis; Marianne Hoogeveen (2025). Public Utility Data Liberation Project (PUDL) Data Release [Dataset]. http://doi.org/10.5281/zenodo.16878930
Available download formats: zip, json, bin
Unique identifier
https://doi.org/10.5281/zenodo.16878930
Dataset updated
Aug 15, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Zach Schira; Katherine Lamb; Dazhong Xia; Ella Belfer; Kathryn Mazaitis; Marianne Hoogeveen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
v2025.8.0 (2025-08-14)

This is a regular quarterly release of PUDL. It includes new 2024 annual updates for a number of datasets (FERC Forms 2, 6, 60, & 714), and a minor update to the 2024 FERC Form 1 data that includes late filings & revisions. It also includes year-to-date updates for the monthly and quarterly datasets, including EIA-860M, EIA-923, EIA-930, and the EPA CEMS hourly emissions. There were also a number of data processing bug fixes and data usability improvements. See the full notes below for details.

New Data

Thanks to contributions from @alexclippinger, we’ve added cleaned EIA923 Schedule 8A Byproduct Disposition to the PUDL database as _core_eia923_yearly_byproduct_disposition. Once harvested, this table will be replaced with a well-normalized version of the same data, but it is being published in this form until then. See #4100, #2448, and #4502.

Expanded Data Coverage

EIA-860M

Updated EIA-860M monthly generator report with newly published data for May and June of 2025. See issue #4379 and PR #4536.

EIA-923

Added EIA-923 data through May 2025. See #4516 and #4538.

EIA 930

Updated EIA 930 data published up through the beginning of August 2025. See #4517 and PR #4523.

EIA Bulk Electricity API

Updated the EIA Bulk Electricity data to include data published up through the beginning of August 2025. See #4519 and PR #4523.

EPA CEMS

Added EPA CEMS data through June 2025. See #4518 and #4531.

FERC Form 1

Updated FERC Form 1 2024 data to include late respondents. See #4493 and #4507.

FERC Forms 2, 6 and 60

Updated our extraction of FERC Forms 2, 6, and 60 to raw SQLite databases to include 2024 data. See #4418 and #4433.

FERC Form 714

Integrated 2024 data for FERC Form 714. See issue #4409 and PR #4530.

PHMSA Gas Data

Extracted 2023 and 2024 PHMSA distribution and transmission data to raw assets. This data is not currently published to the PUDL database. See #4449 and #4470.

Extracted 1970 through 1989 PHMSA transmission data to raw assets. This data is not currently published to the PUDL database. See #3290 and #4500.

Quality of Life Improvements

The output of dbt_helper update-tables now conforms to the format that our pre-commit hooks expect, reducing annoying back-and-forth and diffs. See #4119 and #4401.

Improved behavior of dbt_helper when interacting with row count test definitions as well as updating the row counts stored in dbt seed tables: the logic for writing a new table dbt schema no longer includes automatically adding a row count test. Also, the logic for updating row counts now depends on whether a test has been defined in the dbt schema, whether any existing row counts for that table are present in the seed table, as well as user provided settings such as --clobber.

Stopped running code checks in CI when only the documentation has changed. See issue #4410 and PR #4429.

Added utility_id_ferc1_dbf and utility_id_ferc1_xbrl columns into all ferc1 output tables. See #4365 and PR #4528.

Bug Fixes

Fixed bug in how we were labeling the data_maturity of EIA 923. See issue #4328 and PR #4392.

Fixed bug in how we were repairing a misfiled EIA code in core_ferc714_respondent_id. See issue #4439 and PR #4497.

Fixed bug in how we were removing duplicates in core_eia923_monthly_generation, resulting in ~400 more records in this table over several years. See details in PR #4538.

Documentation

Migrated table description metadata into new format; see epic #4358 for issues & PRs for all source groups.

This included renaming two of the preliminarily published _core tables to better conform with our table naming conventions. Table _core_eia923_cooling_system_information is now _core_eia923_monthly_cooling_system_information and _core_eia923_fgd_operation_maintenance is now _core_eia923_yearly_fgd_operation_maintenance. See #4422.

Added data source pages for:

EPA CAMD to EIA Power Sector Data Crosswalk; see issue #4376 and PR #4403

New Tests and Data Validations

EIA-930 and FERC-714 Hourly Imputed Demand

Added checks which ensure that only hourly electricity demand values which are flagged for imputation differ significantly between their reported and imputed values. Added checks that the missingness of various columns in the hourly reported demand and imputed demand is within expected ranges. Years which are dropped due to insufficient data for meaningful imputation are now explicitly flagged with BAD_YEAR. Affected tables include out_eia930_hourly_operations, out_eia930_hourly_subregion_demand, and out_ferc714_hourly_planning_area_demand. See PR #4334.
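The first of these checks can be sketched in a few lines. This is an illustration under stated assumptions, not the validation code from PR #4334: the function name `unflagged_changes`, the list-based inputs, and the tolerance are all hypothetical.

```python
import math


# Hypothetical check: values NOT flagged for imputation should be the same
# before and after imputation, within a small relative tolerance.
def unflagged_changes(reported, imputed, flagged, rel_tol=1e-6):
    """Return the positions where an unflagged value changed."""
    if not (len(reported) == len(imputed) == len(flagged)):
        raise ValueError("all three sequences must have the same length")
    bad = []
    for i, (rep, imp, is_flagged) in enumerate(zip(reported, imputed, flagged)):
        if is_flagged:
            continue  # flagged values are allowed to change
        if rep is None or imp is None:
            if rep is not imp:  # a value appeared or vanished without a flag
                bad.append(i)
        elif not math.isclose(rep, imp, rel_tol=rel_tol):
            bad.append(i)
    return bad


if __name__ == "__main__":
    reported = [100.0, None, 120.0, 90.0]
    imputed = [100.0, 105.0, 120.0, 95.0]
    flagged = [False, True, False, False]
    print(unflagged_changes(reported, imputed, flagged))
```

An empty result means the imputation only touched values it was supposed to touch; any returned position points at a value that changed without being flagged.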

Check for entirely null column-years

Previously we had a data validation check that ensured there were no entirely null columns applied to a handful of tables. Such columns were typically the result of typos or failures to update column names, or application
Public Utility Data Liberation Project (PUDL) Data Release
zenodo.org
application/gzip, bin +1
Updated May 27, 2024
Cite
Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Bennett Norman; Trenton Bush; Zach Schira; Katherine Lamb; Dazhong Xia; Ella Belfer (2024). Public Utility Data Liberation Project (PUDL) Data Release [Dataset]. http://doi.org/10.5281/zenodo.11292273
Available download formats: bin, json, application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.11292273
Dataset updated
May 27, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Bennett Norman; Trenton Bush; Zach Schira; Katherine Lamb; Dazhong Xia; Ella Belfer
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PUDL v2024.5.0 Data Release

We've just completed our quarterly integration of EIA data sources for 2024Q2 (in support of RMI's Utility Transition Hub) and have also added a bunch of new tables over the last few months in an effort to better support energy system modelers (with support from GridLab).

New Data Coverage

EIA-860 & EIA-923

Added cleaned EIA860 Schedule 8E FGD Equipment and EIA923 Schedule 8C FGD Operation and Maintenance data to the PUDL database as _core_eia923_fgd_operation_maintenance and _core_eia860_fgd_equipment. Once harvested, these tables will eventually be removed from the database, but they are being published until then. See issues #3394 and #3392, and PR #3403.

Added new core_eia860_scd_generators_wind table from EIA860 Schedule 3.2 which contains wind generator attributes. See PRs #3522 and #3494.

Added new core_eia860_scd_generators_solar table from EIA860 Schedule 3.3 which contains solar generator attributes. See PRs #3524 and #3482.

Added new core_eia860_scd_generators_energy_storage table from EIA860 Schedule 3.4 which contains energy storage generator attributes. See PRs #3488 and #3526.

Added new core_eia923_monthly_energy_storage table from EIA923 which contains monthly energy and fuel consumption metrics. See PRs #3516 and #3546.

Added 2024 Q1 EIA923 and EIA860m data. See issues #3617 and #3618, and PR #3625.

GridPath RA Toolkit

Added a new gridpathratoolkit data source containing hourly wind and solar generation profiles from the GridPath Resource Adequacy Toolkit. See our documentation and the new Zenodo archive, PR #3489 and this PUDL archiver issue.

Integrated the most processed version of the GridPath RA Toolkit wind and solar generation profiles, as well as the tables describing how individual generators were aggregated together to create the profiles. See issues #3509, #3510, #3511, and #3515 as well as PR #3514. The new tables include: out_gridpathratoolkit_hourly_available_capacity_factor and core_gridpathratoolkit_assn_generator_aggregation_group.

EIA AEO

Extracted tables 13, 15, 20, and 54 from the EIA Annual Energy Outlook 2023, which include future projections related to electric power and renewable energy through the year 2050, across a variety of scenarios. See issue #3368 and PR #3538.

Added new tables from EIA AEO table 54:

core_eiaaeo_yearly_projected_generation_in_electric_sector_by_technology contains generation capacity & generation projections for the electric sector, broken out by technology type. See issue #3581 and PR #3582.

core_eiaaeo_yearly_projected_generation_in_end_use_sectors_by_fuel_type contains generation capacity & generation projections for the electric sector, broken out by technology type. See issue #3581 and PR #3598.

core_eiaaeo_yearly_projected_electric_sales contains electric sales projections until 2050, broken out by customer type. See issue #3581 and PR #3617.

NREL ATB

Added new NREL ATB tables with annual technology cost and performance projections. See issue #3465 and PRs #3498, #3570.

EIA-930

Added hourly generation, demand, and interchange tables from the EIA-930. See issues #3486 and #3505, PR #3584, and this issue in the PUDL archiver repo. See the data source documentation for more information.

EPA CEMS

Added 2024 Q1 of CEMS data. See issue #3620 and PR #3624.

EIA Bulk Electricity Data

Updated the EIA Bulk Electricity data archive to include data that was available as of 2024-05-01, which covers up through 2024-02-01 (3 months more than the previously used archive). See PR #3615.

FERC Form 1

Added new out_ferc1_yearly_rate_base table which includes granular financial data regarding what utilities include in their rate bases. See epic #2016.

Data Cleaning

When generator_operating_date values are too inconsistent to be harvested successfully, we now take the max date within a year and attempt to harvest again, to rescue records lost because of inconsistent month reporting in EIA 860 and 860M. See issue #3340 and PR #3419. This change also fixed a bug that was preventing other columns harvested with a special process from being saved.

When ingesting FERC 1 XBRL filings, we now take the most recent non-null value instead of the value from the latest filing that applies for a specific row. This means that we no longer lose data if a utility posts a FERC filing with only a small number of updated values. See issue #3309 and PR #3545.
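The difference between "latest filing wins" and "most recent non-null value wins" can be shown with a small sketch. It is illustrative only and not the PUDL XBRL ingestion code: the function name `latest_non_null` and the `filed_at` key are hypothetical, and each filing is assumed to be a dictionary for a single row, with None for values it does not report.

```python
# Illustrative sketch. "filed_at" and the column names are hypothetical;
# each dict represents one filing's values for the same logical row.
def latest_non_null(filings, columns):
    """For each column, keep the value from the newest filing that reports it."""
    result = {col: None for col in columns}
    # Walk filings oldest to newest so later non-null values overwrite
    # earlier ones, while nulls never erase an earlier reported value.
    for filing in sorted(filings, key=lambda f: f["filed_at"]):
        for col in columns:
            value = filing.get(col)
            if value is not None:
                result[col] = value
    return result


if __name__ == "__main__":
    filings = [
        {"filed_at": "2022-04-01", "starting_balance": 10, "ending_balance": 5},
        # A later filing that only updates one value:
        {"filed_at": "2022-09-15", "starting_balance": 12, "ending_balance": None},
    ]
    print(latest_non_null(filings, ["starting_balance", "ending_balance"]))
```

Taking the later filing wholesale would have dropped the ending balance here; merging column by column keeps it while still picking up the revised starting balance.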

EIA - FERC1 Record Linkage Model Update

We merged in a refactor of the EIA plant parts to FERC1 plants record linkage model, which was generously supported by a CCAI Innovation Grant. This replaced the linear regression model with a model built
Data for: Making the real: Rhetorical adduction and the Bangladesh...
data.qdr.syr.edu
Updated Nov 13, 2023
Cite
Joseph O'Mahoney (2023). Data for: Making the real: Rhetorical adduction and the Bangladesh Liberation War [Dataset]. http://doi.org/10.5064/F6M2H9VQ
Available download formats: pdf(469931), pdf(1855397), pdf(643666), pdf(1143595), pdf(1137028), pdf(729721), pdf(574915), pdf(1370863), pdf(1326238), pdf(1481295), pdf(1283462), pdf(592212), application/x-json-hypothesis(69650), pdf(1031805), pdf(444483), pdf(752959), pdf(690519), pdf(856413), pdf(244748), pdf(527574), pdf(1601197), pdf(635560), pdf(2210653), pdf(2323480), pdf(1417185), pdf(1671062), pdf(583011), pdf(666643), pdf(1229547), pdf(552236), pdf(3360994), pdf(1180783), pdf(1043689), pdf(895213), pdf(1711382), pdf(1362381)
Unique identifier
https://doi.org/10.5064/F6M2H9VQ
Dataset updated
Nov 13, 2023
Dataset provided by
Qualitative Data Repository
Authors
Joseph O'Mahoney
License
https://qdr.syr.edu/policies/qdr-standard-access-conditions
Time period covered
1970 - 1974
Area covered
Bangladesh, United States, United Nations
Description
This is an Annotation for Transparent Inquiry (ATI) data project. The annotated article can be viewed on the publisher's website. The overarching empirical research question of the paper is “why did states recognize Bangladesh as a state?” and, more specifically, “why did (most of) the international community first condemn and then accept Bangladesh as a state?”. The goal of the empirical section of the paper was to do theory-building process-tracing of the decisions to recognize Bangladesh, that is, to build a theoretical explanation from the empirical evidence of a particular case, and then to infer that an analytically general mechanism exists.

Data generation

After immersing myself in the secondary literature and the archival material that I had collected for the prior doctoral project, I had an idea for a skeleton causal mechanism, i.e. that the withdrawal of Indian troops from Bangladesh had somehow changed the status of recognition, i.e. legitimated recognition. In order to assess this idea, I then consulted some theory from international relations, psychology, sociology, and cognitive science, on how decisions are made and how arguments work, in order to hypothesize a causal mechanism. This causal mechanism, elucidated in the paper, was rhetorical adduction; basically that states try to win arguments (thus changing the behavior of relatively uncommitted audiences relative to some policy) by linking some empirical state of affairs with their argument and then bringing that empirical state of affairs about. In this Bangladesh case, this meant that some actors argued that although India’s invasion and occupation of East Pakistan made recognition of Bangladesh problematic, the withdrawal of Indian troops from Bangladesh would dismiss or undercut the critique.

At this point, I formulated some observable implications of this idea, such as that if this is what had actually been going on, the states making the argument (e.g. Bangladesh and India) would have to actually have made the argument, and states would have explicitly conditioned their recognition policy decision on the withdrawal of Indian troops. In order to find out whether there was any evidence for these observable implications, I consulted three main types of evidence: 1) public statements by state representatives in the press and at the UN (using the UN verbatim meeting records), 2) UK political and diplomatic archives and 3) US political and diplomatic archives. As it happens, the UK was heavily involved in discussions surrounding recognition and the US was not (US President Richard Nixon and National Security Adviser Henry Kissinger were more concerned with other issues, like supporting West Pakistan and also organizing the historic visit to the People’s Republic of China), so that almost all of the relevant evidence came from UK archives.

A clear limitation of this sampling frame is that it relies on 3rd party evaluations of internal deliberations of most of the states involved. This is less of a problem than it might otherwise be because there seems little reason to explicitly condition recognition on troop withdrawal in private and secret/confidential bilateral communication with the UK if it is irrelevant to internal deliberations. If there had been some clear self-interest in misrepresenting, in this type of communication, then it would affect the plausibility of the causal claims.

I collected most of the documents used in the paper from the National Archives at Kew in the UK during two visits, one in January 2011 and another in July 2013. The first visit was to collect data for my doctoral dissertation, which was a prior, separate project from this paper. While I was finishing the Bangladesh case for my dissertation, I began to have another idea about the material. That is, I started to think that a slightly different type of conceptual/theoretical argument was relevant to a different empirical aspect of the Bangladesh case. However, as I had not had that in mind when initially collecting archival documents, I arranged a second visit to search for more information more directly relevant to this second puzzle.

The documents primarily come from a series of folders from the Foreign and Commonwealth Offices’ archives and the Premiers’ Archives that I found via two methods. First, I used the citations in Musson 2008 to identify potentially important or relevant material and then made a list of all the folders that that material was contained in. Second, I performed keyword searches for recognition and Bangladesh in the National Archives database search engine. While I was in the archives, I made copies of almost every single document in the folders that I had previously identified. I excluded documents that were obvious duplicates or that had no readable text.

Data analysis

Data Analysis for this paper involved reading through all of the documents, constructing a detailed timeline of who said what when and who did what when, and then...
PUDL Data Release v4.0.0
zenodo.org
application/gzip
Updated Aug 28, 2023
Cite
Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Steven Winter; Ethan Welty; Jan Rousik; Bennett Norman; Trenton Bush (2023). PUDL Data Release v4.0.0 [Dataset]. http://doi.org/10.5281/zenodo.6349861
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.6349861
Dataset updated
Aug 28, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Steven Winter; Ethan Welty; Jan Rousik; Bennett Norman; Trenton Bush
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PUDL Data Release 4.0.0

This is a data release from the Public Utility Data Liberation (PUDL) project.

GitHub repository for the software used to generate this data.

Zenodo archive of the particular version (v0.6.0) of the software that went into this release. For use in citations & long-term accessibility you can use this doi: https://doi.org/10.5281/zenodo.6349337

Documentation and release notes for the software and data.

The software can be installed via the Python Package Index (PyPI) or from conda-forge.

Using This Data

The data in this archive is stored in a combination of SQLite database files, and Apache Parquet datasets. It can be used as a standalone resource, or in conjunction with the PUDL software. The PUDL documentation contains data dictionaries for many of the data tables.

If you want to use the data in conjunction with the PUDL software, we've included a Docker image within the archive that will run a Jupyter Notebook Server containing examples of use based on our PUDL Examples repository. This Docker image contains all of the required software, and can access the associated archived data.

Make sure that you've got Docker installed and running, and also have docker-compose. You'll want to allocate at least 8GB of memory to Docker.

To use the Docker container to access and work with the data, download and extract the compressed tar archive on your computer.

Inside the directory that is created when you extract the archive, you will find a Docker image. Load that image into your Docker environment locally with:

docker load -i pudl-jupyter.tar

Then within that same directory, run:

docker-compose up

This should start a Jupyter Notebook Server, and provide you with a link to connect to the server running on your local computer, beginning with https://127.0.0.1:48512 or https://localhost:48512

You can select the tutorial notebooks from within the notebook interface. The README file contained in the archive and the PUDL Examples repository both provide more details on how to access and work with the data.

Contact Us

If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. You can also:

Subscribe to our announcements list for email updates.

Use the GitHub issue tracker to file bugs, suggest improvements, or ask for help.

Email the project team at pudl@catalyst.coop for private communications.

Follow @CatalystCoop on Twitter.
PUDL Data Release v3.0.0
zenodo.org
application/gzip
Updated Aug 28, 2023
Cite
Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Steven Winter; Ethan Welty; Jan Rousik; Bennett Norman; Trenton Bush (2023). PUDL Data Release v3.0.0 [Dataset]. http://doi.org/10.5281/zenodo.5701406
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.5701406
Dataset updated
Aug 28, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Steven Winter; Ethan Welty; Jan Rousik; Bennett Norman; Trenton Bush
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PUDL Data Release 3.0.0

This is a data release from the Public Utility Data Liberation (PUDL) project.

GitHub repository for the software used to generate this data.

Zenodo archive of the particular version (v0.5.0) of the software that went into this release. For use in citations & long-term accessibility you can use this doi: https://doi.org/10.5281/zenodo.5677623

Documentation and release notes for the software and data.

The software can be installed via the Python Package Index (PyPI) or from conda-forge.

Using This Data

The data in this archive is stored in a combination of SQLite database files, and Apache Parquet datasets. It can be used as a standalone resource, or in conjunction with the PUDL software. The PUDL documentation contains data dictionaries for many of the data tables.

If you want to use the data in conjunction with the PUDL software, we've included a Docker image within the archive that will run a Jupyter Notebook Server containing examples of use based on our PUDL Examples repository. This Docker image contains all of the required software, and can access the associated archived data.

Make sure that you've got Docker installed and running, and also have docker-compose. You'll want to allocate at least 8GB of memory to Docker.

To use the Docker container to access and work with the data, download and extract the compressed tar archive on your computer.

Inside the directory that is created when you extract the archive, you will find a Docker image. Load that image into your Docker environment locally with:

docker load -i pudl-jupyter.tar

Then within that same directory, run:

docker-compose up

This should start a Jupyter Notebook Server, and provide you with a link to connect to the server running on your local computer, beginning with https://127.0.0.1:48512 or https://localhost:48512

You can select the tutorial notebooks from within the notebook interface. The README file contained in the archive and the PUDL Examples repository both provide more details on how to access and work with the data.

Contact Us

If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. You can also:

Subscribe to our announcements list for email updates.

Use the GitHub issue tracker to file bugs, suggest improvements, or ask for help.

Email the project team at pudl@catalyst.coop for private communications.

Follow @CatalystCoop on Twitter.
PUDL Data Release v2.0.0
zenodo.org
application/gzip
Updated Aug 28, 2023
Cite
Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Steven Winter; Ethan Welty; Jan Rousik (2023). PUDL Data Release v2.0.0 [Dataset]. http://doi.org/10.5281/zenodo.5214231
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.5214231
Dataset updated
Aug 28, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Steven Winter; Ethan Welty; Jan Rousik
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PUDL Data Release 2.0.0

This is a data release from the Public Utility Data Liberation (PUDL) project.

GitHub repository for the software used to generate this data.

Zenodo archive of the particular version (v0.4.0) of the software that went into this release. For use in citations & long-term accessibility you can use this doi: https://doi.org/10.5281/zenodo.5207986

Documentation and release notes for the software and data.

The software can be installed via the Python Package Index (PyPI) or from conda-forge.

Using This Data

The data in this archive is stored in a combination of SQLite database files, and Apache Parquet datasets. It can be used as a standalone resource, or in conjunction with the PUDL software. The PUDL documentation contains data dictionaries for many of the data tables.

If you want to use the data in conjunction with the PUDL software, we've included a Docker image within the archive that will run a Jupyter Notebook Server containing examples of use based on our PUDL Examples repository. This Docker image contains all of the required software, and can access the associated archived data.

Make sure that you've got Docker installed and running, and also have docker-compose. You'll want to allocate at least 8GB of memory to Docker.

To use the Docker container to access and work with the data, download and extract the compressed tar archive on your computer.

Inside the directory that is created when you extract the archive, you will find a Docker image. Load that image into your Docker environment locally with:

docker load -i pudl-jupyter.tar

Then within that same directory, run:

docker-compose up

This should start a Jupyter Notebook Server, and provide you with a link to connect to the server running on your local computer, beginning with https://127.0.0.1:48512 or https://localhost:48512

You can select the tutorial notebooks from within the notebook interface. The README file contained in the archive and the PUDL Examples repository both provide more details on how to access and work with the data.

Contact Us

If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. You can also:

Subscribe to our announcements list for email updates.

Use the GitHub issue tracker to file bugs, suggest improvements, or ask for help.

Email the project team at pudl@catalyst.coop for private communications.

Follow @CatalystCoop on Twitter.
PUDL Data Release v2023.12.01
zenodo.org
application/gzip, bin +1
Updated Dec 8, 2023
Cite
Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Bennett Norman; Trenton Bush; Zach Schira; Katherine Lamb; Dazhong Xia; Ella Belfer (2023). PUDL Data Release v2023.12.01 [Dataset]. http://doi.org/10.5281/zenodo.10275052
Explore at:
Available download formats: application/gzip, json, bin
Unique identifier
https://doi.org/10.5281/zenodo.10275052
Dataset updated
Dec 8, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Bennett Norman; Trenton Bush; Zach Schira; Katherine Lamb; Dazhong Xia; Ella Belfer
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PUDL v2023.12.01 Data Release
This is a data release from the Public Utility Data Liberation (PUDL) project. It's the first data-only release we've published. All of the tables which were previously only available by using the PUDL software package to process the data we previously published in the PUDL SQLite database are now being written into the database itself. This should make it easier for people to access with minimal setup, using a variety of different tools: Python, R, DuckDB, and many others! We are still committed to keeping the data processing pipeline behind this data free, open, and transparent; we just don't want everyone to have to install and work with that software if all they want is the output data!
We are about to do a major reorganization of the database, renaming almost every table and a number of columns. This data release is a snapshot of the database before all that change happens, and is meant to provide continuity for users who are already working with the database, so that they can access all the final 2022 data and migrate to the new database structure at a time of their own choosing over the coming months. We will do another data release soon, containing data through 2022, but with the new table and column names.
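Because the output tables now live in the SQLite database itself, they can be inspected with nothing more than the Python standard library. The following is a minimal sketch under stated assumptions, not part of the PUDL software: the helper names `list_tables` and `preview_table` are hypothetical, and the table names in any given release should be taken from its data dictionary.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all user tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def preview_table(conn: sqlite3.Connection, table: str, limit: int = 5):
    """Return (column_names, rows) for the first `limit` rows of `table`."""
    # Table names cannot be bound as SQL parameters, so check the name
    # against the database catalog before interpolating it into the query.
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table!r}")
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,))
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()
```

With a downloaded database (the local filename `pudl.sqlite` is an assumption), a read-only connection can be opened with `sqlite3.connect("file:pudl.sqlite?mode=ro", uri=True)` and passed to either helper.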
Other PUDL v2023.12.01 Resources
PUDL v2023.12.01 Release Notes
PUDL v2023.12.01 Data Dictionary
PUDL v2023.12.01 Documentation
PUDL v2023.12.01 on Kaggle (Corresponds to v9 of the PUDL Project dataset)
PUDL in the AWS Open Data Registry
PUDL v2023.12.01 in a free, public AWS S3 bucket: s3://pudl.catalyst.coop/v2023.12.01/
PUDL v2023.12.01 in a requester-pays GCS bucket: gs://pudl.catalyst.coop/v2023.12.01/
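The two bucket locations above follow the same pattern, with the release tag as the final path component. A small sketch of building both from a version tag follows; the function name `release_locations` is hypothetical, and the layout of files underneath each prefix is not described here, so the sketch stops at the prefix.

```python
def release_locations(version: str) -> dict[str, str]:
    """Build the S3 and GCS prefixes for a PUDL release tag such as 'v2023.12.01'."""
    # Release tags in the listings above all start with "v"; reject anything else
    # so that a bare number like "2023.12.01" does not silently yield a bad path.
    if not version.startswith("v"):
        raise ValueError(f"expected a tag like 'v2023.12.01', got {version!r}")
    return {
        "s3": f"s3://pudl.catalyst.coop/{version}/",   # free, public AWS bucket
        "gcs": f"gs://pudl.catalyst.coop/{version}/",  # requester-pays GCS bucket
    }
```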
PUDL v2023.12.01 Software Release
This is the software that was used to produce the data release. It is not necessary to work with the data, but it's linked here to provide transparency and provenance:
Zenodo archive of the GitHub repository
PUDL v2023.12.01 release tag on GitHub
PUDL v2023.12.1 package in the Python Package Index (PyPI)
Contact Us
If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. Here's a bunch of different ways to get in touch:
Follow us on GitHub
Use the PUDL GitHub issue tracker to let us know about any bugs or data issues you encounter
GitHub Discussions is where we provide user support.
Watch our GitHub Project to see what we're working on.
Email us at hello@catalyst.coop for private communications.
On Mastodon: @CatalystCoop@mastodon.energy
On BlueSky: @catalyst.coop
On Twitter: @CatalystCoop
Play with our data and notebooks on Kaggle
Combine our data with ML models on HuggingFace
Learn more about us on our website: https://catalyst.coop
Subscribe to our announcements list for email updates.

Cite

Belfer, Ella (2025). Public Utility Data Liberation Project (PUDL) Data Release [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3653158

Public Utility Data Liberation Project (PUDL) Data Release

Explore at:

Dataset updated

Feb 14, 2025

Dataset provided by

Catalyst Cooperative

Authors

Selvans, Zane A.; Gosnell, Christina M.; Sharpe, Austen; Norman, Bennett; Schira, Zach; Lamb, Katherine; Xia, Dazhong; Belfer, Ella

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

PUDL v2025.2.0 Data Release

This is our regular quarterly release for 2025Q1. It includes updates to all the datasets that are published with quarterly or higher frequency, plus initial versions of a few new data sources that have been in the works for a while.

One major change this quarter is that we are now publishing all processed PUDL data as Apache Parquet files, alongside our existing SQLite databases. See Data Access for more on how to access these outputs.

Some potentially breaking changes to be aware of:

In the EIA Form 930 – Hourly and Daily Balancing Authority Operations Report a number of new energy sources have been added, and some old energy sources have been split into more granular categories. See Changes in energy source granularity over time.

We are now running the EPA’s CAMD to EIA unit crosswalk code for each individual year starting from 2018, rather than just 2018 and 2021, resulting in more connections between these two datasets and changes to some sub-plant IDs. See the note below for more details.

Many thanks to the organizations who make these regular updates possible! Especially GridLab, RMI, and the ZERO Lab at Princeton University. If you rely on PUDL and would like to help ensure that the data keeps flowing, please consider joining them as a PUDL Sustainer, as we are still fundraising for 2025.

New Data

EIA 176

Add a couple of semi-transformed interim EIA-176 (natural gas sources and dispositions) tables. They aren’t yet being written to the database, but are one step closer. See #3555 and PRs #3590, #3978. Thanks to @davidmudrauskas for moving this dataset forward.

Extracted these interim tables up through the latest 2023 data release. See #4002 and #4004.

EIA 860

Added EIA 860 Multifuel table. See #3438 and #3946.

FERC 1

Added three new output tables containing granular utility accounting data. See #4057, #3642 and the table descriptions in the data dictionary:

out_ferc1_yearly_detailed_income_statements

out_ferc1_yearly_detailed_balance_sheet_assets

out_ferc1_yearly_detailed_balance_sheet_liabilities

SEC Form 10-K Parent-Subsidiary Ownership

We have added some new tables describing the parent-subsidiary company ownership relationships reported in the SEC’s Form 10-K, Exhibit 21 “Subsidiaries of the Registrant”. Where possible these tables link the SEC filers or their subsidiary companies to the corresponding EIA utilities. This work was funded by a grant from the Mozilla Foundation. Most of the ML models and data preparation took place in the mozilla-sec-eia repository separate from the main PUDL ETL, as it requires processing hundreds of thousands of PDFs and the deployment of some ML experiment tracking infrastructure. The new tables are handed off as nearly finished products to the PUDL ETL pipeline. Note that these are preliminary, experimental data products and are known to be incomplete and to contain errors. Extracting data tables from unstructured PDFs and the SEC to EIA record linkage are necessarily probabilistic processes.

See PRs #4026, #4031, #4035, #4046, #4048, #4050 and check out the table descriptions in the PUDL data dictionary:

out_sec10k_parents_and_subsidiaries

core_sec10k_quarterly_filings

core_sec10k_quarterly_exhibit_21_company_ownership

core_sec10k_quarterly_company_information

Expanded Data Coverage

EPA CEMS

Added 2024 Q4 of CEMS data. See #4041 and #4052.

EPA CAMD EIA Crosswalk

In the past, the crosswalk in PUDL has used the EPA’s published crosswalk (run with 2018 data), and an additional crosswalk we ran with 2021 EIA 860 data. To ensure that the crosswalk reflects updates in both EIA and EPA data, we re-ran the EPA R code which generates the EPA CAMD EIA crosswalk with 4 new years of data: 2019, 2020, 2022 and 2023. Re-running the crosswalk pulls the latest data from the CAMD FACT API, which results in some changes to the generator and unit IDs reported on the EPA side of the crosswalk, which feeds into the creation of core_epa_assn_eia_epacamd.

The changes only result in the addition of new units and generators in the EPA data, with no changes to matches at the plant level. However, the updates to generator and unit IDs have resulted in changes to the subplant IDs: some EIA boilers and generators which previously had no matches to EPA data have now been matched to EPA unit data, resulting in an overall reduction in the number of rows in the core_epa_assn_eia_epacamd_subplant_ids table. See issue #4039 and PR #4056 for a discussion of the changes observed in the course of this update.
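For users who have stored subplant IDs from an earlier release, one way to see how this update affects them is to compare the old and new assignments key by key. The sketch below is illustrative only: `diff_assignments` is a hypothetical helper, and the keys used in the example (plant and generator identifiers) are assumptions rather than the actual columns of core_epa_assn_eia_epacamd_subplant_ids.

```python
from typing import Hashable, Mapping


def diff_assignments(
    old: Mapping[Hashable, object], new: Mapping[Hashable, object]
) -> tuple[dict, dict, dict]:
    """Compare two key -> ID mappings and return (added, removed, changed)."""
    # Keys present only in the new release.
    added = {key: new[key] for key in new.keys() - old.keys()}
    # Keys present only in the old release.
    removed = {key: old[key] for key in old.keys() - new.keys()}
    # Keys present in both releases whose assigned ID differs.
    changed = {
        key: (old[key], new[key])
        for key in old.keys() & new.keys()
        if old[key] != new[key]
    }
    return added, removed, changed
```

For instance, with hypothetical (plant, generator) keys, `diff_assignments({(1, "G1"): 0, (1, "G2"): 1}, {(1, "G1"): 0, (1, "G2"): 0})` reports no additions or removals and one changed assignment for `(1, "G2")`.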

EIA 860M

Added EIA 860m through December 2024. See #4038 and #4047.

EIA 923

Added EIA 923 monthly data through September 2024. See #4038 and #4047.

EIA Bulk Electricity Data

Updated the EIA Bulk Electricity data to include data published up through 2024-11-01. See #4042 and PR #4051.

EIA 930

Updated the EIA 930 data to include data published up through the beginning of February 2025. See #4040 and PR #4054. 10 new energy sources were added and 3 were retired; see Changes in energy source granularity over time for more information.

Bug Fixes

Fix an accidentally swapped set of starting balance / ending balance column rename parameters in the pre-2021 DBF derived data that feeds into core_ferc1_yearly_other_regulatory_liabilities_sched278. See issue #3952 and PRs #3969, #3979. Thanks to @yolandazzz13 for making this fix.

Added preliminary data validation checks for several FERC 1 tables that were missing them. See #3860.

Fix spelling of Lake Huron and Lake Saint Clair in out_vcerare_hourly_available_capacity_factor and related tables. See issue #4007 and PR #4029.

Quality of Life Improvements

We added a sources parameter to pudl.metadata.classes.DataSource.from_id() in order to make it possible to use the pudl-archiver repository to archive datasets that won’t necessarily be ingested into PUDL. See this PUDL archiver issue and PRs #4003 and #4013.

Other PUDL v2025.2.0 Resources

PUDL v2025.2.0 Data Dictionary

PUDL v2025.2.0 Documentation

PUDL in the AWS Open Data Registry

PUDL v2025.2.0 in a free, public AWS S3 bucket: s3://pudl.catalyst.coop/v2025.2.0/

PUDL v2025.2.0 in a requester-pays GCS bucket: gs://pudl.catalyst.coop/v2025.2.0/

Zenodo archive of the PUDL GitHub repo for this release

PUDL v2025.2.0 release on GitHub

PUDL v2025.2.0 package in the Python Package Index (PyPI)

If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. Here's a bunch of different ways to get in touch:

Use the PUDL GitHub issue tracker to let us know about any bugs or data issues you encounter

GitHub Discussions is where we provide user support.

Watch our GitHub Project to see what we're working on.

Email us at hello@catalyst.coop for private communications.

On Mastodon: @CatalystCoop@mastodon.energy

On BlueSky: @catalyst.coop

On Twitter: @CatalystCoop

Connect with us on LinkedIn

Play with our data and notebooks on Kaggle

Combine our data with ML models on HuggingFace

Learn more about us on our website: https://catalyst.coop

Subscribe to our announcements list for email updates.

