9 datasets found
  1. Public Utility Data Liberation Project (PUDL) Data Release

    • data.niaid.nih.gov
    Updated Feb 14, 2025
    Cite
    Belfer, Ella (2025). Public Utility Data Liberation Project (PUDL) Data Release [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3653158
    Dataset provided by
    Catalyst Cooperative
    Authors
    Selvans, Zane A.; Gosnell, Christina M.; Sharpe, Austen; Norman, Bennett; Schira, Zach; Lamb, Katherine; Xia, Dazhong; Belfer, Ella
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PUDL v2025.2.0 Data Release

    This is our regular quarterly release for 2025Q1. It includes updates to all the datasets that are published quarterly or more frequently, plus initial versions of a few new data sources that have been in the works for a while.

    One major change this quarter is that we are now publishing all processed PUDL data as Apache Parquet files, alongside our existing SQLite databases. See Data Access for more on how to access these outputs.

    Some potentially breaking changes to be aware of:

    In the EIA Form 930 – Hourly and Daily Balancing Authority Operations Report, a number of new energy sources have been added, and some old energy sources have been split into more granular categories. See Changes in energy source granularity over time.

    We are now running the EPA’s CAMD to EIA unit crosswalk code for each individual year starting from 2018, rather than just 2018 and 2021, resulting in more connections between these two datasets and changes to some sub-plant IDs. See the note below for more details.

    Many thanks to the organizations who make these regular updates possible! Especially GridLab, RMI, and the ZERO Lab at Princeton University. If you rely on PUDL and would like to help ensure that the data keeps flowing, please consider joining them as a PUDL Sustainer, as we are still fundraising for 2025.

    New Data

    EIA 176

    Added a couple of semi-transformed interim EIA-176 (natural gas sources and dispositions) tables. They aren’t yet being written to the database, but they are one step closer. See #3555 and PRs #3590, #3978. Thanks to @davidmudrauskas for moving this dataset forward.

    Extracted these interim tables up through the latest 2023 data release. See #4002 and #4004.

    EIA 860

    Added EIA 860 Multifuel table. See #3438 and #3946.

    FERC 1

    Added three new output tables containing granular utility accounting data. See #4057, #3642 and the table descriptions in the data dictionary:

    out_ferc1_yearly_detailed_income_statements

    out_ferc1_yearly_detailed_balance_sheet_assets

    out_ferc1_yearly_detailed_balance_sheet_liabilities

    SEC Form 10-K Parent-Subsidiary Ownership

    We have added some new tables describing the parent-subsidiary company ownership relationships reported in the SEC’s Form 10-K, Exhibit 21 “Subsidiaries of the Registrant”. Where possible, these tables link the SEC filers or their subsidiary companies to the corresponding EIA utilities. This work was funded by a grant from the Mozilla Foundation. Most of the ML modeling and data preparation took place in the mozilla-sec-eia repository, separate from the main PUDL ETL, because it requires processing hundreds of thousands of PDFs and deploying some ML experiment tracking infrastructure. The new tables are handed off to the PUDL ETL pipeline as nearly finished products. Note that these are preliminary, experimental data products that are known to be incomplete and to contain errors: extracting data tables from unstructured PDFs and linking SEC records to EIA records are necessarily probabilistic processes.

    See PRs #4026, #4031, #4035, #4046, #4048, #4050 and check out the table descriptions in the PUDL data dictionary:

    out_sec10k_parents_and_subsidiaries

    core_sec10k_quarterly_filings

    core_sec10k_quarterly_exhibit_21_company_ownership

    core_sec10k_quarterly_company_information

    Expanded Data Coverage

    EPA CEMS

    Added 2024 Q4 of CEMS data. See #4041 and #4052.

    EPA CAMD EIA Crosswalk

    In the past, the crosswalk in PUDL used the EPA’s published crosswalk (run with 2018 data) plus an additional crosswalk we ran with 2021 EIA 860 data. To ensure that the crosswalk reflects updates in both EIA and EPA data, we re-ran the EPA R code that generates the EPA CAMD EIA crosswalk with four new years of data: 2019, 2020, 2022, and 2023. Re-running the crosswalk pulls the latest data from the CAMD FACT API, which changes some of the generator and unit IDs reported on the EPA side of the crosswalk. Those IDs feed into the creation of core_epa_assn_eia_epacamd.

    The changes only add new units and generators on the EPA side, with no changes to matches at the plant level. However, the updated generator and unit IDs have changed some subplant IDs: some EIA boilers and generators that previously had no matches to EPA data are now matched to EPA unit data, resulting in an overall reduction in the number of rows in the core_epa_assn_eia_epacamd_subplant_ids table. See issue #4039 and PR #4056 for a discussion of the changes observed in the course of this update.
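
    Why a single new match can shift subplant IDs is easiest to see as a connected-components problem. The sketch below assumes, as a simplification, that a subplant is a group of EPA units and EIA generators/boilers joined by crosswalk links; it uses a small union-find rather than PUDL's actual code, and the node IDs are hypothetical. Adding one link between an EPA unit and an EIA generator that previously had no EPA match merges two groups, so every member of the merged group ends up sharing one ID.

```python
from collections import defaultdict


def find(parent: dict, node: str) -> str:
    """Return the representative of node's group (union-find with path halving)."""
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def union(parent: dict, a: str, b: str) -> None:
    """Merge the groups containing a and b."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_b] = root_a


def assign_subplant_ids(nodes, links) -> dict:
    """Number the connected components formed by crosswalk links, starting at 1.

    nodes: hypothetical IDs such as "epa:unit1" or "eia:gen1".
    links: (a, b) pairs meaning a and b are matched in the crosswalk.
    """
    parent: dict = {}
    for node in nodes:
        find(parent, node)
    for a, b in links:
        union(parent, a, b)
    groups = defaultdict(list)
    for node in list(parent):
        groups[find(parent, node)].append(node)
    # Order components by their smallest member so the numbering is deterministic.
    ordered = sorted(groups.values(), key=min)
    return {node: i for i, members in enumerate(ordered, start=1) for node in members}


nodes = ["epa:unit1", "eia:gen1", "eia:gen2", "eia:boiler2"]
old_links = [("epa:unit1", "eia:gen1"), ("eia:gen2", "eia:boiler2")]
# A newly available EPA unit link matches eia:gen2, which had no EPA match before.
new_links = old_links + [("epa:unit1", "eia:gen2")]

old_ids = assign_subplant_ids(nodes, old_links)
new_ids = assign_subplant_ids(nodes, new_links)
print(len(set(old_ids.values())), len(set(new_ids.values())))  # 2 1
```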

    EIA 860M

    Added EIA 860M data through December 2024. See #4038 and #4047.

    EIA 923

    Added EIA 923 monthly data through September 2024. See #4038 and #4047.

    EIA Bulk Electricity Data

    Updated the EIA Bulk Electricity data to include data published up through 2024-11-01. See #4042 and PR #4051.

    EIA 930

    Updated the EIA 930 data to include data published up through the beginning of February 2025. See #4040 and PR #4054. 10 new energy sources were added and 3 were retired; see Changes in energy source granularity over time for more information.
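
    When an energy source is split into more granular categories partway through a time series, one way to keep older and newer years comparable is to roll the granular categories back up to the coarser one before aggregating. This is a minimal sketch of that idea only; the energy source labels and record keys are hypothetical placeholders, not the actual EIA-930 codes or PUDL column names.

```python
from collections import defaultdict

# Hypothetical mapping from newer, more granular energy source labels to the
# coarser label they were split out of. These are placeholders, NOT the actual
# EIA-930 energy source codes.
GRANULAR_TO_COARSE = {
    "source_x_part_1": "source_x",
    "source_x_part_2": "source_x",
}


def to_coarse_totals(records):
    """Sum net generation by (timestamp, coarse energy source).

    Each record is a dict with hypothetical keys "datetime_utc",
    "energy_source", and "net_generation_mwh". Labels that are not in the
    mapping are passed through unchanged.
    """
    totals = defaultdict(float)
    for rec in records:
        source = GRANULAR_TO_COARSE.get(rec["energy_source"], rec["energy_source"])
        totals[(rec["datetime_utc"], source)] += rec["net_generation_mwh"]
    return dict(totals)


records = [
    {"datetime_utc": "2025-01-01T00:00", "energy_source": "source_x_part_1", "net_generation_mwh": 10.0},
    {"datetime_utc": "2025-01-01T00:00", "energy_source": "source_x_part_2", "net_generation_mwh": 5.0},
    {"datetime_utc": "2025-01-01T00:00", "energy_source": "source_y", "net_generation_mwh": 7.0},
]
print(to_coarse_totals(records))
```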

    Bug Fixes

    Fixed an accidentally swapped set of starting balance / ending balance column rename parameters in the pre-2021 DBF-derived data that feeds into core_ferc1_yearly_other_regulatory_liabilities_sched278. See issue #3952 and PRs #3969, #3979. Thanks to @yolandazzz13 for making this fix.
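
    A roll-forward consistency check is one way a swapped rename like this can be caught. The sketch assumes, hypothetically, that each record carries a starting balance, an ending balance, and total increases and decreases; the raw column names are placeholders rather than the real DBF names, and the actual schedule's columns and sign conventions may differ.

```python
# Hypothetical raw column names; NOT the actual FERC Form 1 DBF column names.
CORRECT_RENAME = {"bal_begin": "starting_balance", "bal_end": "ending_balance"}
SWAPPED_RENAME = {"bal_begin": "ending_balance", "bal_end": "starting_balance"}


def rename_columns(row: dict, mapping: dict) -> dict:
    """Rename keys in one record; keys not in the mapping are kept as-is."""
    return {mapping.get(key, key): value for key, value in row.items()}


def rolls_forward(row: dict, tol: float = 1e-6) -> bool:
    """Hypothetical check: starting balance + increases - decreases == ending balance."""
    expected = row["starting_balance"] + row["increases"] - row["decreases"]
    return abs(expected - row["ending_balance"]) <= tol


raw = {"bal_begin": 100.0, "bal_end": 130.0, "increases": 50.0, "decreases": 20.0}
print(rolls_forward(rename_columns(raw, CORRECT_RENAME)))  # True
print(rolls_forward(rename_columns(raw, SWAPPED_RENAME)))  # False
```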

    Added preliminary data validation checks for several FERC 1 tables that were missing them. See #3860.

    Fixed the spelling of Lake Huron and Lake Saint Clair in out_vcerare_hourly_available_capacity_factor and related tables. See issue #4007 and PR #4029.

    Quality of Life Improvements

    We added a sources parameter to pudl.metadata.classes.DataSource.from_id() in order to make it possible to use the pudl-archiver repository to archive datasets that won’t necessarily be ingested into PUDL. See this PUDL archiver issue and PRs #4003 and #4013.

    Other PUDL v2025.2.0 Resources

    PUDL v2025.2.0 Data Dictionary

    PUDL v2025.2.0 Documentation

    PUDL in the AWS Open Data Registry

    PUDL v2025.2.0 in a free, public AWS S3 bucket: s3://pudl.catalyst.coop/v2025.2.0/

    PUDL v2025.2.0 in a requester-pays GCS bucket: gs://pudl.catalyst.coop/v2025.2.0/

    Zenodo archive of the PUDL GitHub repo for this release

    PUDL v2025.2.0 release on GitHub

    PUDL v2025.2.0 package in the Python Package Index (PyPI)
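
    Since this release publishes Parquet files alongside the SQLite databases, a small helper for building object URIs in the buckets listed above can be handy. This is a sketch that assumes (not confirmed by these notes) that each table is stored as <version>/<table>.parquet directly under the release prefix; the function name is hypothetical.

```python
import posixpath

BUCKET = "pudl.catalyst.coop"


def parquet_uri(table: str, version: str = "v2025.2.0", scheme: str = "s3") -> str:
    """Build the URI of one table's Parquet file in a PUDL release bucket.

    Assumes (unverified) that each table is stored as <version>/<table>.parquet
    directly under the release prefix.
    """
    if scheme not in ("s3", "gs"):
        raise ValueError("scheme must be 's3' or 'gs'")
    return f"{scheme}://" + posixpath.join(BUCKET, version, f"{table}.parquet")


print(parquet_uri("out_ferc1_yearly_detailed_income_statements"))
```

    If pandas and s3fs are installed, a URI like this could be read from the public S3 bucket with pandas.read_parquet(uri, storage_options={"anon": True}). The GCS bucket is requester-pays, so reading from it would require Google Cloud credentials and a billing project.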

    Contact Us

    If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. Here's a bunch of different ways to get in touch:

    Follow us on GitHub

    Use the PUDL GitHub issue tracker to let us know about any bugs or data issues you encounter

    GitHub Discussions is where we provide user support.

    Watch our GitHub Project to see what we're working on.

    Email us at hello@catalyst.coop for private communications.

    On Mastodon: @CatalystCoop@mastodon.energy

    On BlueSky: @catalyst.coop

    On Twitter: @CatalystCoop

    Connect with us on LinkedIn

    Play with our data and notebooks on Kaggle

    Combine our data with ML models on HuggingFace

    Learn more about us on our website: https://catalyst.coop

    Subscribe to our announcements list for email updates.

  2. Public Utility Data Liberation Project (PUDL) Data Release

    • zenodo.org
    bin, json, zip
    Updated Aug 20, 2024
    Cite
    Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Bennett Norman; Trenton Bush; Zach Schira; Katherine Lamb; Dazhong Xia; Ella Belfer (2024). Public Utility Data Liberation Project (PUDL) Data Release [Dataset]. http://doi.org/10.5281/zenodo.13346011
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Bennett Norman; Trenton Bush; Zach Schira; Katherine Lamb; Dazhong Xia; Ella Belfer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PUDL v2024.8.0 Data Release

    This is our regular quarterly release for 2024Q3. It includes quarterly updates to all datasets that are updated with quarterly or higher frequency by their publishers, including EIA-860M, EIA-923 (YTD data), EIA-930, the EIA’s bulk electricity API data (used to fill in missing fuel prices), and the EPA CEMS hourly emissions data.

    Annual datasets which have been published since our last quarterly release have also been integrated. These include FERC Forms 1, 2, 6, 60, and 714, and the NREL ATB.

    This release also includes provisional versions of the annual 2023 EIA-860 and EIA-923 datasets, whose final release will not happen until the fall.

    New Data Coverage

    FERC Form 1

    • Integrated FERC Form 1 data from 2023 into the main PUDL SQLite DB. See issue #3700 and PR #3701. This required updating to a new version of the catalystcoop.ferc_xbrl_extractor package because there are now multiple XBRL taxonomies in use by FERC in different years, or even within the same year. See this PR for more details, as well as issue #3544 and PR #3710.

    FERC Forms 2, 6, 60, & 714

    • Updated the ferc_to_sqlite settings to extract 2023 XBRL data for FERC Forms 2, 6, 60, and 714 and add them to their respective SQLite databases. Note that this data is not yet being processed beyond the conversion from XBRL to SQLite. See PR #3710.

    EIA AEO

    EIA 860

    • Added EIA 860 early release data from 2023. This included adding a new tab with proposed energy storage generators as well as adding a number of new columns regarding energy storage and solar generators. See issue #3676 and PR #3681.

    • Added EIA 860M data through June 2024. See issue #3759 and PR #3767.

    EIA 923

    • Added EIA 923 early release data from 2023. See #3719 and PR #3721.

    • Added EIA 923 monthly data through May as part of the Q2 quarterly release. See #3760 and #3768.

    EIA 930

    • Added EIA 930 hourly data through the end of July as part of the Q2 quarterly release. See #3761 and #3789.

    EPA CEMS

    • Added 2024 Q2 of CEMS data. See #3762 and #3769.

    EIA Bulk Electricity Data

    • Updated the EIA Bulk Electricity data archive to include data that was available as of 2024-08-01, which covers up through 2024-05-01 (3 months more than the previously used archive). See #3763 and PR #3785.

    FERC 714

    NREL ATB

    • Added 2024 NREL ATB data. This includes adding a new tax credit case, model_tax_credit_case_nrelatb, a breakout of capex_grid_connection_per_kw for all technologies, and more detailed nuclear breakdowns of fuel_cost_per_mwh. Simultaneously, updated the docs.dev.existing_data_updates documentation to make it easier to add future years of data. See #3706 and #3719.

    • Updated NREL ATB data to include error corrections in the 2024 data. See #3777 and PR #3778.

    Data Cleaning

    • When generator_operating_date values are too inconsistent to be harvested successfully, we now take the last reported date in EIA 860 and 860M. See #423 and PR #3967.

    • Added the generator_operating_date field into core_eia860m_changelog_generators, adding 860M reported generator operating dates into the changelog table. This table is not harvested, and thus does not affect the generator_operating_date values reported in other core EIA tables. See #3722 and PR #3751.
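
    The fallback described above (taking the last reported date when reports are too inconsistent to harvest) can be sketched for a single generator as follows. This is a simplification: it treats any disagreement as "inconsistent", whereas PUDL's harvesting process applies its own consistency rules, and the function name and input shape are hypothetical.

```python
from datetime import date


def harvest_operating_date(reports):
    """Pick one generator_operating_date from several reports of the same generator.

    reports: list of (report_date, generator_operating_date) tuples; the operating
    date may be None. Simplified rule: use the value if all non-null reports agree,
    otherwise fall back to the most recently reported value.
    """
    reported = [(rep, op) for rep, op in reports if op is not None]
    if not reported:
        return None
    if len({op for _, op in reported}) == 1:
        return reported[0][1]
    # Inconsistent reports: keep the operating date from the latest report.
    return max(reported, key=lambda pair: pair[0])[1]


reports = [
    (date(2023, 12, 1), date(2020, 5, 1)),
    (date(2024, 6, 1), date(2020, 6, 1)),
]
print(harvest_operating_date(reports))  # 2020-06-01
```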

    Bug Fixes

    • Disabled filling of missing values using rolling averages for the fuel_cost_per_mmbtu column in the out_eia923_fuel_receipts_costs table, as it was producing some anomalously high fuel prices. See #3716. As a result, about 2% more records in the table are left NA after filling with the average prices for that fuel type, state, and month found in the bulk EIA API data.
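
    The remaining fill step, using average prices keyed by state, month, and fuel type, amounts to a keyed lookup in which records without a matching average stay NA. This is a minimal sketch; apart from fuel_cost_per_mmbtu, the record keys are hypothetical, and the prices are made-up values.

```python
def fill_fuel_costs(receipts, averages):
    """Fill missing fuel_cost_per_mmbtu values from (state, month, fuel) averages.

    receipts: list of dicts with hypothetical keys "state", "report_month",
    "fuel_type" and "fuel_cost_per_mmbtu" (None when missing).
    averages: dict mapping (state, report_month, fuel_type) to an average price.
    Values with no matching average stay None, i.e. remain NA.
    """
    filled = []
    for rec in receipts:
        rec = dict(rec)  # copy so the caller's records are not mutated
        if rec["fuel_cost_per_mmbtu"] is None:
            key = (rec["state"], rec["report_month"], rec["fuel_type"])
            rec["fuel_cost_per_mmbtu"] = averages.get(key)
        filled.append(rec)
    return filled


averages = {("CO", "2024-01", "gas"): 3.5}
receipts = [
    {"state": "CO", "report_month": "2024-01", "fuel_type": "gas", "fuel_cost_per_mmbtu": None},
    {"state": "CO", "report_month": "2024-01", "fuel_type": "coal", "fuel_cost_per_mmbtu": None},
    {"state": "CO", "report_month": "2024-01", "fuel_type": "gas", "fuel_cost_per_mmbtu": 4.0},
]
print([r["fuel_cost_per_mmbtu"] for r in fill_fuel_costs(receipts, averages)])  # [3.5, None, 4.0]
```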

    Quality of Life Improvements

    • The full ETL settings are now read directly from etl_full.yml instead of using default values defined in the settings classes. As a result, the settings now show up in the Dagster UI Launchpad; previously they did not, which led to confusion when trying to re-run the FERC to SQLite conversions. See #3710.

    • mlflow experiment tracking is now disabled by default when running the DAG, since it is only really helpful during development of new record linkage or other ML workflows. See #3710.

    Other PUDL v2024.8.0 Resources

    Contact Us

    If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data.

  3. Public Utility Data Liberation Project (PUDL) Data Release

    • zenodo.org
    bin, json, zip
    Updated Aug 15, 2025
    Cite
    Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Zach Schira; Katherine Lamb; Dazhong Xia; Ella Belfer; Kathryn Mazaitis; Marianne Hoogeveen (2025). Public Utility Data Liberation Project (PUDL) Data Release [Dataset]. http://doi.org/10.5281/zenodo.16878930
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Zach Schira; Katherine Lamb; Dazhong Xia; Ella Belfer; Kathryn Mazaitis; Marianne Hoogeveen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    v2025.8.0 (2025-08-14)

    This is a regular quarterly release of PUDL. It includes new 2024 annual updates for a number of datasets (FERC Forms 2, 6, 60, & 714), and a minor update to the 2024 FERC Form 1 data that includes late filings & revisions. It also includes year-to-date updates for the monthly and quarterly datasets, including EIA-860M, EIA-923, EIA-930, and the EPA CEMS hourly emissions. There were also a number of data processing bug fixes and data usability improvements. See the full notes below for details.

    New Data

    Expanded Data Coverage

    EIA-860M

    • Updated EIA-860M monthly generator report with newly published data for May and June of 2025. See issue #4379 and PR #4536.

    EIA-923

    • Added EIA-923 data through May 2025. See #4516 and #4538.

    EIA 930

    • Updated the EIA 930 data to include data published up through the beginning of August 2025. See #4517 and PR #4523.

    EIA Bulk Electricity API

    • Updated the EIA Bulk Electricity data to include data published up through the beginning of August 2025. See #4519 and PR #4523.

    EPA CEMS

    • Added EPA CEMS data through June 2025. See #4518 and #4531.

    FERC Form 1

    • Updated FERC Form 1 2024 data to include late respondents. See #4493 and #4507.

    FERC Forms 2, 6 and 60

    • Updated our extraction of FERC Forms 2, 6, and 60 to raw SQLite databases to include 2024 data. See #4418 and #4433.

    FERC Form 714

    • Integrated 2024 data for FERC Form 714. See issue #4409 and PR #4530.

    PHMSA Gas Data

    • Extracted 2023 and 2024 PHMSA distribution and transmission data to raw assets. This data is not currently published to the PUDL database. See #4449 and #4470.

    • Extracted 1970 through 1989 PHMSA transmission data to raw assets. This data is not currently published to the PUDL database. See #3290 and #4500.

    Quality of Life Improvements

    • The output of dbt_helper update-tables now conforms to the format that our pre-commit hooks expect, reducing annoying back-and-forth and diffs. See #4119 and #4401.

    • Improved how dbt_helper handles row count test definitions and updates the row counts stored in dbt seed tables. When writing the dbt schema for a new table, it no longer automatically adds a row count test. When updating row counts, its behavior now depends on whether a row count test is defined in the dbt schema, whether row counts for that table already exist in the seed table, and user-provided settings such as --clobber.

    • Stopped running code checks in CI when only the documentation has changed. See issue #4410 and PR #4429.

    • Added utility_id_ferc1_dbf and utility_id_ferc1_xbrl columns to all FERC Form 1 output tables. See #4365 and PR #4528.

    Bug Fixes

    Documentation

    New Tests and Data Validations

    EIA-930 and FERC-714 Hourly Imputed Demand

    Added checks ensuring that only hourly electricity demand values flagged for imputation differ significantly between their reported and imputed values. Also added checks that the missingness of various columns in the hourly reported and imputed demand falls within expected ranges. Years that are dropped due to insufficient data for meaningful imputation are now explicitly flagged with BAD_YEAR. Affected tables include out_eia930_hourly_operations, out_eia930_hourly_subregion_demand, and out_ferc714_hourly_planning_area_demand. See PR #4334.
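
    The two kinds of checks described above can be sketched on plain records: one finds rows that were not flagged for imputation but still changed substantially, and the other measures a column's missingness so it can be compared against an expected range. The column names and the 50% relative tolerance are hypothetical choices for illustration, not the thresholds PUDL uses.

```python
def unflagged_changes(rows, rel_tol=0.5):
    """Return rows NOT flagged for imputation whose imputed value differs from
    the reported value by more than rel_tol (relative to the reported value)."""
    suspicious = []
    for row in rows:
        reported = row["demand_reported_mwh"]
        imputed = row["demand_imputed_mwh"]
        if row["flagged_for_imputation"] or reported is None:
            continue
        if abs(imputed - reported) > rel_tol * abs(reported):
            suspicious.append(row)
    return suspicious


def missing_fraction(rows, column):
    """Fraction of rows where the column is None."""
    return sum(row[column] is None for row in rows) / len(rows)


rows = [
    {"flagged_for_imputation": False, "demand_reported_mwh": 100.0, "demand_imputed_mwh": 100.0},
    {"flagged_for_imputation": True, "demand_reported_mwh": 5000.0, "demand_imputed_mwh": 110.0},
    {"flagged_for_imputation": True, "demand_reported_mwh": None, "demand_imputed_mwh": 105.0},
    {"flagged_for_imputation": False, "demand_reported_mwh": 100.0, "demand_imputed_mwh": 300.0},
]
print(len(unflagged_changes(rows)))  # 1
print(missing_fraction(rows, "demand_reported_mwh"))  # 0.25
```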

    Check for entirely null column-years

    Previously we had a data validation check that ensured there were no entirely null columns applied to a handful of tables. Such columns were typically the result of typos or failures to update column names, or application

  4. Public Utility Data Liberation Project (PUDL) Data Release

    • zenodo.org
    application/gzip, bin +1
    Updated May 27, 2024
    Cite
    Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Bennett Norman; Trenton Bush; Zach Schira; Katherine Lamb; Dazhong Xia; Ella Belfer (2024). Public Utility Data Liberation Project (PUDL) Data Release [Dataset]. http://doi.org/10.5281/zenodo.11292273
    Available download formats: bin, json, application/gzip
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Bennett Norman; Trenton Bush; Zach Schira; Katherine Lamb; Dazhong Xia; Ella Belfer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PUDL v2024.5.0 Data Release

    We've just completed our quarterly integration of EIA data sources for 2024Q2 (in support of RMI's Utility Transition Hub) and have also added a bunch of new tables over the last few months in an effort to better support energy system modelers (with support from GridLab).

    New Data Coverage

    EIA-860 & EIA-923

    GridPath RA Toolkit

    EIA AEO

    • Extracted tables 13, 15, 20, and 54 from the EIA Annual Energy Outlook 2023, which include future projections related to electric power and renewable energy through the year 2050, across a variety of scenarios. See issue #3368 and PR #3538.
    • Added new tables from EIA AEO table 54:
      • core_eiaaeo_yearly_projected_generation_in_electric_sector_by_technology contains generation capacity & generation projections for the electric sector, broken out by technology type. See issue #3581 and PR #3582.
      • core_eiaaeo_yearly_projected_generation_in_end_use_sectors_by_fuel_type contains generation capacity & generation projections for end-use sectors, broken out by fuel type. See issue #3581 and PR #3598.
      • core_eiaaeo_yearly_projected_electric_sales contains electric sales projections until 2050, broken out by customer type. See issue #3581 and PR #3617.

    NREL ATB

    • Added new NREL ATB tables with annual technology cost and performance projections. See issue #3465 and PRs #3498, #3570.

    EIA-930

    EPA CEMS

    • Added 2024 Q1 of CEMS data. See issue #3620 and PR #3624.

    EIA Bulk Electricity Data

    • Updated the EIA Bulk Electricity data archive to include data that was available as of 2024-05-01, which covers up through 2024-02-01 (3 months more than the previously used archive). See PR #3615.

    FERC Form 1

    Data Cleaning

    • When generator_operating_date values are too inconsistent to be harvested successfully, we now take the max date within a year and attempt to harvest again, to rescue records lost because of inconsistent month reporting in EIA 860 and 860M. See issue #3340 and PR #3419. This change also fixed a bug that was preventing other columns harvested with a special process from being saved.
    • When ingesting FERC 1 XBRL filings, we now take the most recent non-null value instead of the value from the latest filing that applies for a specific row. This means that we no longer lose data if a utility posts a FERC filing with only a small number of updated values. See issue #3309 and PR #3545.
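
    The difference between the old and new FERC 1 XBRL behavior shows up when a later filing re-reports only some values. This sketch compares the two approaches for one hypothetical row (for example, one utility and report year); the field names and values are made up, and ISO date strings are used so that they sort chronologically.

```python
def latest_filing_values(filings, fields):
    """Old-style approach: take every value from the single latest filing."""
    latest = max(filings, key=lambda f: f["filed_at"])
    return {field: latest.get(field) for field in fields}


def latest_non_null_values(filings, fields):
    """Take, per field, the most recently filed non-null value."""
    result = {field: None for field in fields}
    for filing in sorted(filings, key=lambda f: f["filed_at"]):
        for field in fields:
            value = filing.get(field)
            if value is not None:
                result[field] = value  # later filings overwrite earlier ones
    return result


filings = [
    {"filed_at": "2024-04-15", "plant_capacity_mw": 500.0, "net_generation_mwh": 1200.0},
    # A revised filing that only re-reports one value.
    {"filed_at": "2024-06-01", "plant_capacity_mw": 510.0, "net_generation_mwh": None},
]
fields = ["plant_capacity_mw", "net_generation_mwh"]
print(latest_filing_values(filings, fields))    # {'plant_capacity_mw': 510.0, 'net_generation_mwh': None}
print(latest_non_null_values(filings, fields))  # {'plant_capacity_mw': 510.0, 'net_generation_mwh': 1200.0}
```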

    EIA - FERC1 Record Linkage Model Update

    We merged in a refactor of the EIA plant parts to FERC1 plants record linkage model, which was generously supported by a CCAI Innovation Grant. This replaced the linear regression model with a model built

  5. Data for: Making the real: Rhetorical adduction and the Bangladesh...

    • data.qdr.syr.edu
    Updated Nov 13, 2023
    Cite
    Joseph O'Mahoney (2023). Data for: Making the real: Rhetorical adduction and the Bangladesh Liberation War [Dataset]. http://doi.org/10.5064/F6M2H9VQ
    Available download formats: pdf(469931), pdf(1855397), pdf(643666), pdf(1143595), pdf(1137028), pdf(729721), pdf(574915), pdf(1370863), pdf(1326238), pdf(1481295), pdf(1283462), pdf(592212), application/x-json-hypothesis(69650), pdf(1031805), pdf(444483), pdf(752959), pdf(690519), pdf(856413), pdf(244748), pdf(527574), pdf(1601197), pdf(635560), pdf(2210653), pdf(2323480), pdf(1417185), pdf(1671062), pdf(583011), pdf(666643), pdf(1229547), pdf(552236), pdf(3360994), pdf(1180783), pdf(1043689), pdf(895213), pdf(1711382), pdf(1362381)
    Dataset provided by
    Qualitative Data Repository
    Authors
    Joseph O'Mahoney
    License

    QDR Standard Access Conditions: https://qdr.syr.edu/policies/qdr-standard-access-conditions

    Time period covered
    1970 - 1974
    Area covered
    Bangladesh, United States, United Nations
    Description

    This is an Annotation for Transparent Inquiry (ATI) data project. The annotated article can be viewed on the publisher's website. The overarching empirical research question of the paper is “why did states recognize Bangladesh as a state?” and, more specifically, “why did (most of) the international community first condemn and then accept Bangladesh as a state?”. The goal of the empirical section of the paper was to do theory-building process-tracing of the decisions to recognize Bangladesh, that is, to build a theoretical explanation from the empirical evidence of a particular case and then to infer that an analytically general mechanism exists.

    Data generation

    After immersing myself in the secondary literature and the archival material that I had collected for the prior doctoral project, I had an idea for a skeleton causal mechanism: that the withdrawal of Indian troops from Bangladesh had somehow changed the status of recognition, i.e. legitimated it. To assess this idea, I consulted theory from international relations, psychology, sociology, and cognitive science on how decisions are made and how arguments work, in order to hypothesize a causal mechanism. This causal mechanism, elucidated in the paper, was rhetorical adduction: states try to win arguments (thus changing the behavior of relatively uncommitted audiences toward some policy) by linking some empirical state of affairs with their argument and then bringing that state of affairs about. In the Bangladesh case, this meant that some actors argued that although India’s invasion and occupation of East Pakistan made recognition of Bangladesh problematic, the withdrawal of Indian troops from Bangladesh would dismiss or undercut the critique.

    At this point, I formulated some observable implications of this idea. For example, if this is what had actually been going on, the states making the argument (e.g. Bangladesh and India) would have to have actually made the argument, and states would have explicitly conditioned their recognition decisions on the withdrawal of Indian troops. To find out whether there was any evidence for these observable implications, I consulted three main types of evidence: 1) public statements by state representatives in the press and at the UN (using the UN verbatim meeting records), 2) UK political and diplomatic archives, and 3) US political and diplomatic archives. As it happens, the UK was heavily involved in discussions surrounding recognition and the US was not (US President Richard Nixon and National Security Adviser Henry Kissinger were more concerned with other issues, like supporting West Pakistan and organizing the historic visit to the People’s Republic of China), so almost all of the relevant evidence came from UK archives.

    A clear limitation of this sampling frame is that it relies on third-party evaluations of the internal deliberations of most of the states involved. This is less of a problem than it might otherwise be, because there seems little reason to explicitly condition recognition on troop withdrawal in private and secret/confidential bilateral communication with the UK if it is irrelevant to internal deliberations. If there had been some clear self-interest in misrepresenting positions in this type of communication, that would affect the plausibility of the causal claims.

    I collected most of the documents used in the paper from the National Archives at Kew in the UK during two visits, one in January 2011 and another in July 2013. The first visit was to collect data for my doctoral dissertation, which was a prior project separate from this paper. While I was finishing the Bangladesh case for my dissertation, I began to have another idea about the material: I started to think that a slightly different type of conceptual/theoretical argument was relevant to a different empirical aspect of the Bangladesh case. However, as I had not had that in mind when initially collecting archival documents, I arranged a second visit to search for information more directly relevant to this second puzzle.

    The documents primarily come from a series of folders from the Foreign and Commonwealth Office’s archives and the Premiers’ Archives, which I found via two methods. First, I used the citations in Musson 2008 to identify potentially important or relevant material and then made a list of all the folders containing that material. Second, I performed keyword searches for recognition and Bangladesh in the National Archives database search engine. While I was in the archives, I made copies of almost every document in the folders I had previously identified, excluding documents that were obvious duplicates or that had no readable text.

    Data analysis

    Data analysis for this paper involved reading through all of the documents, constructing a detailed timeline of who said what when and who did what when, and then...

  6. PUDL Data Release v4.0.0

    • zenodo.org
    application/gzip
    Updated Aug 28, 2023
    Cite
    Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Steven Winter; Ethan Welty; Jan Rousik; Bennett Norman; Trenton Bush (2023). PUDL Data Release v4.0.0 [Dataset]. http://doi.org/10.5281/zenodo.6349861
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Steven Winter; Ethan Welty; Jan Rousik; Bennett Norman; Trenton Bush
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PUDL Data Release 4.0.0

    This is a data release from the Public Utility Data Liberation (PUDL) project.

    Using This Data

    The data in this archive is stored in a combination of SQLite database files and Apache Parquet datasets. It can be used as a standalone resource, or in conjunction with the PUDL software. The PUDL documentation contains data dictionaries for many of the data tables.

    If you want to use the data in conjunction with the PUDL software, we've included a Docker image within the archive that will run a Jupyter Notebook Server containing examples of use based on our PUDL Examples repository. This Docker image contains all of the required software, and can access the associated archived data.

    Make sure that you've got Docker installed and running, and also have docker-compose. You'll want to allocate at least 8GB of memory to Docker.

    To use the Docker container to access and work with the data, download and extract the compressed tar archive on your computer.

    Inside the directory that is created when you extract the archive, you will find a Docker image. Load that image into your Docker environment locally with:

    docker load -i pudl-jupyter.tar

    Then within that same directory, run:

    docker-compose up

    This should start a Jupyter Notebook Server, and provide you with a link to connect to the server running on your local computer, beginning with https://127.0.0.1:48512 or https://localhost:48512

    You can select the tutorial notebooks from within the notebook interface. The README file contained in the archive and the PUDL Examples repository both provide more details on how to access and work with the data.

    Contact Us

    If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. You can also:

  7. PUDL Data Release v3.0.0

    • zenodo.org
    application/gzip
    Updated Aug 28, 2023
    Cite
    Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Steven Winter; Ethan Welty; Jan Rousik; Bennett Norman; Trenton Bush (2023). PUDL Data Release v3.0.0 [Dataset]. http://doi.org/10.5281/zenodo.5701406
    Available download formats: application/gzip
    Dataset updated
    Aug 28, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Steven Winter; Ethan Welty; Jan Rousik; Bennett Norman; Trenton Bush
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PUDL Data Release 3.0.0

    This is a data release from the Public Utility Data Liberation (PUDL) project.

    Using This Data

    The data in this archive is stored in a combination of SQLite database files and Apache Parquet datasets. It can be used as a standalone resource or in conjunction with the PUDL software. The PUDL documentation contains data dictionaries for many of the data tables.

    If you want to use the data in conjunction with the PUDL software, we've included a Docker image within the archive that will run a Jupyter Notebook Server containing examples of use based on our PUDL Examples repository. This Docker image contains all of the required software, and can access the associated archived data.

    Make sure that you've got Docker installed and running, and also have docker-compose. You'll want to allocate at least 8GB of memory to Docker.

    To use the Docker container to access and work with the data, download and extract the compressed tar archive on your computer.

    Inside the directory that is created when you extract the archive, you will find a Docker image. Load that image into your Docker environment locally with:

    docker load -i pudl-jupyter.tar

    Then within that same directory, run:

    docker-compose up

    This should start a Jupyter Notebook Server, and provide you with a link to connect to the server running on your local computer, beginning with https://127.0.0.1:48512 or https://localhost:48512

    You can select the tutorial notebooks from within the notebook interface. The README file contained in the archive and the PUDL Examples repository both provide more details on how to access and work with the data.

    Contact Us

    If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. You can also:

  8. PUDL Data Release v2.0.0

    • zenodo.org
    application/gzip
    Updated Aug 28, 2023
    Cite
    Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Steven Winter; Ethan Welty; Jan Rousik (2023). PUDL Data Release v2.0.0 [Dataset]. http://doi.org/10.5281/zenodo.5214231
    Available download formats: application/gzip
    Dataset updated
    Aug 28, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Steven Winter; Ethan Welty; Jan Rousik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PUDL Data Release 2.0.0

    This is a data release from the Public Utility Data Liberation (PUDL) project.

    Using This Data

    The data in this archive is stored in a combination of SQLite database files and Apache Parquet datasets. It can be used as a standalone resource or in conjunction with the PUDL software. The PUDL documentation contains data dictionaries for many of the data tables.

    If you want to use the data in conjunction with the PUDL software, we've included a Docker image within the archive that will run a Jupyter Notebook Server containing examples of use based on our PUDL Examples repository. This Docker image contains all of the required software, and can access the associated archived data.

    Make sure that you've got Docker installed and running, and also have docker-compose. You'll want to allocate at least 8GB of memory to Docker.

    To use the Docker container to access and work with the data, download and extract the compressed tar archive on your computer.

    Inside the directory that is created when you extract the archive, you will find a Docker image. Load that image into your Docker environment locally with:

    docker load -i pudl-jupyter.tar

    Then within that same directory, run:

    docker-compose up

    This should start a Jupyter Notebook Server, and provide you with a link to connect to the server running on your local computer, beginning with https://127.0.0.1:48512 or https://localhost:48512

    You can select the tutorial notebooks from within the notebook interface. The README file contained in the archive and the PUDL Examples repository both provide more details on how to access and work with the data.

    Contact Us

    If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. You can also:

  9. PUDL Data Release v2023.12.01

    • zenodo.org
    application/gzip, bin +1
    Updated Dec 8, 2023
    Cite
    Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Bennett Norman; Trenton Bush; Zach Schira; Katherine Lamb; Dazhong Xia; Ella Belfer (2023). PUDL Data Release v2023.12.01 [Dataset]. http://doi.org/10.5281/zenodo.10275052
    Available download formats: application/gzip, json, bin
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zane A. Selvans; Christina M. Gosnell; Austen Sharpe; Bennett Norman; Trenton Bush; Zach Schira; Katherine Lamb; Dazhong Xia; Ella Belfer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PUDL v2023.12.01 Data Release

    This is a data release from the Public Utility Data Liberation (PUDL) project. It's the first data-only release we've published. All of the tables that were previously available only by running the PUDL software package on the data in the published PUDL SQLite database are now written directly into the database itself. This should make the data easier to access with minimal setup, using a variety of different tools: Python, R, DuckDB, and many others! We are still committed to keeping the data processing pipeline behind this data free, open, and transparent; we just don't want everyone to have to install and work with that software if all they want is the output data!

    We are about to do a major reorganization of the database, renaming almost every table and a number of columns. This data release is a snapshot of the database before all that change happens. It is meant to provide continuity for users who are already working with the database, so that they can access all the final 2022 data and migrate to the new database structure at a time of their own choosing over the coming months. We will do another data release soon, containing data through 2022, but with the new table and column names.
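Because the outputs are ordinary SQLite tables, a table can be pulled out with nothing more than Python's built-in `sqlite3` module. This is a minimal sketch: `preview_table` is a hypothetical helper, and the file and table names you pass it are placeholders, not names confirmed for this release (table names in particular change in the upcoming reorganization).

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def preview_table(db_path, table, limit=5):
    """Return up to `limit` rows of `table` as a list of dicts, opening the file read-only."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as con:
        con.row_factory = sqlite3.Row  # rows can then be converted to column -> value dicts
        # Identifiers can't be bound as SQL parameters, so quote the table name safely.
        quoted = '"' + table.replace('"', '""') + '"'
        rows = con.execute(f"SELECT * FROM {quoted} LIMIT ?", (limit,)).fetchall()
    return [dict(row) for row in rows]
```

The same query string works unchanged in other SQLite clients, including DuckDB's SQLite support, which is part of the appeal of a data-only release.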

    Other PUDL v2023.12.01 Resources

    PUDL v2023.12.01 Software Release

    This is the software that was used to produce the data release. It is not necessary to work with the data, but it's linked here to provide transparency and provenance:

    Contact Us

    If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. Here's a bunch of different ways to get in touch:


Cite
Belfer, Ella (2025). Public Utility Data Liberation Project (PUDL) Data Release [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3653158

Public Utility Data Liberation Project (PUDL) Data Release

Dataset updated
Feb 14, 2025
Dataset provided by
Catalyst Cooperative
Authors
Selvans, Zane A.; Gosnell, Christina M.; Sharpe, Austen; Norman, Bennett; Schira, Zach; Lamb, Katherine; Xia, Dazhong; Belfer, Ella
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

PUDL v2025.2.0 Data Release

This is our regular quarterly release for 2025Q1. It includes updates to all the datasets that are published with quarterly or higher frequency, plus initial versions of a few new data sources that have been in the works for a while.

One major change this quarter is that we are now publishing all processed PUDL data as Apache Parquet files, alongside our existing SQLite databases. See Data Access for more on how to access these outputs.

Some potentially breaking changes to be aware of:

In the EIA Form 930 – Hourly and Daily Balancing Authority Operations Report, a number of new energy sources have been added, and some old energy sources have been split into more granular categories. See Changes in energy source granularity over time.

We are now running the EPA’s CAMD to EIA unit crosswalk code for each individual year starting from 2018, rather than just 2018 and 2021, resulting in more connections between these two datasets and changes to some sub-plant IDs. See the note below for more details.

Many thanks to the organizations who make these regular updates possible, especially GridLab, RMI, and the ZERO Lab at Princeton University! If you rely on PUDL and would like to help ensure that the data keeps flowing, please consider joining them as a PUDL Sustainer, as we are still fundraising for 2025.

New Data

EIA 176

Added a couple of semi-transformed interim EIA-176 (natural gas sources and dispositions) tables. They aren’t yet being written to the database, but are one step closer. See #3555 and PRs #3590 and #3978. Thanks to @davidmudrauskas for moving this dataset forward.

Extracted these interim tables up through the latest 2023 data release. See #4002 and #4004.

EIA 860

Added the EIA 860 Multifuel table. See #3438 and #3946.

FERC 1

Added three new output tables containing granular utility accounting data. See #4057, #3642 and the table descriptions in the data dictionary:

out_ferc1_yearly_detailed_income_statements

out_ferc1_yearly_detailed_balance_sheet_assets

out_ferc1_yearly_detailed_balance_sheet_liabilities

SEC Form 10-K Parent-Subsidiary Ownership

We have added some new tables describing the parent-subsidiary company ownership relationships reported in the SEC’s Form 10-K, Exhibit 21 “Subsidiaries of the Registrant”. Where possible, these tables link the SEC filers or their subsidiary companies to the corresponding EIA utilities. This work was funded by a grant from the Mozilla Foundation. Most of the ML model development and data preparation took place in the mozilla-sec-eia repository, separate from the main PUDL ETL, because it requires processing hundreds of thousands of PDFs and deploying some ML experiment tracking infrastructure. The new tables are handed off to the PUDL ETL pipeline as nearly finished products. Note that these are preliminary, experimental data products and are known to be incomplete and to contain errors. Extracting data tables from unstructured PDFs and linking SEC records to EIA records are necessarily probabilistic processes.

See PRs #4026, #4031, #4035, #4046, #4048, #4050 and check out the table descriptions in the PUDL data dictionary:

out_sec10k_parents_and_subsidiaries

core_sec10k_quarterly_filings

core_sec10k_quarterly_exhibit_21_company_ownership

core_sec10k_quarterly_company_information
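To give a feel for why linking SEC filers to EIA utilities is probabilistic, here is a toy sketch that scores company-name similarity with Python's standard `difflib`. It is not the ML approach used in mozilla-sec-eia; the suffix list, the 0.6 threshold, the helper names, and the example company names are all illustrative assumptions.

```python
import re
from difflib import SequenceMatcher

# Illustrative (hypothetical) corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "corporation", "co", "company", "llc", "lp", "ltd"}


def normalize(name):
    """Lowercase a company name, drop punctuation, and remove common suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in SUFFIXES)


def best_match(name, candidates, threshold=0.6):
    """Return (candidate, score) for the most similar candidate, or None.

    The score is difflib's similarity ratio (0.0 to 1.0) between normalized
    names; a best match scoring below `threshold` is rejected.
    """
    target = normalize(name)
    scored = [
        (cand, SequenceMatcher(None, target, normalize(cand)).ratio())
        for cand in candidates
    ]
    if not scored:
        return None
    cand, score = max(scored, key=lambda pair: pair[1])
    return (cand, score) if score >= threshold else None
```

Even in this toy, `best_match("Duke Energy Corp", ["Dominion Energy Inc"])` clears the threshold and returns the Dominion name, which shows why name similarity alone yields uncertain links and why real record linkage draws on more signals.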

Expanded Data Coverage

EPA CEMS

Added 2024 Q4 of CEMS data. See #4041 and #4052.

EPA CAMD EIA Crosswalk

In the past, the crosswalk in PUDL has used the EPA’s published crosswalk (run with 2018 data) and an additional crosswalk we ran with 2021 EIA 860 data. To ensure that the crosswalk reflects updates in both EIA and EPA data, we re-ran the EPA R code that generates the EPA CAMD EIA crosswalk with four new years of data: 2019, 2020, 2022, and 2023. Re-running the crosswalk pulls the latest data from the CAMD FACT API, which changes some of the generator and unit IDs reported on the EPA side of the crosswalk. Those IDs feed into the creation of core_epa_assn_eia_epacamd.

These changes only add new units and generators on the EPA side, with no changes to matches at the plant level. However, the updated generator and unit IDs have changed some subplant IDs: some EIA boilers and generators that previously had no matches to EPA data have now been matched to EPA unit data, resulting in an overall reduction in the number of rows in the core_epa_assn_eia_epacamd_subplant_ids table. See issue #4039 and PR #4056 for a discussion of the changes observed in the course of this update.
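Roughly speaking, a subplant groups together EPA units and EIA boilers and generators that are linked to one another through the crosswalk. A minimal sketch, assuming those groups behave like connected components of a graph whose edges are crosswalk links (an illustration, not PUDL's implementation; the node labels are hypothetical), shows how a single new EPA-EIA match can merge two groups and so change subplant assignments:

```python
from collections import defaultdict


def connected_groups(links):
    """Group nodes into connected components, given (a, b) link pairs.

    A pair (x, x) registers a node with no links. Uses union-find with path
    halving, and returns a sorted list of sorted groups for deterministic output.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical links: EPA unit "unit:1" matches EIA generator "gen:A",
# while EIA generator "gen:B" has no match yet.
old_links = [("unit:1", "gen:A"), ("gen:B", "gen:B")]
# A re-run of the crosswalk newly matches "gen:B" to the same EPA unit.
new_links = old_links + [("unit:1", "gen:B")]

print(connected_groups(old_links))  # [['gen:A', 'unit:1'], ['gen:B']]
print(connected_groups(new_links))  # [['gen:A', 'gen:B', 'unit:1']]
```

How a merge like this affects row counts in a particular table depends on how that table is keyed, so the sketch only illustrates why IDs can shift when new matches appear.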

EIA 860M

Added EIA 860M through December 2024. See #4038 and #4047.

EIA 923

Added EIA 923 monthly data through September 2024. See #4038 and #4047.

EIA Bulk Electricity Data

Updated the EIA Bulk Electricity data to include data published up through 2024-11-01. See #4042 and PR #4051.

EIA 930

Updated the EIA 930 data to include data published up through the beginning of February 2025. See #4040 and PR #4054. 10 new energy sources were added and 3 were retired; see Changes in energy source granularity over time for more information.
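When an energy source is split into finer categories partway through a time series, one common way to keep older and newer periods comparable is to roll the granular categories back up into the original coarse ones. A minimal sketch follows; the function name and the category labels are hypothetical, and the real mapping should come from Changes in energy source granularity over time.

```python
from collections import defaultdict


def aggregate_by_category(values, category_map):
    """Sum values keyed by fine-grained energy source into coarser categories.

    values: {energy_source: net_generation_mwh}
    category_map: {fine_source: coarse_source}; sources not in the map keep
    their own name, so already-coarse categories pass through unchanged.
    """
    totals = defaultdict(float)
    for source, mwh in values.items():
        totals[category_map.get(source, source)] += mwh
    return dict(totals)


# Hypothetical labels: two granular sources rolled back into "other".
split_map = {"battery_storage": "other", "pumped_storage": "other"}
hourly = {"battery_storage": 5.0, "pumped_storage": 2.0, "solar": 10.0}
print(aggregate_by_category(hourly, split_map))  # {'other': 7.0, 'solar': 10.0}
```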

Bug Fixes

Fixed an accidentally swapped set of starting balance / ending balance column rename parameters in the pre-2021 DBF-derived data that feeds into core_ferc1_yearly_other_regulatory_liabilities_sched278. See issue #3952 and PRs #3969 and #3979. Thanks to @yolandazzz13 for making this fix.

Added preliminary data validation checks for several FERC 1 tables that were missing them. See #3860.

Fixed the spelling of Lake Huron and Lake Saint Clair in out_vcerare_hourly_available_capacity_factor and related tables. See issue #4007 and PR #4029.

Quality of Life Improvements

We added a sources parameter to pudl.metadata.classes.DataSource.from_id() in order to make it possible to use the pudl-archiver repository to archive datasets that won’t necessarily be ingested into PUDL. See this PUDL archiver issue and PRs #4003 and #4013.

Other PUDL v2025.2.0 Resources

PUDL v2025.2.0 Data Dictionary

PUDL v2025.2.0 Documentation

PUDL in the AWS Open Data Registry

PUDL v2025.2.0 in a free, public AWS S3 bucket: s3://pudl.catalyst.coop/v2025.2.0/

PUDL v2025.2.0 in a requester-pays GCS bucket: gs://pudl.catalyst.coop/v2025.2.0/

Zenodo archive of the PUDL GitHub repo for this release

PUDL v2025.2.0 release on GitHub

PUDL v2025.2.0 package in the Python Package Index (PyPI)
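A minimal sketch of reading one table straight from the free public S3 bucket with pandas. It assumes each table is published as a single `<table_name>.parquet` object directly under the release prefix, which is not confirmed here (check the Data Access documentation), and that `pandas`, `pyarrow`, and `s3fs` are installed. The helper names are hypothetical; `anon=True` is the s3fs option for unauthenticated access to a public bucket.

```python
def pudl_parquet_uri(table, version="v2025.2.0"):
    """Build the S3 URI for one table's Parquet file.

    ASSUMPTION: one file per table, named <table>.parquet, under the release prefix.
    """
    return f"s3://pudl.catalyst.coop/{version}/{table}.parquet"


def read_pudl_table(table, version="v2025.2.0"):
    """Read one PUDL table from the public S3 bucket into a pandas DataFrame."""
    import pandas as pd  # s3:// paths also need s3fs, and Parquet needs pyarrow

    # anon=True tells s3fs not to look for AWS credentials.
    return pd.read_parquet(
        pudl_parquet_uri(table, version), storage_options={"anon": True}
    )
```

For example, `read_pudl_table("out_ferc1_yearly_detailed_income_statements")` would fetch one of the new FERC 1 tables over the network, assuming the file layout above holds.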

Contact Us

If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. Here's a bunch of different ways to get in touch:

Follow us on GitHub

Use the PUDL GitHub issue tracker to let us know about any bugs or data issues you encounter

GitHub Discussions is where we provide user support.

Watch our GitHub Project to see what we're working on.

Email us at hello@catalyst.coop for private communications.

On Mastodon: @CatalystCoop@mastodon.energy

On BlueSky: @catalyst.coop

On Twitter: @CatalystCoop

Connect with us on LinkedIn

Play with our data and notebooks on Kaggle

Combine our data with ML models on HuggingFace

Learn more about us on our website: https://catalyst.coop

Subscribe to our announcements list for email updates.
