Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper explores the rise of NoSQL (Not Only SQL) databases as a modern alternative that addresses the demands of today's dynamic, large-scale data environments. The goal is to provide a comprehensive and accessible overview of NoSQL systems and their increasing significance in modern data management.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset compiles heat flow and temperature gradient data from over 44,000 wells across the United States, along with more than 6,000 related geothermal exploration resources. Originally assembled prior to 2014 for the now-retired National Geothermal Data System (NGDS), the collection includes curated well data, scanned field notes, temperature-depth curves, publications, maps, and other supporting documents. SMU Geothermal Laboratory contributed two different nationwide heat flow databases to the project. One is based on equilibrium temperature measurements (over 14,000 sites) and the other is based on corrected bottom hole temperature (BHT) data from oil and gas industry wells (over 30,000 sites). In addition, scanned field notes and temperature-depth curves were associated with approximately 6,000 specific sites in the heat flow database. Records were corrected and overlapping sites in the equilibrium heat flow database were linked between the original SMU National database and the UND Global Heat Flow database. New or related sites, which were not previously published because they lacked full heat flow content, are now included as gradient only information along with their detailed temperature data to fill in data gaps. Finally, SMU submitted over 920 scanned publications, reports, and maps suitable for full text searching. The dataset is provided in two flat-structured zip archives: one containing the curated well data and another containing related resources. An Excel index file is provided for each archive, allowing filtering by well name, location, and description. Data files are labeled with state or institutional origin where available.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Knowledge Management Database (KMD) is a document repository that provides links to archived oil and gas documents as well as to reports stored in the DOE Office of Scientific and Technical Information (OSTI) library.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset is a curated collection of data related to nuclear energy, covering various aspects such as power plant locations and characteristics, uranium production, electricity generation, safety, and more. The data comes from reputable sources including the U.S. Energy Information Administration (EIA), World Resources Institute, Ember Climate, U.S. Nuclear Regulatory Commission (NRC), and Our World in Data.
Nuclear energy is a critical topic as countries around the world seek to decarbonize their electricity grids and combat climate change. At the same time, concerns around safety, waste disposal, and weapons proliferation lead to ongoing debate about the role nuclear should play in the energy transition.
This dataset enables in-depth analysis to inform these important discussions. Potential use cases include:
I encourage the Kaggle community to explore and build upon this dataset. Potential future collaborations could expand the dataset to include more granular plant-level data, detailed reactor specifications, waste and decommissioning data, country-level policy information, public opinion surveys, and more. I also welcome suggestions for additional datasets to include and new analytics projects and tutorials to undertake. Please do not hesitate to create threads in the "Discussion" or "Suggestions" sections. Together we can create a rich resource to power essential research and decision-making around nuclear energy.
Refer to for some sample data analysis using this dataset. It also shows how to interface with the particular files.
The table below briefly describes the files in the dataset. For a more in-depth description of a file named filename, refer to README_${filename}.md. The README files do not contain any actual data.
| File Name | Description |
| --- | --- |
| global_power_plant_database.csv | The Global Power Plant Database is a comprehensive, open-source dataset of grid-scale electricity generating facilities operating worldwide, currently containing nearly 35,000 power plants in 167 countries and representing about 72% of the world's capacity. The database provides detailed information on each power plant, including location, capacity, primary fuel type, owner, and commissioning year. It also includes both reported and estimated annual electricity generation data from 2013 to 2019. |
| nuclear_energy_overview_eia.csv | The data file contains information about nuclear energy in the United States, broken down by year and month. It includes the number of operable nuclear generating units, their net summer capacity, the net generation of electricity from nuclear power, the percentage share of total electricity net generation coming from nuclear power, and the capacity factor of nuclear generating units. This dataset provides a comprehensive overview of the state of nuclear energy in the U.S. over time. |
| number_of_plants_producing_uranium_in_us.csv | The file contains yearly data on the number of uranium mills and plants producing uranium concentrate in the United States. It includes columns for the year, the number of conventional milling operations, non-conventional milling operations, in-situ recovery plants, and byproduct recovery plants active each year. |
| rates_death_from_energy_production_per_twh.csv | The file contains data on the mortality rates associated with different energy sources used for electricity production. It includes columns for the energy source type, the number of deaths per terawatt-hour (TWh) of electricity generated, and the year (consistently 2021 for all entries). The data provides insights into the relative safety of various energy sources in terms of deaths per ...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains files used to train and test the Multi-Configuration 23 (MC23) functional and to compare the results to other methods. It includes files to carry out electronic structure calculations. These include molecular geometries in xyz format, OpenMolcas input files for CASSCF calculations, converged CASSCF natural orbitals, OpenMolcas basis set files, and Gaussian 16 formatted checkpoint files for KS-DFT calculations. It also includes data used for data processing such as stoichiometries, absolute energies, and reference energies.
Each file in this dataset is a .tar.xz archive, which can be extracted with the following command:
tar -xJf name_of_archive.tar.xz
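For scripted workflows, the same extraction can be done from Python's standard library. The following is a minimal sketch rather than part of the dataset's tooling; the function name `extract_archive` and the destination directory are illustrative assumptions.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a .tar.xz archive into dest_dir and return its member names.

    Hypothetical helper for illustration; not shipped with the dataset.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:xz" selects xz (LZMA) decompression, like the -J flag of tar.
    with tarfile.open(archive_path, mode="r:xz") as archive:
        names = archive.getnames()
        archive.extractall(dest)
    return names
```

For example, `extract_archive("name_of_archive.tar.xz", "extracted")` would unpack the archive into a directory named `extracted`. As with any archive, only extract files obtained from a source you trust.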
Below is a description of the content of each archive.
gaussian_16_fchk.tar.xz contains Gaussian 16 formatted checkpoint files for all KS-DFT calculations used in this work. The files in the archive are named as functional/database/system.fchk
openmolcas_basis_set.tar.xz contains OpenMolcas basis set files used for multireference calculations. To reproduce the results in this work, the basis set files should be placed in the "basis_library" directory in the OpenMolcas installation location.
openmolcas_wave_function.tar.xz contains files needed by OpenMolcas to reproduce the CASSCF wave function used in this work. The files in the archive are named database/system.*.
gaussian_16_stoichiometry_energy.tar.xz and openmolcas_stoichiometry_energy.tar.xz contain files used for data processing.
The database names in the directory names use a slightly different convention than the ones in the article describing MC23. A prefix DS2_ or DS3_ is used to indicate the data set to which a database belongs, and the number of data points is removed from the database name. For example, the MR-MGN-BE8 database from Data Set 2 has a file name DS2_MR-MGN-BE.
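The naming convention can be illustrated with a short sketch. It assumes that the number of data points always appears as a trailing run of digits in the article's database name, as in the MR-MGN-BE8 example; the function name `archive_database_name` is hypothetical and not part of the dataset.

```python
import re


def archive_database_name(article_name, data_set):
    """Map an article database name to the name used in the archives.

    Assumption: the number of data points is the trailing run of digits
    in the article name (e.g. the "8" in "MR-MGN-BE8").
    """
    # Remove the trailing digits that encode the number of data points.
    base = re.sub(r"\d+$", "", article_name)
    # Prefix with DS2_ or DS3_ according to the data set number.
    return f"DS{data_set}_{base}"


print(archive_database_name("MR-MGN-BE8", 2))  # DS2_MR-MGN-BE
```

Database names whose digits are not a data-point count would need to be handled case by case.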
The Electric Vehicle (EV) charging permitting process database is a novel, multi-jurisdictional resource that captures required codes and compliance provisions in a structured form. The database comprises three tables, each with 287 columns, designed to capture detailed information spanning electrical, structural, zoning, and accessibility aspects, along with data on fees, reviews, and process durations. It contains 99 state-level documents pertaining to 36 U.S. states, in addition to 87 county-level and 101 city-level documents, offering an overview of permitting guidance and practices. The data was gathered via an Azure-hosted GPT-4o workflow, supplemented by targeted manual Google searches. State and county materials were located and extracted using the GPT-4o model. The large language model (LLM) was used in conjunction with a decision tree framework and targeted prompts to extract the key information. The structured database incorporates Tables 1-3, included below as resources, as well as Table 4, which provides the scores for each document based on the scoring criteria in the paper (to be added after publication). The database can be used to compare requirements and identify patterns and trends across different authorities having jurisdiction. This resource can be used by researchers, policymakers, and project teams. Note: LLMs are known to make mistakes when interpreting complex procedural documents, so no one should rely solely on this database to inform real-world EV infrastructure projects.
This document is part of the source library for NRGI's National Oil Company Database, an open database of facts and figures on more than 70 national oil companies worldwide. See the full database at https://nationaloilcompanydata.org/.
Technical Assistance on the Energy Supply and Demand Database: To assist member countries to establish and maintain a compatible, accurate, reliable and up-to-date energy supply and demand database so as to enable effective management and planning of their national energy sectors. This is an ongoing task that will assist countries in sourcing and documenting energy data. However, progress has been minimal because PICs have not prioritized data collation and because of staff shortages.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Power plants with a capacity of at least 1 MW are included in totals. Counties with no
symbol have no utility-scale renewable electric generation. Distributed generation, such as
rooftop solar, is not included. Data is classified using Jenks Natural Breaks method.
Projection is WGS 1984 California (Teale) Albers (US Feet). Data sources are the California
Energy Commission's Quarterly Fuel and Energy Report and the Wind Generation
Reporting System databases. Data provided is for the year 2024 and is current as of July
1, 2025. For further inquiries contact John Hingtgen at john.hingtgen@energy.ca.gov.
A national summary of transit Vehicles and Energy Consumption based on data reported by transit agencies to the National Transit Database (NTD) in Report Year 2024. Only Full Reporters report energy consumption, while other reporter types report vehicles.
NTD Data Tables organize and summarize data from the 2023 National Transit Database in a manner that is more useful for quick reference and summary analysis. This dataset is based on the 2023 Energy Consumption database file and Revenue Vehicles Inventory database file.
Dataset Overview
This robust dataset delves into the world of biogas production from livestock farming across the United States, providing a pivotal tool for assessing renewable energy prospects. With a focus on biogas projects derived from various livestock such as cattle, dairy cows, poultry, and swine, this resource is invaluable for stakeholders in the farming industry, renewable energy sectors, and environmental policy-making. Each record encapsulates detailed information about a specific biogas project, making it a treasure trove for research, development, and strategic planning in the renewable energy domain.
Key Features:
- **Project Name:** The name of the biogas project.
- **Project Type:** Type of the biogas project.
- **City:** The city where the project is located.
- **County:** The county where the project is situated.
- **State:** The state where the project is located.
- **Digester Type:** Type of digester used in the project.
- **Status:** Current status of the project.
- **Year Operational:** The year when the project became operational.
- **Animal/Farm Type(s):** Types of animals or farms used in the project.
- **Cattle:** Number of cattle involved.
- **Dairy:** Number of dairy cows involved.
- **Poultry:** Number of poultry involved.
- **Swine:** Number of swine involved.
- **Co-Digestion:** Information on whether co-digestion is being used or not.
- **Biogas Generation Estimate (cu-ft/day):** Estimated daily biogas production.
- **Electricity Generated (kWh/yr):** Estimated annual electricity generation.
- **Biogas End Use(s):** How the produced biogas is utilized.
- **LCFS Pathway?:** Information on the Low Carbon Fuel Standard pathway.
- **Receiving Utility:** The utility company receiving the biogas or electricity.
- **Total Emission Reductions (MTCO2e/yr):** Estimated total emission reduction.
- **Awarded USDA Funding?:** Information on whether the project received USDA funding or not.
- **Operational Years:** Number of years the project has been operational.
- **Total_Animals:** Total number of animals involved in the project.
- **Biogas_per_Animal (cu-ft/day):** Estimated biogas production per animal.
- **Emission_Reduction_per_Year:** Estimated annual emission reduction per animal.
- **Electricity_to_Biogas_Ratio:** The ratio between electricity generation and biogas production.
- **Total_Waste_kg/day:** Estimated daily waste production.
- **Waste_Efficiency:** Efficiency of waste conversion to biogas.
- **Electricity_Efficiency:** Efficiency of biogas conversion to electricity.
This dataset stands as a cornerstone for developing strategies that can enhance profitability for farmers, guide investment decisions for energy companies, and contribute significantly to environmental sustainability efforts.
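Several of the listed fields (Total_Animals, Biogas_per_Animal, Electricity_to_Biogas_Ratio) appear to be derived from the raw columns. The sketch below shows one plausible way to compute them for a single record. The exact formulas used by the dataset author are not stated, so the ones here are assumptions, as is the function name `derive_metrics`.

```python
def derive_metrics(record):
    """Compute derived per-project metrics from one record (a dict).

    Assumed formulas (not confirmed by the dataset description):
      Total_Animals               = Cattle + Dairy + Poultry + Swine
      Biogas_per_Animal           = daily biogas estimate / Total_Animals
      Electricity_to_Biogas_Ratio = annual electricity / daily biogas estimate
    """
    # Treat missing or empty animal counts as zero.
    animal_keys = ("Cattle", "Dairy", "Poultry", "Swine")
    total_animals = sum(record.get(key) or 0 for key in animal_keys)

    biogas = record.get("Biogas Generation Estimate (cu-ft/day)")
    electricity = record.get("Electricity Generated (kWh/yr)")

    # Leave a metric as None when its inputs are missing or zero.
    biogas_per_animal = None
    if biogas is not None and total_animals:
        biogas_per_animal = biogas / total_animals

    electricity_ratio = None
    if electricity is not None and biogas:
        electricity_ratio = electricity / biogas

    return {
        "Total_Animals": total_animals,
        "Biogas_per_Animal (cu-ft/day)": biogas_per_animal,
        "Electricity_to_Biogas_Ratio": electricity_ratio,
    }
```

Returning None rather than dividing by zero keeps projects with missing animal counts or biogas estimates distinguishable from projects with genuinely low values.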
Reporting requirements for power plants of at least 1 MW are in accordance with 20 CA CCR 304 and 1385. Counties without pie symbols had no utility-scale (commercial) electric generation installed. Distributed renewable generation (e.g. rooftop solar) is not included. Map and data from the California Energy Commission. Energy production data is from the Quarterly Fuel and Energy Report (QFER) and the Wind Performance Reporting System (WPRS) databases. Data is from 2018 and is current as of June 2019. Contact Dylan Kojimoto at (916) 651-0477 or John Hingtgen at (916) 657-4046 for questions.
https://www.energy.ca.gov/conditions-of-use
This map outlines the total renewable electrical generation in gigawatt-hours (GWh) for all counties in California for 2019. Sources below 1 megawatt (MW) were not included in this map. Counties without a symbol had no utility-scale (commercial) renewable electric generation installed. The table depicts the amount of renewable energy production for each energy type for every county. Data obtained from Quarterly Fuel and Energy Reports (QFER) and the Wind Performance Reporting System (WPRS) databases.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Energy Climate dataset consistent with ENTSO-E Pan-European Climatic Database (PECD 2021.3) in CSV and netCDF format
TL;DR: this is a tidy and friendly version of a recreation of ENTSO-E's PECD 2021.3 data by using ERA5: hourly capacity factors for wind onshore, offshore, solar PV and hourly electricity demand are provided. All the data is provided for 28-71 climatic years (1950-2020 for wind and solar, 1982-2010 for demand).
Description
Country averages of energy-climate variables generated using Python scripts, based on ENTSO-E's TYNDP 2020 study. Data is available for the following scenarios:
The time-series are at hourly resolution and the included variables are:
The files are provided in CSV (.csv) and NetCDF (.nc) formats. The data is given per ENTSO-E bidding zone, as used within the TYNDP 2020.
DISCLAIMER: the content of this dataset has been created with the greatest possible care. However, we invite you to use the original data for critical applications and studies.
A national summary of transit Vehicles and Energy Consumption based on data reported by transit agencies to the National Transit Database (NTD) in Report Year 2022. Only Full Reporters report energy consumption, while other reporter types report vehicles.
NTD Data Tables organize and summarize data from the 2022 National Transit Database in a manner that is more useful for quick reference and summary analysis. This dataset is based on the 2022 Energy Consumption database file and Revenue Vehicles Inventory database file. In years 2014-2021 the data tables that underlie these data were "Vehicles" and "Fuel and Energy".
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset consists of a situational description of Brazilian Amazon's extractive communities regarding availability of energy sources. Data are reported for 50 protected 'extractive reserve' areas, their geographical extents, management plans, population details and sources of/access to energy.
The related study examines the access to energy sources for populations within extractive communities in the Brazilian Amazon and reports on the related presence or absence of management plan documentation. Management plans published by the relevant Federal or State authority are used to generate data on mode of access for communities, energy supply level and energy source. Where management plans are not available these values are not present.
The dataset consists of two files holding the raw extracted data and a detailed description of the methodology used to generate the data. These are in .xlsx Excel spreadsheet and .docx Word document formats, accessible via MS Office or open office applications.
SITUATIONAL_ENERGY_PANORAMA_OF_AMAZONS_EXTRACTIVE_RESERVES.xlsx - each row of the data table represents a named extractive reserve area, with the following variables reported for each:
- Municipality
- State: abbreviated two-letter code for Brazilian State
- Date of Creation: of the protected area itself
- Management Plan: the existence of a management plan covering this area - 'Yes' or 'No'
- Administrative Sphere: State or Federal
- Area in Hectares: territorial extent of the extractive reserve
- Beneficiary Families: living within the protected area
- Average No. of Hectares by Family: calculated as mean
- Access: mode of access available to communities - 'fluvial', 'aerial' or 'terrestrial' (multivariate)
- Energy Supply: single value describing level of supply: 'deficient', 'nonexistent' or 'no information'
- Energy Source: relevant source accessible to the community - 'fossil', 'mainly fossil', 'public network', 'nonexistent' or 'no information' (multivariate)
Methodology_Description.docx - a detailed descriptive document outlining the data collection process from two official digital databases of the Brazilian government and one digital repository of a public interest civil society organization: (i) National Registry of Conservation Units of the Brazilian Ministry of the Environment; (ii) Dynamic Panel and Management Plans of the Chico Mendes Institute for Biodiversity Conservation (ICMBio), an independent agency responsible for managing protected areas in Brazil at the federal level; and (iii) Social and Environmental Institute (ISA) respectively.
See the references linked below for these data sources.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises atomic properties of 44K (44 470) molecules selected from the QM9 database. The file names are based on the same indexing system used for QM9.
This dataset includes four types of files:
# B3LYP/6-31G(2df,p) scf=(maxcycle=9999) nosymm output=wfx
aimqb -nogui -scp=false -nproc=8 -naat=4 input.wfx
and two extracted atomic properties:
3. Electronic Population, N
4. Atomic Energy, E
The file aimel_merged_44k.csv is the concatenation of the 44 470 CSV files.
Additionally, the file aimel_merged_38k.csv is the concatenation of 38 876 CSV files and corresponds to version 1.0 of the dataset.
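A merged file of this kind can be rebuilt from the individual CSV files with a few lines of Python. The sketch below is an illustration only: it assumes all per-molecule files share an identical header row, and the function name `concatenate_csv` and any directory names are hypothetical, not taken from the dataset.

```python
import csv
from pathlib import Path


def concatenate_csv(input_dir, output_path):
    """Concatenate every CSV file in input_dir into a single CSV file.

    Assumes all input files share the same header, which is written once.
    Write the output outside input_dir so that it is not picked up as an
    input on a later run. Returns the number of data rows written.
    """
    files = sorted(Path(input_dir).glob("*.csv"))
    header = None
    rows_written = 0
    with open(output_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in files:
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"Header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Checking the header of every file guards against silently merging files with different column layouts.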
If you find this dataset useful, please cite the original paper:
Meza-González, B., Ramírez-Palma, D.I., Carpio-Martínez, P. et al. Quantum Topological Atomic Properties of 44K molecules. Sci Data 11, 945 (2024). https://doi.org/10.1038/s41597-024-03723-0