Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data model to generate datasets used in the tests of the article: Synthetic Datasets Generator for Testing Techniques and Tools of Information Visualization and Machine Learning.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This submission contains cleaned and filtered data from the Environmental Protection Agency Clean Air Markets (CAM) database of thermal power plant operation and performance.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
T10I4D100K is a renowned synthetic database generated using the IBM Quest generator. This database is widely used to evaluate various frequent and correlated pattern mining algorithms.
Creating a robust employee dataset for data analysis and visualization involves several key fields that capture different aspects of an employee's information. Here's a list of fields you might consider including:
Employee ID: A unique identifier for each employee.
Name: First name and last name of the employee.
Gender: Male, female, non-binary, etc.
Date of Birth: Birthdate of the employee.
Email Address: Contact email of the employee.
Phone Number: Contact number of the employee.
Address: Home or work address of the employee.
Department: The department the employee belongs to (e.g., HR, Marketing, Engineering, etc.).
Job Title: The specific job title of the employee.
Manager ID: ID of the employee's manager.
Hire Date: Date when the employee was hired.
Salary: Employee's salary or compensation.
Employment Status: Full-time, part-time, contractor, etc.
Employee Type: Regular, temporary, contract, etc.
Education Level: Highest level of education attained by the employee.
Certifications: Any relevant certifications the employee holds.
Skills: Specific skills or expertise possessed by the employee.
Performance Ratings: Ratings or evaluations of employee performance.
Work Experience: Previous work experience of the employee.
Benefits Enrollment: Information on benefits chosen by the employee (e.g., healthcare plan, retirement plan, etc.).
Work Location: Physical location where the employee works.
Work Hours: Regular working hours or shifts of the employee.
Employee Status: Active, on leave, terminated, etc.
Emergency Contact: Contact information of the employee's emergency contact person.
Employee Satisfaction Survey Responses: Data from employee satisfaction surveys, if applicable.
Code Url: https://github.com/intellisenseCodez/faker-data-generator
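As a rough illustration of how a few of these fields might be populated, here is a minimal sketch using only the Python standard library. It is not the code from the linked repository, which is not shown here. The function names, the snake_case keys, the value pools, the date range, and the salary range are all assumptions made for the example.

```python
import random
from datetime import date, timedelta

# Hypothetical value pools; the field list names the fields, not their values.
DEPARTMENTS = ["HR", "Marketing", "Engineering"]
STATUSES = ["Full-time", "Part-time", "Contractor"]


def random_date(rng, start, end):
    """Return a uniformly random date between start and end, inclusive."""
    return start + timedelta(days=rng.randint(0, (end - start).days))


def make_employees(n, seed=0):
    """Build n synthetic employee records covering a subset of the fields."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    employees = []
    for emp_id in range(1, n + 1):
        employees.append({
            "employee_id": emp_id,
            "department": rng.choice(DEPARTMENTS),
            # Manager is any previously created employee; the first has none.
            "manager_id": rng.randint(1, emp_id - 1) if emp_id > 1 else None,
            # Assumed hire-date window, chosen only for illustration.
            "hire_date": random_date(rng, date(2015, 1, 1), date(2024, 12, 31)),
            # Assumed salary range in steps of 500.
            "salary": rng.randrange(30_000, 150_001, 500),
            "employment_status": rng.choice(STATUSES),
        })
    return employees
```

Drawing each Manager ID from the employees already created keeps every reference valid, which matters if the data is later loaded into a table with a self-referencing foreign key.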
The HazWaste database contains site and mailing address information for generators (companies and/or individuals), along with waste generation data such as the amount of waste generated, for all hazardous waste generators in Vermont. The database was developed in the early 1990s for program management and to meet EPA Authorization requirements. It has been periodically updated to more modern data systems.
https://www.archivemarketresearch.com/privacy-policy
The global database testing tool market is anticipated to experience substantial growth in the coming years, driven by factors such as the increasing adoption of cloud-based technologies, the rising demand for data quality and accuracy, and the growing complexity of database systems. The market is expected to reach a value of USD 1,542.4 million by 2033, expanding at a CAGR of 7.5% during the forecast period of 2023-2033. Key players in the market include Apache JMeter, DbFit, SQLMap, Mockup Data, SQL Test, NoSQLUnit, Orion, ApexSQL, QuerySurge, DBUnit, DataFactory, DTM Data Generator, Oracle, SeLite, SLOB, and others. The North American region is anticipated to hold a significant share of the database testing tool market, followed by Europe and Asia Pacific. The increasing adoption of cloud-based database testing services, the presence of key market players, and the growing demand for data testing and validation are driving the market growth in North America. Asia Pacific, on the other hand, is expected to experience the highest growth rate due to the rapidly increasing IT spending, the emergence of new technologies, and the growing number of businesses investing in data quality management solutions.
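The growth figures above follow the usual compound annual growth rate (CAGR) relationship, final = base × (1 + rate)^years. The sketch below back-solves the base-year value implied by the stated 2033 value and rate. It assumes ten compounding periods between 2023 and 2033 and that the 7.5% rate applies uniformly. The resulting figure is an illustration of the formula, not a number reported in the source.

```python
def implied_base_value(final_value, cagr, years):
    """Invert final = base * (1 + cagr) ** years to recover the base value."""
    return final_value / (1 + cagr) ** years


def projected_value(base_value, cagr, years):
    """Apply compound growth to a base value for the given number of years."""
    return base_value * (1 + cagr) ** years


# Values from the text: USD 1,542.4 million by 2033 at a 7.5% CAGR.
# Assumption: 10 compounding periods (2023 to 2033).
base_2023 = implied_base_value(1542.4, 0.075, 10)
print(f"Implied 2023 market size: USD {base_2023:.1f} million")
```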
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is synthetically generated fake data designed to simulate a realistic e-commerce environment.
Its purpose is to provide large-scale relational datasets for practicing database operations and analytics, and for testing tools like DuckDB, Pandas, and SQL engines. It is ideal for benchmarking, educational projects, and data engineering experiments.
Customers
(int): Unique identifier for each customer
(string): Customer full name
(string): Customer email address
(string): Customer gender ('Male', 'Female', 'Other')
(date): Date customer signed up
(string): Customer country of residence
Products
(int): Unique identifier for each product
(string): Name of the product
(string): Product category (e.g., Electronics, Books)
(float): Price per unit
(int): Available stock count
(string): Product brand name
Orders
(int): Unique identifier for each order
(int): ID of the customer who placed the order (foreign key to Customers)
(date): Date when order was placed
(float): Total amount for the order
(string): Payment method used (Credit Card, PayPal, etc.)
(string): Country where the order is shipped
Order items
(int): Unique identifier for each order item
(int): ID of the order this item belongs to (foreign key to Orders)
(int): ID of the product ordered (foreign key to Products)
(int): Number of units ordered
(float): Price per unit at order time
Reviews
(int): Unique identifier for each review
(int): ID of the reviewed product (foreign key to Products)
(int): ID of the customer who wrote the review (foreign key to Customers)
(int): Rating score (1 to 5)
(string): Text content of the review
(date): Date the review was written
The script saves two folders inside the specified output path:
csv/ # CSV files
parquet/ # Parquet files
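A generator of this kind might look roughly like the sketch below, which writes two related tables to the csv/ folder using only the Python standard library. The dataset's actual script is not shown here, so the function name, file names, column names, and value ranges are all hypothetical. Parquet output is omitted because it would need a third-party library such as pyarrow.

```python
import csv
import random
from pathlib import Path


def generate(output_path, n_customers=5, n_orders=10, seed=0):
    """Write customers.csv and orders.csv under output_path/csv/."""
    rng = random.Random(seed)  # seeded for reproducible output
    csv_dir = Path(output_path) / "csv"
    csv_dir.mkdir(parents=True, exist_ok=True)

    # Column names are assumptions; the schema listing gives only types
    # and descriptions.
    customers = [
        {"customer_id": i, "gender": rng.choice(["Male", "Female", "Other"])}
        for i in range(1, n_customers + 1)
    ]
    orders = [
        {
            "order_id": i,
            # Foreign key: always points at an existing customer row.
            "customer_id": rng.choice(customers)["customer_id"],
            "total_amount": round(rng.uniform(5, 500), 2),
        }
        for i in range(1, n_orders + 1)
    ]

    for name, rows in (("customers", customers), ("orders", orders)):
        with open(csv_dir / f"{name}.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return csv_dir
```

Generating the parent table first and sampling its keys for the child table is one simple way to guarantee that every foreign key resolves when the files are loaded into DuckDB, Pandas, or a SQL engine.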
MIT License
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
The NRCS National Water and Climate Center Report Generator web-based application uses long-term snowpack, precipitation, reservoir, streamflow, and soils data from a variety of quality-controlled sources to create reports. Users can choose from predefined templates or build custom reports. Data from tabular reports may be exported to different formats, including comma-separated value (CSV) files. Charts can be saved to graphics formats such as JPG and PNG. The Report Generator network incorporates data from many agency databases. The NRCS snow survey flagship database, the Water and Climate Information System (WCIS), provides a wealth of data, including manually-collected snow course data and information from automated Snow Telemetry (SNOTEL) and Soil Climate Analysis Network (SCAN) stations across the United States. Report Generator also uses precipitation, streamflow, and reservoir data from the U.S. Army Corps of Engineers (USACE), the U.S. Bureau of Reclamation (BOR), the Applied Climate Information System (ACIS), the U.S. Geological Survey (USGS), various water districts and other entities. In addition to creating reports, Report Generator lets you view information on sites, including metadata, such as elevation, latitude/longitude and hydrologic unit code (HUC). You can also view photos of the site, including a site map (in Google maps when available). Report Generator creates reports in both tabular and chart format. Single-station and multiple-station charting is also supported. Data may be displayed in either English or Metric units. Farmers, municipalities, water and hydroelectric utilities, environmental organizations, fish and wildlife managers, tribal nations, reservoir managers, recreationists, wetlands managers, urban developers, transportation departments, and research organizations regularly use these data and products. This release has several new features which focus on improving the way reports are specified and how they are displayed. 
Multi-station charting is also supported in this release.
Resources in this dataset:
Resource Title: Report Generator 2.0. File Name: Web Page, url: https://wcc.sc.egov.usda.gov/reportGenerator/
Create custom reports and charts from multiple data sources. Data from tabular reports may be exported to different formats, including comma-separated value (CSV) files. Charts can be saved to graphics formats, such as JPG and PNG.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Export: Electric Motor & Generator was reported at 1.038 USD bn in Dec 2019, an increase from 887.863 USD mn in Nov 2019. The series is updated monthly, with a median of 384.006 USD mn over 320 observations from May 1993 to Dec 2019. It reached an all-time high of 1.089 USD bn in May 2019 and a record low of 4.440 USD mn in Sep 1993. The series remains active in CEIC and is reported by the General Administration of Customs. It is categorized under Global Database’s China – Table CN.JA: USD: Export by Major Commodity: Value.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PFam Domains and biological process GO categories for the four rhizobia strains. Predicted proteins related to multiple GO biological process categories are joined together with the pipe character. (XLSX 639 kb)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Generator & Generator Set: YoY: Number of Loss Making Enterprise was reported at 14.173 % in Oct 2015, an increase from 13.953 % in Sep 2015. The series is updated monthly, with a median of 5.357 % over 89 observations from Jan 2006 to Oct 2015. It reached an all-time high of 56.122 % in Aug 2012 and a record low of -13.529 % in Aug 2014. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Generator & Generator Set: Total Liability was reported at 299.834 RMB bn in Oct 2015, an increase from 294.039 RMB bn in Sep 2015. The series is updated monthly, with a median of 181.089 RMB bn over 97 observations from Dec 2003 to Oct 2015. It reached an all-time high of 299.834 RMB bn in Oct 2015 and a record low of 20.835 RMB bn in Dec 2003. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PUDL v2025.2.0 Data Release
This is our regular quarterly release for 2025Q1. It includes updates to all the datasets that are published with quarterly or higher frequency, plus initial versions of a few new data sources that have been in the works for a while.
One major change this quarter is that we are now publishing all processed PUDL data as Apache Parquet files, alongside our existing SQLite databases. See Data Access for more on how to access these outputs.
Some potentially breaking changes to be aware of:
In the EIA Form 930 – Hourly and Daily Balancing Authority Operations Report a number of new energy sources have been added, and some old energy sources have been split into more granular categories. See Changes in energy source granularity over time.
We are now running the EPA’s CAMD to EIA unit crosswalk code for each individual year starting from 2018, rather than just 2018 and 2021, resulting in more connections between these two datasets and changes to some sub-plant IDs. See the note below for more details.
Many thanks to the organizations who make these regular updates possible! Especially GridLab, RMI, and the ZERO Lab at Princeton University. If you rely on PUDL and would like to help ensure that the data keeps flowing, please consider joining them as a PUDL Sustainer, as we are still fundraising for 2025.
New Data
EIA 176
Add a couple of semi-transformed interim EIA-176 (natural gas sources and dispositions) tables. They aren’t yet being written to the database, but are one step closer. See #3555 and PRs #3590, #3978. Thanks to @davidmudrauskas for moving this dataset forward.
Extracted these interim tables up through the latest 2023 data release. See #4002 and #4004.
EIA 860
Added EIA 860 Multifuel table. See #3438 and #3946.
FERC 1
Added three new output tables containing granular utility accounting data. See #4057, #3642 and the table descriptions in the data dictionary:
out_ferc1_yearly_detailed_income_statements
out_ferc1_yearly_detailed_balance_sheet_assets
out_ferc1_yearly_detailed_balance_sheet_liabilities
SEC Form 10-K Parent-Subsidiary Ownership
We have added some new tables describing the parent-subsidiary company ownership relationships reported in the SEC’s Form 10-K, Exhibit 21 “Subsidiaries of the Registrant”. Where possible these tables link the SEC filers or their subsidiary companies to the corresponding EIA utilities. This work was funded by a grant from the Mozilla Foundation. Most of the ML models and data preparation took place in the mozilla-sec-eia repository, separate from the main PUDL ETL, as it requires processing hundreds of thousands of PDFs and the deployment of some ML experiment tracking infrastructure. The new tables are handed off as nearly finished products to the PUDL ETL pipeline. Note that these are preliminary, experimental data products and are known to be incomplete and to contain errors. Extracting data tables from unstructured PDFs and the SEC to EIA record linkage are necessarily probabilistic processes.
See PRs #4026, #4031, #4035, #4046, #4048, #4050 and check out the table descriptions in the PUDL data dictionary:
out_sec10k_parents_and_subsidiaries
core_sec10k_quarterly_filings
core_sec10k_quarterly_exhibit_21_company_ownership
core_sec10k_quarterly_company_information
Expanded Data Coverage
EPA CEMS
Added 2024 Q4 of CEMS data. See #4041 and #4052.
EPA CAMD EIA Crosswalk
In the past, the crosswalk in PUDL has used the EPA’s published crosswalk (run with 2018 data), and an additional crosswalk we ran with 2021 EIA 860 data. To ensure that the crosswalk reflects updates in both EIA and EPA data, we re-ran the EPA R code which generates the EPA CAMD EIA crosswalk with 4 new years of data: 2019, 2020, 2022 and 2023. Re-running the crosswalk pulls the latest data from the CAMD FACT API, which results in some changes to the generator and unit IDs reported on the EPA side of the crosswalk, which feeds into the creation of core_epa_assn_eia_epacamd.
The changes only result in the addition of new units and generators in the EPA data, with no changes to matches at the plant level. However, the updates to generator and unit IDs have resulted in changes to the subplant IDs: some EIA boilers and generators which previously had no matches to EPA data have now been matched to EPA unit data, resulting in an overall reduction in the number of rows in the core_epa_assn_eia_epacamd_subplant_ids table. See issue #4039 and PR #4056 for a discussion of the changes observed in the course of this update.
EIA 860M
Added EIA 860m through December 2024. See #4038 and #4047.
EIA 923
Added EIA 923 monthly data through September 2024. See #4038 and #4047.
EIA Bulk Electricity Data
Updated the EIA Bulk Electricity data to include data published up through 2024-11-01. See #4042 and PR #4051.
EIA 930
Updated the EIA 930 data to include data published up through the beginning of February 2025. See #4040 and PR #4054. 10 new energy sources were added and 3 were retired; see Changes in energy source granularity over time for more information.
Bug Fixes
Fix an accidentally swapped set of starting balance / ending balance column rename parameters in the pre-2021 DBF derived data that feeds into core_ferc1_yearly_other_regulatory_liabilities_sched278. See issue #3952 and PRs #3969, #3979. Thanks to @yolandazzz13 for making this fix.
Added preliminary data validation checks for several FERC 1 tables that were missing them. See #3860.
Fix spelling of Lake Huron and Lake Saint Clair in out_vcerare_hourly_available_capacity_factor and related tables. See issue #4007 and PR #4029.
Quality of Life Improvements
We added a sources parameter to pudl.metadata.classes.DataSource.from_id() in order to make it possible to use the pudl-archiver repository to archive datasets that won’t necessarily be ingested into PUDL. See this PUDL archiver issue and PRs #4003 and #4013.
Other PUDL v2025.2.0 Resources
PUDL v2025.2.0 Data Dictionary
PUDL v2025.2.0 Documentation
PUDL in the AWS Open Data Registry
PUDL v2025.2.0 in a free, public AWS S3 bucket: s3://pudl.catalyst.coop/v2025.2.0/
PUDL v2025.2.0 in a requester-pays GCS bucket: gs://pudl.catalyst.coop/v2025.2.0/
Zenodo archive of the PUDL GitHub repo for this release
PUDL v2025.2.0 release on GitHub
PUDL v2025.2.0 package in the Python Package Index (PyPI)
Contact Us
If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. Here's a bunch of different ways to get in touch:
Follow us on GitHub
Use the PUDL GitHub issue tracker to let us know about any bugs or data issues you encounter
GitHub Discussions is where we provide user support.
Watch our GitHub Project to see what we're working on.
Email us at hello@catalyst.coop for private communications.
On Mastodon: @CatalystCoop@mastodon.energy
On BlueSky: @catalyst.coop
On Twitter: @CatalystCoop
Connect with us on LinkedIn
Play with our data and notebooks on Kaggle
Combine our data with ML models on HuggingFace
Learn more about us on our website: https://catalyst.coop
Subscribe to our announcements list for email updates.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Jamaica's imports from Belgium of producer gas or water gas generators were US$443 during 2018, according to the United Nations COMTRADE database on international trade. The data, historical chart, and statistics for this series were last updated in October 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Generator & Generator Set: Total Asset was reported at 458.934 RMB bn in Oct 2015, an increase from 451.458 RMB bn in Sep 2015. The series is updated monthly, with a median of 299.527 RMB bn over 97 observations from Dec 2003 to Oct 2015. It reached an all-time high of 458.934 RMB bn in Oct 2015 and a record low of 28.965 RMB bn in Dec 2003. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Generator & Generator Set: Number of Employee: Average was reported at 246.196 thousand persons in Dec 2013, an increase from 212.926 thousand persons in Dec 2012. The series is updated monthly, with a median of 151.600 thousand persons over 64 observations from Dec 2003 to Dec 2013. It reached an all-time high of 246.196 thousand persons in Dec 2013 and a record low of 69.115 thousand persons in Dec 2003. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Generator & Generator Set: Loss Amount: Year to Date was reported at 3.841 RMB bn in Oct 2015, an increase from 3.342 RMB bn in Sep 2015. The series is updated monthly, with a median of 0.902 RMB bn over 97 observations from Dec 2003 to Oct 2015. It reached an all-time high of 3.841 RMB bn in Oct 2015 and a record low of 0.061 RMB bn in Feb 2006. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Generator & Generator Set: Account Receivable was reported at 133.527 RMB bn in Oct 2015, an increase from 126.823 RMB bn in Sep 2015. The series is updated monthly, with a median of 82.475 RMB bn over 97 observations from Dec 2003 to Oct 2015. It reached an all-time high of 133.527 RMB bn in Oct 2015 and a record low of 4.207 RMB bn in Dec 2003. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Generator & Generator Set: Product Inventory was reported at 22.855 RMB bn in Oct 2015, a decrease from 22.964 RMB bn in Sep 2015. The series is updated monthly, with a median of 14.001 RMB bn over 97 observations from Dec 2003 to Oct 2015. It reached an all-time high of 22.964 RMB bn in Sep 2015 and a record low of 1.672 RMB bn in Dec 2003. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Generator & Generator Set: YoY: Total Asset was reported at 8.369 % in Oct 2015, a decrease from 8.876 % in Sep 2015. The series is updated monthly, with a median of 13.420 % over 89 observations from Jan 2006 to Oct 2015. It reached an all-time high of 45.680 % in Mar 2011 and a record low of 4.307 % in Dec 2013. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.