53 datasets found

data for: Synthetic Datasets Generator for Testing Techniques and Tools of...
data.mendeley.com
Updated Mar 12, 2019
Cite
Yvan Brito (2019). data for: Synthetic Datasets Generator for Testing Techniques and Tools of Information Visualization and Machine Learning [Dataset]. http://doi.org/10.17632/2j3hg4j6tc.1
Explore at:
Unique identifier
https://doi.org/10.17632/2j3hg4j6tc.1
Dataset updated
Mar 12, 2019
Authors
Yvan Brito
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data model to generate datasets used in the tests of the article: Synthetic Datasets Generator for Testing Techniques and Tools of Information Visualization and Machine Learning.
Data from: A National Thermal Generator Performance Database
data.openei.org
datalumos.org
+3more
archive, data
Updated Dec 5, 2018
Cite
Rossol; Brinkman; Buster; Denholm; Novacheck; Stephen (2018). A National Thermal Generator Performance Database [Dataset]. https://data.openei.org/submissions/8184
Explore at:
Available download formats: data, archive
Dataset updated
Dec 5, 2018
Dataset provided by
United States Department of Energy http://energy.gov/
National Renewable Energy Laboratory
Open Energy Data Initiative (OEDI)
Authors
Rossol; Brinkman; Buster; Denholm; Novacheck; Stephen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This submission contains cleaned and filtered data from the Environmental Protection Agency Clean Air Markets CAM database of thermal power plant operation and performance.
Fake Employee Dataset
kaggle.com
zip
Updated Nov 20, 2023
Cite
Oyekanmi Olamilekan (2023). Fake Employee Dataset [Dataset]. https://www.kaggle.com/datasets/oyekanmiolamilekan/fake-employee-dataset
Explore at:
Available download formats: zip (162874 bytes)
Dataset updated
Nov 20, 2023
Authors
Oyekanmi Olamilekan
Description
Creating a robust employee dataset for data analysis and visualization involves several key fields that capture different aspects of an employee's information. Here's a list of fields you might consider including:
Employee ID: A unique identifier for each employee.
Name: First name and last name of the employee.
Gender: Male, female, non-binary, etc.
Date of Birth: Birthdate of the employee.
Email Address: Contact email of the employee.
Phone Number: Contact number of the employee.
Address: Home or work address of the employee.
Department: The department the employee belongs to (e.g., HR, Marketing, Engineering, etc.).
Job Title: The specific job title of the employee.
Manager ID: ID of the employee's manager.
Hire Date: Date when the employee was hired.
Salary: Employee's salary or compensation.
Employment Status: Full-time, part-time, contractor, etc.
Employee Type: Regular, temporary, contract, etc.
Education Level: Highest level of education attained by the employee.
Certifications: Any relevant certifications the employee holds.
Skills: Specific skills or expertise possessed by the employee.
Performance Ratings: Ratings or evaluations of employee performance.
Work Experience: Previous work experience of the employee.
Benefits Enrollment: Information on benefits chosen by the employee (e.g., healthcare plan, retirement plan, etc.).
Work Location: Physical location where the employee works.
Work Hours: Regular working hours or shifts of the employee.
Employee Status: Active, on leave, terminated, etc.
Emergency Contact: Contact information of the employee's emergency contact person.
Employee Satisfaction Survey Responses: Data from employee satisfaction surveys, if applicable.

Code Url: https://github.com/intellisenseCodez/faker-data-generator
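
As a rough illustration of how a few of the fields listed above could be filled with fake values, here is a minimal sketch using only Python's standard library. It is not the code from the linked repository; the function name `make_employees`, the sample name and department lists, and the salary range are all assumptions made for the example.

```python
import random
from datetime import date, timedelta

# Hypothetical sample values; a real generator would draw from much larger pools.
FIRST_NAMES = ["Ada", "Grace", "Alan", "Linus"]
LAST_NAMES = ["Okafor", "Silva", "Nguyen", "Meyer"]
DEPARTMENTS = ["HR", "Marketing", "Engineering"]


def make_employees(n, seed=0):
    """Return n fake employee records as dicts (illustrative field subset)."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    employees = []
    for employee_id in range(1, n + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        hire_date = date(2015, 1, 1) + timedelta(days=rng.randrange(3650))
        employees.append({
            "employee_id": employee_id,
            "name": f"{first} {last}",
            "email": f"{first}.{last}{employee_id}@example.com".lower(),
            "department": rng.choice(DEPARTMENTS),
            # The first employee has no manager; others report to an earlier ID.
            "manager_id": None if employee_id == 1 else rng.randrange(1, employee_id),
            "hire_date": hire_date.isoformat(),
            "salary": rng.randrange(30_000, 150_000, 500),
        })
    return employees


employees = make_employees(5)
```

Seeding the random generator keeps the fake data stable between runs, which is convenient when the dataset feeds tests or tutorials.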
Hazardous Waste Generators
catalog.data.gov
anrgeodata.vermont.gov
+8more
Updated Dec 13, 2024
Cite
ANR/DEC/WMPD HazWaste program (2024). Hazardous Waste Generators [Dataset]. https://catalog.data.gov/dataset/hazardous-waste-generators-e03ea
Explore at:
Dataset updated
Dec 13, 2024
Dataset provided by
ANR/DEC/WMPD HazWaste program
Description
The HazWaste database contains site and mailing address information for generators (companies and/or individuals), waste generation records, the amount of waste generated, and related data for all hazardous waste generators in Vermont. The database was developed in the early 1990s for program management and to meet EPA Authorization requirements. It has been periodically updated to more modern data systems.
T10I4D100K transactional database
data.mendeley.com
Updated Oct 23, 2019
Cite
Uday kiran RAGE (2019). T10I4D100K transactional database [Dataset]. http://doi.org/10.17632/4hz2vcvxhp.1
Explore at:
Unique identifier
https://doi.org/10.17632/4hz2vcvxhp.1
Dataset updated
Oct 23, 2019
Authors
Uday kiran RAGE
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
T10I4D100K is a renowned synthetic database generated using the IBM Quest generator. This database is widely used to evaluate various frequent and correlated pattern mining algorithms.
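
To make the use case concrete, the sketch below counts itemset support over a handful of toy transactions, which is the basic operation that frequent pattern mining algorithms evaluated on databases like this one must perform. The transactions, the `min_support` threshold, and the function name are invented for illustration; the brute-force enumeration shown here is not Apriori or FP-growth and would not scale to the full database.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count every itemset up to max_size and keep those meeting min_support."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicates within a transaction
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


# Toy data: each transaction is a list of integer item IDs.
transactions = [
    [1, 2, 3],
    [1, 2],
    [2, 3],
    [1, 2, 4],
]
result = frequent_itemsets(transactions, min_support=3)
```

With these four transactions, only items 1 and 2 and the pair (1, 2) reach a support of 3.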
Database Testing Tool Report
archivemarketresearch.com
doc, pdf, ppt
Updated Feb 9, 2025
Cite
Archive Market Research (2025). Database Testing Tool Report [Dataset]. https://www.archivemarketresearch.com/reports/database-testing-tool-26309
Explore at:
Available download formats: pdf, ppt, doc
Dataset updated
Feb 9, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global database testing tool market is anticipated to experience substantial growth in the coming years, driven by factors such as the increasing adoption of cloud-based technologies, the rising demand for data quality and accuracy, and the growing complexity of database systems. The market is expected to reach a value of USD 1,542.4 million by 2033, expanding at a CAGR of 7.5% during the forecast period of 2023-2033. Key players in the market include Apache JMeter, DbFit, SQLMap, Mockup Data, SQL Test, NoSQLUnit, Orion, ApexSQL, QuerySurge, DBUnit, DataFactory, DTM Data Generator, Oracle, SeLite, SLOB, and others. The North American region is anticipated to hold a significant share of the database testing tool market, followed by Europe and Asia Pacific. The increasing adoption of cloud-based database testing services, the presence of key market players, and the growing demand for data testing and validation are driving the market growth in North America. Asia Pacific, on the other hand, is expected to experience the highest growth rate due to the rapidly increasing IT spending, the emergence of new technologies, and the growing number of businesses investing in data quality management solutions.
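
The figures above imply a starting market size that the description does not state. Assuming the 7.5% CAGR compounds annually over the ten years from 2023 to 2033 (the listing also gives a 2025 - 2033 coverage period, so the base year is an assumption), the implied base value can be worked out as follows.

```python
def implied_base_value(final_value, cagr, years):
    """Invert the compound growth formula: final = base * (1 + cagr) ** years."""
    return final_value / (1.0 + cagr) ** years


# Values taken from the description; the 10-year span is an assumption.
final_usd_million = 1542.4
base_usd_million = implied_base_value(final_usd_million, cagr=0.075, years=10)
print(f"Implied 2023 market size: about USD {base_usd_million:.1f} million")
```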
Report Generator 2.0
agdatacommons.nal.usda.gov
bin
Updated Nov 21, 2025
Cite
USDA Natural Resources Conservation Service (2025). Report Generator 2.0 [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Report_Generator_2_0/24661338
Explore at:
Available download formats: bin
Dataset updated
Nov 21, 2025
Dataset provided by
Natural Resources Conservation Service http://www.nrcs.usda.gov/
Authors
USDA Natural Resources Conservation Service
License
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Description
The NRCS National Water and Climate Center Report Generator web-based application uses long-term snowpack, precipitation, reservoir, streamflow, and soils data from a variety of quality-controlled sources to create reports. Users can choose from predefined templates or build custom reports. Data from tabular reports may be exported to different formats, including comma-separated value (CSV) files. Charts can be saved to graphics formats such as JPG and PNG. The Report Generator network incorporates data from many agency databases. The NRCS snow survey flagship database, the Water and Climate Information System (WCIS), provides a wealth of data, including manually-collected snow course data and information from automated Snow Telemetry (SNOTEL) and Soil Climate Analysis Network (SCAN) stations across the United States. Report Generator also uses precipitation, streamflow, and reservoir data from the U.S. Army Corps of Engineers (USACE), the U.S. Bureau of Reclamation (BOR), the Applied Climate Information System (ACIS), the U.S. Geological Survey (USGS), various water districts and other entities. In addition to creating reports, Report Generator lets you view information on sites, including metadata, such as elevation, latitude/longitude and hydrologic unit code (HUC). You can also view photos of the site, including a site map (in Google maps when available). Report Generator creates reports in both tabular and chart format. Single-station and multiple-station charting is also supported. Data may be displayed in either English or Metric units. Farmers, municipalities, water and hydroelectric utilities, environmental organizations, fish and wildlife managers, tribal nations, reservoir managers, recreationists, wetlands managers, urban developers, transportation departments, and research organizations regularly use these data and products. This release has several new features which focus on improving the way reports are specified and how they are displayed. 
Multi-station charting is also supported in this release.
Resources in this dataset:
Resource Title: Report Generator 2.0. File Name: Web Page, url: https://wcc.sc.egov.usda.gov/reportGenerator/
Create custom reports and charts from multiple data sources. Data from tabular reports may be exported to different formats, including comma-separated value (CSV) files. Charts can be saved to graphics formats, such as JPG and PNG.
Additional file 2: Table S2. of ODG: Omics database generator - a tool for...
springernature.figshare.com
xlsx
Updated Jun 4, 2023
Cite
Joseph Guhlin; Kevin Silverstein; Peng Zhou; Peter Tiffin; Nevin Young (2023). Additional file 2: Table S2. of ODG: Omics database generator - a tool for generating, querying, and analyzing multi-omics comparative databases to facilitate biological understanding [Dataset]. http://doi.org/10.6084/m9.figshare.c.3850801_D2.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.c.3850801_D2.v1
Dataset updated
Jun 4, 2023
Dataset provided by
Figshare http://figshare.com/
Authors
Joseph Guhlin; Kevin Silverstein; Peng Zhou; Peter Tiffin; Nevin Young
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PFam Domains and biological process GO categories for the four rhizobia strains. Predicted proteins related to multiple GO biological process categories are joined together with the pipe character. (XLSX 639 kb)
Synthetic E-Commerce Relational Datasets
kaggle.com
Updated Aug 31, 2025
Cite
Nael Aqel (2025). Synthetic E-Commerce Relational Datasets [Dataset]. https://www.kaggle.com/datasets/naelaqel/synthetic-e-commerce-relational-dataset
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 31, 2025
Dataset provided by
Kaggle http://kaggle.com/
Authors
Nael Aqel
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Synthetic E-Commerce Relational Dataset

This dataset is synthetically generated fake data designed to simulate a realistic e-commerce environment.

Purpose

To provide large-scale relational datasets for practicing database operations, analytics, and testing tools like DuckDB, Pandas, and SQL engines. Ideal for benchmarking, educational projects, and data engineering experiments.

Entity Relationship Diagram (ERD) - Tables Overview

1. Customers

customer_id (int): Unique identifier for each customer

name (string): Customer full name

email (string): Customer email address

gender (string): Customer gender ('Male', 'Female', 'Other')

signup_date (date): Date customer signed up

country (string): Customer country of residence

2. Products

product_id (int): Unique identifier for each product

product_name (string): Name of the product

category (string): Product category (e.g., Electronics, Books)

price (float): Price per unit

stock_quantity (int): Available stock count

brand (string): Product brand name

3. Orders

order_id (int): Unique identifier for each order

customer_id (int): ID of the customer who placed the order (foreign key to Customers)

order_date (date): Date when order was placed

total_amount (float): Total amount for the order

payment_method (string): Payment method used (Credit Card, PayPal, etc.)

shipping_country (string): Country where the order is shipped

4. Order Items

order_item_id (int): Unique identifier for each order item

order_id (int): ID of the order this item belongs to (foreign key to Orders)

product_id (int): ID of the product ordered (foreign key to Products)

quantity (int): Number of units ordered

unit_price (float): Price per unit at order time

5. Product Reviews

review_id (int): Unique identifier for each review

product_id (int): ID of the reviewed product (foreign key to Products)

customer_id (int): ID of the customer who wrote the review (foreign key to Customers)

rating (int): Rating score (1 to 5)

review_text (string): Text content of the review

review_date (date): Date the review was written
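
The five tables and their foreign keys can be expressed directly as SQL. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the SQL type names, the sample rows, and the query are illustrative assumptions, not part of the dataset or its generator script.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT,
    gender TEXT, signup_date TEXT, country TEXT);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT,
    price REAL, stock_quantity INTEGER, brand TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    order_date TEXT, total_amount REAL, payment_method TEXT,
    shipping_country TEXT);
CREATE TABLE order_items (
    order_item_id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity INTEGER, unit_price REAL);
CREATE TABLE product_reviews (
    review_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES products(product_id),
    customer_id INTEGER REFERENCES customers(customer_id),
    rating INTEGER CHECK (rating BETWEEN 1 AND 5),
    review_text TEXT, review_date TEXT);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)

# Made-up sample rows, only to exercise the relationships.
conn.execute("INSERT INTO customers VALUES (1, 'Jane Doe', 'jane@example.com', 'Female', '2024-01-05', 'Jordan')")
conn.execute("INSERT INTO products VALUES (10, 'Notebook', 'Books', 4.5, 100, 'Acme')")
conn.execute("INSERT INTO orders VALUES (100, 1, '2024-02-01', 9.0, 'PayPal', 'Jordan')")
conn.execute("INSERT INTO order_items VALUES (1000, 100, 10, 2, 4.5)")

# Recompute each order's total from its items and compare with the stored value.
rows = conn.execute("""
    SELECT o.order_id, o.total_amount, SUM(oi.quantity * oi.unit_price)
    FROM orders AS o JOIN order_items AS oi ON oi.order_id = o.order_id
    GROUP BY o.order_id
""").fetchall()
```

A join like the last query is a typical consistency check for relational test data: the stored `total_amount` should match the sum of `quantity * unit_price` over the order's items.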

Visual ERD

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F9179978%2F7681afe8fc52a116ff56a2a4e179ad19%2FEDR.png?generation=1754741998037680&alt=media

Notes

All data is randomly generated using Python’s Faker library, so it does not reflect any real individuals or companies.

The data is provided in both CSV and Parquet formats.

The generator script is available in the accompanying GitHub repository for reproducibility and customization.

Output

The script saves two folders inside the specified output path:

csv/      # CSV files
parquet/  # Parquet files

License

MIT License

References

Github Repo: https://github.com/NaelAqel/db_gen

Notebook: https://www.kaggle.com/code/naelaqel/synthetic-e-commerce-relational-dataset-generator
United States import data of Generator from Germany
volza.com
csv
Updated Aug 10, 2021
Cite
Volza.LLC (2021). United States import data of Generator from Germany [Dataset]. https://www.volza.com/imports-united-states/united-states-import-data-of-generator-from-germany
Explore at:
Available download formats: csv
Dataset updated
Aug 10, 2021
Dataset provided by
Volza.LLC
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 2014 - Sep 30, 2021
Area covered
Germany, United States
Variables measured
Count of exporters, Count of importers, Count of shipments, Sum of import value
Description
47894 United States import shipment records of Generator from Germany with prices, volume & current Buyer’s suppliers relationships based on actual United States import trade database.
China CN: Generator & Generator Set: YoY: No of Loss Making Enterprise
ceicdata.com
Cite
CEICdata.com, China CN: Generator & Generator Set: YoY: No of Loss Making Enterprise [Dataset]. https://www.ceicdata.com/en/china/motor-generator-and-generator-set/cn-generator--generator-set-yoy-no-of-loss-making-enterprise
Explore at:
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Nov 1, 2014 - Oct 1, 2015
Area covered
China
Variables measured
Economic Activity
Description
China Generator & Generator Set: YoY: Number of Loss Making Enterprise data was reported at 14.173 % in Oct 2015. This records an increase from the previous number of 13.953 % for Sep 2015. China Generator & Generator Set: YoY: Number of Loss Making Enterprise data is updated monthly, averaging 5.357 % from Jan 2006 (Median) to Oct 2015, with 89 observations. The data reached an all-time high of 56.122 % in Aug 2012 and a record low of -13.529 % in Aug 2014. China Generator & Generator Set: YoY: Number of Loss Making Enterprise data remains active status in CEIC and is reported by National Bureau of Statistics. The data is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.
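
For readers unfamiliar with the metric, a year-over-year (YoY) figure expresses the percentage change of a value relative to the same month one year earlier. The sketch below shows the arithmetic on invented enterprise counts; the numbers are not CEIC data, and how CEIC handles details such as revisions is not stated here.

```python
def yoy_percent(current, year_ago):
    """Percentage change of `current` relative to the value 12 months earlier."""
    if year_ago == 0:
        raise ValueError("YoY change is undefined when the year-ago value is zero")
    return (current - year_ago) / year_ago * 100.0


# Hypothetical counts of loss-making enterprises (not taken from the dataset).
count_oct_prev_year = 200
count_oct_this_year = 230
change = yoy_percent(count_oct_this_year, count_oct_prev_year)
```

A negative result, such as the record low quoted above, means the count fell compared with the same month a year before.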
China CN: Generator & Generator Set: Total Asset
ceicdata.com
Updated Oct 15, 2025
Cite
CEICdata.com (2025). China CN: Generator & Generator Set: Total Asset [Dataset]. https://www.ceicdata.com/en/china/motor-generator-and-generator-set/cn-generator--generator-set-total-asset
Explore at:
Dataset updated
Oct 15, 2025
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Nov 1, 2014 - Oct 1, 2015
Area covered
China
Variables measured
Economic Activity
Description
China Generator & Generator Set: Total Asset data was reported at 458.934 RMB bn in Oct 2015. This records an increase from the previous number of 451.458 RMB bn for Sep 2015. China Generator & Generator Set: Total Asset data is updated monthly, averaging 299.527 RMB bn from Dec 2003 (Median) to Oct 2015, with 97 observations. The data reached an all-time high of 458.934 RMB bn in Oct 2015 and a record low of 28.965 RMB bn in Dec 2003. China Generator & Generator Set: Total Asset data remains active status in CEIC and is reported by National Bureau of Statistics. The data is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.
China CN: Generator & Generator Set: Account Receivable
ceicdata.com
Updated Dec 15, 2019
Cite
CEICdata.com (2019). China CN: Generator & Generator Set: Account Receivable [Dataset]. https://www.ceicdata.com/en/china/motor-generator-and-generator-set/cn-generator--generator-set-account-receivable
Explore at:
Dataset updated
Dec 15, 2019
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Nov 1, 2014 - Oct 1, 2015
Area covered
China
Variables measured
Economic Activity
Description
China Generator & Generator Set: Account Receivable data was reported at 133.527 RMB bn in Oct 2015. This records an increase from the previous number of 126.823 RMB bn for Sep 2015. China Generator & Generator Set: Account Receivable data is updated monthly, averaging 82.475 RMB bn from Dec 2003 (Median) to Oct 2015, with 97 observations. The data reached an all-time high of 133.527 RMB bn in Oct 2015 and a record low of 4.207 RMB bn in Dec 2003. China Generator & Generator Set: Account Receivable data remains active status in CEIC and is reported by National Bureau of Statistics. The data is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.
Generator Summary View
data.wu.ac.at
csv, json, xml
Updated May 15, 2018
Cite
Department of Energy and Environmental Protection (2018). Generator Summary View [Dataset]. https://data.wu.ac.at/schema/data_ct_gov/NzJtaS0zZjgy
Explore at:
Available download formats: json, csv, xml
Dataset updated
May 15, 2018
Dataset provided by
Department of Energy and Environmental Protection
License
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Description
PLEASE NOTE: Use ALL CAPS when searching using the "Filter" function on text such as: LITCHFIELD. But not needed for the upper right corner "Find in this Dataset" search where for example "Litchfield" can be used.
We know there are errors in the data although we strive to minimize them. Examples include:
• Manifests completed incorrectly by the generator or the transporter: data was entered based on the incorrect information. We can only enter the information we receive.
• Data entry errors: we now have QA/QC procedures in place to prevent or catch and fix a lot of these.
• Historically there are multiple records of the same generator. Each variation in spelling in name or address generated a separate handler record. We have worked to minimize these but many remain. The good news is that as long as they all have the same EPA ID they will all show up in your search results.
• Handlers provide erroneous data to obtain an EPA ID: data entry was based on erroneous information. Examples include incorrect or bogus addresses and names. There are also a lot of MISSPELLED NAMES AND ADDRESSES!
• Missing manifests: not every required manifest gets submitted to the DEP. Also, of the more than 100,000 paper manifests we receive each year, some were incorrectly handled and never entered.
• Missing data: we know that the records for approximately 25 boxes of manifests, mostly prior to 1985, were lost from the database in the 1980’s.
• Translation errors: the data has been migrated to newer data platforms numerous times, and each time there have been errors and data losses.
• Wastes incorrectly entered: mostly due to complex names that were difficult to spell, or typos in quantities or units of measure.
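
Because variant spellings share one EPA ID, grouping on that ID is a straightforward way to bring the duplicate handler records together. The sketch below assumes hypothetical field names (`epa_id`, `name`) and made-up records; the actual column names in the dataset may differ.

```python
from collections import defaultdict


def group_by_epa_id(records):
    """Map each EPA ID to the set of distinct name spellings recorded for it."""
    names_by_id = defaultdict(set)
    for record in records:
        epa_id = record["epa_id"].strip().upper()  # normalize case and whitespace
        names_by_id[epa_id].add(record["name"].strip())
    return dict(names_by_id)


# Made-up handler records with an invented ID format.
records = [
    {"epa_id": "CTD000000001", "name": "ACME PLATING CO"},
    {"epa_id": "ctd000000001 ", "name": "ACME PLATTING COMPANY"},
    {"epa_id": "CTD000000002", "name": "LITCHFIELD AUTO"},
]
grouped = group_by_epa_id(records)
# IDs recorded under more than one spelling are candidates for cleanup.
duplicates = {k: v for k, v in grouped.items() if len(v) > 1}
```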
TAU Spatial Room Impulse Response Database (TAU-SRIR DB)
data.niaid.nih.gov
nde-dev.biothings.io
+2more
Updated Apr 6, 2022
Cite
Politis, Archontis; Adavanne, Sharath; Virtanen, Tuomas (2022). TAU Spatial Room Impulse Response Database (TAU-SRIR DB) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6408610
Explore at:
Dataset updated
Apr 6, 2022
Dataset provided by
Tampere University
Authors
Politis, Archontis; Adavanne, Sharath; Virtanen, Tuomas
Description
DESCRIPTION

The TAU Spatial Room Impulse Response Database (TAU-SRIR DB) contains spatial room impulse responses (SRIRs) captured in various spaces of Tampere University (TAU), Finland, for a fixed receiver position and multiple source positions per room, along with separate recordings of spatial ambient noise captured at the same recording point. The dataset is intended for emulation of spatial multichannel recordings for evaluation and/or training of multichannel processing algorithms in realistic reverberant conditions and over multiple rooms. The major distinct properties of the database compared to other databases of room impulse responses are:

Capturing in a high resolution multichannel format (32 channels) from which multiple more limited application-specific formats can be derived (e.g. tetrahedral array, circular array, first-order Ambisonics, higher-order Ambisonics, binaural).

Extraction of densely spaced SRIRs along measurement trajectories, allowing emulation of moving source scenarios.

Multiple source distances, azimuths, and elevations from the receiver per room, allowing emulation of complex configurations for multi-source methods.

Multiple rooms, allowing evaluation of methods at various acoustic conditions, and training of methods with the aim of generalization on different rooms.

The RIRs were collected by staff of TAU between 12/2017 - 06/2018, and between 11/2019 - 1/2020. The data collection received funding from the European Research Council, grant agreement 637422 EVERYSOUND.

NOTE: This database is a work-in-progress. We intend to publish additional rooms, additional formats, and potentially higher-fidelity versions of the captured responses in the near future, as new versions of the database in this repository.

REPORT AND REFERENCE

A compact description of the dataset, recording setup, recording procedure, and extraction can be found in:

Politis, Archontis, Adavanne, Sharath, & Virtanen, Tuomas (2020). A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020), Tokyo, Japan.

available here. A more detailed report specifically focusing on the dataset collection and properties will follow.

AIM

The dataset can be used for generating multichannel or monophonic mixtures for testing or training of methods under realistic reverberation conditions, related to e.g. multichannel speech enhancement, acoustic scene analysis, and machine listening, among others. It is especially suitable for the following application scenarios:

monophonic and multichannel reverberant single- or multi-source speech in multi-room reverberant conditions

monophonic and multichannel polyphonic sound events in multi-room reverberant conditions

single-source and multi-source localization in multi-room reverberant conditions, in static or dynamic scenarios

single-source and multi-source tracking in multi-room reverberant conditions, in static or dynamic scenarios

sound event localization and detection in multi-room reverberant conditions, in static or dynamic scenarios

SPECIFICATIONS

The SRIRs were captured using an Eigenmike spherical microphone array. A Genelec G Three loudspeaker was used to playback a maximum length sequence (MLS) around the Eigenmike. The SRIRs were obtained in the STFT domain using a least-squares regression between the known measurement signal (MLS) and far-field recording independently at each frequency. In this version of the dataset the SRIRs and ambient noise are downsampled to 24kHz for compactness.
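
The per-frequency least-squares step can be illustrated in simplified form. Assuming a single complex gain per frequency bin (the authors' actual estimator may use a longer multi-frame model), the least-squares solution for bin $f$, given STFT frames $X_t(f)$ of the measurement signal and $Y_t(f)$ of the recording, is $H(f) = \sum_t X_t^*(f) Y_t(f) / \sum_t |X_t(f)|^2$. The sketch below applies that formula to synthetic frames; the function name and data are assumptions, and computing the STFT itself is left out.

```python
import random


def estimate_transfer(X, Y):
    """Least-squares gain per frequency bin.

    X, Y: lists of frames, each frame a list of complex STFT values
    (one per frequency bin). Returns one complex gain per bin.
    """
    n_bins = len(X[0])
    H = []
    for f in range(n_bins):
        num = sum(x[f].conjugate() * y[f] for x, y in zip(X, Y))
        den = sum(abs(x[f]) ** 2 for x in X)
        H.append(num / den)
    return H


# Synthetic check: build Y from a known response and recover it.
rng = random.Random(0)
true_H = [1 + 0j, 0.5 - 0.25j, -0.2 + 0.8j]
X = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in true_H] for _ in range(50)]
Y = [[h * x for h, x in zip(true_H, frame)] for frame in X]
H_est = estimate_transfer(X, Y)
```

Each bin is solved independently, which is what makes the regression cheap: no bin's estimate depends on any other bin's data.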

The currently published SRIR set was recorded at nine different indoor locations inside the Tampere University campus at Hervanta, Finland. Additionally, 30 minutes of ambient noise recordings were collected at the same locations with the IR recording setup unchanged. SRIR directions and distances differ with the room. Possible azimuths span the whole range of $\phi\in[-180,180)$, while the elevations span approximately a range between $\theta\in[-45,45]$ degrees. The currently shared measured spaces are as follows:

Large open space in underground bomb shelter, with plastic-coated floor and rock walls. Ventilation noise. Circular source trajectory.

Large open gym space. Ambience of people using weights and gym equipment in adjacent rooms. Circular source trajectory.

Small classroom (PB132) with group work tables and carpet flooring. Ventilation noise. Circular source trajectory.

Meeting room (PC226) with hard floor and partially glass walls. Ventilation noise. Circular source trajectory.

Lecture hall (SA203) with inclined floor and rows of desks. Ventilation noise. Linear source trajectory.

Small classroom (SC203) with group work tables and carpet flooring. Ventilation noise. Linear source trajectory.

Large classroom (SE203) with hard floor and rows of desks. Ventilation noise. Linear source trajectory.

Lecture hall (TB103) with inclined floor and rows of desks. Ventilation noise. Linear source trajectory.

Meeting room (TC352) with hard floor and partially glass walls. Ventilation noise. Circular source trajectory.

The measurement trajectories were organised in groups, with each group being specified by a circular or linear trace at the floor at a certain distance from the z-axis of the microphone. For circular trajectories two ranges were measured, a close and a far one, except in room TC352, where the same range was measured twice, but with different furniture configurations and with open or closed doors. For linear trajectories two ranges were also measured, close and far, but with linear paths at either side of the array, resulting in 4 unique trajectory groups, with the exception of room SA203, where 3 ranges were measured, resulting in 6 trajectory groups. Linear trajectory groups are always parallel to each other within the same room.

Each trajectory group had multiple measurement trajectories, following the same floor path, but with the source at different heights.

The SRIRs are extracted from the noise recordings of the slowly moving source across those trajectories, at an angular spacing of approximately every 1 degree from the microphone. Instead of extracting SRIRs at equally spaced points along the path (e.g. every 20cm), this extraction scheme was found more practical for synthesis purposes, making emulation of moving sources at an approximately constant angular speed easier.
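
The difference between the two sampling schemes is easy to see on a straight path. In the sketch below, a source moves along a line at perpendicular distance `d` from the microphone, and points are placed every fixed step in azimuth rather than every fixed step in distance. The geometry (microphone at the origin, path parallel to the x-axis) and all numbers are assumptions for illustration, not values from the database.

```python
import math


def points_at_equal_angles(d, az_start_deg, az_end_deg, step_deg=1.0):
    """Positions on the line y = d seen at evenly spaced azimuths from the origin.

    Azimuth is measured from the y-axis (the perpendicular to the path), so a
    point at azimuth a sits at x = d * tan(a).
    """
    n_steps = int(round((az_end_deg - az_start_deg) / step_deg))
    points = []
    for i in range(n_steps + 1):
        az = math.radians(az_start_deg + i * step_deg)
        points.append((d * math.tan(az), d))
    return points


pts = points_at_equal_angles(d=2.0, az_start_deg=-45, az_end_deg=45)
# Spacing along the path grows away from the perpendicular:
gap_center = pts[46][0] - pts[45][0]  # between 0 and 1 degree
gap_edge = pts[90][0] - pts[89][0]    # between 44 and 45 degrees
```

Stepping through such a list at a fixed rate yields a source that moves at constant angular speed, which is the convenience the extraction scheme is described as providing.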

More details on the trajectory geometries can be found in the README file and the measinfo.mat file.

RECORDING FORMATS

As with the DCASE2019-2021 datasets, currently the database is provided in two formats, first-order Ambisonics, and a tetrahedral microphone array - both derived from the Eigenmike 32-channel recordings. For more details on the format specifications, check the README.

We intend to add additional formats of the database, of both higher resolution (e.g. higher-order Ambisonics), or lower resolution (e.g. binaural).

REFERENCE DOAs

For each extracted RIR across a measurement trajectory there is a direction-of-arrival (DOA) associated with it, which can be used as the reference direction for sound source spatialized using this RIR, for training or evaluation purposes. The DOAs were determined acoustically from the extracted RIRs, by windowing the direct sound part and applying a broadband version of the MUSIC localization algorithm on the windowed multichannel signal.

The DOAs are provided as Cartesian components [x, y, z] of unit length vectors.
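
Unit vectors like these convert readily to and from azimuth and elevation angles. The sketch below assumes a common convention (azimuth measured counterclockwise from the x-axis in the horizontal plane, elevation measured up from that plane); the README should be checked for the convention the database actually uses.

```python
import math


def sph_to_cart(azimuth_deg, elevation_deg):
    """Azimuth/elevation in degrees to a unit vector [x, y, z]."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return [math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el)]


def cart_to_sph(x, y, z):
    """Unit vector [x, y, z] to (azimuth, elevation) in degrees."""
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation


doa = sph_to_cart(90.0, 30.0)
az, el = cart_to_sph(*doa)
```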

SCENE GENERATOR

A set of routines is shared, here termed scene generator, that can spatialize a bank of sound samples using the SRIRs and noise recordings of this library, to emulate scenes for the two target formats. The code is similar to the one used to generate the TAU-NIGENS Spatial Sound Events 2021 dataset, and has been ported to Python from the original version written in Matlab.

The generator can be found here, along with more details on its use.

The generator at the moment is set to work with the NIGENS sound event sample database, and the FSD50K sound event database, but additional sample banks can be added with small modifications.
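
At its core, spatializing a sound sample with an SRIR means convolving the mono sample with each channel of the impulse response. The sketch below shows only that step, in pure Python on tiny made-up signals; it is not the shared generator's code or API, and it ignores moving sources, ambient noise mixing, and efficiency (a practical implementation would use FFT-based convolution).

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def spatialize(mono, srir_channels):
    """Convolve a mono signal with each channel of a multichannel RIR."""
    return [convolve(mono, channel) for channel in srir_channels]


# Made-up 2-channel impulse response: a direct path plus one delayed echo.
mono = [1.0, 0.5, 0.0, -0.5]
srir = [
    [1.0, 0.0, 0.3],  # channel 0
    [0.0, 0.8, 0.0],  # channel 1: pure one-sample delay, scaled by 0.8
]
multichannel = spatialize(mono, srir)
```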

The dataset together with the generator has been used by the authors in the following public challenges:

DCASE 2019 Challenge Task 3, to generate the TAU Spatial Sound Events 2019 dataset (development/evaluation)

DCASE 2020 Challenge Task 3, to generate the TAU-NIGENS Spatial Sound Events 2020 dataset

DCASE 2021 Challenge Task 3, to generate the TAU-NIGENS Spatial Sound Events 2021 dataset

DCASE 2022 Challenge Task 3, to generate additional SELD synthetic mixtures for training the task baseline

NOTE: The current version of the generator is work-in-progress, with some code being quite "rough". If something does not work as intended or it is not clear what certain parts do, please contact us.

DATASET STRUCTURE

The dataset contains a folder of the SRIRs (TAU-SRIR_DB), with all the SRIRs per room in a single MAT file. The file rirdata.mat contains some general information such as sample rate, format specifications, and most importantly the DOAs of every extracted SRIR. The file measinfo.mat contains measurement and recording information for each room. Finally, the dataset contains a folder of spatial ambient noise recordings (TAU-SNoise_DB), with one subfolder per room holding two audio recordings of the spatial ambience, one for each format, FOA or MIC. For more information on how the SRIRs and DOAs are organized, check the README.

DOWNLOAD

The files TAU-SRIR_DB.z01, ..., TAU-SRIR_DB.zip contain the SRIRs and measurement info files.

The files TAU-SNoise_DB.z01, ..., TAU-SNoise_DB.zip
Public Utility Data Liberation Project (PUDL) Data Release
data.niaid.nih.gov
zenodo.org
Updated Feb 14, 2025
Cite
Selvans, Zane A.; Gosnell, Christina M.; Sharpe, Austen; Norman, Bennett; Schira, Zach; Lamb, Katherine; Xia, Dazhong; Belfer, Ella (2025). Public Utility Data Liberation Project (PUDL) Data Release [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3653158
Explore at:
Dataset updated
Feb 14, 2025
Dataset provided by
Catalyst Cooperative
Authors
Selvans, Zane A.; Gosnell, Christina M.; Sharpe, Austen; Norman, Bennett; Schira, Zach; Lamb, Katherine; Xia, Dazhong; Belfer, Ella
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PUDL v2025.2.0 Data Release

This is our regular quarterly release for 2025Q1. It includes updates to all the datasets that are published with quarterly or higher frequency, plus initial versions of a few new data sources that have been in the works for a while.

One major change this quarter is that we are now publishing all processed PUDL data as Apache Parquet files, alongside our existing SQLite databases. See Data Access for more on how to access these outputs.
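
As a rough sketch of what reading one of the Parquet outputs might look like, the snippet below builds a URL under the public S3 bucket listed in these notes and loads a table with pandas. The one-file-per-table layout (`<table>.parquet` directly under the version prefix) and the helper names are assumptions made for illustration; consult the Data Access documentation for the actual paths. Reading from S3 this way requires pandas, pyarrow, and s3fs to be installed.

```python
PUDL_BUCKET = "s3://pudl.catalyst.coop"  # public bucket named in the release notes


def parquet_url(table, version="v2025.2.0"):
    """Build the URL of a table's Parquet file.

    ASSUMPTION: each table is stored as "<table>.parquet" directly under the
    version prefix. This layout is illustrative, not confirmed by the notes.
    """
    return f"{PUDL_BUCKET}/{version}/{table}.parquet"


def read_pudl_table(table, version="v2025.2.0"):
    """Load a PUDL table into a pandas DataFrame (hypothetical helper)."""
    import pandas as pd  # imported lazily; also needs pyarrow and s3fs

    # anon=True asks s3fs for unauthenticated access to the public bucket.
    return pd.read_parquet(
        parquet_url(table, version), storage_options={"anon": True}
    )


if __name__ == "__main__":
    print(parquet_url("out_ferc1_yearly_detailed_income_statements"))
```

Keeping the URL construction separate from the network read makes it easy to point the same code at a different release or a local mirror.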

Some potentially breaking changes to be aware of:

In the EIA Form 930 – Hourly and Daily Balancing Authority Operations Report a number of new energy sources have been added, and some old energy sources have been split into more granular categories. See Changes in energy source granularity over time.

We are now running the EPA’s CAMD to EIA unit crosswalk code for each individual year starting from 2018, rather than just 2018 and 2021, resulting in more connections between these two datasets and changes to some sub-plant IDs. See the note below for more details.

Many thanks to the organizations who make these regular updates possible! Especially GridLab, RMI, and the ZERO Lab at Princeton University. If you rely on PUDL and would like to help ensure that the data keeps flowing, please consider joining them as a PUDL Sustainer, as we are still fundraising for 2025.

New Data

EIA 176

Add a couple of semi-transformed interim EIA-176 (natural gas sources and dispositions) tables. They aren’t yet being written to the database, but are one step closer. See #3555 and PRs #3590, #3978. Thanks to @davidmudrauskas for moving this dataset forward.

Extracted these interim tables up through the latest 2023 data release. See #4002 and #4004.

EIA 860

Added EIA 860 Multifuel table. See #3438 and #3946.

FERC 1

Added three new output tables containing granular utility accounting data. See #4057, #3642 and the table descriptions in the data dictionary:

out_ferc1_yearly_detailed_income_statements

out_ferc1_yearly_detailed_balance_sheet_assets

out_ferc1_yearly_detailed_balance_sheet_liabilities

SEC Form 10-K Parent-Subsidiary Ownership

We have added some new tables describing the parent-subsidiary company ownership relationships reported in the SEC’s Form 10-K, Exhibit 21 “Subsidiaries of the Registrant”. Where possible these tables link the SEC filers or their subsidiary companies to the corresponding EIA utilities. This work was funded by a grant from the Mozilla Foundation. Most of the ML models and data preparation took place in the mozilla-sec-eia repository, separate from the main PUDL ETL, as it requires processing hundreds of thousands of PDFs and deploying some ML experiment tracking infrastructure. The new tables are handed off as nearly finished products to the PUDL ETL pipeline. Note that these are preliminary, experimental data products and are known to be incomplete and to contain errors. Extracting data tables from unstructured PDFs and linking SEC records to EIA records are necessarily probabilistic processes.

See PRs #4026, #4031, #4035, #4046, #4048, #4050 and check out the table descriptions in the PUDL data dictionary:

out_sec10k_parents_and_subsidiaries

core_sec10k_quarterly_filings

core_sec10k_quarterly_exhibit_21_company_ownership

core_sec10k_quarterly_company_information

Expanded Data Coverage

EPA CEMS

Added 2024 Q4 of CEMS data. See #4041 and #4052.

EPA CAMD EIA Crosswalk

In the past, the crosswalk in PUDL has used the EPA’s published crosswalk (run with 2018 data), and an additional crosswalk we ran with 2021 EIA 860 data. To ensure that the crosswalk reflects updates in both EIA and EPA data, we re-ran the EPA R code which generates the EPA CAMD EIA crosswalk with 4 new years of data: 2019, 2020, 2022 and 2023. Re-running the crosswalk pulls the latest data from the CAMD FACT API, which results in some changes to the generator and unit IDs reported on the EPA side of the crosswalk, which feeds into the creation of core_epa_assn_eia_epacamd.

The changes only result in the addition of new units and generators in the EPA data, with no changes to matches at the plant level. However, the updates to generator and unit IDs have resulted in changes to the subplant IDs: some EIA boilers and generators which previously had no matches to EPA data have now been matched to EPA unit data, resulting in an overall reduction in the number of rows in the core_epa_assn_eia_epacamd_subplant_ids table. See issue #4039 and PR #4056 for a discussion of the changes observed in the course of this update.

EIA 860M

Added EIA 860m through December 2024. See #4038 and #4047.

EIA 923

Added EIA 923 monthly data through September 2024. See #4038 and #4047.

EIA Bulk Electricity Data

Updated the EIA Bulk Electricity data to include data published up through 2024-11-01. See #4042 and PR #4051.

EIA 930

Updated the EIA 930 data to include data published up through the beginning of February 2025. See #4040 and PR #4054. 10 new energy sources were added and 3 were retired; see Changes in energy source granularity over time for more information.

Bug Fixes

Fix an accidentally swapped set of starting balance / ending balance column rename parameters in the pre-2021 DBF derived data that feeds into core_ferc1_yearly_other_regulatory_liabilities_sched278. See issue #3952 and PRs #3969, #3979. Thanks to @yolandazzz13 for making this fix.

Added preliminary data validation checks for several FERC 1 tables that were missing them. See #3860.

Fix spelling of Lake Huron and Lake Saint Clair in out_vcerare_hourly_available_capacity_factor and related tables. See issue #4007 and PR #4029.

Quality of Life Improvements

We added a sources parameter to pudl.metadata.classes.DataSource.from_id() in order to make it possible to use the pudl-archiver repository to archive datasets that won’t necessarily be ingested into PUDL. See this PUDL archiver issue and PRs #4003 and #4013.

Other PUDL v2025.2.0 Resources

PUDL v2025.2.0 Data Dictionary

PUDL v2025.2.0 Documentation

PUDL in the AWS Open Data Registry

PUDL v2025.2.0 in a free, public AWS S3 bucket: s3://pudl.catalyst.coop/v2025.2.0/

PUDL v2025.2.0 in a requester-pays GCS bucket: gs://pudl.catalyst.coop/v2025.2.0/

Zenodo archive of the PUDL GitHub repo for this release

PUDL v2025.2.0 release on GitHub

PUDL v2025.2.0 package in the Python Package Index (PyPI)

Contact Us

If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. Here's a bunch of different ways to get in touch:

Follow us on GitHub

Use the PUDL GitHub issue tracker to let us know about any bugs or data issues you encounter.

GitHub Discussions is where we provide user support.

Watch our GitHub Project to see what we're working on.

Email us at hello@catalyst.coop for private communications.

On Mastodon: @CatalystCoop@mastodon.energy

On BlueSky: @catalyst.coop

On Twitter: @CatalystCoop

Connect with us on LinkedIn

Play with our data and notebooks on Kaggle

Combine our data with ML models on HuggingFace

Learn more about us on our website: https://catalyst.coop

Subscribe to our announcements list for email updates.
China CN: Generator & Generator Set: Total Liability
ceicdata.com
Updated Dec 15, 2020
Cite
CEICdata.com (2020). China CN: Generator & Generator Set: Total Liability [Dataset]. https://www.ceicdata.com/en/china/motor-generator-and-generator-set/cn-generator--generator-set-total-liability
Explore at:
Dataset updated
Dec 15, 2020
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Nov 1, 2014 - Oct 1, 2015
Area covered
China
Variables measured
Economic Activity
Description
China Generator & Generator Set: Total Liability was reported at 299.834 RMB bn in Oct 2015, an increase from 294.039 RMB bn in Sep 2015. The series is updated monthly, with a median of 181.089 RMB bn from Dec 2003 to Oct 2015 across 97 observations. It reached an all-time high of 299.834 RMB bn in Oct 2015 and a record low of 20.835 RMB bn in Dec 2003. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.
China CN: Generator & Generator Set: No of Employee: Average
ceicdata.com
Updated Dec 15, 2024
Cite
CEICdata.com (2024). China CN: Generator & Generator Set: No of Employee: Average [Dataset]. https://www.ceicdata.com/en/china/motor-generator-and-generator-set/cn-generator--generator-set-no-of-employee-average
Explore at:
Dataset updated
Dec 15, 2024
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Feb 1, 2012 - Dec 1, 2013
Area covered
China
Variables measured
Economic Activity
Description
China Generator & Generator Set: Number of Employee: Average was reported at 246.196 Person th in Dec 2013, an increase from 212.926 Person th in Dec 2012. The series is updated monthly, with a median of 151.600 Person th from Dec 2003 to Dec 2013 across 64 observations. It reached an all-time high of 246.196 Person th in Dec 2013 and a record low of 69.115 Person th in Dec 2003. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.
China CN: Generator & Generator Set: YoY: Account Receivable
ceicdata.com
Updated Sep 15, 2020
Cite
CEICdata.com (2020). China CN: Generator & Generator Set: YoY: Account Receivable [Dataset]. https://www.ceicdata.com/en/china/motor-generator-and-generator-set/cn-generator--generator-set-yoy-account-receivable
Explore at:
Dataset updated
Sep 15, 2020
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Nov 1, 2014 - Oct 1, 2015
Area covered
China
Variables measured
Economic Activity
Description
China Generator & Generator Set: YoY: Account Receivable was reported at 12.139 % in Oct 2015, an increase from 11.472 % in Sep 2015. The series is updated monthly, with a median of 27.840 % from Jan 2006 to Oct 2015 across 89 observations. It reached an all-time high of 87.380 % in Mar 2011 and a record low of -7.849 % in May 2013. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.
Global import data of Generator
volza.com
csv
Updated Oct 31, 2025
Cite
Volza FZ LLC (2025). Global import data of Generator [Dataset]. https://www.volza.com/imports-global/global-import-data-of-generator-from-martinique
Explore at:
Available download formats: csv
Dataset updated
Oct 31, 2025
Dataset authored and provided by
Volza FZ LLC
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Count of importers, Sum of import value, 2014-01-01/2021-09-30, Count of import shipments
Description
12 global import shipment records for generators, with prices, volumes, and current buyer-supplier relationships, based on an actual global export trade database.