53 datasets found
  1. data for: Synthetic Datasets Generator for Testing Techniques and Tools of...

    • data.mendeley.com
    Updated Mar 12, 2019
    Cite
    Yvan Brito (2019). data for: Synthetic Datasets Generator for Testing Techniques and Tools of Information Visualization and Machine Learning [Dataset]. http://doi.org/10.17632/2j3hg4j6tc.1
    Dataset updated
    Mar 12, 2019
    Authors
    Yvan Brito
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data model to generate datasets used in the tests of the article: Synthetic Datasets Generator for Testing Techniques and Tools of Information Visualization and Machine Learning.

  2. Data from: A National Thermal Generator Performance Database

    • data.openei.org
    • datalumos.org
    • +3 more
    archive, data
    Updated Dec 5, 2018
    Cite
    Rossol; Brinkman; Buster; Denholm; Novacheck; Stephen (2018). A National Thermal Generator Performance Database [Dataset]. https://data.openei.org/submissions/8184
    Available download formats: data, archive
    Dataset updated
    Dec 5, 2018
    Dataset provided by
    United States Department of Energy (http://energy.gov/)
    National Renewable Energy Laboratory
    Open Energy Data Initiative (OEDI)
    Authors
    Rossol; Brinkman; Buster; Denholm; Novacheck; Stephen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This submission contains cleaned and filtered data from the Environmental Protection Agency (EPA) Clean Air Markets (CAM) database on thermal power plant operation and performance.

  3. Fake Employee Dataset

    • kaggle.com
    zip
    Updated Nov 20, 2023
    Cite
    Oyekanmi Olamilekan (2023). Fake Employee Dataset [Dataset]. https://www.kaggle.com/datasets/oyekanmiolamilekan/fake-employee-dataset
    Available download formats: zip (162,874 bytes)
    Dataset updated
    Nov 20, 2023
    Authors
    Oyekanmi Olamilekan
    Description

    Creating a robust employee dataset for data analysis and visualization involves several key fields that capture different aspects of an employee's information. Here's a list of fields you might consider including:

    • Employee ID: A unique identifier for each employee.
    • Name: First name and last name of the employee.
    • Gender: Male, female, non-binary, etc.
    • Date of Birth: Birthdate of the employee.
    • Email Address: Contact email of the employee.
    • Phone Number: Contact number of the employee.
    • Address: Home or work address of the employee.
    • Department: The department the employee belongs to (e.g., HR, Marketing, Engineering, etc.).
    • Job Title: The specific job title of the employee.
    • Manager ID: ID of the employee's manager.
    • Hire Date: Date when the employee was hired.
    • Salary: Employee's salary or compensation.
    • Employment Status: Full-time, part-time, contractor, etc.
    • Employee Type: Regular, temporary, contract, etc.
    • Education Level: Highest level of education attained by the employee.
    • Certifications: Any relevant certifications the employee holds.
    • Skills: Specific skills or expertise possessed by the employee.
    • Performance Ratings: Ratings or evaluations of employee performance.
    • Work Experience: Previous work experience of the employee.
    • Benefits Enrollment: Information on benefits chosen by the employee (e.g., healthcare plan, retirement plan, etc.).
    • Work Location: Physical location where the employee works.
    • Work Hours: Regular working hours or shifts of the employee.
    • Employee Status: Active, on leave, terminated, etc.
    • Emergency Contact: Contact information of the employee's emergency contact person.
    • Employee Satisfaction Survey Responses: Data from employee satisfaction surveys, if applicable.

    Code Url: https://github.com/intellisenseCodez/faker-data-generator
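To illustrate how a subset of these fields might be generated, here is a minimal sketch. It is not the code in the linked repository: it uses Python's standard `random` module as a stand-in for the Faker library, and the value pools (names, departments, statuses), the salary range, the hire-date window, and the `generate_employees` helper are all hypothetical. Managers are drawn only from employees generated earlier, so every `manager_id` points at an existing `employee_id`.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical value pools; a Faker-based generator would use locale providers instead.
FIRST_NAMES = ["Ada", "Bola", "Chen", "Dana", "Emeka", "Farah"]
LAST_NAMES = ["Adeyemi", "Brown", "Garcia", "Ivanova", "Okafor", "Smith"]
DEPARTMENTS = ["HR", "Marketing", "Engineering", "Finance"]
STATUSES = ["Full-time", "Part-time", "Contractor"]


@dataclass
class Employee:
    employee_id: int
    first_name: str
    last_name: str
    email: str
    department: str
    hire_date: date
    salary: int
    employment_status: str
    manager_id: Optional[int]  # None for the first (top-level) employee


def generate_employees(n: int, seed: int = 0) -> List[Employee]:
    """Generate n fake employees; the same seed always yields the same records."""
    rng = random.Random(seed)
    employees: List[Employee] = []
    for employee_id in range(1, n + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        # Pick a manager among already-generated employees (referential integrity).
        manager_id = rng.choice(employees).employee_id if employees else None
        employees.append(
            Employee(
                employee_id=employee_id,
                first_name=first,
                last_name=last,
                email=f"{first.lower()}.{last.lower()}{employee_id}@example.com",
                department=rng.choice(DEPARTMENTS),
                hire_date=date(2015, 1, 1) + timedelta(days=rng.randrange(3000)),
                salary=rng.randrange(30_000, 150_001, 1_000),  # 30k to 150k in 1k steps
                employment_status=rng.choice(STATUSES),
                manager_id=manager_id,
            )
        )
    return employees
```

Calling `generate_employees(1000, seed=42)` would give a reproducible table; the remaining fields in the list can be added the same way, one attribute and one generator expression at a time.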

  4. Hazardous Waste Generators

    • catalog.data.gov
    • anrgeodata.vermont.gov
    • +8 more
    Updated Dec 13, 2024
    Cite
    ANR/DEC/WMPD HazWaste program (2024). Hazardous Waste Generators [Dataset]. https://catalog.data.gov/dataset/hazardous-waste-generators-e03ea
    Dataset updated
    Dec 13, 2024
    Dataset provided by
    ANR/DEC/WMPD HazWaste program
    Description

    The HazWaste database contains generator (company and/or individual) site and mailing address information, waste generation information, and the amounts of waste generated, among other data, for all hazardous waste generators in Vermont. The database was developed in the early 1990s for program management and to meet EPA Authorization requirements, and it has been periodically updated to more modern data systems.

  5. T10I4D100K transactional database

    • data.mendeley.com
    Updated Oct 23, 2019
    Cite
    Uday kiran RAGE (2019). T10I4D100K transactional database [Dataset]. http://doi.org/10.17632/4hz2vcvxhp.1
    Dataset updated
    Oct 23, 2019
    Authors
    Uday kiran RAGE
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    T10I4D100K is a well-known synthetic transactional database generated using the IBM Quest generator. It is widely used to evaluate frequent and correlated pattern mining algorithms.
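Frequent pattern miners begin by counting how many transactions contain each item (its support) and then extend only the frequent items. A minimal sketch of the first two levels of an Apriori-style search, assuming the plain-text layout commonly used for this benchmark (one transaction per line, items as space-separated integer IDs); verify the layout of the downloaded file first. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def parse_transactions(lines: Iterable[str]) -> List[frozenset]:
    """Parse one transaction per line; blank lines are skipped."""
    return [frozenset(int(tok) for tok in line.split()) for line in lines if line.strip()]


def frequent_items_and_pairs(
    transactions: List[frozenset], min_support: int
) -> Tuple[Dict[int, int], Dict[Tuple[int, int], int]]:
    """Return frequent single items and frequent item pairs with their supports.

    Pairs are counted only over items that are themselves frequent, which is
    the Apriori pruning rule (a superset of an infrequent set is infrequent).
    """
    item_counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in item_counts.items() if c >= min_support}
    pair_counts: Counter = Counter()
    for t in transactions:
        kept = sorted(i for i in t if i in frequent)
        pair_counts.update(combinations(kept, 2))
    pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
    return frequent, pairs
```

For the real file, `parse_transactions(open(path))` can be passed straight in; the support threshold is usually chosen as a fraction of the 100,000 transactions.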

  6. Database Testing Tool Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 9, 2025
    Cite
    Archive Market Research (2025). Database Testing Tool Report [Dataset]. https://www.archivemarketresearch.com/reports/database-testing-tool-26309
    Available download formats: pdf, ppt, doc
    Dataset updated
    Feb 9, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global database testing tool market is anticipated to experience substantial growth in the coming years, driven by factors such as the increasing adoption of cloud-based technologies, the rising demand for data quality and accuracy, and the growing complexity of database systems. The market is expected to reach a value of USD 1,542.4 million by 2033, expanding at a CAGR of 7.5% during the forecast period of 2023-2033. Key players in the market include Apache JMeter, DbFit, SQLMap, Mockup Data, SQL Test, NoSQLUnit, Orion, ApexSQL, QuerySurge, DBUnit, DataFactory, DTM Data Generator, Oracle, SeLite, SLOB, and others. The North American region is anticipated to hold a significant share of the database testing tool market, followed by Europe and Asia Pacific. The increasing adoption of cloud-based database testing services, the presence of key market players, and the growing demand for data testing and validation are driving the market growth in North America. Asia Pacific, on the other hand, is expected to experience the highest growth rate due to the rapidly increasing IT spending, the emergence of new technologies, and the growing number of businesses investing in data quality management solutions.
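The headline figures are linked by the compound annual growth rate relation $V_{\text{end}} = V_{\text{start}}(1 + r)^n$. As a consistency check, the sketch below backs out the starting value implied by the stated USD 1,542.4 million and 7.5% CAGR, assuming the ten-year 2023-2033 forecast period named in the description (the listing's "Time period covered" field says 2025-2033, which would imply a different base). The report's own base-year figure is not given here, so treat the result as a derived estimate; the helper names are hypothetical.

```python
def implied_start_value(end_value: float, cagr: float, years: int) -> float:
    """Invert end = start * (1 + cagr) ** years to recover the start value."""
    return end_value / (1.0 + cagr) ** years


def cagr_from_values(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures from the description: USD 1,542.4 million in 2033 at a 7.5% CAGR.
start_2023 = implied_start_value(1542.4, 0.075, 10)
print(f"Implied 2023 market size: USD {start_2023:.1f} million")  # roughly 748
```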

  7. Report Generator 2.0

    • agdatacommons.nal.usda.gov
    bin
    Updated Nov 21, 2025
    Cite
    USDA Natural Resources Conservation Service (2025). Report Generator 2.0 [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Report_Generator_2_0/24661338
    Available download formats: bin
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Natural Resources Conservation Service (http://www.nrcs.usda.gov/)
    Authors
    USDA Natural Resources Conservation Service
    License

    U.S. Government Works, https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The NRCS National Water and Climate Center Report Generator is a web-based application that uses long-term snowpack, precipitation, reservoir, streamflow, and soils data from a variety of quality-controlled sources to create reports. Users can choose from predefined templates or build custom reports. Reports can be created in both tabular and chart format, and single-station and multiple-station charting is supported. Data from tabular reports may be exported to different formats, including comma-separated value (CSV) files, and charts can be saved to graphics formats such as JPG and PNG. Data may be displayed in either English or metric units.

    The Report Generator network incorporates data from many agency databases. The NRCS snow survey flagship database, the Water and Climate Information System (WCIS), provides a wealth of data, including manually collected snow course data and information from automated Snow Telemetry (SNOTEL) and Soil Climate Analysis Network (SCAN) stations across the United States. Report Generator also uses precipitation, streamflow, and reservoir data from the U.S. Army Corps of Engineers (USACE), the U.S. Bureau of Reclamation (BOR), the Applied Climate Information System (ACIS), the U.S. Geological Survey (USGS), various water districts, and other entities. In addition to creating reports, Report Generator lets you view information on sites, including metadata such as elevation, latitude/longitude, and hydrologic unit code (HUC). You can also view photos of each site, including a site map (in Google Maps when available).

    Farmers, municipalities, water and hydroelectric utilities, environmental organizations, fish and wildlife managers, tribal nations, reservoir managers, recreationists, wetlands managers, urban developers, transportation departments, and research organizations regularly use these data and products. This release has several new features that focus on improving how reports are specified and displayed, including multi-station charting.

    Resources in this dataset: Resource Title: Report Generator 2.0. File Name: Web Page, url: https://wcc.sc.egov.usda.gov/reportGenerator/ (create custom reports and charts from multiple data sources).
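Tabular reports exported as CSV can be loaded with Python's standard `csv` module. A minimal sketch, assuming the export places its metadata header lines before the column row and prefixes them with `#` (an assumption; inspect your own export and adjust the filter). The column names in the usage example are hypothetical.

```python
import csv
import io
from typing import Dict, List


def parse_report_csv(text: str) -> List[Dict[str, str]]:
    """Parse a CSV export into row dicts, dropping blank and '#'-prefixed lines.

    Assumption: metadata/comment lines start with '#'. Values are returned as
    strings; convert numeric columns explicitly (missing values may be empty).
    """
    data_lines = [
        line for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return list(csv.DictReader(io.StringIO("\n".join(data_lines))))
```

For example, `parse_report_csv(open("report.csv").read())` yields one dict per row keyed by the header row that follows the metadata block.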

  8. Additional file 2: Table S2. of ODG: Omics database generator - a tool for...

    • springernature.figshare.com
    xlsx
    Updated Jun 4, 2023
    Cite
    Joseph Guhlin; Kevin Silverstein; Peng Zhou; Peter Tiffin; Nevin Young (2023). Additional file 2: Table S2. of ODG: Omics database generator - a tool for generating, querying, and analyzing multi-omics comparative databases to facilitate biological understanding [Dataset]. http://doi.org/10.6084/m9.figshare.c.3850801_D2.v1
    Available download formats: xlsx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Joseph Guhlin; Kevin Silverstein; Peng Zhou; Peter Tiffin; Nevin Young
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pfam domains and Gene Ontology (GO) biological process categories for the four rhizobia strains. For predicted proteins related to multiple GO biological process categories, the categories are joined together with the pipe character (|). (XLSX, 639 kB)
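Because multiple GO categories share one cell, joined by `|`, they need to be split before counting or filtering. A minimal sketch operating on cell strings; reading the XLSX itself would need a spreadsheet library (for example openpyxl), and the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def split_go_categories(cell: str) -> List[str]:
    """Split a pipe-joined cell into individual GO category labels."""
    if not cell:
        return []
    return [part.strip() for part in cell.split("|") if part.strip()]


def count_categories(cells: Iterable[str]) -> Counter:
    """Count how many proteins are annotated with each GO category.

    Each cell is one protein; duplicates within a cell are counted once.
    """
    counts: Counter = Counter()
    for cell in cells:
        counts.update(set(split_go_categories(cell)))
    return counts
```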

  9. Synthetic E-Commerce Relational Datasets

    • kaggle.com
    Updated Aug 31, 2025
    Cite
    Nael Aqel (2025). Synthetic E-Commerce Relational Datasets [Dataset]. https://www.kaggle.com/datasets/naelaqel/synthetic-e-commerce-relational-dataset
    Croissant metadata available (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant).
    Dataset updated
    Aug 31, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nael Aqel
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Synthetic E-Commerce Relational Dataset

    This dataset consists of synthetically generated (fake) data designed to simulate a realistic e-commerce environment.

    Purpose

    To provide large-scale relational datasets for practicing database operations and analytics, and for testing tools such as DuckDB, pandas, and SQL engines. Ideal for benchmarking, educational projects, and data engineering experiments.

    Entity Relationship Diagram (ERD) - Tables Overview

    1. Customers

    • customer_id (int): Unique identifier for each customer
    • name (string): Customer full name
    • email (string): Customer email address
    • gender (string): Customer gender ('Male', 'Female', 'Other')
    • signup_date (date): Date customer signed up
    • country (string): Customer country of residence

    2. Products

    • product_id (int): Unique identifier for each product
    • product_name (string): Name of the product
    • category (string): Product category (e.g., Electronics, Books)
    • price (float): Price per unit
    • stock_quantity (int): Available stock count
    • brand (string): Product brand name

    3. Orders

    • order_id (int): Unique identifier for each order
    • customer_id (int): ID of the customer who placed the order (foreign key to Customers)
    • order_date (date): Date when order was placed
    • total_amount (float): Total amount for the order
    • payment_method (string): Payment method used (Credit Card, PayPal, etc.)
    • shipping_country (string): Country where the order is shipped

    4. Order Items

    • order_item_id (int): Unique identifier for each order item
    • order_id (int): ID of the order this item belongs to (foreign key to Orders)
    • product_id (int): ID of the product ordered (foreign key to Products)
    • quantity (int): Number of units ordered
    • unit_price (float): Price per unit at order time

    5. Product Reviews

    • review_id (int): Unique identifier for each review
    • product_id (int): ID of the reviewed product (foreign key to Products)
    • customer_id (int): ID of the customer who wrote the review (foreign key to Customers)
    • rating (int): Rating score (1 to 5)
    • review_text (string): Text content of the review
    • review_date (date): Date the review was written
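The five tables and their foreign keys map directly onto a relational schema. A minimal sketch using Python's built-in `sqlite3` module; the table names (`customers`, `order_items`, and so on), the mapping of `string`/`float`/`date` to SQLite `TEXT`/`REAL`/ISO-8601 `TEXT`, and the `CHECK` on `rating` are assumptions derived from the field list above, not DDL published with the dataset.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name TEXT, email TEXT, gender TEXT, signup_date TEXT, country TEXT
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT, category TEXT, price REAL, stock_quantity INTEGER, brand TEXT
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    order_date TEXT, total_amount REAL, payment_method TEXT, shipping_country TEXT
);
CREATE TABLE order_items (
    order_item_id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity INTEGER, unit_price REAL
);
CREATE TABLE product_reviews (
    review_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES products(product_id),
    customer_id INTEGER REFERENCES customers(customer_id),
    rating INTEGER CHECK (rating BETWEEN 1 AND 5),
    review_text TEXT, review_date TEXT
);
"""

# Revenue per product category, computed from line items (quantity * unit price).
REVENUE_BY_CATEGORY = """
SELECT p.category, SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items AS oi
JOIN products AS p ON p.product_id = oi.product_id
GROUP BY p.category
ORDER BY revenue DESC;
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the schema; SQLite enforces foreign keys only when the pragma is on."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

After loading the CSV or Parquet files into these tables, `conn.execute(REVENUE_BY_CATEGORY).fetchall()` returns `(category, revenue)` pairs, and any row whose foreign key has no parent is rejected at insert time.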

    Visual ERD

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F9179978%2F7681afe8fc52a116ff56a2a4e179ad19%2FEDR.png?generation=1754741998037680&alt=media

    Notes

    • All data is randomly generated using Python’s Faker library, so it does not reflect any real individuals or companies.
    • The data is provided in both CSV and Parquet formats.
    • The generator script is available in the accompanying GitHub repository for reproducibility and customization.

    Output

    The script saves two folders inside the specified output path:

    csv/    # CSV files
    parquet/  # Parquet files
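Once the script has written its output, the `csv/` folder can be loaded and the foreign-key relationships from the ERD spot-checked. A minimal standard-library sketch; it assumes one CSV file per table directly under `csv/`, and the file names (such as `orders.csv`) and helper names are hypothetical. CSV values are read as strings, so keys are compared as strings.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_csv_tables(output_path: str) -> Dict[str, List[dict]]:
    """Load every CSV under <output_path>/csv/ into a dict keyed by file stem."""
    tables: Dict[str, List[dict]] = {}
    for csv_file in sorted(Path(output_path, "csv").glob("*.csv")):
        with csv_file.open(newline="", encoding="utf-8") as fh:
            tables[csv_file.stem] = list(csv.DictReader(fh))
    return tables


def orphan_foreign_keys(child: List[dict], fk: str, parent: List[dict], pk: str) -> List[dict]:
    """Return child rows whose foreign key has no matching parent primary key."""
    parent_keys = {row[pk] for row in parent}
    return [row for row in child if row[fk] not in parent_keys]
```

For consistent output, `orphan_foreign_keys(tables["order_items"], "order_id", tables["orders"], "order_id")` should return an empty list.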
    

    License

    MIT License


  10. United States import data of Generator from Germany

    • volza.com
    csv
    Updated Aug 10, 2021
    Cite
    Volza.LLC (2021). United States import data of Generator from Germany [Dataset]. https://www.volza.com/imports-united-states/united-states-import-data-of-generator-from-germany
    Available download formats: csv
    Dataset updated
    Aug 10, 2021
    Dataset provided by
    Volza.LLC
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2014 - Sep 30, 2021
    Area covered
    Germany, United States
    Variables measured
    Count of exporters, Count of importers, Count of shipments, Sum of import value
    Description

    47,894 United States import shipment records of generators from Germany, with prices, volumes, and current buyer-supplier relationships, based on the actual United States import trade database.

  11. China CN: Generator & Generator Set: YoY: No of Loss Making Enterprise

    • ceicdata.com
    Cite
    CEICdata.com, China CN: Generator & Generator Set: YoY: No of Loss Making Enterprise [Dataset]. https://www.ceicdata.com/en/china/motor-generator-and-generator-set/cn-generator--generator-set-yoy-no-of-loss-making-enterprise
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Nov 1, 2014 - Oct 1, 2015
    Area covered
    China
    Variables measured
    Economic Activity
    Description

    China Generator & Generator Set: YoY: Number of Loss Making Enterprise data was reported at 14.173 % in Oct 2015, an increase from 13.953 % for Sep 2015. The data is updated monthly, with a median of 5.357 % from Jan 2006 to Oct 2015 across 89 observations. It reached an all-time high of 56.122 % in Aug 2012 and a record low of -13.529 % in Aug 2014. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.
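A year-on-year (YoY) series expresses each month's value relative to the same month one year earlier, $\text{YoY}_t = (x_t - x_{t-12}) / x_{t-12} \times 100$, so a negative value such as the record low of -13.529 % means the count fell year on year. A minimal sketch, assuming a gap-free list of monthly levels; the underlying monthly counts are not part of this listing, and `yoy_percent` is a hypothetical helper.

```python
from typing import List, Optional


def yoy_percent(levels: List[float], lag: int = 12) -> List[Optional[float]]:
    """Year-on-year % change; None where no comparison value exists (or it is zero)."""
    out: List[Optional[float]] = []
    for t, value in enumerate(levels):
        if t < lag or levels[t - lag] == 0:
            out.append(None)
        else:
            prev = levels[t - lag]
            out.append((value - prev) / prev * 100.0)
    return out
```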

  12. China CN: Generator & Generator Set: Total Asset

    • ceicdata.com
    Updated Oct 15, 2025
    Cite
    CEICdata.com (2025). China CN: Generator & Generator Set: Total Asset [Dataset]. https://www.ceicdata.com/en/china/motor-generator-and-generator-set/cn-generator--generator-set-total-asset
    Dataset updated
    Oct 15, 2025
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Nov 1, 2014 - Oct 1, 2015
    Area covered
    China
    Variables measured
    Economic Activity
    Description

    China Generator & Generator Set: Total Asset data was reported at 458.934 RMB bn in Oct 2015, an increase from 451.458 RMB bn for Sep 2015. The data is updated monthly, with a median of 299.527 RMB bn from Dec 2003 to Oct 2015 across 97 observations. It reached an all-time high of 458.934 RMB bn in Oct 2015 and a record low of 28.965 RMB bn in Dec 2003. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.

  13. China CN: Generator & Generator Set: Account Receivable

    • ceicdata.com
    Updated Dec 15, 2019
    Cite
    CEICdata.com (2019). China CN: Generator & Generator Set: Account Receivable [Dataset]. https://www.ceicdata.com/en/china/motor-generator-and-generator-set/cn-generator--generator-set-account-receivable
    Dataset updated
    Dec 15, 2019
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Nov 1, 2014 - Oct 1, 2015
    Area covered
    China
    Variables measured
    Economic Activity
    Description

    China Generator & Generator Set: Account Receivable data was reported at 133.527 RMB bn in Oct 2015, an increase from 126.823 RMB bn for Sep 2015. The data is updated monthly, with a median of 82.475 RMB bn from Dec 2003 to Oct 2015 across 97 observations. It reached an all-time high of 133.527 RMB bn in Oct 2015 and a record low of 4.207 RMB bn in Dec 2003. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.
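The Sep 2015 to Oct 2015 increases quoted for the Total Asset and Account Receivable series can be expressed as month-on-month percentage changes using the values given in these listings. A short sketch; `pct_change` is a hypothetical helper name. With these inputs the changes come out at roughly 1.66% for total assets and 5.29% for receivables.

```python
def pct_change(previous: float, current: float) -> float:
    """Percentage change from `previous` to `current`."""
    return (current - previous) / previous * 100.0


# Sep 2015 and Oct 2015 values quoted in the CEIC descriptions, in RMB bn.
total_asset = {"Sep 2015": 451.458, "Oct 2015": 458.934}
receivable = {"Sep 2015": 126.823, "Oct 2015": 133.527}

asset_change = pct_change(total_asset["Sep 2015"], total_asset["Oct 2015"])
receivable_change = pct_change(receivable["Sep 2015"], receivable["Oct 2015"])
print(f"Total Asset, MoM: {asset_change:.2f}%")
print(f"Account Receivable, MoM: {receivable_change:.2f}%")
```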

  14. Generator Summary View

    • data.wu.ac.at
    csv, json, xml
    Updated May 15, 2018
    Cite
    Department of Energy and Environmental Protection (2018). Generator Summary View [Dataset]. https://data.wu.ac.at/schema/data_ct_gov/NzJtaS0zZjgy
    Available download formats: json, csv, xml
    Dataset updated
    May 15, 2018
    Dataset provided by
    Department of Energy and Environmental Protection
    License

    U.S. Government Works, https://www.usa.gov/government-works
    License information was derived automatically

    Description

    PLEASE NOTE: Use ALL CAPS when searching text with the "Filter" function (for example, LITCHFIELD). This is not needed for the "Find in this Dataset" search in the upper right corner, where, for example, "Litchfield" can be used.
    We know there are errors in the data, although we strive to minimize them. Examples include:
    • Manifests completed incorrectly by the generator or the transporter: data was entered based on the incorrect information. We can only enter the information we receive.
    • Data entry errors: we now have QA/QC procedures in place to prevent, or catch and fix, many of these.
    • Multiple records for the same generator: historically, each variation in the spelling of a name or address generated a separate handler record. We have worked to minimize these, but many remain. The good news is that as long as they all have the same EPA ID, they will all show up in your search results.
    • Erroneous data provided by handlers to obtain an EPA ID: data entry was based on the erroneous information. Examples include incorrect or bogus addresses and names. There are also a lot of MISSPELLED NAMES AND ADDRESSES!
    • Missing manifests: not every required manifest gets submitted to the DEP. Also, of the more than 100,000 paper manifests we receive each year, some were incorrectly handled and never entered.
    • Missing data: the records for approximately 25 boxes of manifests, mostly from before 1985, were lost from the database in the 1980s.
    • Translation errors: the data has been migrated to newer data platforms numerous times, and each migration introduced errors and data losses.
    • Wastes entered incorrectly: mostly due to complex names that were difficult to spell, or typos in quantities or units of measure.
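Since duplicate handler records for one generator share the same EPA ID, exported rows can be consolidated on that key. A minimal sketch; the column names `epa_id` and `name` are hypothetical (check the dataset's actual headers), and the upper-casing of names follows the ALL CAPS convention noted above.

```python
from collections import defaultdict
from typing import Dict, List, Set


def group_by_epa_id(records: List[dict]) -> Dict[str, List[dict]]:
    """Group handler records by EPA ID, normalising case and surrounding spaces in the ID."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        groups[rec["epa_id"].strip().upper()].append(rec)
    return dict(groups)


def name_variants(groups: Dict[str, List[dict]]) -> Dict[str, Set[str]]:
    """Distinct name spellings per EPA ID, upper-cased with internal whitespace collapsed."""
    return {
        epa_id: {" ".join(r["name"].upper().split()) for r in recs}
        for epa_id, recs in groups.items()
    }
```

IDs with more than one entry in `name_variants` are the spelling variants the note describes; identical spellings that differ only in case or spacing collapse to one.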

  15. TAU Spatial Room Impulse Response Database (TAU-SRIR DB)

    • data.niaid.nih.gov
    • nde-dev.biothings.io
    • +2 more
    Updated Apr 6, 2022
    Cite
    Politis, Archontis; Adavanne, Sharath; Virtanen, Tuomas (2022). TAU Spatial Room Impulse Response Database (TAU-SRIR DB) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6408610
    Dataset updated
    Apr 6, 2022
    Dataset provided by
    Tampere University
    Authors
    Politis, Archontis; Adavanne, Sharath; Virtanen, Tuomas
    Description

    DESCRIPTION

    The TAU Spatial Room Impulse Response Database (TAU-SRIR DB) contains spatial room impulse responses (SRIRs) captured in various spaces of Tampere University (TAU), Finland, for a fixed receiver position and multiple source positions per room, along with separate recordings of spatial ambient noise captured at the same recording point. The dataset is intended for emulating spatial multichannel recordings to evaluate and/or train multichannel processing algorithms in realistic reverberant conditions across multiple rooms. Compared to other databases of room impulse responses, its main distinguishing properties are:

    Capture in a high-resolution multichannel format (32 channels), from which multiple more limited, application-specific formats can be derived (e.g. tetrahedral array, circular array, first-order Ambisonics, higher-order Ambisonics, binaural).

    Extraction of densely spaced SRIRs along measurement trajectories, allowing emulation of moving source scenarios.

    Multiple source distances, azimuths, and elevations from the receiver per room, allowing emulation of complex configurations for multi-source methods.

    Multiple rooms, allowing evaluation of methods at various acoustic conditions, and training of methods with the aim of generalization on different rooms.

    The RIRs were collected by TAU staff between 12/2017 and 06/2018, and between 11/2019 and 01/2020. The data collection received funding from the European Research Council, grant agreement 637422 EVERYSOUND.

    NOTE: This database is a work-in-progress. We intend to publish additional rooms, additional formats, and potentially higher-fidelity versions of the captured responses in the near future, as new versions of the database in this repository.

    REPORT AND REFERENCE

    A compact description of the dataset, recording setup, recording procedure, and extraction can be found in:

    Politis, Archontis, Adavanne, Sharath, & Virtanen, Tuomas (2020). A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020), Tokyo, Japan.

    available here. A more detailed report specifically focusing on the dataset collection and properties will follow.

    AIM

    The dataset can be used to generate multichannel or monophonic mixtures for testing or training methods under realistic reverberation conditions, related to e.g. multichannel speech enhancement, acoustic scene analysis, and machine listening, among others. It is especially suitable for the following application scenarios:

    monophonic and multichannel reverberant single- or multi-source speech in multi-room reverberant conditions

    monophonic and multichannel polyphonic sound events in multi-room reverberant conditions

    single-source and multi-source localization in multi-room reverberant conditions, in static or dynamic scenarios

    single-source and multi-source tracking in multi-room reverberant conditions, in static or dynamic scenarios

    sound event localization and detection in multi-room reverberant conditions, in static or dynamic scenarios

    SPECIFICATIONS

    The SRIRs were captured using an Eigenmike spherical microphone array. A Genelec G Three loudspeaker was used to play back a maximum length sequence (MLS) around the Eigenmike. The SRIRs were obtained in the STFT domain using a least-squares regression between the known measurement signal (MLS) and the far-field recording, independently at each frequency. In this version of the dataset, the SRIRs and ambient noise are downsampled to 24 kHz for compactness.
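In its simplest form, per-frequency least squares estimates, for each STFT bin $k$, the coefficient $H(k)$ that minimises $\sum_t |Y_t(k) - H(k)X_t(k)|^2$ over frames $t$, where $X$ is the known excitation and $Y$ the recording; the solution is $H(k) = \sum_t X_t^*(k)Y_t(k) / \sum_t |X_t(k)|^2$. The sketch below applies this for one microphone channel on plain complex lists. It is a simplified illustration (one multiplicative coefficient per bin, STFTs assumed precomputed), not the authors' extraction code, which may model longer responses per bin.

```python
from typing import List, Sequence


def ls_transfer_function(
    X: Sequence[Sequence[complex]], Y: Sequence[Sequence[complex]]
) -> List[complex]:
    """Per-bin least-squares estimate H[k] = sum_t conj(X[t][k]) Y[t][k] / sum_t |X[t][k]|^2.

    X and Y are STFT matrices indexed [frame][bin] for the excitation signal
    and for the recording of a single microphone channel.
    """
    n_bins = len(X[0])
    H: List[complex] = []
    for k in range(n_bins):
        num = sum(x[k].conjugate() * y[k] for x, y in zip(X, Y))
        den = sum(abs(x[k]) ** 2 for x in X)
        H.append(num / den if den > 0 else 0j)  # bins with no excitation energy stay 0
    return H
```

Repeating the estimate for each of the 32 channels, and transforming back to the time domain, would give one multichannel response per source position under this simplified model.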

    The currently published SRIR set was recorded at nine different indoor locations inside the Tampere University campus at Hervanta, Finland. Additionally, 30 minutes of ambient noise recordings were collected at the same locations, with the IR recording setup unchanged. SRIR directions and distances differ between rooms. Possible azimuths span the whole range $\phi\in[-180,180)$ degrees, while elevations span approximately $\theta\in[-45,45]$ degrees. The currently shared measured spaces are as follows:

    Large open space in underground bomb shelter, with plastic-coated floor and rock walls. Ventilation noise. Circular source trajectory.

    Large open gym space. Ambience of people using weights and gym equipment in adjacent rooms. Circular source trajectory.

    Small classroom (PB132) with group work tables and carpet flooring. Ventilation noise. Circular source trajectory.

    Meeting room (PC226) with hard floor and partially glass walls. Ventilation noise. Circular source trajectory.

    Lecture hall (SA203) with inclined floor and rows of desks. Ventilation noise. Linear source trajectory.

    Small classroom (SC203) with group work tables and carpet flooring. Ventilation noise. Linear source trajectory.

    Large classroom (SE203) with hard floor and rows of desks. Ventilation noise. Linear source trajectory.

    Lecture hall (TB103) with inclined floor and rows of desks. Ventilation noise. Linear source trajectory.

    Meeting room (TC352) with hard floor and partially glass walls. Ventilation noise. Circular source trajectory.

    The measurement trajectories were organised in groups, each specified by a circular or linear trace on the floor at a certain distance from the z-axis of the microphone. For circular trajectories, two ranges were measured, a close one and a far one, except in room TC352, where the same range was measured twice, but with a different furniture configuration and with the doors open or closed. For linear trajectories, two ranges were also measured, close and far, but with linear paths on either side of the array, resulting in 4 unique trajectory groups, with the exception of room SA203, where 3 ranges were measured, resulting in 6 trajectory groups. Within a room, linear trajectory groups are always parallel to each other.

    Each trajectory group had multiple measurement trajectories, following the same floor path, but with the source at different heights.

    The SRIRs are extracted from the noise recordings of the slowly moving source along those trajectories, at an angular spacing of approximately 1 degree as seen from the microphone. Instead of extracting SRIRs at equally spaced points along the path (e.g. every 20 cm), this extraction scheme was found more practical for synthesis purposes, as it makes emulating moving sources at an approximately constant angular speed easier.
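With SRIRs spaced about $\Delta$ degrees apart, a source moving at a constant angular speed $\omega$ (degrees per second) needs a new SRIR every $\Delta/\omega$ seconds. The sketch below maps time steps to SRIR indices on that basis; the hop size, starting index, wrap-around for closed circular trajectories, and clamping at the end of linear ones are illustrative assumptions, not the published scene generator's logic.

```python
from typing import List


def rir_schedule(duration_s: float, speed_deg_per_s: float, n_rirs: int,
                 spacing_deg: float = 1.0, start_index: int = 0,
                 hop_s: float = 0.1, circular: bool = True) -> List[int]:
    """SRIR index to use at each hop for a source moving at constant angular speed.

    Circular trajectories wrap around modulo n_rirs; linear trajectories are
    clamped at their last SRIR.
    """
    steps = int(round(duration_s / hop_s))
    schedule: List[int] = []
    for i in range(steps):
        travelled_deg = speed_deg_per_s * i * hop_s
        idx = start_index + int(travelled_deg // spacing_deg)
        schedule.append(idx % n_rirs if circular else min(idx, n_rirs - 1))
    return schedule
```

Each scheduled SRIR would then be convolved with the corresponding segment of the source signal, typically with crossfading between segments to avoid discontinuities.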

    More details on the trajectory geometries can be found in the README file and the measinfo.mat file.

    RECORDING FORMATS

    As with the DCASE2019-2021 datasets, currently the database is provided in two formats, first-order Ambisonics, and a tetrahedral microphone array - both derived from the Eigenmike 32-channel recordings. For more details on the format specifications, check the README.

    We intend to add further formats of the database, of both higher resolution (e.g. higher-order Ambisonics) and lower resolution (e.g. binaural).

    REFERENCE DOAs

    Each extracted RIR along a measurement trajectory has an associated direction of arrival (DOA), which can serve as the reference direction for a sound source spatialized with that RIR, for training or evaluation purposes. The DOAs were determined acoustically from the extracted RIRs, by windowing the direct-sound part and applying a broadband version of the MUSIC localization algorithm to the windowed multichannel signal.
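    A broadband MUSIC estimate can be illustrated on a first-order Ambisonics signal, whose ideal plane-wave response does not depend on frequency, so a single time-domain spatial covariance matrix can be used across the whole band. This is only a sketch of the general technique under stated assumptions (ideal FOA with ACN channel order and SN3D normalization, one dominant source in the windowed direct sound, a 5 degree search grid); it is not the authors' localization code, and the function names are hypothetical.

```python
import numpy as np


def foa_steering(azi, ele):
    """Ideal FOA plane-wave response, ACN order (W, Y, Z, X), SN3D normalization."""
    return np.array([
        np.ones_like(azi),
        np.sin(azi) * np.cos(ele),
        np.sin(ele),
        np.cos(azi) * np.cos(ele),
    ])


def music_doa_foa(x, n_sources=1, step_deg=5):
    """Broadband MUSIC on a (4, n_samples) FOA signal.

    Returns (azimuth, elevation) in degrees of the pseudospectrum peak.
    """
    cov = x @ x.T / x.shape[1]            # spatial covariance over all samples
    _, vecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    noise_sub = vecs[:, : 4 - n_sources]  # eigenvectors of the smallest eigenvalues
    az = np.deg2rad(np.arange(-180, 180, step_deg))
    el = np.deg2rad(np.arange(-90, 90 + step_deg, step_deg))
    AZ, EL = np.meshgrid(az, el, indexing="ij")
    A = foa_steering(AZ.ravel(), EL.ravel())  # (4, n_grid) candidate responses
    # Pseudospectrum: large where a candidate is orthogonal to the noise subspace.
    spectrum = 1.0 / np.sum((noise_sub.T @ A) ** 2, axis=0)
    best = np.argmax(spectrum)
    return np.rad2deg(AZ.ravel()[best]), np.rad2deg(EL.ravel()[best])
```

    Applied to the windowed direct-sound part of an FOA SRIR, the returned angles would play the role of the reference DOA; a finer grid or a local refinement around the peak would be needed for sub-degree accuracy.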

    The DOAs are provided as Cartesian components [x, y, z] of unit length vectors.
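    For many uses, these unit vectors are converted to azimuth and elevation. The snippet below assumes the common convention of x pointing to the front, y to the left and z up, with azimuth measured counter-clockwise from +x; this convention is an assumption, so check the README for the one actually used in the database. `doa_to_azi_ele` is a hypothetical helper.

```python
import numpy as np


def doa_to_azi_ele(doa_xyz):
    """Convert unit DOA vectors [x, y, z] (shape (N, 3)) to azimuth/elevation in degrees."""
    doa = np.atleast_2d(np.asarray(doa_xyz, dtype=float))
    x, y, z = doa[:, 0], doa[:, 1], doa[:, 2]
    azimuth = np.degrees(np.arctan2(y, x))                 # counter-clockwise from +x (assumed)
    elevation = np.degrees(np.arctan2(z, np.hypot(x, y)))  # up from the horizontal plane
    return azimuth, elevation
```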

    SCENE GENERATOR

    We share a set of routines, here termed the scene generator, that spatialize a bank of sound samples using the SRIRs and noise recordings of this library to emulate scenes in the two target formats. The code is similar to that used to generate the TAU-NIGENS Spatial Sound Events 2021 dataset, and has been ported to Python from the original version written in MATLAB.

    The generator can be found here, along with more details on its use.

    At the moment, the generator is set up to work with the NIGENS sound event sample database and the FSD50K sound event database, but additional sample banks can be added with small modifications.
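    The core of such a scene generator can be sketched in two steps: convolve a dry sound sample with a multichannel SRIR, then add the room's ambient noise recording. The sketch below is not taken from the generator's code; it assumes SRIRs stored as NumPy arrays of shape `(rir_len, n_channels)` and sets the noise level by a target signal-to-noise ratio in dB, which is one plausible way to control levels rather than the generator's documented behaviour. `spatialize` and `add_ambience` are hypothetical names.

```python
import numpy as np


def spatialize(sample, srir):
    """Convolve a mono sample with each channel of an SRIR of shape (rir_len, n_channels).

    Returns an array of shape (n_channels, len(sample) + rir_len - 1).
    """
    return np.stack([np.convolve(sample, srir[:, ch]) for ch in range(srir.shape[1])])


def add_ambience(event, ambience, snr_db):
    """Add ambience (same shape as event), scaled so the event-to-ambience power ratio is snr_db."""
    p_event = np.mean(event ** 2)
    p_amb = np.mean(ambience ** 2)
    gain = np.sqrt(p_event / (p_amb * 10 ** (snr_db / 10)))
    return event + gain * ambience
```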

    The dataset together with the generator has been used by the authors in the following public challenges:

    • DCASE 2019 Challenge Task 3, to generate the TAU Spatial Sound Events 2019 dataset (development/evaluation)

    • DCASE 2020 Challenge Task 3, to generate the TAU-NIGENS Spatial Sound Events 2020 dataset

    • DCASE 2021 Challenge Task 3, to generate the TAU-NIGENS Spatial Sound Events 2021 dataset

    • DCASE 2022 Challenge Task 3, to generate additional SELD synthetic mixtures for training the task baseline

    NOTE: The current version of the generator is a work in progress, and some of the code is still quite "rough". If something does not work as intended, or it is unclear what certain parts do, please contact us.

    DATASET STRUCTURE

    The dataset contains a folder of the SRIRs (TAU-SRIR_DB), with all SRIRs of each room stored in a single MAT file. The file rirdata.mat contains general information such as the sample rate and format specifications and, most importantly, the DOAs of every extracted SRIR. The file measinfo.mat contains measurement and recording information for each room. Finally, the dataset contains a folder of spatial ambient noise recordings (TAU-SNoise_DB), with one subfolder per room holding two recordings of the spatial ambience, one for each format (FOA or MIC). For more information on how the SRIRs and DOAs are organized, check the README.

    DOWNLOAD

    The files TAU-SRIR_DB.z01, ..., TAU-SRIR_DB.zip contain the SRIRs and measurement info files.

    The files TAU-SNoise_DB.z01, ..., TAU-SNoise_DB.zip

  16. Public Utility Data Liberation Project (PUDL) Data Release

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 14, 2025
    Selvans, Zane A.; Gosnell, Christina M.; Sharpe, Austen; Norman, Bennett; Schira, Zach; Lamb, Katherine; Xia, Dazhong; Belfer, Ella (2025). Public Utility Data Liberation Project (PUDL) Data Release [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3653158
    Dataset updated
    Feb 14, 2025
    Dataset provided by
    Catalyst Cooperative
    Authors
    Selvans, Zane A.; Gosnell, Christina M.; Sharpe, Austen; Norman, Bennett; Schira, Zach; Lamb, Katherine; Xia, Dazhong; Belfer, Ella
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PUDL v2025.2.0 Data Release

    This is our regular quarterly release for 2025Q1. It includes updates to all the datasets that are published with quarterly or higher frequency, plus initial versions of a few new data sources that have been in the works for a while.

    One major change this quarter is that we are now publishing all processed PUDL data as Apache Parquet files, alongside our existing SQLite databases. See Data Access for more on how to access these outputs.

    Some potentially breaking changes to be aware of:

    In the EIA Form 930 – Hourly and Daily Balancing Authority Operations Report, a number of new energy sources have been added, and some old energy sources have been split into more granular categories. See Changes in energy source granularity over time.

    We are now running the EPA’s CAMD to EIA unit crosswalk code for each individual year starting from 2018, rather than just 2018 and 2021, resulting in more connections between these two datasets and changes to some sub-plant IDs. See the note below for more details.

    Many thanks to the organizations who make these regular updates possible! Especially GridLab, RMI, and the ZERO Lab at Princeton University. If you rely on PUDL and would like to help ensure that the data keeps flowing, please consider joining them as a PUDL Sustainer, as we are still fundraising for 2025.

    New Data

    EIA 176

    Added a couple of semi-transformed interim EIA-176 (natural gas sources and dispositions) tables. They aren’t yet being written to the database, but are one step closer. See #3555 and PRs #3590, #3978. Thanks to @davidmudrauskas for moving this dataset forward.

    Extracted these interim tables up through the latest 2023 data release. See #4002 and #4004.

    EIA 860

    Added EIA 860 Multifuel table. See #3438 and #3946.

    FERC 1

    Added three new output tables containing granular utility accounting data. See #4057, #3642 and the table descriptions in the data dictionary:

    out_ferc1_yearly_detailed_income_statements

    out_ferc1_yearly_detailed_balance_sheet_assets

    out_ferc1_yearly_detailed_balance_sheet_liabilities

    SEC Form 10-K Parent-Subsidiary Ownership

    We have added some new tables describing the parent-subsidiary company ownership relationships reported in the SEC’s Form 10-K, Exhibit 21 “Subsidiaries of the Registrant”. Where possible, these tables link the SEC filers or their subsidiary companies to the corresponding EIA utilities. This work was funded by a grant from the Mozilla Foundation. Most of the ML model development and data preparation took place in the mozilla-sec-eia repository, separate from the main PUDL ETL, because it requires processing hundreds of thousands of PDFs and deploying ML experiment tracking infrastructure. The new tables are handed off to the PUDL ETL pipeline as nearly finished products. Note that these are preliminary, experimental data products and are known to be incomplete and to contain errors. Extracting data tables from unstructured PDFs and linking SEC records to EIA records are necessarily probabilistic processes.

    See PRs #4026, #4031, #4035, #4046, #4048, #4050 and check out the table descriptions in the PUDL data dictionary:

    out_sec10k_parents_and_subsidiaries

    core_sec10k_quarterly_filings

    core_sec10k_quarterly_exhibit_21_company_ownership

    core_sec10k_quarterly_company_information

    Expanded Data Coverage

    EPA CEMS

    Added 2024 Q4 of CEMS data. See #4041 and #4052.

    EPA CAMD EIA Crosswalk

    In the past, the crosswalk in PUDL has used the EPA’s published crosswalk (run with 2018 data), and an additional crosswalk we ran with 2021 EIA 860 data. To ensure that the crosswalk reflects updates in both EIA and EPA data, we re-ran the EPA R code which generates the EPA CAMD EIA crosswalk with 4 new years of data: 2019, 2020, 2022 and 2023. Re-running the crosswalk pulls the latest data from the CAMD FACT API, which results in some changes to the generator and unit IDs reported on the EPA side of the crosswalk, which feeds into the creation of core_epa_assn_eia_epacamd.

    The changes only result in the addition of new units and generators in the EPA data, with no changes to matches at the plant level. However, the updates to generator and unit IDs have changed the subplant IDs: some EIA boilers and generators that previously had no matches to EPA data have now been matched to EPA unit data, resulting in an overall reduction in the number of rows in the core_epa_assn_eia_epacamd_subplant_ids table. See issue #4039 and PR #4056 for a discussion of the changes observed in the course of this update.
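    The effect of new matches on subplant groupings can be illustrated with a small grouping sketch. Assuming, for illustration only, that a subplant corresponds to a connected group of linked EIA generators/boilers and EPA units, a single new link can merge two previously separate groups into one. The union-find code below is hypothetical and is not PUDL's implementation; the IDs are made up.

```python
def subplant_groups(nodes, links):
    """Group IDs into connected components with union-find (illustrative only).

    nodes -- all IDs, including unmatched ones
    links -- iterable of (eia_id, epa_id) pairs that are matched to each other
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for node in nodes:
        find(node)
    for a, b in links:
        parent[find(a)] = find(b)  # union the two groups

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())
```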

    EIA 860M

    Added EIA 860M through December 2024. See #4038 and #4047.

    EIA 923

    Added EIA 923 monthly data through September 2024. See #4038 and #4047.

    EIA Bulk Electricity Data

    Updated the EIA Bulk Electricity data to include data published up through 2024-11-01. See #4042 and PR #4051.

    EIA 930

    Updated the EIA 930 data to include data published up through the beginning of February 2025. See #4040 and PR #4054. 10 new energy sources were added and 3 were retired; see Changes in energy source granularity over time for more information.

    Bug Fixes

    Fixed an accidentally swapped set of starting balance / ending balance column rename parameters in the pre-2021 DBF-derived data that feeds into core_ferc1_yearly_other_regulatory_liabilities_sched278. See issue #3952 and PRs #3969, #3979. Thanks to @yolandazzz13 for making this fix.

    Added preliminary data validation checks for several FERC 1 tables that were missing them. See #3860.

    Fixed the spelling of Lake Huron and Lake Saint Clair in out_vcerare_hourly_available_capacity_factor and related tables. See issue #4007 and PR #4029.

    Quality of Life Improvements

    We added a sources parameter to pudl.metadata.classes.DataSource.from_id() in order to make it possible to use the pudl-archiver repository to archive datasets that won’t necessarily be ingested into PUDL. See this PUDL archiver issue and PRs #4003 and #4013.

    Other PUDL v2025.2.0 Resources

    PUDL v2025.2.0 Data Dictionary

    PUDL v2025.2.0 Documentation

    PUDL in the AWS Open Data Registry

    PUDL v2025.2.0 in a free, public AWS S3 bucket: s3://pudl.catalyst.coop/v2025.2.0/

    PUDL v2025.2.0 in a requester-pays GCS bucket: gs://pudl.catalyst.coop/v2025.2.0/

    Zenodo archive of the PUDL GitHub repo for this release

    PUDL v2025.2.0 release on GitHub

    PUDL v2025.2.0 package in the Python Package Index (PyPI)
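    Because the processed outputs are now published as Parquet, individual tables can in principle be read straight from the public S3 bucket listed above. The sketch below assumes that each table is stored as `<table_name>.parquet` directly under the version prefix; that layout is an assumption, so check the Data Access documentation for the actual paths. Reading requires pandas with pyarrow and s3fs installed; `pudl_parquet_url` and `read_pudl_table` are hypothetical helpers.

```python
PUDL_BUCKET = "s3://pudl.catalyst.coop"


def pudl_parquet_url(table, version="v2025.2.0"):
    """Build the assumed S3 location of one PUDL table (layout is an assumption)."""
    return f"{PUDL_BUCKET}/{version}/{table}.parquet"


def read_pudl_table(table, version="v2025.2.0"):
    """Read one table from the public bucket without AWS credentials."""
    import pandas as pd  # needs pyarrow (Parquet engine) and s3fs (S3 access)

    return pd.read_parquet(pudl_parquet_url(table, version), storage_options={"anon": True})
```

    For example, `read_pudl_table("out_ferc1_yearly_detailed_income_statements")` would fetch one of the new FERC 1 output tables, provided the assumed path layout holds.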

    Contact Us

    If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. Here's a bunch of different ways to get in touch:

    Follow us on GitHub

    Use the PUDL GitHub issue tracker to let us know about any bugs or data issues you encounter

    GitHub Discussions is where we provide user support.

    Watch our GitHub Project to see what we're working on.

    Email us at hello@catalyst.coop for private communications.

    On Mastodon: @CatalystCoop@mastodon.energy

    On BlueSky: @catalyst.coop

    On Twitter: @CatalystCoop

    Connect with us on LinkedIn

    Play with our data and notebooks on Kaggle

    Combine our data with ML models on HuggingFace

    Learn more about us on our website: https://catalyst.coop

    Subscribe to our announcements list for email updates.

  17. China CN: Generator & Generator Set: Total Liability

    • ceicdata.com
    Updated Dec 15, 2020
    CEICdata.com (2020). China CN: Generator & Generator Set: Total Liability [Dataset]. https://www.ceicdata.com/en/china/motor-generator-and-generator-set/cn-generator--generator-set-total-liability
    Dataset updated
    Dec 15, 2020
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Nov 1, 2014 - Oct 1, 2015
    Area covered
    China
    Variables measured
    Economic Activity
    Description

    China Generator & Generator Set: Total Liability data was reported at 299.834 RMB bn in Oct 2015, an increase from the previous figure of 294.039 RMB bn for Sep 2015. The series is updated monthly; from Dec 2003 to Oct 2015 (97 observations) its median is 181.089 RMB bn. It reached an all-time high of 299.834 RMB bn in Oct 2015 and a record low of 20.835 RMB bn in Dec 2003. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.

  18. China CN: Generator & Generator Set: No of Employee: Average

    • ceicdata.com
    Updated Dec 15, 2024
    CEICdata.com (2024). China CN: Generator & Generator Set: No of Employee: Average [Dataset]. https://www.ceicdata.com/en/china/motor-generator-and-generator-set/cn-generator--generator-set-no-of-employee-average
    Dataset updated
    Dec 15, 2024
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Feb 1, 2012 - Dec 1, 2013
    Area covered
    China
    Variables measured
    Economic Activity
    Description

    China Generator & Generator Set: Number of Employee: Average data was reported at 246.196 thousand persons in Dec 2013, an increase from the previous figure of 212.926 thousand persons for Dec 2012. The series is updated monthly; from Dec 2003 to Dec 2013 (64 observations) its median is 151.600 thousand persons. It reached an all-time high of 246.196 thousand persons in Dec 2013 and a record low of 69.115 thousand persons in Dec 2003. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.

  19. China CN: Generator & Generator Set: YoY: Account Receivable

    • ceicdata.com
    Updated Sep 15, 2020
    CEICdata.com (2020). China CN: Generator & Generator Set: YoY: Account Receivable [Dataset]. https://www.ceicdata.com/en/china/motor-generator-and-generator-set/cn-generator--generator-set-yoy-account-receivable
    Dataset updated
    Sep 15, 2020
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Nov 1, 2014 - Oct 1, 2015
    Area covered
    China
    Variables measured
    Economic Activity
    Description

    China Generator & Generator Set: YoY: Account Receivable data was reported at 12.139 % in Oct 2015, an increase from the previous figure of 11.472 % for Sep 2015. The series is updated monthly; from Jan 2006 to Oct 2015 (89 observations) its median is 27.840 %. It reached an all-time high of 87.380 % in Mar 2011 and a record low of -7.849 % in May 2013. The series remains active in CEIC and is reported by the National Bureau of Statistics. It is categorized under China Premium Database’s Industrial Sector – Table CN.BIA: Motor: Generator and Generator Set.

  20. Global import data of Generator

    • volza.com
    csv
    Updated Oct 31, 2025
    + more versions
    Volza FZ LLC (2025). Global import data of Generator [Dataset]. https://www.volza.com/imports-global/global-import-data-of-generator-from-martinique
    Available download formats: csv
    Dataset updated
    Oct 31, 2025
    Dataset authored and provided by
    Volza FZ LLC
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Count of importers, Sum of import value, 2014-01-01/2021-09-30, Count of import shipments
    Description

    12 global import shipment records of Generator, with prices, volumes and current buyer-supplier relationships, based on an actual global export trade database.
