19 datasets found
  1. Fake Employee Dataset

    • kaggle.com
    zip
    Updated Nov 20, 2023
    Cite
    Oyekanmi Olamilekan (2023). Fake Employee Dataset [Dataset]. https://www.kaggle.com/datasets/oyekanmiolamilekan/fake-employee-dataset
    Explore at:
    zip (162874 bytes)
    Dataset updated
    Nov 20, 2023
    Authors
    Oyekanmi Olamilekan
    Description

    Creating a robust employee dataset for data analysis and visualization involves several key fields that capture different aspects of an employee's information. Here is a list of fields you might consider including:

    • Employee ID: A unique identifier for each employee.
    • Name: First name and last name of the employee.
    • Gender: Male, female, non-binary, etc.
    • Date of Birth: Birthdate of the employee.
    • Email Address: Contact email of the employee.
    • Phone Number: Contact number of the employee.
    • Address: Home or work address of the employee.
    • Department: The department the employee belongs to (e.g., HR, Marketing, Engineering, etc.).
    • Job Title: The specific job title of the employee.
    • Manager ID: ID of the employee's manager.
    • Hire Date: Date when the employee was hired.
    • Salary: Employee's salary or compensation.
    • Employment Status: Full-time, part-time, contractor, etc.
    • Employee Type: Regular, temporary, contract, etc.
    • Education Level: Highest level of education attained by the employee.
    • Certifications: Any relevant certifications the employee holds.
    • Skills: Specific skills or expertise possessed by the employee.
    • Performance Ratings: Ratings or evaluations of employee performance.
    • Work Experience: Previous work experience of the employee.
    • Benefits Enrollment: Information on benefits chosen by the employee (e.g., healthcare plan, retirement plan, etc.).
    • Work Location: Physical location where the employee works.
    • Work Hours: Regular working hours or shifts of the employee.
    • Employee Status: Active, on leave, terminated, etc.
    • Emergency Contact: Contact information of the employee's emergency contact person.
    • Employee Satisfaction Survey Responses: Data from employee satisfaction surveys, if applicable.

    Code Url: https://github.com/intellisenseCodez/faker-data-generator
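    The linked repository uses the Python Faker library. As a rough sketch (not the author's exact script; field choices, ranges, and the output file name are illustrative assumptions), a table with a subset of these fields could be generated like this:

    # Illustrative sketch only: build a small employee table with a subset of the
    # fields listed above using Faker, then save it to CSV.
    import random
    import pandas as pd
    from faker import Faker

    fake = Faker()
    departments = ["HR", "Marketing", "Engineering", "Finance"]

    rows = []
    for i in range(100):
        rows.append({
            "Employee ID": i + 1,
            "Name": fake.name(),
            "Gender": random.choice(["Male", "Female", "Non-binary"]),
            "Date of Birth": fake.date_of_birth(minimum_age=21, maximum_age=65),
            "Email Address": fake.email(),
            "Department": random.choice(departments),
            "Job Title": fake.job(),
            "Hire Date": fake.date_between(start_date="-10y", end_date="today"),
            "Salary": round(random.uniform(30000, 150000), 2),
        })

    pd.DataFrame(rows).to_csv("fake_employees.csv", index=False)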

  2. Parameters for the logistic regression model to predict Name Generator ties....

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 5, 2023
    Cite
    Luke J. Matthews; Peter DeWan; Elizabeth Y. Rula (2023). Parameters for the logistic regression model to predict Name Generator ties. [Dataset]. http://doi.org/10.1371/journal.pone.0055234.t002
    Explore at:
    xls
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Luke J. Matthews; Peter DeWan; Elizabeth Y. Rula
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    * This field indicates a dummy variable was also included. If a data point for the row variable was 0, the dummy took on a value of 1; otherwise the dummy was 0. Row variables with blank entries did not exhibit over-dispersion of zeros and so did not require dummy variables.
    † Variable was log transformed to better meet generalized linear model assumptions.

  3. Finance Dataset by Faker Library

    • kaggle.com
    Updated Feb 20, 2024
    Cite
    Hamza Obaydallah (2024). Finance Dataset by Faker Library [Dataset]. https://www.kaggle.com/datasets/hamzazaki/finance-dataset-by-faker-library
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 20, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hamza Obaydallah
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Finance dataset with fake information such as transaction ID, date, amount, currency, description, category, merchant, customer, city, and country. It can be used for educational purposes as well as for testing.

    This script generates a dataset with fake transaction information such as transaction ID, date, amount, currency, description, category, merchant, customer, city, and country. Adjust the num_rows variable to specify the number of rows you want in your dataset. Finally, the dataset is saved to a CSV file named finance_dataset.csv. You can modify the fields or add additional fields according to your requirements.

    # Generate a fake finance dataset with the Faker library and save it to CSV.
    from faker import Faker
    import random
    import pandas as pd

    fake = Faker()

    # Define the number of rows for your dataset
    num_rows = 15000

    # Generate fake finance data
    data = {
        'Transaction_ID': [fake.uuid4() for _ in range(num_rows)],
        'Date': [fake.date_time_this_year() for _ in range(num_rows)],
        'Amount': [round(random.uniform(10, 10000), 2) for _ in range(num_rows)],
        'Currency': [fake.currency_code() for _ in range(num_rows)],
        'Description': [fake.bs() for _ in range(num_rows)],
        'Category': [random.choice(['Food', 'Transport', 'Shopping', 'Entertainment', 'Utilities']) for _ in range(num_rows)],
        'Merchant': [fake.company() for _ in range(num_rows)],
        'Customer': [fake.name() for _ in range(num_rows)],
        'City': [fake.city() for _ in range(num_rows)],
        'Country': [fake.country() for _ in range(num_rows)],
    }

    # Create a DataFrame
    df = pd.DataFrame(data)

    # Save the DataFrame to a CSV file
    df.to_csv('finance_dataset.csv', index=False)

    # Display the first rows of the DataFrame
    print(df.head())

  4. Synthetic E-Commerce Relational Datasets

    • kaggle.com
    Updated Aug 31, 2025
    Cite
    Nael Aqel (2025). Synthetic E-Commerce Relational Datasets [Dataset]. https://www.kaggle.com/datasets/naelaqel/synthetic-e-commerce-relational-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 31, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nael Aqel
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Synthetic E-Commerce Relational Dataset

    This dataset is synthetically generated fake data designed to simulate a realistic e-commerce environment.

    Purpose

    To provide large-scale relational datasets for practicing database operations, analytics, and testing tools like DuckDB, Pandas, and SQL engines. Ideal for benchmarking, educational projects, and data engineering experiments.

    Entity Relationship Diagram (ERD) - Tables Overview

    1. Customers

    • customer_id (int): Unique identifier for each customer
    • name (string): Customer full name
    • email (string): Customer email address
    • gender (string): Customer gender ('Male', 'Female', 'Other')
    • signup_date (date): Date customer signed up
    • country (string): Customer country of residence

    2. Products

    • product_id (int): Unique identifier for each product
    • product_name (string): Name of the product
    • category (string): Product category (e.g., Electronics, Books)
    • price (float): Price per unit
    • stock_quantity (int): Available stock count
    • brand (string): Product brand name

    3. Orders

    • order_id (int): Unique identifier for each order
    • customer_id (int): ID of the customer who placed the order (foreign key to Customers)
    • order_date (date): Date when order was placed
    • total_amount (float): Total amount for the order
    • payment_method (string): Payment method used (Credit Card, PayPal, etc.)
    • shipping_country (string): Country where the order is shipped

    4. Order Items

    • order_item_id (int): Unique identifier for each order item
    • order_id (int): ID of the order this item belongs to (foreign key to Orders)
    • product_id (int): ID of the product ordered (foreign key to Products)
    • quantity (int): Number of units ordered
    • unit_price (float): Price per unit at order time

    5. Product Reviews

    • review_id (int): Unique identifier for each review
    • product_id (int): ID of the reviewed product (foreign key to Products)
    • customer_id (int): ID of the customer who wrote the review (foreign key to Customers)
    • rating (int): Rating score (1 to 5)
    • review_text (string): Text content of the review
    • review_date (date): Date the review was written

    Visual ERD

    [ERD image not reproduced here]

    Notes

    • All data is randomly generated using Python’s Faker library, so it does not reflect any real individuals or companies.
    • The data is provided in both CSV and Parquet formats.
    • The generator script is available in the accompanying GitHub repository for reproducibility and customization.

    Output

    The script saves two folders inside the specified output path:

    csv/    # CSV files
    parquet/  # Parquet files
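
    As an illustration of the kind of query the dataset is aimed at, here is a minimal sketch using DuckDB over the Parquet output; the per-table file names (customers.parquet, orders.parquet) are assumptions, since the exact names are not listed above:

    # Sketch: revenue per customer country, joining Orders to Customers directly
    # over the Parquet files. File names inside parquet/ are assumed.
    import duckdb

    con = duckdb.connect()
    top_countries = con.execute("""
        SELECT c.country, COUNT(*) AS orders, SUM(o.total_amount) AS revenue
        FROM 'parquet/orders.parquet'    AS o
        JOIN 'parquet/customers.parquet' AS c USING (customer_id)
        GROUP BY c.country
        ORDER BY revenue DESC
        LIMIT 10
    """).fetchdf()
    print(top_countries)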
    

    License

    MIT License

  5. /GluGluToHToZZTo4L_M-550_7TeV-minloHJJ-pythia6-tauola/Summer11LegDR-PU_S13_START53_LV6-v1/AODSIM

    • opendata.cern.ch
    Updated 2016
    + more versions
    Cite
    CMS collaboration (2016). /GluGluToHToZZTo4L_M-550_7TeV-minloHJJ-pythia6-tauola/Summer11LegDR-PU_S13_START53_LV6-v1/AODSIM [Dataset]. http://doi.org/10.7483/OPENDATA.CMS.VPJH.JZHB
    Explore at:
    Dataset updated
    2016
    Dataset provided by
    CERN Open Data Portal
    Authors
    CMS collaboration
    Description

    Simulated dataset GluGluToHToZZTo4L_M-550_7TeV-minloHJJ-pythia6-tauola in AODSIM format for 2011 collision data (SM Higgs)

    See the description of the simulated dataset names in: About CMS simulated dataset names.

    These simulated datasets correspond to the collision data collected by the CMS experiment in 2011.

  6. AI-Generated Product Naming Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 4, 2025
    Cite
    Growth Market Reports (2025). AI-Generated Product Naming Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/ai-generated-product-naming-market
    Explore at:
    pdf, csv, pptx
    Dataset updated
    Aug 4, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    AI-Generated Product Naming Market Outlook



    According to our latest research, the AI-Generated Product Naming market size reached USD 612.4 million in 2024, reflecting a robust adoption curve across industries worldwide. With a compound annual growth rate (CAGR) of 17.8% from 2025 to 2033, the market is forecasted to attain a value of USD 2,183.6 million by 2033. The principal growth factor driving this expansion is the increasing demand for rapid, creative, and data-driven branding solutions that can keep pace with product proliferation and global market entry.




    The primary growth driver for the AI-Generated Product Naming market is the exponential rise in product launches across diverse sectors, especially in retail, FMCG, and technology. As businesses strive to differentiate themselves in saturated markets, the need for unique, memorable, and linguistically appropriate product names has intensified. AI-powered naming solutions leverage natural language processing, machine learning, and big data analytics to generate names that resonate with target audiences, are culturally sensitive, and are optimized for search engines. This capability not only accelerates time-to-market but also minimizes the risk of legal or cultural missteps, making AI-based naming indispensable for global enterprises and startups alike.




    Another significant factor contributing to the market’s growth is the shift towards digitalization and automation in branding processes. Traditional product naming often involves lengthy brainstorming sessions, focus groups, and iterative testing, leading to time delays and increased costs. AI-Generated Product Naming tools streamline these workflows by instantly generating hundreds of name options that can be filtered by language, tone, industry relevance, and domain availability. The integration of AI solutions with branding agencies’ and enterprises’ existing marketing stacks further enhances efficiency and enables data-driven decision-making. This technological advancement is particularly valuable in highly competitive sectors such as pharmaceuticals and technology, where speed and compliance are critical.




    Furthermore, the increasing investment in artificial intelligence and machine learning technologies by both established companies and innovative startups is fueling the development of more sophisticated and context-aware naming solutions. These platforms are becoming adept at understanding brand values, target demographics, and even emotional triggers, resulting in names that are not only creative but also strategically aligned with broader marketing goals. As AI algorithms evolve, their ability to generate names that pass linguistic, legal, and SEO checks will only improve, further solidifying their role in the product development lifecycle.




    From a regional perspective, North America currently dominates the AI-Generated Product Naming market, accounting for the largest share due to its advanced technological infrastructure, high adoption rate of AI-powered marketing tools, and the presence of leading branding agencies and multinational companies. Europe follows closely, driven by its vibrant FMCG and e-commerce sectors, while Asia Pacific is emerging as the fastest-growing region, propelled by the rapid digital transformation of retail and consumer goods industries in China, India, and Southeast Asia. Latin America and the Middle East & Africa are also witnessing steady growth, supported by increasing entrepreneurial activity and digitalization efforts.





    Component Analysis



    The Component segment of the AI-Generated Product Naming market is bifurcated into Software and Services. The software sub-segment encompasses AI-powered platforms and tools that autonomously generate product names based on user inputs, industry context, and linguistic guidelines. These solutions are increasingly leveraging advanced natural language generation and deep learning algorithms to produce names that are no

  7. Replication Data for: The Dynamics of Partisan Identification when Party Brands Change: The Case of the Workers Party in Brazil

    • dataverse.harvard.edu
    • dataone.org
    Updated Sep 24, 2018
    Cite
    Harvard Dataverse (2018). Replication Data for: The Dynamics of Partisan Identification when Party Brands Change: The Case of the Workers Party in Brazil [Dataset]. http://doi.org/10.7910/DVN/XSCFX5
    Explore at:
    docx (12684), application/x-stata-syntax (44213), doc (1231860), tsv (53049450), text/plain; charset=us-ascii (338338)
    Dataset updated
    Sep 24, 2018
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    Replication materials for "The Dynamics of Partisan Identification when Party Brands Change: The Case of the Workers Party in Brazil", based on the "Two-City, Six-Wave Panel Survey, Brazil" (2002, 2004, 2006).

    Sample: Representative samples of (1) Caxias do Sul, Rio Grande do Sul and (2) Juiz de Fora, Minas Gerais.
    Topic areas: Neighborhood quality of life, worst problems, economic assessments, political participation, media and campaign attention, civil society and neighborhood involvement, political discussion frequency, trust in government and institutions, vote choice, core values, interpersonal persuasion, feeling thermometers of groups and politicians, party identification, ideology, candidate trait assessments, candidate ideological and issue placement, issue self-placement, evaluation of Lula's government, political knowledge, discussant name generator.
    Sample size: About 25,000 interviews.
    Special features: Interviews with named political discussants; 100 interviews per neighborhood.

  8. SPIDER (v2): Synthetic Person Information Dataset for Entity Resolution

    • figshare.com
    csv
    Updated Oct 29, 2025
    + more versions
    Cite
    Praveen Chinnappa; Rose Mary Arokiya Dass; yash mathur (2025). SPIDER (v2): Synthetic Person Information Dataset for Entity Resolution [Dataset]. http://doi.org/10.6084/m9.figshare.30472712.v1
    Explore at:
    csv
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Praveen Chinnappa; Rose Mary Arokiya Dass; yash mathur
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SPIDER (v2) – Synthetic Person Information Dataset for Entity Resolution provides researchers with ready-to-use data for benchmarking duplicate-detection and entity-resolution algorithms. The dataset focuses on person-level fields typical of customer or citizen records. Since real-world person-level data is restricted due to Personally Identifiable Information (PII) constraints, publicly available synthetic datasets are limited in scope, volume, or realism. SPIDER addresses these limitations by providing a large-scale, realistic dataset containing first name, last name, email, phone, address, and date of birth (DOB) attributes. Using the Python Faker library, 40,000 unique synthetic person records were generated, followed by 10,000 controlled duplicate records derived using seven real-world transformation rules. Each duplicate record is linked to its original base record and rule through the fields is_duplicate_of and duplication_rule. Version 2 introduces major realism and structural improvements, enhancing both the dataset and the generation framework.

    Enhancements in Version 2

    • New cluster_id column to group base and duplicate records for improved entity-level benchmarking.
    • Improved data realism with consistent field relationships: state and ZIP codes now match correctly; phone numbers are generated based on state codes; email addresses are logically related to name components.
    • Refined duplication logic: Rule 4 updated for realistic address variation; Rule 7 enhanced to simulate shared accounts among different individuals (with distinct DOBs).
    • Improved data validation and formatting for address, email, and date fields.
    • Updated Python generation script for modular configuration, reproducibility, and extensibility.

    Duplicate Rules (with real-world use cases)

    1. Duplicate record with a variation in email address. Use case: same person using multiple email accounts.
    2. Duplicate record with a variation in phone numbers. Use case: same person using multiple contact numbers.
    3. Duplicate record with last-name variation. Use case: name changes or data entry inconsistencies.
    4. Duplicate record with address variation. Use case: same person maintaining multiple addresses or moving residences.
    5. Duplicate record with a nickname. Use case: same person using formal and informal names (Robert → Bob, Elizabeth → Liz).
    6. Duplicate record with minor spelling variations in the first name. Use case: legitimate entry or migration errors (Sara → Sarah).
    7. Duplicate record with multiple individuals sharing the same email and last name but different DOBs. Use case: realistic shared accounts among family members or households (benefits, tax, or insurance portals).

    Output Format

    The dataset is available in both CSV and JSON formats for direct use in data-processing, machine-learning, and record-linkage frameworks.

    Data Regeneration

    The included Python script can be used to fully regenerate the dataset and supports:

    • Addition of new duplication rules
    • Regional, linguistic, or domain-specific variations
    • Volume scaling for large-scale testing scenarios

    Files Included

    • spider_dataset_v2_6_20251027_022215.csv
    • spider_dataset_v2_6_20251027_022215.json
    • spider_readme_v2.md
    • SPIDER_generation_script_v2.py
    • SupportingDocuments/ folder containing:
      • benchmark_comparison_script.py – script used to derive the F1 score.
      • Public_census_data_surname.csv – sample U.S. Census name and demographic data used for comparison.
      • ssa_firstnames.csv – Social Security Administration names dataset.
      • simplemaps_uszips.csv – ZIP-to-state mapping data used for phone and address validation.
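    For example, a short sketch (pandas assumed; only the documented fields is_duplicate_of, duplication_rule, and cluster_id are used, and the CSV file name is taken from the list above) that summarises the labelled duplicate structure for benchmarking:

    # Sketch: inspect the ground-truth duplicate structure of SPIDER v2.
    import pandas as pd

    df = pd.read_csv("spider_dataset_v2_6_20251027_022215.csv")

    # Duplicate records point back to their base record via is_duplicate_of.
    dups = df[df["is_duplicate_of"].notna()]
    print(len(dups), "duplicate records out of", len(df))
    print(dups["duplication_rule"].value_counts())   # distribution over the 7 rules

    # cluster_id (new in v2) groups each base record with all of its duplicates.
    cluster_sizes = df.groupby("cluster_id").size()
    print("largest cluster:", cluster_sizes.max(), "records")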

  9. DoH-Gen-F-CCDDD

    • data.niaid.nih.gov
    Updated Feb 5, 2022
    Cite
    Jeřábek, Kamil; Hynek, Karel; Čejka, Tomáš; Ryšavý, Ondřej (2022). DoH-Gen-F-CCDDD [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5957420
    Explore at:
    Dataset updated
    Feb 5, 2022
    Dataset provided by
    CESNET (http://www.cesnet.cz/)
    FIT BUT
    FIT CTU
    Authors
    Jeřábek, Kamil; Hynek, Karel; Čejka, Tomáš; Ryšavý, Ondřej
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of DNS over HTTPS traffic from Firefox (Comcast, CZNIC, DNSForge, DNSSB, DOHli). The dataset contains DoH and HTTPS traffic that was captured in a virtualized environment (Docker) and generated automatically by the Firefox browser with DoH enabled towards 5 different DoH servers (Comcast, CZNIC, DNSForge, DNSSB, DOHli), with web page loads drawn from a sample of web pages taken from the Majestic Million dataset. The data are provided in the form of PCAP files. However, we also provide TLS-enriched flow data generated with the open-source ipfixprobe flow exporter. Information other than TLS-related fields is not relevant, since the dataset comprises only encrypted TLS traffic. The TLS-enriched flow data are provided as CSV files with the following columns:

    • DST_IP: Destination IP address
    • SRC_IP: Source IP address
    • BYTES: The number of transmitted bytes from Source to Destination
    • BYTES_REV: The number of transmitted bytes from Destination to Source
    • TIME_FIRST: Timestamp of the first packet in the flow in format YYYY-MM-DDTHH-MM-SS
    • TIME_LAST: Timestamp of the last packet in the flow in format YYYY-MM-DDTHH-MM-SS
    • PACKETS: The number of packets transmitted from Source to Destination
    • PACKETS_REV: The number of packets transmitted from Destination to Source
    • DST_PORT: Destination port
    • SRC_PORT: Source port
    • PROTOCOL: The number of the transport protocol
    • TCP_FLAGS: Logical OR across all TCP flags in the packets transmitted from Source to Destination
    • TCP_FLAGS_REV: Logical OR across all TCP flags in the packets transmitted from Destination to Source
    • TLS_ALPN: The value of the Application-Layer Protocol Negotiation extension sent from the Server
    • TLS_JA3: The JA3 fingerprint
    • TLS_SNI: The value of the Server Name Indication extension sent by the Client
    

    The DoH resolvers in the dataset can be identified by IP addresses written in doh_resolver_ip.csv file.
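    As an example, a minimal sketch (pandas assumed) that labels flows as DoH or non-DoH by matching DST_IP against the resolver list; the flow file name and the column name inside doh_resolver_ip.csv are assumptions, since they are not documented here:

    # Sketch: mark TLS flows whose destination is one of the listed DoH resolvers.
    import pandas as pd

    flows = pd.read_csv("some_firefox_flows.csv")      # one file from tls-flow-csv/firefox
    resolvers = pd.read_csv("doh_resolver_ip.csv")

    resolver_ips = set(resolvers["ip"])                # "ip" column name is assumed
    flows["is_doh"] = flows["DST_IP"].isin(resolver_ips)
    print(flows["is_doh"].value_counts())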

    The main part of the dataset is located in DoH-Gen-F-CCDDD.tar.gz and has the following structure:

    .
    └── data                  - Main directory with data
        └── generated         - Directory with generated captures
            ├── pcap          - Generated PCAPs
            │   └── firefox
            └── tls-flow-csv  - Generated CSV flow data
                └── firefox

    Total stats of generated data:

    • Total Data Size: 40.2 GB
    • Total files: 10
    • DoH extracted TLS flows: ~100 K
    • Non-DoH extracted TLS flows: ~315 K
    

    DoH Server information

    | Name     | Provider                      | DoH query URL                      |
    |----------|-------------------------------|------------------------------------|
    | Comcast  | https://corporate.comcast.com | https://doh.xfinity.com/dns-query  |
    | CZNIC    | https://www.nic.cz            | https://odvr.nic.cz/doh            |
    | DNSForge | https://dnsforge.de           | https://dnsforge.de/dns-query      |
    | DNSSB    | https://dns.sb/doh/           | https://doh.dns.sb/dns-query       |
    | DOHli    | https://doh.li                | https://doh.li/dns-query           |
    
  10. /MinBias_TuneD6T_2760GeV_pythia6/HiWinter13-STARTHI53_V26-v1/GEN-SIM-RECO

    • opendata.cern.ch
    Updated 2023
    + more versions
    Cite
    CMS Collaboration (2023). /MinBias_TuneD6T_2760GeV_pythia6/HiWinter13-STARTHI53_V26-v1/GEN-SIM-RECO [Dataset]. http://doi.org/10.7483/OPENDATA.CMS.LRVU.HHYP
    Explore at:
    Dataset updated
    2023
    Dataset provided by
    CERN Open Data Portal
    Authors
    CMS Collaboration
    Description

    Simulated dataset MinBias_TuneD6T_2760GeV_pythia6 in GEN-SIM-RECO format for 2013 collision data.

    See the description of the simulated dataset names in: About CMS simulated dataset names.

    These simulated datasets correspond to the pp collision data, needed as reference data for heavy-ion data analysis, at energy 2.76TeV collected by the CMS experiment in 2013.

  11. 100 TeV pp collisions, Exotics type, PYTHIA8 generator:...

    • osti.gov
    Updated Nov 14, 2016
    + more versions
    Cite
    HepSim Monte Carlo Event Repository, Argonne National Laboratory (ANL) (2016). 100 TeV pp collisions, Exotics type, PYTHIA8 generator: tev100pp_qstar_pythia8_mbins_slim [Dataset]. http://doi.org/10.34664/1575510
    Explore at:
    Dataset updated
    Nov 14, 2016
    Dataset provided by
    Office of Science (http://www.er.doe.gov/)
    United States Department of Energy (http://energy.gov/)
    Argonne National Laboratory (ANL), Argonne, IL (United States)
    HepSim Monte Carlo Event Repository, Argonne National Laboratory (ANL)
    Description

    Excited fermions in the mass range 5-40 TeV. 10000 events per file, 100 files per mass. The compositeness scale (Lambda) is set to the mass of the fermion, so the width is expected to be small (see the log files for details). Cross sections are included in the log files (mass dependent). Note that data are slimmed (see the log file).

    How to decode the name: tev100_pythia8_qstar_m[MASS]_[NUMBER], where [MASS] is the generator-level mass. Mass bins (in GeV): m[1]=5000, m[2]=10000, .........

    How to use: To get a sample with a given mass, use "glob" regular expressions.

    Slimming: particle records are slimmed (all stable with pT>0.3 GeV) and (PID==5 || PID==6) or (PID>22 && PID<38) or (PID>10 && PID<17).
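    For instance, a small sketch using Python's glob module to pick up every file for one mass point, based on the naming scheme above (the working directory is assumed to contain the downloaded files):

    # Sketch: collect all files for the m[1]=5000 GeV mass point.
    import glob

    mass = 5000   # GeV, i.e. mass bin m[1]
    files = sorted(glob.glob(f"tev100_pythia8_qstar_m{mass}_*"))
    print(len(files), "files found for mass", mass, "GeV")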

  12. Albero study: a longitudinal database of the social network and personal networks of a cohort of students at the end of high school

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1more
    Updated Mar 26, 2021
    Cite
    Maya Jariego, Isidro; Holgado Ramos, Daniel; Alieva, Deniza (2021). Albero study: a longitudinal database of the social network and personal networks of a cohort of students at the end of high school [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3532047
    Explore at:
    Dataset updated
    Mar 26, 2021
    Dataset provided by
    Management Development Institute of Singapore in Tashkent
    Universidad de Sevilla
    Authors
    Maya Jariego, Isidro; Holgado Ramos, Daniel; Alieva, Deniza
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT

    The Albero study analyzes the personal transitions of a cohort of high school students at the end of their studies. The data consist of (a) the longitudinal social network of the students, before (n = 69) and after (n = 57) finishing their studies; and (b) the longitudinal study of the personal networks of each of the participants in the research. The two observations of the complete social network are presented in two matrices in Excel format. For each respondent, two square matrices of 45 alters of their personal networks are provided, also in Excel format. For each respondent, both psychological sense of community and frequency of commuting is provided in a SAV file (SPSS). The database allows the combined analysis of social networks and personal networks of the same set of individuals.

    INTRODUCTION

    Ecological transitions are key moments in the life of an individual that occur as a result of a change of role or context. This is the case, for example, of the completion of high school studies, when young people start their university studies or try to enter the labor market. These transitions are turning points that carry a risk or an opportunity (Seidman & French, 2004). That is why they have received special attention in research and psychological practice, both from a developmental point of view and in the situational analysis of stress or in the implementation of preventive strategies.

    The data we present in this article describe the ecological transition of a group of young people from Alcala de Guadaira, a town located about 16 kilometers from Seville. Specifically, in the “Albero” study we monitored the transition of a cohort of secondary school students at the end of the last pre-university academic year. It is a turning point in which most of them began a metropolitan lifestyle, with more displacements to the capital and a slight decrease in identification with the place of residence (Maya-Jariego, Holgado & Lubbers, 2018).

    Normative transitions, such as the completion of studies, affect a group of individuals simultaneously, so they can be analyzed both individually and collectively. From an individual point of view, each student stops attending the institute, which is replaced by new interaction contexts. Consequently, the structure and composition of their personal networks are transformed. From a collective point of view, the network of friendships of the cohort of high school students enters into a gradual process of disintegration and fragmentation into subgroups (Maya-Jariego, Lubbers & Molina, 2019).

    These two levels, individual and collective, were evaluated in the “Albero” study. One of the peculiarities of this database is that we combine the analysis of a complete social network with a survey of personal networks in the same set of individuals, with a longitudinal design before and after finishing high school. This allows combining the study of the multiple contexts in which each individual participates, assessed through the analysis of a sample of personal networks (Maya-Jariego, 2018), with the in-depth analysis of a specific context (the relationships between a promotion of students in the institute), through the analysis of the complete network of interactions. This potentially allows us to examine the covariation of the social network with the individual differences in the structure of personal networks.

    PARTICIPANTS

    The social network and personal networks of the students of the last two years of high school of an institute of Alcala de Guadaira (Seville) were analyzed. The longitudinal follow-up covered approximately a year and a half. The first wave was composed of 31 men (44.9%) and 38 women (55.1%) who live in Alcala de Guadaira, and who mostly expect to live in Alcala (36.2%) or in Seville (37.7%) in the future. In the second wave, information was obtained from 27 men (47.4%) and 30 women (52.6%).

    DATE STRUCTURE AND ARCHIVES FORMAT

    The data is organized in two longitudinal observations, with information on the complete social network of the cohort of students of the last year, the personal networks of each individual and complementary information on the sense of community and frequency of metropolitan movements, among other variables.

    Social network

    The file “Red_Social_t1.xlsx” is a valued matrix of 69 actors that gathers the relations of knowledge and friendship between the cohort of students of the last year of high school in the first observation. The file “Red_Social_t2.xlsx” is a valued matrix of 57 actors obtained 17 months after the first observation.

    In order to generate each complete social network, the list of 77 students enrolled in the last year of high school was passed to the respondents, asking that in each case they indicate the type of relationship, according to the following values: 1, “his/her name sounds familiar"; 2, "I know him/her"; 3, "we talk from time to time"; 4, "we have good relationship"; and 5, "we are friends." The two resulting complete networks are represented in Figure 2. In the second observation, it is a comparatively less dense network, reflecting the gradual disintegration process that the student group has initiated.

    Personal networks

    Also in this case the information is organized in two observations. The compressed file “Redes_Personales_t1.csv” includes 69 folders, corresponding to personal networks. Each folder includes a valued matrix of 45 alters in CSV format. Likewise, in each case a graphic representation of the network obtained with Visone (Brandes and Wagner, 2004) is included. Relationship values range from 0 (do not know each other) to 2 (know each other very well).

    Second, the compressed file “Redes_Personales_t2.csv” includes 57 folders, with the information equivalent to each respondent referred to the second observation, that is, 17 months after the first interview. The structure of the data is the same as in the first observation.

    Sense of community and metropolitan displacements

    The SPSS file “Albero.sav” collects the survey data, together with some information-summary of the network data related to each respondent. The 69 rows correspond to the 69 individuals interviewed, and the 118 columns to the variables related to each of them in T1 and T2, according to the following list:

    • Socio-economic data.
    • Data on habitual residence.
    • Information on intercity journeys.
    • Identity and sense of community.
    • Personal network indicators.
    • Social network indicators.
    

    DATA ACCESS

    Social networks and personal networks are available in CSV format. This allows its use directly with UCINET, Visone, Pajek or Gephi, among others, and they can be exported as Excel or text format files, to be used with other programs.
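    For instance, a minimal sketch (pandas and networkx assumed, neither is required by the dataset) that loads the first-wave complete social network and reports basic indicators; it assumes the first row and column of the Excel file hold actor identifiers:

    # Sketch: read the first-wave social network matrix and compute simple statistics.
    import pandas as pd
    import networkx as nx

    adj = pd.read_excel("Red_Social_t1.xlsx", index_col=0)
    g = nx.from_pandas_adjacency(adj)     # tie strengths 1-5 become edge weights

    print("actors:", g.number_of_nodes())
    print("ties:", g.number_of_edges())
    print("density:", round(nx.density(g), 3))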

    The visual representation of the personal networks of the respondents in both waves is available in the following album of the Graphic Gallery of Personal Networks on Flickr: .

    In previous work we analyzed the effects of personal networks on the longitudinal evolution of the socio-centric network. It also includes additional details about the instruments applied. In case of using the data, please quote the following reference:

    Maya-Jariego, I., Holgado, D. & Lubbers, M. J. (2018). Efectos de la estructura de las redes personales en la red sociocéntrica de una cohorte de estudiantes en transición de la enseñanza secundaria a la universidad. Universitas Psychologica, 17(1), 86-98. https://doi.org/10.11144/Javeriana.upsy17-1.eerp

    The English version of this article can be downloaded from: https://tinyurl.com/yy9s2byl

    CONCLUSION

    The database of the “Albero” study allows us to explore the co-evolution of social networks and personal networks. In this way, we can examine the mutual dependence of individual trajectories and the structure of the relationships of the cohort of students as a whole. The complete social network corresponds to the same context of interaction: the secondary school. However, personal networks collect information from the different contexts in which the individual participates. The structural properties of personal networks may partly explain individual differences in the position of each student in the entire social network. In turn, the properties of the entire social network partly determine the structure of opportunities in which individual trajectories are displayed.

    The longitudinal character and the combination of the personal networks of individuals with a common complete social network, make this database have unique characteristics. It may be of interest both for multi-level analysis and for the study of individual differences.

    ACKNOWLEDGEMENTS

    The fieldwork for this study was supported by the Complementary Actions of the Ministry of Education and Science (SEJ2005-25683), and was part of the project “Dynamics of actors and networks across levels: individuals, groups, organizations and social settings” (2006 -2009) of the European Science Foundation (ESF). The data was presented for the first time on June 30, 2009, at the European Research Collaborative Project Meeting on Dynamic Analysis of Networks and Behaviors, held at the Nuffield College of the University of Oxford.

    REFERENCES

    Brandes, U., & Wagner, D. (2004). Visone - Analysis and Visualization of Social Networks. In M. Jünger, & P. Mutzel (Eds.), Graph Drawing Software (pp. 321-340). New York: Springer-Verlag.

    Maya-Jariego, I. (2018). Why name generators with a fixed number of alters may be a pragmatic option for personal network analysis. American Journal of

  13. Invoices Dataset

    • kaggle.com
    zip
    Updated Jan 18, 2022
    Cite
    Cankat Saraç (2022). Invoices Dataset [Dataset]. https://www.kaggle.com/datasets/cankatsrc/invoices
    Explore at:
    zip (574249 bytes)
    Dataset updated
    Jan 18, 2022
    Authors
    Cankat Saraç
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    The invoice dataset provided is a mock dataset generated using the Python Faker library. It has been designed to mimic the format of data collected from an online store. The dataset contains various fields, including first name, last name, email, product ID, quantity, amount, invoice date, address, city, and stock code. All of the data in the dataset is randomly generated and does not represent actual individuals or products. The dataset can be used for various purposes, including testing algorithms or models related to invoice management, e-commerce, or customer behavior analysis. The data in this dataset can be used to identify trends, patterns, or anomalies in online shopping behavior, which can help businesses to optimize their online sales strategies.
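    As a quick illustration of the kind of analysis described (pandas assumed; the CSV file name and exact column headers are not documented here, so the ones below are guesses):

    # Sketch: revenue by city from the mock invoice data. Adjust the file name
    # and column names ("city", "amount") to the actual headers if they differ.
    import pandas as pd

    invoices = pd.read_csv("invoices.csv")
    revenue_by_city = invoices.groupby("city")["amount"].sum().sort_values(ascending=False)
    print(revenue_by_city.head(10))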

  14. MyMart: A Comprehensive Sales Dataset

    • kaggle.com
    zip
    Updated Apr 8, 2024
    Cite
    Dave Darshan (2024). MyMart: A Comprehensive Sales Dataset [Dataset]. https://www.kaggle.com/datasets/davedarshan/mymart-a-comprehensive-sales-dataset
    Explore at:
    zip (277198 bytes)
    Dataset updated
    Apr 8, 2024
    Authors
    Dave Darshan
    Description

    This data is artificially generated. It can be used for practicing data visualization and analysis skills. Please note that since the data is generated randomly, it may not reflect real-world sales data accurately. However, it should serve as a good starting point for practicing data analysis and visualization.

    Description :

    • Sales Date: The date of each sale. The dates are generated for a period of 120 days starting from January 1, 2023.
    • Category: The category of the product sold. The categories include ‘Electronics’, ‘Clothing’, and ‘Home & Kitchen’.
    • Subcategory: The subcategory of the product sold. Each category has its own set of subcategories. For example, the ‘Electronics’ category includes subcategories such as ‘Communication’, ‘Computers’, and ‘Wearables’.
    • ProductName: The name of the product sold. Each subcategory has its own set of products. For example, the ‘Communication’ subcategory includes products such as ‘Walkie Talkie’, ‘Cell Phone’, and ‘Smart Phone’.
    • Salesperson: The name of the salesperson who made the sale. There are different salespersons assigned to each category.
    • Gender: The gender of the salesperson, determined based on the salesperson’s name.
    • Unit sold: The number of units of the product sold in the sale, a random number between 1 and 100.
    • Original Price: The original price of the product, a random number between 10 and 1000.
    • Sales Price: The sales price of the product, calculated as a random fraction of the original price, ensuring that the sales price is always slightly higher than the original price.
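    A rough sketch of a generator following this description (not the author's script; the category, subcategory, and product lists are abbreviated, and because the sales-price rule above is worded ambiguously, a small markup over the original price is assumed):

    # Sketch: generate MyMart-style rows for 120 days starting 2023-01-01.
    import random
    from datetime import date, timedelta
    import pandas as pd
    from faker import Faker

    fake = Faker()
    catalog = {
        "Electronics": {"Communication": ["Walkie Talkie", "Cell Phone", "Smart Phone"]},
        "Clothing": {"Menswear": ["T-Shirt", "Jeans"]},
        "Home & Kitchen": {"Cookware": ["Pan", "Kettle"]},
    }

    rows = []
    for _ in range(500):
        category = random.choice(list(catalog))
        subcategory = random.choice(list(catalog[category]))
        original_price = round(random.uniform(10, 1000), 2)
        rows.append({
            "Sales Date": date(2023, 1, 1) + timedelta(days=random.randrange(120)),
            "Category": category,
            "Subcategory": subcategory,
            "ProductName": random.choice(catalog[category][subcategory]),
            "Salesperson": fake.name(),
            "Unit sold": random.randint(1, 100),
            "Original Price": original_price,
            # Assumed markup; the description is unclear on the exact rule.
            "Sales Price": round(original_price * random.uniform(1.01, 1.10), 2),
        })

    pd.DataFrame(rows).to_csv("mymart_sales.csv", index=False)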

    For information on 'How to generate a dataset', click here.

  15. 70 Small Business Ideas to Start in 2025

    • kaggle.com
    zip
    Updated Aug 2, 2025
    Cite
    AnthonyTherrien (2025). 70 Small Business Ideas to Start in 2025 [Dataset]. https://www.kaggle.com/datasets/anthonytherrien/70-small-business-ideas-to-start-in-2025
    Explore at:
    zip (1165 bytes)
    Dataset updated
    Aug 2, 2025
    Authors
    AnthonyTherrien
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    📊 70 Small Business Ideas to Start in 2025

    This dataset features a curated list of 70 small business ideas that are relevant and potentially profitable for aspiring entrepreneurs in 2025. Each entry includes the business name, difficulty rating, and its corresponding category to help users analyze and choose ideas based on their interests, expertise, and available resources.

    📁 Dataset Structure

    Filename: small_business_ideas_2025.csv

    • Name: The name or title of the small business idea
    • Difficulty: Estimated difficulty level to start the business (Low, Medium, or High)
    • Category: The general type of service or industry (e.g., Financial Services, Creative Work, Manual Labor, etc.)

    🗂 Categories Overview

    The dataset covers diverse categories including:

    • 💼 Financial Services
    • 🛠️ Manual Labor
    • 🎨 Creative Work
    • 🏘️ Property & Real Estate
    • 📈 Planning & Coaching
    • 🍽️ Hospitality
    • 🐾 Other Services
    • 🌐 Online Business

    🎯 Use Cases

    • 📚 Market research for startup consultants
    • 💡 Inspiration for new entrepreneurs
    • 🤖 Training data for idea recommendation models
    • 📊 Exploratory data analysis (EDA) on industry trends
    • 📝 Project or portfolio ideas for business/data students

    🏁 Sample Preview

    | Name                        | Difficulty | Category           |
    |-----------------------------|------------|--------------------|
    | Accounting and Tax Services | High       | Financial Services |
    | Dog Walking                 | Low        | Other Services     |
    | Web Development             | High       | Creative Work      |
    | Food Truck                  | High       | Hospitality        |
  16. WikiSQL (Questions and SQL Queries)

    • kaggle.com
    zip
    Updated Nov 25, 2022
    Cite
    The Devastator (2022). WikiSQL (Questions and SQL Queries) [Dataset]. https://www.kaggle.com/datasets/thedevastator/dataset-for-developing-natural-language-interfac
    Explore at:
    zip (21491264 bytes)
    Dataset updated
    Nov 25, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    WikiSQL (Questions and SQL Queries)

    80654 hand-annotated questions and SQL queries on 24241 Wikipedia tables

    By Huggingface Hub [source]

    About this dataset

    A large crowd-sourced dataset for developing natural language interfaces for relational databases. WikiSQL is a dataset of 80654 hand-annotated examples of questions and SQL queries distributed across 24241 tables from Wikipedia.

    How to use the dataset

    This dataset can be used to develop natural language interfaces for relational databases. The data fields are the same across all splits; each split file contains the phase, question, table, and SQL query for each example.

    Research Ideas

    • This dataset can be used to develop natural language interfaces for relational databases.
    • This dataset can be used to develop a knowledge base of common SQL queries.
    • This dataset can be used to generate a training set for a neural network that translates natural language into SQL queries

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    Files: train.csv, validation.csv, test.csv (all three splits share the same columns)

    | Column name | Description                                               |
    |:------------|:----------------------------------------------------------|
    | phase       | The phase of the data collection. (String)                 |
    | question    | The question asked by the user. (String)                   |
    | table       | The table containing the data for the question. (String)   |
    | sql         | The SQL query corresponding to the question. (String)      |
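    As a quick start (pandas assumed), the splits can be loaded and a single question/SQL pair inspected like this:

    # Sketch: load one WikiSQL split and look at the first annotated example.
    import pandas as pd

    train = pd.read_csv("train.csv")
    print(train.columns.tolist())     # expected: phase, question, table, sql
    print(train.loc[0, "question"])
    print(train.loc[0, "sql"])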

  17. /Pyquen_DiJet_Pt30_TuneZ2_Unquenched_Hydjet1p8_2760GeV/HiFall13DR53X-NoPileUp_STARTHI53_LV1-v2/GEN-SIM-RECO

    • opendata.cern.ch
    Updated 2023
    + more versions
    Cite
    CMS Collaboration (2023). /Pyquen_DiJet_Pt30_TuneZ2_Unquenched_Hydjet1p8_2760GeV/HiFall13DR53X-NoPileUp_STARTHI53_LV1-v2/GEN-SIM-RECO [Dataset]. http://doi.org/10.7483/OPENDATA.CMS.67F0.HMSJ
    Explore at:
    Dataset updated
    2023
    Dataset provided by
    CERN Open Data Portal
    Authors
    CMS Collaboration
    Description

    Simulated dataset Pyquen_DiJet_Pt30_TuneZ2_Unquenched_Hydjet1p8_2760GeV in GEN-SIM-RECO format for 2013 collision data.

    See the description of the simulated dataset names in: About CMS simulated dataset names.

    These simulated datasets correspond to the PbPb collision data at energy 2.76TeV collected by the CMS experiment during Run1.

  18. /MinBias_TuneZ2star_8TeV-pythia6/Summer12-START50_V13-v3/GEN-SIM

    • opendata.cern.ch
    • opendata-dev.cern.ch
    Updated 2018
    + more versions
    Cite
    CMS collaboration (2018). /MinBias_TuneZ2star_8TeV-pythia6/Summer12-START50_V13-v3/GEN-SIM [Dataset]. http://doi.org/10.7483/OPENDATA.CMS.07II.3X1D
    Explore at:
    Dataset updated
    2018
    Dataset provided by
    CERN Open Data Portal
    Authors
    CMS collaboration
    Description

    Simulated pile-up event dataset MinBias_TuneZ2star_8TeV-pythia6 in GEN-SIM format. Events were sampled from this dataset and added to simulated data to make them comparable with the 2012 collision data, see the guide to pile-up simulation.

    See the description of the simulated dataset names in: About CMS simulated dataset names.

  19. /QCD_Pt_460_TuneZ2_5p02TeV/HiWinter13-pp_STARTHI53_V25-v1/GEN-SIM-RECO

    • opendata.cern.ch
    Updated 2023
    + more versions
    Cite
    CMS Collaboration (2023). /QCD_Pt_460_TuneZ2_5p02TeV/HiWinter13-pp_STARTHI53_V25-v1/GEN-SIM-RECO [Dataset]. http://doi.org/10.7483/OPENDATA.CMS.SSH2.4DL8
    Explore at:
    Dataset updated
    2023
    Dataset provided by
    CERN Open Data Portal
    Authors
    CMS Collaboration
    Description

    Simulated dataset QCD_Pt_460_TuneZ2_5p02TeV in GEN-SIM-RECO format for 2013 collision data.

    See the description of the simulated dataset names in: About CMS simulated dataset names.

    These simulated datasets correspond to the pPb collision data at energy 5.02TeV collected by the CMS experiment in 2013.

